block-backend: Fix race when resuming queued requests

[PATCH] block-backend: Fix race when resuming queued requests

Posted by Kevin Wolf 2 months, 2 weeks ago

When new requests arrive at a BlockBackend that is currently drained,
these requests are queued until the drain section ends.

There is a race window between blk_root_drained_end() waking up a queued
request in an iothread from the main thread and blk_wait_while_drained()
actually being woken up in the iothread and calling blk_in_flight(). If
the BlockBackend is drained again during this window, drain won't wait
for this request and it will sneak in when the BlockBackend is already
supposed to be quiesced. This causes assertion failures in
bdrv_drain_all_begin() and can have other unintended consequences.

Fix this by increasing the in_flight counter immediately when scheduling
the request to be resumed so that the next drain will wait for it to
complete.

Cc: qemu-stable@nongnu.org
Reported-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/block-backend.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index f8d6ba65c1..d6df369188 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1318,9 +1318,9 @@ static void coroutine_fn blk_wait_while_drained(BlockBackend *blk)
          * section.
          */
         qemu_mutex_lock(&blk->queued_requests_lock);
+        /* blk_root_drained_end() has the corresponding blk_inc_in_flight() */
         blk_dec_in_flight(blk);
         qemu_co_queue_wait(&blk->queued_requests, &blk->queued_requests_lock);
-        blk_inc_in_flight(blk);
         qemu_mutex_unlock(&blk->queued_requests_lock);
     }
 }
@@ -2767,9 +2767,11 @@ static void blk_root_drained_end(BdrvChild *child)
             blk->dev_ops->drained_end(blk->dev_opaque);
         }
         qemu_mutex_lock(&blk->queued_requests_lock);
-        while (qemu_co_enter_next(&blk->queued_requests,
-                                  &blk->queued_requests_lock)) {
+        while (!qemu_co_queue_empty(&blk->queued_requests)) {
             /* Resume all queued requests */
+            blk_inc_in_flight(blk);
+            qemu_co_enter_next(&blk->queued_requests,
+                               &blk->queued_requests_lock);
         }
         qemu_mutex_unlock(&blk->queued_requests_lock);
     }
-- 
2.51.1

Re: [PATCH] block-backend: Fix race when resuming queued requests

Posted by Fiona Ebner 2 months, 2 weeks ago

Am 19.11.25 um 6:27 PM schrieb Kevin Wolf:
> When new requests arrive at a BlockBackend that is currently drained,
> these requests are queued until the drain section ends.
> 
> There is a race window between blk_root_drained_end() waking up a queued
> request in an iothread from the main thread and blk_wait_while_drained()
> actually being woken up in the iothread and calling blk_in_flight(). If

Small typo here: blk_inc_in_flight()

> the BlockBackend is drained again during this window, drain won't wait
> for this request and it will sneak in when the BlockBackend is already
> supposed to be quiesced. This causes assertion failures in
> bdrv_drain_all_begin() and can have other unintended consequences.
> 
> Fix this by increasing the in_flight counter immediately when scheduling
> the request to be resumed so that the next drain will wait for it to
> complete.
> 
> Cc: qemu-stable@nongnu.org
> Reported-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>

Re: [PATCH] block-backend: Fix race when resuming queued requests

Posted by Andrey Drobyshev 2 months, 2 weeks ago

On 11/19/25 7:27 PM, Kevin Wolf wrote:
> When new requests arrive at a BlockBackend that is currently drained,
> these requests are queued until the drain section ends.
> 
> There is a race window between blk_root_drained_end() waking up a queued
> request in an iothread from the main thread and blk_wait_while_drained()
> actually being woken up in the iothread and calling blk_in_flight(). If
> the BlockBackend is drained again during this window, drain won't wait
> for this request and it will sneak in when the BlockBackend is already
> supposed to be quiesced. This causes assertion failures in
> bdrv_drain_all_begin() and can have other unintended consequences.
> 
> Fix this by increasing the in_flight counter immediately when scheduling
> the request to be resumed so that the next drain will wait for it to
> complete.
> 
> Cc: qemu-stable@nongnu.org
> Reported-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/block-backend.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)

I can confirm that the crash is no longer reproducible with this fix
applied.  Thanks for looking into this!

Tested-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>

Re: [PATCH] block-backend: Fix race when resuming queued requests

Posted by Hanna Czenczek 2 months, 2 weeks ago

On 19.11.25 18:27, Kevin Wolf wrote:
> When new requests arrive at a BlockBackend that is currently drained,
> these requests are queued until the drain section ends.
>
> There is a race window between blk_root_drained_end() waking up a queued
> request in an iothread from the main thread and blk_wait_while_drained()
> actually being woken up in the iothread and calling blk_in_flight(). If
> the BlockBackend is drained again during this window, drain won't wait
> for this request and it will sneak in when the BlockBackend is already
> supposed to be quiesced. This causes assertion failures in
> bdrv_drain_all_begin() and can have other unintended consequences.
>
> Fix this by increasing the in_flight counter immediately when scheduling
> the request to be resumed so that the next drain will wait for it to
> complete.
>
> Cc: qemu-stable@nongnu.org
> Reported-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>   block/block-backend.c | 8 +++++---
>   1 file changed, 5 insertions(+), 3 deletions(-)

Reviewed-by: Hanna Czenczek <hreitz@redhat.com>