[PATCH] block-backend: Fix race when resuming queued requests

Kevin Wolf posted 1 patch 2 months, 2 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20251119172720.135424-1-kwolf@redhat.com
Maintainers: Kevin Wolf <kwolf@redhat.com>, Hanna Reitz <hreitz@redhat.com>
block/block-backend.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
[PATCH] block-backend: Fix race when resuming queued requests
Posted by Kevin Wolf 2 months, 2 weeks ago
When new requests arrive at a BlockBackend that is currently drained,
these requests are queued until the drain section ends.

There is a race window between blk_root_drained_end() waking up a queued
request in an iothread from the main thread and blk_wait_while_drained()
actually being woken up in the iothread and calling blk_in_flight(). If
the BlockBackend is drained again during this window, drain won't wait
for this request and it will sneak in when the BlockBackend is already
supposed to be quiesced. This causes assertion failures in
bdrv_drain_all_begin() and can have other unintended consequences.

Fix this by increasing the in_flight counter immediately when scheduling
the request to be resumed so that the next drain will wait for it to
complete.

Cc: qemu-stable@nongnu.org
Reported-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/block-backend.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index f8d6ba65c1..d6df369188 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1318,9 +1318,9 @@ static void coroutine_fn blk_wait_while_drained(BlockBackend *blk)
          * section.
          */
         qemu_mutex_lock(&blk->queued_requests_lock);
+        /* blk_root_drained_end() has the corresponding blk_inc_in_flight() */
         blk_dec_in_flight(blk);
         qemu_co_queue_wait(&blk->queued_requests, &blk->queued_requests_lock);
-        blk_inc_in_flight(blk);
         qemu_mutex_unlock(&blk->queued_requests_lock);
     }
 }
@@ -2767,9 +2767,11 @@ static void blk_root_drained_end(BdrvChild *child)
             blk->dev_ops->drained_end(blk->dev_opaque);
         }
         qemu_mutex_lock(&blk->queued_requests_lock);
-        while (qemu_co_enter_next(&blk->queued_requests,
-                                  &blk->queued_requests_lock)) {
+        while (!qemu_co_queue_empty(&blk->queued_requests)) {
             /* Resume all queued requests */
+            blk_inc_in_flight(blk);
+            qemu_co_enter_next(&blk->queued_requests,
+                               &blk->queued_requests_lock);
         }
         qemu_mutex_unlock(&blk->queued_requests_lock);
     }
-- 
2.51.1
Re: [PATCH] block-backend: Fix race when resuming queued requests
Posted by Fiona Ebner 2 months, 2 weeks ago
Am 19.11.25 um 6:27 PM schrieb Kevin Wolf:
> When new requests arrive at a BlockBackend that is currently drained,
> these requests are queued until the drain section ends.
> 
> There is a race window between blk_root_drained_end() waking up a queued
> request in an iothread from the main thread and blk_wait_while_drained()
> actually being woken up in the iothread and calling blk_in_flight(). If

Small typo here: blk_inc_in_flight()

> the BlockBackend is drained again during this window, drain won't wait
> for this request and it will sneak in when the BlockBackend is already
> supposed to be quiesced. This causes assertion failures in
> bdrv_drain_all_begin() and can have other unintended consequences.
> 
> Fix this by increasing the in_flight counter immediately when scheduling
> the request to be resumed so that the next drain will wait for it to
> complete.
> 
> Cc: qemu-stable@nongnu.org
> Reported-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
Re: [PATCH] block-backend: Fix race when resuming queued requests
Posted by Andrey Drobyshev 2 months, 2 weeks ago
On 11/19/25 7:27 PM, Kevin Wolf wrote:
> When new requests arrive at a BlockBackend that is currently drained,
> these requests are queued until the drain section ends.
> 
> There is a race window between blk_root_drained_end() waking up a queued
> request in an iothread from the main thread and blk_wait_while_drained()
> actually being woken up in the iothread and calling blk_in_flight(). If
> the BlockBackend is drained again during this window, drain won't wait
> for this request and it will sneak in when the BlockBackend is already
> supposed to be quiesced. This causes assertion failures in
> bdrv_drain_all_begin() and can have other unintended consequences.
> 
> Fix this by increasing the in_flight counter immediately when scheduling
> the request to be resumed so that the next drain will wait for it to
> complete.
> 
> Cc: qemu-stable@nongnu.org
> Reported-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/block-backend.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)

I can confirm that the crash is no longer reproducible with this fix
applied.  Thanks for looking into this!

Tested-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Re: [PATCH] block-backend: Fix race when resuming queued requests
Posted by Hanna Czenczek 2 months, 2 weeks ago
On 19.11.25 18:27, Kevin Wolf wrote:
> When new requests arrive at a BlockBackend that is currently drained,
> these requests are queued until the drain section ends.
>
> There is a race window between blk_root_drained_end() waking up a queued
> request in an iothread from the main thread and blk_wait_while_drained()
> actually being woken up in the iothread and calling blk_in_flight(). If
> the BlockBackend is drained again during this window, drain won't wait
> for this request and it will sneak in when the BlockBackend is already
> supposed to be quiesced. This causes assertion failures in
> bdrv_drain_all_begin() and can have other unintended consequences.
>
> Fix this by increasing the in_flight counter immediately when scheduling
> the request to be resumed so that the next drain will wait for it to
> complete.
>
> Cc: qemu-stable@nongnu.org
> Reported-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>   block/block-backend.c | 8 +++++---
>   1 file changed, 5 insertions(+), 3 deletions(-)

Reviewed-by: Hanna Czenczek <hreitz@redhat.com>