[PATCH 07/16] nvme: Kick and check completions in BDS context

Hanna Czenczek posted 16 patches 2 weeks, 3 days ago
Maintainers: Paolo Bonzini <pbonzini@redhat.com>, "Alex Bennée" <alex.bennee@linaro.org>, Kevin Wolf <kwolf@redhat.com>, Hanna Reitz <hreitz@redhat.com>, Stefan Hajnoczi <stefanha@redhat.com>, Fam Zheng <fam@euphon.net>, Ronnie Sahlberg <ronniesahlberg@gmail.com>, Peter Lieven <pl@dlhnet.de>, "Philippe Mathieu-Daudé" <philmd@linaro.org>, Ilya Dryomov <idryomov@gmail.com>, "Richard W.M. Jones" <rjones@redhat.com>, Stefan Weil <sw@weilnetz.de>
There is a newer version of this series
[PATCH 07/16] nvme: Kick and check completions in BDS context
Posted by Hanna Czenczek 2 weeks, 3 days ago
nvme_process_completion() must run in the main BDS context, so schedule
a BH for requests that aren’t there.

The context in which we kick does not matter, but let’s just keep kick
and process_completion together for simplicity’s sake.

(For what it’s worth, a quick fio bandwidth test indicates that on my
test hardware, if anything, this may be a bit better than kicking
immediately before scheduling a pure nvme_process_completion() BH.  But
I wouldn’t take more from those results than that it doesn’t really seem
to matter either way.)

Cc: qemu-stable@nongnu.org
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
 block/nvme.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/block/nvme.c b/block/nvme.c
index 8df53ee4ca..7ed5f570bc 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -481,7 +481,7 @@ static void nvme_trace_command(const NvmeCmd *cmd)
     }
 }
 
-static void nvme_deferred_fn(void *opaque)
+static void nvme_kick_and_check_completions(void *opaque)
 {
     NVMeQueuePair *q = opaque;
 
@@ -490,6 +490,18 @@ static void nvme_deferred_fn(void *opaque)
     nvme_process_completion(q);
 }
 
+static void nvme_deferred_fn(void *opaque)
+{
+    NVMeQueuePair *q = opaque;
+
+    if (qemu_get_current_aio_context() == q->s->aio_context) {
+        nvme_kick_and_check_completions(q);
+    } else {
+        aio_bh_schedule_oneshot(q->s->aio_context,
+                                nvme_kick_and_check_completions, q);
+    }
+}
+
 static void nvme_submit_command(NVMeQueuePair *q, NVMeRequest *req,
                                 NvmeCmd *cmd, BlockCompletionFunc cb,
                                 void *opaque)
-- 
2.51.0


Re: [PATCH 07/16] nvme: Kick and check completions in BDS context
Posted by Kevin Wolf 2 weeks, 2 days ago
Am 28.10.2025 um 17:33 hat Hanna Czenczek geschrieben:
> nvme_process_completion() must run in the main BDS context, so schedule
> a BH for requests that aren’t there.
> 
> The context in which we kick does not matter, but let’s just keep kick
> and process_completion together for simplicity’s sake.

Ok, fair, move the main BDS context for calling these functions. But
doesn't that mean that we need to move back to the request context for
calling the callback?

In particular, I see this:

    static void nvme_rw_cb_bh(void *opaque)
    {
        NVMeCoData *data = opaque;
        qemu_coroutine_enter(data->co);
    }

The next patch changes some things about coroutine wakeup, but it
doesn't touch this qemu_coroutine_enter(). So I think the coroutine is
now running in the wrong thread.

I also feel that it gets a bit confusing what is running in which
context, so maybe we can add comments to each of the callbacks telling
that they are running in main BDS context or request coroutine context.

> (For what it’s worth, a quick fio bandwidth test indicates that on my
> test hardware, if anything, this may be a bit better than kicking
> immediately before scheduling a pure nvme_process_completion() BH.  But
> I wouldn’t take more from those results than that it doesn’t really seem
> to matter either way.)
> 
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>

Kevin


Re: [PATCH 07/16] nvme: Kick and check completions in BDS context
Posted by Hanna Czenczek 2 weeks ago
On 29.10.25 18:23, Kevin Wolf wrote:
> Am 28.10.2025 um 17:33 hat Hanna Czenczek geschrieben:
>> nvme_process_completion() must run in the main BDS context, so schedule
>> a BH for requests that aren’t there.
>>
>> The context in which we kick does not matter, but let’s just keep kick
>> and process_completion together for simplicity’s sake.
> Ok, fair, move the main BDS context for calling these functions. But
> doesn't that mean that we need to move back to the request context for
> calling the callback?
>
> In particular, I see this:
>
>      static void nvme_rw_cb_bh(void *opaque)
>      {
>          NVMeCoData *data = opaque;
>          qemu_coroutine_enter(data->co);
>      }
>
> The next patch changes some things about coroutine wakeup, but it
> doesn't touch this qemu_coroutine_enter(). So I think the coroutine is
> now running in the wrong thread.
>
> I also feel that it gets a bit confusing what is running in which
> context, so maybe we can add comments to each of the callbacks telling
> that they are running in main BDS context or request coroutine context.

Makes sense, I’ll try my best.

Hanna


Re: [PATCH 07/16] nvme: Kick and check completions in BDS context
Posted by Kevin Wolf 2 weeks, 2 days ago
Am 29.10.2025 um 18:23 hat Kevin Wolf geschrieben:
> Am 28.10.2025 um 17:33 hat Hanna Czenczek geschrieben:
> > nvme_process_completion() must run in the main BDS context, so schedule
> > a BH for requests that aren’t there.
> > 
> > The context in which we kick does not matter, but let’s just keep kick
> > and process_completion together for simplicity’s sake.
> 
> Ok, fair, move the main BDS context for calling these functions. But
> doesn't that mean that we need to move back to the request context for
> calling the callback?
> 
> In particular, I see this:
> 
>     static void nvme_rw_cb_bh(void *opaque)
>     {
>         NVMeCoData *data = opaque;
>         qemu_coroutine_enter(data->co);
>     }
> 
> The next patch changes some things about coroutine wakeup, but it
> doesn't touch this qemu_coroutine_enter(). So I think the coroutine is
> now running in the wrong thread.

It actually isn't because the patch changes in which AioContext the BH
is called. Quite confusing with all the indirections. Let's get rid of
the BH with qemu_coroutine_enter() and just call aio_co_wake() directly.

Kevin

> I also feel that it gets a bit confusing what is running in which
> context, so maybe we can add comments to each of the callbacks telling
> that they are running in main BDS context or request coroutine context.
> 
> > (For what it’s worth, a quick fio bandwidth test indicates that on my
> > test hardware, if anything, this may be a bit better than kicking
> > immediately before scheduling a pure nvme_process_completion() BH.  But
> > I wouldn’t take more from those results than that it doesn’t really seem
> > to matter either way.)
> > 
> > Cc: qemu-stable@nongnu.org
> > Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> 
> Kevin


Re: [PATCH 07/16] nvme: Kick and check completions in BDS context
Posted by Hanna Czenczek 2 weeks ago
On 29.10.25 18:39, Kevin Wolf wrote:
> Am 29.10.2025 um 18:23 hat Kevin Wolf geschrieben:
>> Am 28.10.2025 um 17:33 hat Hanna Czenczek geschrieben:
>>> nvme_process_completion() must run in the main BDS context, so schedule
>>> a BH for requests that aren’t there.
>>>
>>> The context in which we kick does not matter, but let’s just keep kick
>>> and process_completion together for simplicity’s sake.
>> Ok, fair, move the main BDS context for calling these functions. But
>> doesn't that mean that we need to move back to the request context for
>> calling the callback?
>>
>> In particular, I see this:
>>
>>      static void nvme_rw_cb_bh(void *opaque)
>>      {
>>          NVMeCoData *data = opaque;
>>          qemu_coroutine_enter(data->co);
>>      }
>>
>> The next patch changes some things about coroutine wakeup, but it
>> doesn't touch this qemu_coroutine_enter(). So I think the coroutine is
>> now running in the wrong thread.
> It actually isn't because the patch changes in which AioContext the BH
> is called. Quite confusing with all the indirections. Let's get rid of
> the BH with qemu_coroutine_enter() and just call aio_co_wake() directly.

Sure!

Hanna