[v1] block: Some multi-threading fixes

[PATCH 03/16] iscsi: Run co BH CB in the coroutine’s AioContext

Posted by Hanna Czenczek 3 months, 1 week ago

For rbd (and others), as described in “rbd: Run co BH CB in the
coroutine’s AioContext”, the pattern of setting a completion flag and
waking a coroutine that yields while the flag is not set can only work
when both run in the same thread.

iscsi has the same pattern, but the details are a bit different:
iscsi_co_generic_cb() can (as far as I understand) only run through
iscsi_service(), not just from a random thread at a random time.
iscsi_service() in turn can only be run after iscsi_set_events() set up
an FD event handler, which is done in iscsi_co_wait_for_task().

As a result, iscsi_co_wait_for_task() will always yield exactly once,
because iscsi_co_generic_cb() can only run after iscsi_set_events(),
after the completion flag has already been checked, and the yielding
coroutine will then be woken only once the completion flag was set to
true.  So as far as I can tell, iscsi has no bug and already works fine.

Still, we don’t need the completion flag because we know we have to
yield exactly once, so we can drop it.  This simplifies the code and
makes it more obvious that the “rbd bug” isn’t present here.

This makes iscsi_co_generic_bh_cb() and iscsi_retry_timer_expired() a
bit boring, and actually, for the former, we could drop it and run
aio_co_wake() directly from scsi_co_generic_cb() to the same effect; but
that would remove the replay_bh_schedule_oneshot_event(), and I assume
we shouldn’t do that.  At least schedule both the BH and the timer in
the coroutine’s AioContext to make them simple wrappers around
qemu_coroutine_enter(), without a further BH indirection.

Finally, remove the iTask->co != NULL checks: This field is set by
iscsi_co_init_iscsitask(), which all users of IscsiTask run before even
setting up iscsi_co_generic_cb() as the callback, and it is never set or
cleared elsewhere, so it is impossible to not be set in
iscsi_co_generic_cb().

Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
 block/iscsi.c | 39 ++++++++++-----------------------------
 1 file changed, 10 insertions(+), 29 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 15b96ee880..76c15e20ea 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -107,7 +107,6 @@ typedef struct IscsiLun {
 
 typedef struct IscsiTask {
     int status;
-    int complete;
     int retries;
     int do_retry;
     struct scsi_task *task;
@@ -183,18 +182,13 @@ iscsi_schedule_bh(IscsiAIOCB *acb)
 static void iscsi_co_generic_bh_cb(void *opaque)
 {
     struct IscsiTask *iTask = opaque;
-
-    iTask->complete = 1;
     aio_co_wake(iTask->co);
 }
 
 static void iscsi_retry_timer_expired(void *opaque)
 {
     struct IscsiTask *iTask = opaque;
-    iTask->complete = 1;
-    if (iTask->co) {
-        aio_co_wake(iTask->co);
-    }
+    aio_co_wake(iTask->co);
 }
 
 static inline unsigned exp_random(double mean)
@@ -239,6 +233,7 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
 {
     struct IscsiTask *iTask = opaque;
     struct scsi_task *task = command_data;
+    AioContext *itask_ctx = qemu_coroutine_get_aio_context(iTask->co);
 
     iTask->status = status;
     iTask->do_retry = 0;
@@ -263,9 +258,9 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
                              " (retry #%u in %u ms): %s",
                              iTask->retries, retry_time,
                              iscsi_get_error(iscsi));
-                aio_timer_init(iTask->iscsilun->aio_context,
-                               &iTask->retry_timer, QEMU_CLOCK_REALTIME,
-                               SCALE_MS, iscsi_retry_timer_expired, iTask);
+                aio_timer_init(itask_ctx, &iTask->retry_timer,
+                               QEMU_CLOCK_REALTIME, SCALE_MS,
+                               iscsi_retry_timer_expired, iTask);
                 timer_mod(&iTask->retry_timer,
                           qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + retry_time);
                 iTask->do_retry = 1;
@@ -284,12 +279,7 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
         }
     }
 
-    if (iTask->co) {
-        replay_bh_schedule_oneshot_event(iTask->iscsilun->aio_context,
-                                         iscsi_co_generic_bh_cb, iTask);
-    } else {
-        iTask->complete = 1;
-    }
+    replay_bh_schedule_oneshot_event(itask_ctx, iscsi_co_generic_bh_cb, iTask);
 }
 
 static void coroutine_fn
@@ -592,12 +582,10 @@ static inline bool iscsi_allocmap_is_valid(IscsiLun *iscsilun,
 static void coroutine_fn iscsi_co_wait_for_task(IscsiTask *iTask,
                                                 IscsiLun *iscsilun)
 {
-    while (!iTask->complete) {
-        iscsi_set_events(iscsilun);
-        qemu_mutex_unlock(&iscsilun->mutex);
-        qemu_coroutine_yield();
-        qemu_mutex_lock(&iscsilun->mutex);
-    }
+    iscsi_set_events(iscsilun);
+    qemu_mutex_unlock(&iscsilun->mutex);
+    qemu_coroutine_yield();
+    qemu_mutex_lock(&iscsilun->mutex);
 }
 
 static int coroutine_fn
@@ -669,7 +657,6 @@ retry:
     }
 
     if (iTask.do_retry) {
-        iTask.complete = 0;
         goto retry;
     }
 
@@ -740,7 +727,6 @@ retry:
             scsi_free_scsi_task(iTask.task);
             iTask.task = NULL;
         }
-        iTask.complete = 0;
         goto retry;
     }
 
@@ -902,7 +888,6 @@ retry:
     }
 
     if (iTask.do_retry) {
-        iTask.complete = 0;
         goto retry;
     }
 
@@ -940,7 +925,6 @@ retry:
     }
 
     if (iTask.do_retry) {
-        iTask.complete = 0;
         goto retry;
     }
 
@@ -1184,7 +1168,6 @@ retry:
     }
 
     if (iTask.do_retry) {
-        iTask.complete = 0;
         goto retry;
     }
 
@@ -1301,7 +1284,6 @@ retry:
     }
 
     if (iTask.do_retry) {
-        iTask.complete = 0;
         goto retry;
     }
 
@@ -2390,7 +2372,6 @@ retry:
     iscsi_co_wait_for_task(&iscsi_task, dst_lun);
 
     if (iscsi_task.do_retry) {
-        iscsi_task.complete = 0;
         goto retry;
     }
 
-- 
2.51.0

Re: [PATCH 03/16] iscsi: Run co BH CB in the coroutine’s AioContext

Posted by Kevin Wolf 3 months, 1 week ago

Am 28.10.2025 um 17:33 hat Hanna Czenczek geschrieben:
> For rbd (and others), as described in “rbd: Run co BH CB in the
> coroutine’s AioContext”, the pattern of setting a completion flag and
> waking a coroutine that yields while the flag is not set can only work
> when both run in the same thread.
> 
> iscsi has the same pattern, but the details are a bit different:
> iscsi_co_generic_cb() can (as far as I understand) only run through
> iscsi_service(), not just from a random thread at a random time.
> iscsi_service() in turn can only be run after iscsi_set_events() set up
> an FD event handler, which is done in iscsi_co_wait_for_task().
> 
> As a result, iscsi_co_wait_for_task() will always yield exactly once,
> because iscsi_co_generic_cb() can only run after iscsi_set_events(),
> after the completion flag has already been checked, and the yielding
> coroutine will then be woken only once the completion flag was set to
> true.  So as far as I can tell, iscsi has no bug and already works fine.
> 
> Still, we don’t need the completion flag because we know we have to
> yield exactly once, so we can drop it.  This simplifies the code and
> makes it more obvious that the “rbd bug” isn’t present here.
> 
> This makes iscsi_co_generic_bh_cb() and iscsi_retry_timer_expired() a
> bit boring, and actually, for the former, we could drop it and run
> aio_co_wake() directly from scsi_co_generic_cb() to the same effect; but
> that would remove the replay_bh_schedule_oneshot_event(), and I assume
> we shouldn’t do that.  At least schedule both the BH and the timer in
> the coroutine’s AioContext to make them simple wrappers around
> qemu_coroutine_enter(), without a further BH indirection.

I don't think we have to keep the BH. Is your concern about replay? I
doubt that this works across different QEMU versions anyway, and if it
does, it's pure luck.

> Finally, remove the iTask->co != NULL checks: This field is set by
> iscsi_co_init_iscsitask(), which all users of IscsiTask run before even
> setting up iscsi_co_generic_cb() as the callback, and it is never set or
> cleared elsewhere, so it is impossible to not be set in
> iscsi_co_generic_cb().
> 
> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>

Kevin

Re: [PATCH 03/16] iscsi: Run co BH CB in the coroutine’s AioContext

Posted by Hanna Czenczek 3 months, 1 week ago

On 29.10.25 15:27, Kevin Wolf wrote:
> Am 28.10.2025 um 17:33 hat Hanna Czenczek geschrieben:
>> For rbd (and others), as described in “rbd: Run co BH CB in the
>> coroutine’s AioContext”, the pattern of setting a completion flag and
>> waking a coroutine that yields while the flag is not set can only work
>> when both run in the same thread.
>>
>> iscsi has the same pattern, but the details are a bit different:
>> iscsi_co_generic_cb() can (as far as I understand) only run through
>> iscsi_service(), not just from a random thread at a random time.
>> iscsi_service() in turn can only be run after iscsi_set_events() set up
>> an FD event handler, which is done in iscsi_co_wait_for_task().
>>
>> As a result, iscsi_co_wait_for_task() will always yield exactly once,
>> because iscsi_co_generic_cb() can only run after iscsi_set_events(),
>> after the completion flag has already been checked, and the yielding
>> coroutine will then be woken only once the completion flag was set to
>> true.  So as far as I can tell, iscsi has no bug and already works fine.
>>
>> Still, we don’t need the completion flag because we know we have to
>> yield exactly once, so we can drop it.  This simplifies the code and
>> makes it more obvious that the “rbd bug” isn’t present here.
>>
>> This makes iscsi_co_generic_bh_cb() and iscsi_retry_timer_expired() a
>> bit boring, and actually, for the former, we could drop it and run
>> aio_co_wake() directly from scsi_co_generic_cb() to the same effect; but
>> that would remove the replay_bh_schedule_oneshot_event(), and I assume
>> we shouldn’t do that.  At least schedule both the BH and the timer in
>> the coroutine’s AioContext to make them simple wrappers around
>> qemu_coroutine_enter(), without a further BH indirection.
> I don't think we have to keep the BH. Is your concern about replay? I
> doubt that this works across different QEMU versions anyway, and if it
> does, it's pure luck.

It is solely about replay, yes.  I assumed the 
replay_bh_schedule_oneshot_event() would be a replay point, so removing 
it would, well, remove a replay point.  I suppose we’re going to have 
one replay point per request anyway (when going through the blkreplay 
driver), so maybe it doesn’t matter much?

Anyway, it seemed safer to keep it.  But apart from replay, I don’t have 
any concern about dropping the BH.

>> Finally, remove the iTask->co != NULL checks: This field is set by
>> iscsi_co_init_iscsitask(), which all users of IscsiTask run before even
>> setting up iscsi_co_generic_cb() as the callback, and it is never set or
>> cleared elsewhere, so it is impossible to not be set in
>> iscsi_co_generic_cb().
>>
>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> Kevin
>

Re: [PATCH 03/16] iscsi: Run co BH CB in the coroutine’s AioContext

Posted by Kevin Wolf 3 months, 1 week ago

Am 31.10.2025 um 10:07 hat Hanna Czenczek geschrieben:
> On 29.10.25 15:27, Kevin Wolf wrote:
> > Am 28.10.2025 um 17:33 hat Hanna Czenczek geschrieben:
> > > For rbd (and others), as described in “rbd: Run co BH CB in the
> > > coroutine’s AioContext”, the pattern of setting a completion flag and
> > > waking a coroutine that yields while the flag is not set can only work
> > > when both run in the same thread.
> > > 
> > > iscsi has the same pattern, but the details are a bit different:
> > > iscsi_co_generic_cb() can (as far as I understand) only run through
> > > iscsi_service(), not just from a random thread at a random time.
> > > iscsi_service() in turn can only be run after iscsi_set_events() set up
> > > an FD event handler, which is done in iscsi_co_wait_for_task().
> > > 
> > > As a result, iscsi_co_wait_for_task() will always yield exactly once,
> > > because iscsi_co_generic_cb() can only run after iscsi_set_events(),
> > > after the completion flag has already been checked, and the yielding
> > > coroutine will then be woken only once the completion flag was set to
> > > true.  So as far as I can tell, iscsi has no bug and already works fine.
> > > 
> > > Still, we don’t need the completion flag because we know we have to
> > > yield exactly once, so we can drop it.  This simplifies the code and
> > > makes it more obvious that the “rbd bug” isn’t present here.
> > > 
> > > This makes iscsi_co_generic_bh_cb() and iscsi_retry_timer_expired() a
> > > bit boring, and actually, for the former, we could drop it and run
> > > aio_co_wake() directly from scsi_co_generic_cb() to the same effect; but
> > > that would remove the replay_bh_schedule_oneshot_event(), and I assume
> > > we shouldn’t do that.  At least schedule both the BH and the timer in
> > > the coroutine’s AioContext to make them simple wrappers around
> > > qemu_coroutine_enter(), without a further BH indirection.
> > I don't think we have to keep the BH. Is your concern about replay? I
> > doubt that this works across different QEMU versions anyway, and if it
> > does, it's pure luck.
> 
> It is solely about replay, yes.  I assumed the
> replay_bh_schedule_oneshot_event() would be a replay point, so removing it
> would, well, remove a replay point.  I suppose we’re going to have one
> replay point per request anyway (when going through the blkreplay driver),
> so maybe it doesn’t matter much?

Yes, I think it is a replay point. And I don't really know what replay
does when the log has an event that doesn't appear in the code (or the
other way around).

I just don't expect that compatibility of replay logs across QEMU
versions is important (and even if it were important, that it is
achieved, because we probably change the control flow leading to replay
points all the time without even noticing). As far as I understand the
idea is that you want to debug a guest, and during that single debug
session you record a guest and then replay it multiple times. I don't
think it's expected that you keep the replay logs for a long time and
across QEMU updates.

Kevin