The aio_co_reschedule_self() API is designed to avoid the race
condition between scheduling the coroutine in another AioContext and
yielding.
The QMP dispatch code uses the open-coded version that appears
susceptible to the race condition at first glance:
    aio_co_schedule(qemu_get_aio_context(), qemu_coroutine_self());
    qemu_coroutine_yield();
The code is actually safe because the iohandler and qemu_aio_context
AioContexts run under the Big QEMU Lock. Nevertheless, set a good example
and use aio_co_reschedule_self() so it's obvious that there is no race.
Suggested-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
qapi/qmp-dispatch.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
index 176b549473..f3488afeef 100644
--- a/qapi/qmp-dispatch.c
+++ b/qapi/qmp-dispatch.c
@@ -212,8 +212,7 @@ QDict *coroutine_mixed_fn qmp_dispatch(const QmpCommandList *cmds, QObject *requ
* executing the command handler so that it can make progress if it
* involves an AIO_WAIT_WHILE().
*/
- aio_co_schedule(qemu_get_aio_context(), qemu_coroutine_self());
- qemu_coroutine_yield();
+ aio_co_reschedule_self(qemu_get_aio_context());
}
monitor_set_cur(qemu_coroutine_self(), cur_mon);
@@ -227,9 +226,7 @@ QDict *coroutine_mixed_fn qmp_dispatch(const QmpCommandList *cmds, QObject *requ
* Move back to iohandler_ctx so that nested event loops for
* qemu_aio_context don't start new monitor commands.
*/
- aio_co_schedule(iohandler_get_aio_context(),
- qemu_coroutine_self());
- qemu_coroutine_yield();
+ aio_co_reschedule_self(iohandler_get_aio_context());
}
} else {
/*
--
2.43.0
On 06.02.2024 at 20:06, Stefan Hajnoczi wrote:
> @@ -212,8 +212,7 @@ QDict *coroutine_mixed_fn qmp_dispatch(const QmpCommandList *cmds, QObject *requ
> * executing the command handler so that it can make progress if it
> * involves an AIO_WAIT_WHILE().
> */
> - aio_co_schedule(qemu_get_aio_context(), qemu_coroutine_self());
> - qemu_coroutine_yield();
> + aio_co_reschedule_self(qemu_get_aio_context());

Turns out that this one actually causes a regression. [1] This code is
run in iohandler_ctx. aio_co_reschedule_self() looks at the new context
and compares it with qemu_get_current_aio_context(), and because both
are qemu_aio_context, it decides that it has nothing to do. So the
command handler coroutine actually still runs in iohandler_ctx now,
which is not what we want.

We could just revert this patch because it was only meant as a cleanup
without a semantic difference.
Or aio_co_reschedule_self() could look at qemu_coroutine_self()->ctx
instead of using qemu_get_current_aio_context(). That would be a little
more indirect, though, and I'm not sure if co->ctx is always up to date.

Any opinions on what is the best way to fix this?

Kevin

[1] https://issues.redhat.com/browse/RHEL-34618
On Fri, May 03, 2024 at 07:33:17PM +0200, Kevin Wolf wrote:
> Turns out that this one actually causes a regression. [1] This code is
> run in iohandler_ctx. aio_co_reschedule_self() looks at the new context
> and compares it with qemu_get_current_aio_context(), and because both
> are qemu_aio_context, it decides that it has nothing to do. So the
> command handler coroutine actually still runs in iohandler_ctx now,
> which is not what we want.
> We could just revert this patch because it was only meant as a cleanup
> without a semantic difference.
>
> Or aio_co_reschedule_self() could look at qemu_coroutine_self()->ctx
> instead of using qemu_get_current_aio_context(). That would be a little
> more indirect, though, and I'm not sure if co->ctx is always up to date.
>
> Any opinions on what is the best way to fix this?
>
> [1] https://issues.redhat.com/browse/RHEL-34618

If the commit is reverted, similar bugs may be introduced again in the
future. The qemu_get_current_aio_context() API is unaware of
iohandler_ctx, and this can lead to unexpected results.

I will send patches to revert the commit and add doc comments explaining
iohandler_ctx's special behavior. This will reduce, but not eliminate,
the risk of future bugs.

Modifying aio_co_reschedule_self() might be a better long-term fix, but
I'm afraid it will create more bugs because it will expose the subtle
distinction between the current coroutine AioContext and the
non-coroutine AioContext in new places.

I think the root cause is that iohandler_ctx isn't a full-fledged
AioContext with its own event loop. iohandler_ctx is a special superset
of qemu_aio_context that the main loop monitors.

Stefan