[PATCH v2] monitor/qmp: fix race on CHR_EVENT_CLOSED without OOB

Stefan Reiter posted 1 patch 3 years ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20210322154024.15011-1-s.reiter@proxmox.com
Maintainers: Markus Armbruster <armbru@redhat.com>
[PATCH v2] monitor/qmp: fix race on CHR_EVENT_CLOSED without OOB
Posted by Stefan Reiter 3 years ago
The QMP dispatcher coroutine holds the qmp_queue_lock over a yield
point, where it expects to be rescheduled from the main context. If a
CHR_EVENT_CLOSED event is received just then, it can race and block the
main thread on the mutex in monitor_qmp_cleanup_queue_and_resume.

monitor_resume does not need to be called from main context, so we can
call it immediately after popping a request from the queue, which allows
us to drop the qmp_queue_lock mutex before yielding.
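
As a rough standalone sketch of that ordering (plain pthreads and a condition
variable stand in for QEMU's coroutine and monitor machinery; queue_lock,
resched_cond and dispatcher are illustrative names, not the actual QEMU code):

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* "queue_lock" plays the role of mon->qmp_queue_lock; "resched_*" plays the
 * role of the reschedule from the main context the dispatcher waits for
 * after yielding. */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t resched_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t resched_cond = PTHREAD_COND_INITIALIZER;
static bool rescheduled;

static void *dispatcher(void *arg)
{
    pthread_mutex_lock(&queue_lock);
    /* ... pop request, resume monitor ... */
    pthread_mutex_unlock(&queue_lock);   /* the fix: drop it before blocking */

    /* Now block until the "main thread" reschedules us. */
    pthread_mutex_lock(&resched_lock);
    while (!rescheduled) {
        pthread_cond_wait(&resched_cond, &resched_lock);
    }
    pthread_mutex_unlock(&resched_lock);
    printf("dispatcher: rescheduled, handling request\n");
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, dispatcher, NULL);

    /* Main thread: a CHR_EVENT_CLOSED handler wants the queue lock. Because
     * the dispatcher released it before blocking, this cannot wait forever;
     * with the old ordering it could. */
    pthread_mutex_lock(&queue_lock);
    /* ... cleanup_queue_and_resume() equivalent ... */
    pthread_mutex_unlock(&queue_lock);

    /* Only now "reschedule" the dispatcher. */
    pthread_mutex_lock(&resched_lock);
    rescheduled = true;
    pthread_cond_signal(&resched_cond);
    pthread_mutex_unlock(&resched_lock);

    pthread_join(t, NULL);
    return 0;
}

Built with e.g. "cc -pthread sketch.c" this runs to completion; moving the
first unlock to after the condition wait recreates the problematic shape, a
dispatcher blocked while still holding the lock the main thread wants.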

Suggested-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

v2:
* different approach: move everything that needs the qmp_queue_lock mutex before
  the yield point, instead of moving the event handling to a different context

 monitor/qmp.c | 40 ++++++++++++++++++++++------------------
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/monitor/qmp.c b/monitor/qmp.c
index 2b0308f933..092c527b6f 100644
--- a/monitor/qmp.c
+++ b/monitor/qmp.c
@@ -257,24 +257,6 @@ void coroutine_fn monitor_qmp_dispatcher_co(void *data)
         trace_monitor_qmp_in_band_dequeue(req_obj,
                                           req_obj->mon->qmp_requests->length);
 
-        if (qatomic_xchg(&qmp_dispatcher_co_busy, true) == true) {
-            /*
-             * Someone rescheduled us (probably because a new requests
-             * came in), but we didn't actually yield. Do that now,
-             * only to be immediately reentered and removed from the
-             * list of scheduled coroutines.
-             */
-            qemu_coroutine_yield();
-        }
-
-        /*
-         * Move the coroutine from iohandler_ctx to qemu_aio_context for
-         * executing the command handler so that it can make progress if it
-         * involves an AIO_WAIT_WHILE().
-         */
-        aio_co_schedule(qemu_get_aio_context(), qmp_dispatcher_co);
-        qemu_coroutine_yield();
-
         /*
          * @req_obj has a request, we hold req_obj->mon->qmp_queue_lock
          */
@@ -298,8 +280,30 @@ void coroutine_fn monitor_qmp_dispatcher_co(void *data)
             monitor_resume(&mon->common);
         }
 
+        /*
+         * Drop the queue mutex now, before yielding, otherwise we might
+         * deadlock if the main thread tries to lock it.
+         */
         qemu_mutex_unlock(&mon->qmp_queue_lock);
 
+        if (qatomic_xchg(&qmp_dispatcher_co_busy, true) == true) {
+            /*
+             * Someone rescheduled us (probably because a new requests
+             * came in), but we didn't actually yield. Do that now,
+             * only to be immediately reentered and removed from the
+             * list of scheduled coroutines.
+             */
+            qemu_coroutine_yield();
+        }
+
+        /*
+         * Move the coroutine from iohandler_ctx to qemu_aio_context for
+         * executing the command handler so that it can make progress if it
+         * involves an AIO_WAIT_WHILE().
+         */
+        aio_co_schedule(qemu_get_aio_context(), qmp_dispatcher_co);
+        qemu_coroutine_yield();
+
         /* Process request */
         if (req_obj->req) {
             if (trace_event_get_state(TRACE_MONITOR_QMP_CMD_IN_BAND)) {
-- 
2.20.1



Re: [PATCH v2] monitor/qmp: fix race on CHR_EVENT_CLOSED without OOB
Posted by Kevin Wolf 3 years ago
Am 22.03.2021 um 16:40 hat Stefan Reiter geschrieben:
> The QMP dispatcher coroutine holds the qmp_queue_lock over a yield
> point, where it expects to be rescheduled from the main context. If a
> CHR_EVENT_CLOSED event is received just then, it can race and block the
> main thread on the mutex in monitor_qmp_cleanup_queue_and_resume.
> 
> monitor_resume does not need to be called from main context, so we can
> call it immediately after popping a request from the queue, which allows
> us to drop the qmp_queue_lock mutex before yielding.
> 
> Suggested-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
> v2:
> * different approach: move everything that needs the qmp_queue_lock mutex before
>   the yield point, instead of moving the event handling to a different context

The interesting new case here seems to be that new requests could be
queued and the dispatcher coroutine could be kicked before yielding.
This is safe because &qmp_dispatcher_co_busy is accessed with atomics
on both sides.

The important part is just that the first (conditional) yield stays
first, so that the aio_co_wake() in handle_qmp_command() won't reenter
the coroutine while it is expecting to be reentered from somewhere else.
This is still the case after the patch.
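
For reference, that handshake can be seen in isolation with a small standalone
sketch (C11 atomics plus a POSIX semaphore standing in for
qmp_dispatcher_co_busy and the aio_co_wake()/yield pair; producer, pending,
kick and NUM_REQUESTS are made-up names, none of this is the actual monitor
code):

#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdio.h>

#define NUM_REQUESTS 5

/* "dispatcher_busy" stands in for qmp_dispatcher_co_busy, the semaphore for
 * the wake/yield pair; everything else is made up for the sketch. */
static atomic_bool dispatcher_busy = true;   /* dispatcher starts out running */
static atomic_int pending;                   /* queued requests */
static sem_t kick;

static void *producer(void *arg)
{
    for (int i = 0; i < NUM_REQUESTS; i++) {
        atomic_fetch_add(&pending, 1);
        /* Kick only on the false -> true transition of the busy flag,
         * like handle_qmp_command() does. */
        if (!atomic_exchange(&dispatcher_busy, true)) {
            sem_post(&kick);
        }
    }
    return NULL;
}

int main(void)
{
    pthread_t t;
    sem_init(&kick, 0, 0);
    pthread_create(&t, NULL, producer, NULL);

    int handled = 0;
    while (handled < NUM_REQUESTS) {
        /* Drain whatever is queued. */
        while (atomic_load(&pending) > 0) {
            atomic_fetch_sub(&pending, 1);
            handled++;
        }
        if (handled == NUM_REQUESTS) {
            break;
        }
        /* Declare ourselves idle *before* the final check-and-wait, so a
         * producer that queues right now is guaranteed to kick us. */
        atomic_store(&dispatcher_busy, false);
        if (atomic_load(&pending) == 0) {
            sem_wait(&kick);
        }
        atomic_store(&dispatcher_busy, true);
    }

    pthread_join(t, NULL);
    printf("handled %d requests\n", handled);
    return 0;
}

A kick that races with the drain loop just leaves an extra sem_post() behind,
which the next wait absorbs as a spurious wakeup, roughly the role the
conditional qemu_coroutine_yield() plays in the hunk above.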

Reviewed-by: Kevin Wolf <kwolf@redhat.com>


Re: [PATCH v2] monitor/qmp: fix race on CHR_EVENT_CLOSED without OOB
Posted by Markus Armbruster 3 years ago
Kevin Wolf <kwolf@redhat.com> writes:

> Am 22.03.2021 um 16:40 hat Stefan Reiter geschrieben:
>> The QMP dispatcher coroutine holds the qmp_queue_lock over a yield
>> point, where it expects to be rescheduled from the main context. If a
>> CHR_EVENT_CLOSED event is received just then, it can race and block the
>> main thread on the mutex in monitor_qmp_cleanup_queue_and_resume.
>> 
>> monitor_resume does not need to be called from main context, so we can
>> call it immediately after popping a request from the queue, which allows
>> us to drop the qmp_queue_lock mutex before yielding.
>> 
>> Suggested-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
>> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
>> ---
>> v2:
>> * different approach: move everything that needs the qmp_queue_lock mutex before
>>   the yield point, instead of moving the event handling to a different context
>
> The interesting new case here seems to be that new requests could be
> queued and the dispatcher coroutine could be kicked before yielding.
> This is safe because &qmp_dispatcher_co_busy is accessed with atomics
> on both sides.
>
> The important part is just that the first (conditional) yield stays
> first, so that the aio_co_wake() in handle_qmp_command() won't reenter
> the coroutine while it is expecting to be reentered from somewhere else.
> This is still the case after the patch.
>
> Reviewed-by: Kevin Wolf <kwolf@redhat.com>

Thanks for saving me from an ugly review headache.

Should this go into 6.0?


Re: [PATCH v2] monitor/qmp: fix race on CHR_EVENT_CLOSED without OOB
Posted by Kevin Wolf 3 years ago
Am 08.04.2021 um 11:21 hat Markus Armbruster geschrieben:
> Kevin Wolf <kwolf@redhat.com> writes:
> 
> > Am 22.03.2021 um 16:40 hat Stefan Reiter geschrieben:
> >> The QMP dispatcher coroutine holds the qmp_queue_lock over a yield
> >> point, where it expects to be rescheduled from the main context. If a
> >> CHR_EVENT_CLOSED event is received just then, it can race and block the
> >> main thread on the mutex in monitor_qmp_cleanup_queue_and_resume.
> >> 
> >> monitor_resume does not need to be called from main context, so we can
> >> call it immediately after popping a request from the queue, which allows
> >> us to drop the qmp_queue_lock mutex before yielding.
> >> 
> >> Suggested-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
> >> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> >> ---
> >> v2:
> >> * different approach: move everything that needs the qmp_queue_lock mutex before
> >>   the yield point, instead of moving the event handling to a different context
> >
> > The interesting new case here seems to be that new requests could be
> > queued and the dispatcher coroutine could be kicked before yielding.
> > This is safe because &qmp_dispatcher_co_busy is accessed with atomics
> > on both sides.
> >
> > The important part is just that the first (conditional) yield stays
> > first, so that the aio_co_wake() in handle_qmp_command() won't reenter
> > the coroutine while it is expecting to be reentered from somewhere else.
> > This is still the case after the patch.
> >
> > Reviewed-by: Kevin Wolf <kwolf@redhat.com>
> 
> Thanks for saving me from an ugly review headache.
> 
> Should this go into 6.0?

This is something that the responsible maintainer needs to decide.

If it helps you with the decision, and if I understand correctly, it is
a regression from 5.1, but was already broken in 5.2.

Kevin


Re: [PATCH v2] monitor/qmp: fix race on CHR_EVENT_CLOSED without OOB
Posted by Markus Armbruster 3 years ago
Kevin Wolf <kwolf@redhat.com> writes:

> Am 08.04.2021 um 11:21 hat Markus Armbruster geschrieben:
>> Kevin Wolf <kwolf@redhat.com> writes:
>> 
>> > Am 22.03.2021 um 16:40 hat Stefan Reiter geschrieben:
>> >> The QMP dispatcher coroutine holds the qmp_queue_lock over a yield
>> >> point, where it expects to be rescheduled from the main context. If a
>> >> CHR_EVENT_CLOSED event is received just then, it can race and block the
>> >> main thread on the mutex in monitor_qmp_cleanup_queue_and_resume.
>> >> 
>> >> monitor_resume does not need to be called from main context, so we can
>> >> call it immediately after popping a request from the queue, which allows
>> >> us to drop the qmp_queue_lock mutex before yielding.
>> >> 
>> >> Suggested-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
>> >> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
>> >> ---
>> >> v2:
>> >> * different approach: move everything that needs the qmp_queue_lock mutex before
>> >>   the yield point, instead of moving the event handling to a different context
>> >
>> > The interesting new case here seems to be that new requests could be
>> > queued and the dispatcher coroutine could be kicked before yielding.
>> > This is safe because &qmp_dispatcher_co_busy is accessed with atomics
>> > on both sides.
>> >
>> > The important part is just that the first (conditional) yield stays
>> > first, so that the aio_co_wake() in handle_qmp_command() won't reenter
>> > the coroutine while it is expecting to be reentered from somewhere else.
>> > This is still the case after the patch.
>> >
>> > Reviewed-by: Kevin Wolf <kwolf@redhat.com>
>> 
>> Thanks for saving me from an ugly review headache.
>> 
>> Should this go into 6.0?
>
> This is something that the responsible maintainer needs to decide.

Yes, and that's me.  I'm soliciting opinions.

> If it helps you with the decision, and if I understand correctly, it is
> a regression from 5.1, but was already broken in 5.2.

It helps.

Even more helpful would be a risk assessment: what's the risk of
applying this patch now vs. delaying it?

If I understand Stefan correctly, Proxmox observed VM hangs.  How
frequent are these hangs?  Did they result in data corruption?

How confident do we feel about the fix?


Re: [PATCH v2] monitor/qmp: fix race on CHR_EVENT_CLOSED without OOB
Posted by Thomas Lamprecht 3 years ago
On 08.04.21 14:49, Markus Armbruster wrote:
> Kevin Wolf <kwolf@redhat.com> writes:
>> Am 08.04.2021 um 11:21 hat Markus Armbruster geschrieben:
>>> Should this go into 6.0?
>>
>> This is something that the responsible maintainer needs to decide.
> 
> Yes, and that's me.  I'm soliciting opinions.
> 
>> If it helps you with the decision, and if I understand correctly, it is
>> a regression from 5.1, but was already broken in 5.2.
> 
> It helps.
> 
> Even more helpful would be a risk assessment: what's the risk of
> applying this patch now vs. delaying it?

Stefan is on vacation this week, but I can share some information, maybe it
helps.

> 
> If I understand Stefan correctly, Proxmox observed VM hangs.  How
> frequent are these hangs?  Did they result in data corruption?


They were not highly frequent, but frequent enough to get roughly a bit over a
dozen reports in our forum, which normally means something is off but limited
to certain HW, storage tech used or load patterns.

We initially had a hard time reproducing this, but a user finally could send
us a backtrace of a hanging VM, and with that information we could pin it down
enough that Stefan came up with a good reproducer (see v1 of this patch).

We didn't get any report of actual data corruption due to this, but the VM
hangs completely, so a user killing it could theoretically cause that; but only
for those programs running in the guest that were not made power-loss safe
anyway...

> 
> How confident do we feel about the fix?
> 

Cannot comment from a technical POV, but can share the feedback we got with it.

Some context about reach:
We have rolled the fix out to all repository stages which already had a build of
5.2; that has a reach of about 100k to 300k installations. We only have some
rough stats about the sites that access the repository daily and cannot really
tell who actually updated to the new versions, but there are some quite update-happy
people in the community, so with that in mind and my experience of the feedback
loop of rolling out updates, I'd figure a lower bound one can assume without going
out on a limb is ~25k.

Positive feedback from users:
We got some positive feedback from people who ran into this at least once per
week that the issue is fixed with it. In total almost a dozen users reported
improvements, a good chunk of those who reported the problem in the first place.

Mixed feedback:
We had one user who reported still getting QMP timeouts, but that their VMs did
not hang anymore (could be high load or the like). Only one user reported that it
did not help; still investigating there, they have quite high CPU pressure stats
and it may actually be another issue, cannot tell for sure yet though.

Negative feedback:
We had no new users reporting new/worse problems in that direction, at least
from what I'm aware of.

Note, we do not use OOB currently, so above does not speak for the OOB case at
all.


Re: [PATCH v2] monitor/qmp: fix race on CHR_EVENT_CLOSED without OOB
Posted by Markus Armbruster 3 years ago
Thomas Lamprecht <t.lamprecht@proxmox.com> writes:

> On 08.04.21 14:49, Markus Armbruster wrote:
>> Kevin Wolf <kwolf@redhat.com> writes:
>>> Am 08.04.2021 um 11:21 hat Markus Armbruster geschrieben:
>>>> Should this go into 6.0?
>>>
>>> This is something that the responsible maintainer needs to decide.
>> 
>> Yes, and that's me.  I'm soliciting opinions.
>> 
>>> If it helps you with the decision, and if I understand correctly, it is
>>> a regression from 5.1, but was already broken in 5.2.
>> 
>> It helps.
>> 
>> Even more helpful would be a risk assessment: what's the risk of
>> applying this patch now vs. delaying it?
>
> Stefan is on vacation this week, but I can share some information, maybe it
> helps.
>
>> 
>> If I understand Stefan correctly, Proxmox observed VM hangs.  How
>> frequent are these hangs?  Did they result in data corruption?
>
>
> They were not highly frequent, but frequent enough to get roughly a bit over a
> dozen reports in our forum, which normally means something is off but limited
> to certain HW, storage tech used or load patterns.
>
> We initially had a hard time reproducing this, but a user finally could send
> us a backtrace of a hanging VM, and with that information we could pin it down
> enough that Stefan came up with a good reproducer (see v1 of this patch).

Excellent work, props!

> We didn't get any report of actual data corruption due to this, but the VM
> hangs completely, so a user killing it could theoretically cause that; but only
> for those programs running in the guest that were not made power-loss safe
> anyway...
>
>> 
>> How confident do we feel about the fix?
>> 
>
> Cannot comment from a technical POV, but can share the feedback we got with it.
>
> Some context about reach:
> We have rolled the fix out to all repository stages which already had a build of
> 5.2; that has a reach of about 100k to 300k installations. We only have some
> rough stats about the sites that access the repository daily and cannot really
> tell who actually updated to the new versions, but there are some quite update-happy
> people in the community, so with that in mind and my experience of the feedback
> loop of rolling out updates, I'd figure a lower bound one can assume without going
> out on a limb is ~25k.
>
> Positive feedback from users:
> We got some positive feedback from people who ran into this at least once per
> week that the issue is fixed with it. In total almost a dozen users reported
> improvements, a good chunk of those who reported the problem in the first place.
>
> Mixed feedback:
> We had one user who reported still getting QMP timeouts, but that their VMs did
> not hang anymore (could be high load or the like). Only one user reported that it
> did not help; still investigating there, they have quite high CPU pressure stats
> and it may actually be another issue, cannot tell for sure yet though.
>
> Negative feedback:
> We had no new users reporting new/worse problems in that direction, at least
> from what I'm aware of.
>
> Note, we do not use OOB currently, so above does not speak for the OOB case at
> all.

Thanks!


Re: [PATCH v2] monitor/qmp: fix race on CHR_EVENT_CLOSED without OOB
Posted by Markus Armbruster 3 years ago
I considered this for 6.0, but decided not to rock the boat at this late
stage.

Queued for 6.1, thanks!