[PATCH] system/runstate: Fix regression, clarify BQL status of exit notifiers
Posted by Phil Dennis-Jordan 2 months, 3 weeks ago
By changing the way the main QEMU event loop is invoked, I inadvertently
changed the BQL status of exit notifiers: some of them implicitly
assumed they would be called with the BQL held; the BQL is however
not held during the exit(status) call in qemu_default_main().

Instead of attempting to ensure we always call exit() with the BQL
held - including any transitive calls - this change adds a BQL lock
guard to qemu_run_exit_notifiers(), ensuring the BQL is always held
in the exit notifiers.

Additionally, the BQL promise is now documented at the
qemu_{add,remove}_exit_notifier() declarations.
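
For illustration, an exit notifier relying on this promise is
registered roughly as follows (a generic sketch only: the callback and
names are made up, but Notifier and qemu_add_exit_notifier() are the
real API, with header paths as in this tree):

#include "qemu/osdep.h"
#include "qemu/notify.h"
#include "system/system.h"

/* Called from exit() via qemu_run_exit_notifiers(). */
static void my_cleanup_notify(Notifier *notifier, void *data)
{
    /* With this change, the BQL is guaranteed to be held here, so the
     * callback may e.g. unrealize devices or touch machine state. */
}

static Notifier my_cleanup_notifier = { .notify = my_cleanup_notify };

static void my_subsystem_init(void)
{
    qemu_add_exit_notifier(&my_cleanup_notifier);
}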

Fixes: f5ab12caba4f ("ui & main loop: Redesign of system-specific main
thread event handling")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2771
Reported-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
---
 include/system/system.h | 1 +
 system/runstate.c       | 1 +
 2 files changed, 2 insertions(+)

diff --git a/include/system/system.h b/include/system/system.h
index 5364ad4f27..0cbb43ec30 100644
--- a/include/system/system.h
+++ b/include/system/system.h
@@ -15,6 +15,7 @@ extern bool qemu_uuid_set;
 
 const char *qemu_get_vm_name(void);
 
+/* Exit notifiers will run with BQL held. */
 void qemu_add_exit_notifier(Notifier *notify);
 void qemu_remove_exit_notifier(Notifier *notify);
 
diff --git a/system/runstate.c b/system/runstate.c
index 3a8fe866bc..272801d307 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -850,6 +850,7 @@ void qemu_remove_exit_notifier(Notifier *notify)
 
 static void qemu_run_exit_notifiers(void)
 {
+    BQL_LOCK_GUARD();
     notifier_list_notify(&exit_notifiers, NULL);
 }
 
-- 
2.39.5 (Apple Git-154)
Re: [PATCH] system/runstate: Fix regression, clarify BQL status of exit notifiers
Posted by Paolo Bonzini 2 months, 2 weeks ago
On 1/12/25 22:26, Phil Dennis-Jordan wrote:
> By changing the way the main QEMU event loop is invoked, I inadvertently
> changed the BQL status of exit notifiers: some of them implicitly
> assumed they would be called with the BQL held; the BQL is however
> not held during the exit(status) call in qemu_default_main().
> 
> Instead of attempting to ensure we always call exit() with the BQL
> held - including any transitive calls - this change adds a BQL lock
> guard to qemu_run_exit_notifiers(), ensuring the BQL is always held
> in the exit notifiers.
> 
> Additionally, the BQL promise is now documented at the
> qemu_{add,remove}_exit_notifier() declarations.
> 
> Fixes: f5ab12caba4f ("ui & main loop: Redesign of system-specific main
> thread event handling")
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2771
> Reported-by: David Woodhouse <dwmw2@infradead.org>
> Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>

I'm worried that this breaks for exit() calls that happen within a 
BQL-taken area (for example, anything that uses error_fatal) due to...

void bql_lock_impl(const char *file, int line)
{
     QemuMutexLockFunc bql_lock_fn = qatomic_read(&bql_mutex_lock_func);

     g_assert(!bql_locked()); // <--- this
     bql_lock_fn(&bql, file, line);
     set_bql_locked(true);
}

Paolo
Re: [PATCH] system/runstate: Fix regression, clarify BQL status of exit notifiers
Posted by Phil Dennis-Jordan 2 months, 2 weeks ago
On Wed 15. Jan 2025 at 20:05, Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 1/12/25 22:26, Phil Dennis-Jordan wrote:
> > By changing the way the main QEMU event loop is invoked, I inadvertently
> > changed the BQL status of exit notifiers: some of them implicitly
> > assumed they would be called with the BQL held; the BQL is however
> > not held during the exit(status) call in qemu_default_main().
> >
> > Instead of attempting to ensure we always call exit() with the BQL
> > held - including any transitive calls - this change adds a BQL lock
> > guard to qemu_run_exit_notifiers(), ensuring the BQL is always held
> > in the exit notifiers.
> >
> > Additionally, the BQL promise is now documented at the
> > qemu_{add,remove}_exit_notifier() declarations.
> >
> > Fixes: f5ab12caba4f ("ui & main loop: Redesign of system-specific main
> > thread event handling")
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2771
> > Reported-by: David Woodhouse <dwmw2@infradead.org>
> > Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
>
> I'm worried that this breaks for exit() calls that happen within a
> BQL-taken area (for example, anything that uses error_fatal) due to...
>
> void bql_lock_impl(const char *file, int line)
> {
>      QemuMutexLockFunc bql_lock_fn = qatomic_read(&bql_mutex_lock_func);
>
>      g_assert(!bql_locked()); // <--- this
>      bql_lock_fn(&bql, file, line);
>      set_bql_locked(true);
> }
>

BQL_LOCK_GUARD expands to a call to bql_auto_lock(), which in turn defends
against recursive locking by checking bql_locked().

https://gitlab.com/qemu-project/qemu/-/blob/master/include/qemu/main-loop.h#L377

I think that should make it safe?
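
Roughly, the guard follows this pattern - a minimal standalone sketch,
not QEMU code, just illustrating the "take the lock only if it isn't
already held" behaviour of bql_auto_lock():

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static __thread bool big_lock_held;

/* Returns true iff we took the lock and must release it again. */
static bool guard_lock(void)
{
    if (big_lock_held) {
        return false;           /* already held: leave it alone */
    }
    pthread_mutex_lock(&big_lock);
    big_lock_held = true;
    return true;
}

static void guard_unlock(bool taken)
{
    if (taken) {
        big_lock_held = false;
        pthread_mutex_unlock(&big_lock);
    }
}

static void run_exit_notifiers(void)
{
    bool taken = guard_lock();  /* stands in for BQL_LOCK_GUARD() */
    printf("notifiers run, lock taken here: %d\n", taken);
    guard_unlock(taken);
}

int main(void)
{
    run_exit_notifiers();           /* lock not held: guard takes it */

    pthread_mutex_lock(&big_lock);  /* simulate exit() from a BQL section */
    big_lock_held = true;
    run_exit_notifiers();           /* guard sees it is held, no relock */
    big_lock_held = false;
    pthread_mutex_unlock(&big_lock);
    return 0;
}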

The only safety issue I can imagine is exit() being called from a thread
that does not hold the BQL while a BQL-holding thread is waiting for that
thread. I'm not sure such a pattern exists in QEMU, though, and it would
have triggered the assertion in the original code (before my patch that
caused the regression was applied).

Re: [PATCH] system/runstate: Fix regression, clarify BQL status of exit notifiers
Posted by David Woodhouse 2 months, 2 weeks ago
On Wed, 2025-01-15 at 20:17 +0100, Phil Dennis-Jordan wrote:
> 
> BQL_LOCK_GUARD expands to a call to bql_auto_lock(), which in turn
> defends against recursive locking by checking bql_locked(). 
> 
> https://gitlab.com/qemu-project/qemu/-/blob/master/include/qemu/main-loop.h#L377
> 
> I think that should make it safe?

Looks like it. I did this to test:

--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -451,6 +451,10 @@ void xen_evtchn_set_callback_level(int level)
         if (level && !s->extern_gsi_level) {
             kvm_xen_set_callback_asserted();
         }
+        if (level) {
+            printf("Exiting, BQL held\n");
+            exit(77);
+        }
     }
 }
 
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -851,6 +851,7 @@ void qemu_remove_exit_notifier(Notifier *notify)
 static void qemu_run_exit_notifiers(void)
 {
     BQL_LOCK_GUARD();
+    printf("%s has BQL\n", __func__);
     notifier_list_notify(&exit_notifiers, NULL);
 }
 

So the first time a Xen guest's callback IRQ is asserted, it exited
with the BQL held, and qemu_run_exit_notifiers() didn't get stuck.

[    0.521568] ACPI: \_SB_.GSIF: Enabled at IRQ 21
Exiting, BQL held
qemu_run_exit_notifiers has BQL


The actual cleanup of the XenDevice did then deadlock on the Xen evtchn
port_lock, which had *also* been held when my hack exited in the evtchn
code. But that one is expected.

#0  0x00007fc5b2a7b0c0 in __lll_lock_wait () at /lib64/libc.so.6
#1  0x00007fc5b2a81d81 in pthread_mutex_lock@@GLIBC_2.2.5 ()
    at /lib64/libc.so.6
#2  0x0000558286c07a63 in qemu_mutex_lock_impl
    (mutex=0x558294179998, file=0x558286f9b905 "../hw/i386/kvm/xen_evtchn.c", line=2147) at ../util/qemu-thread-posix.c:95
#3  0x00005582868d774f in xen_be_evtchn_unbind (xc=0x5582939b3810, port=2)
    at ../hw/i386/kvm/xen_evtchn.c:2147
#4  0x000055828679e0a9 in qemu_xen_evtchn_unbind
    (xc=<optimized out>, port=<optimized out>)
    at /home/dwmw2/git/qemu/include/hw/xen/xen_backend_ops.h:91
#5  xen_device_unbind_event_channel
    (xendev=<optimized out>, channel=0x5582939b4cb0, errp=0x0)
    at ../hw/xen/xen-bus.c:961
#6  0x00005582865f64b9 in xen_console_disconnect
    (xendev=xendev@entry=0x5582942df4a0, errp=errp@entry=0x0)
    at ../hw/char/xen_console.c:298
#7  0x00005582865f6673 in xen_console_unrealize (xendev=0x5582942df4a0)
    at ../hw/char/xen_console.c:411
#8  0x000055828679e201 in xen_device_unrealize (dev=<optimized out>)
    at ../hw/xen/xen-bus.c:988
#9  0x0000558286c0da5f in notifier_list_notify (list=<optimized out>, data=0x0)
    at ../util/notify.c:39
#10 0x00007fc5b2a2a461 in __run_exit_handlers () at /lib64/libc.so.6
#11 0x00007fc5b2a2a52e in exit () at /lib64/libc.so.6
#12 0x00005582868d86dd in xen_evtchn_set_callback_level (level=1)
    at ../hw/i386/kvm/xen_evtchn.c:456
#13 0x00005582868d7c74 in inject_callback
    (s=0x558294179650, vcpu=<optimized out>) at ../hw/i386/kvm/xen_evtchn.c:548
#14 do_set_port_compat
    (s=<optimized out>, port=<optimized out>, shinfo=<optimized out>, vcpu_info=<optimized out>) at ../hw/i386/kvm/xen_evtchn.c:921
#15 set_port_pending (s=s@entry=0x558294179650, port=<optimized out>)
    at ../hw/i386/kvm/xen_evtchn.c:963

Re: [PATCH] system/runstate: Fix regression, clarify BQL status of exit notifiers
Posted by David Woodhouse 2 months, 2 weeks ago
On Sun, 2025-01-12 at 22:26 +0100, Phil Dennis-Jordan wrote:
> By changing the way the main QEMU event loop is invoked, I inadvertently
> changed the BQL status of exit notifiers: some of them implicitly
> assumed they would be called with the BQL held; the BQL is however
> not held during the exit(status) call in qemu_default_main().
> 
> Instead of attempting to ensure we always call exit() with the BQL
> held - including any transitive calls - this change adds a BQL lock
> guard to qemu_run_exit_notifiers(), ensuring the BQL is always held
> in the exit notifiers.
> 
> Additionally, the BQL promise is now documented at the
> qemu_{add,remove}_exit_notifier() declarations.
> 
> Fixes: f5ab12caba4f ("ui & main loop: Redesign of system-specific main
> thread event handling")
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2771
> Reported-by: David Woodhouse <dwmw2@infradead.org>
> Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: David Woodhouse <dwmw@amazon.co.uk>

(Sorry, I thought I'd done that already).

Is someone else going to pick this up, or should I round it up with the
Xen fixes for which I'm likely to send a pull request tomorrow?