[PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in kvm_cpu_exec

Harsh Prateek Bora posted 1 patch 1 day, 23 hours ago
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20260409161042.55281-1-harshpb@linux.ibm.com
Maintainers: Paolo Bonzini <pbonzini@redhat.com>
[PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in kvm_cpu_exec
Posted by Harsh Prateek Bora 1 day, 23 hours ago
When kvm_cpu_exec() returns EXCP_HLT due to kvm_arch_process_async_events()
returning true, it was returning before releasing the BQL (Big QEMU Lock).
This caused a lock imbalance where the vCPU thread would loop back to
kvm_cpu_exec() while still holding the BQL, leading to deadlocks.

The issue manifests as boot hangs on PowerPC pseries machines with multiple
vCPUs, where secondary vCPUs with start-powered-off=true remain halted and
repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration held
the BQL, preventing other operations from proceeding.

The fix has two parts:

1. In kvm_cpu_exec() (kvm-all.c):
   Release the BQL before returning EXCP_HLT in the early return path,
   matching the behavior of the normal execution path where bql_unlock()
   is called before entering the main KVM execution loop.

2. In kvm_vcpu_thread_fn() (kvm-accel-ops.c):
   Re-acquire the BQL after kvm_cpu_exec() returns EXCP_HLT, since the
   loop expects to hold the BQL when calling kvm_cpu_exec() again.

This ensures proper BQL lock/unlock pairing:
- kvm_vcpu_thread_fn() holds BQL before calling kvm_cpu_exec()
- kvm_cpu_exec() releases BQL before returning (for EXCP_HLT)
- kvm_vcpu_thread_fn() re-acquires BQL if EXCP_HLT was returned
- Next iteration has BQL held as expected

This is a regression introduced by commit 98884e0cc1 ("accel/kvm: add
changes required to support KVM VM file descriptor change") which
refactored kvm_irqchip_create() and changed the initialization timing,
exposing this lock imbalance issue.

Fixes: 98884e0cc1 ("accel/kvm: add changes required to support KVM VM file descriptor change")
Reported-by: Misbah Anjum N <misanjum@linux.ibm.com>
Reported-by: Gautam Menghani <gautam@linux.ibm.com>
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
---
 accel/kvm/kvm-accel-ops.c | 4 ++++
 accel/kvm/kvm-all.c       | 1 +
 2 files changed, 5 insertions(+)

diff --git a/accel/kvm/kvm-accel-ops.c b/accel/kvm/kvm-accel-ops.c
index 6d9140e549..d684fd0840 100644
--- a/accel/kvm/kvm-accel-ops.c
+++ b/accel/kvm/kvm-accel-ops.c
@@ -52,6 +52,10 @@ static void *kvm_vcpu_thread_fn(void *arg)
 
         if (cpu_can_run(cpu)) {
             r = kvm_cpu_exec(cpu);
+            if (r == EXCP_HLT) {
+                /* kvm_cpu_exec() released BQL, re-acquire for next iteration */
+                bql_lock();
+            }
             if (r == EXCP_DEBUG) {
                 cpu_handle_guest_debug(cpu);
             }
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 774499d34f..00b8018664 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -3439,6 +3439,7 @@ int kvm_cpu_exec(CPUState *cpu)
     trace_kvm_cpu_exec();
 
     if (kvm_arch_process_async_events(cpu)) {
+        bql_unlock();
         return EXCP_HLT;
     }
 
-- 
2.52.0
Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in kvm_cpu_exec
Posted by Fabiano Rosas 21 hours ago
Harsh Prateek Bora <harshpb@linux.ibm.com> writes:

> When kvm_cpu_exec() returns EXCP_HLT due to kvm_arch_process_async_events()
> returning true, it was returning before releasing the BQL (Big QEMU Lock).
> This caused a lock imbalance where the vCPU thread would loop back to
> kvm_cpu_exec() while still holding the BQL, leading to deadlocks.
>
> The issue manifests as boot hangs on PowerPC pseries machines with multiple
> vCPUs, where secondary vCPUs with start-powered-off=true remain halted and
> repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration held
> the BQL, preventing other operations from proceeding.
>

AFAIU, with halted=1, the thread should be waiting at the
qemu_process_cpu_events() qemu_cond_wait invocation which will release
the BQL during the wait.

What is your irqchip setting? on/off/split? Aren't you just hitting the
early return at do_kvm_irqchip_create()? The refactoring from commit
98884e0cc1 ("accel/kvm: add changes required to support KVM VM file
descriptor change") made it so a few lines that would have been skipped
are now executed.

In this case, probably kvm_halt_in_kernel_allowed=true is making
cpu_thread_is_idle() return false and skip the wait at
qemu_process_cpu_events().

> The fix has two parts:
>
> 1. In kvm_cpu_exec() (kvm-all.c):
>    Release the BQL before returning EXCP_HLT in the early return path,
>    matching the behavior of the normal execution path where bql_unlock()
>    is called before entering the main KVM execution loop.
>
> 2. In kvm_vcpu_thread_fn() (kvm-accel-ops.c):
>    Re-acquire the BQL after kvm_cpu_exec() returns EXCP_HLT, since the
>    loop expects to hold the BQL when calling kvm_cpu_exec() again.
>
> This ensures proper BQL lock/unlock pairing:
> - kvm_vcpu_thread_fn() holds BQL before calling kvm_cpu_exec()
> - kvm_cpu_exec() releases BQL before returning (for EXCP_HLT)
> - kvm_vcpu_thread_fn() re-acquires BQL if EXCP_HLT was returned
> - Next iteration has BQL held as expected
>
> This is a regression introduced by commit 98884e0cc1 ("accel/kvm: add
> changes required to support KVM VM file descriptor change") which
> refactored kvm_irqchip_create() and changed the initialization timing,
> exposing this lock imbalance issue.
>
> Fixes: 98884e0cc1 ("accel/kvm: add changes required to support KVM VM file descriptor change")
> Reported-by: Misbah Anjum N <misanjum@linux.ibm.com>
> Reported-by: Gautam Menghani <gautam@linux.ibm.com>
> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
> ---
>  accel/kvm/kvm-accel-ops.c | 4 ++++
>  accel/kvm/kvm-all.c       | 1 +
>  2 files changed, 5 insertions(+)
>
> diff --git a/accel/kvm/kvm-accel-ops.c b/accel/kvm/kvm-accel-ops.c
> index 6d9140e549..d684fd0840 100644
> --- a/accel/kvm/kvm-accel-ops.c
> +++ b/accel/kvm/kvm-accel-ops.c
> @@ -52,6 +52,10 @@ static void *kvm_vcpu_thread_fn(void *arg)
>  
>          if (cpu_can_run(cpu)) {
>              r = kvm_cpu_exec(cpu);
> +            if (r == EXCP_HLT) {
> +                /* kvm_cpu_exec() released BQL, re-acquire for next iteration */
> +                bql_lock();
> +            }
>              if (r == EXCP_DEBUG) {
>                  cpu_handle_guest_debug(cpu);
>              }
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 774499d34f..00b8018664 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -3439,6 +3439,7 @@ int kvm_cpu_exec(CPUState *cpu)
>      trace_kvm_cpu_exec();
>  
>      if (kvm_arch_process_async_events(cpu)) {
> +        bql_unlock();
>          return EXCP_HLT;
>      }
Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in kvm_cpu_exec
Posted by Misbah Anjum N 1 day, 8 hours ago
Hi,
I've tested the patch on a PowerPC pseries machine and it resolves the 
boot hang issue seen on ppc when booting a KVM guest with an smp value > 1.

Test Environment:
- Host Arch: ppc64le
- Host and Guest OS: Fedora 42
- Machine Type: pseries with KVM acceleration
- QEMU: Latest master with this patch applied

Test Results:
All the following SMP topologies now boot successfully:

Single and simple multi-CPU:
- -smp 1
- -smp 2
- -smp 4
- -smp 32

Various socket/core/thread combinations (8 vCPUs):
- -smp 8,sockets=8,cores=1,threads=1
- -smp 8,sockets=1,cores=8,threads=1
- -smp 8,sockets=1,cores=1,threads=8
- -smp 8,sockets=2,cores=4,threads=1
- -smp 8,sockets=1,cores=4,threads=2
- -smp 8,sockets=2,cores=1,threads=4
- -smp 8,sockets=2,cores=2,threads=2

Higher vCPU count:
- -smp 16,sockets=2,cores=4,threads=2
- -smp 32,sockets=1,cores=8,threads=4

Tested-by: Misbah Anjum N <misanjum@linux.ibm.com>

Thanks,
Misbah Anjum N <misanjum@linux.ibm.com>


On 2026-04-09 21:40, Harsh Prateek Bora wrote:
> When kvm_cpu_exec() returns EXCP_HLT due to 
> kvm_arch_process_async_events()
> returning true, it was returning before releasing the BQL (Big QEMU 
> Lock).
> This caused a lock imbalance where the vCPU thread would loop back to
> kvm_cpu_exec() while still holding the BQL, leading to deadlocks.
> 
> The issue manifests as boot hangs on PowerPC pseries machines with 
> multiple
> vCPUs, where secondary vCPUs with start-powered-off=true remain halted 
> and
> repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration 
> held
> the BQL, preventing other operations from proceeding.
> 
> The fix has two parts:
> 
> 1. In kvm_cpu_exec() (kvm-all.c):
>    Release the BQL before returning EXCP_HLT in the early return path,
>    matching the behavior of the normal execution path where 
> bql_unlock()
>    is called before entering the main KVM execution loop.
> 
> 2. In kvm_vcpu_thread_fn() (kvm-accel-ops.c):
>    Re-acquire the BQL after kvm_cpu_exec() returns EXCP_HLT, since the
>    loop expects to hold the BQL when calling kvm_cpu_exec() again.
> 
> This ensures proper BQL lock/unlock pairing:
> - kvm_vcpu_thread_fn() holds BQL before calling kvm_cpu_exec()
> - kvm_cpu_exec() releases BQL before returning (for EXCP_HLT)
> - kvm_vcpu_thread_fn() re-acquires BQL if EXCP_HLT was returned
> - Next iteration has BQL held as expected
> 
> This is a regression introduced by commit 98884e0cc1 ("accel/kvm: add
> changes required to support KVM VM file descriptor change") which
> refactored kvm_irqchip_create() and changed the initialization timing,
> exposing this lock imbalance issue.
> 
> Fixes: 98884e0cc1 ("accel/kvm: add changes required to support KVM VM
> file descriptor change")
> Reported-by: Misbah Anjum N <misanjum@linux.ibm.com>
> Reported-by: Gautam Menghani <gautam@linux.ibm.com>
> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
> ---
>  accel/kvm/kvm-accel-ops.c | 4 ++++
>  accel/kvm/kvm-all.c       | 1 +
>  2 files changed, 5 insertions(+)
> 
> diff --git a/accel/kvm/kvm-accel-ops.c b/accel/kvm/kvm-accel-ops.c
> index 6d9140e549..d684fd0840 100644
> --- a/accel/kvm/kvm-accel-ops.c
> +++ b/accel/kvm/kvm-accel-ops.c
> @@ -52,6 +52,10 @@ static void *kvm_vcpu_thread_fn(void *arg)
> 
>          if (cpu_can_run(cpu)) {
>              r = kvm_cpu_exec(cpu);
> +            if (r == EXCP_HLT) {
> +                /* kvm_cpu_exec() released BQL, re-acquire for next
> iteration */
> +                bql_lock();
> +            }
>              if (r == EXCP_DEBUG) {
>                  cpu_handle_guest_debug(cpu);
>              }
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 774499d34f..00b8018664 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -3439,6 +3439,7 @@ int kvm_cpu_exec(CPUState *cpu)
>      trace_kvm_cpu_exec();
> 
>      if (kvm_arch_process_async_events(cpu)) {
> +        bql_unlock();
>          return EXCP_HLT;
>      }
Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in kvm_cpu_exec
Posted by Ani Sinha 1 day, 11 hours ago

> On 9 Apr 2026, at 9:40 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
> 
> When kvm_cpu_exec() returns EXCP_HLT due to kvm_arch_process_async_events()
> returning true, it was returning before releasing the BQL (Big QEMU Lock).
> This caused a lock imbalance where the vCPU thread would loop back to
> kvm_cpu_exec() while still holding the BQL, leading to deadlocks.

I am not sure I understand this. Seems kvm_cpu_exec() does expect that the caller holds bql before calling the function. Where is the lock imbalance? 

> 
> The issue manifests as boot hangs on PowerPC pseries machines with multiple
> vCPUs, where secondary vCPUs with start-powered-off=true remain halted and
> repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration held
> the BQL, preventing other operations from proceeding.
> 
> The fix has two parts:
> 
> 1. In kvm_cpu_exec() (kvm-all.c):
>   Release the BQL before returning EXCP_HLT in the early return path,
>   matching the behavior of the normal execution path where bql_unlock()
>   is called before entering the main KVM execution loop.
> 
> 2. In kvm_vcpu_thread_fn() (kvm-accel-ops.c):
>   Re-acquire the BQL after kvm_cpu_exec() returns EXCP_HLT, since the
>   loop expects to hold the BQL when calling kvm_cpu_exec() again.
> 
> This ensures proper BQL lock/unlock pairing:
> - kvm_vcpu_thread_fn() holds BQL before calling kvm_cpu_exec()
> - kvm_cpu_exec() releases BQL before returning (for EXCP_HLT)
> - kvm_vcpu_thread_fn() re-acquires BQL if EXCP_HLT was returned
> - Next iteration has BQL held as expected
> 
> This is a regression introduced by commit 98884e0cc1 ("accel/kvm: add
> changes required to support KVM VM file descriptor change") which
> refactored kvm_irqchip_create() and changed the initialization timing,
> exposing this lock imbalance issue.
> 
> Fixes: 98884e0cc1 ("accel/kvm: add changes required to support KVM VM file descriptor change")

I do not think this is the right reference. The above commit may have exposed some underlying issue but is certainly not the cause of it. Further, as we have discussed in the other thread, the changes in that commit are not even getting executed.

Personally I think the core issue is somewhere else. I am not convinced this is the proper fix.

> Reported-by: Misbah Anjum N <misanjum@linux.ibm.com>
> Reported-by: Gautam Menghani <gautam@linux.ibm.com>
> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
> ---
> accel/kvm/kvm-accel-ops.c | 4 ++++
> accel/kvm/kvm-all.c       | 1 +
> 2 files changed, 5 insertions(+)
> 
> diff --git a/accel/kvm/kvm-accel-ops.c b/accel/kvm/kvm-accel-ops.c
> index 6d9140e549..d684fd0840 100644
> --- a/accel/kvm/kvm-accel-ops.c
> +++ b/accel/kvm/kvm-accel-ops.c
> @@ -52,6 +52,10 @@ static void *kvm_vcpu_thread_fn(void *arg)
> 
>         if (cpu_can_run(cpu)) {
>             r = kvm_cpu_exec(cpu);
> +            if (r == EXCP_HLT) {
> +                /* kvm_cpu_exec() released BQL, re-acquire for next iteration */
> +                bql_lock();
> +            }
>             if (r == EXCP_DEBUG) {
>                 cpu_handle_guest_debug(cpu);
>             }
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 774499d34f..00b8018664 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -3439,6 +3439,7 @@ int kvm_cpu_exec(CPUState *cpu)
>     trace_kvm_cpu_exec();
> 
>     if (kvm_arch_process_async_events(cpu)) {
> +        bql_unlock();
>         return EXCP_HLT;
>     }
> 
> -- 
> 2.52.0
> 
Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in kvm_cpu_exec
Posted by Harsh Prateek Bora 1 day, 10 hours ago
Hi Ani,

On 10/04/26 9:12 am, Ani Sinha wrote:
> 
> 
>> On 9 Apr 2026, at 9:40 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>>
>> When kvm_cpu_exec() returns EXCP_HLT due to kvm_arch_process_async_events()
>> returning true, it was returning before releasing the BQL (Big QEMU Lock).
>> This caused a lock imbalance where the vCPU thread would loop back to
>> kvm_cpu_exec() while still holding the BQL, leading to deadlocks.
> 
> I am not sure I understand this. Seems kvm_cpu_exec() does expect that the caller holds bql before calling the function. Where is the lock imbalance?

The issue is not that kvm_cpu_exec() doesn't expect the caller to hold 
the BQL - it does. The problem is that kvm_cpu_exec() has inconsistent 
BQL handling across its return paths.

Normal execution path:

int kvm_cpu_exec(CPUState *cpu)
{
     // BQL held on entry (from caller)

     if (kvm_arch_process_async_events(cpu)) {
         return EXCP_HLT;  // ← Returns with BQL STILL HELD
     }

     bql_unlock();  // ← Normal path unlocks here
     // ... KVM execution loop ...
     bql_lock();    // ← Re-acquires before returning
     return ret;
}

The lock imbalance:

When kvm_arch_process_async_events() returns true, the function returns 
EXCP_HLT before the bql_unlock() call.
This means the early return path keeps the BQL held, while the normal 
execution path releases and re-acquires it.
The caller (kvm_vcpu_thread_fn()) loops back and calls kvm_cpu_exec() 
again, but now the BQL is already held from the previous iteration. 
This creates a situation where the BQL is never released between 
iterations when EXCP_HLT is returned.

Why this matters:
On PowerPC pseries with halted secondary vCPUs (start-powered-off=true), 
these vCPUs repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each 
iteration accumulates BQL holds, preventing other threads (including CPU 
0) from making progress.

> 
>>
>> The issue manifests as boot hangs on PowerPC pseries machines with multiple
>> vCPUs, where secondary vCPUs with start-powered-off=true remain halted and
>> repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration held
>> the BQL, preventing other operations from proceeding.
>>
>> The fix has two parts:
>>
>> 1. In kvm_cpu_exec() (kvm-all.c):
>>    Release the BQL before returning EXCP_HLT in the early return path,
>>    matching the behavior of the normal execution path where bql_unlock()
>>    is called before entering the main KVM execution loop.
>>
>> 2. In kvm_vcpu_thread_fn() (kvm-accel-ops.c):
>>    Re-acquire the BQL after kvm_cpu_exec() returns EXCP_HLT, since the
>>    loop expects to hold the BQL when calling kvm_cpu_exec() again.
>>
>> This ensures proper BQL lock/unlock pairing:
>> - kvm_vcpu_thread_fn() holds BQL before calling kvm_cpu_exec()
>> - kvm_cpu_exec() releases BQL before returning (for EXCP_HLT)
>> - kvm_vcpu_thread_fn() re-acquires BQL if EXCP_HLT was returned
>> - Next iteration has BQL held as expected
>>
>> This is a regression introduced by commit 98884e0cc1 ("accel/kvm: add
>> changes required to support KVM VM file descriptor change") which
>> refactored kvm_irqchip_create() and changed the initialization timing,
>> exposing this lock imbalance issue.
>>
>> Fixes: 98884e0cc1 ("accel/kvm: add changes required to support KVM VM file descriptor change")
> 
> I do not think this is the right reference. The above commit may have exposed some underlying issue but is certainly not the cause of it. Further, as we have discussed in the other thread, the changes in that commit are not even getting executed.
> 
> Personally I think the core issue is somewhere else. I am not convinced this is the proper fix.

Regarding commit 98884e0cc1:

Reverting the kvm_irqchip_create refactoring makes the problem go away. 
This commit may have changed timing that exposed the issue, but the root 
cause is the pre-existing BQL lock imbalance in kvm_cpu_exec(). We can 
either:

- Remove the "Fixes:" tag entirely, or
- Add a note that this is a pre-existing issue exposed by timing changes.

The core fix (releasing BQL before returning EXCP_HLT) is correct and 
addresses the actual deadlock mechanism.

> 
>> Reported-by: Misbah Anjum N <misanjum@linux.ibm.com>
>> Reported-by: Gautam Menghani <gautam@linux.ibm.com>
>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
>> ---
>> accel/kvm/kvm-accel-ops.c | 4 ++++
>> accel/kvm/kvm-all.c       | 1 +
>> 2 files changed, 5 insertions(+)
>>
>> diff --git a/accel/kvm/kvm-accel-ops.c b/accel/kvm/kvm-accel-ops.c
>> index 6d9140e549..d684fd0840 100644
>> --- a/accel/kvm/kvm-accel-ops.c
>> +++ b/accel/kvm/kvm-accel-ops.c
>> @@ -52,6 +52,10 @@ static void *kvm_vcpu_thread_fn(void *arg)
>>
>>          if (cpu_can_run(cpu)) {
>>              r = kvm_cpu_exec(cpu);
>> +            if (r == EXCP_HLT) {
>> +                /* kvm_cpu_exec() released BQL, re-acquire for next iteration */
>> +                bql_lock();
>> +            }
>>              if (r == EXCP_DEBUG) {
>>                  cpu_handle_guest_debug(cpu);
>>              }
>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>> index 774499d34f..00b8018664 100644
>> --- a/accel/kvm/kvm-all.c
>> +++ b/accel/kvm/kvm-all.c
>> @@ -3439,6 +3439,7 @@ int kvm_cpu_exec(CPUState *cpu)
>>      trace_kvm_cpu_exec();
>>
>>      if (kvm_arch_process_async_events(cpu)) {
>> +        bql_unlock();
>>          return EXCP_HLT;
>>      }
>>
>> -- 
>> 2.52.0
>>
> 


Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in kvm_cpu_exec
Posted by Ani Sinha 1 day, 8 hours ago

> On 10 Apr 2026, at 10:55 AM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
> 
> Hi Ani,
> 
> On 10/04/26 9:12 am, Ani Sinha wrote:
>>> On 9 Apr 2026, at 9:40 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>>> 
>>> When kvm_cpu_exec() returns EXCP_HLT due to kvm_arch_process_async_events()
>>> returning true, it was returning before releasing the BQL (Big QEMU Lock).
>>> This caused a lock imbalance where the vCPU thread would loop back to
>>> kvm_cpu_exec() while still holding the BQL, leading to deadlocks.
>> I am not sure I understand this. Seems kvm_cpu_exec() does expect that the caller holds bql before calling the function. Where is the lock imbalance?
> 
> The issue is not that kvm_cpu_exec() doesn't expect the caller to hold the BQL - it does. The problem is that kvm_cpu_exec() has inconsistent BQL handling across its return paths.
> 
> Normal execution path:
> 
> int kvm_cpu_exec(CPUState *cpu)
> {
>    // BQL held on entry (from caller)
> 
>    if (kvm_arch_process_async_events(cpu)) {
>        return EXCP_HLT;  // ← Returns with BQL STILL HELD
>    }
> 
>    bql_unlock();  // ← Normal path unlocks here
>    // ... KVM execution loop ...
>    bql_lock();    // ← Re-acquires before returning
>    return ret;
> }

Yes, the semantics of kvm_cpu_exec() are that it should always return with the bql in the locked state. This is because the caller, kvm_vcpu_thread_fn(), calls this function with the bql locked, and if you look at the end of kvm_vcpu_thread_fn(), it releases the lock.

So if kvm_cpu_exec() unlocks bql internally, it has the responsibility to lock it again before returning. This makes the locking and unlocking symmetric.


> 
> The lock imbalance:
> 
> When kvm_arch_process_async_events() returns true, the function returns EXCP_HLT before the bql_unlock() call.

Why should it unlock it before returning? In fact it’s the opposite: if the function had unlocked the bql, it should lock it again before returning.

> This means the early return path keeps the BQL held,

This would be the correct thing to do.

> while the normal execution path releases and re-acquires it.

Because the functions it calls after unlocking requires bql to be unlocked. Since it had to unlock it, it locks it again before returning.

> The caller (kvm_vcpu_thread_fn()) loops back and calls kvm_cpu_exec() again, but now the BQL is already held from the previous iteration
> This creates a situation where the BQL is never released between iterations when EXCP_HLT is returned.
> 
> Why this matters:
> On PowerPC pseries with halted secondary vCPUs (start-powered-off=true), these vCPUs repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration accumulates BQL holds, preventing other threads (including CPU 0) from making progress.

This seems like some kind of architectural issue with PowerPC. Shouldn’t qemu_process_cpu_events() -> qemu_cond_wait(cpu->halt_cond, &bql) block other secondary cpus? Then the main cpu does a qemu_cpu_kick() to make them active again at some point?

> 
>>> 
>>> The issue manifests as boot hangs on PowerPC pseries machines with multiple
>>> vCPUs, where secondary vCPUs with start-powered-off=true remain halted and
>>> repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration held
>>> the BQL, preventing other operations from proceeding.
>>> 
>>> The fix has two parts:
>>> 
>>> 1. In kvm_cpu_exec() (kvm-all.c):
>>>   Release the BQL before returning EXCP_HLT in the early return path,
>>>   matching the behavior of the normal execution path where bql_unlock()
>>>   is called before entering the main KVM execution loop.
>>> 
>>> 2. In kvm_vcpu_thread_fn() (kvm-accel-ops.c):
>>>   Re-acquire the BQL after kvm_cpu_exec() returns EXCP_HLT, since the
>>>   loop expects to hold the BQL when calling kvm_cpu_exec() again.
>>> 
>>> This ensures proper BQL lock/unlock pairing:
>>> - kvm_vcpu_thread_fn() holds BQL before calling kvm_cpu_exec()
>>> - kvm_cpu_exec() releases BQL before returning (for EXCP_HLT)
>>> - kvm_vcpu_thread_fn() re-acquires BQL if EXCP_HLT was returned
>>> - Next iteration has BQL held as expected
>>> 
>>> This is a regression introduced by commit 98884e0cc1 ("accel/kvm: add
>>> changes required to support KVM VM file descriptor change") which
>>> refactored kvm_irqchip_create() and changed the initialization timing,
>>> exposing this lock imbalance issue.
>>> 
>>> Fixes: 98884e0cc1 ("accel/kvm: add changes required to support KVM VM file descriptor change")
>> I do not think this is the right reference. The above commit may have exposed some underlying issue but is certainly not the cause of it. Further, as we have discussed in the other thread, the changes in that commit are not even getting executed.
>> Personally I think the core issue is somewhere else. I am not convinced this is the proper fix.
> 
> Regarding commit 98884e0cc1:
> 
> Reverting the kvm_irqchip_create refactoring makes the problem go away. This commit may have changed timing that exposed the issue, but the root cause is the pre-existing BQL lock imbalance in kvm_cpu_exec(). We can either:
> 
> Remove the "Fixes:" tag entirely, or

Let's remove the Fixes tag entirely unless you can pinpoint the exact commit that introduced the architectural issue.

> Add a note that this is a pre-existing issue exposed by timing changes
> The core fix (releasing BQL before returning EXCP_HLT) is correct and addresses the actual deadlock mechanism.
> 
>>> Reported-by: Misbah Anjum N <misanjum@linux.ibm.com>
>>> Reported-by: Gautam Menghani <gautam@linux.ibm.com>
>>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
>>> ---
>>> accel/kvm/kvm-accel-ops.c | 4 ++++
>>> accel/kvm/kvm-all.c       | 1 +
>>> 2 files changed, 5 insertions(+)
>>> 
>>> diff --git a/accel/kvm/kvm-accel-ops.c b/accel/kvm/kvm-accel-ops.c
>>> index 6d9140e549..d684fd0840 100644
>>> --- a/accel/kvm/kvm-accel-ops.c
>>> +++ b/accel/kvm/kvm-accel-ops.c
>>> @@ -52,6 +52,10 @@ static void *kvm_vcpu_thread_fn(void *arg)
>>> 
>>>         if (cpu_can_run(cpu)) {
>>>             r = kvm_cpu_exec(cpu);
>>> +            if (r == EXCP_HLT) {
>>> +                /* kvm_cpu_exec() released BQL, re-acquire for next iteration */
>>> +                bql_lock();
>>> +            }
>>>             if (r == EXCP_DEBUG) {
>>>                 cpu_handle_guest_debug(cpu);
>>>             }
>>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>>> index 774499d34f..00b8018664 100644
>>> --- a/accel/kvm/kvm-all.c
>>> +++ b/accel/kvm/kvm-all.c
>>> @@ -3439,6 +3439,7 @@ int kvm_cpu_exec(CPUState *cpu)
>>>     trace_kvm_cpu_exec();
>>> 
>>>     if (kvm_arch_process_async_events(cpu)) {
>>> +        bql_unlock();
>>>         return EXCP_HLT;
>>>     }
>>> 
>>> -- 
>>> 2.52.0
Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in kvm_cpu_exec
Posted by Harsh Prateek Bora 1 day, 7 hours ago

On 10/04/26 12:05 pm, Ani Sinha wrote:
> 
> 
>> On 10 Apr 2026, at 10:55 AM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>>
>> Hi Ani,
>>
>> On 10/04/26 9:12 am, Ani Sinha wrote:
>>>> On 9 Apr 2026, at 9:40 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>>>>
>>>> When kvm_cpu_exec() returns EXCP_HLT due to kvm_arch_process_async_events()
>>>> returning true, it was returning before releasing the BQL (Big QEMU Lock).
>>>> This caused a lock imbalance where the vCPU thread would loop back to
>>>> kvm_cpu_exec() while still holding the BQL, leading to deadlocks.
>>> I am not sure I understand this. Seems kvm_cpu_exec() does expect that the caller holds bql before calling the function. Where is the lock imbalance?
>>
>> The issue is not that kvm_cpu_exec() doesn't expect the caller to hold the BQL - it does. The problem is that kvm_cpu_exec() has inconsistent BQL handling across its return paths.
>>
>> Normal execution path:
>>
>> int kvm_cpu_exec(CPUState *cpu)
>> {
>>     // BQL held on entry (from caller)
>>
>>     if (kvm_arch_process_async_events(cpu)) {
>>         return EXCP_HLT;  // ← Returns with BQL STILL HELD
>>     }
>>
>>     bql_unlock();  // ← Normal path unlocks here
>>     // ... KVM execution loop ...
>>     bql_lock();    // ← Re-acquires before returning
>>     return ret;
>> }
> 
> Yes the semantics of the function kvm_cpu_exec() is that it should always return with bql in locked state. This is because the caller kvm_vcpu_thread_fn() calls this function with bql locked and if you see the end of kvm_vcpu_thread_fn(), it releases the lock.
> 
> So if kvm_cpu_exec() unlocks bql internally, it has the responsibility to lock it again before returning. This makes the locking and unlocking symmetric.
> 
> 
>>
>> The lock imbalance:
>>
>> When kvm_arch_process_async_events() returns true, the function returns EXCP_HLT before the bql_unlock() call.
> 
> Why should it unlock it before returning? In fact it’s opposite. If the function had unlocked bql, it should lock it again before returning.
> 
>> This means the early return path keeps the BQL held,
> 
> This would be the correct thing to do.
> 
>> while the normal execution path releases and re-acquires it.
> 
> Because the functions it calls after unlocking requires bql to be unlocked. Since it had to unlock it, it locks it again before returning.

It had to unlock it for the same reason: to give others a chance to take 
the lock. We need to handle the failure/exception cases for the same 
purpose as well.

> 
>> The caller (kvm_vcpu_thread_fn()) loops back and calls kvm_cpu_exec() again, but now the BQL is already held from the previous iteration
>> This creates a situation where the BQL is never released between iterations when EXCP_HLT is returned.
>>
>> Why this matters:
>> On PowerPC pseries with halted secondary vCPUs (start-powered-off=true), these vCPUs repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration accumulates BQL holds, preventing other threads (including CPU 0) from making progress.
> 
> This seems like some kind of architectural issue with PowerPC. Shouldn’t qemu_process_cpu_events() -> qemu_cond_wait(cpu->halt_cond, &bql) block other secondary cpus? Then the main cpu does a qemu_cpu_kick() to make them active again at some point?

KVM vCPUs need to enter the kernel to handle the halted state before 
they can run. On spapr, this is handled via the start-cpu RTAS call, 
whose handler in QEMU does a qemu_cpu_kick(). However, CPU 0 needs to 
be able to proceed before that stage is reached, but it hangs while 
trying to acquire the BQL in qemu_default_main(), whereas the secondary 
vCPU is spinning with the BQL held, returning EXCP_HLT. This causes a 
deadlock.

> 
>>
>>>>
>>>> The issue manifests as boot hangs on PowerPC pseries machines with multiple
>>>> vCPUs, where secondary vCPUs with start-powered-off=true remain halted and
>>>> repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration held
>>>> the BQL, preventing other operations from proceeding.
>>>>
>>>> The fix has two parts:
>>>>
>>>> 1. In kvm_cpu_exec() (kvm-all.c):
>>>>    Release the BQL before returning EXCP_HLT in the early return path,
>>>>    matching the behavior of the normal execution path where bql_unlock()
>>>>    is called before entering the main KVM execution loop.
>>>>
>>>> 2. In kvm_vcpu_thread_fn() (kvm-accel-ops.c):
>>>>    Re-acquire the BQL after kvm_cpu_exec() returns EXCP_HLT, since the
>>>>    loop expects to hold the BQL when calling kvm_cpu_exec() again.
>>>>
>>>> This ensures proper BQL lock/unlock pairing:
>>>> - kvm_vcpu_thread_fn() holds BQL before calling kvm_cpu_exec()
>>>> - kvm_cpu_exec() releases BQL before returning (for EXCP_HLT)
>>>> - kvm_vcpu_thread_fn() re-acquires BQL if EXCP_HLT was returned
>>>> - Next iteration has BQL held as expected
>>>>
>>>> This is a regression introduced by commit 98884e0cc1 ("accel/kvm: add
>>>> changes required to support KVM VM file descriptor change") which
>>>> refactored kvm_irqchip_create() and changed the initialization timing,
>>>> exposing this lock imbalance issue.
>>>>
>>>> Fixes: 98884e0cc1 ("accel/kvm: add changes required to support KVM VM file descriptor change")
>>> I do not think this is the right reference. The above commit may have exposed some underlying issue but is certainly not the cause of it. Further, as we have discussed in the other thread, the changes in that commit are not even getting executed.
>>> Personally I think the core issue is somewhere else. I am not convinced this is the proper fix.
>>
>> Regarding commit 98884e0cc1:
>>
>> Reverting the kvm_irqchip_create refactoring makes the problem go away. This commit may have changed timing that exposed the issue, but the root cause is the pre-existing BQL lock imbalance in kvm_cpu_exec(). We can either:
>>
>> Remove the "Fixes:" tag entirely, or
> 
> Lets remove fixes tag entirely unless you can pin point the exact commit that introduced the architectural issues.
> 
>> Add a note that this is a pre-existing issue exposed by timing changes
>> The core fix (releasing BQL before returning EXCP_HLT) is correct and addresses the actual deadlock mechanism.
>>
>>>> Reported-by: Misbah Anjum N <misanjum@linux.ibm.com>
>>>> Reported-by: Gautam Menghani <gautam@linux.ibm.com>
>>>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
>>>> ---
>>>> accel/kvm/kvm-accel-ops.c | 4 ++++
>>>> accel/kvm/kvm-all.c       | 1 +
>>>> 2 files changed, 5 insertions(+)
>>>>
>>>> diff --git a/accel/kvm/kvm-accel-ops.c b/accel/kvm/kvm-accel-ops.c
>>>> index 6d9140e549..d684fd0840 100644
>>>> --- a/accel/kvm/kvm-accel-ops.c
>>>> +++ b/accel/kvm/kvm-accel-ops.c
>>>> @@ -52,6 +52,10 @@ static void *kvm_vcpu_thread_fn(void *arg)
>>>>
>>>>          if (cpu_can_run(cpu)) {
>>>>              r = kvm_cpu_exec(cpu);
>>>> +            if (r == EXCP_HLT) {
>>>> +                /* kvm_cpu_exec() released BQL, re-acquire for next iteration */
>>>> +                bql_lock();
>>>> +            }
>>>>              if (r == EXCP_DEBUG) {
>>>>                  cpu_handle_guest_debug(cpu);
>>>>              }
>>>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>>>> index 774499d34f..00b8018664 100644
>>>> --- a/accel/kvm/kvm-all.c
>>>> +++ b/accel/kvm/kvm-all.c
>>>> @@ -3439,6 +3439,7 @@ int kvm_cpu_exec(CPUState *cpu)
>>>>      trace_kvm_cpu_exec();
>>>>
>>>>      if (kvm_arch_process_async_events(cpu)) {
>>>> +        bql_unlock();
>>>>          return EXCP_HLT;
>>>>      }
>>>>
>>>> -- 
>>>> 2.52.0
> 
> 
> 


Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in kvm_cpu_exec
Posted by Ani Sinha 1 day, 6 hours ago

> On 10 Apr 2026, at 1:48 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
> 
> 
> 
> On 10/04/26 12:05 pm, Ani Sinha wrote:
>>> On 10 Apr 2026, at 10:55 AM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>>> 
>>> Hi Ani,
>>> 
>>> On 10/04/26 9:12 am, Ani Sinha wrote:
>>>>> On 9 Apr 2026, at 9:40 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>>>>> 
>>>>> When kvm_cpu_exec() returns EXCP_HLT due to kvm_arch_process_async_events()
>>>>> returning true, it was returning before releasing the BQL (Big QEMU Lock).
>>>>> This caused a lock imbalance where the vCPU thread would loop back to
>>>>> kvm_cpu_exec() while still holding the BQL, leading to deadlocks.
>>>> I am not sure I understand this. Seems kvm_cpu_exec() does expect that the caller holds bql before calling the function. Where is the lock imbalance?
>>> 
>>> The issue is not that kvm_cpu_exec() doesn't expect the caller to hold the BQL - it does. The problem is that kvm_cpu_exec() has inconsistent BQL handling across its return paths.
>>> 
>>> Normal execution path:
>>> 
>>> int kvm_cpu_exec(CPUState *cpu)
>>> {
>>>    // BQL held on entry (from caller)
>>> 
>>>    if (kvm_arch_process_async_events(cpu)) {
>>>        return EXCP_HLT;  // ← Returns with BQL STILL HELD
>>>    }
>>> 
>>>    bql_unlock();  // ← Normal path unlocks here
>>>    // ... KVM execution loop ...
>>>    bql_lock();    // ← Re-acquires before returning
>>>    return ret;
>>> }
>> Yes the semantics of the function kvm_cpu_exec() is that it should always return with bql in locked state. This is because the caller kvm_vcpu_thread_fn() calls this function with bql locked and if you see the end of kvm_vcpu_thread_fn(), it releases the lock.
>> So if kvm_cpu_exec() unlocks bql internally, it has the responsibility to lock it again before returning. This makes the locking and unlocking symmetric.
>>> 
>>> The lock imbalance:
>>> 
>>> When kvm_arch_process_async_events() returns true, the function returns EXCP_HLT before the bql_unlock() call.
>> Why should it unlock it before returning? In fact it’s opposite. If the function had unlocked bql, it should lock it again before returning.
>>> This means the early return path keeps the BQL held,
>> This would be the correct thing to do.
>>> while the normal execution path releases and re-acquires it.
>> Because the functions it calls after unlocking requires bql to be unlocked. Since it had to unlock it, it locks it again before returning.
> 
> It had to unlock it for the same reason - to give others a chance to lock. We need to handle failure/exception cases for the same purpose as well.

But by unlocking and returning you are breaking the semantics of the function and introducing imbalance.

> 
>>> The caller (kvm_vcpu_thread_fn()) loops back and calls kvm_cpu_exec() again, but now the BQL is already held from the previous iteration
>>> This creates a situation where the BQL is never released between iterations when EXCP_HLT is returned.
>>> 
>>> Why this matters:
>>> On PowerPC pseries with halted secondary vCPUs (start-powered-off=true), these vCPUs repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration accumulates BQL holds, preventing other threads (including CPU 0) from making progress.
>> This seems like some kind of architectural issue with PowerPC. Shouldn’t qemu_process_cpu_events() -> qemu_cond_wait(cpu->halt_cond, &bql) block other secondary cpus? Then the main cpu does a qemu_cpu_kick() to make them active again at some point?
> 
> KVM vCPUs need to enter the kernel to handle the halted state and therefore can run. On spapr, it is handled via start-cpu rtas call for which the handler in qemu does a qemu_cpu_kick(). However CPU 0 needs to be able to proceed before that stage is reached, but it hangs while trying to acquire bql_lock in qemu_default_main() whereas secondary vcpu is spinning with BQL held returning EXCP_HLT. This is causing deadlock.

Without looking at the code, it seems there is a race condition in the way the vcpu threads are initialised in spapr. I think that needs fixing. 

> 
>>> 
>>>>> 
>>>>> The issue manifests as boot hangs on PowerPC pseries machines with multiple
>>>>> vCPUs, where secondary vCPUs with start-powered-off=true remain halted and
>>>>> repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration held
>>>>> the BQL, preventing other operations from proceeding.
>>>>> 
>>>>> The fix has two parts:
>>>>> 
>>>>> 1. In kvm_cpu_exec() (kvm-all.c):
>>>>>   Release the BQL before returning EXCP_HLT in the early return path,
>>>>>   matching the behavior of the normal execution path where bql_unlock()
>>>>>   is called before entering the main KVM execution loop.
>>>>> 
>>>>> 2. In kvm_vcpu_thread_fn() (kvm-accel-ops.c):
>>>>>   Re-acquire the BQL after kvm_cpu_exec() returns EXCP_HLT, since the
>>>>>   loop expects to hold the BQL when calling kvm_cpu_exec() again.
>>>>> 
>>>>> This ensures proper BQL lock/unlock pairing:
>>>>> - kvm_vcpu_thread_fn() holds BQL before calling kvm_cpu_exec()
>>>>> - kvm_cpu_exec() releases BQL before returning (for EXCP_HLT)
>>>>> - kvm_vcpu_thread_fn() re-acquires BQL if EXCP_HLT was returned
>>>>> - Next iteration has BQL held as expected
>>>>> 
>>>>> This is a regression introduced by commit 98884e0cc1 ("accel/kvm: add
>>>>> changes required to support KVM VM file descriptor change") which
>>>>> refactored kvm_irqchip_create() and changed the initialization timing,
>>>>> exposing this lock imbalance issue.
>>>>> 
>>>>> Fixes: 98884e0cc1 ("accel/kvm: add changes required to support KVM VM file descriptor change")
>>>> I do not think this is the right reference. The above commit may have exposed some underlying issue but is certainly not the cause of it. Further, as we have discussed in the other thread, the changes in that commit are not even getting executed.
>>>> Personally I think the core issue is somewhere else. I am not convinced this is the proper fix.
>>> 
>>> Regarding commit 98884e0cc1:
>>> 
>>> Reverting the kvm_irqchip_create refactoring makes the problem go away. This commit may have changed timing that exposed the issue, but the root cause is the pre-existing BQL lock imbalance in kvm_cpu_exec(). We can either:
>>> 
>>> Remove the "Fixes:" tag entirely, or
>> Lets remove fixes tag entirely unless you can pin point the exact commit that introduced the architectural issues.
>>> Add a note that this is a pre-existing issue exposed by timing changes
>>> The core fix (releasing BQL before returning EXCP_HLT) is correct and addresses the actual deadlock mechanism.
>>> 
>>>>> Reported-by: Misbah Anjum N <misanjum@linux.ibm.com>
>>>>> Reported-by: Gautam Menghani <gautam@linux.ibm.com>
>>>>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
>>>>> ---
>>>>> accel/kvm/kvm-accel-ops.c | 4 ++++
>>>>> accel/kvm/kvm-all.c       | 1 +
>>>>> 2 files changed, 5 insertions(+)
>>>>> 
>>>>> diff --git a/accel/kvm/kvm-accel-ops.c b/accel/kvm/kvm-accel-ops.c
>>>>> index 6d9140e549..d684fd0840 100644
>>>>> --- a/accel/kvm/kvm-accel-ops.c
>>>>> +++ b/accel/kvm/kvm-accel-ops.c
>>>>> @@ -52,6 +52,10 @@ static void *kvm_vcpu_thread_fn(void *arg)
>>>>> 
>>>>>         if (cpu_can_run(cpu)) {
>>>>>             r = kvm_cpu_exec(cpu);
>>>>> +            if (r == EXCP_HLT) {
>>>>> +                /* kvm_cpu_exec() released BQL, re-acquire for next iteration */
>>>>> +                bql_lock();
>>>>> +            }
>>>>>             if (r == EXCP_DEBUG) {
>>>>>                 cpu_handle_guest_debug(cpu);
>>>>>             }
>>>>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>>>>> index 774499d34f..00b8018664 100644
>>>>> --- a/accel/kvm/kvm-all.c
>>>>> +++ b/accel/kvm/kvm-all.c
>>>>> @@ -3439,6 +3439,7 @@ int kvm_cpu_exec(CPUState *cpu)
>>>>>     trace_kvm_cpu_exec();
>>>>> 
>>>>>     if (kvm_arch_process_async_events(cpu)) {
>>>>> +        bql_unlock();
>>>>>         return EXCP_HLT;
>>>>>     }
>>>>> 
>>>>> -- 
>>>>> 2.52.0
Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in kvm_cpu_exec
Posted by Harsh Prateek Bora 1 day, 6 hours ago

On 10/04/26 1:59 pm, Ani Sinha wrote:
> 
> 
>> On 10 Apr 2026, at 1:48 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>>
>>
>>
>> On 10/04/26 12:05 pm, Ani Sinha wrote:
>>>> On 10 Apr 2026, at 10:55 AM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>>>>
>>>> Hi Ani,
>>>>
>>>> On 10/04/26 9:12 am, Ani Sinha wrote:
>>>>>> On 9 Apr 2026, at 9:40 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>>>>>>
>>>>>> When kvm_cpu_exec() returns EXCP_HLT due to kvm_arch_process_async_events()
>>>>>> returning true, it was returning before releasing the BQL (Big QEMU Lock).
>>>>>> This caused a lock imbalance where the vCPU thread would loop back to
>>>>>> kvm_cpu_exec() while still holding the BQL, leading to deadlocks.
>>>>> I am not sure I understand this. Seems kvm_cpu_exec() does expect that the caller holds bql before calling the function. Where is the lock imbalance?
>>>>
>>>> The issue is not that kvm_cpu_exec() doesn't expect the caller to hold the BQL - it does. The problem is that kvm_cpu_exec() has inconsistent BQL handling across its return paths.
>>>>
>>>> Normal execution path:
>>>>
>>>> int kvm_cpu_exec(CPUState *cpu)
>>>> {
>>>>     // BQL held on entry (from caller)
>>>>
>>>>     if (kvm_arch_process_async_events(cpu)) {
>>>>         return EXCP_HLT;  // ← Returns with BQL STILL HELD
>>>>     }
>>>>
>>>>     bql_unlock();  // ← Normal path unlocks here
>>>>     // ... KVM execution loop ...
>>>>     bql_lock();    // ← Re-acquires before returning
>>>>     return ret;
>>>> }
>>> Yes the semantics of the function kvm_cpu_exec() is that it should always return with bql in locked state. This is because the caller kvm_vcpu_thread_fn() calls this function with bql locked and if you see the end of kvm_vcpu_thread_fn(), it releases the lock.
>>> So if kvm_cpu_exec() unlocks bql internally, it has the responsibility to lock it again before returning. This makes the locking and unlocking symmetric.
>>>>
>>>> The lock imbalance:
>>>>
>>>> When kvm_arch_process_async_events() returns true, the function returns EXCP_HLT before the bql_unlock() call.
>>> Why should it unlock it before returning? In fact it’s opposite. If the function had unlocked bql, it should lock it again before returning.
>>>> This means the early return path keeps the BQL held,
>>> This would be the correct thing to do.
>>>> while the normal execution path releases and re-acquires it.
>>> Because the functions it calls after unlocking requires bql to be unlocked. Since it had to unlock it, it locks it again before returning.
>>
>> It had to unlock it for the same reason - to give others a chance to lock. We need to handle failure/exception cases for the same purpose as well.
> 
> But by unlocking and returning you are breaking the semantics of the function and introducing imbalance.

I think it is better to unlock the BQL early in failure cases.
Even otherwise, it would become a bql_unlock() followed by a bql_lock() 
in the caller for EXCP_HLT, which might look a bit odd as well.

Paolo, suggestions?

> 
>>
>>>> The caller (kvm_vcpu_thread_fn()) loops back and calls kvm_cpu_exec() again, but now the BQL is already held from the previous iteration
>>>> This creates a situation where the BQL is never released between iterations when EXCP_HLT is returned.
>>>>
>>>> Why this matters:
>>>> On PowerPC pseries with halted secondary vCPUs (start-powered-off=true), these vCPUs repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration accumulates BQL holds, preventing other threads (including CPU 0) from making progress.
>>> This seems like some kind of architectural issue with PowerPC. Shouldn’t qemu_process_cpu_events() -> qemu_cond_wait(cpu->halt_cond, &bql) block other secondary cpus? Then the main cpu does a qemu_cpu_kick() to make them active again at some point?
>>
>> KVM vCPUs need to enter the kernel to handle the halted state and therefore can run. On spapr, it is handled via start-cpu rtas call for which the handler in qemu does a qemu_cpu_kick(). However CPU 0 needs to be able to proceed before that stage is reached, but it hangs while trying to acquire bql_lock in qemu_default_main() whereas secondary vcpu is spinning with BQL held returning EXCP_HLT. This is causing deadlock.
> 
> Without looking at the code, it seems there is a race condition in the way the vcpu threads are initialised in spapr. I think that needs fixing.

I did explain the race condition observed above. Other architectures may 
have different timing that masks the issue.

> 
>>
>>>>
>>>>>>
>>>>>> The issue manifests as boot hangs on PowerPC pseries machines with multiple
>>>>>> vCPUs, where secondary vCPUs with start-powered-off=true remain halted and
>>>>>> repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration held
>>>>>> the BQL, preventing other operations from proceeding.
>>>>>>
>>>>>> The fix has two parts:
>>>>>>
>>>>>> 1. In kvm_cpu_exec() (kvm-all.c):
>>>>>>    Release the BQL before returning EXCP_HLT in the early return path,
>>>>>>    matching the behavior of the normal execution path where bql_unlock()
>>>>>>    is called before entering the main KVM execution loop.
>>>>>>
>>>>>> 2. In kvm_vcpu_thread_fn() (kvm-accel-ops.c):
>>>>>>    Re-acquire the BQL after kvm_cpu_exec() returns EXCP_HLT, since the
>>>>>>    loop expects to hold the BQL when calling kvm_cpu_exec() again.
>>>>>>
>>>>>> This ensures proper BQL lock/unlock pairing:
>>>>>> - kvm_vcpu_thread_fn() holds BQL before calling kvm_cpu_exec()
>>>>>> - kvm_cpu_exec() releases BQL before returning (for EXCP_HLT)
>>>>>> - kvm_vcpu_thread_fn() re-acquires BQL if EXCP_HLT was returned
>>>>>> - Next iteration has BQL held as expected
>>>>>>
>>>>>> This is a regression introduced by commit 98884e0cc1 ("accel/kvm: add
>>>>>> changes required to support KVM VM file descriptor change") which
>>>>>> refactored kvm_irqchip_create() and changed the initialization timing,
>>>>>> exposing this lock imbalance issue.
>>>>>>
>>>>>> Fixes: 98884e0cc1 ("accel/kvm: add changes required to support KVM VM file descriptor change")
>>>>> I do not think this is the right reference. The above commit may have exposed some underlying issue but is certainly not the cause of it. Further, as we have discussed in the other thread, the changes in that commit are not even getting executed.
>>>>> Personally I think the core issue is somewhere else. I am not convinced this is the proper fix.
>>>>
>>>> Regarding commit 98884e0cc1:
>>>>
>>>> Reverting the kvm_irqchip_create refactoring makes the problem go away. This commit may have changed timing that exposed the issue, but the root cause is the pre-existing BQL lock imbalance in kvm_cpu_exec(). We can either:
>>>>
>>>> Remove the "Fixes:" tag entirely, or
>>> Lets remove fixes tag entirely unless you can pin point the exact commit that introduced the architectural issues.
>>>> Add a note that this is a pre-existing issue exposed by timing changes
>>>> The core fix (releasing BQL before returning EXCP_HLT) is correct and addresses the actual deadlock mechanism.
>>>>
>>>>>> Reported-by: Misbah Anjum N <misanjum@linux.ibm.com>
>>>>>> Reported-by: Gautam Menghani <gautam@linux.ibm.com>
>>>>>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
>>>>>> ---
>>>>>> accel/kvm/kvm-accel-ops.c | 4 ++++
>>>>>> accel/kvm/kvm-all.c       | 1 +
>>>>>> 2 files changed, 5 insertions(+)
>>>>>>
>>>>>> diff --git a/accel/kvm/kvm-accel-ops.c b/accel/kvm/kvm-accel-ops.c
>>>>>> index 6d9140e549..d684fd0840 100644
>>>>>> --- a/accel/kvm/kvm-accel-ops.c
>>>>>> +++ b/accel/kvm/kvm-accel-ops.c
>>>>>> @@ -52,6 +52,10 @@ static void *kvm_vcpu_thread_fn(void *arg)
>>>>>>
>>>>>>          if (cpu_can_run(cpu)) {
>>>>>>              r = kvm_cpu_exec(cpu);
>>>>>> +            if (r == EXCP_HLT) {
>>>>>> +                /* kvm_cpu_exec() released BQL, re-acquire for next iteration */
>>>>>> +                bql_lock();
>>>>>> +            }
>>>>>>              if (r == EXCP_DEBUG) {
>>>>>>                  cpu_handle_guest_debug(cpu);
>>>>>>              }
>>>>>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>>>>>> index 774499d34f..00b8018664 100644
>>>>>> --- a/accel/kvm/kvm-all.c
>>>>>> +++ b/accel/kvm/kvm-all.c
>>>>>> @@ -3439,6 +3439,7 @@ int kvm_cpu_exec(CPUState *cpu)
>>>>>>      trace_kvm_cpu_exec();
>>>>>>
>>>>>>      if (kvm_arch_process_async_events(cpu)) {
>>>>>> +        bql_unlock();
>>>>>>          return EXCP_HLT;
>>>>>>      }
>>>>>>
>>>>>> -- 
>>>>>> 2.52.0
> 
> 


Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in kvm_cpu_exec
Posted by BALATON Zoltan 1 day, 2 hours ago
On Fri, 10 Apr 2026, Harsh Prateek Bora wrote:
>>>>> The lock imbalance:
>>>>> 
>>>>> When kvm_arch_process_async_events() returns true, the function returns 
>>>>> EXCP_HLT before the bql_unlock() call.
>>>> Why should it unlock it before returning? In fact it’s opposite. If the 
>>>> function had unlocked bql, it should lock it again before returning.
>>>>> This means the early return path keeps the BQL held,
>>>> This would be the correct thing to do.
>>>>> while the normal execution path releases and re-acquires it.
>>>> Because the functions it calls after unlocking requires bql to be 
>>>> unlocked. Since it had to unlock it, it locks it again before returning.
>>> 
>>> It had to unlock it for the same reason - to give others a chance to lock. 
>>> We need to handle failure/exception cases for the same purpose as well.
>> 
>> But by unlocking and returning you are breaking the semantics of the 
>> function and introducing imbalance.
>
> I think it is better to unlock bql early in failure cases.
> Even Otherwise, it would become a bql_unlock followed by a bql_lock in the 
> caller for EXCP_HLT, which might look a bit odd as well.

So why not do

bql_unlock();
/* comment explaining why this is needed */
bql_lock();
return EXCP_HLT;

That should keep the function returning with the lock held and fix the 
problem without having to hack the caller for this case.

On Fri, 10 Apr 2026, Ani Sinha wrote:
> >>> I did explain the race condition observed above.
> >> So why not fix it properly?
> > 
> > I think the suggested fix is appropriate for this scenario.
> > Keeping the BQL held forever in a loop for the EXCP_HLT case isn't the right thing to do.
> 
> IMHO that loop is a symptom of a deeper architectural issue. The fix proposed here feels more like an incorrect hack.

Even if this is not a complete fix of an underlying problem but only a 
work around maybe it's safer to do this just before a release than trying 
to rewrite something deeper now.

Regards,
BALATON Zoltan
Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in kvm_cpu_exec
Posted by Ani Sinha 1 day, 1 hour ago

> On 10 Apr 2026, at 6:34 PM, BALATON Zoltan <balaton@eik.bme.hu> wrote:
> 
> On Fri, 10 Apr 2026, Harsh Prateek Bora wrote:
>>>>>> The lock imbalance:
>>>>>> When kvm_arch_process_async_events() returns true, the function returns EXCP_HLT before the bql_unlock() call.
>>>>> Why should it unlock it before returning? In fact it’s opposite. If the function had unlocked bql, it should lock it again before returning.
>>>>>> This means the early return path keeps the BQL held,
>>>>> This would be the correct thing to do.
>>>>>> while the normal execution path releases and re-acquires it.
>>>>> Because the functions it calls after unlocking requires bql to be unlocked. Since it had to unlock it, it locks it again before returning.
>>>> It had to unlock it for the same reason - to give others a chance to lock. We need to handle failure/exception cases for the same purpose as well.
>>> But by unlocking and returning you are breaking the semantics of the function and introducing imbalance.
>> 
>> I think it is better to unlock bql early in failure cases.
>> Even Otherwise, it would become a bql_unlock followed by a bql_lock in the caller for EXCP_HLT, which might look a bit odd as well.
> 
> So why not do
> 
> bql_unlock()
> /* comment explaining why this is needed */
> bql_lock()
> return EXCP_HLT;

If we go down this path, what would be wrong with

bql_unlock();
sleep(100);
bql_lock();

This would make the window even larger for the other thread to make progress … How about sleeping for 1 min? 10 min? 


> 
> That should keep the function return locked and fix the problem without having to hack the caller for this case.
> 
> On Fri, 10 Apr 2026, Ani Sinha wrote:
>> >>> I did explain the race condition observed above.
>> >> So why not fix it properly?
>> > > I think the suggested fix is appropriate for this scenario.
>> > Keeping the BQL held forever in a loop for the EXCP_HLT case isn't the right thing to do.
>> IMHO that loop is a symptom of a deeper architectural issue. The fix proposed here feels more like an incorrect hack.
> 
> Even if this is not a complete fix of an underlying problem but only a work around maybe it's safer to do this just before a release than trying to rewrite something deeper now.
> 
> Regards,
> BALATON Zoltan
Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in kvm_cpu_exec
Posted by BALATON Zoltan 1 day ago
On Fri, 10 Apr 2026, Ani Sinha wrote:
>> On 10 Apr 2026, at 6:34 PM, BALATON Zoltan <balaton@eik.bme.hu> wrote:
>> On Fri, 10 Apr 2026, Harsh Prateek Bora wrote:
>>>>>>> The lock imbalance:
>>>>>>> When kvm_arch_process_async_events() returns true, the function returns EXCP_HLT before the bql_unlock() call.
>>>>>> Why should it unlock it before returning? In fact it’s opposite. If the function had unlocked bql, it should lock it again before returning.
>>>>>>> This means the early return path keeps the BQL held,
>>>>>> This would be the correct thing to do.
>>>>>>> while the normal execution path releases and re-acquires it.
>>>>>> Because the functions it calls after unlocking requires bql to be unlocked. Since it had to unlock it, it locks it again before returning.
>>>>> It had to unlock it for the same reason - to give others a chance to lock. We need to handle failure/exception cases for the same purpose as well.
>>>> But by unlocking and returning you are breaking the semantics of the function and introducing imbalance.
>>>
>>> I think it is better to unlock bql early in failure cases.
>>> Even Otherwise, it would become a bql_unlock followed by a bql_lock in the caller for EXCP_HLT, which might look a bit odd as well.
>>
>> So why not do
>>
>> bql_unlock()
>> /* comment explaining why this is needed */
>> bql_lock()
>> return EXCP_HLT;
>
> If we go down this path, what would be wrong with
>
> bql_unlock()
> sleep(100);
> bql_lock();
>
> This would make the window even larger for the other thread to make progress … How about sleeping for 1 min? 10 min?

If the reason to unlock is to give waiting threads a chance to acquire 
the lock, then there is no need to sleep long: we will wait in 
bql_lock() anyway if another thread got the lock, so a sleep of less 
than a second, or maybe just a few milliseconds, should be plenty of 
time for other threads to get a chance to run and avoid a deadlock.

I understand you say there could be a better fix elsewhere, but we 
don't seem to fully understand the issue, and attempting a bigger 
rewrite now is more likely to break something than this hack, which 
should give a sufficient workaround for the release at least.

But I'm not involved in this; I just tried to at least resolve the 
deadlock between you two, if I can't resolve the deadlock in BQL. :-)

Regards,
BALATON Zoltan
Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in kvm_cpu_exec
Posted by Ani Sinha 1 day, 5 hours ago

> On 10 Apr 2026, at 2:31 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
> 
> 
> 
> On 10/04/26 1:59 pm, Ani Sinha wrote:
>>> On 10 Apr 2026, at 1:48 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>>> 
>>> 
>>> 
>>> On 10/04/26 12:05 pm, Ani Sinha wrote:
>>>>> On 10 Apr 2026, at 10:55 AM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>>>>> 
>>>>> Hi Ani,
>>>>> 
>>>>> On 10/04/26 9:12 am, Ani Sinha wrote:
>>>>>>> On 9 Apr 2026, at 9:40 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>>>>>>> 
>>>>>>> When kvm_cpu_exec() returns EXCP_HLT due to kvm_arch_process_async_events()
>>>>>>> returning true, it was returning before releasing the BQL (Big QEMU Lock).
>>>>>>> This caused a lock imbalance where the vCPU thread would loop back to
>>>>>>> kvm_cpu_exec() while still holding the BQL, leading to deadlocks.
>>>>>> I am not sure I understand this. Seems kvm_cpu_exec() does expect that the caller holds bql before calling the function. Where is the lock imbalance?
>>>>> 
>>>>> The issue is not that kvm_cpu_exec() doesn't expect the caller to hold the BQL - it does. The problem is that kvm_cpu_exec() has inconsistent BQL handling across its return paths.
>>>>> 
>>>>> Normal execution path:
>>>>> 
>>>>> int kvm_cpu_exec(CPUState *cpu)
>>>>> {
>>>>>    // BQL held on entry (from caller)
>>>>> 
>>>>>    if (kvm_arch_process_async_events(cpu)) {
>>>>>        return EXCP_HLT;  // ← Returns with BQL STILL HELD
>>>>>    }
>>>>> 
>>>>>    bql_unlock();  // ← Normal path unlocks here
>>>>>    // ... KVM execution loop ...
>>>>>    bql_lock();    // ← Re-acquires before returning
>>>>>    return ret;
>>>>> }
>>>> Yes the semantics of the function kvm_cpu_exec() is that it should always return with bql in locked state. This is because the caller kvm_vcpu_thread_fn() calls this function with bql locked and if you see the end of kvm_vcpu_thread_fn(), it releases the lock.
>>>> So if kvm_cpu_exec() unlocks bql internally, it has the responsibility to lock it again before returning. This makes the locking and unlocking symmetric.
>>>>> 
>>>>> The lock imbalance:
>>>>> 
>>>>> When kvm_arch_process_async_events() returns true, the function returns EXCP_HLT before the bql_unlock() call.
>>>> Why should it unlock it before returning? In fact it’s opposite. If the function had unlocked bql, it should lock it again before returning.
>>>>> This means the early return path keeps the BQL held,
>>>> This would be the correct thing to do.
>>>>> while the normal execution path releases and re-acquires it.
>>>> Because the functions it calls after unlocking requires bql to be unlocked. Since it had to unlock it, it locks it again before returning.
>>> 
>>> It had to unlock it for the same reason - to give others a chance to lock. We need to handle failure/exception cases for the same purpose as well.
>> But by unlocking and returning you are breaking the semantics of the function and introducing imbalance.
> 
> I think it is better to unlock bql early in failure cases.
> Even Otherwise, it would become a bql_unlock followed by a bql_lock in the caller for EXCP_HLT, which might look a bit odd as well.

But you are doing exactly that across a function call. 

> 
> Paolo, suggestions?
> 
>>> 
>>>>> The caller (kvm_vcpu_thread_fn()) loops back and calls kvm_cpu_exec() again, but now the BQL is already held from the previous iteration
>>>>> This creates a situation where the BQL is never released between iterations when EXCP_HLT is returned.
>>>>> 
>>>>> Why this matters:
>>>>> On PowerPC pseries with halted secondary vCPUs (start-powered-off=true), these vCPUs repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration accumulates BQL holds, preventing other threads (including CPU 0) from making progress.
>>>> This seems like some kind of architectural issue with PowerPC. Shouldn’t qemu_process_cpu_events() -> qemu_cond_wait(cpu->halt_cond, &bql) block other secondary cpus? Then the main cpu does a qemu_cpu_kick() to make them active again at some point?
>>> 
>>> KVM vCPUs need to enter the kernel to handle the halted state and therefore can run. On spapr, it is handled via start-cpu rtas call for which the handler in qemu does a qemu_cpu_kick(). However CPU 0 needs to be able to proceed before that stage is reached, but it hangs while trying to acquire bql_lock in qemu_default_main() whereas secondary vcpu is spinning with BQL held returning EXCP_HLT. This is causing deadlock.
>> Without looking at the code, it seems there is a race condition in the way the vcpu threads are initialised in spapr. I think that needs fixing.
> 
> I did explain the race condition observed above.

So why not fix it properly?

> Other architectures may have different timing that masks the issue.
> 
>>> 
>>>>> 
>>>>>>> 
>>>>>>> The issue manifests as boot hangs on PowerPC pseries machines with multiple
>>>>>>> vCPUs, where secondary vCPUs with start-powered-off=true remain halted and
>>>>>>> repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration held
>>>>>>> the BQL, preventing other operations from proceeding.
>>>>>>> 
>>>>>>> The fix has two parts:
>>>>>>> 
>>>>>>> 1. In kvm_cpu_exec() (kvm-all.c):
>>>>>>>   Release the BQL before returning EXCP_HLT in the early return path,
>>>>>>>   matching the behavior of the normal execution path where bql_unlock()
>>>>>>>   is called before entering the main KVM execution loop.
>>>>>>> 
>>>>>>> 2. In kvm_vcpu_thread_fn() (kvm-accel-ops.c):
>>>>>>>   Re-acquire the BQL after kvm_cpu_exec() returns EXCP_HLT, since the
>>>>>>>   loop expects to hold the BQL when calling kvm_cpu_exec() again.
>>>>>>> 
>>>>>>> This ensures proper BQL lock/unlock pairing:
>>>>>>> - kvm_vcpu_thread_fn() holds BQL before calling kvm_cpu_exec()
>>>>>>> - kvm_cpu_exec() releases BQL before returning (for EXCP_HLT)
>>>>>>> - kvm_vcpu_thread_fn() re-acquires BQL if EXCP_HLT was returned
>>>>>>> - Next iteration has BQL held as expected
>>>>>>> 
>>>>>>> This is a regression introduced by commit 98884e0cc1 ("accel/kvm: add
>>>>>>> changes required to support KVM VM file descriptor change") which
>>>>>>> refactored kvm_irqchip_create() and changed the initialization timing,
>>>>>>> exposing this lock imbalance issue.
>>>>>>> 
>>>>>>> Fixes: 98884e0cc1 ("accel/kvm: add changes required to support KVM VM file descriptor change")
>>>>>> I do not think this is the right reference. The above commit may have exposed some underlying issue but is certainly not the cause of it. Further, as we have discussed in the other thread, the changes in that commit are not even getting executed.
>>>>>> Personally I think the core issue is somewhere else. I am not convinced this is the proper fix.
>>>>> 
>>>>> Regarding commit 98884e0cc1:
>>>>> 
>>>>> Reverting the kvm_irqchip_create refactoring makes the problem go away. This commit may have changed timing that exposed the issue, but the root cause is the pre-existing BQL lock imbalance in kvm_cpu_exec(). We can either:
>>>>> 
>>>>> Remove the "Fixes:" tag entirely, or
>>>> Lets remove fixes tag entirely unless you can pin point the exact commit that introduced the architectural issues.
>>>>> Add a note that this is a pre-existing issue exposed by timing changes
>>>>> The core fix (releasing BQL before returning EXCP_HLT) is correct and addresses the actual deadlock mechanism.
>>>>> 
>>>>>>> Reported-by: Misbah Anjum N <misanjum@linux.ibm.com>
>>>>>>> Reported-by: Gautam Menghani <gautam@linux.ibm.com>
>>>>>>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
>>>>>>> ---
>>>>>>> accel/kvm/kvm-accel-ops.c | 4 ++++
>>>>>>> accel/kvm/kvm-all.c       | 1 +
>>>>>>> 2 files changed, 5 insertions(+)
>>>>>>> 
>>>>>>> diff --git a/accel/kvm/kvm-accel-ops.c b/accel/kvm/kvm-accel-ops.c
>>>>>>> index 6d9140e549..d684fd0840 100644
>>>>>>> --- a/accel/kvm/kvm-accel-ops.c
>>>>>>> +++ b/accel/kvm/kvm-accel-ops.c
>>>>>>> @@ -52,6 +52,10 @@ static void *kvm_vcpu_thread_fn(void *arg)
>>>>>>> 
>>>>>>>         if (cpu_can_run(cpu)) {
>>>>>>>             r = kvm_cpu_exec(cpu);
>>>>>>> +            if (r == EXCP_HLT) {
>>>>>>> +                /* kvm_cpu_exec() released BQL, re-acquire for next iteration */
>>>>>>> +                bql_lock();
>>>>>>> +            }
>>>>>>>             if (r == EXCP_DEBUG) {
>>>>>>>                 cpu_handle_guest_debug(cpu);
>>>>>>>             }
>>>>>>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>>>>>>> index 774499d34f..00b8018664 100644
>>>>>>> --- a/accel/kvm/kvm-all.c
>>>>>>> +++ b/accel/kvm/kvm-all.c
>>>>>>> @@ -3439,6 +3439,7 @@ int kvm_cpu_exec(CPUState *cpu)
>>>>>>>     trace_kvm_cpu_exec();
>>>>>>> 
>>>>>>>     if (kvm_arch_process_async_events(cpu)) {
>>>>>>> +        bql_unlock();
>>>>>>>         return EXCP_HLT;
>>>>>>>     }
>>>>>>> 
>>>>>>> -- 
>>>>>>> 2.52.0
Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in kvm_cpu_exec
Posted by Harsh Prateek Bora 1 day, 5 hours ago

On 10/04/26 3:01 pm, Ani Sinha wrote:
> 
> 
>> On 10 Apr 2026, at 2:31 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>>
>>
>>
>> On 10/04/26 1:59 pm, Ani Sinha wrote:
>>>> On 10 Apr 2026, at 1:48 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>>>>
>>>>
>>>>
>>>> On 10/04/26 12:05 pm, Ani Sinha wrote:
>>>>>> On 10 Apr 2026, at 10:55 AM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>>>>>>
>>>>>> Hi Ani,
>>>>>>
>>>>>> On 10/04/26 9:12 am, Ani Sinha wrote:
>>>>>>>> On 9 Apr 2026, at 9:40 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>>>>>>>>
>>>>>>>> When kvm_cpu_exec() returns EXCP_HLT due to kvm_arch_process_async_events()
>>>>>>>> returning true, it was returning before releasing the BQL (Big QEMU Lock).
>>>>>>>> This caused a lock imbalance where the vCPU thread would loop back to
>>>>>>>> kvm_cpu_exec() while still holding the BQL, leading to deadlocks.
>>>>>>> I am not sure I understand this. Seems kvm_cpu_exec() does expect that the caller holds bql before calling the function. Where is the lock imbalance?
>>>>>>
>>>>>> The issue is not that kvm_cpu_exec() doesn't expect the caller to hold the BQL - it does. The problem is that kvm_cpu_exec() has inconsistent BQL handling across its return paths.
>>>>>>
>>>>>> Normal execution path:
>>>>>>
>>>>>> int kvm_cpu_exec(CPUState *cpu)
>>>>>> {
>>>>>>     // BQL held on entry (from caller)
>>>>>>
>>>>>>     if (kvm_arch_process_async_events(cpu)) {
>>>>>>         return EXCP_HLT;  // ← Returns with BQL STILL HELD
>>>>>>     }
>>>>>>
>>>>>>     bql_unlock();  // ← Normal path unlocks here
>>>>>>     // ... KVM execution loop ...
>>>>>>     bql_lock();    // ← Re-acquires before returning
>>>>>>     return ret;
>>>>>> }
>>>>> Yes the semantics of the function kvm_cpu_exec() is that it should always return with bql in locked state. This is because the caller kvm_vcpu_thread_fn() calls this function with bql locked and if you see the end of kvm_vcpu_thread_fn(), it releases the lock.
>>>>> So if kvm_cpu_exec() unlocks bql internally, it has the responsibility to lock it again before returning. This makes the locking and unlocking symmetric.
>>>>>>
>>>>>> The lock imbalance:
>>>>>>
>>>>>> When kvm_arch_process_async_events() returns true, the function returns EXCP_HLT before the bql_unlock() call.
>>>>> Why should it unlock it before returning? In fact it’s opposite. If the function had unlocked bql, it should lock it again before returning.
>>>>>> This means the early return path keeps the BQL held,
>>>>> This would be the correct thing to do.
>>>>>> while the normal execution path releases and re-acquires it.
>>>>> Because the functions it calls after unlocking requires bql to be unlocked. Since it had to unlock it, it locks it again before returning.
>>>>
>>>> It had to unlock it for the same reason - to give others a chance to lock. We need to handle failure/exception cases for the same purpose as well.
>>> But by unlocking and returning you are breaking the semantics of the function and introducing imbalance.
>>
>> I think it is better to unlock bql early in failure cases.
>> Even Otherwise, it would become a bql_unlock followed by a bql_lock in the caller for EXCP_HLT, which might look a bit odd as well.
> 
> But you are doing exactly that across a function call.

I am fine with either/or maintainer's choice.

> 
>>
>> Paolo, suggestions?
>>
>>>>
>>>>>> The caller (kvm_vcpu_thread_fn()) loops back and calls kvm_cpu_exec() again, but now the BQL is already held from the previous iteration
>>>>>> This creates a situation where the BQL is never released between iterations when EXCP_HLT is returned.
>>>>>>
>>>>>> Why this matters:
>>>>>> On PowerPC pseries with halted secondary vCPUs (start-powered-off=true), these vCPUs repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration accumulates BQL holds, preventing other threads (including CPU 0) from making progress.
>>>>> This seems like some kind of architectural issue with PowerPC. Shouldn’t qemu_process_cpu_events() -> qemu_cond_wait(cpu->halt_cond, &bql) block other secondary cpus? Then the main cpu does a qemu_cpu_kick() to make them active again at some point?
>>>>
>>>> KVM vCPUs need to enter the kernel to handle the halted state and therefore can run. On spapr, it is handled via start-cpu rtas call for which the handler in qemu does a qemu_cpu_kick(). However CPU 0 needs to be able to proceed before that stage is reached, but it hangs while trying to acquire bql_lock in qemu_default_main() whereas secondary vcpu is spinning with BQL held returning EXCP_HLT. This is causing deadlock.
>>> Without looking at the code, it seems there is a race condition in the way the vcpu threads are initialised in spapr. I think that needs fixing.
>>
>> I did explain the race condition observed above.
> 
> So why not fix it properly?

I think the suggested fix is appropriate for this scenario.
Keeping the BQL held forever in a loop for the EXCP_HLT case isn't the
right thing to do.

> 
>> Other architectures may have different timing that masks the issue.
>>
>>>>
>>>>>>
>>>>>>>>
>>>>>>>> The issue manifests as boot hangs on PowerPC pseries machines with multiple
>>>>>>>> vCPUs, where secondary vCPUs with start-powered-off=true remain halted and
>>>>>>>> repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration held
>>>>>>>> the BQL, preventing other operations from proceeding.
>>>>>>>>
>>>>>>>> The fix has two parts:
>>>>>>>>
>>>>>>>> 1. In kvm_cpu_exec() (kvm-all.c):
>>>>>>>>    Release the BQL before returning EXCP_HLT in the early return path,
>>>>>>>>    matching the behavior of the normal execution path where bql_unlock()
>>>>>>>>    is called before entering the main KVM execution loop.
>>>>>>>>
>>>>>>>> 2. In kvm_vcpu_thread_fn() (kvm-accel-ops.c):
>>>>>>>>    Re-acquire the BQL after kvm_cpu_exec() returns EXCP_HLT, since the
>>>>>>>>    loop expects to hold the BQL when calling kvm_cpu_exec() again.
>>>>>>>>
>>>>>>>> This ensures proper BQL lock/unlock pairing:
>>>>>>>> - kvm_vcpu_thread_fn() holds BQL before calling kvm_cpu_exec()
>>>>>>>> - kvm_cpu_exec() releases BQL before returning (for EXCP_HLT)
>>>>>>>> - kvm_vcpu_thread_fn() re-acquires BQL if EXCP_HLT was returned
>>>>>>>> - Next iteration has BQL held as expected
>>>>>>>>
>>>>>>>> This is a regression introduced by commit 98884e0cc1 ("accel/kvm: add
>>>>>>>> changes required to support KVM VM file descriptor change") which
>>>>>>>> refactored kvm_irqchip_create() and changed the initialization timing,
>>>>>>>> exposing this lock imbalance issue.
>>>>>>>>
>>>>>>>> Fixes: 98884e0cc1 ("accel/kvm: add changes required to support KVM VM file descriptor change")
>>>>>>> I do not think this is the right reference. The above commit may have exposed some underlying issue but is certainly not the cause of it. Further, as we have discussed in the other thread, the changes in that commit are not even getting executed.
>>>>>>> Personally I think the core issue is somewhere else. I am not convinced this is the proper fix.
>>>>>>
>>>>>> Regarding commit 98884e0cc1:
>>>>>>
>>>>>> Reverting the kvm_irqchip_create refactoring makes the problem go away. This commit may have changed timing that exposed the issue, but the root cause is the pre-existing BQL lock imbalance in kvm_cpu_exec(). We can either:
>>>>>>
>>>>>> Remove the "Fixes:" tag entirely, or
>>>>> Lets remove fixes tag entirely unless you can pin point the exact commit that introduced the architectural issues.
>>>>>> Add a note that this is a pre-existing issue exposed by timing changes
>>>>>> The core fix (releasing BQL before returning EXCP_HLT) is correct and addresses the actual deadlock mechanism.
>>>>>>
>>>>>>>> Reported-by: Misbah Anjum N <misanjum@linux.ibm.com>
>>>>>>>> Reported-by: Gautam Menghani <gautam@linux.ibm.com>
>>>>>>>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
>>>>>>>> ---
>>>>>>>> accel/kvm/kvm-accel-ops.c | 4 ++++
>>>>>>>> accel/kvm/kvm-all.c       | 1 +
>>>>>>>> 2 files changed, 5 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/accel/kvm/kvm-accel-ops.c b/accel/kvm/kvm-accel-ops.c
>>>>>>>> index 6d9140e549..d684fd0840 100644
>>>>>>>> --- a/accel/kvm/kvm-accel-ops.c
>>>>>>>> +++ b/accel/kvm/kvm-accel-ops.c
>>>>>>>> @@ -52,6 +52,10 @@ static void *kvm_vcpu_thread_fn(void *arg)
>>>>>>>>
>>>>>>>>          if (cpu_can_run(cpu)) {
>>>>>>>>              r = kvm_cpu_exec(cpu);
>>>>>>>> +            if (r == EXCP_HLT) {
>>>>>>>> +                /* kvm_cpu_exec() released BQL, re-acquire for next iteration */
>>>>>>>> +                bql_lock();
>>>>>>>> +            }
>>>>>>>>              if (r == EXCP_DEBUG) {
>>>>>>>>                  cpu_handle_guest_debug(cpu);
>>>>>>>>              }
>>>>>>>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>>>>>>>> index 774499d34f..00b8018664 100644
>>>>>>>> --- a/accel/kvm/kvm-all.c
>>>>>>>> +++ b/accel/kvm/kvm-all.c
>>>>>>>> @@ -3439,6 +3439,7 @@ int kvm_cpu_exec(CPUState *cpu)
>>>>>>>>      trace_kvm_cpu_exec();
>>>>>>>>
>>>>>>>>      if (kvm_arch_process_async_events(cpu)) {
>>>>>>>> +        bql_unlock();
>>>>>>>>          return EXCP_HLT;
>>>>>>>>      }
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> 2.52.0
> 
> 


Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in kvm_cpu_exec
Posted by Ani Sinha 1 day, 5 hours ago

> On 10 Apr 2026, at 3:32 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
> 
> 
> 
> On 10/04/26 3:01 pm, Ani Sinha wrote:
>>> On 10 Apr 2026, at 2:31 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>>> 
>>> 
>>> 
>>> On 10/04/26 1:59 pm, Ani Sinha wrote:
>>>>> On 10 Apr 2026, at 1:48 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>> On 10/04/26 12:05 pm, Ani Sinha wrote:
>>>>>>> On 10 Apr 2026, at 10:55 AM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>>>>>>> 
>>>>>>> Hi Ani,
>>>>>>> 
>>>>>>> On 10/04/26 9:12 am, Ani Sinha wrote:
>>>>>>>>> On 9 Apr 2026, at 9:40 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>>>>>>>>> 
>>>>>>>>> When kvm_cpu_exec() returns EXCP_HLT due to kvm_arch_process_async_events()
>>>>>>>>> returning true, it was returning before releasing the BQL (Big QEMU Lock).
>>>>>>>>> This caused a lock imbalance where the vCPU thread would loop back to
>>>>>>>>> kvm_cpu_exec() while still holding the BQL, leading to deadlocks.
>>>>>>>> I am not sure I understand this. Seems kvm_cpu_exec() does expect that the caller holds bql before calling the function. Where is the lock imbalance?
>>>>>>> 
>>>>>>> The issue is not that kvm_cpu_exec() doesn't expect the caller to hold the BQL - it does. The problem is that kvm_cpu_exec() has inconsistent BQL handling across its return paths.
>>>>>>> 
>>>>>>> Normal execution path:
>>>>>>> 
>>>>>>> int kvm_cpu_exec(CPUState *cpu)
>>>>>>> {
>>>>>>>    // BQL held on entry (from caller)
>>>>>>> 
>>>>>>>    if (kvm_arch_process_async_events(cpu)) {
>>>>>>>        return EXCP_HLT;  // ← Returns with BQL STILL HELD
>>>>>>>    }
>>>>>>> 
>>>>>>>    bql_unlock();  // ← Normal path unlocks here
>>>>>>>    // ... KVM execution loop ...
>>>>>>>    bql_lock();    // ← Re-acquires before returning
>>>>>>>    return ret;
>>>>>>> }
>>>>>> Yes the semantics of the function kvm_cpu_exec() is that it should always return with bql in locked state. This is because the caller kvm_vcpu_thread_fn() calls this function with bql locked and if you see the end of kvm_vcpu_thread_fn(), it releases the lock.
>>>>>> So if kvm_cpu_exec() unlocks bql internally, it has the responsibility to lock it again before returning. This makes the locking and unlocking symmetric.
>>>>>>> 
>>>>>>> The lock imbalance:
>>>>>>> 
>>>>>>> When kvm_arch_process_async_events() returns true, the function returns EXCP_HLT before the bql_unlock() call.
>>>>>> Why should it unlock it before returning? In fact it’s opposite. If the function had unlocked bql, it should lock it again before returning.
>>>>>>> This means the early return path keeps the BQL held,
>>>>>> This would be the correct thing to do.
>>>>>>> while the normal execution path releases and re-acquires it.
>>>>>> Because the functions it calls after unlocking requires bql to be unlocked. Since it had to unlock it, it locks it again before returning.
>>>>> 
>>>>> It had to unlock it for the same reason - to give others a chance to lock. We need to handle failure/exception cases for the same purpose as well.
>>>> But by unlocking and returning you are breaking the semantics of the function and introducing imbalance.
>>> 
>>> I think it is better to unlock bql early in failure cases.
>>> Even Otherwise, it would become a bql_unlock followed by a bql_lock in the caller for EXCP_HLT, which might look a bit odd as well.
>> But you are doing exactly that across a function call.
> 
> I am fine with either/or maintainer's choice.
> 
>>> 
>>> Paolo, suggestions?
>>> 
>>>>> 
>>>>>>> The caller (kvm_vcpu_thread_fn()) loops back and calls kvm_cpu_exec() again, but now the BQL is already held from the previous iteration
>>>>>>> This creates a situation where the BQL is never released between iterations when EXCP_HLT is returned.
>>>>>>> 
>>>>>>> Why this matters:
>>>>>>> On PowerPC pseries with halted secondary vCPUs (start-powered-off=true), these vCPUs repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration accumulates BQL holds, preventing other threads (including CPU 0) from making progress.
>>>>>> This seems like some kind of architectural issue with PowerPC. Shouldn’t qemu_process_cpu_events() -> qemu_cond_wait(cpu->halt_cond, &bql) block other secondary cpus? Then the main cpu does a qemu_cpu_kick() to make them active again at some point?
>>>>> 
>>>>> KVM vCPUs need to enter the kernel to handle the halted state and therefore can run. On spapr, it is handled via start-cpu rtas call for which the handler in qemu does a qemu_cpu_kick(). However CPU 0 needs to be able to proceed before that stage is reached, but it hangs while trying to acquire bql_lock in qemu_default_main() whereas secondary vcpu is spinning with BQL held returning EXCP_HLT. This is causing deadlock.
>>>> Without looking at the code, it seems there is a race condition in the way the vcpu threads are initialised in spapr. I think that needs fixing.
>>> 
>>> I did explain the race condition observed above.
>> So why not fix it properly?
> 
> I think the suggested fix is appropriate for this scenario.
> Keeping BQL held forever in a loop for EXCP_HLT case isnt the right thing to do.

IMHO that loop is a symptom of a deeper architectural issue. The fix proposed here feels more like an incorrect hack.
Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in kvm_cpu_exec
Posted by Harsh Prateek Bora 1 day, 5 hours ago

On 10/04/26 3:35 pm, Ani Sinha wrote:
> 
> 
>> On 10 Apr 2026, at 3:32 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>>
>>
>>
>> On 10/04/26 3:01 pm, Ani Sinha wrote:
>>>> On 10 Apr 2026, at 2:31 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>>>>
>>>>
>>>>
>>>> On 10/04/26 1:59 pm, Ani Sinha wrote:
>>>>>> On 10 Apr 2026, at 1:48 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 10/04/26 12:05 pm, Ani Sinha wrote:
>>>>>>>> On 10 Apr 2026, at 10:55 AM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>>>>>>>>
>>>>>>>> Hi Ani,
>>>>>>>>
>>>>>>>> On 10/04/26 9:12 am, Ani Sinha wrote:
>>>>>>>>>> On 9 Apr 2026, at 9:40 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>>>>>>>>>>
>>>>>>>>>> When kvm_cpu_exec() returns EXCP_HLT due to kvm_arch_process_async_events()
>>>>>>>>>> returning true, it was returning before releasing the BQL (Big QEMU Lock).
>>>>>>>>>> This caused a lock imbalance where the vCPU thread would loop back to
>>>>>>>>>> kvm_cpu_exec() while still holding the BQL, leading to deadlocks.
>>>>>>>>> I am not sure I understand this. Seems kvm_cpu_exec() does expect that the caller holds bql before calling the function. Where is the lock imbalance?
>>>>>>>>
>>>>>>>> The issue is not that kvm_cpu_exec() doesn't expect the caller to hold the BQL - it does. The problem is that kvm_cpu_exec() has inconsistent BQL handling across its return paths.
>>>>>>>>
>>>>>>>> Normal execution path:
>>>>>>>>
>>>>>>>> int kvm_cpu_exec(CPUState *cpu)
>>>>>>>> {
>>>>>>>>     // BQL held on entry (from caller)
>>>>>>>>
>>>>>>>>     if (kvm_arch_process_async_events(cpu)) {
>>>>>>>>         return EXCP_HLT;  // ← Returns with BQL STILL HELD
>>>>>>>>     }
>>>>>>>>
>>>>>>>>     bql_unlock();  // ← Normal path unlocks here
>>>>>>>>     // ... KVM execution loop ...
>>>>>>>>     bql_lock();    // ← Re-acquires before returning
>>>>>>>>     return ret;
>>>>>>>> }
>>>>>>> Yes the semantics of the function kvm_cpu_exec() is that it should always return with bql in locked state. This is because the caller kvm_vcpu_thread_fn() calls this function with bql locked and if you see the end of kvm_vcpu_thread_fn(), it releases the lock.
>>>>>>> So if kvm_cpu_exec() unlocks bql internally, it has the responsibility to lock it again before returning. This makes the locking and unlocking symmetric.
>>>>>>>>
>>>>>>>> The lock imbalance:
>>>>>>>>
>>>>>>>> When kvm_arch_process_async_events() returns true, the function returns EXCP_HLT before the bql_unlock() call.
>>>>>>> Why should it unlock it before returning? In fact it’s opposite. If the function had unlocked bql, it should lock it again before returning.
>>>>>>>> This means the early return path keeps the BQL held,
>>>>>>> This would be the correct thing to do.
>>>>>>>> while the normal execution path releases and re-acquires it.
>>>>>>> Because the functions it calls after unlocking requires bql to be unlocked. Since it had to unlock it, it locks it again before returning.
>>>>>>
>>>>>> It had to unlock it for the same reason - to give others a chance to lock. We need to handle failure/exception cases for the same purpose as well.
>>>>> But by unlocking and returning you are breaking the semantics of the function and introducing imbalance.
>>>>
>>>> I think it is better to unlock bql early in failure cases.
>>>> Even Otherwise, it would become a bql_unlock followed by a bql_lock in the caller for EXCP_HLT, which might look a bit odd as well.
>>> But you are doing exactly that across a function call.
>>
>> I am fine with either/or maintainer's choice.
>>
>>>>
>>>> Paolo, suggestions?
>>>>
>>>>>>
>>>>>>>> The caller (kvm_vcpu_thread_fn()) loops back and calls kvm_cpu_exec() again, but now the BQL is already held from the previous iteration
>>>>>>>> This creates a situation where the BQL is never released between iterations when EXCP_HLT is returned.
>>>>>>>>
>>>>>>>> Why this matters:
>>>>>>>> On PowerPC pseries with halted secondary vCPUs (start-powered-off=true), these vCPUs repeatedly call kvm_cpu_exec() which returns EXCP_HLT. Each iteration accumulates BQL holds, preventing other threads (including CPU 0) from making progress.
>>>>>>> This seems like some kind of architectural issue with PowerPC. Shouldn’t qemu_process_cpu_events() -> qemu_cond_wait(cpu->halt_cond, &bql) block other secondary cpus? Then the main cpu does a qemu_cpu_kick() to make them active again at some point?
>>>>>>
>>>>>> KVM vCPUs need to enter the kernel to handle the halted state and therefore can run. On spapr, it is handled via start-cpu rtas call for which the handler in qemu does a qemu_cpu_kick(). However CPU 0 needs to be able to proceed before that stage is reached, but it hangs while trying to acquire bql_lock in qemu_default_main() whereas secondary vcpu is spinning with BQL held returning EXCP_HLT. This is causing deadlock.
>>>>> Without looking at the code, it seems there is a race condition in the way the vcpu threads are initialised in spapr. I think that needs fixing.
>>>>
>>>> I did explain the race condition observed above.
>>> So why not fix it properly?
>>
>> I think the suggested fix is appropriate for this scenario.
>> Keeping BQL held forever in a loop for EXCP_HLT case isnt the right thing to do.
> 
> IMHO that loop is a symptom of a deeper architectural issue. The fix proposed here feels more like a incorrect hack.

I do not think so. Irrespective of the issue, there's no reason to keep 
holding the BQL forever in a loop like that for the EXCP_HLT case.


> 


Re: [PATCH for 11.0-rc3] accel/kvm: Fix BQL lock imbalance in kvm_cpu_exec
Posted by Ani Sinha 1 day, 7 hours ago

> On 10 Apr 2026, at 12:05 PM, Ani Sinha <anisinha@redhat.com> wrote:
> 
> 
> 
>> On 10 Apr 2026, at 10:55 AM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>> 
>> Hi Ani,
>> 
>> On 10/04/26 9:12 am, Ani Sinha wrote:
>>>> On 9 Apr 2026, at 9:40 PM, Harsh Prateek Bora <harshpb@linux.ibm.com> wrote:
>>>> 
>>>> When kvm_cpu_exec() returns EXCP_HLT due to kvm_arch_process_async_events()
>>>> returning true, it was returning before releasing the BQL (Big QEMU Lock).
>>>> This caused a lock imbalance where the vCPU thread would loop back to
>>>> kvm_cpu_exec() while still holding the BQL, leading to deadlocks.
>>> I am not sure I understand this. Seems kvm_cpu_exec() does expect that the caller holds bql before calling the function. Where is the lock imbalance?
>> 
>> The issue is not that kvm_cpu_exec() doesn't expect the caller to hold the BQL - it does. The problem is that kvm_cpu_exec() has inconsistent BQL handling across its return paths.
>> 
>> Normal execution path:
>> 
>> int kvm_cpu_exec(CPUState *cpu)
>> {
>>   // BQL held on entry (from caller)
>> 
>>   if (kvm_arch_process_async_events(cpu)) {
>>       return EXCP_HLT;  // ← Returns with BQL STILL HELD
>>   }
>> 
>>   bql_unlock();  // ← Normal path unlocks here
>>   // ... KVM execution loop ...
>>   bql_lock();    // ← Re-acquires before returning
>>   return ret;
>> }
> 
> Yes the semantics of the function kvm_cpu_exec() is that it should always return with bql in locked state. This is because the caller kvm_vcpu_thread_fn() calls this function with bql locked and if you see the end of kvm_vcpu_thread_fn(), it releases the lock.
> 
> So if kvm_cpu_exec() unlocks bql internally, it has the responsibility to lock it again before returning. This makes the locking and unlocking symmetric.
> 
> 
>> 
>> The lock imbalance:
>> 
>> When kvm_arch_process_async_events() returns true, the function returns EXCP_HLT before the bql_unlock() call.
> 
> Why should it unlock it before returning? In fact it’s opposite. If the function had unlocked bql, it should lock it again before returning.
> 
>> This means the early return path keeps the BQL held,
> 
> This would be the correct thing to do.
> 
>> while the normal execution path releases and re-acquires it.
> 
> Because the functions it calls after unlocking requires bql to be unlocked.

Another likely reason is that we cannot have large atomic regions for performance reasons. So a general principle is to acquire the lock narrowly, only around the specific regions of code that require atomicity.