[v1] x86/paravirt: add backoff mechanism to virt_spin_lock

[PATCH RESEND^2] x86/paravirt: add backoff mechanism to virt_spin_lock

Posted by Wangyang Guo 1 month, 3 weeks ago

When multiple threads waiting for lock at the same time, once lock owner
releases the lock, waiters will see lock available and all try to lock,
which may cause an expensive CAS storm.

Binary exponential backoff is introduced. As try-lock attempt increases,
there is more likely that a larger number threads compete for the same
lock, so increase wait time in exponential.

The optimization can improves SpecCPU2017 502.gcc_r benchmark by ~4% for
288 cores VM on Intel Xeon 6 E-cores platform.

For micro benchmark, the patch can have significant performance gain
in high contention case. Slight regression is found in some of mid-
conetented cases because the last waiter might take longer to check
unlocked. No changes to low contented scenario as expected.

Micro Bench: https://github.com/guowangy/kernel-lock-bench
Test Platform: Xeon 8380L
First Row: critical section length
First Col: CPU core number
Values: backoff vs linux-6.15, throughput based, higher is better

non-critical-length: 1
       0     1     2     4     8    16    32    64   128
1   1.01  1.00  1.00  1.00  1.01  1.01  1.01  1.01  1.00
2   1.02  1.01  1.02  0.97  1.02  1.05  1.01  1.00  1.01
4   1.15  1.20  1.14  1.11  1.34  1.26  0.99  0.93  0.98
8   1.59  1.71  1.18  1.80  1.95  1.45  1.05  0.99  1.17
16  1.04  1.37  1.08  1.31  1.85  1.50  1.24  0.99  1.24
32  1.24  1.36  1.23  1.40  1.50  1.86  1.45  1.18  1.48
64  1.12  1.24  1.11  1.31  1.34  1.37  2.01  1.60  1.43

non-critical-length: 32
       0     1     2     4     8    16    32    64   128
1   1.00  1.00  1.00  1.00  1.00  0.99  1.00  1.00  1.01
2   1.00  1.00  1.00  1.00  1.00  1.00  1.00  0.99  1.00
4   1.12  1.25  1.09  1.07  1.12  1.16  1.13  1.16  1.09
8   1.02  1.16  1.03  1.02  1.04  1.07  1.04  0.99  0.98
16  0.97  0.95  0.84  0.96  0.99  0.98  0.98  1.01  1.03
32  1.05  1.03  0.87  1.05  1.25  1.16  1.25  1.30  1.27
64  1.83  1.10  1.07  1.02  1.19  1.18  1.21  1.14  1.13

non-critical-length: 128
       0     1     2     4     8    16    32    64   128
1   1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
2   0.99  1.02  1.00  1.00  1.00  1.00  1.00  1.00  1.00
4   0.98  0.99  1.00  1.00  0.99  1.04  0.99  0.99  1.02
8   1.08  1.08  1.08  1.07  1.15  1.12  1.03  0.94  1.00
16  1.00  1.00  1.00  1.01  1.01  1.01  1.36  1.06  1.02
32  1.07  1.08  1.07  1.07  1.09  1.10  1.22  1.36  1.25
64  1.03  1.04  1.04  1.06  1.13  1.18  0.82  1.02  1.14

Reviewed-by: Tianyou Li <tianyou.li@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Wangyang Guo <wangyang.guo@intel.com>
---
 arch/x86/include/asm/qspinlock.h | 28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
index 68da67df304d..ac6e1bbd9ba4 100644
--- a/arch/x86/include/asm/qspinlock.h
+++ b/arch/x86/include/asm/qspinlock.h
@@ -87,7 +87,7 @@ DECLARE_STATIC_KEY_FALSE(virt_spin_lock_key);
 #define virt_spin_lock virt_spin_lock
 static inline bool virt_spin_lock(struct qspinlock *lock)
 {
-	int val;
+	int val, locked;
 
 	if (!static_branch_likely(&virt_spin_lock_key))
 		return false;
@@ -98,11 +98,33 @@ static inline bool virt_spin_lock(struct qspinlock *lock)
 	 * horrible lock 'holder' preemption issues.
 	 */
 
+#define MAX_BACKOFF 64
+	int backoff = 1;
+
  __retry:
 	val = atomic_read(&lock->val);
+	locked = val;
+
+	if (locked || !atomic_try_cmpxchg(&lock->val, &val, _Q_LOCKED_VAL)) {
+		int spin_count = backoff;
+
+		while (spin_count--)
+			cpu_relax();
+
+		/*
+		 * Here not locked means lock tried, but fails.
+		 *
+		 * When multiple threads waiting for lock at the same time,
+		 * once lock owner releases the lock, waiters will see lock available
+		 * and all try to lock, which may cause an expensive CAS storm.
+		 *
+		 * Binary exponential backoff is introduced. As try-lock attempt
+		 * increases, there is more likely that a larger number threads
+		 * compete for the same lock, so increase wait time in exponential.
+		 */
+		if (!locked)
+			backoff = (backoff < MAX_BACKOFF) ? backoff << 1 : backoff;
 
-	if (val || !atomic_try_cmpxchg(&lock->val, &val, _Q_LOCKED_VAL)) {
-		cpu_relax();
 		goto __retry;
 	}
 
-- 
2.43.5

Re: [PATCH RESEND^2] x86/paravirt: add backoff mechanism to virt_spin_lock

Posted by Peter Zijlstra 1 month, 3 weeks ago

On Wed, Aug 13, 2025 at 08:50:43AM +0800, Wangyang Guo wrote:
> When multiple threads waiting for lock at the same time, once lock owner
> releases the lock, waiters will see lock available and all try to lock,
> which may cause an expensive CAS storm.
> 
> Binary exponential backoff is introduced. As try-lock attempt increases,
> there is more likely that a larger number threads compete for the same
> lock, so increase wait time in exponential.

You shouldn't be using virt_spin_lock() to begin with. That means you've
misconfigured your guest.

We have paravirt spinlocks for a reason.

RE: [PATCH RESEND^2] x86/paravirt: add backoff mechanism to virt_spin_lock

Posted by Guo, Wangyang 1 month, 3 weeks ago

On 8/13/2025 10:33 PM, Peter Zijlstra wrote:
> On Wed, Aug 13, 2025 at 08:50:43AM +0800, Wangyang Guo wrote:
>> When multiple threads waiting for lock at the same time, once lock owner
>> releases the lock, waiters will see lock available and all try to lock,
>> which may cause an expensive CAS storm.
>>
>> Binary exponential backoff is introduced. As try-lock attempt increases,
>> there is more likely that a larger number threads compete for the same
>> lock, so increase wait time in exponential.
> 
> You shouldn't be using virt_spin_lock() to begin with. That means you've
> misconfigured your guest.
> 
> We have paravirt spinlocks for a reason.

We have tried PARAVIRT_SPINLOCKS, it can help to reduce the contention cycles, but the throughput is not good. I think there are two factors:

1. the VM is not overcommit, each thread has its CPU resources to doing spin wait.
2. the critical section is very short; spin wait is faster than pv_kick.

BR
Wangyang

Re: [PATCH RESEND^2] x86/paravirt: add backoff mechanism to virt_spin_lock

Posted by Peter Zijlstra 1 month, 3 weeks ago

On Thu, Aug 14, 2025 at 01:26:59AM +0000, Guo, Wangyang wrote:
> On 8/13/2025 10:33 PM, Peter Zijlstra wrote:
> > On Wed, Aug 13, 2025 at 08:50:43AM +0800, Wangyang Guo wrote:
> >> When multiple threads waiting for lock at the same time, once lock owner
> >> releases the lock, waiters will see lock available and all try to lock,
> >> which may cause an expensive CAS storm.
> >>
> >> Binary exponential backoff is introduced. As try-lock attempt increases,
> >> there is more likely that a larger number threads compete for the same
> >> lock, so increase wait time in exponential.
> > 
> > You shouldn't be using virt_spin_lock() to begin with. That means you've
> > misconfigured your guest.
> > 
> > We have paravirt spinlocks for a reason.
> 
> We have tried PARAVIRT_SPINLOCKS, it can help to reduce the contention cycles, but the throughput is not good. I think there are two factors:
> 
> 1. the VM is not overcommit, each thread has its CPU resources to doing spin wait.

In the non-overcommit, physically pinned case, there is a knob to use
native spinlocks.

RE: [????] RE: [PATCH RESEND^2] x86/paravirt: add backoff mechanism to virt_spin_lock

Posted by Li,Rongqing 1 month, 3 weeks ago

> On 8/13/2025 10:33 PM, Peter Zijlstra wrote:
> > On Wed, Aug 13, 2025 at 08:50:43AM +0800, Wangyang Guo wrote:
> >> When multiple threads waiting for lock at the same time, once lock
> >> owner releases the lock, waiters will see lock available and all try
> >> to lock, which may cause an expensive CAS storm.
> >>
> >> Binary exponential backoff is introduced. As try-lock attempt
> >> increases, there is more likely that a larger number threads compete
> >> for the same lock, so increase wait time in exponential.
> >
> > You shouldn't be using virt_spin_lock() to begin with. That means
> > you've misconfigured your guest.
> >
> > We have paravirt spinlocks for a reason.
> 
> We have tried PARAVIRT_SPINLOCKS, it can help to reduce the contention cycles,
> but the throughput is not good. I think there are two factors:
> 
> 1. the VM is not overcommit, each thread has its CPU resources to doing spin
> wait.

If vm is not overcommit, guest should have KVM_HINTS_REALTIME, I think native qspinlock should be better
Could you try test this patch
https://patchwork.kernel.org/project/kvm/patch/20250722110005.4988-1-lirongqing@baidu.com/


Furthermore, I think the virt_spin_lock needs to be optimized.

Br
-Li

> 2. the critical section is very short; spin wait is faster than pv_kick.
> 
> BR
> Wangyang

Re: [????] RE: [PATCH RESEND^2] x86/paravirt: add backoff mechanism to virt_spin_lock

Posted by Peter Zijlstra 1 month, 3 weeks ago

On Thu, Aug 14, 2025 at 03:10:46AM +0000, Li,Rongqing wrote:
> > On 8/13/2025 10:33 PM, Peter Zijlstra wrote:
> > > On Wed, Aug 13, 2025 at 08:50:43AM +0800, Wangyang Guo wrote:
> > >> When multiple threads waiting for lock at the same time, once lock
> > >> owner releases the lock, waiters will see lock available and all try
> > >> to lock, which may cause an expensive CAS storm.
> > >>
> > >> Binary exponential backoff is introduced. As try-lock attempt
> > >> increases, there is more likely that a larger number threads compete
> > >> for the same lock, so increase wait time in exponential.
> > >
> > > You shouldn't be using virt_spin_lock() to begin with. That means
> > > you've misconfigured your guest.
> > >
> > > We have paravirt spinlocks for a reason.
> > 
> > We have tried PARAVIRT_SPINLOCKS, it can help to reduce the contention cycles,
> > but the throughput is not good. I think there are two factors:
> > 
> > 1. the VM is not overcommit, each thread has its CPU resources to doing spin
> > wait.
> 
> If vm is not overcommit, guest should have KVM_HINTS_REALTIME, I think native qspinlock should be better
> Could you try test this patch
> https://patchwork.kernel.org/project/kvm/patch/20250722110005.4988-1-lirongqing@baidu.com/

Right, that's the knob.

> Furthermore, I think the virt_spin_lock needs to be optimized.

Why would virt_spin_lock() need to be optimized? It is the fallback
case; but it is terrible in all possible ways.

Re: [????] RE: [PATCH RESEND^2] x86/paravirt: add backoff mechanism to virt_spin_lock

Posted by Guo, Wangyang 1 month, 2 weeks ago

On 8/14/2025 6:41 PM, Peter Zijlstra wrote:
> On Thu, Aug 14, 2025 at 03:10:46AM +0000, Li,Rongqing wrote:
>>> On 8/13/2025 10:33 PM, Peter Zijlstra wrote:
>>>> On Wed, Aug 13, 2025 at 08:50:43AM +0800, Wangyang Guo wrote:
>>>>> Binary exponential backoff is introduced. As try-lock attempt
>>>>> increases, there is more likely that a larger number threads compete
>>>>> for the same lock, so increase wait time in exponential.
>>>>
>>>> You shouldn't be using virt_spin_lock() to begin with. That means
>>>> you've misconfigured your guest.
>>>>
>>>> We have paravirt spinlocks for a reason.
>>>
>>> We have tried PARAVIRT_SPINLOCKS, it can help to reduce the contention cycles,
>>> but the throughput is not good. I think there are two factors:
>>>
>>> 1. the VM is not overcommit, each thread has its CPU resources to doing spin
>>> wait.
>>
>> If vm is not overcommit, guest should have KVM_HINTS_REALTIME, I think native qspinlock should be better
>> Could you try test this patch
>> https://patchwork.kernel.org/project/kvm/patch/20250722110005.4988-1-lirongqing@baidu.com/
> 
> Right, that's the knob.

Yes, that works.
By providing qemu-kvm with `-cpu host,kvm-hint-dedicated` option, it can 
use mcs qspinlock and have better performance compared to virt_spin_lock().

But for non-overcommit VM, we may add `-overcommit cpu-pm=on` to let 
guest to handle idle by itself and reduce the guest latency. Current 
kernel will fallback to virt_spin_lock, even kvm-hint-dedicated is 
provided. With Rongqing's patch, it can fix this problem.

BR
Wangyang

Re: [PATCH RESEND^2] x86/paravirt: add backoff mechanism to virt_spin_lock

Posted by Dave Hansen 1 month, 3 weeks ago

On 8/12/25 17:50, Wangyang Guo wrote:
> The optimization can improves SpecCPU2017 502.gcc_r benchmark by ~4% for
> 288 cores VM on Intel Xeon 6 E-cores platform.

Which specific locks are getting contended?

RE: [PATCH RESEND^2] x86/paravirt: add backoff mechanism to virt_spin_lock

Posted by Guo, Wangyang 1 month, 3 weeks ago

On 8/13/2025 10:23 PM, Dave Hansen wrote:
> On 8/12/25 17:50, Wangyang Guo wrote:
>> The optimization can improves SpecCPU2017 502.gcc_r benchmark by ~4% for
>> 288 cores VM on Intel Xeon 6 E-cores platform.
> 
> Which specific locks are getting contended?

The majority comes from lruvec->lru_lock when doing release_pages.

BR
Wangyang