The kernel can change spinlock behavior when running as a guest. But
this guest-friendly behavior causes performance problems on bare metal.
So there's a 'virt_spin_lock_key' static key to switch between the two
modes.
In the current code, the static key is enabled by default when
running in guest mode. The key is disabled on bare metal (and in
some guests that want native behavior).
A large performance regression was reported when running an encode/decode
workload and the BenchSEE cache sub-workload on bare metal.
Bisect points to commit ce0a1b608bfc ("x86/paravirt: Silence unused
native_pv_lock_init() function warning"). When CONFIG_PARAVIRT_SPINLOCKS
is disabled, the virt_spin_lock_key is incorrectly set to true on bare
metal. The qspinlock degenerates to a test-and-set spinlock, which
decreases the performance on bare metal.
Set the default value of virt_spin_lock_key to false. If booting in
a VM, enable this key. Later, during VM initialization, if another
efficient spinlock is detected (e.g., the paravirt spinlock), the
virt_spin_lock_key is disabled. Following the description above, the
final effect is as follows:
X86_FEATURE_HYPERVISOR Y Y Y N
CONFIG_PARAVIRT_SPINLOCKS Y Y N Y/N
PV spinlock Y N N Y/N
virt_spin_lock_key N N Y N
To summarize, the virt_spin_lock_key is disabled on bare metal
regardless of any other condition, and the virt_spin_lock_key is also
disabled when another spinlock mechanism is detected in the VM guest.
Fixes: ce0a1b608bfc ("x86/paravirt: Silence unused native_pv_lock_init() function warning")
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Suggested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Suggested-by: Nikolay Borisov <nik.borisov@suse.com>
Reported-by: Prem Nath Dey <prem.nath.dey@intel.com>
Reported-by: Xiaoping Zhou <xiaoping.zhou@intel.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
v3->v4:
Refine the commit log.
Added Reviewed-by tag from Nikolay.
v2->v3:
Change the default value of virt_spin_lock_key from true to false.
Enable this key when it is in the VM, and disable it when needed.
This makes the code more readable. (Nikolay Borisov)
Dropped Reviewed-by because the code has been changed.
v1->v2:
Refine the commit log per Dave's suggestion.
Simplify the fix by directly disabling the virt_spin_lock_key on bare metal.
Collect Reviewed-by from Juergen.
---
arch/x86/include/asm/qspinlock.h | 4 ++--
arch/x86/kernel/paravirt.c | 7 +++----
2 files changed, 5 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
index a053c1293975..a32bd2aabdf9 100644
--- a/arch/x86/include/asm/qspinlock.h
+++ b/arch/x86/include/asm/qspinlock.h
@@ -66,13 +66,13 @@ static inline bool vcpu_is_preempted(long cpu)
#ifdef CONFIG_PARAVIRT
/*
- * virt_spin_lock_key - enables (by default) the virt_spin_lock() hijack.
+ * virt_spin_lock_key - disables (by default) the virt_spin_lock() hijack.
*
* Native (and PV wanting native due to vCPU pinning) should disable this key.
* It is done in this backwards fashion to only have a single direction change,
* which removes ordering between native_pv_spin_init() and HV setup.
*/
-DECLARE_STATIC_KEY_TRUE(virt_spin_lock_key);
+DECLARE_STATIC_KEY_FALSE(virt_spin_lock_key);
/*
* Shortcut for the queued_spin_lock_slowpath() function that allows
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 5358d43886ad..fec381533555 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -51,13 +51,12 @@ DEFINE_ASM_FUNC(pv_native_irq_enable, "sti", .noinstr.text);
DEFINE_ASM_FUNC(pv_native_read_cr2, "mov %cr2, %rax", .noinstr.text);
#endif
-DEFINE_STATIC_KEY_TRUE(virt_spin_lock_key);
+DEFINE_STATIC_KEY_FALSE(virt_spin_lock_key);
void __init native_pv_lock_init(void)
{
- if (IS_ENABLED(CONFIG_PARAVIRT_SPINLOCKS) &&
- !boot_cpu_has(X86_FEATURE_HYPERVISOR))
- static_branch_disable(&virt_spin_lock_key);
+ if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
+ static_branch_enable(&virt_spin_lock_key);
}
static void native_tlb_remove_table(struct mmu_gather *tlb, void *table)
--
2.25.1
Chenyu,
I do not know much about x86; just giving some comments (possibly
incorrect) based on the code.
On 2024/7/29 2:52 PM, Chen Yu wrote:
> The kernel can change spinlock behavior when running as a guest. But
> this guest-friendly behavior causes performance problems on bare metal.
> So there's a 'virt_spin_lock_key' static key to switch between the two
> modes.
>
> In current code, the static key is always enabled by default when
> running in guest mode. The key is disabled for bare metal (and in
> some guests that want native behavior).
>
> Large performance regression is reported when running encode/decode
> workload and BenchSEE cache sub-workload on the bare metal.
> Bisect points to commit ce0a1b608bfc ("x86/paravirt: Silence unused
> native_pv_lock_init() function warning"). When CONFIG_PARAVIRT_SPINLOCKS
> is disabled, the virt_spin_lock_key is incorrectly set to true on bare
> metal. The qspinlock degenerates to test-and-set spinlock, which
> decrease the performance on bare metal.
>
> Set the default value of virt_spin_lock_key to false. If booting in
> a VM, enable this key. Later during the VM initialization, if other
> high-efficient spinlock is detected(paravirt-spinlock eg), the
> virt_spin_lock_key is disabled. According to the description above,
> the final effect will be as followed:
>
> X86_FEATURE_HYPERVISOR Y Y Y N
> CONFIG_PARAVIRT_SPINLOCKS Y Y N Y/N
> PV spinlock Y N N Y/N
>
> virt_spin_lock_key N N Y N
>
> To summarize, the virt_spin_lock_key is disabled on the bare metal
> no matter what other condidtion is. And the virt_spin_lock_key is
> also disabled when other spinlock mechanism is detected in the VM
> guest.
>
> Fixes: ce0a1b608bfc ("x86/paravirt: Silence unused native_pv_lock_init() function warning")
> Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> Suggested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> Suggested-by: Nikolay Borisov <nik.borisov@suse.com>
> Reported-by: Prem Nath Dey <prem.nath.dey@intel.com>
> Reported-by: Xiaoping Zhou <xiaoping.zhou@intel.com>
> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
> Signed-off-by: Chen Yu <yu.c.chen@intel.com>
> ---
> v3->v4:
> Refine the commit log.
> Added Reviewed-by tag from Nikolay.
> v2->v3:
> Change the default value of virt_spin_lock_key from true to false.
> Enable this key when it is in the VM, and disable it when needed.
> This makes the code more readable. (Nikolay Borisov)
> Dropped Reviewed-by because the code has been changed.
> v1->v2:
> Refine the commit log per Dave's suggestion.
> Simplify the fix by directly disabling the virt_spin_lock_key on bare metal.
> Collect Reviewed-by from Juergen.
> ---
> arch/x86/include/asm/qspinlock.h | 4 ++--
> arch/x86/kernel/paravirt.c | 7 +++----
> 2 files changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
> index a053c1293975..a32bd2aabdf9 100644
> --- a/arch/x86/include/asm/qspinlock.h
> +++ b/arch/x86/include/asm/qspinlock.h
> @@ -66,13 +66,13 @@ static inline bool vcpu_is_preempted(long cpu)
>
> #ifdef CONFIG_PARAVIRT
> /*
> - * virt_spin_lock_key - enables (by default) the virt_spin_lock() hijack.
> + * virt_spin_lock_key - disables (by default) the virt_spin_lock() hijack.
> *
> * Native (and PV wanting native due to vCPU pinning) should disable this key.
> * It is done in this backwards fashion to only have a single direction change,
> * which removes ordering between native_pv_spin_init() and HV setup.
> */
> -DECLARE_STATIC_KEY_TRUE(virt_spin_lock_key);
> +DECLARE_STATIC_KEY_FALSE(virt_spin_lock_key);
@@ -87,7 +87,7 @@ static inline bool virt_spin_lock(struct qspinlock *lock)
{
int val;
- if (!static_branch_likely(&virt_spin_lock_key))
+ if (!static_branch_unlikely(&virt_spin_lock_key))
return false;
Do we need to change it to static_branch_unlikely(), given that the
value of the key is false by default?
>
> /*
> * Shortcut for the queued_spin_lock_slowpath() function that allows
> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
> index 5358d43886ad..fec381533555 100644
> --- a/arch/x86/kernel/paravirt.c
> +++ b/arch/x86/kernel/paravirt.c
> @@ -51,13 +51,12 @@ DEFINE_ASM_FUNC(pv_native_irq_enable, "sti", .noinstr.text);
> DEFINE_ASM_FUNC(pv_native_read_cr2, "mov %cr2, %rax", .noinstr.text);
> #endif
>
> -DEFINE_STATIC_KEY_TRUE(virt_spin_lock_key);
> +DEFINE_STATIC_KEY_FALSE(virt_spin_lock_key);
>
> void __init native_pv_lock_init(void)
> {
> - if (IS_ENABLED(CONFIG_PARAVIRT_SPINLOCKS) &&
> - !boot_cpu_has(X86_FEATURE_HYPERVISOR))
> - static_branch_disable(&virt_spin_lock_key);
> + if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
> + static_branch_enable(&virt_spin_lock_key);
> }
From my point of view, the call static_branch_disable(&virt_spin_lock_key)
can be removed from arch/x86/xen/spinlock.c and
arch/x86/kernel/kvm.c, since native_smp_prepare_boot_cpu() is already
called by xen_smp_prepare_boot_cpu() and kvm_smp_prepare_boot_cpu().
Regards
Bibo Mao
>
> static void native_tlb_remove_table(struct mmu_gather *tlb, void *table)
>
Hi Bibo,
On 2024-08-01 at 16:00:19 +0800, maobibo wrote:
> Chenyu,
>
> I do not know much about x86, just give some comments(probably incorrected)
> from the code.
>
> On 2024/7/29 2:52 PM, Chen Yu wrote:
> > X86_FEATURE_HYPERVISOR Y Y Y N
> > CONFIG_PARAVIRT_SPINLOCKS Y Y N Y/N
> > PV spinlock Y N N Y/N
> >
> > virt_spin_lock_key N N Y N
> >
> > -DECLARE_STATIC_KEY_TRUE(virt_spin_lock_key);
> > +DECLARE_STATIC_KEY_FALSE(virt_spin_lock_key);
>
> @@ -87,7 +87,7 @@ static inline bool virt_spin_lock(struct qspinlock *lock)
> {
> int val;
>
> - if (!static_branch_likely(&virt_spin_lock_key))
> + if (!static_branch_unlikely(&virt_spin_lock_key))
> return false;
>
> Do we need change it with static_branch_unlikely() if value of key is false
> by fault?
My understanding is that, first, whether it is likely() or unlikely()
depends on the 'expected' value of the key, rather than its default
initialized value. The compiler can arrange the branch layout to
avoid the overhead of a taken jump (to keep the instruction pipeline
full) as much as possible. Second, before this patch, the 'expected'
value of virt_spin_lock_key was 'true', so I'm not sure we should
change its behavior. Although it seems that in most VM guests, with
the paravirt spinlock enabled, this key should be false most of the
time, but just in case of any regression...
> > /*
> > * Shortcut for the queued_spin_lock_slowpath() function that allows
> > diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
> > index 5358d43886ad..fec381533555 100644
> > --- a/arch/x86/kernel/paravirt.c
> > +++ b/arch/x86/kernel/paravirt.c
> > @@ -51,13 +51,12 @@ DEFINE_ASM_FUNC(pv_native_irq_enable, "sti", .noinstr.text);
> > DEFINE_ASM_FUNC(pv_native_read_cr2, "mov %cr2, %rax", .noinstr.text);
> > #endif
> > -DEFINE_STATIC_KEY_TRUE(virt_spin_lock_key);
> > +DEFINE_STATIC_KEY_FALSE(virt_spin_lock_key);
> > void __init native_pv_lock_init(void)
> > {
> > - if (IS_ENABLED(CONFIG_PARAVIRT_SPINLOCKS) &&
> > - !boot_cpu_has(X86_FEATURE_HYPERVISOR))
> > - static_branch_disable(&virt_spin_lock_key);
> > + if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
> > + static_branch_enable(&virt_spin_lock_key);
> > }
>
> From my point, the sentence static_branch_disable(&virt_spin_lock_key) can
> be removed in file arch/x86/xen/spinlock.c and arch/x86/kernel/kvm.c, since
> function native_smp_prepare_boot_cpu() is already called by
> xen_smp_prepare_boot_cpu() and kvm_smp_prepare_boot_cpu().
>
The key is enabled by native_smp_prepare_boot_cpu() for a VM guest as
the initial value (defaulting to true). Later, either arch/x86/xen/spinlock.c
or arch/x86/kernel/kvm.c disables this key in an on-demand manner.
thanks,
Chenyu
Hi Chenyu,
On 2024/8/1 10:40 PM, Chen Yu wrote:
> Hi Bibo,
>
> On 2024-08-01 at 16:00:19 +0800, maobibo wrote:
>> Chenyu,
>>
>> I do not know much about x86, just give some comments(probably incorrected)
>> from the code.
>>
>> On 2024/7/29 2:52 PM, Chen Yu wrote:
>>> X86_FEATURE_HYPERVISOR Y Y Y N
>>> CONFIG_PARAVIRT_SPINLOCKS Y Y N Y/N
>>> PV spinlock Y N N Y/N
>>>
>>> virt_spin_lock_key N N Y N
>>>
>>> -DECLARE_STATIC_KEY_TRUE(virt_spin_lock_key);
>>> +DECLARE_STATIC_KEY_FALSE(virt_spin_lock_key);
>>
>> @@ -87,7 +87,7 @@ static inline bool virt_spin_lock(struct qspinlock *lock)
>> {
>> int val;
>>
>> - if (!static_branch_likely(&virt_spin_lock_key))
>> + if (!static_branch_unlikely(&virt_spin_lock_key))
>> return false;
>>
>> Do we need change it with static_branch_unlikely() if value of key is false
>> by fault?
>
> My understanding is that, firstly, whether it is likely() or unlikely()
> depends on the 'expected' value of the key, rather than its default
> initialized value. The compiler can arrange the if 'jump' condition to
> avoid the overhead of branch jump(to keep the instruction pipeline)
> as much as possible. Secondly, before this patch, the 'expected' value
> of virt_spin_lock_key is 'true', so I'm not sure if we should change
> its behavior. Although it seems that in most VM guest, with para-virt
> spinlock enabled, this key should be false at most time, but just in
> case of any regression...
Yes, it does not affect the result; it is a trivial thing and depends
on personal preference.
>
>>> /*
>>> * Shortcut for the queued_spin_lock_slowpath() function that allows
>>> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
>>> index 5358d43886ad..fec381533555 100644
>>> --- a/arch/x86/kernel/paravirt.c
>>> +++ b/arch/x86/kernel/paravirt.c
>>> @@ -51,13 +51,12 @@ DEFINE_ASM_FUNC(pv_native_irq_enable, "sti", .noinstr.text);
>>> DEFINE_ASM_FUNC(pv_native_read_cr2, "mov %cr2, %rax", .noinstr.text);
>>> #endif
>>> -DEFINE_STATIC_KEY_TRUE(virt_spin_lock_key);
>>> +DEFINE_STATIC_KEY_FALSE(virt_spin_lock_key);
>>> void __init native_pv_lock_init(void)
>>> {
>>> - if (IS_ENABLED(CONFIG_PARAVIRT_SPINLOCKS) &&
>>> - !boot_cpu_has(X86_FEATURE_HYPERVISOR))
>>> - static_branch_disable(&virt_spin_lock_key);
>>> + if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
>>> + static_branch_enable(&virt_spin_lock_key);
>>> }
>>
>> From my point, the sentence static_branch_disable(&virt_spin_lock_key) can
>> be removed in file arch/x86/xen/spinlock.c and arch/x86/kernel/kvm.c, since
>> function native_smp_prepare_boot_cpu() is already called by
>> xen_smp_prepare_boot_cpu() and kvm_smp_prepare_boot_cpu().
>>
>
> The key is enabled by native_smp_prepare_boot_cpu() for VM guest as
> the initial value(default to true). And later either arch/x86/xen/spinlock.c
> or arch/x86/kernel/kvm.c disable this key in a on-demand manner.
I understand that you only care about the host machine and do not want to
change the behavior of the VM. However, from the point of view of the VM,
there are two conditions:
1. If CONFIG_PARAVIRT_SPINLOCKS is disabled, virt_spin_lock_key
is disabled with your patch. The VM will use the test-and-set spinlock
rather than qspinlock to avoid the over-preemption of the native
qspinlock, just the same as commit 2aa79af64263. And it is the same for
all hypervisor types.
2. If CONFIG_PARAVIRT_SPINLOCKS is enabled, the pv spinlock may not be
usable for various reasons, such as the host hypervisor lacking the
KVM_FEATURE_PV_UNHALT feature or the nopvspin kernel parameter being
added. The behavior of using the test-and-set spinlock or the native
qspinlock differs across hypervisor types.
Even on the KVM hypervisor, if KVM_FEATURE_PV_UNHALT is not supported,
the test-and-set spinlock will be used in the VM; with the nopvspin
kernel parameter, the native spinlock is used in the VM. What is the
rule for this? :)
Regards
Bibo Mao
>
> thanks,
> Chenyu
>
On 2024-08-02 at 09:27:32 +0800, maobibo wrote:
> Hi Chenyu,
>
> On 2024/8/1 10:40 PM, Chen Yu wrote:
> > Hi Bibo,
> >
> > On 2024-08-01 at 16:00:19 +0800, maobibo wrote:
> > > Chenyu,
> > >
> > > I do not know much about x86, just give some comments(probably incorrected)
> > > from the code.
> > >
> > > On 2024/7/29 2:52 PM, Chen Yu wrote:
> > > > X86_FEATURE_HYPERVISOR Y Y Y N
> > > > CONFIG_PARAVIRT_SPINLOCKS Y Y N Y/N
> > > > PV spinlock Y N N Y/N
> > > >
> > > > virt_spin_lock_key N N Y N
> > > >
> > > > -DECLARE_STATIC_KEY_TRUE(virt_spin_lock_key);
> > > > +DECLARE_STATIC_KEY_FALSE(virt_spin_lock_key);
> > >
> > > @@ -87,7 +87,7 @@ static inline bool virt_spin_lock(struct qspinlock *lock)
> > > {
> > > int val;
> > >
> > > - if (!static_branch_likely(&virt_spin_lock_key))
> > > + if (!static_branch_unlikely(&virt_spin_lock_key))
> > > return false;
> > >
> > > Do we need change it with static_branch_unlikely() if value of key is false
> > > by fault?
> >
> > My understanding is that, firstly, whether it is likely() or unlikely()
> > depends on the 'expected' value of the key, rather than its default
> > initialized value. The compiler can arrange the if 'jump' condition to
> > avoid the overhead of branch jump(to keep the instruction pipeline)
> > as much as possible. Secondly, before this patch, the 'expected' value
> > of virt_spin_lock_key is 'true', so I'm not sure if we should change
> > its behavior. Although it seems that in most VM guest, with para-virt
> > spinlock enabled, this key should be false at most time, but just in
> > case of any regression...
> yes, it does not inflect the result, it is a trivial thing and depends on
> personal like or dislike.
>
> >
> > > > /*
> > > > * Shortcut for the queued_spin_lock_slowpath() function that allows
> > > > diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
> > > > index 5358d43886ad..fec381533555 100644
> > > > --- a/arch/x86/kernel/paravirt.c
> > > > +++ b/arch/x86/kernel/paravirt.c
> > > > @@ -51,13 +51,12 @@ DEFINE_ASM_FUNC(pv_native_irq_enable, "sti", .noinstr.text);
> > > > DEFINE_ASM_FUNC(pv_native_read_cr2, "mov %cr2, %rax", .noinstr.text);
> > > > #endif
> > > > -DEFINE_STATIC_KEY_TRUE(virt_spin_lock_key);
> > > > +DEFINE_STATIC_KEY_FALSE(virt_spin_lock_key);
> > > > void __init native_pv_lock_init(void)
> > > > {
> > > > - if (IS_ENABLED(CONFIG_PARAVIRT_SPINLOCKS) &&
> > > > - !boot_cpu_has(X86_FEATURE_HYPERVISOR))
> > > > - static_branch_disable(&virt_spin_lock_key);
> > > > + if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
> > > > + static_branch_enable(&virt_spin_lock_key);
> > > > }
> > >
> > > From my point, the sentence static_branch_disable(&virt_spin_lock_key) can
> > > be removed in file arch/x86/xen/spinlock.c and arch/x86/kernel/kvm.c, since
> > > function native_smp_prepare_boot_cpu() is already called by
> > > xen_smp_prepare_boot_cpu() and kvm_smp_prepare_boot_cpu().
> > >
> >
> > The key is enabled by native_smp_prepare_boot_cpu() for VM guest as
> > the initial value(default to true). And later either arch/x86/xen/spinlock.c
> > or arch/x86/kernel/kvm.c disable this key in a on-demand manner.
> I understand that you only care about host machine and do not want to change
> behavior of VM. Only that from the view of VM, there are two conditions such
> as:
>
> 1. If option CONFIG_PARAVIRT_SPINLOCKS is disabled, virt_spin_lock_key is
> disabled with your patch. VM will use test-set spinlock rather than
> qspinlock to avoid the over-preemption of native qspinlock, just the same
> with commit 2aa79af64263. And it is the same for all the hypervisor types.
>
> 2. If option CONFIG_PARAVIRT_SPINLOCKS is enable, pv spinlock cannot be used
> because some reasons, such as host hypervisor has no KVM_FEATURE_PV_UNHALT
> feature or nopvspin kernel parameter is added. The behavior to use test-set
> spinlock or native qspinlock is different on different hypervisor types.
>
> Even on KVM hypervisor, if KVM_FEATURE_PV_UNHALT is not supported, test-set
> spinlock will be used on VM; For nopvspin kernel parameter, native spinlock
> is used on VM. What is the rule for this? :)
>
If CONFIG_PARAVIRT_SPINLOCKS is enabled, the test-and-set spinlock has
nothing to do with the lock path, because if pv_enabled() is true, it will
skip the test-and-set spinlock and go to the pv_queue section. If for some
reason the pv spinlock cannot be used because KVM_FEATURE_PV_UNHALT is not
supported, it will fall back to the default qspinlock without the
pv-qspinlock (no pv_wait hook because it is NULL).
thanks,
Chenyu
Hi Chenyu,
> On 2024/8/2 3:56 PM, Chen Yu wrote:
> On 2024-08-02 at 09:27:32 +0800, maobibo wrote:
>> Hi Chenyu,
>>
>> On 2024/8/1 10:40 PM, Chen Yu wrote:
>>> Hi Bibo,
>>>
>>> On 2024-08-01 at 16:00:19 +0800, maobibo wrote:
>>>> Chenyu,
>>>>
>>>> I do not know much about x86, just give some comments(probably incorrected)
>>>> from the code.
>>>>
>>>> On 2024/7/29 2:52 PM, Chen Yu wrote:
>>>>> X86_FEATURE_HYPERVISOR Y Y Y N
>>>>> CONFIG_PARAVIRT_SPINLOCKS Y Y N Y/N
>>>>> PV spinlock Y N N Y/N
>>>>>
>>>>> virt_spin_lock_key N N Y N
>>>>>
>>>>> -DECLARE_STATIC_KEY_TRUE(virt_spin_lock_key);
>>>>> +DECLARE_STATIC_KEY_FALSE(virt_spin_lock_key);
>>>>
>>>> @@ -87,7 +87,7 @@ static inline bool virt_spin_lock(struct qspinlock *lock)
>>>> {
>>>> int val;
>>>>
>>>> - if (!static_branch_likely(&virt_spin_lock_key))
>>>> + if (!static_branch_unlikely(&virt_spin_lock_key))
>>>> return false;
>>>>
>>>> Do we need change it with static_branch_unlikely() if value of key is false
>>>> by fault?
>>>
>>> My understanding is that, firstly, whether it is likely() or unlikely()
>>> depends on the 'expected' value of the key, rather than its default
>>> initialized value. The compiler can arrange the if 'jump' condition to
>>> avoid the overhead of branch jump(to keep the instruction pipeline)
>>> as much as possible. Secondly, before this patch, the 'expected' value
>>> of virt_spin_lock_key is 'true', so I'm not sure if we should change
>>> its behavior. Although it seems that in most VM guest, with para-virt
>>> spinlock enabled, this key should be false at most time, but just in
>>> case of any regression...
>> yes, it does not inflect the result, it is a trivial thing and depends on
>> personal like or dislike.
>>
>>>
>>>>> /*
>>>>> * Shortcut for the queued_spin_lock_slowpath() function that allows
>>>>> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
>>>>> index 5358d43886ad..fec381533555 100644
>>>>> --- a/arch/x86/kernel/paravirt.c
>>>>> +++ b/arch/x86/kernel/paravirt.c
>>>>> @@ -51,13 +51,12 @@ DEFINE_ASM_FUNC(pv_native_irq_enable, "sti", .noinstr.text);
>>>>> DEFINE_ASM_FUNC(pv_native_read_cr2, "mov %cr2, %rax", .noinstr.text);
>>>>> #endif
>>>>> -DEFINE_STATIC_KEY_TRUE(virt_spin_lock_key);
>>>>> +DEFINE_STATIC_KEY_FALSE(virt_spin_lock_key);
>>>>> void __init native_pv_lock_init(void)
>>>>> {
>>>>> - if (IS_ENABLED(CONFIG_PARAVIRT_SPINLOCKS) &&
>>>>> - !boot_cpu_has(X86_FEATURE_HYPERVISOR))
>>>>> - static_branch_disable(&virt_spin_lock_key);
>>>>> + if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
>>>>> + static_branch_enable(&virt_spin_lock_key);
>>>>> }
>>>>
>>>> From my point, the sentence static_branch_disable(&virt_spin_lock_key) can
>>>> be removed in file arch/x86/xen/spinlock.c and arch/x86/kernel/kvm.c, since
>>>> function native_smp_prepare_boot_cpu() is already called by
>>>> xen_smp_prepare_boot_cpu() and kvm_smp_prepare_boot_cpu().
>>>>
>>>
>>> The key is enabled by native_smp_prepare_boot_cpu() for VM guest as
>>> the initial value(default to true). And later either arch/x86/xen/spinlock.c
>>> or arch/x86/kernel/kvm.c disable this key in a on-demand manner.
>> I understand that you only care about host machine and do not want to change
>> behavior of VM. Only that from the view of VM, there are two conditions such
>> as:
>>
>> 1. If option CONFIG_PARAVIRT_SPINLOCKS is disabled, virt_spin_lock_key is
>> disabled with your patch. VM will use test-set spinlock rather than
>> qspinlock to avoid the over-preemption of native qspinlock, just the same
>> with commit 2aa79af64263. And it is the same for all the hypervisor types.
>>
>> 2. If option CONFIG_PARAVIRT_SPINLOCKS is enable, pv spinlock cannot be used
>> because some reasons, such as host hypervisor has no KVM_FEATURE_PV_UNHALT
>> feature or nopvspin kernel parameter is added. The behavior to use test-set
>> spinlock or native qspinlock is different on different hypervisor types.
>>
>> Even on KVM hypervisor, if KVM_FEATURE_PV_UNHALT is not supported, test-set
>> spinlock will be used on VM; For nopvspin kernel parameter, native spinlock
>> is used on VM. What is the rule for this? :)
>>
>
> If CONFIG_PARAVIRT_SPINLOCKS is enabled, the test-set spinlock has nothing to do
> with the lock path, because if pv_enabled() is true, it will skip the
IIRC, if CONFIG_PARAVIRT_SPINLOCKS is enabled, there are two qspinlock
paths: native_queued_spin_lock_slowpath() and
__pv_queued_spin_lock_slowpath(). pv_enabled() is false for the native
qspinlock path native_queued_spin_lock_slowpath(), so the test-and-set
spinlock can be used in native_queued_spin_lock_slowpath(). pv_enabled()
is true only for __pv_queued_spin_lock_slowpath().
> test-set spinlock and go to pv_queue section. If for some reason the pv spinlock
> can not be used because KVM_FEATURE_PV_UNHALT is not supported, it will fall into
> the default qpinlock without pv-qspinlock(no pv_wait hook because it is NULL).
Yes, if the pv spinlock cannot be used, native_queued_spin_lock_slowpath()
will be called for the spinlock slowpath; then there are the native
qspinlock and the test-and-set spinlock.
Regards
Bibo Mao
>
> thanks,
> Chenyu
>
On 2024-08-02 at 16:13:48 +0800, maobibo wrote:
>
> Hi Chenyu,
> On 2024/8/2 3:56 PM, Chen Yu wrote:
> > On 2024-08-02 at 09:27:32 +0800, maobibo wrote:
> > > Hi Chenyu,
> > >
> > On 2024/8/1 10:40 PM, Chen Yu wrote:
> > > > Hi Bibo,
> > > >
> > > > On 2024-08-01 at 16:00:19 +0800, maobibo wrote:
> > > > > Chenyu,
> > > > >
> > > > > I do not know much about x86, just give some comments(probably incorrected)
> > > > > from the code.
> > > > >
> > > > > On 2024/7/29 2:52 PM, Chen Yu wrote:
> > > > > > X86_FEATURE_HYPERVISOR Y Y Y N
> > > > > > CONFIG_PARAVIRT_SPINLOCKS Y Y N Y/N
> > > > > > PV spinlock Y N N Y/N
> > > > > >
> > > > > > virt_spin_lock_key N N Y N
> > > > > >
> > > > > > -DECLARE_STATIC_KEY_TRUE(virt_spin_lock_key);
> > > > > > +DECLARE_STATIC_KEY_FALSE(virt_spin_lock_key);
> > > > >
> > > > > @@ -87,7 +87,7 @@ static inline bool virt_spin_lock(struct qspinlock *lock)
> > > > > {
> > > > > int val;
> > > > >
> > > > > - if (!static_branch_likely(&virt_spin_lock_key))
> > > > > + if (!static_branch_unlikely(&virt_spin_lock_key))
> > > > > return false;
> > > > >
> > > > > Do we need change it with static_branch_unlikely() if value of key is false
> > > > > by fault?
> > > >
> > > > My understanding is that, firstly, whether it is likely() or unlikely()
> > > > depends on the 'expected' value of the key, rather than its default
> > > > initialized value. The compiler can arrange the if 'jump' condition to
> > > > avoid the overhead of branch jump(to keep the instruction pipeline)
> > > > as much as possible. Secondly, before this patch, the 'expected' value
> > > > of virt_spin_lock_key is 'true', so I'm not sure if we should change
> > > > its behavior. Although it seems that in most VM guest, with para-virt
> > > > spinlock enabled, this key should be false at most time, but just in
> > > > case of any regression...
> > > yes, it does not inflect the result, it is a trivial thing and depends on
> > > personal like or dislike.
> > >
> > > >
> > > > > > /*
> > > > > > * Shortcut for the queued_spin_lock_slowpath() function that allows
> > > > > > diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
> > > > > > index 5358d43886ad..fec381533555 100644
> > > > > > --- a/arch/x86/kernel/paravirt.c
> > > > > > +++ b/arch/x86/kernel/paravirt.c
> > > > > > @@ -51,13 +51,12 @@ DEFINE_ASM_FUNC(pv_native_irq_enable, "sti", .noinstr.text);
> > > > > > DEFINE_ASM_FUNC(pv_native_read_cr2, "mov %cr2, %rax", .noinstr.text);
> > > > > > #endif
> > > > > > -DEFINE_STATIC_KEY_TRUE(virt_spin_lock_key);
> > > > > > +DEFINE_STATIC_KEY_FALSE(virt_spin_lock_key);
> > > > > > void __init native_pv_lock_init(void)
> > > > > > {
> > > > > > - if (IS_ENABLED(CONFIG_PARAVIRT_SPINLOCKS) &&
> > > > > > - !boot_cpu_has(X86_FEATURE_HYPERVISOR))
> > > > > > - static_branch_disable(&virt_spin_lock_key);
> > > > > > + if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
> > > > > > + static_branch_enable(&virt_spin_lock_key);
> > > > > > }
> > > > >
> > > > > From my point, the sentence static_branch_disable(&virt_spin_lock_key) can
> > > > > be removed in file arch/x86/xen/spinlock.c and arch/x86/kernel/kvm.c, since
> > > > > function native_smp_prepare_boot_cpu() is already called by
> > > > > xen_smp_prepare_boot_cpu() and kvm_smp_prepare_boot_cpu().
> > > > >
> > > >
> > > > The key is enabled by native_smp_prepare_boot_cpu() for VM guest as
> > > > the initial value(default to true). And later either arch/x86/xen/spinlock.c
> > > > or arch/x86/kernel/kvm.c disable this key in a on-demand manner.
> > > I understand that you only care about host machine and do not want to change
> > > behavior of VM. Only that from the view of VM, there are two conditions such
> > > as:
> > >
> > > 1. If option CONFIG_PARAVIRT_SPINLOCKS is disabled, virt_spin_lock_key is
> > > disabled with your patch. VM will use test-set spinlock rather than
> > > qspinlock to avoid the over-preemption of native qspinlock, just the same
> > > with commit 2aa79af64263. And it is the same for all the hypervisor types.
> > >
> > > 2. If option CONFIG_PARAVIRT_SPINLOCKS is enable, pv spinlock cannot be used
> > > because some reasons, such as host hypervisor has no KVM_FEATURE_PV_UNHALT
> > > feature or nopvspin kernel parameter is added. The behavior to use test-set
> > > spinlock or native qspinlock is different on different hypervisor types.
> > >
> > > Even on KVM hypervisor, if KVM_FEATURE_PV_UNHALT is not supported, test-set
> > > spinlock will be used on VM; For nopvspin kernel parameter, native spinlock
> > > is used on VM. What is the rule for this? :)
> > >
> >
> > If CONFIG_PARAVIRT_SPINLOCKS is enabled, the test-set spinlock has nothing to do
> > with the lock path, because if pv_enabled() is true, it will skip the
> IIRC, if CONFIG_PARAVIRT_SPINLOCKS is enabled, there is two qspinlock path:
> native_queued_spin_lock_slowpath() and __pv_queued_spin_lock_slowpath().
> pv_enabled is false for native qspinlock path
> native_queued_spin_lock_slowpath(), test-set spinlock can be used in
> function native_queued_spin_lock_slowpath(). pv_enabled() is true only for
> function __pv_queued_spin_lock_slowpath().
>
Thanks for this detailed explanation!
> > test-set spinlock and go to pv_queue section. If for some reason the pv spinlock
> > can not be used because KVM_FEATURE_PV_UNHALT is not supported, it will fall into
> > the default qpinlock without pv-qspinlock(no pv_wait hook because it is NULL).
> yes, if pv spinlock cannot be used, native_queued_spin_lock_slowpath() will
> be called for spin_lock_slowpath, then there will be native qspinlock and
> test-test spinlock.
>
If I understand correctly, your concern about the current logic is that,
when CONFIG_PARAVIRT_SPINLOCKS is set but unfortunately disabled at
runtime, there is an inconsistency between using the native qspinlock and
test-and-set. My guess is that nopvspin is for users who want the
non-paravirt, native qspinlock, no matter whether it is Xen or KVM; all
the other cases prefer the test-and-set lock, no?
thanks,
Chenyu
Chenyu,
Sorry to bother you. I am porting the pv spinlock to the LoongArch
platform, and I do not know the history of the function virt_spin_lock().
When CONFIG_PARAVIRT_SPINLOCKS is enabled, there is a pv_enabled() check
before virt_spin_lock(), so it seems that virt_spin_lock() is never called
in this condition.
if (pv_enabled())
goto pv_queue;
if (virt_spin_lock(lock))
return;
When CONFIG_PARAVIRT_SPINLOCKS is disabled, no pv qspinlock is compiled
in, so what is the purpose of the function virt_spin_lock()? Is it to
switch the spinlock method from the test-and-set spinlock to qspinlock?
Why is there such a requirement for switching the spinlock method?
Regards
Bibo Mao
On 2024/7/29 2:52 PM, Chen Yu wrote:
> The kernel can change spinlock behavior when running as a guest. But
> this guest-friendly behavior causes performance problems on bare metal.
> So there's a 'virt_spin_lock_key' static key to switch between the two
> modes.
>
> In current code, the static key is always enabled by default when
> running in guest mode. The key is disabled for bare metal (and in
> some guests that want native behavior).
>
> A large performance regression was reported when running an encode/decode
> workload and the BenchSEE cache sub-workload on bare metal.
> Bisect points to commit ce0a1b608bfc ("x86/paravirt: Silence unused
> native_pv_lock_init() function warning"). When CONFIG_PARAVIRT_SPINLOCKS
> is disabled, virt_spin_lock_key is incorrectly set to true on bare
> metal. The qspinlock degenerates to a test-and-set spinlock, which
> decreases performance on bare metal.
>
> Set the default value of virt_spin_lock_key to false. If booting in
> a VM, enable this key. Later during VM initialization, if another,
> more efficient spinlock mechanism is detected (e.g. paravirt spinlock),
> virt_spin_lock_key is disabled. According to the description above,
> the final effect is as follows:
>
> X86_FEATURE_HYPERVISOR Y Y Y N
> CONFIG_PARAVIRT_SPINLOCKS Y Y N Y/N
> PV spinlock Y N N Y/N
>
> virt_spin_lock_key N N Y N
>
> To summarize, virt_spin_lock_key is disabled on bare metal no
> matter what the other conditions are. It is also disabled when
> another spinlock mechanism is detected in the VM guest.
>
> Fixes: ce0a1b608bfc ("x86/paravirt: Silence unused native_pv_lock_init() function warning")
> Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> Suggested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> Suggested-by: Nikolay Borisov <nik.borisov@suse.com>
> Reported-by: Prem Nath Dey <prem.nath.dey@intel.com>
> Reported-by: Xiaoping Zhou <xiaoping.zhou@intel.com>
> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
> Signed-off-by: Chen Yu <yu.c.chen@intel.com>
> ---
> v3->v4:
> Refine the commit log.
> Added Reviewed-by tag from Nikolay.
> v2->v3:
> Change the default value of virt_spin_lock_key from true to false.
> Enable this key when it is in the VM, and disable it when needed.
> This makes the code more readable. (Nikolay Borisov)
> Dropped Reviewed-by because the code has been changed.
> v1->v2:
> Refine the commit log per Dave's suggestion.
> Simplify the fix by directly disabling the virt_spin_lock_key on bare metal.
> Collect Reviewed-by from Juergen.
> ---
> arch/x86/include/asm/qspinlock.h | 4 ++--
> arch/x86/kernel/paravirt.c | 7 +++----
> 2 files changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
> index a053c1293975..a32bd2aabdf9 100644
> --- a/arch/x86/include/asm/qspinlock.h
> +++ b/arch/x86/include/asm/qspinlock.h
> @@ -66,13 +66,13 @@ static inline bool vcpu_is_preempted(long cpu)
>
> #ifdef CONFIG_PARAVIRT
> /*
> - * virt_spin_lock_key - enables (by default) the virt_spin_lock() hijack.
> + * virt_spin_lock_key - disables (by default) the virt_spin_lock() hijack.
> *
> * Native (and PV wanting native due to vCPU pinning) should disable this key.
> * It is done in this backwards fashion to only have a single direction change,
> * which removes ordering between native_pv_spin_init() and HV setup.
> */
> -DECLARE_STATIC_KEY_TRUE(virt_spin_lock_key);
> +DECLARE_STATIC_KEY_FALSE(virt_spin_lock_key);
>
> /*
> * Shortcut for the queued_spin_lock_slowpath() function that allows
> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
> index 5358d43886ad..fec381533555 100644
> --- a/arch/x86/kernel/paravirt.c
> +++ b/arch/x86/kernel/paravirt.c
> @@ -51,13 +51,12 @@ DEFINE_ASM_FUNC(pv_native_irq_enable, "sti", .noinstr.text);
> DEFINE_ASM_FUNC(pv_native_read_cr2, "mov %cr2, %rax", .noinstr.text);
> #endif
>
> -DEFINE_STATIC_KEY_TRUE(virt_spin_lock_key);
> +DEFINE_STATIC_KEY_FALSE(virt_spin_lock_key);
>
> void __init native_pv_lock_init(void)
> {
> - if (IS_ENABLED(CONFIG_PARAVIRT_SPINLOCKS) &&
> - !boot_cpu_has(X86_FEATURE_HYPERVISOR))
> - static_branch_disable(&virt_spin_lock_key);
> + if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
> + static_branch_enable(&virt_spin_lock_key);
> }
>
> static void native_tlb_remove_table(struct mmu_gather *tlb, void *table)
>
Hi Bibo,
On 2024-07-30 at 09:21:45 +0800, maobibo wrote:
> Chenyu,
>
> Sorry to bother you, I am porting pv spinlock to LoongArch platform, I do
> not know the history about function virt_spin_lock().
>
> When CONFIG_PARAVIRT_SPINLOCKS is enabled, there is pv_enabled() before
> virt_spin_lock(), it seems that virt_spin_lock is never called in this
> condition.
Right, if CONFIG_PARAVIRT_SPINLOCKS is enabled, virt_spin_lock() will not be
checked at all and the code goes directly to the pv_queue lock section.
> if (pv_enabled())
> goto pv_queue;
>
> if (virt_spin_lock(lock))
> return;
>
> When CONFIG_PARAVIRT_SPINLOCKS is disabled, there is no pv qspinlock
> compiled, so what is the usage about function virt_spin_lock()?
> Is it to switch spinlock method from test-set spinlock to qspinlock?
> why is there such requirement for spinlock switching method?
Firstly, according to commit 2aa79af64263 ("locking/qspinlock: Revert
to test-and-set on hypervisors"), virt_spin_lock() was originally introduced
to avoid over-preemption of the native qspinlock (when paravirt-spinlock is
disabled).
Secondly, there is a user requirement to use the native qspinlock over the
test-and-set lock in the VM. For example, the nopvspin parameter is parsed in
xen_init_spinlocks(). Commit 9043442b43b1 ("locking/paravirt: Use new static
key for controlling call of virt_spin_lock()") was introduced to adjust the key
from the boot command line, so the key switches between test-and-set and the
native qspinlock. I've Cced Juergen and Peter in case I misunderstood.
thanks,
Chenyu
On 2024/7/30 4:46 PM, Chen Yu wrote:
> Hi Bibo,
>
> On 2024-07-30 at 09:21:45 +0800, maobibo wrote:
>> Chenyu,
>>
>> Sorry to bother you, I am porting pv spinlock to LoongArch platform, I do
>> not know the history about function virt_spin_lock().
>>
>> When CONFIG_PARAVIRT_SPINLOCKS is enabled, there is pv_enabled() before
>> virt_spin_lock(), it seems that virt_spin_lock is never called in this
>> condition.
>
> Right, if CONFIG_PARAVIRT_SPINLOCKS is enabled, virt_spin_lock() will not be
> checked at all and go directly to pv_queue lock section.
>
>> if (pv_enabled())
>> goto pv_queue;
>>
>> if (virt_spin_lock(lock))
>> return;
>>
>> When CONFIG_PARAVIRT_SPINLOCKS is disabled, there is no pv qspinlock
>> compiled, so what is the usage about function virt_spin_lock()?
>> Is it to switch spinlock method from test-set spinlock to qspinlock?
>> why is there such requirement for spinlock switching method?
>
> Firstly, according to the commit 2aa79af64263 ("locking/qspinlock: Revert
> to test-and-set on hypervisors"), virt_spin_lock() was originally introduced
> to avoid the over-preemption of native qspinlock(when paravirt-spinlock is
> disabled).
Chenyu,
Thanks for the information. It seems that the test-and-set spinlock is
preferred over the qspinlock in a VM when paravirt-spinlock is disabled.
Regards
Bibo Mao
>
> Secondly, there is user requirement to use native qspinlock over test-set lock
> in the VM. For example, nopvspin parameter is parsed in xen_init_spinlocks().
> commit 9043442b43b1 ("locking/paravirt: Use new static key for controlling call
> of virt_spin_lock()") was introduced to adjust the key from user boot up commandline.
> so the key switches between test-set and the native qspinlock. I've Cced
> Juergen and Peter in case I misunderstood.
>
> thanks,
> Chenyu
>
On Mon, Jul 29 2024 at 14:52, Chen Yu wrote:
> #ifdef CONFIG_PARAVIRT
> /*
> - * virt_spin_lock_key - enables (by default) the virt_spin_lock() hijack.
> + * virt_spin_lock_key - disables (by default) the virt_spin_lock() hijack.
>  *
>  * Native (and PV wanting native due to vCPU pinning) should disable this key.
>  * It is done in this backwards fashion to only have a single direction change,
>  * which removes ordering between native_pv_spin_init() and HV setup.

This comment is bogus now.
Hi Thomas,

On 2024-07-29 at 14:43:37 +0200, Thomas Gleixner wrote:
> On Mon, Jul 29 2024 at 14:52, Chen Yu wrote:
> > #ifdef CONFIG_PARAVIRT
> > /*
> > - * virt_spin_lock_key - enables (by default) the virt_spin_lock() hijack.
> > + * virt_spin_lock_key - disables (by default) the virt_spin_lock() hijack.
> >  *
> >  * Native (and PV wanting native due to vCPU pinning) should disable this key.
> >  * It is done in this backwards fashion to only have a single direction change,
> >  * which removes ordering between native_pv_spin_init() and HV setup.
>
> This comment is bogus now.

Yes, now it can go in two directions: enabled first and disabled later.
Let me refine the comment.

thanks,
Chenyu