[PATCH] x86/tsx: Use MSR_TSX_CTRL to clear CPUID bits

Pawan Gupta posted 1 patch 4 years, 4 months ago
There is a newer version of this series
arch/x86/kernel/cpu/tsx.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
[PATCH] x86/tsx: Use MSR_TSX_CTRL to clear CPUID bits
Posted by Pawan Gupta 4 years, 4 months ago
tsx_clear_cpuid() uses MSR_TSX_FORCE_ABORT to clear CPUID.RTM and
CPUID.HLE. Not all CPUs support MSR_TSX_FORCE_ABORT, alternatively use
MSR_IA32_TSX_CTRL when supported.

Fixes: 293649307ef9 ("x86/tsx: Clear CPUID bits when TSX always force aborts")
Reported-by: kernel test robot <lkp@intel.com>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 arch/x86/kernel/cpu/tsx.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/tsx.c b/arch/x86/kernel/cpu/tsx.c
index 9c7a5f049292..c2343ea911e8 100644
--- a/arch/x86/kernel/cpu/tsx.c
+++ b/arch/x86/kernel/cpu/tsx.c
@@ -58,7 +58,7 @@ void tsx_enable(void)
 	wrmsrl(MSR_IA32_TSX_CTRL, tsx);
 }
 
-static bool __init tsx_ctrl_is_supported(void)
+static bool tsx_ctrl_is_supported(void)
 {
 	u64 ia32_cap = x86_read_arch_cap_msr();
 
@@ -97,6 +97,10 @@ void tsx_clear_cpuid(void)
 		rdmsrl(MSR_TSX_FORCE_ABORT, msr);
 		msr |= MSR_TFA_TSX_CPUID_CLEAR;
 		wrmsrl(MSR_TSX_FORCE_ABORT, msr);
+	} else if (tsx_ctrl_is_supported()) {
+		rdmsrl(MSR_IA32_TSX_CTRL, msr);
+		msr |= TSX_CTRL_CPUID_CLEAR;
+		wrmsrl(MSR_IA32_TSX_CTRL, msr);
 	}
 }
 
@@ -106,13 +110,11 @@ void __init tsx_init(void)
 	int ret;
 
 	/*
-	 * Hardware will always abort a TSX transaction if both CPUID bits
-	 * RTM_ALWAYS_ABORT and TSX_FORCE_ABORT are set. In this case, it is
-	 * better not to enumerate CPUID.RTM and CPUID.HLE bits. Clear them
-	 * here.
+	 * Hardware will always abort a TSX transaction when CPUID
+	 * RTM_ALWAYS_ABORT is set. In this case, it is better not to enumerate
+	 * CPUID.RTM and CPUID.HLE bits. Clear them here.
 	 */
-	if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT) &&
-	    boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
+	if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT)) {
 		tsx_ctrl_state = TSX_CTRL_RTM_ALWAYS_ABORT;
 		tsx_clear_cpuid();
 		setup_clear_cpu_cap(X86_FEATURE_RTM);
-- 
2.31.1

Re: [PATCH] x86/tsx: Use MSR_TSX_CTRL to clear CPUID bits
Posted by Borislav Petkov 4 years, 4 months ago
On Wed, Feb 09, 2022 at 01:04:36PM -0800, Pawan Gupta wrote:
> tsx_clear_cpuid() uses MSR_TSX_FORCE_ABORT to clear CPUID.RTM and
> CPUID.HLE. Not all CPUs support MSR_TSX_FORCE_ABORT, alternatively use
> MSR_IA32_TSX_CTRL when supported.
> 
> Fixes: 293649307ef9 ("x86/tsx: Clear CPUID bits when TSX always force aborts")
> Reported-by: kernel test robot <lkp@intel.com>
> Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>

<--- I'm assuming this needs to be

Cc: <stable@vger.kernel.org>

?

> @@ -106,13 +110,11 @@ void __init tsx_init(void)
>  	int ret;
>  
>  	/*
> -	 * Hardware will always abort a TSX transaction if both CPUID bits
> -	 * RTM_ALWAYS_ABORT and TSX_FORCE_ABORT are set. In this case, it is
> -	 * better not to enumerate CPUID.RTM and CPUID.HLE bits. Clear them
> -	 * here.
> +	 * Hardware will always abort a TSX transaction when CPUID
> +	 * RTM_ALWAYS_ABORT is set. In this case, it is better not to enumerate
> +	 * CPUID.RTM and CPUID.HLE bits. Clear them here.
>  	 */
> -	if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT) &&
> -	    boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
> +	if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT)) {

So you test here X86_FEATURE_RTM_ALWAYS_ABORT and tsx_clear_cpuid()
tests it again. Simplify I guess.

>  		tsx_ctrl_state = TSX_CTRL_RTM_ALWAYS_ABORT;
>  		tsx_clear_cpuid();
>  		setup_clear_cpu_cap(X86_FEATURE_RTM);

Also, here you clear X86_FEATURE_RTM and X86_FEATURE_HLE but the other
caller of tsx_clear_cpuid() - init_intel() - doesn't clear those bits.
Why?

IOW, can we concentrate the whole tsx_clear_cpuid() logic inside it and
have callers only call it unconditionally. Then the decision whether
to disable TSX and clear bits will happen all solely in that function
instead of being spread around the boot code and being inconsistent.

Hmmm?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH] x86/tsx: Use MSR_TSX_CTRL to clear CPUID bits
Posted by Pawan Gupta 4 years, 4 months ago
On 14.02.2022 18:38, Borislav Petkov wrote:
>On Wed, Feb 09, 2022 at 01:04:36PM -0800, Pawan Gupta wrote:
>> tsx_clear_cpuid() uses MSR_TSX_FORCE_ABORT to clear CPUID.RTM and
>> CPUID.HLE. Not all CPUs support MSR_TSX_FORCE_ABORT, alternatively use
>> MSR_IA32_TSX_CTRL when supported.
>>
>> Fixes: 293649307ef9 ("x86/tsx: Clear CPUID bits when TSX always force aborts")
>> Reported-by: kernel test robot <lkp@intel.com>
>> Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
>> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
>
><--- I'm assuming this needs to be
>
>Cc: <stable@vger.kernel.org>
>
>?

Yes, this needs to be backported to a few kernels that have the commit
293649307ef9 ("x86/tsx: Clear CPUID bits when TSX always force aborts").
Once this is reviewed, I will send a separate email to stable@ with the
list of stable kernels.

>> @@ -106,13 +110,11 @@ void __init tsx_init(void)
>>  	int ret;
>>
>>  	/*
>> -	 * Hardware will always abort a TSX transaction if both CPUID bits
>> -	 * RTM_ALWAYS_ABORT and TSX_FORCE_ABORT are set. In this case, it is
>> -	 * better not to enumerate CPUID.RTM and CPUID.HLE bits. Clear them
>> -	 * here.
>> +	 * Hardware will always abort a TSX transaction when CPUID
>> +	 * RTM_ALWAYS_ABORT is set. In this case, it is better not to enumerate
>> +	 * CPUID.RTM and CPUID.HLE bits. Clear them here.
>>  	 */
>> -	if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT) &&
>> -	    boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
>> +	if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT)) {
>
>So you test here X86_FEATURE_RTM_ALWAYS_ABORT and tsx_clear_cpuid()
>tests it again. Simplify I guess.

X86_FEATURE_RTM_ALWAYS_ABORT is the precondition for
MSR_TFA_TSX_CPUID_CLEAR bit to exist. For current callers of
tsx_clear_cpuid() this condition is met, and test for
X86_FEATURE_RTM_ALWAYS_ABORT can be removed. But, all the future callers
must also have this check, otherwise the MSR write will fault. 

>>  		tsx_ctrl_state = TSX_CTRL_RTM_ALWAYS_ABORT;
>>  		tsx_clear_cpuid();
>>  		setup_clear_cpu_cap(X86_FEATURE_RTM);
>
>Also, here you clear X86_FEATURE_RTM and X86_FEATURE_HLE but the other
>caller of tsx_clear_cpuid() - init_intel() - doesn't clear those bits.
>Why?

Calling setup_clear_cpu_cap() by boot CPU is sufficient to clear these
bits for secondary CPUs. Moreover, init_intel() is also called during
cpu hotplug, clearing cached CPUID bits will not be safe after boot.

>IOW, can we concentrate the whole tsx_clear_cpuid() logic inside it and
>have callers only call it unconditionally. Then the decision whether
>to disable TSX and clear bits will happen all solely in that function
>instead of being spread around the boot code and being inconsistent.

There are certain cases where this will leave the system in an
inconsistent state, for example smt toggle after a late microcode update
that adds CPUID.RTM_ALWAYS_ABORT=1. During an smt toggle, if we
unconditionally clear CPUID.RTM and CPUID.HLE in init_intel(), half of
the CPUs will report TSX feature and other half will not.

Thanks,
Pawan
Re: [PATCH] x86/tsx: Use MSR_TSX_CTRL to clear CPUID bits
Posted by Borislav Petkov 4 years, 4 months ago
On Mon, Feb 14, 2022 at 02:41:21PM -0800, Pawan Gupta wrote:
> Yes, this needs to be backported to a few kernels that have the commit
> 293649307ef9 ("x86/tsx: Clear CPUID bits when TSX always force aborts").
> Once this is reviewed, I will send a separate email to stable@ with the
> list of stable kernels.

You don't have to send a separate email - CC: stable and the Fixes tag
is enough for a patch to be picked up by the stable folks.

> X86_FEATURE_RTM_ALWAYS_ABORT is the precondition for
> MSR_TFA_TSX_CPUID_CLEAR bit to exist. For current callers of
> tsx_clear_cpuid() this condition is met, and test for
> X86_FEATURE_RTM_ALWAYS_ABORT can be removed. But, all the future callers
> must also have this check, otherwise the MSR write will fault.

I meant something like this (completely untested):

diff --git a/arch/x86/kernel/cpu/tsx.c b/arch/x86/kernel/cpu/tsx.c
index c2343ea911e8..9d08a6b1726a 100644
--- a/arch/x86/kernel/cpu/tsx.c
+++ b/arch/x86/kernel/cpu/tsx.c
@@ -84,7 +84,7 @@ static enum tsx_ctrl_states x86_get_tsx_auto_mode(void)
 	return TSX_CTRL_ENABLE;
 }
 
-void tsx_clear_cpuid(void)
+bool tsx_clear_cpuid(void)
 {
 	u64 msr;
 
@@ -97,11 +97,14 @@ void tsx_clear_cpuid(void)
 		rdmsrl(MSR_TSX_FORCE_ABORT, msr);
 		msr |= MSR_TFA_TSX_CPUID_CLEAR;
 		wrmsrl(MSR_TSX_FORCE_ABORT, msr);
+		return true;
 	} else if (tsx_ctrl_is_supported()) {
 		rdmsrl(MSR_IA32_TSX_CTRL, msr);
 		msr |= TSX_CTRL_CPUID_CLEAR;
 		wrmsrl(MSR_IA32_TSX_CTRL, msr);
+		return true;
 	}
+	return false;
 }
 
 void __init tsx_init(void)
@@ -114,9 +117,8 @@ void __init tsx_init(void)
 	 * RTM_ALWAYS_ABORT is set. In this case, it is better not to enumerate
 	 * CPUID.RTM and CPUID.HLE bits. Clear them here.
 	 */
-	if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT)) {
+	if (tsx_clear_cpuid()) {
 		tsx_ctrl_state = TSX_CTRL_RTM_ALWAYS_ABORT;
-		tsx_clear_cpuid();
 		setup_clear_cpu_cap(X86_FEATURE_RTM);
 		setup_clear_cpu_cap(X86_FEATURE_HLE);
 		return;

---

but I'm guessing TSX should be disabled by default during boot only when
X86_FEATURE_RTM_ALWAYS_ABORT is set.

If those CPUs which support only disabling TSX through MSR_IA32_TSX_CTRL
but don't have MSR_TSX_FORCE_ABORT - if those CPUs set
X86_FEATURE_RTM_ALWAYS_ABORT too, then this should work.

> There are certain cases where this will leave the system in an
> inconsistent state, for example smt toggle after a late microcode update

What is a "smt toggle"?

You mean late microcode update and then offlining and onlining all
logical CPUs except the BSP which would re-detect CPUID features?

> that adds CPUID.RTM_ALWAYS_ABORT=1. During an smt toggle, if we
> unconditionally clear CPUID.RTM and CPUID.HLE in init_intel(), half of
> the CPUs will report TSX feature and other half will not.

That is important and should be documented. Something like this perhaps:

---

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 8321c43554a1..6c7bca9d6f2e 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -722,6 +722,13 @@ static void init_intel(struct cpuinfo_x86 *c)
 	else if (tsx_ctrl_state == TSX_CTRL_DISABLE)
 		tsx_disable();
 	else if (tsx_ctrl_state == TSX_CTRL_RTM_ALWAYS_ABORT)
+		/*
+		 * This call doesn't clear RTM and HLE X86_FEATURE bits because
+		 * a late microcode reload adding MSR_TSX_FORCE_ABORT can cause
+		 * for those bits to get cleared - something which the kernel
+		 * cannot do due to userspace potentially already using said
+		 * features.
+		 */
 		tsx_clear_cpuid();
 
 	split_lock_init();

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH] x86/tsx: Use MSR_TSX_CTRL to clear CPUID bits
Posted by Pawan Gupta 4 years, 4 months ago
On 15.02.2022 00:28, Borislav Petkov wrote:
>On Mon, Feb 14, 2022 at 02:41:21PM -0800, Pawan Gupta wrote:
>> X86_FEATURE_RTM_ALWAYS_ABORT is the precondition for
>> MSR_TFA_TSX_CPUID_CLEAR bit to exist. For current callers of
>> tsx_clear_cpuid() this condition is met, and test for
>> X86_FEATURE_RTM_ALWAYS_ABORT can be removed. But, all the future callers
>> must also have this check, otherwise the MSR write will fault.
>
>I meant something like this (completely untested):
>
>diff --git a/arch/x86/kernel/cpu/tsx.c b/arch/x86/kernel/cpu/tsx.c
>index c2343ea911e8..9d08a6b1726a 100644
>--- a/arch/x86/kernel/cpu/tsx.c
>+++ b/arch/x86/kernel/cpu/tsx.c
>@@ -84,7 +84,7 @@ static enum tsx_ctrl_states x86_get_tsx_auto_mode(void)
> 	return TSX_CTRL_ENABLE;
> }
>
>-void tsx_clear_cpuid(void)
>+bool tsx_clear_cpuid(void)
> {
> 	u64 msr;
>
>@@ -97,11 +97,14 @@ void tsx_clear_cpuid(void)
> 		rdmsrl(MSR_TSX_FORCE_ABORT, msr);
> 		msr |= MSR_TFA_TSX_CPUID_CLEAR;
> 		wrmsrl(MSR_TSX_FORCE_ABORT, msr);
>+		return true;
> 	} else if (tsx_ctrl_is_supported()) {
> 		rdmsrl(MSR_IA32_TSX_CTRL, msr);
> 		msr |= TSX_CTRL_CPUID_CLEAR;
> 		wrmsrl(MSR_IA32_TSX_CTRL, msr);

This will clear TSX CPUID when X86_FEATURE_RTM_ALWAYS_ABORT=0 also, because ...

>+		return true;
> 	}
>+	return false;
> }
>
> void __init tsx_init(void)
>@@ -114,9 +117,8 @@ void __init tsx_init(void)
> 	 * RTM_ALWAYS_ABORT is set. In this case, it is better not to enumerate
> 	 * CPUID.RTM and CPUID.HLE bits. Clear them here.
> 	 */
>-	if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT)) {
>+	if (tsx_clear_cpuid()) {

... we are calling tsx_clear_cpuid() unconditionally.

> 		tsx_ctrl_state = TSX_CTRL_RTM_ALWAYS_ABORT;
>-		tsx_clear_cpuid();
> 		setup_clear_cpu_cap(X86_FEATURE_RTM);
> 		setup_clear_cpu_cap(X86_FEATURE_HLE);
> 		return;
>
>---
>
>but I'm guessing TSX should be disabled by default during boot only when
>X86_FEATURE_RTM_ALWAYS_ABORT is set.

That is correct.

>If those CPUs which support only disabling TSX through MSR_IA32_TSX_CTRL
>but don't have MSR_TSX_FORCE_ABORT - if those CPUs set
>X86_FEATURE_RTM_ALWAYS_ABORT too, then this should work.
>
>> There are certain cases where this will leave the system in an
>> inconsistent state, for example smt toggle after a late microcode update
>
>What is a "smt toggle"?

Sorry, I should have been more clear. I meant:

	# echo off > /sys/devices/system/cpu/smt/control
	# echo on  > /sys/devices/system/cpu/smt/control

>You mean late microcode update and then offlining and onlining all
>logical CPUs except the BSP which would re-detect CPUID features?

That could also be the problematic case.

>> that adds CPUID.RTM_ALWAYS_ABORT=1. During an smt toggle, if we
>> unconditionally clear CPUID.RTM and CPUID.HLE in init_intel(), half of
>> the CPUs will report TSX feature and other half will not.
>
>That is important and should be documented. Something like this perhaps:
>
>---
>
>diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
>index 8321c43554a1..6c7bca9d6f2e 100644
>--- a/arch/x86/kernel/cpu/intel.c
>+++ b/arch/x86/kernel/cpu/intel.c
>@@ -722,6 +722,13 @@ static void init_intel(struct cpuinfo_x86 *c)
> 	else if (tsx_ctrl_state == TSX_CTRL_DISABLE)
> 		tsx_disable();
> 	else if (tsx_ctrl_state == TSX_CTRL_RTM_ALWAYS_ABORT)
>+		/*
>+		 * This call doesn't clear RTM and HLE X86_FEATURE bits because
>+		 * a late microcode reload adding MSR_TSX_FORCE_ABORT can cause
>+		 * for those bits to get cleared - something which the kernel
>+		 * cannot do due to userspace potentially already using said
>+		 * features.
>+		 */
> 		tsx_clear_cpuid();

Thanks, I will add this comment in the next revision.

Pawan