[PATCH] x86/srso: Correct the mitigation status when SMT is disabled

Borislav Petkov posted 1 patch 2 years, 4 months ago
There is a newer version of this series
arch/x86/kernel/cpu/bugs.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
[PATCH] x86/srso: Correct the mitigation status when SMT is disabled
Posted by Borislav Petkov 2 years, 4 months ago
On Mon, Aug 14, 2023 at 11:17:27PM +0200, Borislav Petkov wrote:
> Lemme see how ugly it becomes tomorrow.

Not too bad, considering bugs.c's ugliness.

From: "Borislav Petkov (AMD)" <bp@alien8.de>
Date: Tue, 15 Aug 2023 11:53:13 +0200
Subject: [PATCH] x86/srso: Correct the mitigation status when SMT is disabled

Specify how is SRSO mitigated when SMT is disabled. Also, correct the
SMT check for that.

Fixes: e9fbc47b818b ("x86/srso: Disable the mitigation on unaffected configurations")
Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
---
 arch/x86/kernel/cpu/bugs.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 6c04aef4b63b..dc8f874fdd63 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -2417,8 +2417,7 @@ static void __init srso_select_mitigation(void)
 		 * Zen1/2 with SMT off aren't vulnerable after the right
 		 * IBPB microcode has been applied.
 		 */
-		if ((boot_cpu_data.x86 < 0x19) &&
-		    (!cpu_smt_possible() || (cpu_smt_control == CPU_SMT_DISABLED))) {
+		if (boot_cpu_data.x86 < 0x19 && !cpu_smt_possible()) {
 			setup_force_cpu_cap(X86_FEATURE_SRSO_NO);
 			return;
 		}
@@ -2698,8 +2697,12 @@ static ssize_t retbleed_show_state(char *buf)
 
 static ssize_t srso_show_state(char *buf)
 {
-	if (boot_cpu_has(X86_FEATURE_SRSO_NO))
-		return sysfs_emit(buf, "Not affected\n");
+	if (boot_cpu_has(X86_FEATURE_SRSO_NO)) {
+		if (sched_smt_active())
+			return sysfs_emit(buf, "Not affected\n");
+		else
+			return sysfs_emit(buf, "Mitigation: SMT disabled\n");
+	}
 
 	return sysfs_emit(buf, "%s%s\n",
 			  srso_strings[srso_mitigation],
-- 
2.42.0.rc0.25.ga82fb66fed25

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH] x86/srso: Correct the mitigation status when SMT is disabled
Posted by Josh Poimboeuf 2 years, 4 months ago
On Tue, Aug 15, 2023 at 11:57:24AM +0200, Borislav Petkov wrote:
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -2417,8 +2417,7 @@ static void __init srso_select_mitigation(void)
>  		 * Zen1/2 with SMT off aren't vulnerable after the right
>  		 * IBPB microcode has been applied.
>  		 */
> -		if ((boot_cpu_data.x86 < 0x19) &&
> -		    (!cpu_smt_possible() || (cpu_smt_control == CPU_SMT_DISABLED))) {
> +		if (boot_cpu_data.x86 < 0x19 && !cpu_smt_possible()) {
>  			setup_force_cpu_cap(X86_FEATURE_SRSO_NO);
>  			return;
>  		}
> @@ -2698,8 +2697,12 @@ static ssize_t retbleed_show_state(char *buf)
>  
>  static ssize_t srso_show_state(char *buf)
>  {
> -	if (boot_cpu_has(X86_FEATURE_SRSO_NO))
> -		return sysfs_emit(buf, "Not affected\n");
> +	if (boot_cpu_has(X86_FEATURE_SRSO_NO)) {
> +		if (sched_smt_active())
> +			return sysfs_emit(buf, "Not affected\n");
> +		else
> +			return sysfs_emit(buf, "Mitigation: SMT disabled\n");
> +	}

AFAICT, nowhere in the spec does it say the SRSO_NO bit won't get set by
future (fixed) HW.  In fact I'd expect it will, similar to other *_NO
flags.

Regardless, here SRSO_NO seems to mean two different things: "reported
safe by host (or HW)" and "not reported safe on Zen1/2 with SMT not
possible".

Also, in this code, the SRSO_NO+SMT combo doesn't seem logically
possible, as srso_show_state() only gets called if X86_BUG_SRSO is set,
which only happens if SRSO_NO is not set by the HW/host in the first
place.  So here, if boot_cpu_has(X86_FEATURE_SRSO_NO), it means SRSO_NO
was manually set by srso_select_mitigation(), and SMT can't possibly be
enabled.

Instead of piggybacking on SRSO_NO, which is confusing, why not just add
a new mitigation type, like:

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 6c04aef4b63b..c925b98f5a15 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -2343,6 +2343,7 @@ early_param("l1tf", l1tf_cmdline);
 enum srso_mitigation {
 	SRSO_MITIGATION_NONE,
 	SRSO_MITIGATION_MICROCODE,
+	SRSO_MITIGATION_SMT,
 	SRSO_MITIGATION_SAFE_RET,
 	SRSO_MITIGATION_IBPB,
 	SRSO_MITIGATION_IBPB_ON_VMEXIT,
@@ -2359,6 +2360,7 @@ enum srso_mitigation_cmd {
 static const char * const srso_strings[] = {
 	[SRSO_MITIGATION_NONE]           = "Vulnerable",
 	[SRSO_MITIGATION_MICROCODE]      = "Mitigation: microcode",
+	[SRSO_MITIGATION_SMT]		 = "Mitigation: SMT disabled",
 	[SRSO_MITIGATION_SAFE_RET]	 = "Mitigation: safe RET",
 	[SRSO_MITIGATION_IBPB]		 = "Mitigation: IBPB",
 	[SRSO_MITIGATION_IBPB_ON_VMEXIT] = "Mitigation: IBPB on VMEXIT only"
@@ -2407,19 +2409,15 @@ static void __init srso_select_mitigation(void)
 		pr_warn("IBPB-extending microcode not applied!\n");
 		pr_warn(SRSO_NOTICE);
 	} else {
-		/*
-		 * Enable the synthetic (even if in a real CPUID leaf)
-		 * flags for guests.
-		 */
+		/* Enable the synthetic flag, as HW doesn't set it. */
 		setup_force_cpu_cap(X86_FEATURE_IBPB_BRTYPE);
 
 		/*
 		 * Zen1/2 with SMT off aren't vulnerable after the right
 		 * IBPB microcode has been applied.
 		 */
-		if ((boot_cpu_data.x86 < 0x19) &&
-		    (!cpu_smt_possible() || (cpu_smt_control == CPU_SMT_DISABLED))) {
-			setup_force_cpu_cap(X86_FEATURE_SRSO_NO);
+		if ((boot_cpu_data.x86 < 0x19) && !cpu_smt_possible()) {
+			srso_mitigation = SRSO_MITIGATION_SMT;
 			return;
 		}
 	}
@@ -2698,9 +2696,6 @@ static ssize_t retbleed_show_state(char *buf)
 
 static ssize_t srso_show_state(char *buf)
 {
-	if (boot_cpu_has(X86_FEATURE_SRSO_NO))
-		return sysfs_emit(buf, "Not affected\n");
-
 	return sysfs_emit(buf, "%s%s\n",
 			  srso_strings[srso_mitigation],
 			  (cpu_has_ibpb_brtype_microcode() ? "" : ", no microcode"));
Re: [PATCH] x86/srso: Correct the mitigation status when SMT is disabled
Posted by Borislav Petkov 2 years, 4 months ago
On Tue, Aug 15, 2023 at 12:58:31PM -0700, Josh Poimboeuf wrote:
> AFAICT, nowhere in the spec does it say the SRSO_NO bit won't get set by
> future (fixed) HW.  In fact I'd expect it will, similar to other *_NO
> flags.

I'm pretty sure it won't.

SRSO_NO is synthesized by the hypervisor *software*. Nothing else.

It is there so that you don't check microcode version in the guest which
is nearly impossible anyway.

> Regardless, here SRSO_NO seems to mean two different things: "reported
> safe by host (or HW)" and "not reported safe on Zen1/2 with SMT not
> possible".

Huh?

> Also, in this code, the SRSO_NO+SMT combo doesn't seem logically
> possible, as srso_show_state() only gets called if X86_BUG_SRSO is set,
> which only happens if SRSO_NO is not set by the HW/host in the first
> place.  So here, if boot_cpu_has(X86_FEATURE_SRSO_NO), it means SRSO_NO
> was manually set by srso_select_mitigation(), and SMT can't possibly be
> enabled.

Have you considered the case where Linux would set SRSO_NO when booting
on future hardware, which is fixed?

There SRSO_NO and SMT will very much be possible.

> Instead of piggybacking on SRSO_NO, which is confusing, why not just add
> a new mitigation type, like:

I had a separate mitigation defintion but then realized I don't need it
because, well, it is not really a mitigation - it is a case where the
machine is not affected.

For example, I have a Zen2 laptop here with SMT disabled in the hardware
which is also not affected.

And also, the rest of the SMT disabled cases in bugs.c do check
sched_smt_active() too, without having a separate mitigation.

That's why.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH] x86/srso: Correct the mitigation status when SMT is disabled
Posted by Josh Poimboeuf 2 years, 4 months ago
On Tue, Aug 15, 2023 at 10:17:53PM +0200, Borislav Petkov wrote:
> On Tue, Aug 15, 2023 at 12:58:31PM -0700, Josh Poimboeuf wrote:
> > AFAICT, nowhere in the spec does it say the SRSO_NO bit won't get set by
> > future (fixed) HW.  In fact I'd expect it will, similar to other *_NO
> > flags.
> 
> I'm pretty sure it won't.
> 
> SRSO_NO is synthesized by the hypervisor *software*. Nothing else.

Citation needed.

> It is there so that you don't check microcode version in the guest which
> is nearly impossible anyway.
> 
> > Regardless, here SRSO_NO seems to mean two different things: "reported
> > safe by host (or HW)" and "not reported safe on Zen1/2 with SMT not
> > possible".
> 
> Huh?

Can you clarify what doesn't make sense?

> > Also, in this code, the SRSO_NO+SMT combo doesn't seem logically
> > possible, as srso_show_state() only gets called if X86_BUG_SRSO is set,
> > which only happens if SRSO_NO is not set by the HW/host in the first
> > place.  So here, if boot_cpu_has(X86_FEATURE_SRSO_NO), it means SRSO_NO
> > was manually set by srso_select_mitigation(), and SMT can't possibly be
> > enabled.
> 
> Have you considered the case where Linux would set SRSO_NO when booting
> on future hardware, which is fixed?
> 
> There SRSO_NO and SMT will very much be possible.

How is that relevant to my comment?  The bug bit still wouldn't get set
and srso_show_state() still wouldn't be called.

-- 
Josh
Re: [PATCH] x86/srso: Correct the mitigation status when SMT is disabled
Posted by Borislav Petkov 2 years, 4 months ago
On Tue, Aug 15, 2023 at 02:27:51PM -0700, Josh Poimboeuf wrote:
> How is that relevant to my comment?  The bug bit still wouldn't get set
> and srso_show_state() still wouldn't be called.

Lemme explain how I see this working - it might help us get on the right
track. And for comparison you can look at X86_FEATURE_BTC_NO too.

* Something has set X86_FEATURE_SRSO_NO - hw or sw doesn't matter
- because the machine is not affected. X86_BUG_SRSO doesn't get set and
  the mitigation detection is skipped. All good.

* Nothing has set X86_FEATURE_SRSO_NO, mitigation detection runs and
find that the kernel runs on a Zen1/2 with SMT disabled - we set
X86_FEATURE_SRSO_NO.

* Now X86_FEATURE_SRSO_NO is passed in by KVM to the guest and the above
dance repeats.

Now you.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH] x86/srso: Correct the mitigation status when SMT is disabled
Posted by Josh Poimboeuf 2 years, 4 months ago
On Wed, Aug 16, 2023 at 10:30:57AM +0200, Borislav Petkov wrote:
> On Tue, Aug 15, 2023 at 02:27:51PM -0700, Josh Poimboeuf wrote:
> > How is that relevant to my comment?  The bug bit still wouldn't get set
> > and srso_show_state() still wouldn't be called.
> 
> Lemme explain how I see this working - it might help us get on the right
> track. And for comparison you can look at X86_FEATURE_BTC_NO too.
> 
> * Something has set X86_FEATURE_SRSO_NO - hw or sw doesn't matter
> - because the machine is not affected. X86_BUG_SRSO doesn't get set and
>   the mitigation detection is skipped. All good.

In this case srso_show_state() is never called, so the following code
can't run:

+	if (boot_cpu_has(X86_FEATURE_SRSO_NO)) {
+		if (sched_smt_active())
+			return sysfs_emit(buf, "Not affected\n");

> * Nothing has set X86_FEATURE_SRSO_NO, mitigation detection runs and
> find that the kernel runs on a Zen1/2 with SMT disabled - we set
> X86_FEATURE_SRSO_NO.

In this case SMT is disabled, so the following code still can't run:

+	if (boot_cpu_has(X86_FEATURE_SRSO_NO)) {
+		if (sched_smt_active())
+			return sysfs_emit(buf, "Not affected\n");


So the above code never runs.

See?

-- 
Josh
Re: [PATCH] x86/srso: Correct the mitigation status when SMT is disabled
Posted by Borislav Petkov 2 years, 4 months ago
On Wed, Aug 16, 2023 at 09:07:57AM -0700, Josh Poimboeuf wrote:
> In this case srso_show_state() is never called, so the following code
> can't run:
> 
> +	if (boot_cpu_has(X86_FEATURE_SRSO_NO)) {
> +		if (sched_smt_active())
> +			return sysfs_emit(buf, "Not affected\n");

Ofc it can. If something has set X86_FEATURE_SRSO_NO early, before the
bug bits detection happens, then you get:

$ cat /sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow
Not affected

> In this case SMT is disabled, so the following code still can't run:
> 
> +	if (boot_cpu_has(X86_FEATURE_SRSO_NO)) {
> +		if (sched_smt_active())
> +			return sysfs_emit(buf, "Not affected\n");

Yes, it runs in the above case where on some future hw which might have
SRSO fixed, we'll set SRSO_NO.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH] x86/srso: Correct the mitigation status when SMT is disabled
Posted by Josh Poimboeuf 2 years, 4 months ago
On Wed, Aug 16, 2023 at 07:35:31PM +0200, Borislav Petkov wrote:
> On Wed, Aug 16, 2023 at 09:07:57AM -0700, Josh Poimboeuf wrote:
> > In this case srso_show_state() is never called, so the following code
> > can't run:
> > 
> > +	if (boot_cpu_has(X86_FEATURE_SRSO_NO)) {
> > +		if (sched_smt_active())
> > +			return sysfs_emit(buf, "Not affected\n");
> 
> Ofc it can. If something has set X86_FEATURE_SRSO_NO early, before the
> bug bits detection happens, then you get:
> 
> $ cat /sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow
> Not affected

No, if the bug bit isn't set then that comes from cpu_show_common():

static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr,
			       char *buf, unsigned int bug)
{
	if (!boot_cpu_has_bug(bug))
		return sysfs_emit(buf, "Not affected\n");

-- 
Josh
Re: [PATCH] x86/srso: Correct the mitigation status when SMT is disabled
Posted by Borislav Petkov 2 years, 4 months ago
On Wed, Aug 16, 2023 at 11:29:40AM -0700, Josh Poimboeuf wrote:
> No, if the bug bit isn't set then that comes from cpu_show_common():
> 
> static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr,
> 			       char *buf, unsigned int bug)
> {
> 	if (!boot_cpu_has_bug(bug))
> 		return sysfs_emit(buf, "Not affected\n");

Bah, there was that thing too.

Ok, I'll zap the sched_smt_active() branch in srso_show_state() then.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH] x86/srso: Correct the mitigation status when SMT is disabled
Posted by Borislav Petkov 2 years, 4 months ago
From: "Borislav Petkov (AMD)" <bp@alien8.de>
Date: Tue, 15 Aug 2023 11:53:13 +0200
Subject: [PATCH] x86/srso: Correct the mitigation status when SMT is disabled

Specify how is SRSO mitigated when SMT is disabled. Also, correct the
SMT check for that.

Fixes: e9fbc47b818b ("x86/srso: Disable the mitigation on unaffected configurations")
Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814200813.p5czl47zssuej7nv@treble
---
 arch/x86/kernel/cpu/bugs.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 9026e3fe9f6c..f081d26616ac 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -2428,8 +2428,7 @@ static void __init srso_select_mitigation(void)
 		 * Zen1/2 with SMT off aren't vulnerable after the right
 		 * IBPB microcode has been applied.
 		 */
-		if ((boot_cpu_data.x86 < 0x19) &&
-		    (!cpu_smt_possible() || (cpu_smt_control == CPU_SMT_DISABLED))) {
+		if (boot_cpu_data.x86 < 0x19 && !cpu_smt_possible()) {
 			setup_force_cpu_cap(X86_FEATURE_SRSO_NO);
 			return;
 		}
@@ -2714,7 +2713,7 @@ static ssize_t retbleed_show_state(char *buf)
 static ssize_t srso_show_state(char *buf)
 {
 	if (boot_cpu_has(X86_FEATURE_SRSO_NO))
-		return sysfs_emit(buf, "Not affected\n");
+		return sysfs_emit(buf, "Mitigation: SMT disabled\n");
 
 	return sysfs_emit(buf, "%s%s\n",
 			  srso_strings[srso_mitigation],
-- 
2.42.0.rc0.25.ga82fb66fed25

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH] x86/srso: Correct the mitigation status when SMT is disabled
Posted by Josh Poimboeuf 2 years, 4 months ago
On Thu, Aug 17, 2023 at 11:07:27AM +0200, Borislav Petkov wrote:
> From: "Borislav Petkov (AMD)" <bp@alien8.de>
> Date: Tue, 15 Aug 2023 11:53:13 +0200
> Subject: [PATCH] x86/srso: Correct the mitigation status when SMT is disabled
> 
> Specify how is SRSO mitigated when SMT is disabled. Also, correct the
> SMT check for that.
> 
> Fixes: e9fbc47b818b ("x86/srso: Disable the mitigation on unaffected configurations")
> Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
> Link: https://lore.kernel.org/r/20230814200813.p5czl47zssuej7nv@treble

Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>

-- 
Josh