[RFC PATCH v3 1/2] x86: cpu/bugs: add AMD ERAPS support; hardware flushes RSB

Amit Shah posted 2 patches 1 year ago
Posted by Amit Shah 1 year ago
From: Amit Shah <amit.shah@amd.com>

When Automatic IBRS is disabled, Linux flushes the RSB on every context
switch.  With the ERAPS feature on Zen5+ CPUs, this software flush is
not necessary: the hardware flushes the RSB on a context switch
(triggered by mov-to-CR3).

The ERAPS feature also tags host and guest addresses in the RSB,
eliminating the need for software flushing of the RSB on VMEXIT.

Disable all RSB flushing by Linux when the CPU has ERAPS.

Feature mentioned in AMD PPR 57238.  Will be resubmitted once APM is
public - which I'm told is imminent.

Signed-off-by: Amit Shah <amit.shah@amd.com>
---
 Documentation/admin-guide/hw-vuln/spectre.rst | 5 +++--
 arch/x86/include/asm/cpufeatures.h            | 1 +
 arch/x86/kernel/cpu/bugs.c                    | 6 +++++-
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
index 132e0bc6007e..647c10c0307a 100644
--- a/Documentation/admin-guide/hw-vuln/spectre.rst
+++ b/Documentation/admin-guide/hw-vuln/spectre.rst
@@ -417,9 +417,10 @@ The possible values in this file are:
 
   - Return stack buffer (RSB) protection status:
 
-  =============   ===========================================
+  =============   ========================================================
   'RSB filling'   Protection of RSB on context switch enabled
-  =============   ===========================================
+  'ERAPS'         Hardware RSB flush on context switches + guest/host tags
+  =============   ========================================================
 
   - EIBRS Post-barrier Return Stack Buffer (PBRSB) protection status:
 
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 17b6590748c0..79a1373050f7 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -461,6 +461,7 @@
 #define X86_FEATURE_AUTOIBRS		(20*32+ 8) /* Automatic IBRS */
 #define X86_FEATURE_NO_SMM_CTL_MSR	(20*32+ 9) /* SMM_CTL MSR is not present */
 
+#define X86_FEATURE_ERAPS		(20*32+24) /* Enhanced RAP / RSB / RAS Security */
 #define X86_FEATURE_SBPB		(20*32+27) /* Selective Branch Prediction Barrier */
 #define X86_FEATURE_IBPB_BRTYPE		(20*32+28) /* MSR_PRED_CMD[IBPB] flushes all branch type predictions */
 #define X86_FEATURE_SRSO_NO		(20*32+29) /* CPU is not affected by SRSO */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index d5102b72f74d..d7af5f811776 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1634,6 +1634,9 @@ static void __init spectre_v2_mitigate_rsb(enum spectre_v2_mitigation mode)
 	case SPECTRE_V2_RETPOLINE:
 	case SPECTRE_V2_LFENCE:
 	case SPECTRE_V2_IBRS:
+		if (boot_cpu_has(X86_FEATURE_ERAPS))
+			break;
+
 		pr_info("Spectre v2 / SpectreRSB: Filling RSB on context switch and VMEXIT\n");
 		setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
 		setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT);
@@ -2850,7 +2853,7 @@ static ssize_t spectre_v2_show_state(char *buf)
 	    spectre_v2_enabled == SPECTRE_V2_EIBRS_LFENCE)
 		return sysfs_emit(buf, "Vulnerable: eIBRS+LFENCE with unprivileged eBPF and SMT\n");
 
-	return sysfs_emit(buf, "%s%s%s%s%s%s%s%s\n",
+	return sysfs_emit(buf, "%s%s%s%s%s%s%s%s%s\n",
 			  spectre_v2_strings[spectre_v2_enabled],
 			  ibpb_state(),
 			  boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? "; IBRS_FW" : "",
@@ -2858,6 +2861,7 @@ static ssize_t spectre_v2_show_state(char *buf)
 			  boot_cpu_has(X86_FEATURE_RSB_CTXSW) ? "; RSB filling" : "",
 			  pbrsb_eibrs_state(),
 			  spectre_bhi_state(),
+			  boot_cpu_has(X86_FEATURE_ERAPS) ? "; ERAPS hardware RSB flush" : "",
 			  /* this should always be at the end */
 			  spectre_v2_module_string());
 }
-- 
2.47.0
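
For readers following the thread, the decision made by the bugs.c hunk
above can be sketched as a small userspace model.  This is only an
illustration under stated assumptions, not kernel code: enum v2_mode,
struct rsb_policy and choose_rsb_policy() are hypothetical names, the
two booleans stand in for X86_FEATURE_RSB_CTXSW and
X86_FEATURE_RSB_VMEXIT, and mitigation modes outside the hunk are
deliberately left unmodeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the modes visible in the hunk. */
enum v2_mode {
	MODE_OTHER,	/* any mode the hunk does not touch; not modeled */
	MODE_RETPOLINE,
	MODE_LFENCE,
	MODE_IBRS,
};

/* Models the two forced capabilities set by the hunk. */
struct rsb_policy {
	bool fill_on_ctxsw;	/* stands in for X86_FEATURE_RSB_CTXSW */
	bool fill_on_vmexit;	/* stands in for X86_FEATURE_RSB_VMEXIT */
};

/*
 * Mirror the patched switch statement: with ERAPS present, skip the
 * software RSB filling that these three modes would otherwise enable.
 */
struct rsb_policy choose_rsb_policy(enum v2_mode mode, bool has_eraps)
{
	struct rsb_policy p = { false, false };

	assert(mode <= MODE_IBRS);

	switch (mode) {
	case MODE_RETPOLINE:
	case MODE_LFENCE:
	case MODE_IBRS:
		if (has_eraps)
			break;	/* hardware is trusted to handle the RSB */
		p.fill_on_ctxsw = true;
		p.fill_on_vmexit = true;
		break;
	default:
		/* Not part of the hunk; this model makes no claim here. */
		break;
	}
	return p;
}
```

In this model, choose_rsb_policy(MODE_RETPOLINE, false) enables both
fills and choose_rsb_policy(MODE_RETPOLINE, true) enables neither.
Whether skipping the fills is actually safe is the question the
replies below raise.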
Re: [RFC PATCH v3 1/2] x86: cpu/bugs: add AMD ERAPS support; hardware flushes RSB
Posted by Dave Hansen 1 year ago
On 11/28/24 05:28, Amit Shah wrote:
> From: Amit Shah <amit.shah@amd.com>
> 
> When Automatic IBRS is disabled, Linux flushes the RSB on every context
> switch.  This RSB flush is not necessary in software with the ERAPS
> feature on Zen5+ CPUs that flushes the RSB in hardware on a context
> switch (triggered by mov-to-CR3).
> 
> Additionally, the ERAPS feature also tags host and guest addresses in
> the RSB - eliminating the need for software flushing of the RSB on
> VMEXIT.
> 
> Disable all RSB flushing by Linux when the CPU has ERAPS.
> 
> Feature mentioned in AMD PPR 57238.  Will be resubmitted once APM is
> public - which I'm told is imminent.

There was a _lot_ of discussion about this. But all of that discussion
seems to have been trimmed out and it seems like we're basically back
to: "this is new hardware supposed to mitigate SpectreRSB, thus it
mitigates SpectreRSB."

Could we please summarize the previous discussions in the changelog?
Otherwise, I fear it will be lost.
Re: [RFC PATCH v3 1/2] x86: cpu/bugs: add AMD ERAPS support; hardware flushes RSB
Posted by Amit Shah 1 year ago
On Mon, 2024-12-02 at 09:26 -0800, Dave Hansen wrote:
> On 11/28/24 05:28, Amit Shah wrote:
> > From: Amit Shah <amit.shah@amd.com>
> > 
> > When Automatic IBRS is disabled, Linux flushes the RSB on every
> > context
> > switch.  This RSB flush is not necessary in software with the ERAPS
> > feature on Zen5+ CPUs that flushes the RSB in hardware on a context
> > switch (triggered by mov-to-CR3).
> > 
> > Additionally, the ERAPS feature also tags host and guest addresses
> > in
> > the RSB - eliminating the need for software flushing of the RSB on
> > VMEXIT.
> > 
> > Disable all RSB flushing by Linux when the CPU has ERAPS.
> > 
> > Feature mentioned in AMD PPR 57238.  Will be resubmitted once APM
> > is
> > public - which I'm told is imminent.
> 
> There was a _lot_ of discussion about this. But all of that
> discussion
> seems to have been trimmed out and it seems like we're basically back
> to: "this is new hardware supposed to mitigate SpectreRSB, thus it
> mitigates SpectreRSB."

Absolutely, I don't want that to get lost -- but I think that got
captured in Josh's rework patchset.  With that rework, I don't even
need this patchset for the hardware feature to work, because we now
rely on AutoIBRS to do the RSB clearing; and the hardware takes care of
AutoIBRS and ERAPS interaction in Zen5.

The only thing this patch now does is to handle the AutoIBRS-disabled
case -- which happens when SEV-SNP is turned on (i.e. let the hw clear
the RSB instead of stuffing it in Linux).

I can still include the summary of the discussion in this patch - I
just feel it isn't necessary with the rework.

		Amit
Re: [RFC PATCH v3 1/2] x86: cpu/bugs: add AMD ERAPS support; hardware flushes RSB
Posted by Dave Hansen 1 year ago
On 12/2/24 10:09, Amit Shah wrote:
> I can still include the summary of the discussion in this patch - I
> just feel it isn't necessary with the rework.

It's necessary.

There's a new hardware feature. You want it to replace a software
sequence in certain situations. You have to make an actual, coherent
argument as to why the hardware feature is a suitable replacement.

For instance (and I'm pretty sure we've gone over this more than once),
the changelog here still makes the claim that a "context switch" and a
"mov-to-CR3" are the same thing.
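
One way to illustrate the distinction is a toy model in which a task
switch writes "CR3" only when the incoming task uses a different
address space.  This is a rough sketch under stated assumptions, not a
description of the scheduler or of ERAPS: struct task, struct
cpu_model and model_switch_to() are hypothetical names, an mm_id of 0
is used to mean "kernel thread that borrows the previous address
space", and real kernels have further cases this model ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only; every name here is hypothetical. */
struct task {
	int mm_id;	/* address-space id; 0 = kernel thread, no own mm */
};

struct cpu_model {
	int loaded_mm;			/* address space currently in "CR3" */
	unsigned int ctx_switches;	/* task switches performed */
	unsigned int cr3_writes;	/* times the model wrote "CR3" */
};

/*
 * Switch to 'next'.  Every call is a context switch, but "CR3" is only
 * written when the address space actually changes.
 */
void model_switch_to(struct cpu_model *cpu, const struct task *next)
{
	assert(cpu != NULL && next != NULL);

	cpu->ctx_switches++;

	/* Kernel thread: keep running on the previously loaded mm. */
	if (next->mm_id == 0)
		return;

	/* Thread of the same process: same mm, so no "CR3" write. */
	if (next->mm_id == cpu->loaded_mm)
		return;

	cpu->loaded_mm = next->mm_id;
	cpu->cr3_writes++;
}
```

Starting with address space 1 loaded and switching to a sibling thread
(mm 1), then a kernel thread (mm 0), then another process (mm 2), the
model counts three context switches but only one "CR3" write.  A flush
keyed on the CR3 write would therefore not fire on the first two
switches in this model.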
Re: [RFC PATCH v3 1/2] x86: cpu/bugs: add AMD ERAPS support; hardware flushes RSB
Posted by Sean Christopherson 1 year ago
On Mon, Dec 02, 2024, Dave Hansen wrote:
> On 12/2/24 10:09, Amit Shah wrote:
> > I can still include the summary of the discussion in this patch - I
> > just feel it isn't necessary with the rework.
> 
> It's necessary.
> 
> There's a new hardware feature. You want it to replace a software
> sequence in certain situations. You have to make an actual, coherent
> argument as to why the hardware feature is a suitable replacement.
> 
> For instance (and I'm pretty sure we've gone over this more than once),
> the changelog here still makes the claim that a "context switch" and a
> "mov-to-CR3" are the same thing.

+1000.  I want a crisp, precise description of the actual hardware behavior, so
that KVM can do the right thing when virtualizing ERAPS.