Ok,
here's a new version, I think I've addressed all outstanding review comments.
Lemme know how we should proceed here, you take it or I? Judging by the
diffstat probably I should and you ack it or so.
Also, lemme know if I should add your Co-developed-by for the user_return
portion.
Thx.
---
From: "Borislav Petkov (AMD)" <bp@alien8.de>
Add support for CPUID Fn8000_0021_EAX[31] (SRSO_MSR_FIX). If this bit is 1,
it indicates that software may use MSR BP_CFG[BpSpecReduce] to mitigate SRSO.

Enable BpSpecReduce to mitigate SRSO across guest/host boundaries.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
---
Documentation/admin-guide/hw-vuln/srso.rst | 21 +++++++++++++++++++++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/processor.h | 1 +
arch/x86/kernel/cpu/bugs.c | 16 +++++++++++++++-
arch/x86/kvm/svm/svm.c | 14 ++++++++++++++
arch/x86/lib/msr.c | 2 ++
7 files changed, 55 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/hw-vuln/srso.rst b/Documentation/admin-guide/hw-vuln/srso.rst
index 2ad1c05b8c88..b856538083a2 100644
--- a/Documentation/admin-guide/hw-vuln/srso.rst
+++ b/Documentation/admin-guide/hw-vuln/srso.rst
@@ -104,6 +104,27 @@ The possible values in this file are:
(spec_rstack_overflow=ibpb-vmexit)
+ * 'Mitigation: Reduced Speculation':
+
+ This mitigation gets automatically enabled when the above one "IBPB on
+ VMEXIT" has been selected and the CPU supports the BpSpecReduce bit.
+
+ It gets automatically enabled on machines which have the
+ SRSO_USER_KERNEL_NO=1 CPUID bit. In that case, the code logic is to switch
+ to the above =ibpb-vmexit mitigation because the user/kernel boundary is
+ not affected anymore and thus "safe RET" is not needed.
+
+ After enabling the IBPB on VMEXIT mitigation option, the BpSpecReduce bit
+ is detected (functionality present on all such machines) and that
+ practically overrides IBPB on VMEXIT as it has a lot less performance
+ impact and takes care of the guest->host attack vector too.
+
+ Currently, the mitigation uses KVM's user_return approach
+ (kvm_set_user_return_msr()) to set the BpSpecReduce bit when a vCPU runs
+ a guest and reset it upon return to host userspace or when the KVM module
+ is unloaded. The intent being, the small perf impact of BpSpecReduce should
+ be incurred only when really necessary.
+
In order to exploit vulnerability, an attacker needs to:
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 508c0dad116b..471447a31605 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -468,6 +468,7 @@
#define X86_FEATURE_IBPB_BRTYPE (20*32+28) /* MSR_PRED_CMD[IBPB] flushes all branch type predictions */
#define X86_FEATURE_SRSO_NO (20*32+29) /* CPU is not affected by SRSO */
#define X86_FEATURE_SRSO_USER_KERNEL_NO (20*32+30) /* CPU is not affected by SRSO across user/kernel boundaries */
+#define X86_FEATURE_SRSO_MSR_FIX (20*32+31) /* MSR BP_CFG[BpSpecReduce] can be used to mitigate SRSO for VMs */
/*
* Extended auxiliary flags: Linux defined - for features scattered in various
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 3f3e2bc99162..4cbd461081a1 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -719,6 +719,7 @@
/* Zen4 */
#define MSR_ZEN4_BP_CFG 0xc001102e
+#define MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT 4
#define MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT 5
/* Fam 19h MSRs */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index c0cd10182e90..a956cd578df6 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -762,6 +762,7 @@ enum mds_mitigations {
};
extern bool gds_ucode_mitigated(void);
+extern bool srso_spec_reduce_enabled(void);
/*
* Make previous memory operations globally visible before
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 5a505aa65489..07c04b3844fc 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -2523,6 +2523,7 @@ enum srso_mitigation {
SRSO_MITIGATION_SAFE_RET,
SRSO_MITIGATION_IBPB,
SRSO_MITIGATION_IBPB_ON_VMEXIT,
+ SRSO_MITIGATION_BP_SPEC_REDUCE,
};
enum srso_mitigation_cmd {
@@ -2540,12 +2541,19 @@ static const char * const srso_strings[] = {
[SRSO_MITIGATION_MICROCODE] = "Vulnerable: Microcode, no safe RET",
[SRSO_MITIGATION_SAFE_RET] = "Mitigation: Safe RET",
[SRSO_MITIGATION_IBPB] = "Mitigation: IBPB",
- [SRSO_MITIGATION_IBPB_ON_VMEXIT] = "Mitigation: IBPB on VMEXIT only"
+ [SRSO_MITIGATION_IBPB_ON_VMEXIT] = "Mitigation: IBPB on VMEXIT only",
+ [SRSO_MITIGATION_BP_SPEC_REDUCE] = "Mitigation: Reduced Speculation"
};
static enum srso_mitigation srso_mitigation __ro_after_init = SRSO_MITIGATION_NONE;
static enum srso_mitigation_cmd srso_cmd __ro_after_init = SRSO_CMD_SAFE_RET;
+bool srso_spec_reduce_enabled(void)
+{
+ return srso_mitigation == SRSO_MITIGATION_BP_SPEC_REDUCE;
+}
+EXPORT_SYMBOL_GPL(srso_spec_reduce_enabled);
+
static int __init srso_parse_cmdline(char *str)
{
if (!str)
@@ -2663,6 +2671,12 @@ static void __init srso_select_mitigation(void)
ibpb_on_vmexit:
case SRSO_CMD_IBPB_ON_VMEXIT:
+ if (boot_cpu_has(X86_FEATURE_SRSO_MSR_FIX)) {
+ pr_notice("Reducing speculation to address VM/HV SRSO attack vector.\n");
+ srso_mitigation = SRSO_MITIGATION_BP_SPEC_REDUCE;
+ break;
+ }
+
if (IS_ENABLED(CONFIG_MITIGATION_SRSO)) {
if (!boot_cpu_has(X86_FEATURE_ENTRY_IBPB) && has_microcode) {
setup_force_cpu_cap(X86_FEATURE_IBPB_ON_VMEXIT);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 21dacd312779..59656fd51d57 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -256,6 +256,7 @@ DEFINE_PER_CPU(struct svm_cpu_data, svm_data);
* defer the restoration of TSC_AUX until the CPU returns to userspace.
*/
static int tsc_aux_uret_slot __read_mostly = -1;
+static int zen4_bp_cfg_uret_slot __ro_after_init = -1;
static const u32 msrpm_ranges[] = {0, 0xc0000000, 0xc0010000};
@@ -1541,6 +1542,11 @@ static void svm_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
(!boot_cpu_has(X86_FEATURE_V_TSC_AUX) || !sev_es_guest(vcpu->kvm)))
kvm_set_user_return_msr(tsc_aux_uret_slot, svm->tsc_aux, -1ull);
+ if (srso_spec_reduce_enabled())
+ kvm_set_user_return_msr(zen4_bp_cfg_uret_slot,
+ BIT_ULL(MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT),
+ BIT_ULL(MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT));
+
svm->guest_state_loaded = true;
}
@@ -5298,6 +5304,14 @@ static __init int svm_hardware_setup(void)
tsc_aux_uret_slot = kvm_add_user_return_msr(MSR_TSC_AUX);
+ if (srso_spec_reduce_enabled()) {
+ zen4_bp_cfg_uret_slot = kvm_add_user_return_msr(MSR_ZEN4_BP_CFG);
+ if (WARN_ON_ONCE(zen4_bp_cfg_uret_slot < 0)) {
+ r = -EIO;
+ goto err;
+ }
+ }
+
if (boot_cpu_has(X86_FEATURE_AUTOIBRS))
kvm_enable_efer_bits(EFER_AUTOIBRS);
diff --git a/arch/x86/lib/msr.c b/arch/x86/lib/msr.c
index 4bf4fad5b148..5a18ecc04a6c 100644
--- a/arch/x86/lib/msr.c
+++ b/arch/x86/lib/msr.c
@@ -103,6 +103,7 @@ int msr_set_bit(u32 msr, u8 bit)
{
return __flip_bit(msr, bit, true);
}
+EXPORT_SYMBOL_GPL(msr_set_bit);
/**
* msr_clear_bit - Clear @bit in a MSR @msr.
@@ -118,6 +119,7 @@ int msr_clear_bit(u32 msr, u8 bit)
{
return __flip_bit(msr, bit, false);
}
+EXPORT_SYMBOL_GPL(msr_clear_bit);
#ifdef CONFIG_TRACEPOINTS
void do_trace_write_msr(unsigned int msr, u64 val, int failed)
--
2.43.0
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Sat, Jan 11, 2025, Borislav Petkov wrote:
> Ok,
>
> here's a new version, I think I've addressed all outstanding review comments.
>
> Lemme know how we should proceed here, you take it or I? Judging by the
> diffstat probably I should and you ack it or so.
No preference, either way works for me.
> Also, lemme know if I should add your Co-developed-by for the user_return
> portion.
Heh, no preference again. In case you want to add Co-developed-by, here's my SoB:
Signed-off-by: Sean Christopherson <seanjc@google.com>
> In order to exploit vulnerability, an attacker needs to:
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 508c0dad116b..471447a31605 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -468,6 +468,7 @@
> #define X86_FEATURE_IBPB_BRTYPE (20*32+28) /* MSR_PRED_CMD[IBPB] flushes all branch type predictions */
> #define X86_FEATURE_SRSO_NO (20*32+29) /* CPU is not affected by SRSO */
> #define X86_FEATURE_SRSO_USER_KERNEL_NO (20*32+30) /* CPU is not affected by SRSO across user/kernel boundaries */
> +#define X86_FEATURE_SRSO_MSR_FIX (20*32+31) /* MSR BP_CFG[BpSpecReduce] can be used to mitigate SRSO for VMs */
Any objection to calling this X86_FEATURE_SRSO_BP_SPEC_REDUCE? More below.
> @@ -2540,12 +2541,19 @@ static const char * const srso_strings[] = {
> [SRSO_MITIGATION_MICROCODE] = "Vulnerable: Microcode, no safe RET",
> [SRSO_MITIGATION_SAFE_RET] = "Mitigation: Safe RET",
> [SRSO_MITIGATION_IBPB] = "Mitigation: IBPB",
> - [SRSO_MITIGATION_IBPB_ON_VMEXIT] = "Mitigation: IBPB on VMEXIT only"
> + [SRSO_MITIGATION_IBPB_ON_VMEXIT] = "Mitigation: IBPB on VMEXIT only",
> + [SRSO_MITIGATION_BP_SPEC_REDUCE] = "Mitigation: Reduced Speculation"
> };
>
> static enum srso_mitigation srso_mitigation __ro_after_init = SRSO_MITIGATION_NONE;
> static enum srso_mitigation_cmd srso_cmd __ro_after_init = SRSO_CMD_SAFE_RET;
>
> +bool srso_spec_reduce_enabled(void)
> +{
> + return srso_mitigation == SRSO_MITIGATION_BP_SPEC_REDUCE;
At the risk of getting too clever, what if we use the X86_FEATURE to communicate
that KVM should toggle the magic MSR? That'd avoid the helper+export, and up to
this point "srso_mitigation" has been used only for documentation purposes.
The flag just needs to be cleared in two locations, which doesn't seem too awful.
Though if we go that route, SRSO_MSR_FIX is a pretty crappy name :-)
> diff --git a/arch/x86/lib/msr.c b/arch/x86/lib/msr.c
> index 4bf4fad5b148..5a18ecc04a6c 100644
> --- a/arch/x86/lib/msr.c
> +++ b/arch/x86/lib/msr.c
> @@ -103,6 +103,7 @@ int msr_set_bit(u32 msr, u8 bit)
> {
> return __flip_bit(msr, bit, true);
> }
> +EXPORT_SYMBOL_GPL(msr_set_bit);
>
> /**
> * msr_clear_bit - Clear @bit in a MSR @msr.
> @@ -118,6 +119,7 @@ int msr_clear_bit(u32 msr, u8 bit)
> {
> return __flip_bit(msr, bit, false);
> }
> +EXPORT_SYMBOL_GPL(msr_clear_bit);
These exports are no longer necessary.
Compile tested only...
---
Documentation/admin-guide/hw-vuln/srso.rst | 21 +++++++++++++++++++++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/cpu/bugs.c | 15 ++++++++++++++-
arch/x86/kvm/svm/svm.c | 14 ++++++++++++++
5 files changed, 51 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/hw-vuln/srso.rst b/Documentation/admin-guide/hw-vuln/srso.rst
index 2ad1c05b8c88..b856538083a2 100644
--- a/Documentation/admin-guide/hw-vuln/srso.rst
+++ b/Documentation/admin-guide/hw-vuln/srso.rst
@@ -104,6 +104,27 @@ The possible values in this file are:
(spec_rstack_overflow=ibpb-vmexit)
+ * 'Mitigation: Reduced Speculation':
+
+ This mitigation gets automatically enabled when the above one "IBPB on
+ VMEXIT" has been selected and the CPU supports the BpSpecReduce bit.
+
+ It gets automatically enabled on machines which have the
+ SRSO_USER_KERNEL_NO=1 CPUID bit. In that case, the code logic is to switch
+ to the above =ibpb-vmexit mitigation because the user/kernel boundary is
+ not affected anymore and thus "safe RET" is not needed.
+
+ After enabling the IBPB on VMEXIT mitigation option, the BpSpecReduce bit
+ is detected (functionality present on all such machines) and that
+ practically overrides IBPB on VMEXIT as it has a lot less performance
+ impact and takes care of the guest->host attack vector too.
+
+ Currently, the mitigation uses KVM's user_return approach
+ (kvm_set_user_return_msr()) to set the BpSpecReduce bit when a vCPU runs
+ a guest and reset it upon return to host userspace or when the KVM module
+ is unloaded. The intent being, the small perf impact of BpSpecReduce should
+ be incurred only when really necessary.
+
In order to exploit vulnerability, an attacker needs to:
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 0e2d81763615..00f2649c36ab 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -466,6 +466,7 @@
#define X86_FEATURE_IBPB_BRTYPE (20*32+28) /* MSR_PRED_CMD[IBPB] flushes all branch type predictions */
#define X86_FEATURE_SRSO_NO (20*32+29) /* CPU is not affected by SRSO */
#define X86_FEATURE_SRSO_USER_KERNEL_NO (20*32+30) /* CPU is not affected by SRSO across user/kernel boundaries */
+#define X86_FEATURE_SRSO_BP_SPEC_REDUCE (20*32+31) /* MSR BP_CFG[BpSpecReduce] can be used to mitigate SRSO for VMs */
/*
* Extended auxiliary flags: Linux defined - for features scattered in various
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 3ae84c3b8e6d..1372a569fb58 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -717,6 +717,7 @@
/* Zen4 */
#define MSR_ZEN4_BP_CFG 0xc001102e
+#define MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT 4
#define MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT 5
/* Fam 19h MSRs */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 8854d9bce2a5..971d3373eeaf 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -2523,6 +2523,7 @@ enum srso_mitigation {
SRSO_MITIGATION_SAFE_RET,
SRSO_MITIGATION_IBPB,
SRSO_MITIGATION_IBPB_ON_VMEXIT,
+ SRSO_MITIGATION_BP_SPEC_REDUCE,
};
enum srso_mitigation_cmd {
@@ -2540,7 +2541,8 @@ static const char * const srso_strings[] = {
[SRSO_MITIGATION_MICROCODE] = "Vulnerable: Microcode, no safe RET",
[SRSO_MITIGATION_SAFE_RET] = "Mitigation: Safe RET",
[SRSO_MITIGATION_IBPB] = "Mitigation: IBPB",
- [SRSO_MITIGATION_IBPB_ON_VMEXIT] = "Mitigation: IBPB on VMEXIT only"
+ [SRSO_MITIGATION_IBPB_ON_VMEXIT] = "Mitigation: IBPB on VMEXIT only",
+ [SRSO_MITIGATION_BP_SPEC_REDUCE] = "Mitigation: Reduced Speculation"
};
static enum srso_mitigation srso_mitigation __ro_after_init = SRSO_MITIGATION_NONE;
@@ -2577,6 +2579,8 @@ static void __init srso_select_mitigation(void)
if (!boot_cpu_has_bug(X86_BUG_SRSO) ||
cpu_mitigations_off() ||
srso_cmd == SRSO_CMD_OFF) {
+ setup_clear_cpu_cap(X86_FEATURE_SRSO_BP_SPEC_REDUCE);
+
if (boot_cpu_has(X86_FEATURE_SBPB))
x86_pred_cmd = PRED_CMD_SBPB;
return;
@@ -2665,6 +2669,12 @@ static void __init srso_select_mitigation(void)
ibpb_on_vmexit:
case SRSO_CMD_IBPB_ON_VMEXIT:
+ if (boot_cpu_has(X86_FEATURE_SRSO_BP_SPEC_REDUCE)) {
+ pr_notice("Reducing speculation to address VM/HV SRSO attack vector.\n");
+ srso_mitigation = SRSO_MITIGATION_BP_SPEC_REDUCE;
+ break;
+ }
+
if (IS_ENABLED(CONFIG_MITIGATION_SRSO)) {
if (!boot_cpu_has(X86_FEATURE_ENTRY_IBPB) && has_microcode) {
setup_force_cpu_cap(X86_FEATURE_IBPB_ON_VMEXIT);
@@ -2686,6 +2696,9 @@ static void __init srso_select_mitigation(void)
}
out:
+ if (srso_mitigation != SRSO_MITIGATION_BP_SPEC_REDUCE)
+ setup_clear_cpu_cap(X86_FEATURE_SRSO_BP_SPEC_REDUCE);
+
pr_info("%s\n", srso_strings[srso_mitigation]);
}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7640a84e554a..6ea3632af580 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -257,6 +257,7 @@ DEFINE_PER_CPU(struct svm_cpu_data, svm_data);
* defer the restoration of TSC_AUX until the CPU returns to userspace.
*/
static int tsc_aux_uret_slot __read_mostly = -1;
+static int zen4_bp_cfg_uret_slot __ro_after_init = -1;
static const u32 msrpm_ranges[] = {0, 0xc0000000, 0xc0010000};
@@ -1540,6 +1541,11 @@ static void svm_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
(!boot_cpu_has(X86_FEATURE_V_TSC_AUX) || !sev_es_guest(vcpu->kvm)))
kvm_set_user_return_msr(tsc_aux_uret_slot, svm->tsc_aux, -1ull);
+ if (cpu_feature_enabled(X86_FEATURE_SRSO_BP_SPEC_REDUCE))
+ kvm_set_user_return_msr(zen4_bp_cfg_uret_slot,
+ BIT_ULL(MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT),
+ BIT_ULL(MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT));
+
svm->guest_state_loaded = true;
}
@@ -5306,6 +5312,14 @@ static __init int svm_hardware_setup(void)
tsc_aux_uret_slot = kvm_add_user_return_msr(MSR_TSC_AUX);
+ if (cpu_feature_enabled(X86_FEATURE_SRSO_BP_SPEC_REDUCE)) {
+ zen4_bp_cfg_uret_slot = kvm_add_user_return_msr(MSR_ZEN4_BP_CFG);
+ if (WARN_ON_ONCE(zen4_bp_cfg_uret_slot < 0)) {
+ r = -EIO;
+ goto err;
+ }
+ }
+
if (boot_cpu_has(X86_FEATURE_AUTOIBRS))
kvm_enable_efer_bits(EFER_AUTOIBRS);
base-commit: 9bec4a1f4ea92b99f96894e0c51c192b5345aa91
--
On Fri, Jan 17, 2025 at 10:56:15AM -0800, Sean Christopherson wrote:
> No preference, either way works for me.
Yeah, too late now. Will queue it after -rc1.
> Heh, no preference again. In case you want to add Co-developed-by, here's my SoB:
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Thanks. Added.
> Any objection to calling this X86_FEATURE_SRSO_BP_SPEC_REDUCE? More below.
It sure is a better name. Let's leave SRSO_MSR_FIX in the comment so that
grepping can find it, as SRSO_MSR_FIX is what is in the official doc. IOW:
+#define X86_FEATURE_SRSO_BP_SPEC_REDUCE (20*32+31) /*
+ * BP_CFG[BpSpecReduce] can be used to mitigate SRSO for VMs.
+ * (SRSO_MSR_FIX in the official doc).
+ */
> At the risk of getting too clever, what if we use the X86_FEATURE to communicate
> that KVM should toggle the magic MSR?
Not too clever at all. I myself have used exactly this scheme to communicate
settings to other code and actually I should've thought about it... :-\
Definitely better than the silly export.
> That'd avoid the helper+export, and up to this point "srso_mitigation" has
> been used only for documentation purposes.
>
> The flag just needs to be cleared in two locations, which doesn't seem too
> awful. Though if we go that route, SRSO_MSR_FIX is a pretty crappy name :-)
Yap, most def.
> These exports are no longer necessary.
Doh.
> Compile tested only...
Tested on hw.
Thanks!
---
From: "Borislav Petkov (AMD)" <bp@alien8.de>
Date: Wed, 13 Nov 2024 13:41:10 +0100
Subject: [PATCH] x86/bugs: KVM: Add support for SRSO_MSR_FIX
Add support for CPUID Fn8000_0021_EAX[31] (SRSO_MSR_FIX). If this bit is 1,
it indicates that software may use MSR BP_CFG[BpSpecReduce] to mitigate SRSO.

Enable BpSpecReduce to mitigate SRSO across guest/host boundaries.
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
---
Documentation/admin-guide/hw-vuln/srso.rst | 21 +++++++++++++++++++++
arch/x86/include/asm/cpufeatures.h | 4 ++++
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/cpu/bugs.c | 14 +++++++++++++-
arch/x86/kvm/svm/svm.c | 14 ++++++++++++++
5 files changed, 53 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/hw-vuln/srso.rst b/Documentation/admin-guide/hw-vuln/srso.rst
index 2ad1c05b8c88..b856538083a2 100644
--- a/Documentation/admin-guide/hw-vuln/srso.rst
+++ b/Documentation/admin-guide/hw-vuln/srso.rst
@@ -104,6 +104,27 @@ The possible values in this file are:
(spec_rstack_overflow=ibpb-vmexit)
+ * 'Mitigation: Reduced Speculation':
+
+ This mitigation gets automatically enabled when the above one "IBPB on
+ VMEXIT" has been selected and the CPU supports the BpSpecReduce bit.
+
+ It gets automatically enabled on machines which have the
+ SRSO_USER_KERNEL_NO=1 CPUID bit. In that case, the code logic is to switch
+ to the above =ibpb-vmexit mitigation because the user/kernel boundary is
+ not affected anymore and thus "safe RET" is not needed.
+
+ After enabling the IBPB on VMEXIT mitigation option, the BpSpecReduce bit
+ is detected (functionality present on all such machines) and that
+ practically overrides IBPB on VMEXIT as it has a lot less performance
+ impact and takes care of the guest->host attack vector too.
+
+ Currently, the mitigation uses KVM's user_return approach
+ (kvm_set_user_return_msr()) to set the BpSpecReduce bit when a vCPU runs
+ a guest and reset it upon return to host userspace or when the KVM module
+ is unloaded. The intent being, the small perf impact of BpSpecReduce should
+ be incurred only when really necessary.
+
In order to exploit vulnerability, an attacker needs to:
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 508c0dad116b..43653f2704c9 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -468,6 +468,10 @@
#define X86_FEATURE_IBPB_BRTYPE (20*32+28) /* MSR_PRED_CMD[IBPB] flushes all branch type predictions */
#define X86_FEATURE_SRSO_NO (20*32+29) /* CPU is not affected by SRSO */
#define X86_FEATURE_SRSO_USER_KERNEL_NO (20*32+30) /* CPU is not affected by SRSO across user/kernel boundaries */
+#define X86_FEATURE_SRSO_BP_SPEC_REDUCE (20*32+31) /*
+ * BP_CFG[BpSpecReduce] can be used to mitigate SRSO for VMs.
+ * (SRSO_MSR_FIX in the official doc).
+ */
/*
* Extended auxiliary flags: Linux defined - for features scattered in various
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 3f3e2bc99162..4cbd461081a1 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -719,6 +719,7 @@
/* Zen4 */
#define MSR_ZEN4_BP_CFG 0xc001102e
+#define MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT 4
#define MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT 5
/* Fam 19h MSRs */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 5a505aa65489..9e3ea7f1b358 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -2523,6 +2523,7 @@ enum srso_mitigation {
SRSO_MITIGATION_SAFE_RET,
SRSO_MITIGATION_IBPB,
SRSO_MITIGATION_IBPB_ON_VMEXIT,
+ SRSO_MITIGATION_BP_SPEC_REDUCE,
};
enum srso_mitigation_cmd {
@@ -2540,7 +2541,8 @@ static const char * const srso_strings[] = {
[SRSO_MITIGATION_MICROCODE] = "Vulnerable: Microcode, no safe RET",
[SRSO_MITIGATION_SAFE_RET] = "Mitigation: Safe RET",
[SRSO_MITIGATION_IBPB] = "Mitigation: IBPB",
- [SRSO_MITIGATION_IBPB_ON_VMEXIT] = "Mitigation: IBPB on VMEXIT only"
+ [SRSO_MITIGATION_IBPB_ON_VMEXIT] = "Mitigation: IBPB on VMEXIT only",
+ [SRSO_MITIGATION_BP_SPEC_REDUCE] = "Mitigation: Reduced Speculation"
};
static enum srso_mitigation srso_mitigation __ro_after_init = SRSO_MITIGATION_NONE;
@@ -2663,6 +2665,12 @@ static void __init srso_select_mitigation(void)
ibpb_on_vmexit:
case SRSO_CMD_IBPB_ON_VMEXIT:
+ if (boot_cpu_has(X86_FEATURE_SRSO_BP_SPEC_REDUCE)) {
+ pr_notice("Reducing speculation to address VM/HV SRSO attack vector.\n");
+ srso_mitigation = SRSO_MITIGATION_BP_SPEC_REDUCE;
+ break;
+ }
+
if (IS_ENABLED(CONFIG_MITIGATION_SRSO)) {
if (!boot_cpu_has(X86_FEATURE_ENTRY_IBPB) && has_microcode) {
setup_force_cpu_cap(X86_FEATURE_IBPB_ON_VMEXIT);
@@ -2684,6 +2692,10 @@ static void __init srso_select_mitigation(void)
}
out:
+
+ if (srso_mitigation != SRSO_MITIGATION_BP_SPEC_REDUCE)
+ setup_clear_cpu_cap(X86_FEATURE_SRSO_BP_SPEC_REDUCE);
+
pr_info("%s\n", srso_strings[srso_mitigation]);
}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 21dacd312779..ee14833f00e5 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -256,6 +256,7 @@ DEFINE_PER_CPU(struct svm_cpu_data, svm_data);
* defer the restoration of TSC_AUX until the CPU returns to userspace.
*/
static int tsc_aux_uret_slot __read_mostly = -1;
+static int zen4_bp_cfg_uret_slot __ro_after_init = -1;
static const u32 msrpm_ranges[] = {0, 0xc0000000, 0xc0010000};
@@ -1541,6 +1542,11 @@ static void svm_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
(!boot_cpu_has(X86_FEATURE_V_TSC_AUX) || !sev_es_guest(vcpu->kvm)))
kvm_set_user_return_msr(tsc_aux_uret_slot, svm->tsc_aux, -1ull);
+ if (cpu_feature_enabled(X86_FEATURE_SRSO_BP_SPEC_REDUCE))
+ kvm_set_user_return_msr(zen4_bp_cfg_uret_slot,
+ BIT_ULL(MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT),
+ BIT_ULL(MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT));
+
svm->guest_state_loaded = true;
}
@@ -5298,6 +5304,14 @@ static __init int svm_hardware_setup(void)
tsc_aux_uret_slot = kvm_add_user_return_msr(MSR_TSC_AUX);
+ if (cpu_feature_enabled(X86_FEATURE_SRSO_BP_SPEC_REDUCE)) {
+ zen4_bp_cfg_uret_slot = kvm_add_user_return_msr(MSR_ZEN4_BP_CFG);
+ if (WARN_ON_ONCE(zen4_bp_cfg_uret_slot < 0)) {
+ r = -EIO;
+ goto err;
+ }
+ }
+
if (boot_cpu_has(X86_FEATURE_AUTOIBRS))
kvm_enable_efer_bits(EFER_AUTOIBRS);
--
2.43.0
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Sat, Jan 18, 2025, Borislav Petkov wrote:
> static enum srso_mitigation srso_mitigation __ro_after_init = SRSO_MITIGATION_NONE;
> @@ -2663,6 +2665,12 @@ static void __init srso_select_mitigation(void)
Unless I'm missing something, the cpu_mitigations_off() and "srso_cmd == SRSO_CMD_OFF"
cases need to clear the feature:
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 9e3ea7f1b3587..3939a8dee27d4 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -2581,6 +2581,7 @@ static void __init srso_select_mitigation(void)
srso_cmd == SRSO_CMD_OFF) {
if (boot_cpu_has(X86_FEATURE_SBPB))
x86_pred_cmd = PRED_CMD_SBPB;
+ setup_clear_cpu_cap(X86_FEATURE_SRSO_BP_SPEC_REDUCE);
return;
}
There's also the Zen1/Zen2 ucode+!SMT path, which I assume is irrelevant in
practice:
if (boot_cpu_data.x86 < 0x19 && !cpu_smt_possible()) {
setup_force_cpu_cap(X86_FEATURE_SRSO_NO);
return;
}
But if we wanted to catch all paths, wrap the guts and clear the feature in the
outer layer?
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 9e3ea7f1b3587..0501e31971421 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -2572,7 +2572,7 @@ early_param("spec_rstack_overflow", srso_parse_cmdline);
#define SRSO_NOTICE "WARNING: See https://kernel.org/doc/html/latest/admin-guide/hw-vuln/srso.html for mitigation options."
-static void __init srso_select_mitigation(void)
+static void __init __srso_select_mitigation(void)
{
bool has_microcode = boot_cpu_has(X86_FEATURE_IBPB_BRTYPE);
@@ -2692,11 +2692,15 @@ static void __init srso_select_mitigation(void)
}
out:
+ pr_info("%s\n", srso_strings[srso_mitigation]);
+}
+
+static void __init srso_select_mitigation(void)
+{
+ __srso_select_mitigation();
if (srso_mitigation != SRSO_MITIGATION_BP_SPEC_REDUCE)
setup_clear_cpu_cap(X86_FEATURE_SRSO_BP_SPEC_REDUCE);
-
- pr_info("%s\n", srso_strings[srso_mitigation]);
}
#undef pr_fmt
> ibpb_on_vmexit:
> case SRSO_CMD_IBPB_ON_VMEXIT:
> + if (boot_cpu_has(X86_FEATURE_SRSO_BP_SPEC_REDUCE)) {
> + pr_notice("Reducing speculation to address VM/HV SRSO attack vector.\n");
> + srso_mitigation = SRSO_MITIGATION_BP_SPEC_REDUCE;
> + break;
> + }
> +
> if (IS_ENABLED(CONFIG_MITIGATION_SRSO)) {
> if (!boot_cpu_has(X86_FEATURE_ENTRY_IBPB) && has_microcode) {
> setup_force_cpu_cap(X86_FEATURE_IBPB_ON_VMEXIT);
> @@ -2684,6 +2692,10 @@ static void __init srso_select_mitigation(void)
> }
>
> out:
> +
Spurious newlines.
> + if (srso_mitigation != SRSO_MITIGATION_BP_SPEC_REDUCE)
> + setup_clear_cpu_cap(X86_FEATURE_SRSO_BP_SPEC_REDUCE);
> +
> pr_info("%s\n", srso_strings[srso_mitigation]);
> }
On Thu, Jan 23, 2025 at 08:25:17AM -0800, Sean Christopherson wrote:
> But if we wanted to catch all paths, wrap the guts and clear the feature in the
> outer layer?
Yap, all valid points, thanks for catching those.
> +static void __init srso_select_mitigation(void)
> +{
> + __srso_select_mitigation();
>
> if (srso_mitigation != SRSO_MITIGATION_BP_SPEC_REDUCE)
> setup_clear_cpu_cap(X86_FEATURE_SRSO_BP_SPEC_REDUCE);
> -
> - pr_info("%s\n", srso_strings[srso_mitigation]);
> }
What I'd like here, though, is to not dance around this srso_mitigation
variable: have __srso_select_mitigation() set it as a side effect and then
rely on knowing that the __ function modified it before we can evaluate it.
I'd like the __ function to return it, like __ssb_select_mitigation() does.
But then, if we do that, we'll have to do the same changes and turn the returns
into "goto out" so that all the paths converge. And I'd prefer that those paths
converge anyway and not have some "early escapes" like those returns, which
I completely overlooked. :-\
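For illustration, a rough sketch of that return-value shape (the inner helper
returning the selected mitigation the way __ssb_select_mitigation() does); the
skeleton and comments are assumptions for the example only, not what ends up
being merged in this thread:

static enum srso_mitigation __init __srso_select_mitigation(void)
{
	enum srso_mitigation mitigation = SRSO_MITIGATION_NONE;

	/*
	 * ... the existing selection logic, with the early returns turned
	 * into jumps to a single convergence point ...
	 */

	return mitigation;
}

static void __init srso_select_mitigation(void)
{
	srso_mitigation = __srso_select_mitigation();

	/* The feature flag is what tells KVM to toggle BpSpecReduce. */
	if (srso_mitigation != SRSO_MITIGATION_BP_SPEC_REDUCE)
		setup_clear_cpu_cap(X86_FEATURE_SRSO_BP_SPEC_REDUCE);

	if (srso_mitigation != SRSO_MITIGATION_NONE)
		pr_info("%s\n", srso_strings[srso_mitigation]);
}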
And that code is going to change soon anyway after David's attack vectors
series.
So, long story short, I guess the simplest thing to do would be to simply do
the below.
I *think*.
I'll stare at it again later, with a clear head, and test all cases to make sure
nothing's escaping anymore.
Thx!
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 9e3ea7f1b358..11cafe293c29 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -2581,7 +2581,7 @@ static void __init srso_select_mitigation(void)
srso_cmd == SRSO_CMD_OFF) {
if (boot_cpu_has(X86_FEATURE_SBPB))
x86_pred_cmd = PRED_CMD_SBPB;
- return;
+ goto out;
}
if (has_microcode) {
@@ -2593,7 +2593,7 @@ static void __init srso_select_mitigation(void)
*/
if (boot_cpu_data.x86 < 0x19 && !cpu_smt_possible()) {
setup_force_cpu_cap(X86_FEATURE_SRSO_NO);
- return;
+ goto out;
}
if (retbleed_mitigation == RETBLEED_MITIGATION_IBPB) {
@@ -2692,11 +2692,11 @@ static void __init srso_select_mitigation(void)
}
out:
-
if (srso_mitigation != SRSO_MITIGATION_BP_SPEC_REDUCE)
setup_clear_cpu_cap(X86_FEATURE_SRSO_BP_SPEC_REDUCE);
- pr_info("%s\n", srso_strings[srso_mitigation]);
+ if (srso_mitigation != SRSO_MITIGATION_NONE)
+ pr_info("%s\n", srso_strings[srso_mitigation]);
}
#undef pr_fmt
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
FWIW, this should be the updated version of the patch with all the review
comments posted so far.
Posting it here just to give an overall view of how the new patch should look.
It is also based on today's Linus master branch.
Compile tested only...
Best,
Patrick
---
From: "Borislav Petkov (AMD)" <bp@alien8.de>
Add support for CPUID Fn8000_0021_EAX[31] (SRSO_MSR_FIX). If this bit is 1,
it indicates that software may use MSR BP_CFG[BpSpecReduce] to mitigate SRSO.

Enable BpSpecReduce to mitigate SRSO across guest/host boundaries.
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
---
Documentation/admin-guide/hw-vuln/srso.rst | 20 ++++++++++++++++++++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/bugs.c | 21 +++++++++++++++++----
arch/x86/kvm/svm/svm.c | 14 ++++++++++++++
tools/arch/x86/include/asm/msr-index.h | 1 +
5 files changed, 53 insertions(+), 4 deletions(-)
diff --git a/Documentation/admin-guide/hw-vuln/srso.rst b/Documentation/admin-guide/hw-vuln/srso.rst
index 2ad1c05b8c883..49680ab99c393 100644
--- a/Documentation/admin-guide/hw-vuln/srso.rst
+++ b/Documentation/admin-guide/hw-vuln/srso.rst
@@ -104,6 +104,26 @@ The possible values in this file are:
(spec_rstack_overflow=ibpb-vmexit)
+ * 'Mitigation: Reduced Speculation':
+
+ This mitigation gets automatically enabled when the above one "IBPB on
+ VMEXIT" has been selected and the CPU supports the BpSpecReduce bit.
+
+ It gets automatically enabled on machines which have the
+ SRSO_USER_KERNEL_NO=1 CPUID bit. In that case, the code logic is to switch
+ to the above =ibpb-vmexit mitigation because the user/kernel boundary is
+ not affected anymore and thus "safe RET" is not needed.
+
+ After enabling the IBPB on VMEXIT mitigation option, the BpSpecReduce bit
+ is detected (functionality present on all such machines) and that
+ practically overrides IBPB on VMEXIT as it has a lot less performance
+ impact and takes care of the guest->host attack vector too.
+
+ Currently, the mitigation uses KVM's user_return approach
+ (kvm_set_user_return_msr()) to set the BpSpecReduce bit when a vCPU runs
+ a guest and reset it upon return to host userspace or when the KVM module
+ is unloaded. The intent being, the small perf impact of BpSpecReduce should
+ be incurred only when really necessary.
In order to exploit vulnerability, an attacker needs to:
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 508c0dad116bc..c46754298507b 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -468,6 +468,7 @@
#define X86_FEATURE_IBPB_BRTYPE (20*32+28) /* MSR_PRED_CMD[IBPB] flushes all branch type predictions */
#define X86_FEATURE_SRSO_NO (20*32+29) /* CPU is not affected by SRSO */
#define X86_FEATURE_SRSO_USER_KERNEL_NO (20*32+30) /* CPU is not affected by SRSO across user/kernel boundaries */
+#define X86_FEATURE_SRSO_BP_SPEC_REDUCE (20*32+31) /* BP_CFG[BpSpecReduce] can be used to mitigate SRSO for VMs (SRSO_MSR_FIX in AMD docs). */
/*
* Extended auxiliary flags: Linux defined - for features scattered in various
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index a5d0998d76049..d2007dbfcc1cc 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -2522,6 +2522,7 @@ enum srso_mitigation {
SRSO_MITIGATION_SAFE_RET,
SRSO_MITIGATION_IBPB,
SRSO_MITIGATION_IBPB_ON_VMEXIT,
+ SRSO_MITIGATION_BP_SPEC_REDUCE,
};
enum srso_mitigation_cmd {
@@ -2539,7 +2540,8 @@ static const char * const srso_strings[] = {
[SRSO_MITIGATION_MICROCODE] = "Vulnerable: Microcode, no safe RET",
[SRSO_MITIGATION_SAFE_RET] = "Mitigation: Safe RET",
[SRSO_MITIGATION_IBPB] = "Mitigation: IBPB",
- [SRSO_MITIGATION_IBPB_ON_VMEXIT] = "Mitigation: IBPB on VMEXIT only"
+ [SRSO_MITIGATION_IBPB_ON_VMEXIT] = "Mitigation: IBPB on VMEXIT only",
+ [SRSO_MITIGATION_BP_SPEC_REDUCE] = "Mitigation: Reduced Speculation"
};
static enum srso_mitigation srso_mitigation __ro_after_init = SRSO_MITIGATION_NONE;
@@ -2578,7 +2580,7 @@ static void __init srso_select_mitigation(void)
srso_cmd == SRSO_CMD_OFF) {
if (boot_cpu_has(X86_FEATURE_SBPB))
x86_pred_cmd = PRED_CMD_SBPB;
- return;
+ goto out;
}
if (has_microcode) {
@@ -2590,7 +2592,7 @@ static void __init srso_select_mitigation(void)
*/
if (boot_cpu_data.x86 < 0x19 && !cpu_smt_possible()) {
setup_force_cpu_cap(X86_FEATURE_SRSO_NO);
- return;
+ goto out;
}
if (retbleed_mitigation == RETBLEED_MITIGATION_IBPB) {
@@ -2670,6 +2672,12 @@ static void __init srso_select_mitigation(void)
ibpb_on_vmexit:
case SRSO_CMD_IBPB_ON_VMEXIT:
+ if (boot_cpu_has(X86_FEATURE_SRSO_BP_SPEC_REDUCE)) {
+ pr_notice("Reducing speculation to address VM/HV SRSO attack vector.\n");
+ srso_mitigation = SRSO_MITIGATION_BP_SPEC_REDUCE;
+ break;
+ }
+
if (IS_ENABLED(CONFIG_MITIGATION_IBPB_ENTRY)) {
if (has_microcode) {
setup_force_cpu_cap(X86_FEATURE_IBPB_ON_VMEXIT);
@@ -2691,7 +2699,12 @@ static void __init srso_select_mitigation(void)
}
out:
- pr_info("%s\n", srso_strings[srso_mitigation]);
+
+ if (srso_mitigation != SRSO_MITIGATION_BP_SPEC_REDUCE)
+ setup_clear_cpu_cap(X86_FEATURE_SRSO_BP_SPEC_REDUCE);
+
+ if (srso_mitigation != SRSO_MITIGATION_NONE)
+ pr_info("%s\n", srso_strings[srso_mitigation]);
}
#undef pr_fmt
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7640a84e554a6..6ea3632af5807 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -257,6 +257,7 @@ DEFINE_PER_CPU(struct svm_cpu_data, svm_data);
* defer the restoration of TSC_AUX until the CPU returns to userspace.
*/
static int tsc_aux_uret_slot __read_mostly = -1;
+static int zen4_bp_cfg_uret_slot __ro_after_init = -1;
static const u32 msrpm_ranges[] = {0, 0xc0000000, 0xc0010000};
@@ -1540,6 +1541,11 @@ static void svm_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
(!boot_cpu_has(X86_FEATURE_V_TSC_AUX) || !sev_es_guest(vcpu->kvm)))
kvm_set_user_return_msr(tsc_aux_uret_slot, svm->tsc_aux, -1ull);
+ if (cpu_feature_enabled(X86_FEATURE_SRSO_BP_SPEC_REDUCE))
+ kvm_set_user_return_msr(zen4_bp_cfg_uret_slot,
+ BIT_ULL(MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT),
+ BIT_ULL(MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT));
+
svm->guest_state_loaded = true;
}
@@ -5306,6 +5312,14 @@ static __init int svm_hardware_setup(void)
tsc_aux_uret_slot = kvm_add_user_return_msr(MSR_TSC_AUX);
+ if (cpu_feature_enabled(X86_FEATURE_SRSO_BP_SPEC_REDUCE)) {
+ zen4_bp_cfg_uret_slot = kvm_add_user_return_msr(MSR_ZEN4_BP_CFG);
+ if (WARN_ON_ONCE(zen4_bp_cfg_uret_slot < 0)) {
+ r = -EIO;
+ goto err;
+ }
+ }
+
if (boot_cpu_has(X86_FEATURE_AUTOIBRS))
kvm_enable_efer_bits(EFER_AUTOIBRS);
diff --git a/tools/arch/x86/include/asm/msr-index.h b/tools/arch/x86/include/asm/msr-index.h
index 3ae84c3b8e6db..1372a569fb585 100644
--- a/tools/arch/x86/include/asm/msr-index.h
+++ b/tools/arch/x86/include/asm/msr-index.h
@@ -717,6 +717,7 @@
/* Zen4 */
#define MSR_ZEN4_BP_CFG 0xc001102e
+#define MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT 4
#define MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT 5
/* Fam 19h MSRs */
--
2.48.1.601.g30ceb7b040-goog
And, of course, this bit:
> diff --git a/tools/arch/x86/include/asm/msr-index.h b/tools/arch/x86/include/asm/msr-index.h
> index 3ae84c3b8e6db..1372a569fb585 100644
> --- a/tools/arch/x86/include/asm/msr-index.h
> +++ b/tools/arch/x86/include/asm/msr-index.h
> @@ -717,6 +717,7 @@
> >
> /* Zen4 */
> #define MSR_ZEN4_BP_CFG 0xc001102e
> +#define MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT 4
has to be added to arch/x86/include/asm/msr-index.h as well.
Following is (yet another) updated version accounting for this.
Best,
Patrick
---
From: Borislav Petkov <bp@alien8.de>
Add support for CPUID Fn8000_0021_EAX[31] (SRSO_MSR_FIX). If this bit is 1,
it indicates that software may use MSR BP_CFG[BpSpecReduce] to mitigate SRSO.

Enable BpSpecReduce to mitigate SRSO across guest/host boundaries.
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Patrick Bellasi <derkling@google.com>
---
Documentation/admin-guide/hw-vuln/srso.rst | 20 ++++++++++++++++++++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/cpu/bugs.c | 21 +++++++++++++++++----
arch/x86/kvm/svm/svm.c | 14 ++++++++++++++
tools/arch/x86/include/asm/msr-index.h | 1 +
6 files changed, 54 insertions(+), 4 deletions(-)
diff --git a/Documentation/admin-guide/hw-vuln/srso.rst b/Documentation/admin-guide/hw-vuln/srso.rst
index 2ad1c05b8c883..49680ab99c393 100644
--- a/Documentation/admin-guide/hw-vuln/srso.rst
+++ b/Documentation/admin-guide/hw-vuln/srso.rst
@@ -104,6 +104,26 @@ The possible values in this file are:
(spec_rstack_overflow=ibpb-vmexit)
+ * 'Mitigation: Reduced Speculation':
+
+ This mitigation gets automatically enabled when the above one "IBPB on
+ VMEXIT" has been selected and the CPU supports the BpSpecReduce bit.
+
+ It gets automatically enabled on machines which have the
+ SRSO_USER_KERNEL_NO=1 CPUID bit. In that case, the code logic is to switch
+ to the above =ibpb-vmexit mitigation because the user/kernel boundary is
+ not affected anymore and thus "safe RET" is not needed.
+
+ After enabling the IBPB on VMEXIT mitigation option, the BpSpecReduce bit
+ is detected (functionality present on all such machines) and that
+ practically overrides IBPB on VMEXIT as it has a lot less performance
+ impact and takes care of the guest->host attack vector too.
+
+ Currently, the mitigation uses KVM's user_return approach
+ (kvm_set_user_return_msr()) to set the BpSpecReduce bit when a vCPU runs
+ a guest and reset it upon return to host userspace or when the KVM module
+ is unloaded. The intent being, the small perf impact of BpSpecReduce should
+ be incurred only when really necessary.
In order to exploit vulnerability, an attacker needs to:
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 508c0dad116bc..c46754298507b 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -468,6 +468,7 @@
#define X86_FEATURE_IBPB_BRTYPE (20*32+28) /* MSR_PRED_CMD[IBPB] flushes all branch type predictions */
#define X86_FEATURE_SRSO_NO (20*32+29) /* CPU is not affected by SRSO */
#define X86_FEATURE_SRSO_USER_KERNEL_NO (20*32+30) /* CPU is not affected by SRSO across user/kernel boundaries */
+#define X86_FEATURE_SRSO_BP_SPEC_REDUCE (20*32+31) /* BP_CFG[BpSpecReduce] can be used to mitigate SRSO for VMs (SRSO_MSR_FIX in AMD docs). */
/*
* Extended auxiliary flags: Linux defined - for features scattered in various
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 9a71880eec070..6bbc8836d6766 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -720,6 +720,7 @@
/* Zen4 */
#define MSR_ZEN4_BP_CFG 0xc001102e
+#define MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT 4
#define MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT 5
/* Fam 19h MSRs */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index a5d0998d76049..d2007dbfcc1cc 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -2522,6 +2522,7 @@ enum srso_mitigation {
SRSO_MITIGATION_SAFE_RET,
SRSO_MITIGATION_IBPB,
SRSO_MITIGATION_IBPB_ON_VMEXIT,
+ SRSO_MITIGATION_BP_SPEC_REDUCE,
};
enum srso_mitigation_cmd {
@@ -2539,7 +2540,8 @@ static const char * const srso_strings[] = {
[SRSO_MITIGATION_MICROCODE] = "Vulnerable: Microcode, no safe RET",
[SRSO_MITIGATION_SAFE_RET] = "Mitigation: Safe RET",
[SRSO_MITIGATION_IBPB] = "Mitigation: IBPB",
- [SRSO_MITIGATION_IBPB_ON_VMEXIT] = "Mitigation: IBPB on VMEXIT only"
+ [SRSO_MITIGATION_IBPB_ON_VMEXIT] = "Mitigation: IBPB on VMEXIT only",
+ [SRSO_MITIGATION_BP_SPEC_REDUCE] = "Mitigation: Reduced Speculation"
};
static enum srso_mitigation srso_mitigation __ro_after_init = SRSO_MITIGATION_NONE;
@@ -2578,7 +2580,7 @@ static void __init srso_select_mitigation(void)
srso_cmd == SRSO_CMD_OFF) {
if (boot_cpu_has(X86_FEATURE_SBPB))
x86_pred_cmd = PRED_CMD_SBPB;
- return;
+ goto out;
}
if (has_microcode) {
@@ -2590,7 +2592,7 @@ static void __init srso_select_mitigation(void)
*/
if (boot_cpu_data.x86 < 0x19 && !cpu_smt_possible()) {
setup_force_cpu_cap(X86_FEATURE_SRSO_NO);
- return;
+ goto out;
}
if (retbleed_mitigation == RETBLEED_MITIGATION_IBPB) {
@@ -2670,6 +2672,12 @@ static void __init srso_select_mitigation(void)
ibpb_on_vmexit:
case SRSO_CMD_IBPB_ON_VMEXIT:
+ if (boot_cpu_has(X86_FEATURE_SRSO_BP_SPEC_REDUCE)) {
+ pr_notice("Reducing speculation to address VM/HV SRSO attack vector.\n");
+ srso_mitigation = SRSO_MITIGATION_BP_SPEC_REDUCE;
+ break;
+ }
+
if (IS_ENABLED(CONFIG_MITIGATION_IBPB_ENTRY)) {
if (has_microcode) {
setup_force_cpu_cap(X86_FEATURE_IBPB_ON_VMEXIT);
@@ -2691,7 +2699,12 @@ static void __init srso_select_mitigation(void)
}
out:
- pr_info("%s\n", srso_strings[srso_mitigation]);
+
+ if (srso_mitigation != SRSO_MITIGATION_BP_SPEC_REDUCE)
+ setup_clear_cpu_cap(X86_FEATURE_SRSO_BP_SPEC_REDUCE);
+
+ if (srso_mitigation != SRSO_MITIGATION_NONE)
+ pr_info("%s\n", srso_strings[srso_mitigation]);
}
#undef pr_fmt
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7640a84e554a6..6ea3632af5807 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -257,6 +257,7 @@ DEFINE_PER_CPU(struct svm_cpu_data, svm_data);
* defer the restoration of TSC_AUX until the CPU returns to userspace.
*/
static int tsc_aux_uret_slot __read_mostly = -1;
+static int zen4_bp_cfg_uret_slot __ro_after_init = -1;
static const u32 msrpm_ranges[] = {0, 0xc0000000, 0xc0010000};
@@ -1540,6 +1541,11 @@ static void svm_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
(!boot_cpu_has(X86_FEATURE_V_TSC_AUX) || !sev_es_guest(vcpu->kvm)))
kvm_set_user_return_msr(tsc_aux_uret_slot, svm->tsc_aux, -1ull);
+ if (cpu_feature_enabled(X86_FEATURE_SRSO_BP_SPEC_REDUCE))
+ kvm_set_user_return_msr(zen4_bp_cfg_uret_slot,
+ BIT_ULL(MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT),
+ BIT_ULL(MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT));
+
svm->guest_state_loaded = true;
}
@@ -5306,6 +5312,14 @@ static __init int svm_hardware_setup(void)
tsc_aux_uret_slot = kvm_add_user_return_msr(MSR_TSC_AUX);
+ if (cpu_feature_enabled(X86_FEATURE_SRSO_BP_SPEC_REDUCE)) {
+ zen4_bp_cfg_uret_slot = kvm_add_user_return_msr(MSR_ZEN4_BP_CFG);
+ if (WARN_ON_ONCE(zen4_bp_cfg_uret_slot < 0)) {
+ r = -EIO;
+ goto err;
+ }
+ }
+
if (boot_cpu_has(X86_FEATURE_AUTOIBRS))
kvm_enable_efer_bits(EFER_AUTOIBRS);
diff --git a/tools/arch/x86/include/asm/msr-index.h b/tools/arch/x86/include/asm/msr-index.h
index 3ae84c3b8e6db..1372a569fb585 100644
--- a/tools/arch/x86/include/asm/msr-index.h
+++ b/tools/arch/x86/include/asm/msr-index.h
@@ -717,6 +717,7 @@
/* Zen4 */
#define MSR_ZEN4_BP_CFG 0xc001102e
+#define MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT 4
#define MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT 5
/* Fam 19h MSRs */
--
2.48.1.601.g30ceb7b040-goog
+ bjackman.
On Thu, Feb 13, 2025 at 01:44:08PM +0000, Patrick Bellasi wrote:
> Following is (yet another) updated version accounting for this.
Here's a tested one. You could've simply waited as I told you I'm working on
it.
I'm going to queue this one next week into tip once Patrick's fix becomes part
of -rc3.
Thx.
---
From: "Borislav Petkov (AMD)" <bp@alien8.de>
Date: Wed, 13 Nov 2024 13:41:10 +0100
Subject: [PATCH] x86/bugs: KVM: Add support for SRSO_MSR_FIX
Add support for CPUID Fn8000_0021_EAX[31] (SRSO_MSR_FIX). If this bit is 1,
it indicates that software may use MSR BP_CFG[BpSpecReduce] to mitigate SRSO.

Enable BpSpecReduce to mitigate SRSO across guest/host boundaries.
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
---
Documentation/admin-guide/hw-vuln/srso.rst | 21 +++++++++++++++++++
arch/x86/include/asm/cpufeatures.h | 4 ++++
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/cpu/bugs.c | 24 ++++++++++++++++++----
arch/x86/kvm/svm/svm.c | 14 +++++++++++++
5 files changed, 60 insertions(+), 4 deletions(-)
diff --git a/Documentation/admin-guide/hw-vuln/srso.rst b/Documentation/admin-guide/hw-vuln/srso.rst
index 2ad1c05b8c88..b856538083a2 100644
--- a/Documentation/admin-guide/hw-vuln/srso.rst
+++ b/Documentation/admin-guide/hw-vuln/srso.rst
@@ -104,6 +104,27 @@ The possible values in this file are:
(spec_rstack_overflow=ibpb-vmexit)
+ * 'Mitigation: Reduced Speculation':
+
+ This mitigation gets automatically enabled when the above one "IBPB on
+ VMEXIT" has been selected and the CPU supports the BpSpecReduce bit.
+
+ It gets automatically enabled on machines which have the
+ SRSO_USER_KERNEL_NO=1 CPUID bit. In that case, the code logic is to switch
+ to the above =ibpb-vmexit mitigation because the user/kernel boundary is
+ not affected anymore and thus "safe RET" is not needed.
+
+ After enabling the IBPB on VMEXIT mitigation option, the BpSpecReduce bit
+ is detected (functionality present on all such machines) and that
+ practically overrides IBPB on VMEXIT as it has a lot less performance
+ impact and takes care of the guest->host attack vector too.
+
+ Currently, the mitigation uses KVM's user_return approach
+ (kvm_set_user_return_msr()) to set the BpSpecReduce bit when a vCPU runs
+ a guest and reset it upon return to host userspace or when the KVM module
+ is unloaded. The intent being, the small perf impact of BpSpecReduce should
+ be incurred only when really necessary.
+
In order to exploit vulnerability, an attacker needs to:
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 508c0dad116b..43653f2704c9 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -468,6 +468,10 @@
#define X86_FEATURE_IBPB_BRTYPE (20*32+28) /* MSR_PRED_CMD[IBPB] flushes all branch type predictions */
#define X86_FEATURE_SRSO_NO (20*32+29) /* CPU is not affected by SRSO */
#define X86_FEATURE_SRSO_USER_KERNEL_NO (20*32+30) /* CPU is not affected by SRSO across user/kernel boundaries */
+#define X86_FEATURE_SRSO_BP_SPEC_REDUCE (20*32+31) /*
+ * BP_CFG[BpSpecReduce] can be used to mitigate SRSO for VMs.
+ * (SRSO_MSR_FIX in the official doc).
+ */
/*
* Extended auxiliary flags: Linux defined - for features scattered in various
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 72765b2fe0d8..d35519b337ba 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -721,6 +721,7 @@
/* Zen4 */
#define MSR_ZEN4_BP_CFG 0xc001102e
+#define MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT 4
#define MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT 5
/* Fam 19h MSRs */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index a5d0998d7604..1d7afc40f227 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -2522,6 +2522,7 @@ enum srso_mitigation {
SRSO_MITIGATION_SAFE_RET,
SRSO_MITIGATION_IBPB,
SRSO_MITIGATION_IBPB_ON_VMEXIT,
+ SRSO_MITIGATION_BP_SPEC_REDUCE,
};
enum srso_mitigation_cmd {
@@ -2539,7 +2540,8 @@ static const char * const srso_strings[] = {
[SRSO_MITIGATION_MICROCODE] = "Vulnerable: Microcode, no safe RET",
[SRSO_MITIGATION_SAFE_RET] = "Mitigation: Safe RET",
[SRSO_MITIGATION_IBPB] = "Mitigation: IBPB",
- [SRSO_MITIGATION_IBPB_ON_VMEXIT] = "Mitigation: IBPB on VMEXIT only"
+ [SRSO_MITIGATION_IBPB_ON_VMEXIT] = "Mitigation: IBPB on VMEXIT only",
+ [SRSO_MITIGATION_BP_SPEC_REDUCE] = "Mitigation: Reduced Speculation"
};
static enum srso_mitigation srso_mitigation __ro_after_init = SRSO_MITIGATION_NONE;
@@ -2578,7 +2580,7 @@ static void __init srso_select_mitigation(void)
srso_cmd == SRSO_CMD_OFF) {
if (boot_cpu_has(X86_FEATURE_SBPB))
x86_pred_cmd = PRED_CMD_SBPB;
- return;
+ goto out;
}
if (has_microcode) {
@@ -2590,7 +2592,7 @@ static void __init srso_select_mitigation(void)
*/
if (boot_cpu_data.x86 < 0x19 && !cpu_smt_possible()) {
setup_force_cpu_cap(X86_FEATURE_SRSO_NO);
- return;
+ goto out;
}
if (retbleed_mitigation == RETBLEED_MITIGATION_IBPB) {
@@ -2670,6 +2672,12 @@ static void __init srso_select_mitigation(void)
ibpb_on_vmexit:
case SRSO_CMD_IBPB_ON_VMEXIT:
+ if (boot_cpu_has(X86_FEATURE_SRSO_BP_SPEC_REDUCE)) {
+ pr_notice("Reducing speculation to address VM/HV SRSO attack vector.\n");
+ srso_mitigation = SRSO_MITIGATION_BP_SPEC_REDUCE;
+ break;
+ }
+
if (IS_ENABLED(CONFIG_MITIGATION_IBPB_ENTRY)) {
if (has_microcode) {
setup_force_cpu_cap(X86_FEATURE_IBPB_ON_VMEXIT);
@@ -2691,7 +2699,15 @@ static void __init srso_select_mitigation(void)
}
out:
- pr_info("%s\n", srso_strings[srso_mitigation]);
+ /*
+ * Clear the feature flag if this mitigation is not selected as that
+ * feature flag controls the BpSpecReduce MSR bit toggling in KVM.
+ */
+ if (srso_mitigation != SRSO_MITIGATION_BP_SPEC_REDUCE)
+ setup_clear_cpu_cap(X86_FEATURE_SRSO_BP_SPEC_REDUCE);
+
+ if (srso_mitigation != SRSO_MITIGATION_NONE)
+ pr_info("%s\n", srso_strings[srso_mitigation]);
}
#undef pr_fmt
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7640a84e554a..6ea3632af580 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -257,6 +257,7 @@ DEFINE_PER_CPU(struct svm_cpu_data, svm_data);
* defer the restoration of TSC_AUX until the CPU returns to userspace.
*/
static int tsc_aux_uret_slot __read_mostly = -1;
+static int zen4_bp_cfg_uret_slot __ro_after_init = -1;
static const u32 msrpm_ranges[] = {0, 0xc0000000, 0xc0010000};
@@ -1540,6 +1541,11 @@ static void svm_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
(!boot_cpu_has(X86_FEATURE_V_TSC_AUX) || !sev_es_guest(vcpu->kvm)))
kvm_set_user_return_msr(tsc_aux_uret_slot, svm->tsc_aux, -1ull);
+ if (cpu_feature_enabled(X86_FEATURE_SRSO_BP_SPEC_REDUCE))
+ kvm_set_user_return_msr(zen4_bp_cfg_uret_slot,
+ BIT_ULL(MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT),
+ BIT_ULL(MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT));
+
svm->guest_state_loaded = true;
}
@@ -5306,6 +5312,14 @@ static __init int svm_hardware_setup(void)
tsc_aux_uret_slot = kvm_add_user_return_msr(MSR_TSC_AUX);
+ if (cpu_feature_enabled(X86_FEATURE_SRSO_BP_SPEC_REDUCE)) {
+ zen4_bp_cfg_uret_slot = kvm_add_user_return_msr(MSR_ZEN4_BP_CFG);
+ if (WARN_ON_ONCE(zen4_bp_cfg_uret_slot < 0)) {
+ r = -EIO;
+ goto err;
+ }
+ }
+
if (boot_cpu_has(X86_FEATURE_AUTOIBRS))
kvm_enable_efer_bits(EFER_AUTOIBRS);
--
2.43.0
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
> On Thu, Feb 13, 2025 at 01:44:08PM +0000, Patrick Bellasi wrote:
> > Following is (yet another) updated version accounting for this.
> >
> Here's a tested one. You could've simply waited as I told you I'm working on
> it.
Thanks Borislav, no problems for me, I'm just looking into these SRSO fixes and
what I did was mostly a way to ramp up.
> I'm going to queue this one next week into tip once Patrick's fix becomes part
> of -rc3.
Thanks for all the efforts.
Meanwhile I've a couple of questions about the approach proposed here.
> + * 'Mitigation: Reduced Speculation':
> +
> + This mitigation gets automatically enabled when the above one "IBPB on
> + VMEXIT" has been selected and the CPU supports the BpSpecReduce bit.
> +
> + It gets automatically enabled on machines which have the
> + SRSO_USER_KERNEL_NO=1 CPUID bit. In that case, the code logic is to switch
> + to the above =ibpb-vmexit mitigation because the user/kernel boundary is
> + not affected anymore and thus "safe RET" is not needed.
> +
> + After enabling the IBPB on VMEXIT mitigation option, the BpSpecReduce bit
> + is detected (functionality present on all such machines) and that
> + practically overrides IBPB on VMEXIT as it has a lot less performance
> + impact and takes care of the guest->host attack vector too.
> +
> + Currently, the mitigation uses KVM's user_return approach
> + (kvm_set_user_return_msr()) to set the BpSpecReduce bit when a vCPU runs
> + a guest and reset it upon return to host userspace or when the KVM module
> + is unloaded. The intent being, the small perf impact of BpSpecReduce should
> + be incurred only when really necessary.
According to [1], the AMD Whitepaper on SRSO says:
Processors which set SRSO_MSR_FIX=1 support an MSR bit which mitigates SRSO
across guest/host boundaries. Software may enable this by setting bit 4
(BpSpecReduce) of MSR C001_102E. This bit can be set once during boot and
should be set identically across all processors in the system.
The "should be set identically across all processors in the system" makes me
wondering if using the "KVM's user_return approach" proposed here is robust
enough. Could this not lead to the bit being possibly set only on some CPU
but not others?
Am I missing something? Is the MSR something "shared" across all the cores
of a CPU, i.e. as long as a vCPU is running the entire system is still protected?
Maybe there is a more recent whitepaper online (which I did not find) with
a different recommendation?
Also, is setting the MSR bit only while the guest is running sufficient to
prevent both speculation and training?
Think about this scenario:
(a) Guest runs with BpSpecReduce MSR set by KVM and performs training.
(b) We exit into userspace and clear BpSpecReduce.
(c) We go back into the kernel with BpSpecReduce cleared and the guest
training intact.
If (a) can be done (i.e. the MSR does not prevent training), don't we end up in
a vulnerable state in (c)?
If BpSpecReduce does not prevent training, but only prevents the training from
being used, shouldn't we keep it set consistently after a guest has run, or at
least until an IBPB is executed?
Best,
Patrick
[1] https://www.amd.com/content/dam/amd/en/documents/corporate/cr/speculative-return-stack-overflow-whitepaper.pdf
On Thu, Feb 13, 2025 at 05:50:57PM +0000, Patrick Bellasi wrote:
> The "should be set identically across all processors in the system" makes me
> wondering if using the "KVM's user_return approach" proposed here is robust
> enough. Could this not lead to the bit being possibly set only on some CPU
> but not others?
That's fine, we should update that paper.
> If BpSpecReduce does not prevent training, but only the training from being
> used, should not we keep it consistently set after a guest has run, or until an
> IBPB is executed?
After talking with folks internally, you're probably right. We should slap an
IBPB before clearing. Which means, I cannot use the MSR return slots anymore.
I will have to resurrect some of the other solutions we had lined up...
Stay tuned.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Feb 14, 2025 at 09:10:05PM +0100, Borislav Petkov wrote:
> After talking with folks internally, you're probably right. We should slap an
> IBPB before clearing. Which means, I cannot use the MSR return slots anymore.
> I will have to resurrect some of the other solutions we had lined up...
So I'm thinking about this (totally untested ofc) but I'm doing it in the CLGI
region so no need to worry about migration etc.
Sean, that ok this way?
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 6ea3632af580..dcc4e5935b82 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4272,8 +4272,16 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu,
if (!static_cpu_has(X86_FEATURE_V_SPEC_CTRL))
x86_spec_ctrl_set_guest(svm->virt_spec_ctrl);
+ if (cpu_feature_enabled(X86_FEATURE_SRSO_BP_SPEC_REDUCE))
+ msr_set_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT);
+
svm_vcpu_enter_exit(vcpu, spec_ctrl_intercepted);
+ if (cpu_feature_enabled(X86_FEATURE_SRSO_BP_SPEC_REDUCE)) {
+ indirect_branch_prediction_barrier();
+ msr_clear_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT);
+ }
+
if (!static_cpu_has(X86_FEATURE_V_SPEC_CTRL))
x86_spec_ctrl_restore_host(svm->virt_spec_ctrl);
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Sat, Feb 15, 2025 at 01:53:07PM +0100, Borislav Petkov wrote:
> On Fri, Feb 14, 2025 at 09:10:05PM +0100, Borislav Petkov wrote:
> > After talking with folks internally, you're probably right. We should slap an
> > IBPB before clearing. Which means, I cannot use the MSR return slots anymore.
> > I will have to resurrect some of the other solutions we had lined up...
>
> So I'm thinking about this (totally untested ofc) but I'm doing it in the CLGI
> region so no need to worry about migration etc.
>
> Sean, that ok this way?
I am no Sean, but I have some questions about this approach if that's
okay :)
First of all, the use of indirect_branch_prediction_barrier() is
interesting to me because it only actually performs an IBPB if
X86_FEATURE_USE_IBPB is set, which is set according to the spectre_v2
mitigation. It seems to me that indirect_branch_prediction_barrier() was
originally intended for use only in switch_mm_irqs_off() ->
cond_mitigation(), where the spectre_v2 mitigations are executed, then
made its way to other places like KVM.
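(For context, a rough sketch of how that helper was gated at the time --
paraphrased from arch/x86/include/asm/nospec-branch.h of that era, so treat the
details as approximate rather than verbatim:)

	static inline void indirect_branch_prediction_barrier(void)
	{
		/* Patched out to a no-op unless X86_FEATURE_USE_IBPB is set. */
		alternative_msr_write(MSR_IA32_PRED_CMD, PRED_CMD_IBPB,
				      X86_FEATURE_USE_IBPB);
	}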
Second, and probably more importantly, it seems like with this approach
we will essentially be doing an IBPB on every VM-exit AND running the
guest with reduced speculation. At least on the surface, this looks
worse than an IBPB on VM-exit. My understanding is that this MSR is
intended to be a more efficient mitigation than IBPB on VM-exit.
This probably performs considerably worse than the previous approaches,
so I am wondering which approach the 1-2% regression figure is
associated with.
If 1-2% is the cost for keeping the MSR enabled at all times, I wonder
if we should just do that for simplicity, and make it its own
mitigation option (chosen by the cmdline).
Alternatively, if that's too expensive, perhaps we can choose another
boundary to clear the MSR at and perform an IBPB. I can think of two
places:
- Upon return to userspace (similar to your previous proposal). In this
case we run userspace with the MSR cleared, and only perform an IBPB
in the exit to userspace path.
- In the switch_mm() path (around cond_mitigation()). Basically we keep
the MSR bit set until we go to a different process, at which point we
clear it and do an IBPB. The MSR will be set while the VMM is
running, but other processes in the system won't take the hit. We can
even be smarter and only do the MSR clear + IBPB if we're going from
a process using KVM to a process that isn't. We'll need more bookkeeping
for that though.
Any thoughts? Am I completely off base?
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 6ea3632af580..dcc4e5935b82 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4272,8 +4272,16 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu,
> if (!static_cpu_has(X86_FEATURE_V_SPEC_CTRL))
> x86_spec_ctrl_set_guest(svm->virt_spec_ctrl);
>
> + if (cpu_feature_enabled(X86_FEATURE_SRSO_BP_SPEC_REDUCE))
> + msr_set_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT);
> +
> svm_vcpu_enter_exit(vcpu, spec_ctrl_intercepted);
>
> + if (cpu_feature_enabled(X86_FEATURE_SRSO_BP_SPEC_REDUCE)) {
> + indirect_branch_prediction_barrier();
> + msr_clear_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT);
> + }
> +
> if (!static_cpu_has(X86_FEATURE_V_SPEC_CTRL))
> x86_spec_ctrl_restore_host(svm->virt_spec_ctrl);
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
On Sun, Feb 16, 2025 at 09:59:59PM -0800, Yosry Ahmed wrote:
> If 1-2% is the cost for keeping the MSR enabled at all times, I wonder
> if we should just do that for simplicity, and make it its own
> mitigation option (chosen by the cmdline).
Yes, that's what David and I think we should do initially.
Then we can chase some more involved scheme like setting the bit before the
first VM runs and then clearing it when the last one exits. I *think* I've
seen something like that in KVM but I need to stare in detail.
> - Upon return to userspace (similar to your previous proposal). In this
> case we run userspace with the MSR cleared, and only perform an IBPB
> in the exit to userspace path.
You want to IBPB before clearing the MSR as otherwise host kernel will be
running with the mistrained gunk from the guest.
> Any thoughts?
Yes, let's keep it simple and do anything more involved *only* if it is really
necessary.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Mon, Feb 17, 2025 at 05:07:28PM +0100, Borislav Petkov wrote:
> On Sun, Feb 16, 2025 at 09:59:59PM -0800, Yosry Ahmed wrote:
> > If 1-2% is the cost for keeping the MSR enabled at all times, I wonder
> > if we should just do that for simplicity, and make it its own
> > mitigation option (chosen by the cmdline).
>
> Yes, that's what David and I think we should do initially.
>
> Then we can chase some more involved scheme like setting the bit before the
> first VM runs and then clearing it when the last one exits. I *think* I've
> seen something like that in KVM but I need to stare in detail.
>
> > - Upon return to userspace (similar to your previous proposal). In this
> > case we run userspace with the MSR cleared, and only perform an IBPB
> > in the exit to userspace path.
>
> You want to IBPB before clearing the MSR as otherwise host kernel will be
> running with the mistrained gunk from the guest.
I meant IBPB + MSR clear before going to userspace, or IBPB + MSR clear
before a context switch.
> > Any thoughts?
>
> Yes, let's keep it simple and do anything more involved *only* if it is really
> necessary.
On Mon, Feb 17, 2025 at 11:56:22AM -0800, Yosry Ahmed wrote:
> I meant IBPB + MSR clear before going to userspace, or IBPB + MSR clear
> before a context switch.
Basically what I said already:
"Yes, let's keep it simple and do anything more involved *only* if it is
really necessary."
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
February 17, 2025 at 12:20 PM, "Borislav Petkov" <bp@alien8.de> wrote:
> On Mon, Feb 17, 2025 at 11:56:22AM -0800, Yosry Ahmed wrote:
> > I meant IBPB + MSR clear before going to userspace, or IBPB + MSR clear
> > before a context switch.
>
> Basically what I said already:
>
> "Yes, let's keep it simple and do anything more involved *only* if it is
> really necessary."
Right, I agree as I mentioned earlier (assuming the perf hit is 1-2%), just
wanted to clarify what I meant.
So,
in the interest of finally making some progress here I'd like to commit this
below (will test it one more time just in case but it should work :-P). It is
simple and straight-forward and doesn't need an IBPB when the bit gets
cleared.
A potential future improvement is David's suggestion that there could be a way
for tracking when the first guest gets started, we set the bit then, we make
sure the bit gets set on each logical CPU when the guests migrate across the
machine and when the *last* guest exits, that bit gets cleared again.
It all depends on how much machinery is additionally needed, how ugly
it becomes, and how noticeable the perf impact even is from simply setting
that bit blindly on every CPU...
But something we can chase later, once *some* fix is there first.
Thx.
---
Add support for CPUID Fn8000_0021_EAX[31] (SRSO_MSR_FIX). If this bit is
1, it indicates that software may use MSR BP_CFG[BpSpecReduce] to
mitigate SRSO.
Enable BpSpecReduce to mitigate SRSO across guest/host boundaries.
Switch back to enabling the bit when virtualization is enabled and
clearing it when virtualization is disabled, because using an MSR slot
would clear the bit when the guest exits, and any training the guest
has done could then influence the host kernel when execution enters the
kernel but hasn't VMRUN the guest yet.
More detail on the public thread in Link.
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20241202120416.6054-1-bp@kernel.org
---
Documentation/admin-guide/hw-vuln/srso.rst | 13 ++++++++++++
arch/x86/include/asm/cpufeatures.h | 4 ++++
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/cpu/bugs.c | 24 ++++++++++++++++++----
arch/x86/kvm/svm/svm.c | 6 ++++++
arch/x86/lib/msr.c | 2 ++
6 files changed, 46 insertions(+), 4 deletions(-)
diff --git a/Documentation/admin-guide/hw-vuln/srso.rst b/Documentation/admin-guide/hw-vuln/srso.rst
index 2ad1c05b8c88..66af95251a3d 100644
--- a/Documentation/admin-guide/hw-vuln/srso.rst
+++ b/Documentation/admin-guide/hw-vuln/srso.rst
@@ -104,7 +104,20 @@ The possible values in this file are:
(spec_rstack_overflow=ibpb-vmexit)
+ * 'Mitigation: Reduced Speculation':
+ This mitigation gets automatically enabled when the above one "IBPB on
+ VMEXIT" has been selected and the CPU supports the BpSpecReduce bit.
+
+ It gets automatically enabled on machines which have the
+ SRSO_USER_KERNEL_NO=1 CPUID bit. In that case, the code logic is to switch
+ to the above =ibpb-vmexit mitigation because the user/kernel boundary is
+ not affected anymore and thus "safe RET" is not needed.
+
+ After enabling the IBPB on VMEXIT mitigation option, the BpSpecReduce bit
+ is detected (functionality present on all such machines) and that
+ practically overrides IBPB on VMEXIT as it has a lot less performance
+ impact and takes care of the guest->host attack vector too.
In order to exploit vulnerability, an attacker needs to:
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 508c0dad116b..43653f2704c9 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -468,6 +468,10 @@
#define X86_FEATURE_IBPB_BRTYPE (20*32+28) /* MSR_PRED_CMD[IBPB] flushes all branch type predictions */
#define X86_FEATURE_SRSO_NO (20*32+29) /* CPU is not affected by SRSO */
#define X86_FEATURE_SRSO_USER_KERNEL_NO (20*32+30) /* CPU is not affected by SRSO across user/kernel boundaries */
+#define X86_FEATURE_SRSO_BP_SPEC_REDUCE (20*32+31) /*
+ * BP_CFG[BpSpecReduce] can be used to mitigate SRSO for VMs.
+ * (SRSO_MSR_FIX in the official doc).
+ */
/*
* Extended auxiliary flags: Linux defined - for features scattered in various
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 72765b2fe0d8..d35519b337ba 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -721,6 +721,7 @@
/* Zen4 */
#define MSR_ZEN4_BP_CFG 0xc001102e
+#define MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT 4
#define MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT 5
/* Fam 19h MSRs */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index a5d0998d7604..1d7afc40f227 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -2522,6 +2522,7 @@ enum srso_mitigation {
SRSO_MITIGATION_SAFE_RET,
SRSO_MITIGATION_IBPB,
SRSO_MITIGATION_IBPB_ON_VMEXIT,
+ SRSO_MITIGATION_BP_SPEC_REDUCE,
};
enum srso_mitigation_cmd {
@@ -2539,7 +2540,8 @@ static const char * const srso_strings[] = {
[SRSO_MITIGATION_MICROCODE] = "Vulnerable: Microcode, no safe RET",
[SRSO_MITIGATION_SAFE_RET] = "Mitigation: Safe RET",
[SRSO_MITIGATION_IBPB] = "Mitigation: IBPB",
- [SRSO_MITIGATION_IBPB_ON_VMEXIT] = "Mitigation: IBPB on VMEXIT only"
+ [SRSO_MITIGATION_IBPB_ON_VMEXIT] = "Mitigation: IBPB on VMEXIT only",
+ [SRSO_MITIGATION_BP_SPEC_REDUCE] = "Mitigation: Reduced Speculation"
};
static enum srso_mitigation srso_mitigation __ro_after_init = SRSO_MITIGATION_NONE;
@@ -2578,7 +2580,7 @@ static void __init srso_select_mitigation(void)
srso_cmd == SRSO_CMD_OFF) {
if (boot_cpu_has(X86_FEATURE_SBPB))
x86_pred_cmd = PRED_CMD_SBPB;
- return;
+ goto out;
}
if (has_microcode) {
@@ -2590,7 +2592,7 @@ static void __init srso_select_mitigation(void)
*/
if (boot_cpu_data.x86 < 0x19 && !cpu_smt_possible()) {
setup_force_cpu_cap(X86_FEATURE_SRSO_NO);
- return;
+ goto out;
}
if (retbleed_mitigation == RETBLEED_MITIGATION_IBPB) {
@@ -2670,6 +2672,12 @@ static void __init srso_select_mitigation(void)
ibpb_on_vmexit:
case SRSO_CMD_IBPB_ON_VMEXIT:
+ if (boot_cpu_has(X86_FEATURE_SRSO_BP_SPEC_REDUCE)) {
+ pr_notice("Reducing speculation to address VM/HV SRSO attack vector.\n");
+ srso_mitigation = SRSO_MITIGATION_BP_SPEC_REDUCE;
+ break;
+ }
+
if (IS_ENABLED(CONFIG_MITIGATION_IBPB_ENTRY)) {
if (has_microcode) {
setup_force_cpu_cap(X86_FEATURE_IBPB_ON_VMEXIT);
@@ -2691,7 +2699,15 @@ static void __init srso_select_mitigation(void)
}
out:
- pr_info("%s\n", srso_strings[srso_mitigation]);
+ /*
+ * Clear the feature flag if this mitigation is not selected as that
+ * feature flag controls the BpSpecReduce MSR bit toggling in KVM.
+ */
+ if (srso_mitigation != SRSO_MITIGATION_BP_SPEC_REDUCE)
+ setup_clear_cpu_cap(X86_FEATURE_SRSO_BP_SPEC_REDUCE);
+
+ if (srso_mitigation != SRSO_MITIGATION_NONE)
+ pr_info("%s\n", srso_strings[srso_mitigation]);
}
#undef pr_fmt
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a713c803a3a3..77ab66c5bb96 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -607,6 +607,9 @@ static void svm_disable_virtualization_cpu(void)
kvm_cpu_svm_disable();
amd_pmu_disable_virt();
+
+ if (cpu_feature_enabled(X86_FEATURE_SRSO_BP_SPEC_REDUCE))
+ msr_clear_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT);
}
static int svm_enable_virtualization_cpu(void)
@@ -684,6 +687,9 @@ static int svm_enable_virtualization_cpu(void)
rdmsr(MSR_TSC_AUX, sev_es_host_save_area(sd)->tsc_aux, msr_hi);
}
+ if (cpu_feature_enabled(X86_FEATURE_SRSO_BP_SPEC_REDUCE))
+ msr_set_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT);
+
return 0;
}
diff --git a/arch/x86/lib/msr.c b/arch/x86/lib/msr.c
index 4bf4fad5b148..5a18ecc04a6c 100644
--- a/arch/x86/lib/msr.c
+++ b/arch/x86/lib/msr.c
@@ -103,6 +103,7 @@ int msr_set_bit(u32 msr, u8 bit)
{
return __flip_bit(msr, bit, true);
}
+EXPORT_SYMBOL_GPL(msr_set_bit);
/**
* msr_clear_bit - Clear @bit in a MSR @msr.
@@ -118,6 +119,7 @@ int msr_clear_bit(u32 msr, u8 bit)
{
return __flip_bit(msr, bit, false);
}
+EXPORT_SYMBOL_GPL(msr_clear_bit);
#ifdef CONFIG_TRACEPOINTS
void do_trace_write_msr(unsigned int msr, u64 val, int failed)
--
2.43.0
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Tue, Feb 18, 2025 at 12:13:33PM +0100, Borislav Petkov wrote:
> So,
>
> in the interest of finally making some progress here I'd like to commit this
> below (will test it one more time just in case but it should work :-P). It is
> simple and straight-forward and doesn't need an IBPB when the bit gets
> cleared.
>
> A potential future improvement is David's suggestion that there could be a way
> for tracking when the first guest gets started, we set the bit then, we make
> sure the bit gets set on each logical CPU when the guests migrate across the
> machine and when the *last* guest exits, that bit gets cleared again.
Well, that "simplicity" was short-lived:
https://www.phoronix.com/review/linux-615-amd-regression
Sean, how about this below?
It is hacky and RFC-ish - i.e., don't look too hard at it - but basically
I'm pushing down into arch code the decision whether to enable virt on load.
And it has no effect on anything but machines which have this
SRSO_MSR_FIX (Zen5).
And it seems to work here - the MSR is set only when I create a VM - i.e., as
expected.
Thoughts? Better ideas?
Thx.
---
diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 823c0434bbad..6cc8698df1a5 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -16,6 +16,7 @@ BUILD_BUG_ON(1)
KVM_X86_OP(check_processor_compatibility)
KVM_X86_OP(enable_virtualization_cpu)
KVM_X86_OP(disable_virtualization_cpu)
+KVM_X86_OP_OPTIONAL(enable_virt_on_load)
KVM_X86_OP(hardware_unsetup)
KVM_X86_OP(has_emulated_msr)
KVM_X86_OP(vcpu_after_set_cpuid)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3131abcac4f1..c1a29d7fee45 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1664,6 +1664,7 @@ struct kvm_x86_ops {
int (*enable_virtualization_cpu)(void);
void (*disable_virtualization_cpu)(void);
+ bool (*enable_virt_on_load)(void);
cpu_emergency_virt_cb *emergency_disable_virtualization_cpu;
void (*hardware_unsetup)(void);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 67657b3a36ce..dcbba55cb949 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -693,6 +693,16 @@ static int svm_enable_virtualization_cpu(void)
return 0;
}
+static bool svm_enable_virt_on_load(void)
+{
+ bool ret = true;
+
+ if (cpu_feature_enabled(X86_FEATURE_SRSO_BP_SPEC_REDUCE))
+ ret = false;
+
+ return ret;
+}
+
static void svm_cpu_uninit(int cpu)
{
struct svm_cpu_data *sd = per_cpu_ptr(&svm_data, cpu);
@@ -5082,6 +5092,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.hardware_unsetup = svm_hardware_unsetup,
.enable_virtualization_cpu = svm_enable_virtualization_cpu,
.disable_virtualization_cpu = svm_disable_virtualization_cpu,
+ .enable_virt_on_load = svm_enable_virt_on_load,
.emergency_disable_virtualization_cpu = svm_emergency_disable_virtualization_cpu,
.has_emulated_msr = svm_has_emulated_msr,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4c6553985e75..a09dc8cbd59f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12576,9 +12576,15 @@ void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
}
EXPORT_SYMBOL_GPL(kvm_vcpu_deliver_sipi_vector);
-void kvm_arch_enable_virtualization(void)
+bool kvm_arch_enable_virtualization(bool allow_arch_override)
{
+ if (allow_arch_override)
+ if (!kvm_x86_call(enable_virt_on_load)())
+ return false;
+
cpu_emergency_register_virt_callback(kvm_x86_ops.emergency_disable_virtualization_cpu);
+
+ return true;
}
void kvm_arch_disable_virtualization(void)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 291d49b9bf05..4353ef54d45d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1599,7 +1599,7 @@ static inline void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu) {}
* kvm_usage_count, i.e. at the beginning of the generic hardware enabling
* sequence, and at the end of the generic hardware disabling sequence.
*/
-void kvm_arch_enable_virtualization(void);
+bool kvm_arch_enable_virtualization(bool);
void kvm_arch_disable_virtualization(void);
/*
* kvm_arch_{enable,disable}_virtualization_cpu() are called on "every" CPU to
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e85b33a92624..0009661dee1d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -143,8 +143,8 @@ static int kvm_no_compat_open(struct inode *inode, struct file *file)
#define KVM_COMPAT(c) .compat_ioctl = kvm_no_compat_ioctl, \
.open = kvm_no_compat_open
#endif
-static int kvm_enable_virtualization(void);
-static void kvm_disable_virtualization(void);
+static int kvm_enable_virtualization(bool allow_arch_override);
+static void kvm_disable_virtualization(bool allow_arch_override);
static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
@@ -1187,7 +1187,7 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
if (r)
goto out_err_no_arch_destroy_vm;
- r = kvm_enable_virtualization();
+ r = kvm_enable_virtualization(false);
if (r)
goto out_err_no_disable;
@@ -1224,7 +1224,7 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
mmu_notifier_unregister(&kvm->mmu_notifier, current->mm);
#endif
out_err_no_mmu_notifier:
- kvm_disable_virtualization();
+ kvm_disable_virtualization(false);
out_err_no_disable:
kvm_arch_destroy_vm(kvm);
out_err_no_arch_destroy_vm:
@@ -1320,7 +1320,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
#endif
kvm_arch_free_vm(kvm);
preempt_notifier_dec();
- kvm_disable_virtualization();
+ kvm_disable_virtualization(false);
mmdrop(mm);
}
@@ -5489,9 +5489,9 @@ static DEFINE_PER_CPU(bool, virtualization_enabled);
static DEFINE_MUTEX(kvm_usage_lock);
static int kvm_usage_count;
-__weak void kvm_arch_enable_virtualization(void)
+__weak bool kvm_arch_enable_virtualization(bool)
{
-
+ return false;
}
__weak void kvm_arch_disable_virtualization(void)
@@ -5589,8 +5589,9 @@ static struct syscore_ops kvm_syscore_ops = {
.shutdown = kvm_shutdown,
};
-static int kvm_enable_virtualization(void)
+static int kvm_enable_virtualization(bool allow_arch_override)
{
+ bool do_init;
int r;
guard(mutex)(&kvm_usage_lock);
@@ -5598,7 +5599,9 @@ static int kvm_enable_virtualization(void)
if (kvm_usage_count++)
return 0;
- kvm_arch_enable_virtualization();
+ do_init = kvm_arch_enable_virtualization(allow_arch_override);
+ if (!do_init)
+ goto out;
r = cpuhp_setup_state(CPUHP_AP_KVM_ONLINE, "kvm/cpu:online",
kvm_online_cpu, kvm_offline_cpu);
@@ -5631,11 +5634,13 @@ static int kvm_enable_virtualization(void)
cpuhp_remove_state(CPUHP_AP_KVM_ONLINE);
err_cpuhp:
kvm_arch_disable_virtualization();
+
+out:
--kvm_usage_count;
return r;
}
-static void kvm_disable_virtualization(void)
+static void kvm_disable_virtualization(bool allow_arch_override)
{
guard(mutex)(&kvm_usage_lock);
@@ -5650,7 +5655,7 @@ static void kvm_disable_virtualization(void)
static int kvm_init_virtualization(void)
{
if (enable_virt_at_load)
- return kvm_enable_virtualization();
+ return kvm_enable_virtualization(true);
return 0;
}
@@ -5658,10 +5663,10 @@ static int kvm_init_virtualization(void)
static void kvm_uninit_virtualization(void)
{
if (enable_virt_at_load)
- kvm_disable_virtualization();
+ kvm_disable_virtualization(true);
}
#else /* CONFIG_KVM_GENERIC_HARDWARE_ENABLING */
-static int kvm_enable_virtualization(void)
+static int kvm_enable_virtualization(bool allow_arch_override)
{
return 0;
}
@@ -5671,7 +5676,7 @@ static int kvm_init_virtualization(void)
return 0;
}
-static void kvm_disable_virtualization(void)
+static void kvm_disable_virtualization(bool allow_arch_override)
{
}
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Tue, Apr 29, 2025, Borislav Petkov wrote:
> On Tue, Feb 18, 2025 at 12:13:33PM +0100, Borislav Petkov wrote:
> > So,
> >
> > in the interest of finally making some progress here I'd like to commit this
> > below (will test it one more time just in case but it should work :-P). It is
> > simple and straight-forward and doesn't need an IBPB when the bit gets
> > cleared.
> >
> > A potential future improvement is David's suggestion that there could be a way
> > for tracking when the first guest gets started, we set the bit then, we make
> > sure the bit gets set on each logical CPU when the guests migrate across the
> > machine and when the *last* guest exits, that bit gets cleared again.
>
> Well, that "simplicity" was short-lived:
>
> https://www.phoronix.com/review/linux-615-amd-regression
LOL.
> Sean, how about this below?
Eww. That's quite painful, and completely disallowing enable_virt_on_load is
undesirable, e.g. for use cases where the host is (almost) exclusively running
VMs.
Best idea I have is to throw in the towel on getting fancy, and just maintain a
dedicated count in SVM.
Alternatively, we could plumb an arch hook into kvm_create_vm() and kvm_destroy_vm()
that's called when KVM adds/deletes a VM from vm_list, and key off vm_list being
empty. But that adds a lot of boilerplate just to avoid a mutex+count.
I haven't tested on a system with X86_FEATURE_SRSO_BP_SPEC_REDUCE, but did verify
the mechanics by inverting the flag.
--
From: Sean Christopherson <seanjc@google.com>
Date: Wed, 30 Apr 2025 15:34:50 -0700
Subject: [PATCH] KVM: SVM: Set/clear SRSO's BP_SPEC_REDUCE on 0 <=> 1 VM count
transitions
Set the magic BP_SPEC_REDUCE bit to mitigate SRSO when running VMs if and
only if KVM has at least one active VM. Leaving the bit set at all times
unfortunately degrades performance by a wee bit more than expected.
Use a dedicated mutex and counter instead of hooking virtualization
enablement, as changing the behavior of kvm.enable_virt_at_load based on
SRSO_BP_SPEC_REDUCE is painful, and has its own drawbacks, e.g. could
result in performance issues for flows that are sensitive to VM creation
latency.
Fixes: 8442df2b49ed ("x86/bugs: KVM: Add support for SRSO_MSR_FIX")
Reported-by: Michael Larabel <Michael@michaellarabel.com>
Closes: https://www.phoronix.com/review/linux-615-amd-regression
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/svm/svm.c | 39 +++++++++++++++++++++++++++++++++------
1 file changed, 33 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d5d0c5c3300b..fe8866572218 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -607,9 +607,6 @@ static void svm_disable_virtualization_cpu(void)
kvm_cpu_svm_disable();
amd_pmu_disable_virt();
-
- if (cpu_feature_enabled(X86_FEATURE_SRSO_BP_SPEC_REDUCE))
- msr_clear_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT);
}
static int svm_enable_virtualization_cpu(void)
@@ -687,9 +684,6 @@ static int svm_enable_virtualization_cpu(void)
rdmsr(MSR_TSC_AUX, sev_es_host_save_area(sd)->tsc_aux, msr_hi);
}
- if (cpu_feature_enabled(X86_FEATURE_SRSO_BP_SPEC_REDUCE))
- msr_set_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT);
-
return 0;
}
@@ -5032,10 +5026,42 @@ static void svm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
sev_vcpu_deliver_sipi_vector(vcpu, vector);
}
+static DEFINE_MUTEX(srso_lock);
+static int srso_nr_vms;
+
+static void svm_toggle_srso_spec_reduce(void *set)
+{
+ if (set)
+ msr_set_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT);
+ else
+ msr_clear_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT);
+}
+
+static void svm_srso_add_remove_vm(int count)
+{
+ bool set;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SRSO_BP_SPEC_REDUCE))
+ return;
+
+ guard(mutex)(&srso_lock);
+
+ set = !srso_nr_vms;
+ srso_nr_vms += count;
+
+ WARN_ON_ONCE(srso_nr_vms < 0);
+ if (!set && srso_nr_vms)
+ return;
+
+ on_each_cpu(svm_toggle_srso_spec_reduce, (void *)set, 1);
+}
+
static void svm_vm_destroy(struct kvm *kvm)
{
avic_vm_destroy(kvm);
sev_vm_destroy(kvm);
+
+ svm_srso_add_remove_vm(-1);
}
static int svm_vm_init(struct kvm *kvm)
@@ -5061,6 +5087,7 @@ static int svm_vm_init(struct kvm *kvm)
return ret;
}
+ svm_srso_add_remove_vm(1);
return 0;
}
base-commit: f158e1b145f73aae1d3b7e756eb129a15b2b7a90
--
On Wed, Apr 30, 2025 at 04:33:19PM -0700, Sean Christopherson wrote:
> Eww. That's quite painful, and completely disallowing enable_virt_on_load is
> undesirable, e.g. for use cases where the host is (almost) exclusively running
> VMs.
I wanted to stay generic... :-)
> Best idea I have is to throw in the towel on getting fancy, and just maintain a
> dedicated count in SVM.
>
> Alternatively, we could plumb an arch hook into kvm_create_vm() and kvm_destroy_vm()
> that's called when KVM adds/deletes a VM from vm_list, and key off vm_list being
> empty. But that adds a lot of boilerplate just to avoid a mutex+count.
FWIW, that was Tom's idea.
> +static void svm_srso_add_remove_vm(int count)
> +{
> + bool set;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SRSO_BP_SPEC_REDUCE))
> + return;
> +
> + guard(mutex)(&srso_lock);
> +
> + set = !srso_nr_vms;
> + srso_nr_vms += count;
> +
> + WARN_ON_ONCE(srso_nr_vms < 0);
> + if (!set && srso_nr_vms)
> + return;
So instead of doing this "by-foot", I would've used any of those
atomic_inc_return() and atomic_dec_and_test() and act upon the value when it
becomes 0 or !0 instead of passing 1 and -1. Because the count is kinda
implicit...
But yeah, not a biggie - that solves the issue too.
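(For illustration, a minimal sketch of the atomic-counter variant being
suggested here, reusing svm_toggle_srso_spec_reduce() from the patch above;
hypothetical code, and as the reply below explains, on its own it does not
give the ordering needed on the 0=>1 transition:)

	static atomic_t srso_nr_vms;

	static void svm_srso_vm_init(void)
	{
		if (!cpu_feature_enabled(X86_FEATURE_SRSO_BP_SPEC_REDUCE))
			return;

		/* Set BP_SPEC_REDUCE everywhere only on the 0=>1 transition. */
		if (atomic_inc_return(&srso_nr_vms) == 1)
			on_each_cpu(svm_toggle_srso_spec_reduce, (void *)1, 1);
	}

	static void svm_srso_vm_destroy(void)
	{
		if (!cpu_feature_enabled(X86_FEATURE_SRSO_BP_SPEC_REDUCE))
			return;

		/* Clear it everywhere only on the 1=>0 transition. */
		if (atomic_dec_and_test(&srso_nr_vms))
			on_each_cpu(svm_toggle_srso_spec_reduce, NULL, 1);
	}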
Thanks!
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Thu, May 01, 2025, Borislav Petkov wrote:
> On Wed, Apr 30, 2025 at 04:33:19PM -0700, Sean Christopherson wrote:
> > +static void svm_srso_add_remove_vm(int count)
> > +{
> > + bool set;
> > +
> > + if (!cpu_feature_enabled(X86_FEATURE_SRSO_BP_SPEC_REDUCE))
> > + return;
> > +
> > + guard(mutex)(&srso_lock);
> > +
> > + set = !srso_nr_vms;
> > + srso_nr_vms += count;
> > +
> > + WARN_ON_ONCE(srso_nr_vms < 0);
> > + if (!set && srso_nr_vms)
> > + return;
>
> So instead of doing this "by-foot", I would've used any of those
> atomic_inc_return() and atomic_dec_and_test() and act upon the value when it
> becomes 0 or !0 instead of passing 1 and -1. Because the count is kinda
> implicit...
Heh, I considered that, and even tried it this morning because I thought it wouldn't
be as tricky as I first thought, but turns out, yeah, it's tricky. The complication
is that KVM needs to ensure BP_SPEC_REDUCE=1 on all CPUs before any VM is created.
I thought it wouldn't be _that_ tricky once I realized the 1=>0 case doesn't require
ordering, e.g. running host code while other CPUs have BP_SPEC_REDUCE=1 is totally
fine, KVM just needs to ensure no guest code is executed with BP_SPEC_REDUCE=0.
But guarding against all the possible edge cases is comically difficult.
For giggles, I did get it working, but it's a rather absurd amount of complexity
for very little gain. In practice, when srso_nr_vms >= 1, CPUs will hold the lock
for only a handful of cycles. The latency of VM creation is somewhat important,
but it's certainly not that latency sensitive, e.g. KVM already serializes VM
creation on kvm_lock (which is a mutex) in order to add VMs to vm_list.
The only truly slow path is the 0<=>1 transitions, and those *must* be slow to
get the ordering correct.
One thing I can/will do is change the mutex into a spinlock, so that on non-RT
systems there's zero chance of VM creation getting bogged down because a task
happens to get preempted at just the wrong time. I was thinking it's illegal to
use on_each_cpu() in a spinlock, but that's fine; it's using it with IRQs disabled
that's problematic.
I'll post a proper patch with the spinlock and CONFIG_CPU_MITIGATIONS #ifdef,
assuming testing comes back clean.
As for all the problematic scenarios... If KVM increments the counter before
sending IPIs, then racing VM creation can result in the second VM skipping the
mutex and doing KVM_RUN with BP_SPEC_REDUCE=0.
VMs MSR CPU-0 CPU-1
-----------------------------
0 0 CREATE
0 0 lock()
1 0 inc()
1 0 CREATE
2 0 inc()
2 0 KVM_RUN :-(
2 0 IPI
2 1 WRMSR WRMSR
But simply incrementing the count after sending IPIs obviously doesn't work.
VMs MSR CPU-0 CPU-1
-----------------------------
0 0 CREATE
0 0 lock()
0 0 IPI
0 0 WRMSR WRMSR
1 0 inc()
1 0 KVM_RUN :-(
And passing in set/clear (or using separate IPI handlers) doesn't handle the case
where the IPI from destroy arrives after the IPI from create.
VMs MSR CPU-0 CPU-1
-----------------------------
1 1 DESTROY
0 1 CREATE dec()
0 1 lock()
0 1 IPI(1)
0 1 WRMSR WRMSR
0 1 IPI(0)
0 0 WRMSR WRMSR
1 0 inc()
1 0 KVM_RUN :-(
Addressing that by adding a global flag to track that SRSO needs to be set
almost works, but there's still a race with a destroy IPI if the callback only
checks the "set" flag.
VMs MSR CPU-0 CPU-1
-----------------------------
1 1 DESTROY
0 1 CREATE dec()
0 1 lock()
0 1 set=1
0 1 IPI
0 1 WRMSR WRMSR
1 0 inc()
1 1 set=0
1 1 IPI
1 0 WRMSR WRMSR
1 0 KVM_RUN :-(
To address all of those, I ended up with the below. It's not actually that much
code, but amount of documentation needed to explain everything is ugly.
#ifdef CONFIG_CPU_MITIGATIONS
static DEFINE_MUTEX(srso_add_vm_lock);
static atomic_t srso_nr_vms;
static bool srso_set;
static void svm_toggle_srso_spec_reduce(void *ign)
{
/*
* Read both srso_set and the count (and don't pass in set/clear!) so
* that BP_SPEC_REDUCE isn't incorrectly cleared if IPIs from destroying
* he last VM arrive after IPIs from creating the first VM (in the new
* "generation").
*/
if (READ_ONCE(srso_set) || atomic_read(&srso_nr_vms))
msr_set_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT);
else
msr_clear_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT);
}
static void svm_srso_vm_init(void)
{
if (!cpu_feature_enabled(X86_FEATURE_SRSO_BP_SPEC_REDUCE))
return;
/*
* Acquire the mutex on a 0=>1 transition to ensure BP_SPEC_REDUCE is
* set before any VM is fully created.
*/
if (atomic_inc_not_zero(&srso_nr_vms))
return;
guard(mutex)(&srso_add_vm_lock);
/*
* Re-check the count before sending IPIs, only the first task needs to
* toggle BP_SPEC_REDUCE, other tasks just need to wait. For the 0=>1
* case, update the count *after* BP_SPEC_REDUCE is set on all CPUs to
* ensure creating multiple VMs concurrently doesn't result in a task
* skipping the mutex before BP_SPEC_REDUCE is set.
*
* Atomically increment the count in all cases as the mutex only guards
* 0=>1 transitions, e.g. another task can decrement the count if a VM
* was created (0=>1) *and* destroyed (1=>0) between observing a count
* of '0' and acquiring the mutex, and another task can increment the
* count if the count is already >= 1.
*/
if (!atomic_inc_not_zero(&srso_nr_vms)) {
WRITE_ONCE(srso_set, true);
on_each_cpu(svm_toggle_srso_spec_reduce, NULL, 1);
atomic_inc(&srso_nr_vms);
smp_mb__after_atomic();
WRITE_ONCE(srso_set, false);
}
}
static void svm_srso_vm_destroy(void)
{
if (!cpu_feature_enabled(X86_FEATURE_SRSO_BP_SPEC_REDUCE))
return;
/*
* If the last VM is being destroyed, clear BP_SPEC_REDUCE on all CPUs.
* Unlike the creation case, there is no need to wait on other CPUs as
* running code with BP_SPEC_REDUCE=1 is always safe, KVM just needs to
* ensure guest code never runs with BP_SPEC_REDUCE=0.
*/
if (atomic_dec_and_test(&srso_nr_vms))
on_each_cpu(svm_toggle_srso_spec_reduce, NULL, 0);
}
#else
static void svm_srso_vm_init(void) { }
static void svm_srso_vm_destroy(void) { }
#endif /* CONFIG_CPU_MITIGATIONS */
On Thu, May 01, 2025 at 09:56:44AM -0700, Sean Christopherson wrote:
> Heh, I considered that, and even tried it this morning because I thought it wouldn't
> be as tricky as I first thought, but turns out, yeah, it's tricky. The complication
> is that KVM needs to ensure BP_SPEC_REDUCE=1 on all CPUs before any VM is created.
>
> I thought it wouldn't be _that_ tricky once I realized the 1=>0 case doesn't require
> ordering, e.g. running host code while other CPUs have BP_SPEC_REDUCE=1 is totally
> fine, KVM just needs to ensure no guest code is executed with BP_SPEC_REDUCE=0.
> But guarding against all the possible edge cases is comically difficult.
>
> For giggles, I did get it working, but it's a rather absurd amount of complexity
Thanks for taking the time to explain - that's, well, funky. :-\
Btw, in talking about this, David had this other idea which sounds
interesting:
How about we do a per-CPU var which tracks whether BP_SPEC_REDUCE is
enabled on the CPU?
It'll toggle the MSR bit before VMRUN on the CPU when num VMs goes 0=>1. This
way you avoid the IPIs and you set the bit on time.
You'd still need to do an IPI on VMEXIT when VM count does 1=>0 but that's
easy.
Dunno, there probably already is a per-CPU setting in KVM so you could add
that to it...
Anyway, something along those lines...
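(Purely for illustration, a rough and hypothetical sketch of that per-CPU
idea -- the variable and helper names are invented here, and the 0=>1 ordering
questions discussed above would still need to be sorted out:)

	static DEFINE_PER_CPU(bool, srso_bp_spec_reduce_set);
	static atomic_t srso_nr_vms;

	/* Run on the vCPU entry path, with preemption disabled, before VMRUN. */
	static void svm_srso_set_on_this_cpu(void)
	{
		if (!atomic_read(&srso_nr_vms) ||
		    this_cpu_read(srso_bp_spec_reduce_set))
			return;

		msr_set_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT);
		this_cpu_write(srso_bp_spec_reduce_set, true);
	}

	/* IPI callback, used only when the VM count drops 1=>0. */
	static void svm_srso_clear_on_cpu(void *ign)
	{
		if (!this_cpu_read(srso_bp_spec_reduce_set))
			return;

		msr_clear_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT);
		this_cpu_write(srso_bp_spec_reduce_set, false);
	}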
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 4/30/25 6:33 PM, Sean Christopherson wrote:
> On Tue, Apr 29, 2025, Borislav Petkov wrote:
>> On Tue, Feb 18, 2025 at 12:13:33PM +0100, Borislav Petkov wrote:
>>> So,
>>>
>>> in the interest of finally making some progress here I'd like to commit this
>>> below (will test it one more time just in case but it should work :-P). It is
>>> simple and straight-forward and doesn't need an IBPB when the bit gets
>>> cleared.
>>>
>>> A potential future improvement is David's suggestion that there could be a way
>>> for tracking when the first guest gets started, we set the bit then, we make
>>> sure the bit gets set on each logical CPU when the guests migrate across the
>>> machine and when the *last* guest exits, that bit gets cleared again.
>> Well, that "simplicity" was short-lived:
>>
>> https://www.phoronix.com/review/linux-615-amd-regression
> LOL.
>
>> Sean, how about this below?
> Eww. That's quite painful, and completely disallowing enable_virt_on_load is
> undesirable, e.g. for use cases where the host is (almost) exclusively running
> VMs.
>
> Best idea I have is to throw in the towel on getting fancy, and just maintain a
> dedicated count in SVM.
>
> Alternatively, we could plumb an arch hook into kvm_create_vm() and kvm_destroy_vm()
> that's called when KVM adds/deletes a VM from vm_list, and key off vm_list being
> empty. But that adds a lot of boilerplate just to avoid a mutex+count.
>
> I haven't tested on a system with X86_FEATURE_SRSO_BP_SPEC_REDUCE, but did verify
> the mechanics by inverting the flag.
Testing this patch on the same EPYC Turin server as my original tests, I
can confirm that on a clean boot without any VMs running, the
performance is back to where it was on v6.14. :)
Thanks,
Michael
>
> --
> From: Sean Christopherson <seanjc@google.com>
> Date: Wed, 30 Apr 2025 15:34:50 -0700
> Subject: [PATCH] KVM: SVM: Set/clear SRSO's BP_SPEC_REDUCE on 0 <=> 1 VM count
> transitions
>
> Set the magic BP_SPEC_REDUCE bit to mitigate SRSO when running VMs if and
> only if KVM has at least one active VM. Leaving the bit set at all times
> unfortunately degrades performance by a wee bit more than expected.
>
> Use a dedicated mutex and counter instead of hooking virtualization
> enablement, as changing the behavior of kvm.enable_virt_at_load based on
> SRSO_BP_SPEC_REDUCE is painful, and has its own drawbacks, e.g. could
> result in performance issues for flows that are sensitive to VM creation
> latency.
>
> Fixes: 8442df2b49ed ("x86/bugs: KVM: Add support for SRSO_MSR_FIX")
> Reported-by: Michael Larabel <Michael@michaellarabel.com>
> Closes: https://www.phoronix.com/review/linux-615-amd-regression
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/svm/svm.c | 39 +++++++++++++++++++++++++++++++++------
> 1 file changed, 33 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index d5d0c5c3300b..fe8866572218 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -607,9 +607,6 @@ static void svm_disable_virtualization_cpu(void)
> kvm_cpu_svm_disable();
>
> amd_pmu_disable_virt();
> -
> - if (cpu_feature_enabled(X86_FEATURE_SRSO_BP_SPEC_REDUCE))
> - msr_clear_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT);
> }
>
> static int svm_enable_virtualization_cpu(void)
> @@ -687,9 +684,6 @@ static int svm_enable_virtualization_cpu(void)
> rdmsr(MSR_TSC_AUX, sev_es_host_save_area(sd)->tsc_aux, msr_hi);
> }
>
> - if (cpu_feature_enabled(X86_FEATURE_SRSO_BP_SPEC_REDUCE))
> - msr_set_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT);
> -
> return 0;
> }
>
> @@ -5032,10 +5026,42 @@ static void svm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
> sev_vcpu_deliver_sipi_vector(vcpu, vector);
> }
>
> +static DEFINE_MUTEX(srso_lock);
> +static int srso_nr_vms;
> +
> +static void svm_toggle_srso_spec_reduce(void *set)
> +{
> + if (set)
> + msr_set_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT);
> + else
> + msr_clear_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_BP_SPEC_REDUCE_BIT);
> +}
> +
> +static void svm_srso_add_remove_vm(int count)
> +{
> + bool set;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SRSO_BP_SPEC_REDUCE))
> + return;
> +
> + guard(mutex)(&srso_lock);
> +
> + set = !srso_nr_vms;
> + srso_nr_vms += count;
> +
> + WARN_ON_ONCE(srso_nr_vms < 0);
> + if (!set && srso_nr_vms)
> + return;
> +
> + on_each_cpu(svm_toggle_srso_spec_reduce, (void *)set, 1);
> +}
> +
> static void svm_vm_destroy(struct kvm *kvm)
> {
> avic_vm_destroy(kvm);
> sev_vm_destroy(kvm);
> +
> + svm_srso_add_remove_vm(-1);
> }
>
> static int svm_vm_init(struct kvm *kvm)
> @@ -5061,6 +5087,7 @@ static int svm_vm_init(struct kvm *kvm)
> return ret;
> }
>
> + svm_srso_add_remove_vm(1);
> return 0;
> }
>
>
> base-commit: f158e1b145f73aae1d3b7e756eb129a15b2b7a90
> --
On Tue, Feb 18, 2025 at 12:13:33PM +0100, Borislav Petkov wrote:
> in the interest of finally making some progress here I'd like to commit this
> below (will test it one more time just in case but it should work :-P). It is
> simple and straight-forward and doesn't need an IBPB when the bit gets
> cleared.
That's indeed simple and straight-forward for the time being.
Maybe a small improvement we could add on top is to have a separate and
dedicated cmdline option?
Indeed, with `X86_FEATURE_SRSO_USER_KERNEL_NO` we are not effectively using an
IBPB on VM-Exit anymore. Something like the diff down below?
Best,
Patrick
---
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 1d7afc40f2272..7609d80eda123 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -2531,6 +2531,7 @@ enum srso_mitigation_cmd {
SRSO_CMD_SAFE_RET,
SRSO_CMD_IBPB,
SRSO_CMD_IBPB_ON_VMEXIT,
+ SRSO_CMD_BP_SPEC_REDUCE,
};
static const char * const srso_strings[] = {
@@ -2562,6 +2563,8 @@ static int __init srso_parse_cmdline(char *str)
srso_cmd = SRSO_CMD_IBPB;
else if (!strcmp(str, "ibpb-vmexit"))
srso_cmd = SRSO_CMD_IBPB_ON_VMEXIT;
+ else if (!strcmp(str, "spec-reduce"))
+ srso_cmd = SRSO_CMD_BP_SPEC_REDUCE;
else
pr_err("Ignoring unknown SRSO option (%s).", str);
@@ -2617,7 +2620,7 @@ static void __init srso_select_mitigation(void)
case SRSO_CMD_SAFE_RET:
if (boot_cpu_has(X86_FEATURE_SRSO_USER_KERNEL_NO))
- goto ibpb_on_vmexit;
+ goto spec_reduce;
if (IS_ENABLED(CONFIG_MITIGATION_SRSO)) {
/*
@@ -2670,14 +2673,7 @@ static void __init srso_select_mitigation(void)
}
break;
-ibpb_on_vmexit:
case SRSO_CMD_IBPB_ON_VMEXIT:
- if (boot_cpu_has(X86_FEATURE_SRSO_BP_SPEC_REDUCE)) {
- pr_notice("Reducing speculation to address VM/HV SRSO attack vector.\n");
- srso_mitigation = SRSO_MITIGATION_BP_SPEC_REDUCE;
- break;
- }
-
if (IS_ENABLED(CONFIG_MITIGATION_IBPB_ENTRY)) {
if (has_microcode) {
setup_force_cpu_cap(X86_FEATURE_IBPB_ON_VMEXIT);
@@ -2694,6 +2690,14 @@ static void __init srso_select_mitigation(void)
pr_err("WARNING: kernel not compiled with MITIGATION_IBPB_ENTRY.\n");
}
break;
+
+spec_reduce:
+ case SRSO_CMD_BP_SPEC_REDUCE:
+ if (boot_cpu_has(X86_FEATURE_SRSO_BP_SPEC_REDUCE)) {
+ pr_notice("Reducing speculation to address VM/HV SRSO attack vector.\n");
+ srso_mitigation = SRSO_MITIGATION_BP_SPEC_REDUCE;
+ break;
+ }
default:
break;
}
On Tue, Feb 18, 2025 at 02:42:57PM +0000, Patrick Bellasi wrote:
> Maybe a small improvement we could add on top is to have a separate and
> dedicated cmdline option?
>
> Indeed, with `X86_FEATURE_SRSO_USER_KERNEL_NO` we are not effectively using an
> IBPB on VM-Exit anymore. Something like the diff down below?
Except that I don't see the point of yet one more cmdline option. Our
mitigation options space is a nightmare. Why do we want to add another one?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Feb 14, 2025 at 09:10:05PM +0100, Borislav Petkov wrote:
> On Thu, Feb 13, 2025 at 05:50:57PM +0000, Patrick Bellasi wrote:
> > The "should be set identically across all processors in the system" makes me
> > wonder whether using the "KVM's user_return approach" proposed here is robust
> > enough. Could this not lead to the bit being set only on some CPUs
> > but not others?
>
> That's fine, we should update that paper.
>
> > If BpSpecReduce does not prevent training, but only the training from being
> > used, should we not keep it consistently set after a guest has run, or until an
> > IBPB is executed?
>
> After talking with folks internally, you're probably right. We should slap an
> IBPB before clearing. Which means, I cannot use the MSR return slots anymore.
> I will have to resurrect some of the other solutions we had lined up...
>
> Stay tuned.
Thanks for working on this! Should this patch (and the two previously merged
patches) be backported to stable? I noticed they did not have CC:stable.
On Sat, Feb 15, 2025 at 12:57:07AM +0000, Yosry Ahmed wrote:
> Should this patch (and the two previously merged patches) be backported
> to stable? I noticed they did not have CC:stable.
Because a stable backport would require manual backporting, depending on where
it goes. And Brendan, Patrick or I are willing to do that - it's a question of
who gets there first. :-)
And folks, whoever wants to backport, pls tell the others so that we don't
duplicate work.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Sat, Feb 15, 2025 at 10:15:27AM +0100, Borislav Petkov wrote:
> On Sat, Feb 15, 2025 at 12:57:07AM +0000, Yosry Ahmed wrote:
> > Should this patch (and the two previously merged patches) be backported
> > to stable? I noticed they did not have CC:stable.
>
> Because a stable backport would require manual backporting, depending on where
> it goes. And Brendan, Patrick or I are willing to do that - it's a question of
> who gets there first. :-)
Thanks for the context, didn't realize a manual backport was needed.
On Sun, Feb 16, 2025 at 09:47:56PM -0800, Yosry Ahmed wrote:
> Thanks for the context, didn't realize a manual backport was needed.
Feb 17 gregkh@linuxfoundation.org (:9.1K|) FAILED: patch "[PATCH] x86/cpu/kvm: SRSO: Fix possible missing IBPB on VM-Exit" failed to apply to 6.6-stable tree
Feb 17 gregkh@linuxfoundation.org (:9.1K|) FAILED: patch "[PATCH] x86/cpu/kvm: SRSO: Fix possible missing IBPB on VM-Exit" failed to apply to 6.1-stable tree
Feb 17 gregkh@linuxfoundation.org (:9.1K|) FAILED: patch "[PATCH] x86/cpu/kvm: SRSO: Fix possible missing IBPB on VM-Exit" failed to apply to 5.15-stable tree
Feb 17 gregkh@linuxfoundation.org (:9.2K|) FAILED: patch "[PATCH] x86/cpu/kvm: SRSO: Fix possible missing IBPB on VM-Exit" failed to apply to 5.10-stable tree
Feb 17 gregkh@linuxfoundation.org (:9.5K|) FAILED: patch "[PATCH] x86/cpu/kvm: SRSO: Fix possible missing IBPB on VM-Exit" failed to apply to 5.4-stable tree
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Thu, Jan 23, 2025, Borislav Petkov wrote:
> On Thu, Jan 23, 2025 at 08:25:17AM -0800, Sean Christopherson wrote:
> > But if we wanted to catch all paths, wrap the guts and clear the feature in the
> > outer layer?
>
> Yap, all valid points, thanks for catching those.
>
> > +static void __init srso_select_mitigation(void)
> > +{
> > + __srso_select_mitigation();
> >
> > if (srso_mitigation != SRSO_MITIGATION_BP_SPEC_REDUCE)
> > setup_clear_cpu_cap(X86_FEATURE_SRSO_BP_SPEC_REDUCE);
> > -
> > - pr_info("%s\n", srso_strings[srso_mitigation]);
> > }
>
> What I'd like, though, here is to not dance around this srso_mitigation
> variable setting in __srso_select_mitigation() and then know that the __
> function did modify it and now we can eval it.
>
> I'd like for the __ function to return it like __ssb_select_mitigation() does.
>
> But then if we do that, we'll have to do the same changes and turn the returns
> to "goto out" where all the paths converge. And I'd prefer if those paths
> converged anyway and not have some "early escapes" like those returns which
> I completely overlooked. :-\
>
> And that code is going to change soon anyway after David's attack vectors
> series.
>
> So, long story short, I guess the simplest thing to do would be to simply do
> the below.
I almost proposed that as well, the only reason I didn't is because I wasn't sure
what to do with the pr_info() at the end.
On Thu, Jan 23, 2025 at 10:04:17AM -0800, Sean Christopherson wrote:
> I almost proposed that as well, the only reason I didn't is because I wasn't
> sure what to do with the pr_info() at the end.
Yeah, that came up too while looking at David's patches. We're not really
consistent when issuing the mitigation strings depending on the mitigation or
whether we're affected or not...
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Borislav,

What is the performance impact of BpSpecReduce on Turin?
On Tue, Feb 11, 2025 at 11:19:25AM -0800, Jim Mattson wrote:
> What is the performance impact of BpSpecReduce on Turin?
See upthread:
https://lore.kernel.org/all/20241216173142.GDZ2Bj_uPBG3TTPYd_@fat_crate.local
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette