[PATCH v9 08/22] KVM: VMX: Set FRED MSR intercepts

Xin Li (Intel) posted 22 patches 3 months, 2 weeks ago
Posted by Xin Li (Intel) 3 months, 2 weeks ago
From: Xin Li <xin3.li@intel.com>

On a userspace MSR filter change, set FRED MSR intercepts.

The eight FRED MSRs, MSR_IA32_FRED_RSP[123], MSR_IA32_FRED_STKLVLS,
MSR_IA32_FRED_SSP[123] and MSR_IA32_FRED_CONFIG, are all safe to
pass through, because each has a corresponding host and guest field
in the VMCS.

Both MSR_IA32_FRED_RSP0 and MSR_IA32_FRED_SSP0 (aka MSR_IA32_PL0_SSP)
are dedicated to userspace event delivery, IOW they are NOT used in
any kernel event delivery or in the execution of ERETS.  Thus KVM can
run safely with guest values in the two MSRs.  As a result, save and
restore of their guest values are deferred until vCPU context switch:
host MSR_IA32_FRED_RSP0 is restored upon returning to userspace, and
host MSR_IA32_PL0_SSP is managed with XRSTORS/XSAVES.

Note, FRED SSP MSRs, including MSR_IA32_PL0_SSP, are available on
any processor that enumerates FRED.  On processors that support FRED
but not CET, FRED transitions do not use these MSRs, but they remain
accessible via MSR instructions such as RDMSR and WRMSR.

Intercept MSR_IA32_PL0_SSP when CET shadow stack is not supported,
regardless of FRED support.  This ensures the guest value remains
fully virtual and does not modify the hardware FRED SSP0 MSR.

This behavior is consistent with the current setup in
vmx_recalc_msr_intercepts(), so no change is needed to the interception
logic for MSR_IA32_PL0_SSP.

Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Tested-by: Shan Kang <shan.kang@intel.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
---

Changes in v7:
* Rewrite the changelog and comment, mainly for MSR_IA32_PL0_SSP.

Changes in v5:
* Skip execution of vmx_set_intercept_for_fred_msr() if FRED is
  not available or not enabled (Sean).
* Use 'intercept' as the variable name to indicate whether MSR
  interception should be enabled (Sean).
* Add TB from Xuelian Guo.
---
 arch/x86/kvm/vmx/vmx.c | 47 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c8b5359123bf..ef9765779884 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4146,6 +4146,51 @@ void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu)
 	}
 }
 
+static void vmx_set_intercept_for_fred_msr(struct kvm_vcpu *vcpu)
+{
+	bool intercept = !guest_cpu_cap_has(vcpu, X86_FEATURE_FRED);
+
+	if (!kvm_cpu_cap_has(X86_FEATURE_FRED))
+		return;
+
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_RSP1, MSR_TYPE_RW, intercept);
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_RSP2, MSR_TYPE_RW, intercept);
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_RSP3, MSR_TYPE_RW, intercept);
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_STKLVLS, MSR_TYPE_RW, intercept);
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_SSP1, MSR_TYPE_RW, intercept);
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_SSP2, MSR_TYPE_RW, intercept);
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_SSP3, MSR_TYPE_RW, intercept);
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_CONFIG, MSR_TYPE_RW, intercept);
+
+	/*
+	 * MSR_IA32_FRED_RSP0 and MSR_IA32_PL0_SSP (aka MSR_IA32_FRED_SSP0) are
+	 * designed for event delivery while executing in userspace.  Since KVM
+	 * operates entirely in kernel mode (CPL is always 0 after any VM exit),
+	 * it can safely retain and operate with guest-defined values for these
+	 * MSRs.
+	 *
+	 * As a result, interception of MSR_IA32_FRED_RSP0 and MSR_IA32_PL0_SSP
+	 * is unnecessary.
+	 *
+	 * Note: Saving and restoring MSR_IA32_PL0_SSP is part of CET supervisor
+	 * context management.  However, FRED SSP MSRs, including MSR_IA32_PL0_SSP,
+	 * are available on any processor that enumerates FRED.
+	 *
+	 * On processors that support FRED but not CET, FRED transitions do not
+	 * use these MSRs, but they remain accessible via MSR instructions such
+	 * as RDMSR and WRMSR.
+	 *
+	 * Intercept MSR_IA32_PL0_SSP when CET shadow stack is not supported,
+	 * regardless of FRED support.  This ensures the guest value remains
+	 * fully virtual and does not modify the hardware FRED SSP0 MSR.
+	 *
+	 * This behavior is consistent with the current setup in
+	 * vmx_recalc_msr_intercepts(), so no change is needed to the interception
+	 * logic for MSR_IA32_PL0_SSP.
+	 */
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_RSP0, MSR_TYPE_RW, intercept);
+}
+
 static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 {
 	bool intercept;
@@ -4212,6 +4257,8 @@ static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 		vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, MSR_TYPE_RW, intercept);
 	}
 
+	vmx_set_intercept_for_fred_msr(vcpu);
+
 	/*
 	 * x2APIC and LBR MSR intercepts are modified on-demand and cannot be
 	 * filtered by userspace.
-- 
2.51.0
Re: [PATCH v9 08/22] KVM: VMX: Set FRED MSR intercepts
Posted by Dave Hansen 3 weeks, 2 days ago
On 10/26/25 13:18, Xin Li (Intel) wrote:
> Both MSR_IA32_FRED_RSP0 and MSR_IA32_FRED_SSP0 (aka MSR_IA32_PL0_SSP)
> are dedicated for userspace event delivery, IOW they are NOT used in
> any kernel event delivery and the execution of ERETS.  Thus KVM can
> run safely with guest values in the two MSRs.  As a result, save and
> restore of their guest values are deferred until vCPU context switch,
> Host MSR_IA32_FRED_RSP0 is restored upon returning to userspace, and
> Host MSR_IA32_PL0_SSP is managed with XRSTORS/XSAVES.

Is it worth making MSR_IA32_FRED_RSP0 special versus MSR_IA32_FRED_RSP[123]?

Is it needed because MSR_IA32_FRED_RSP0 is rewritten all the time as
CPUs switch between threads? But MSR_IA32_FRED_RSP[123] are not
frequently written?

I'd like to hear more about the motivation.
Re: [PATCH v9 08/22] KVM: VMX: Set FRED MSR intercepts
Posted by H. Peter Anvin 3 weeks, 2 days ago
On 2026-01-16 11:49, Dave Hansen wrote:
> On 10/26/25 13:18, Xin Li (Intel) wrote:
>> Both MSR_IA32_FRED_RSP0 and MSR_IA32_FRED_SSP0 (aka MSR_IA32_PL0_SSP)
>> are dedicated for userspace event delivery, IOW they are NOT used in
>> any kernel event delivery and the execution of ERETS.  Thus KVM can
>> run safely with guest values in the two MSRs.  As a result, save and
>> restore of their guest values are deferred until vCPU context switch,
>> Host MSR_IA32_FRED_RSP0 is restored upon returning to userspace, and
>> Host MSR_IA32_PL0_SSP is managed with XRSTORS/XSAVES.
> 
> Is it worth making MSR_IA32_FRED_RSP0 special versus MSR_IA32_FRED_RSP[123]?
> 
> Is it needed because MSR_IA32_FRED_RSP0 is rewritten all the time as
> CPUs switch between threads? But MSR_IA32_FRED_RSP[123] are not
> frequently written?
> 
> I'd like to hear more about the motivation.

Because RSP[123] (and SSP[123]) are used by the kernel itself, they are
context-switched by VT-x automatically. This is necessary in order to preserve
the FRED architectural invariant that there should NEVER be a "gap" during
which it is unsafe to take an exception.

[RS]SP0 are not used while in kernel mode (since the only time we switch
*onto* the level 0 kernel stack is when entering from user space, logically
"CSL -1"), so during FRED architectural discussions it was agreed that it was
better to leave its management to kernel code, especially as KVM often does
not need to cross back into user space after each VMEXIT.

A nice side effect, but just that -- a side effect -- is that we don't need to
actually modify [RS]SP0 in the context-switch or task setup code.

The invariant that needs to be maintained is that IF the cached rsp0 value is
equal to the initial stack pointer for the running task, THEN the MSR MUST
match the cached value. A corollary of that is that if we modify either the
MSR or the cached value from an event that may have interrupted the kernel we
MUST make sure that this invariant cannot be inadvertently broken.

Setting the cached value to an invalid value (e.g. NULL/0) should work; there
shouldn't be an actual need to read the MSR unless I'm mistaken -- but I have
been working on other code today and so my cache for this specific code is not
100% up to date.

	-hpa
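
The cached-rsp0 invariant described in the reply above can be modeled in
plain C.  This is only a minimal userspace sketch under stated assumptions,
not the kernel's implementation: every sim_* name is hypothetical, the "MSR"
is an ordinary variable, and 0 is assumed never to be a valid stack top.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the hardware MSR and the software cache. */
static uint64_t sim_msr_rsp0;         /* models MSR_IA32_FRED_RSP0 */
static uint64_t sim_cached_rsp0;      /* models the cached rsp0 value */
static unsigned int sim_wrmsr_count;  /* counts simulated MSR writes */

static void sim_wrmsr_rsp0(uint64_t val)
{
	sim_msr_rsp0 = val;
	sim_wrmsr_count++;
}

/*
 * Models the work done before returning to user space: the MSR is written
 * only when the cache does not already match the task's stack top.  The MSR
 * is written before the cache so the invariant holds at every step.
 */
static void sim_update_rsp0(uint64_t task_stack_top)
{
	if (sim_cached_rsp0 != task_stack_top) {
		sim_wrmsr_rsp0(task_stack_top);
		sim_cached_rsp0 = task_stack_top;
	}
}

/*
 * Models a guest value landing in the MSR behind the cache's back.  The
 * cache is set to 0 (assumed invalid) instead of reading the MSR back.
 */
static void sim_guest_clobbers_rsp0(uint64_t guest_val)
{
	sim_msr_rsp0 = guest_val;
	sim_cached_rsp0 = 0;
}

/* IF cache == task stack top THEN MSR == cache. */
static bool sim_invariant_holds(uint64_t task_stack_top)
{
	return sim_cached_rsp0 != task_stack_top ||
	       sim_msr_rsp0 == task_stack_top;
}
```

In this model, invalidating the cache is enough to force the next
sim_update_rsp0() call to rewrite the MSR, which mirrors the suggestion that
no MSR read should be needed.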
Re: [PATCH v9 08/22] KVM: VMX: Set FRED MSR intercepts
Posted by Chao Gao 2 months, 4 weeks ago
On Sun, Oct 26, 2025 at 01:18:56PM -0700, Xin Li (Intel) wrote:
>From: Xin Li <xin3.li@intel.com>
>
>On a userspace MSR filter change, set FRED MSR intercepts.
>
>The eight FRED MSRs, MSR_IA32_FRED_RSP[123], MSR_IA32_FRED_STKLVLS,
>MSR_IA32_FRED_SSP[123] and MSR_IA32_FRED_CONFIG, are all safe to
>passthrough, because each has a corresponding host and guest field
>in VMCS.

Sean prefers to pass through MSRs only when there is a reason to do that rather
than just because it is free [*]. My thinking is that RSPs and SSPs are per-task
and are context-switched frequently, so we need to pass them through. But I am
not sure if there is a reason for STKLVLS and CONFIG.

[*] https://lore.kernel.org/all/aKTGVvOb8PZ7mzVr@google.com/

>
>Both MSR_IA32_FRED_RSP0 and MSR_IA32_FRED_SSP0 (aka MSR_IA32_PL0_SSP)
>are dedicated for userspace event delivery, IOW they are NOT used in
>any kernel event delivery and the execution of ERETS.  Thus KVM can
>run safely with guest values in the two MSRs.  As a result, save and
>restore of their guest values are deferred until vCPU context switch,
>Host MSR_IA32_FRED_RSP0 is restored upon returning to userspace, and
>Host MSR_IA32_PL0_SSP is managed with XRSTORS/XSAVES.
>
>Note, FRED SSP MSRs, including MSR_IA32_PL0_SSP, are available on
>any processor that enumerates FRED.  On processors that support FRED
>but not CET, FRED transitions do not use these MSRs, but they remain
>accessible via MSR instructions such as RDMSR and WRMSR.
>
>Intercept MSR_IA32_PL0_SSP when CET shadow stack is not supported,
>regardless of FRED support.  This ensures the guest value remains
>fully virtual and does not modify the hardware FRED SSP0 MSR.
>
>This behavior is consistent with the current setup in
>vmx_recalc_msr_intercepts(), so no change is needed to the interception
>logic for MSR_IA32_PL0_SSP.
>
>Signed-off-by: Xin Li <xin3.li@intel.com>
>Signed-off-by: Xin Li (Intel) <xin@zytor.com>
>Tested-by: Shan Kang <shan.kang@intel.com>
>Tested-by: Xuelian Guo <xuelian.guo@intel.com>
>---
>
>Changes in v7:
>* Rewrite the changelog and comment, majorly for MSR_IA32_PL0_SSP.
>
>Changes in v5:
>* Skip execution of vmx_set_intercept_for_fred_msr() if FRED is
>  not available or enabled (Sean).
>* Use 'intercept' as the variable name to indicate whether MSR
>  interception should be enabled (Sean).
>* Add TB from Xuelian Guo.
>---
> arch/x86/kvm/vmx/vmx.c | 47 ++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 47 insertions(+)
>
>diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>index c8b5359123bf..ef9765779884 100644
>--- a/arch/x86/kvm/vmx/vmx.c
>+++ b/arch/x86/kvm/vmx/vmx.c
>@@ -4146,6 +4146,51 @@ void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu)
> 	}
> }
> 
>+static void vmx_set_intercept_for_fred_msr(struct kvm_vcpu *vcpu)
>+{
>+	bool intercept = !guest_cpu_cap_has(vcpu, X86_FEATURE_FRED);
>+
>+	if (!kvm_cpu_cap_has(X86_FEATURE_FRED))
>+		return;
>+
>+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_RSP1, MSR_TYPE_RW, intercept);
>+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_RSP2, MSR_TYPE_RW, intercept);
>+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_RSP3, MSR_TYPE_RW, intercept);
>+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_STKLVLS, MSR_TYPE_RW, intercept);
>+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_SSP1, MSR_TYPE_RW, intercept);
>+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_SSP2, MSR_TYPE_RW, intercept);
>+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_SSP3, MSR_TYPE_RW, intercept);
>+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_CONFIG, MSR_TYPE_RW, intercept);
>+
>+	/*
>+	 * MSR_IA32_FRED_RSP0 and MSR_IA32_PL0_SSP (aka MSR_IA32_FRED_SSP0) are
>+	 * designed for event delivery while executing in userspace.  Since KVM
>+	 * operates entirely in kernel mode (CPL is always 0 after any VM exit),
>+	 * it can safely retain and operate with guest-defined values for these
>+	 * MSRs.
>+	 *
>+	 * As a result, interception of MSR_IA32_FRED_RSP0 and MSR_IA32_PL0_SSP
>+	 * is unnecessary.

I think it would be slightly better to document why MSRs need to be passed
through rather than just why it is safe to pass through.

>+	 *
>+	 * Note: Saving and restoring MSR_IA32_PL0_SSP is part of CET supervisor
>+	 * context management.  However, FRED SSP MSRs, including MSR_IA32_PL0_SSP,
>+	 * are available on any processor that enumerates FRED.
>+	 *
>+	 * On processors that support FRED but not CET, FRED transitions do not
>+	 * use these MSRs, but they remain accessible via MSR instructions such
>+	 * as RDMSR and WRMSR.
>+	 *
>+	 * Intercept MSR_IA32_PL0_SSP when CET shadow stack is not supported,
>+	 * regardless of FRED support.  This ensures the guest value remains
>+	 * fully virtual and does not modify the hardware FRED SSP0 MSR.

Modifying the hardware MSR itself isn't a problem. The problem is that the
MSR isn't supposed to be accessed frequently in the guest if CET isn't
supported and will never be accessed via XSAVES. So, there is no good reason
to pass it through. And passing through the MSR means KVM needs to
context-switch it along with vCPU load/put, i.e., more code and complexity.
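
To make the policy under discussion concrete, here is a rough standalone
model of the intercept decisions as the changelog describes them.  It is a
sketch only: the sketch_* types and function are hypothetical, intercepts are
assumed to default to enabled, and the MSR_IA32_PL0_SSP rule is taken from the
changelog text rather than from the existing vmx_recalc_msr_intercepts() code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical capability summary, not a KVM structure. */
struct sketch_caps {
	bool host_fred;    /* models kvm_cpu_cap_has(X86_FEATURE_FRED) */
	bool guest_fred;   /* models guest_cpu_cap_has(vcpu, X86_FEATURE_FRED) */
	bool guest_shstk;  /* models guest CET shadow stack support */
};

/* true means "intercept", false means "pass through". */
struct sketch_intercepts {
	bool fred_msrs;  /* RSP[0123], STKLVLS, SSP[123], CONFIG */
	bool pl0_ssp;    /* MSR_IA32_PL0_SSP (aka FRED SSP0) */
};

static struct sketch_intercepts sketch_recalc(struct sketch_caps caps)
{
	/* Assumption: every MSR starts out intercepted. */
	struct sketch_intercepts icpt = { .fred_msrs = true, .pl0_ssp = true };

	/* The patch leaves the FRED MSRs untouched when the host lacks FRED. */
	if (caps.host_fred)
		icpt.fred_msrs = !caps.guest_fred;

	/* Per the changelog: intercept PL0_SSP without shadow stack support,
	 * regardless of FRED. */
	icpt.pl0_ssp = !caps.guest_shstk;

	return icpt;
}
```

The interesting case is a guest with FRED but without shadow stacks: the FRED
MSRs are passed through while MSR_IA32_PL0_SSP stays intercepted, so KVM does
not have to context-switch it on vCPU load/put.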

>+	 *
>+	 * This behavior is consistent with the current setup in
>+	 * vmx_recalc_msr_intercepts(), so no change is needed to the interception
>+	 * logic for MSR_IA32_PL0_SSP.
>+	 */
>+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_FRED_RSP0, MSR_TYPE_RW, intercept);
>+}
>+