Rename kvm_user_return_msr_update_cache() to __kvm_set_user_return_msr()
and use the helper kvm_set_user_return_msr() to make it obvious that the
double-underscores version is doing a subset of the work of the "full"
setter.
While the function does indeed update a cache, the nomenclature becomes
slightly misleading when adding a getter[1], as the current value isn't
_just_ the cached value, it's also the value that's currently loaded in
hardware.
Opportunistically rename "index" to "slot" in the prototypes. The user-
return APIs deliberately use "slot" to try and make it more obvious that
they take the slot within the array, not the index of the MSR.
No functional change intended.
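Purely as an illustrative sketch (not part of this patch), the intended calling
pattern looks roughly like the below; MSR_TSC_AUX and guest_tsc_aux are just
placeholder examples:

	/*
	 * Hypothetical usage sketch: kvm_find_user_return_msr() returns the
	 * slot within the user-return array (or -1 if the MSR isn't tracked),
	 * and that slot, not the raw MSR index, is what
	 * kvm_set_user_return_msr() takes.
	 */
	int slot = kvm_find_user_return_msr(MSR_TSC_AUX);

	if (slot >= 0)
		kvm_set_user_return_msr(slot, guest_tsc_aux, -1ull);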
Cc: Yan Zhao <yan.y.zhao@intel.com>
Cc: Xiaoyao Li <xiaoyao.li@intel.com>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Link: https://lore.kernel.org/all/aM2EvzLLmBi5-iQ5@google.com [1]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 4 ++--
arch/x86/kvm/vmx/tdx.c | 4 ++--
arch/x86/kvm/x86.c | 21 ++++++++++-----------
3 files changed, 14 insertions(+), 15 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 17772513b9cc..b633d5c33f57 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2374,8 +2374,8 @@ int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
int kvm_add_user_return_msr(u32 msr);
int kvm_find_user_return_msr(u32 msr);
-int kvm_set_user_return_msr(unsigned index, u64 val, u64 mask);
-void kvm_user_return_msr_update_cache(unsigned int index, u64 val);
+int kvm_set_user_return_msr(unsigned int slot, u64 val, u64 mask);
+void __kvm_set_user_return_msr(unsigned int slot, u64 val);
static inline bool kvm_is_supported_user_return_msr(u32 msr)
{
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index ff41d3d00380..b3cb39ae937d 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -809,8 +809,8 @@ static void tdx_user_return_msr_update_cache(void)
int i;
for (i = 0; i < ARRAY_SIZE(tdx_uret_msrs); i++)
- kvm_user_return_msr_update_cache(tdx_uret_msrs[i].slot,
- tdx_uret_msrs[i].defval);
+ __kvm_set_user_return_msr(tdx_uret_msrs[i].slot,
+ tdx_uret_msrs[i].defval);
}
static void tdx_prepare_switch_to_host(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e07936efacd4..d975d0c60107 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -648,6 +648,15 @@ static void kvm_user_return_register_notifier(struct kvm_user_return_msrs *msrs)
}
}
+void __kvm_set_user_return_msr(unsigned int slot, u64 value)
+{
+ struct kvm_user_return_msrs *msrs = this_cpu_ptr(user_return_msrs);
+
+ msrs->values[slot].curr = value;
+ kvm_user_return_register_notifier(msrs);
+}
+EXPORT_SYMBOL_GPL(__kvm_set_user_return_msr);
+
int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
{
struct kvm_user_return_msrs *msrs = this_cpu_ptr(user_return_msrs);
@@ -660,21 +669,11 @@ int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
if (err)
return 1;
- msrs->values[slot].curr = value;
- kvm_user_return_register_notifier(msrs);
+ __kvm_set_user_return_msr(slot, value);
return 0;
}
EXPORT_SYMBOL_GPL(kvm_set_user_return_msr);
-void kvm_user_return_msr_update_cache(unsigned int slot, u64 value)
-{
- struct kvm_user_return_msrs *msrs = this_cpu_ptr(user_return_msrs);
-
- msrs->values[slot].curr = value;
- kvm_user_return_register_notifier(msrs);
-}
-EXPORT_SYMBOL_GPL(kvm_user_return_msr_update_cache);
-
static void drop_user_return_notifiers(void)
{
struct kvm_user_return_msrs *msrs = this_cpu_ptr(user_return_msrs);
base-commit: c8fbf7ceb2ae3f64b0c377c8c21f6df577a13eb4
--
2.51.0.470.ga7dc726c21-goog
On Fri, Sep 19, 2025 at 02:42:59PM -0700, Sean Christopherson wrote:
> Rename kvm_user_return_msr_update_cache() to __kvm_set_user_return_msr()
> and use the helper kvm_set_user_return_msr() to make it obvious that the
> double-underscores version is doing a subset of the work of the "full"
> setter.
>
> While the function does indeed update a cache, the nomenclature becomes
> slightly misleading when adding a getter[1], as the current value isn't
> _just_ the cached value, it's also the value that's currently loaded in
> hardware.
Nit:

For TDX, "it's also the value that's currently loaded in hardware" is not true.

> Opportunistically rename "index" to "slot" in the prototypes. The user-
> return APIs deliberately use "slot" to try and make it more obvious that
> they take the slot within the array, not the index of the MSR.
>
> No functional change intended.

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>

> Cc: Yan Zhao <yan.y.zhao@intel.com>
> Cc: Xiaoyao Li <xiaoyao.li@intel.com>
> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Link: https://lore.kernel.org/all/aM2EvzLLmBi5-iQ5@google.com [1]
> Signed-off-by: Sean Christopherson <seanjc@google.com>
On Tue, Sep 30, 2025 at 08:22:41PM +0800, Yan Zhao wrote:
> On Fri, Sep 19, 2025 at 02:42:59PM -0700, Sean Christopherson wrote:
> > Rename kvm_user_return_msr_update_cache() to __kvm_set_user_return_msr()
> > and use the helper kvm_set_user_return_msr() to make it obvious that the
> > double-underscores version is doing a subset of the work of the "full"
> > setter.
> >
> > While the function does indeed update a cache, the nomenclature becomes
> > slightly misleading when adding a getter[1], as the current value isn't
> > _just_ the cached value, it's also the value that's currently loaded in
> > hardware.
> Nit:
>
> For TDX, "it's also the value that's currently loaded in hardware" is not true.
since tdx module invokes wrmsr()s before each exit to VMM, while KVM only
invokes __kvm_set_user_return_msr() in tdx_vcpu_put().

> > Opportunistically rename "index" to "slot" in the prototypes. The user-
> > return APIs deliberately use "slot" to try and make it more obvious that
> > they take the slot within the array, not the index of the MSR.
> >
> > No functional change intended.
>
> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
>
> > Cc: Yan Zhao <yan.y.zhao@intel.com>
> > Cc: Xiaoyao Li <xiaoyao.li@intel.com>
> > Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
> > Link: https://lore.kernel.org/all/aM2EvzLLmBi5-iQ5@google.com [1]
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
On Tue, Sep 30, 2025, Yan Zhao wrote:
> On Tue, Sep 30, 2025 at 08:22:41PM +0800, Yan Zhao wrote:
> > On Fri, Sep 19, 2025 at 02:42:59PM -0700, Sean Christopherson wrote:
> > > Rename kvm_user_return_msr_update_cache() to __kvm_set_user_return_msr()
> > > and use the helper kvm_set_user_return_msr() to make it obvious that the
> > > double-underscores version is doing a subset of the work of the "full"
> > > setter.
> > >
> > > While the function does indeed update a cache, the nomenclature becomes
> > > slightly misleading when adding a getter[1], as the current value isn't
> > > _just_ the cached value, it's also the value that's currently loaded in
> > > hardware.
> > Nit:
> >
> > For TDX, "it's also the value that's currently loaded in hardware" is not true.
> since tdx module invokes wrmsr()s before each exit to VMM, while KVM only
> invokes __kvm_set_user_return_msr() in tdx_vcpu_put().
No? kvm_user_return_msr_update_cache() is passed the value that's currently
loaded in hardware, by way of the TDX-Module zeroing some MSRs on TD-Exit.
Ah, I suspect you're calling out that the cache can be stale. Maybe this?
  While the function does indeed update a cache, the nomenclature becomes
  slightly misleading when adding a getter[1], as the current value isn't
  _just_ the cached value, it's also the value that's currently loaded in
  hardware (ignoring that the cache holds stale data until the vCPU is put,
  i.e. until KVM prepares to switch back to the host).
Actually, that's a bug waiting to happen when the getter comes along. Rather
than document the potential pitfall, what about adding a prep patch to minimize
the window? Then _this_ patch shouldn't need the caveat about the cache being
stale.
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index ff41d3d00380..326fa81cb35f 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -789,6 +789,14 @@ void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
vt->msr_host_kernel_gs_base = read_msr(MSR_KERNEL_GS_BASE);
vt->guest_state_loaded = true;
+
+ /*
+ * Several of KVM's user-return MSRs are clobbered by the TDX-Module if
+ * VP.ENTER succeeds, i.e. on TD-Exit. Mark those MSRs as needing an
+ * update to synchronize the "current" value in KVM's cache with the
+ * value in hardware (loaded by the TDX-Module).
+ */
+ to_tdx(vcpu)->need_user_return_msr_update = true;
}
struct tdx_uret_msr {
@@ -816,7 +824,6 @@ static void tdx_user_return_msr_update_cache(void)
static void tdx_prepare_switch_to_host(struct kvm_vcpu *vcpu)
{
struct vcpu_vt *vt = to_vt(vcpu);
- struct vcpu_tdx *tdx = to_tdx(vcpu);
if (!vt->guest_state_loaded)
return;
@@ -824,11 +831,6 @@ static void tdx_prepare_switch_to_host(struct kvm_vcpu *vcpu)
++vcpu->stat.host_state_reload;
wrmsrl(MSR_KERNEL_GS_BASE, vt->msr_host_kernel_gs_base);
- if (tdx->guest_entered) {
- tdx_user_return_msr_update_cache();
- tdx->guest_entered = false;
- }
-
vt->guest_state_loaded = false;
}
@@ -1067,7 +1069,11 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
update_debugctlmsr(vcpu->arch.host_debugctl);
tdx_load_host_xsave_state(vcpu);
- tdx->guest_entered = true;
+
+ if (tdx->need_user_return_msr_update) {
+ tdx_user_return_msr_update_cache();
+ tdx->need_user_return_msr_update = false;
+ }
vcpu->arch.regs_avail &= TDX_REGS_AVAIL_SET;
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index ca39a9391db1..fcac1627f71f 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -67,7 +67,7 @@ struct vcpu_tdx {
u64 vp_enter_ret;
enum vcpu_tdx_state state;
- bool guest_entered;
+ bool need_user_return_msr_update;
u64 map_gpa_next;
u64 map_gpa_end;
Sorry for the slow response due to the PRC holiday.
On Tue, Sep 30, 2025 at 09:29:00AM -0700, Sean Christopherson wrote:
> On Tue, Sep 30, 2025, Yan Zhao wrote:
> > On Tue, Sep 30, 2025 at 08:22:41PM +0800, Yan Zhao wrote:
> > > On Fri, Sep 19, 2025 at 02:42:59PM -0700, Sean Christopherson wrote:
> > > > Rename kvm_user_return_msr_update_cache() to __kvm_set_user_return_msr()
> > > > and use the helper kvm_set_user_return_msr() to make it obvious that the
> > > > double-underscores version is doing a subset of the work of the "full"
> > > > setter.
> > > >
> > > > While the function does indeed update a cache, the nomenclature becomes
> > > > slightly misleading when adding a getter[1], as the current value isn't
> > > > _just_ the cached value, it's also the value that's currently loaded in
> > > > hardware.
> > > Nit:
> > >
> > > For TDX, "it's also the value that's currently loaded in hardware" is not true.
> > since tdx module invokes wrmsr()s before each exit to VMM, while KVM only
> > invokes __kvm_set_user_return_msr() in tdx_vcpu_put().
>
> No? kvm_user_return_msr_update_cache() is passed the value that's currently
> loaded in hardware, by way of the TDX-Module zeroing some MSRs on TD-Exit.
>
> Ah, I suspect you're calling out that the cache can be stale. Maybe this?
Right. But not just that the cache can be stale. My previous reply was quite
misleading.
As shown in the tables below, where
  CURR: msrs->values[slot].curr
  REAL: the value that's currently loaded in hardware

For TDs,
                                 CURR          REAL
-----------------------------------------------------------------------
1. enable virtualization         host value    host value
2. TDH.VP.ENTER                  host value    guest value (updated by tdx module)
3. TDH.VP.ENTER return           host value    defval (updated by tdx module)
4. tdx_vcpu_put                  defval        defval
5. exit to user mode             host value    host value

For normal VMs,
                                 CURR                 REAL
-----------------------------------------------------------------------
1. enable virtualization         host value           host value
2. before vcpu_run               shadow guest value   shadow guest value
3. after vcpu_run                shadow guest value   shadow guest value
4. exit to user mode             host value           host value

Unlike normal VMs, where msrs->values[slot].curr always matches the value
that's currently loaded in hardware, for TDs msrs->values[slot].curr does not
contain the value that's currently loaded in hardware in stages 2-3.
> While the function does indeed update a cache, the nomenclature becomes
> slightly misleading when adding a getter[1], as the current value isn't
> _just_ the cached value, it's also the value that's currently loaded in
> hardware (ignoring that the cache holds stale data until the vCPU is put,
So, "stale data" is not accurate.
It just can't hold the value that's currently loaded in hardware while the
guest is running in a TD.
> i.e. until KVM prepares to switch back to the host).
>
> Actually, that's a bug waiting to happen when the getter comes along. Rather
> than document the potential pitfall, what about adding a prep patch to mimize
> the window? Then _this_ patch shouldn't need the caveat about the cache being
> stale.
With the patch below,

For TDs,
                                 CURR          REAL
-----------------------------------------------------------------------
1. enable virtualization         host value    host value
2. before TDH.VP.ENTER           defval        host value or defval
3. TDH.VP.ENTER                  defval        guest value (updated by tdx module)
4. TDH.VP.ENTER return           defval        defval (updated by tdx module)
5. exit to user mode             host value    host value
msrs->values[slot].curr is still not the current value loaded in hardware.
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index ff41d3d00380..326fa81cb35f 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -789,6 +789,14 @@ void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
> vt->msr_host_kernel_gs_base = read_msr(MSR_KERNEL_GS_BASE);
>
> vt->guest_state_loaded = true;
> +
> + /*
> + * Several of KVM's user-return MSRs are clobbered by the TDX-Module if
Hmm, my previous mail didn't mention that besides saving the guest value and
clobbering the hardware value before exiting to the VMM, the TDX module also
loads the saved guest value into hardware on TDH.VP.ENTER.
> + * VP.ENTER succeeds, i.e. on TD-Exit. Mark those MSRs as needing an
> + * update to synchronize the "current" value in KVM's cache with the
> + * value in hardware (loaded by the TDX-Module).
> + */
> + to_tdx(vcpu)->need_user_return_msr_update = true;
> }
>
> struct tdx_uret_msr {
> @@ -816,7 +824,6 @@ static void tdx_user_return_msr_update_cache(void)
> static void tdx_prepare_switch_to_host(struct kvm_vcpu *vcpu)
> {
> struct vcpu_vt *vt = to_vt(vcpu);
> - struct vcpu_tdx *tdx = to_tdx(vcpu);
>
> if (!vt->guest_state_loaded)
> return;
> @@ -824,11 +831,6 @@ static void tdx_prepare_switch_to_host(struct kvm_vcpu *vcpu)
> ++vcpu->stat.host_state_reload;
> wrmsrl(MSR_KERNEL_GS_BASE, vt->msr_host_kernel_gs_base);
>
> - if (tdx->guest_entered) {
> - tdx_user_return_msr_update_cache();
> - tdx->guest_entered = false;
> - }
> -
> vt->guest_state_loaded = false;
> }
>
> @@ -1067,7 +1069,11 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
> update_debugctlmsr(vcpu->arch.host_debugctl);
>
> tdx_load_host_xsave_state(vcpu);
> - tdx->guest_entered = true;
> +
> + if (tdx->need_user_return_msr_update) {
> + tdx_user_return_msr_update_cache();
> + tdx->need_user_return_msr_update = false;
> + }
>
> vcpu->arch.regs_avail &= TDX_REGS_AVAIL_SET;
>
> diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
> index ca39a9391db1..fcac1627f71f 100644
> --- a/arch/x86/kvm/vmx/tdx.h
> +++ b/arch/x86/kvm/vmx/tdx.h
> @@ -67,7 +67,7 @@ struct vcpu_tdx {
> u64 vp_enter_ret;
>
> enum vcpu_tdx_state state;
> - bool guest_entered;
> + bool need_user_return_msr_update;
>
> u64 map_gpa_next;
> u64 map_gpa_end;
>
On Fri, Oct 03, 2025, Yan Zhao wrote:
> Sorry for the slow response due to the PRC holiday.
>
> On Tue, Sep 30, 2025 at 09:29:00AM -0700, Sean Christopherson wrote:
> > On Tue, Sep 30, 2025, Yan Zhao wrote:
> > > On Tue, Sep 30, 2025 at 08:22:41PM +0800, Yan Zhao wrote:
> > > > On Fri, Sep 19, 2025 at 02:42:59PM -0700, Sean Christopherson wrote:
> > > > > Rename kvm_user_return_msr_update_cache() to __kvm_set_user_return_msr()
> > > > > and use the helper kvm_set_user_return_msr() to make it obvious that the
> > > > > double-underscores version is doing a subset of the work of the "full"
> > > > > setter.
> > > > >
> > > > > While the function does indeed update a cache, the nomenclature becomes
> > > > > slightly misleading when adding a getter[1], as the current value isn't
> > > > > _just_ the cached value, it's also the value that's currently loaded in
> > > > > hardware.
> > > > Nit:
> > > >
> > > > For TDX, "it's also the value that's currently loaded in hardware" is not true.
> > > since tdx module invokes wrmsr()s before each exit to VMM, while KVM only
> > > invokes __kvm_set_user_return_msr() in tdx_vcpu_put().
> >
> > No? kvm_user_return_msr_update_cache() is passed the value that's currently
> > loaded in hardware, by way of the TDX-Module zeroing some MSRs on TD-Exit.
> >
> > Ah, I suspect you're calling out that the cache can be stale. Maybe this?
> Right. But not just that the cache can be stale. My previous reply was quite
> misleading.
>
> As with below tables, where
> CURR: msrs->values[slot].curr.
> REAL: value that's currently loaded in hardware
>
> For TDs,
> CURR REAL
> -----------------------------------------------------------------------
> 1. enable virtualization host value host value
>
> 2. TDH.VP.ENTER host value guest value (updated by tdx module)
> 3. TDH.VP.ENTER return host value defval (updated by tdx module)
> 4. tdx_vcpu_put defval defval
> 5. exit to user mode host value host value
>
>
> For normal VMs,
> CURR REAL
> -----------------------------------------------------------------------
> 1. enable virtualization host value host value
> 2. before vcpu_run shadow guest value shadow guest value
> 3. after vcpu_run shadow guest value shadow guest value
> 4. exit to user mode host value host value
>
>
> Unlike normal VMs, where msrs->values[slot].curr always matches the the value
> that's currently loaded in hardware.
That isn't actually true, see the bottom.
> For TDs, msrs->values[slot].curr does not contain the value that's currently
> loaded in hardware in stages 2-3.
>
> > While the function does indeed update a cache, the nomenclature becomes
> > slightly misleading when adding a getter[1], as the current value isn't
> > _just_ the cached value, it's also the value that's currently loaded in
> > hardware (ignoring that the cache holds stale data until the vCPU is put,
> So, "stale data" is not accurate.
> It just can't hold the current hardware loaded value when guest is running in
> TD.
Eh, that's still "stale data" as far as KVM is concerned. Though I'm splitting
hairs, I totally agree that as written the changelog is misleading.
> it's also the value that's currently loaded in hardware.
I just need to append "when KVM is actively running" (or probably something more
verbose).
> > i.e. until KVM prepares to switch back to the host).
> >
> > Actually, that's a bug waiting to happen when the getter comes along. Rather
> > than document the potential pitfall, what about adding a prep patch to mimize
> > the window? Then _this_ patch shouldn't need the caveat about the cache being
> > stale.
> With below patch,
>
> For TDs,
> CURR REAL
> -----------------------------------------------------------------------
> 1. enable virtualization host value host value
> 2. before TDH.VP.ENTER defval host value or defval
> 3. TDH.VP.ENTER defval guest value (updated by tdx module)
> 4. TDH.VP.ENTER return defval defval (updated by tdx module)
> 5. exit to user mode host value host value
>
> msrs->values[slot].curr is still not the current value loaded in hardware.
Right, this is where it becomes stale from my perspective.
> > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > index ff41d3d00380..326fa81cb35f 100644
> > --- a/arch/x86/kvm/vmx/tdx.c
> > +++ b/arch/x86/kvm/vmx/tdx.c
> > @@ -789,6 +789,14 @@ void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
> > vt->msr_host_kernel_gs_base = read_msr(MSR_KERNEL_GS_BASE);
> >
> > vt->guest_state_loaded = true;
> > +
> > + /*
> > + * Several of KVM's user-return MSRs are clobbered by the TDX-Module if
> Hmm, my previous mail didn't mention that besides saving guest value + clobber
> hardware value before exit to VMM, the TDX module also loads saved guest value
> to hardware on TDH.VP.ENTER.
That's not actually unique to TDX. EFER is set up as a user return MSR, but is
context switched on VM-Enter/VM-Exit except when running on ancient hardware
without VM_{ENTRY,EXIT}_LOAD_IA32_EFER (and even then, only when KVM doesn't
need to atomically switch to avoid toggling EFER.NX while in the host).
I.e. msrs->values[<EFER slot>].curr won't match hardware either while running
the guest. But because EFER is atomically loaded on VM-Exit in those cases, the
curr value can't be stale while KVM is running.
On Fri, Oct 03, 2025 at 09:53:28AM -0700, Sean Christopherson wrote:
> On Fri, Oct 03, 2025, Yan Zhao wrote:
> > Sorry for the slow response due to the PRC holiday.
> >
> > On Tue, Sep 30, 2025 at 09:29:00AM -0700, Sean Christopherson wrote:
> > > On Tue, Sep 30, 2025, Yan Zhao wrote:
> > > > On Tue, Sep 30, 2025 at 08:22:41PM +0800, Yan Zhao wrote:
> > > > > On Fri, Sep 19, 2025 at 02:42:59PM -0700, Sean Christopherson wrote:
> > > > > > Rename kvm_user_return_msr_update_cache() to __kvm_set_user_return_msr()
> > > > > > and use the helper kvm_set_user_return_msr() to make it obvious that the
> > > > > > double-underscores version is doing a subset of the work of the "full"
> > > > > > setter.
> > > > > >
> > > > > > While the function does indeed update a cache, the nomenclature becomes
> > > > > > slightly misleading when adding a getter[1], as the current value isn't
> > > > > > _just_ the cached value, it's also the value that's currently loaded in
> > > > > > hardware.
> > > > > Nit:
> > > > >
> > > > > For TDX, "it's also the value that's currently loaded in hardware" is not true.
> > > > since tdx module invokes wrmsr()s before each exit to VMM, while KVM only
> > > > invokes __kvm_set_user_return_msr() in tdx_vcpu_put().
> > >
> > > No? kvm_user_return_msr_update_cache() is passed the value that's currently
> > > loaded in hardware, by way of the TDX-Module zeroing some MSRs on TD-Exit.
> > >
> > > Ah, I suspect you're calling out that the cache can be stale. Maybe this?
> > Right. But not just that the cache can be stale. My previous reply was quite
> > misleading.
> >
> > As with below tables, where
> > CURR: msrs->values[slot].curr.
> > REAL: value that's currently loaded in hardware
> >
> > For TDs,
> > CURR REAL
> > -----------------------------------------------------------------------
> > 1. enable virtualization host value host value
> >
> > 2. TDH.VP.ENTER host value guest value (updated by tdx module)
> > 3. TDH.VP.ENTER return host value defval (updated by tdx module)
> > 4. tdx_vcpu_put defval defval
> > 5. exit to user mode host value host value
> >
> >
> > For normal VMs,
> > CURR REAL
> > -----------------------------------------------------------------------
> > 1. enable virtualization host value host value
> > 2. before vcpu_run shadow guest value shadow guest value
> > 3. after vcpu_run shadow guest value shadow guest value
> > 4. exit to user mode host value host value
> >
> >
> > Unlike normal VMs, where msrs->values[slot].curr always matches the the value
> > that's currently loaded in hardware.
>
> That isn't actually true, see the bottom.
>
> > For TDs, msrs->values[slot].curr does not contain the value that's currently
> > loaded in hardware in stages 2-3.
> >
> > > While the function does indeed update a cache, the nomenclature becomes
> > > slightly misleading when adding a getter[1], as the current value isn't
> > > _just_ the cached value, it's also the value that's currently loaded in
> > > hardware (ignoring that the cache holds stale data until the vCPU is put,
> > So, "stale data" is not accurate.
> > It just can't hold the current hardware loaded value when guest is running in
> > TD.
>
> Eh, that's still "stale data" as far as KVM is concerned. Though I'm splitting
> hairs, I totally agree that as written the changelog is misleading.
>
> > it's also the value that's currently loaded in hardware.
>
> I just need to append "when KVM is actively running" (or probably something more
> verbose).
>
> > > i.e. until KVM prepares to switch back to the host).
> > >
> > > Actually, that's a bug waiting to happen when the getter comes along. Rather
> > > than document the potential pitfall, what about adding a prep patch to mimize
> > > the window? Then _this_ patch shouldn't need the caveat about the cache being
> > > stale.
> > With below patch,
> >
> > For TDs,
> > CURR REAL
> > -----------------------------------------------------------------------
> > 1. enable virtualization host value host value
> > 2. before TDH.VP.ENTER defval host value or defval
> > 3. TDH.VP.ENTER defval guest value (updated by tdx module)
> > 4. TDH.VP.ENTER return defval defval (updated by tdx module)
> > 5. exit to user mode host value host value
> >
> > msrs->values[slot].curr is still not the current value loaded in hardware.
>
> Right, this where it becomes stale from my perspective.
Ok, so even though after your fix msrs->values[slot].curr still doesn't match
the hardware value in stage 2, I think it's harmless.
> > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > > index ff41d3d00380..326fa81cb35f 100644
> > > --- a/arch/x86/kvm/vmx/tdx.c
> > > +++ b/arch/x86/kvm/vmx/tdx.c
> > > @@ -789,6 +789,14 @@ void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
> > > vt->msr_host_kernel_gs_base = read_msr(MSR_KERNEL_GS_BASE);
> > >
> > > vt->guest_state_loaded = true;
> > > +
> > > + /*
> > > + * Several of KVM's user-return MSRs are clobbered by the TDX-Module if
> > Hmm, my previous mail didn't mention that besides saving guest value + clobber
> > hardware value before exit to VMM, the TDX module also loads saved guest value
> > to hardware on TDH.VP.ENTER.
>
> That's not actually unique to TDX. EFER is setup as a user return MSR, but is
> context switched on VM-Enter/VM-Exit except when running on ancient hardware
> without VM_{ENTRY,EXIT}_LOAD_IA32_EFER (and even then, only when KVM doesn't
> need to atomically switch to avoid toggling EFER.NX while in the host).
>
> I.e. msrs->values[<EFER slot>].curr won't match hardware either while running
> the guest. But because EFER is atomically loaded on VM-Exit in those cases, the
> curr value can't be stale while KVM is running.
Oh, yes.
On Tue, Sep 30, 2025, Sean Christopherson wrote:
> On Tue, Sep 30, 2025, Yan Zhao wrote:
> > On Tue, Sep 30, 2025 at 08:22:41PM +0800, Yan Zhao wrote:
> > > On Fri, Sep 19, 2025 at 02:42:59PM -0700, Sean Christopherson wrote:
> > > > Rename kvm_user_return_msr_update_cache() to __kvm_set_user_return_msr()
> > > > and use the helper kvm_set_user_return_msr() to make it obvious that the
> > > > double-underscores version is doing a subset of the work of the "full"
> > > > setter.
> > > >
> > > > While the function does indeed update a cache, the nomenclature becomes
> > > > slightly misleading when adding a getter[1], as the current value isn't
> > > > _just_ the cached value, it's also the value that's currently loaded in
> > > > hardware.
> > > Nit:
> > >
> > > For TDX, "it's also the value that's currently loaded in hardware" is not true.
> > since tdx module invokes wrmsr()s before each exit to VMM, while KVM only
> > invokes __kvm_set_user_return_msr() in tdx_vcpu_put().
>
> No? kvm_user_return_msr_update_cache() is passed the value that's currently
> loaded in hardware, by way of the TDX-Module zeroing some MSRs on TD-Exit.
>
> Ah, I suspect you're calling out that the cache can be stale. Maybe this?
>
>   While the function does indeed update a cache, the nomenclature becomes
>   slightly misleading when adding a getter[1], as the current value isn't
>   _just_ the cached value, it's also the value that's currently loaded in
>   hardware (ignoring that the cache holds stale data until the vCPU is put,
>   i.e. until KVM prepares to switch back to the host).
>
> Actually, that's a bug waiting to happen when the getter comes along. Rather
> than document the potential pitfall, what about adding a prep patch to mimize
> the window? Then _this_ patch shouldn't need the caveat about the cache being
> stale.

Ha! It's technically a bug fix. Because a forced shutdown will invoke
kvm_shutdown() without waiting for tasks to exit, and so the on_each_cpu()
calls to kvm_disable_virtualization_cpu() can call kvm_on_user_return() and
thus consume a stale values->curr.
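To make the stale-curr hazard concrete, below is a simplified sketch,
abbreviated from the kvm_on_user_return() logic in arch/x86/kvm/x86.c (not the
verbatim kernel code; locking and notifier unregistration are omitted):

	/*
	 * Simplified sketch: on return to userspace, the notifier restores an
	 * MSR only when the cached "curr" differs from the host value.  A
	 * stale curr that still equals the host value skips the WRMSR even
	 * though hardware actually holds the TDX-Module's defval.
	 */
	static void kvm_on_user_return_sketch(struct kvm_user_return_msrs *msrs)
	{
		unsigned int slot;

		for (slot = 0; slot < kvm_nr_uret_msrs; ++slot) {
			struct kvm_user_return_msr_values *values = &msrs->values[slot];

			if (values->host != values->curr) {
				wrmsrl(kvm_uret_msrs_list[slot], values->host);
				values->curr = values->host;
			}
		}
	}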
On 9/20/2025 5:42 AM, Sean Christopherson wrote:
> Rename kvm_user_return_msr_update_cache() to __kvm_set_user_return_msr()
> and use the helper kvm_set_user_return_msr() to make it obvious that the
                     ^
the helper *in* kvm_set_user_return_msr() ..
> double-underscores version is doing a subset of the work of the "full"
> setter.
>
> While the function does indeed update a cache, the nomenclature becomes
> slightly misleading when adding a getter[1], as the current value isn't
> _just_ the cached value, it's also the value that's currently loaded in
> hardware.
>
> Opportunistically rename "index" to "slot" in the prototypes. The user-
> return APIs deliberately use "slot" to try and make it more obvious that
> they take the slot within the array, not the index of the MSR.
>
> No functional change intended.
>
> Cc: Yan Zhao <yan.y.zhao@intel.com>
> Cc: Xiaoyao Li <xiaoyao.li@intel.com>
> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Link: https://lore.kernel.org/all/aM2EvzLLmBi5-iQ5@google.com [1]
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>