From: Venkatesh Srinivas <venkateshs@chromium.org>
TCE augments the behavior of TLB invalidating instructions (INVLPG,
INVLPGB, and INVPCID) to only invalidate translations for relevant
intermediate mappings to the address range, rather than ALL intermediate
translations.
The Linux kernel has been setting EFER.TCE if supported by the CPU since
commit 440a65b7d25f ("x86/mm: Enable AMD translation cache extensions"),
as it may improve performance.
KVM does not need to do anything to virtualize the feature, only
advertise it and allow setting EFER.TCE. Passthrough X86_FEATURE_TCE to
the guest, and allow the guest to set EFER.TCE if available.
Co-developed-by: Yosry Ahmed <yosry@kernel.org>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
Signed-off-by: Venkatesh Srinivas <venkateshs@chromium.org>
---
arch/x86/kvm/cpuid.c | 1 +
arch/x86/kvm/svm/svm.c | 3 +++
arch/x86/kvm/x86.c | 3 +++
3 files changed, 7 insertions(+)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index fffbf087937d4..4f810f23b1d9b 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1112,6 +1112,7 @@ void kvm_initialize_cpu_caps(void)
 		F(XOP),
 		/* SKINIT, WDT, LWP */
 		F(FMA4),
+		F(TCE),
 		F(TBM),
 		F(TOPOEXT),
 		VENDOR_F(PERFCTR_CORE),
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 3407deac90bd6..fee1c8cd45973 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5580,6 +5580,9 @@ static __init int svm_hardware_setup(void)
 	if (boot_cpu_has(X86_FEATURE_AUTOIBRS))
 		kvm_enable_efer_bits(EFER_AUTOIBRS);

+	if (boot_cpu_has(X86_FEATURE_TCE))
+		kvm_enable_efer_bits(EFER_TCE);
+
 	/* Check for pause filtering support */
 	if (!boot_cpu_has(X86_FEATURE_PAUSEFILTER)) {
 		pause_filter_count = 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 879cdeb6adde2..7336ce1df3f7a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1743,6 +1743,9 @@ static bool __kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer)
 	if (efer & EFER_NX && !guest_cpu_cap_has(vcpu, X86_FEATURE_NX))
 		return false;

+	if (efer & EFER_TCE && !guest_cpu_cap_has(vcpu, X86_FEATURE_TCE))
+		return false;
+
 	return true;
 }
base-commit: 5128b972fb2801ad9aca54d990a75611ab5283a9
--
2.53.0.473.g4a7958ca14-goog
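
The guest-CPUID gating that the x86.c hunk adds can be sketched in standalone C. This is only an illustration of the check's shape, not KVM code: `struct toy_vcpu` and `toy_valid_efer()` are hypothetical stand-ins for the vCPU state and for `__kvm_valid_efer()`. The bit positions are the architectural ones for EFER.NXE (bit 11) and EFER.TCE (bit 15).

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural EFER bit positions used in this sketch. */
#define EFER_NX  (1ULL << 11)
#define EFER_TCE (1ULL << 15)

/* Hypothetical stand-in for the guest CPUID state of one vCPU. */
struct toy_vcpu {
	bool has_nx;	/* guest CPUID advertises NX */
	bool has_tce;	/* guest CPUID advertises TCE */
};

/*
 * Reject an EFER value that sets a feature bit the guest's CPUID does
 * not advertise. Bits without a matching check are ignored here.
 */
static bool toy_valid_efer(const struct toy_vcpu *vcpu, uint64_t efer)
{
	if ((efer & EFER_NX) && !vcpu->has_nx)
		return false;

	if ((efer & EFER_TCE) && !vcpu->has_tce)
		return false;

	return true;
}
```

With this model, a guest whose CPUID lacks TCE cannot set EFER.TCE, while a guest that has TCE can set or clear the bit freely.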
On Fri, Mar 06, 2026, Yosry Ahmed wrote:
> From: Venkatesh Srinivas <venkateshs@chromium.org>
>
> TCE augments the behavior of TLB invalidating instructions (INVLPG,
> INVLPGB, and INVPCID) to only invalidate translations for relevant
> intermediate mappings to the address range, rather than ALL intermediate
> translations.
>
> The Linux kernel has been setting EFER.TCE if supported by the CPU since
> commit 440a65b7d25f ("x86/mm: Enable AMD translation cache extensions"),
> as it may improve performance.
>
> KVM does not need to do anything to virtualize the feature,
Please back this up with actual analysis.
> only advertise it and allow setting EFER.TCE. Passthrough X86_FEATURE_TCE to
Advertise X86_FEATURE_TCE to userspace, not "passthrough xxx to the guest".
Because that's all KVM
> the guest, and allow the guest to set EFER.TCE if available.
>
> Co-developed-by: Yosry Ahmed <yosry@kernel.org>
> Signed-off-by: Yosry Ahmed <yosry@kernel.org>
> Signed-off-by: Venkatesh Srinivas <venkateshs@chromium.org>
Your SoB should come last to capture the chain of handling, i.e. this should
be:
Signed-off-by: Venkatesh Srinivas <venkateshs@chromium.org>
Co-developed-by: Yosry Ahmed <yosry@kernel.org>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
> ---
> arch/x86/kvm/cpuid.c | 1 +
> arch/x86/kvm/svm/svm.c | 3 +++
> arch/x86/kvm/x86.c | 3 +++
> 3 files changed, 7 insertions(+)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index fffbf087937d4..4f810f23b1d9b 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -1112,6 +1112,7 @@ void kvm_initialize_cpu_caps(void)
> F(XOP),
> /* SKINIT, WDT, LWP */
> F(FMA4),
> + F(TCE),
> F(TBM),
> F(TOPOEXT),
> VENDOR_F(PERFCTR_CORE),
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 3407deac90bd6..fee1c8cd45973 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -5580,6 +5580,9 @@ static __init int svm_hardware_setup(void)
> if (boot_cpu_has(X86_FEATURE_AUTOIBRS))
> kvm_enable_efer_bits(EFER_AUTOIBRS);
>
> + if (boot_cpu_has(X86_FEATURE_TCE))
> + kvm_enable_efer_bits(EFER_TCE);
Hrm, I think we should handle all of the kvm_enable_efer_bits() calls that are
conditioned only on CPU support in common code. While it's highly unlikely Intel
CPUs will ever support more EFER-based features, if they do, then KVM will
over-report support since kvm_initialize_cpu_caps() will effectively enable the
feature, but VMX won't enable the corresponding EFER bit.
I can't think of anything that will go sideways if we rely purely on KVM caps, so
get to something like this as prep work, and then land TCE in common x86?
---
arch/x86/kvm/svm/svm.c | 7 +------
arch/x86/kvm/vmx/vmx.c | 4 ----
arch/x86/kvm/x86.c | 14 ++++++++++++++
3 files changed, 15 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 3407deac90bd..c23ee45f2ba8 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5556,14 +5556,10 @@ static __init int svm_hardware_setup(void)
pr_err_ratelimited("NX (Execute Disable) not supported\n");
return -EOPNOTSUPP;
}
- kvm_enable_efer_bits(EFER_NX);
kvm_caps.supported_xcr0 &= ~(XFEATURE_MASK_BNDREGS |
XFEATURE_MASK_BNDCSR);
- if (boot_cpu_has(X86_FEATURE_FXSR_OPT))
- kvm_enable_efer_bits(EFER_FFXSR);
-
if (tsc_scaling) {
if (!boot_cpu_has(X86_FEATURE_TSCRATEMSR)) {
tsc_scaling = false;
@@ -5577,8 +5573,7 @@ static __init int svm_hardware_setup(void)
tsc_aux_uret_slot = kvm_add_user_return_msr(MSR_TSC_AUX);
- if (boot_cpu_has(X86_FEATURE_AUTOIBRS))
- kvm_enable_efer_bits(EFER_AUTOIBRS);
+
/* Check for pause filtering support */
if (!boot_cpu_has(X86_FEATURE_PAUSEFILTER)) {
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 9302c16571cd..2b8a7456039c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8583,10 +8583,6 @@ __init int vmx_hardware_setup(void)
vmx_setup_user_return_msrs();
-
- if (boot_cpu_has(X86_FEATURE_NX))
- kvm_enable_efer_bits(EFER_NX);
-
if (boot_cpu_has(X86_FEATURE_MPX)) {
rdmsrq(MSR_IA32_BNDCFGS, host_bndcfgs);
WARN_ONCE(host_bndcfgs, "BNDCFGS in host will be lost");
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 879cdeb6adde..0b5d48e75b65 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10025,6 +10025,18 @@ void kvm_setup_xss_caps(void)
 }
 EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_setup_xss_caps);

+static void kvm_setup_efer_caps(void)
+{
+	if (kvm_cpu_cap_has(X86_FEATURE_NX))
+		kvm_enable_efer_bits(EFER_NX);
+
+	if (kvm_cpu_cap_has(X86_FEATURE_FXSR_OPT))
+		kvm_enable_efer_bits(EFER_FFXSR);
+
+	if (kvm_cpu_cap_has(X86_FEATURE_AUTOIBRS))
+		kvm_enable_efer_bits(EFER_AUTOIBRS);
+}
+
 static inline void kvm_ops_update(struct kvm_x86_init_ops *ops)
 {
 	memcpy(&kvm_x86_ops, ops->runtime_ops, sizeof(kvm_x86_ops));
@@ -10161,6 +10173,8 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 	if (r != 0)
 		goto out_mmu_exit;

+	kvm_setup_efer_caps();
+
 	enable_device_posted_irqs &= enable_apicv &&
 				     irq_remapping_cap(IRQ_POSTING_CAP);
base-commit: 5128b972fb2801ad9aca54d990a75611ab5283a9
--
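
The idea behind the prep patch above, deriving the allowed EFER bits from a single capability word so that the two cannot drift apart, can be modelled in a few lines of standalone C. The `toy_*` names and the flag encoding are hypothetical; real KVM tracks these features as CPUID bits and uses kvm_enable_efer_bits(). The EFER bit positions are the architectural ones.

```c
#include <stdint.h>

/* Architectural EFER bit positions used in this sketch. */
#define EFER_NX       (1ULL << 11)
#define EFER_FFXSR    (1ULL << 14)
#define EFER_AUTOIBRS (1ULL << 21)

/* Hypothetical feature flags standing in for KVM's CPU capabilities. */
enum toy_feature {
	TOY_NX       = 1u << 0,
	TOY_FXSR_OPT = 1u << 1,
	TOY_AUTOIBRS = 1u << 2,
};

/*
 * Build the mask of EFER bits a guest may set from the capability word
 * alone, independent of which vendor module is loaded.
 */
static uint64_t toy_setup_efer_caps(unsigned int kvm_caps)
{
	uint64_t efer = 0;

	if (kvm_caps & TOY_NX)
		efer |= EFER_NX;

	if (kvm_caps & TOY_FXSR_OPT)
		efer |= EFER_FFXSR;

	if (kvm_caps & TOY_AUTOIBRS)
		efer |= EFER_AUTOIBRS;

	return efer;
}
```

Because the EFER mask is a pure function of the capabilities, a feature cannot be advertised without its EFER bit also being allowed, which is the over-reporting case described above.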
> Hrm, I think we should handle all of the kvm_enable_efer_bits() calls that are
> conditioned only on CPU support in common code. While it's highly unlikely Intel
> CPUs will ever support more EFER-based features, if they do, then KVM will
> over-report support since kvm_initialize_cpu_caps() will effectively enable the
> feature, but VMX won't enable the corresponding EFER bit.
>
> I can't think of anything that will go sideways if we rely purely on KVM caps, so
> get to something like this as prep work, and then land TCE in common x86?
Taking a second look here, doesn't this break the changes introduced
by commit 11988499e62b ("KVM: x86: Skip EFER vs. guest CPUID checks
for host-initiated writes")? Userspace writes may fail if the
corresponding CPUID feature is not enabled.
We can still pull the enablement to common code, but
kvm_setup_efer_caps() still needs to query the host CPU features (i.e.
boot_cpu_has()) AFAICT.
On Fri, Mar 06, 2026, Yosry Ahmed wrote:
> > Hrm, I think we should handle all of the kvm_enable_efer_bits() calls that are
> > conditioned only on CPU support in common code. While it's highly unlikely Intel
> > CPUs will ever support more EFER-based features, if they do, then KVM will
> > over-report support since kvm_initialize_cpu_caps() will effectively enable the
> > feature, but VMX won't enable the corresponding EFER bit.
> >
> > I can't think of anything that will go sideways if we rely purely on KVM caps, so
> > get to something like this as prep work, and then land TCE in common x86?
>
> Taking a second look here, doesn't this break the changes introduced
> by commit 11988499e62b ("KVM: x86: Skip EFER vs. guest CPUID checks
> for host-initiated writes")? Userspace writes may fail if the
> corresponding CPUID feature is not enabled.
No, because kvm_cpu_cap_has() == boot_cpu_has() filtered by what KVM supports.
All of these EFER updates subtly rely on KVM enabling the associated CPUID
feature in kvm_initialize_cpu_caps().
If we used guest_cpu_cap_has(), then yes, that would be a problem.
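
The three capability layers in this exchange can be sketched in standalone C. Every name here is hypothetical and the model is deliberately simplified: it assumes a KVM capability is the host capability filtered by what KVM supports, and that host-initiated EFER writes skip the guest CPUID check, as the title of commit 11988499e62b describes.

```c
#include <stdbool.h>

#define TOY_TCE (1u << 0)	/* hypothetical feature flag */

struct toy_caps {
	unsigned int host;	/* what boot_cpu_has() would report */
	unsigned int kvm_mask;	/* features KVM is willing to virtualize */
	unsigned int guest;	/* what userspace put in the guest's CPUID */
};

/* Models kvm_cpu_cap_has(): host support filtered by KVM support. */
static bool toy_kvm_cap_has(const struct toy_caps *c, unsigned int f)
{
	return (c->host & c->kvm_mask & f) == f;
}

/* Models guest_cpu_cap_has(): depends on the guest's CPUID. */
static bool toy_guest_cap_has(const struct toy_caps *c, unsigned int f)
{
	return (c->guest & f) == f;
}

/*
 * An EFER bit must be supported by KVM in all cases. Only writes from
 * the guest are additionally checked against the guest's CPUID.
 */
static bool toy_efer_write_ok(const struct toy_caps *c, unsigned int f,
			      bool host_initiated)
{
	if (!toy_kvm_cap_has(c, f))
		return false;

	return host_initiated || toy_guest_cap_has(c, f);
}
```

In this model, keying the supported EFER bits off the KVM capability leaves host-initiated writes unaffected by the guest's CPUID. Keying them off the guest capability would not.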
On Fri, Mar 6, 2026 at 4:56 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Mar 06, 2026, Yosry Ahmed wrote:
> > > Hrm, I think we should handle all of the kvm_enable_efer_bits() calls that are
> > > conditioned only on CPU support in common code. While it's highly unlikely Intel
> > > CPUs will ever support more EFER-based features, if they do, then KVM will
> > > over-report support since kvm_initialize_cpu_caps() will effectively enable the
> > > feature, but VMX won't enable the corresponding EFER bit.
> > >
> > > I can't think of anything that will go sideways if we rely purely on KVM caps, so
> > > get to something like this as prep work, and then land TCE in common x86?
> >
> > Taking a second look here, doesn't this break the changes introduced
> > by commit 11988499e62b ("KVM: x86: Skip EFER vs. guest CPUID checks
> > for host-initiated writes")? Userspace writes may fail if the
> > corresponding CPUID feature is not enabled.
>
> No, because kvm_cpu_cap_has() == boot_cpu_has() filtered by what KVM supports.
> All of these EFER updates subtly rely on KVM enabling the associated CPUID
> feature in kvm_initialize_cpu_caps().
>
> If we used guest_cpu_cap_has(), then yes, that would be a problem.
Gaah I mixed up guest_cpu_cap_has() and kvm_cpu_cap_has(), sorry for the noise.
On Fri, Mar 6, 2026 at 8:19 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Mar 06, 2026, Yosry Ahmed wrote:
> > From: Venkatesh Srinivas <venkateshs@chromium.org>
> >
> > TCE augments the behavior of TLB invalidating instructions (INVLPG,
> > INVLPGB, and INVPCID) to only invalidate translations for relevant
> > intermediate mappings to the address range, rather than ALL intermediate
> > translations.
> >
> > The Linux kernel has been setting EFER.TCE if supported by the CPU since
> > commit 440a65b7d25f ("x86/mm: Enable AMD translation cache extensions"),
> > as it may improve performance.
> >
> > KVM does not need to do anything to virtualize the feature,
>
> Please back this up with actual analysis.
Something like this?
If a TLB invalidating instruction is not intercepted, it will behave
according to the guest's setting of EFER.TCE as the value will be
loaded on VM-Enter. Otherwise, KVM's emulation may invalidate more TLB
entries, which is perfectly fine as the CPU is allowed to invalidate
more TLB entries than it strictly needs to.
>
> > only advertise it and allow setting EFER.TCE. Passthrough X86_FEATURE_TCE to
>
> Advertise X86_FEATURE_TCE to userspace, not "passthrough xxx to the guest".
> Because that's all KVM
>
> > the guest, and allow the guest to set EFER.TCE if available.
> >
> > Co-developed-by: Yosry Ahmed <yosry@kernel.org>
> > Signed-off-by: Yosry Ahmed <yosry@kernel.org>
> > Signed-off-by: Venkatesh Srinivas <venkateshs@chromium.org>
>
> Your SoB should come last to capture the chain of handling, i.e. this should
> be:
Ack.
>
> Signed-off-by: Venkatesh Srinivas <venkateshs@chromium.org>
> Co-developed-by: Yosry Ahmed <yosry@kernel.org>
> Signed-off-by: Yosry Ahmed <yosry@kernel.org>
>
[..]
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 3407deac90bd6..fee1c8cd45973 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -5580,6 +5580,9 @@ static __init int svm_hardware_setup(void)
> > if (boot_cpu_has(X86_FEATURE_AUTOIBRS))
> > kvm_enable_efer_bits(EFER_AUTOIBRS);
> >
> > + if (boot_cpu_has(X86_FEATURE_TCE))
> > + kvm_enable_efer_bits(EFER_TCE);
>
> Hrm, I think we should handle all of the kvm_enable_efer_bits() calls that are
> conditioned only on CPU support in common code. While it's highly unlikely Intel
> CPUs will ever support more EFER-based features, if they do, then KVM will
> over-report support since kvm_initialize_cpu_caps() will effectively enable the
> feature, but VMX won't enable the corresponding EFER bit.
>
> I can't think of anything that will go sideways if we rely purely on KVM caps, so
> get to something like this as prep work, and then land TCE in common x86?
Will do.
On Fri, Mar 06, 2026, Yosry Ahmed wrote:
> On Fri, Mar 6, 2026 at 8:19 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Fri, Mar 06, 2026, Yosry Ahmed wrote:
> > > From: Venkatesh Srinivas <venkateshs@chromium.org>
> > >
> > > TCE augments the behavior of TLB invalidating instructions (INVLPG,
> > > INVLPGB, and INVPCID) to only invalidate translations for relevant
> > > intermediate mappings to the address range, rather than ALL intermediate
> > > translations.
> > >
> > > The Linux kernel has been setting EFER.TCE if supported by the CPU since
> > > commit 440a65b7d25f ("x86/mm: Enable AMD translation cache extensions"),
> > > as it may improve performance.
> > >
> > > KVM does not need to do anything to virtualize the feature,
> >
> > Please back this up with actual analysis.
>
> Something like this?
>
> If a TLB invalidating instruction is not intercepted, it will behave
> according to the guest's setting of EFER.TCE as the value will be
> loaded on VM-Enter. Otherwise, KVM's emulation may invalidate more TLB
> entries, which is perfectly fine as the CPU is allowed to invalidate
> more TLB entries than it strictly needs to.
Ya, LGTM.
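
The over-invalidation argument agreed on above can be illustrated with a toy model in standalone C. This is a sketch under stated assumptions, not a description of the hardware: `struct toy_entry` is a hypothetical cached intermediate translation covering an address range, and the `tce` flag selects between dropping only the entries that cover the target address and dropping all of them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cached intermediate translation covering [start, end). */
struct toy_entry {
	uint64_t start;
	uint64_t end;
	bool valid;
};

static bool toy_covers(const struct toy_entry *e, uint64_t va)
{
	return e->start <= va && va < e->end;
}

/*
 * Invalidate cached entries for va. With tce set, only entries covering
 * va are dropped; without it, every entry is dropped.
 */
static void toy_invalidate(struct toy_entry *e, size_t n, uint64_t va, bool tce)
{
	for (size_t i = 0; i < n; i++) {
		if (!tce || toy_covers(&e[i], va))
			e[i].valid = false;
	}
}

/* Correctness condition: no valid entry may still cover va. */
static bool toy_has_stale(const struct toy_entry *e, size_t n, uint64_t va)
{
	for (size_t i = 0; i < n; i++) {
		if (e[i].valid && toy_covers(&e[i], va))
			return true;
	}
	return false;
}

static size_t toy_count_valid(const struct toy_entry *e, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		count += e[i].valid;
	return count;
}
```

Both modes satisfy the correctness condition. The broader invalidation only discards entries that could have been kept, which costs performance but never leaves a stale translation behind.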