[PATCH v3 00/27] KVM: combined patchset for MBEC/GMET support

Paolo Bonzini posted 27 patches 2 months, 1 week ago
There is a newer version of this series
Posted by Paolo Bonzini 2 months, 1 week ago
This series introduces support for two related features that Hyper-V uses
in its implementation of Virtual Secure Mode; these are Intel Mode-Based
Execute Control and AMD Guest Mode Execution Trap.

Both MBEC and GMET allow more granular control over execute permissions,
with different levels of separation between supervisor and user mode.
MBEC provides support for separate supervisor and user-mode bits in the
PTEs; GMET instead lacks supervisor-mode-only execution (with NX=0,
"both" is represented by U=0 and user-mode only by U=1).  GMET was
clearly inspired by SMEP, though with some differences and annoyances.
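
To make the difference concrete, here is a minimal sketch of the two
execute-permission models as described in the paragraph above.  The
struct and function names are hypothetical and only model the relevant
bits; this is not KVM code.

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified view of the execute-related bits of a
 * second-level page table entry; not KVM's actual data structures.
 */
struct mbec_pte { bool xs; bool xu; };   /* Intel EPT with MBEC */
struct gmet_pte { bool nx; bool user; }; /* AMD NPT with GMET */

/* MBEC: supervisor and user execution are controlled independently. */
static bool mbec_can_exec(struct mbec_pte pte, bool user_mode)
{
	return user_mode ? pte.xu : pte.xs;
}

/*
 * GMET, following the description above: NX=1 forbids execution;
 * with NX=0, U=0 allows both modes and U=1 allows user mode only.
 */
static bool gmet_can_exec(struct gmet_pte pte, bool user_mode)
{
	if (pte.nx)
		return false;
	return user_mode || !pte.user;
}
```

Note that the GMET model has no combination that yields "supervisor
only", which is the asymmetry mentioned above.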

The series was developed starting from Jon Kohler's earlier version at
https://lore.kernel.org/kvm/20251223054806.1611168-1-jon@nutanix.com/.
The difference is that I am starting this implementation from two
changes to core MMU code, even before looking at nested MBEC/GMET;
these are seemingly unnecessary for that goal but they make the actual
feature almost trivial to implement:

- first, I'm cleaning up the implementation of nVMX exec-only, by
  properly adding read permissions to the ACC_* constants and to the
  permission bitmask machinery.  Jon also had to add a fourth ACC_*
  bit, but used it only in the special case of nested MBEC; here
  ACC_READ_MASK is instead the normal case, which simplifies testing
  a lot and removes gratuitous complexity.

- second, I'm enforcing that KVM runs with MBEC/GMET enabled even in
  non-nested mode, if it wants to provide the feature to nested
  hypervisors.  Initially I thought this would mostly simplify the
  testing; but it actually has a big effect on the code as well, because
  the creation of SPTEs now looks exactly the same for L1 and L2 guests;
  the difference lies only in the input access permissions.

Later patches have to use slightly different meanings for ACC_* in Intel
and AMD, but the differences are driven by whether the underlying SPTEs
have U/NX or XS/XU bits, and propagate from there.  In other words,
unlike the older ACC_USER_MASK hack, these differences are backed by
concrete concepts of the page table format, and there is always a 1:1
mapping from ACC_* bits to PT_*_MASK or shadow_*_mask:

                            Intel                 AMD
     --------------------   -------------------   -------------------
     ACC_READ_MASK          PT_PRESENT_MASK       PT_PRESENT_MASK
     ACC_WRITE_MASK         PT_WRITABLE_MASK      PT_WRITABLE_MASK
     ACC_EXEC_MASK          shadow_xs_mask        shadow_nx_mask
     ACC_USER_MASK          ---                   shadow_user_mask
     ACC_USER_EXEC_MASK     shadow_xu_mask        ---
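
As a rough illustration of the table, the sketch below turns a set of
ACC_* bits into page table bits for each vendor.  The ACC_* values are
illustrative only, the helper names are hypothetical, and the EPT/NPT
bit positions are the architectural ones as I understand them; in KVM
the shadow_*_mask values are variables, not constants.

```c
#include <stdint.h>

/* Illustrative encodings only; the real ACC_* values are in the series. */
#define ACC_EXEC_MASK		(1u << 0)
#define ACC_WRITE_MASK		(1u << 1)
#define ACC_USER_MASK		(1u << 2)	/* AMD meaning of the bit */
#define ACC_USER_EXEC_MASK	(1u << 2)	/* Intel reuse of the same bit */
#define ACC_READ_MASK		(1u << 3)

/* EPT entry bits: read, write, execute (XS with MBEC), user execute. */
#define EPT_R	(1ull << 0)
#define EPT_W	(1ull << 1)
#define EPT_XS	(1ull << 2)
#define EPT_XU	(1ull << 10)

/* NPT entry bits: present, writable, user, no-execute. */
#define NPT_P	(1ull << 0)
#define NPT_W	(1ull << 1)
#define NPT_U	(1ull << 2)
#define NPT_NX	(1ull << 63)

/* Intel column of the table: every ACC_* bit sets one EPT bit. */
static uint64_t acc_to_ept_bits(unsigned int acc)
{
	uint64_t bits = 0;

	if (acc & ACC_READ_MASK)
		bits |= EPT_R;
	if (acc & ACC_WRITE_MASK)
		bits |= EPT_W;
	if (acc & ACC_EXEC_MASK)
		bits |= EPT_XS;
	if (acc & ACC_USER_EXEC_MASK)
		bits |= EPT_XU;
	return bits;
}

/* AMD column: same idea, except that NX has inverted polarity. */
static uint64_t acc_to_npt_bits(unsigned int acc)
{
	uint64_t bits = 0;

	if (acc & ACC_READ_MASK)
		bits |= NPT_P;
	if (acc & ACC_WRITE_MASK)
		bits |= NPT_W;
	if (!(acc & ACC_EXEC_MASK))
		bits |= NPT_NX;
	if (acc & ACC_USER_MASK)
		bits |= NPT_U;
	return bits;
}
```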

On the Intel side, some work is needed in order to split
shadow_x_mask and ACC_EXEC_MASK in two; now that there is an actual
ACC_READ_MASK to be used for exec-only pages, ACC_USER_MASK is unused
and can be reused as ACC_USER_EXEC_MASK.  ACC_EXEC_MASK is used for
kernel-mode execution and is tied to shadow_xs_mask (when MBEC is
disabled, shadow_xs_mask == shadow_xu_mask and ACC_USER_EXEC_MASK is
computed but ineffective).  update_permission_bitmask() precomputes all
the necessary conditions.  Note that with MBEC the user/supervisor
distinction depends on the U bit of the guest's own page tables rather
than the CPL.  Processors
provide this information to the hypervisor through the "advanced EPT
violation vmexit info" feature, which is a requirement for KVM to use
MBEC, and kvm-intel.ko passes it to the MMU in PFERR_USER_MASK.  When
walking guest page tables, PFERR_USER_MASK is passed to the final nEPT
walk via translate_nested_gpa().
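
The idea of precomputing the conditions can be sketched as follows.
This is a simplified stand-in for update_permission_bitmask(), not the
code from the series: it models instruction fetches only, the ACC_*
values are illustrative, and exec_faults[], precompute_exec_faults()
and exec_fault() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative encodings only. */
#define ACC_EXEC_MASK		(1u << 0)
#define ACC_USER_EXEC_MASK	(1u << 2)
#define ACC_COMBOS		16u

/* Page fault error code bits (architectural x86 positions). */
#define PFERR_USER_MASK		(1u << 2)
#define PFERR_FETCH_MASK	(1u << 4)

/*
 * exec_faults[user] has bit 'acc' set when an instruction fetch to a
 * page with access bits 'acc' must fault.  Index 0 is a supervisor
 * fetch, index 1 a user fetch.
 */
static uint16_t exec_faults[2];

static void precompute_exec_faults(bool mbec)
{
	for (unsigned int user = 0; user < 2; user++) {
		/* Without MBEC only ACC_EXEC_MASK matters. */
		unsigned int need = (mbec && user) ? ACC_USER_EXEC_MASK
						   : ACC_EXEC_MASK;
		uint16_t map = 0;

		for (unsigned int acc = 0; acc < ACC_COMBOS; acc++)
			if (!(acc & need))
				map |= (uint16_t)(1u << acc);
		exec_faults[user] = map;
	}
}

/* Fault check for a fetch; 'acc' must be below ACC_COMBOS. */
static bool exec_fault(unsigned int acc, unsigned int pfec)
{
	unsigned int user = !!(pfec & PFERR_USER_MASK);

	return (exec_faults[user] >> acc) & 1;
}
```

The fault path then only needs a table lookup; whether the access counts
as "user" is taken from PFERR_USER_MASK, which the vendor module fills in
from the advanced EPT violation information rather than from the CPL.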

On the AMD side, the U bit maps to ACC_USER_MASK but nNPT adjusts the
permission bitmask to ignore it for reads and writes when GMET is active.
Despite the smaller scale of the changes compared to MBEC, some changes
are still needed to use GMET for L1 guests, because the page tables have
to be created with U=0.  This means that the root page has role.access !=
ACC_ALL and its permissions have to be propagated down.
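
Propagating the root's permissions down amounts to intersecting them
with the access bits found at each level.  A minimal sketch, assuming
illustrative ACC_* values and a hypothetical effective_access() helper
(the series does this inside the MMU's page table creation, which is
not reproduced here):

```c
/* Illustrative encodings only. */
#define ACC_EXEC_MASK	(1u << 0)
#define ACC_WRITE_MASK	(1u << 1)
#define ACC_USER_MASK	(1u << 2)
#define ACC_READ_MASK	(1u << 3)
#define ACC_ALL		(ACC_EXEC_MASK | ACC_WRITE_MASK | \
			 ACC_USER_MASK | ACC_READ_MASK)

/*
 * Start from the root's role.access and narrow it with the access
 * bits of each level, so a restriction at the root (for example U=0)
 * reaches the leaf.
 */
static unsigned int effective_access(unsigned int root_access,
				     const unsigned int *pte_access,
				     int levels)
{
	unsigned int acc = root_access;

	for (int i = 0; i < levels; i++)
		acc &= pte_access[i];
	return acc;
}
```

With a root created as ACC_ALL & ~ACC_USER_MASK, no leaf can end up
with ACC_USER_MASK set, whatever the intermediate levels allow.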

In both cases, the complexity added to the core is small compared to
the benefit of fairly seamless nested support.

The former "smep_andnot_wp" bit of cpu_role.base, now named "cr4_smep",
is repurposed for nested TDP to indicate that MBEC/GMET is on.  The minor
pessimization for shadow page tables (toggling CR4.SMEP now always forces
building a separate version of the shadow page tables, even though that's
technically unnecessary if CR4.WP=1) is not really worth fretting about;
in practice, guests are not going to flip CR4.SMEP in a way that would
prevent efficient reuse of shadow page tables.

Patches 1-9 are general cleanups, mostly for MMU code.

Patches 10-12 make kvm_translate_nested_gpa vendor-specific, to
account for the different meaning of PFERR_USER_MASK for VMX vs. SVM.

Patches 13-20 are for Intel MBEC, with the first three covering
non-nested use.

Patches 21-27 are for AMD GMET, with 21/22/23/25 covering non-nested
use and the others covering nested virtualization.

Paolo

v2->v3:
- do not clear ACC_USER_EXEC_MASK for NX huge pages on AMD [Sashiko]
- add comment that is_executable_pte() works only because XS==XU [Sashiko]
- handle XU in __is_bad_mt_xwr [Sashiko]
- use CPL<3, not CPL==0, for U bit of nested #NPF [Sashiko]
- move forced clearing of GMET_ENABLE to __nested_copy_vmcb_control_to_cache [Sashiko]
- only add XU shadow_acc_track_mask if MBEC is enabled
- only enable nested MBEC if enable_mbec==true, likewise only set
  X86_FEATURE_GMET in svm_set_cpu_caps if gmet_enabled==true [Sashiko]
- add MBEC/GMET support to translate_nested_gpa (plus preparation)

v1->v2:
- fix EXPORT_SYMBOL_FOR_KVM_INTERNAL goof
- drop bit 10 from FROZEN_SPTE, add static_assert to catch it [Kai]
- fix exit qualification for page table EPT violations [kvm-unit-tests]
- add XU to shadow_acc_track_mask
- propagate root_role->access also in shadow MMU direct_map()
- add requested access to kvm_mmu_spte_requested tracepoint
- include support for passing advanced EPT violation vmexit info to guest
- advanced EPT violation vmexit info is now required for nested MBEC
- fix nested MBEC to gate XS vs XU based on the U bit of paging structures
- drop SECONDARY_EXEC_MODE_BASED_EPT_EXEC if L1 does not set it
- add mini commit message to "KVM: nVMX: advertise MBEC to nested guests" [Jon]
- drop gmet from /proc/cpuinfo [Borislav]
- fix running L1 without GMET



Jon Kohler (5):
  KVM: TDX/VMX: rework EPT_VIOLATION_EXEC_FOR_RING3_LIN into PROT_MASK
  KVM: x86/mmu: remove SPTE_PERM_MASK
  KVM: x86/mmu: free up bit 10 of PTEs in preparation for MBEC
  KVM: nVMX: advertise MBEC to nested guests
  KVM: nVMX: allow MBEC with EVMCS

Paolo Bonzini (22):
  KVM: x86/mmu: shuffle high bits of SPTEs in preparation for MBEC
  KVM: x86/mmu: remove SPTE_EPT_*
  KVM: x86/mmu: merge make_spte_{non,}executable
  KVM: x86/mmu: rename and clarify BYTE_MASK
  KVM: x86/mmu: introduce ACC_READ_MASK
  KVM: x86/mmu: separate more EPT/non-EPT permission_fault()
  KVM: x86/mmu: pass PFERR_GUEST_PAGE/FINAL_MASK to kvm_translate_gpa
  KVM: x86/mmu: pass pte_access for final nGPA->GPA walk
  KVM: x86: make translate_nested_gpa vendor-specific
  KVM: x86/mmu: split XS/XU bits for EPT
  KVM: x86/mmu: move cr4_smep to base role
  KVM: VMX: enable use of MBEC
  KVM: nVMX: pass advanced EPT violation vmexit info to guest
  KVM: nVMX: pass PFERR_USER_MASK to MMU on EPT violations
  KVM: x86/mmu: add support for MBEC to EPT page table walks
  KVM: x86/mmu: propagate access mask from root pages down
  KVM: x86/mmu: introduce cpu_role bit for availability of PFEC.I/D
  KVM: SVM: add GMET bit definitions
  KVM: x86/mmu: add support for GMET to NPT page table walks
  KVM: SVM: enable GMET and set it in MMU role
  KVM: SVM: work around errata 1218
  KVM: nSVM: enable GMET for guests

 Documentation/virt/kvm/x86/mmu.rst |  10 +-
 arch/x86/include/asm/cpufeatures.h |   1 +
 arch/x86/include/asm/kvm-x86-ops.h |   1 +
 arch/x86/include/asm/kvm_host.h    |  48 ++++++---
 arch/x86/include/asm/svm.h         |   1 +
 arch/x86/include/asm/vmx.h         |  14 ++-
 arch/x86/kvm/hyperv.c              |   4 +-
 arch/x86/kvm/mmu.h                 |  29 +++--
 arch/x86/kvm/mmu/mmu.c             | 165 ++++++++++++++++++++---------
 arch/x86/kvm/mmu/mmutrace.h        |  19 ++--
 arch/x86/kvm/mmu/paging_tmpl.h     |  73 ++++++++-----
 arch/x86/kvm/mmu/spte.c            |  74 +++++++------
 arch/x86/kvm/mmu/spte.h            |  70 ++++++------
 arch/x86/kvm/mmu/tdp_mmu.c         |   6 +-
 arch/x86/kvm/svm/nested.c          |  33 +++++-
 arch/x86/kvm/svm/svm.c             |  30 ++++++
 arch/x86/kvm/svm/svm.h             |   1 +
 arch/x86/kvm/vmx/capabilities.h    |  12 ++-
 arch/x86/kvm/vmx/common.h          |  20 ++--
 arch/x86/kvm/vmx/hyperv_evmcs.h    |   1 +
 arch/x86/kvm/vmx/main.c            |   9 ++
 arch/x86/kvm/vmx/nested.c          |  46 +++++++-
 arch/x86/kvm/vmx/tdx.c             |   2 +-
 arch/x86/kvm/vmx/vmx.c             |  27 ++++-
 arch/x86/kvm/vmx/vmx.h             |   1 +
 arch/x86/kvm/vmx/x86_ops.h         |   1 +
 arch/x86/kvm/x86.c                 |  18 +---
 27 files changed, 500 insertions(+), 216 deletions(-)

-- 
2.52.0
Re: [PATCH v3 00/27] KVM: combined patchset for MBEC/GMET support
Posted by David Riley 2 months ago
Hi Paolo, Jon,

Thanks to Paolo for sending the new patch series (v3), and to Jon
for the feedback on my previous test.

I have once again tested this patchset (v3) on both Intel and AMD
platforms using Proxmox VE (based on Debian Trixie) with a Windows
Server guest (24H2, Build 26100.1742).

The focus of the tests was live migration between different hosts
(Intel <-> Intel & AMD <-> AMD).

All tests used the same base setup:

Kernel: mainline 7.0.0-rc7 (with MBEC/GMET v3 patches applied)
QEMU: our downstream QEMU build based on 10.2.1, plus Jon's patches
virtio-win: 0.1.271

Windows Guest:
For the guest setup I enabled Virtualization-Based Security (VBS)
and Hypervisor-Protected Code Integrity (HVCI).

I set the following in the Group Policy Editor (DeviceGuard):
* Select Platform Security Level: Secure Boot
* Virtualization Based Protection of Code Integrity: Enabled without
   lock
* Require UEFI Memory Attributes Table: Checked

Hosts:
Intel Nodes:
    CPU: Intel(R) Xeon(R) Gold 6426Y

AMD Nodes:
    CPU: AMD EPYC 7302P


I tested the following:

1. Intel without Hyper-V Enlightenments:

QEMU CPU options: -cpu 'host,+kvm_pv_eoi,+kvm_pv_unhalt,level=30'

AvailableSecurityProperties [0]: 1,2,4,5,7

Security Property 7 indicates MBEC/GMET support. [0]

I migrated the virtual guest between the two Intel hosts whilst
running Cinebench R32.200. No issues were found, but the VM does not
perform well without Hyper-V Enlightenments.

2. Intel with Hyper-V Enlightenments:

QEMU CPU options: -cpu 'host,+hv-evmcs,+hv-ipi,+hv-relaxed,
   +hv-runtime,hv-spinlocks=0x1fff,+hv-stimer,+hv-synic,+hv-time,
   +hv-tlbflush,+hv-tlbflush-ext,+hv-vapic,+hv-vpindex,+hv-xmm-input,
   +kvm_pv_eoi,+kvm_pv_unhalt,level=30,+vmx-mbec'

AvailableSecurityProperties [0]: 1,2,4,5,7

I again migrated the virtual machine between the two Intel hosts
whilst running Cinebench R32.200. No issues were found, but the VM
performs significantly better with Hyper-V Enlightenments set.

3. AMD without Hyper-V Enlightenments:

QEMU CPU options: -cpu 'host,+kvm_pv_eoi,+kvm_pv_unhalt,level=30'

AvailableSecurityProperties [0]: 1,2,4,5,7

I migrated the virtual machine between the two AMD hosts whilst
running Cinebench R32.200. No issues were found.

4. AMD with Hyper-V Enlightenments:

QEMU CPU options: -cpu 'host,+gmet,+hv-emsr-bitmap,+hv-ipi,
   +hv-relaxed,+hv-runtime,hv-spinlocks=0x1fff,+hv-stimer,+hv-synic,
   +hv-time,+hv-tlbflush,+hv-tlbflush-ext,+hv-vapic,+hv-vpindex,
   +hv-xmm-input,+kvm_pv_eoi,+kvm_pv_unhalt,level=30'

AvailableSecurityProperties [0]: 1,2,4,5,7

I again migrated the virtual machine between the two AMD hosts whilst
running Cinebench R32.200. I have not found any issues.

Tested-by: David Riley <d.riley@proxmox.com>

[0] https://learn.microsoft.com/en-us/windows/security/hardware-security/enable-virtualization-based-protection-of-code-integrity?tabs=security


Re: [PATCH v3 00/27] KVM: combined patchset for MBEC/GMET support
Posted by Jon Kohler 1 month, 3 weeks ago

> On Apr 15, 2026, at 3:06 AM, David Riley <d.riley@proxmox.com> wrote:
> 
> Hi Paolo, Jon,
> 
> Thanks to Paolo for sending the new patch series (v3), and to Jon
> for the feedback on my previous test.
> 
> I have once again tested this patchset (v3) on both Intel and AMD
> platforms using Proxmox VE (based on Debian Trixie) with a Windows
> Server guest (24H2, Build 26100.1742).
> 
> The focus of the tests were live migrations between different hosts
> (Intel <-> Intel & AMD <-> AMD).
> 
> All tests used the same base setup:
> 
> Kernel: mainline 7.0.0-rc7 (with MBEC/GMET v3 patches applied)
> QEMU: our downstream QEMU build based on 10.2.1, plus Jon's patches
> virtio-win: 0.1.271
> 
> Windows Guest:
> For the guest setup I enabled Virtualization-Based Security (VBS)
> and Hypervisor-Protected Code Integrity (HVCI).
> 
> I set the following in the Group Policy Editor (DeviceGuard):
> * Select Platform Security Level: Secure Boot
> * Virtualization Based Protection of Code Integrity: Enabled without
>   lock
> * Require UEFI Memory Attributes Table: Checked
> 
> Hosts:
> Intel Nodes:
>    CPU: Intel(R) Xeon(R) Gold 6426Y
> 
> AMD Nodes:
>    CPU: AMD EPYC 7302P
> 
> 
> I tested the following:
> 
> 1. Intel without Hyper-V Enlightenments:
> 
> QEMU CPU options: -cpu 'host,+kvm_pv_eoi,+kvm_pv_unhalt,level=30'
> AvailableSecurityProperties [0]:  1,2,4,5,7
> 
> Security Property 7 indicates MBEC/GMET support. [0]
> 
> I migrated the virtual guest between the two Intel hosts whilst
> running Cinebench R32.200. No issues were found, but the VM does not
> perform well without Hyper-V Enlightenments.
> 
> 2. Intel with Hyper-V Enlightenments:
> 
> QEMU CPU options: -cpu 'host,+hv-evmcs,+hv-ipi,+hv-relaxed,
>   +hv-runtime,hv-spinlocks=0x1fff,+hv-stimer,+hv-synic,+hv-time,
> +hv-tlbflush,+hv-tlbflush-ext,+hv-vapic,+hv-vpindex,+hv-xmm-input,
>   +kvm_pv_eoi,+kvm_pv_unhalt,level=30,+vmx-mbec'
> 
> AvailableSecurityProperties [0]: 1,2,4,5,7
> 
> I again migrated the virtual machine between the two Intel hosts
> whilst running Cinebench R32.200. No issues were found, but the VM
> performs significantly better with Hyper-V Enlightenments set.
> 
> 3. AMD without Hyper-V Enlightenments:
> 
> QEMU CPU options: -cpu 'host,+kvm_pv_eoi,+kvm_pv_unhalt,level=30'
> 
> AvailableSecurityProperties [0]: 1,2,4,5,7
> 
> I migrated the virtual machine between the two AMD hosts whilst
> running Cinebench R32.200. No issues were found.
> 
> 4. AMD with Hyper-V Enlightenments:
> 
> QEMU CPU options: -cpu 'host,+gmet,+hv-emsr-bitmap,+hv-ipi,
> +hv-relaxed,+hv-runtime,hv-spinlocks=0x1fff,+hv-stimer,+hv-synic,
>   +hv-time,+hv-tlbflush,+hv-tlbflush-ext,+hv-vapic,+hv-vpindex,
>   +hv-xmm-input,+kvm_pv_eoi,+kvm_pv_unhalt,level=30'
> 
> AvailableSecurityProperties [0]: 1,2,4,5,7
> 
> I again migrated the virtual machine between the two AMD hosts whilst
> running Cinebench R32.200. I have not found any issues.
> 
> Tested-by: David Riley <d.riley@proxmox.com>

Great! Thanks for testing these various permutations out; that’s
a very helpful datapoint.

For posterity, we’ve also done a similar round of testing on both
AMD/Intel and, knock on wood, things are holding up nicely, with
no trouble reports from QA as of yet (more knocking on wood).