[PATCH v6 00/28] KVM: combined patchset for MBEC/GMET support

Paolo Bonzini posted 28 patches 1 month, 1 week ago
Documentation/virt/kvm/x86/mmu.rst |  10 +-
arch/x86/include/asm/cpufeatures.h |   1 +
arch/x86/include/asm/kvm-x86-ops.h |   1 +
arch/x86/include/asm/kvm_host.h    |  48 +++++---
arch/x86/include/asm/svm.h         |   1 +
arch/x86/include/asm/vmx.h         |  14 ++-
arch/x86/kvm/hyperv.c              |   4 +-
arch/x86/kvm/mmu.h                 |  30 +++--
arch/x86/kvm/mmu/mmu.c             | 182 ++++++++++++++++++++---------
arch/x86/kvm/mmu/mmutrace.h        |  19 +--
arch/x86/kvm/mmu/paging_tmpl.h     |  73 ++++++++----
arch/x86/kvm/mmu/spte.c            |  92 +++++++++------
arch/x86/kvm/mmu/spte.h            |  70 ++++++-----
arch/x86/kvm/mmu/tdp_mmu.c         |   6 +-
arch/x86/kvm/svm/nested.c          |  38 +++++-
arch/x86/kvm/svm/svm.c             |  31 +++++
arch/x86/kvm/svm/svm.h             |   1 +
arch/x86/kvm/vmx/capabilities.h    |  12 +-
arch/x86/kvm/vmx/common.h          |  26 +++--
arch/x86/kvm/vmx/hyperv_evmcs.h    |   1 +
arch/x86/kvm/vmx/main.c            |   9 ++
arch/x86/kvm/vmx/nested.c          |  46 +++++++-
arch/x86/kvm/vmx/tdx.c             |   2 +-
arch/x86/kvm/vmx/vmx.c             |  27 ++++-
arch/x86/kvm/vmx/vmx.h             |   1 +
arch/x86/kvm/vmx/x86_ops.h         |   1 +
arch/x86/kvm/x86.c                 |  18 +--
27 files changed, 536 insertions(+), 228 deletions(-)
[PATCH v6 00/28] KVM: combined patchset for MBEC/GMET support
Posted by Paolo Bonzini 1 month, 1 week ago
This version can also be found in the "queue" branch of kvm.git.
Since it should be final, I'm again including the full description for
reference.
    
Both MBEC and GMET allow more granular control over execute permissions,
with different levels of separation between supervisor and user mode.
MBEC provides support for separate supervisor and user-mode bits in the
PTEs; GMET instead lacks supervisor-mode only execution (with NX=0,
"both" is represented by U=0 and user-mode only by U=1).  GMET was
clearly inspired by SMEP though with some differences and annoyances.
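To make the difference between the two encodings concrete, here is a minimal sketch of when an instruction fetch is allowed under each scheme, based only on the description above. The struct and function names are hypothetical and are not taken from the patches. `user` stands for a user-mode linear address; for MBEC this comes from the U bit of the guest page tables rather than from the CPL.

```c
#include <stdbool.h>

/* Hypothetical view of the execute-related bits of one leaf entry. */
struct mbec_entry { bool xs; bool xu; };  /* EPT: supervisor/user execute */
struct gmet_entry { bool nx; bool u; };   /* NPT: no-execute and user bit */

/* MBEC: independent supervisor and user execute bits, all four combinations. */
static bool mbec_fetch_allowed(struct mbec_entry e, bool user)
{
	return user ? e.xu : e.xs;
}

/*
 * GMET: NX=1 forbids execution; with NX=0, U=0 means "both modes" and
 * U=1 means "user mode only".  There is no supervisor-only encoding.
 */
static bool gmet_fetch_allowed(struct gmet_entry e, bool user)
{
	if (e.nx)
		return false;
	return user || !e.u;
}
```

The GMET function shows the asymmetry mentioned above: no combination of NX and U yields "supervisor may execute, user may not".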

The implementation starts from two changes to core MMU code, both
of which help make the actual feature almost trivial to implement:

- first, I'm cleaning up the implementation of nVMX exec-only, by
  properly adding read permissions to the ACC_* constants and to the
  permission bitmask machinery.  Jon also had to add a fourth ACC_*
  bit, but used it only in the special case of nested MBEC; here,
  instead, ACC_READ_MASK is the norm, which simplifies testing
  a lot and removes gratuitous complexity.

- second, I'm enforcing that KVM runs with MBEC/GMET enabled even in
  non-nested mode, if it wants to provide the feature to nested
  hypervisors.  This makes the creation of SPTEs look exactly the
  same for L1 and L2 guests, even though only the latter use MBEC/GMET
  fully; the only difference lies in the input access permissions.
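As a rough illustration of what "adding read permissions to the permission bitmask machinery" means, the sketch below precomputes, for each access type, a 16-bit mask over every combination of four ACC_* bits, so a fault check becomes one shift and AND. The ACC_* values, the enum and both functions are hypothetical; the real update_permission_bitmask() also accounts for user/supervisor, SMEP/SMAP and similar conditions, which are omitted here. With an explicit read bit, an exec-only page is simply "ACC_EXEC_MASK without ACC_READ_MASK".

```c
#include <stdint.h>

/* Hypothetical ACC_* values; the patches may use different ones. */
#define ACC_EXEC_MASK  0x1u
#define ACC_WRITE_MASK 0x2u
#define ACC_USER_MASK  0x4u
#define ACC_READ_MASK  0x8u
#define ACC_COMBOS     16	/* every combination of the four bits */

enum access_type { ACCESS_READ, ACCESS_WRITE, ACCESS_FETCH, ACCESS_TYPES };

/*
 * Bit N of fault_mask[type] is set if an access of that type to a page
 * whose effective ACC_* bits equal N must fault.
 */
static uint16_t fault_mask[ACCESS_TYPES];

static void precompute_fault_masks(void)
{
	for (int type = 0; type < ACCESS_TYPES; type++) {
		uint16_t mask = 0;

		for (unsigned int acc = 0; acc < ACC_COMBOS; acc++) {
			int fault;

			switch (type) {
			case ACCESS_READ:
				fault = !(acc & ACC_READ_MASK);
				break;
			case ACCESS_WRITE:
				fault = !(acc & ACC_WRITE_MASK);
				break;
			default: /* ACCESS_FETCH */
				fault = !(acc & ACC_EXEC_MASK);
				break;
			}
			if (fault)
				mask |= (uint16_t)(1u << acc);
		}
		fault_mask[type] = mask;
	}
}

/* The per-fault check: a single lookup in the precomputed masks. */
static int permission_fault(enum access_type type, unsigned int acc)
{
	return (fault_mask[type] >> acc) & 1;
}
```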

This strategy adds only a limited amount of complexity to the core,
while providing almost entirely seamless support for nested
hypervisors.

Later patches have to use slightly different meanings for ACC_* on Intel
and AMD.  On the Intel side, some work is needed in order to split
shadow_x_mask and ACC_EXEC_MASK in two; now that there is an actual
ACC_READ_MASK to be used for exec-only pages, ACC_USER_MASK is unused
and can be reused as ACC_USER_EXEC_MASK.  However, unlike the older
ACC_USER_MASK hack these differences are backed by concrete concepts
of the page table format, and there is always a 1:1 mapping from ACC_*
bits to PT_*_MASK or shadow_*_mask:

     ACC_* bit              Intel                 AMD
     --------------------   -------------------   -------------------
     ACC_READ_MASK          PT_PRESENT_MASK       PT_PRESENT_MASK
     ACC_WRITE_MASK         PT_WRITABLE_MASK      PT_WRITABLE_MASK
     ACC_EXEC_MASK          shadow_xs_mask        shadow_nx_mask
     ACC_USER_MASK          ---                   shadow_user_mask
     ACC_USER_EXEC_MASK     shadow_xu_mask        ---
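A minimal sketch of the 1:1 mapping in the table, assuming illustrative ACC_* values (not necessarily those used by the patches) and plain bit constants in place of KVM's shadow_*_mask variables. The EPT bit positions are R=0, W=1, X (supervisor execute with MBEC)=2 and XU=10; the NPT entries use the ordinary x86-64 P, R/W, U/S and NX bits. Note that NX has inverted polarity, so ACC_EXEC_MASK maps to the *absence* of shadow_nx_mask.

```c
#include <stdint.h>

/* Hypothetical ACC_* values for illustration only. */
#define ACC_EXEC_MASK      0x1u
#define ACC_WRITE_MASK     0x2u
#define ACC_USER_MASK      0x4u
#define ACC_READ_MASK      0x8u
/* On Intel, the otherwise unused ACC_USER_MASK bit doubles as user-exec. */
#define ACC_USER_EXEC_MASK ACC_USER_MASK

/* EPT entry bits; bit 10 is user-mode execute when MBEC is enabled. */
#define EPT_R   (1ull << 0)
#define EPT_W   (1ull << 1)
#define EPT_XS  (1ull << 2)
#define EPT_XU  (1ull << 10)

/* x86-64 page table entry bits, as used by NPT. */
#define PTE_P   (1ull << 0)
#define PTE_RW  (1ull << 1)
#define PTE_US  (1ull << 2)
#define PTE_NX  (1ull << 63)

static uint64_t ept_from_acc(unsigned int acc)
{
	uint64_t pte = 0;

	if (acc & ACC_READ_MASK)
		pte |= EPT_R;
	if (acc & ACC_WRITE_MASK)
		pte |= EPT_W;
	if (acc & ACC_EXEC_MASK)
		pte |= EPT_XS;
	if (acc & ACC_USER_EXEC_MASK)
		pte |= EPT_XU;
	return pte;
}

static uint64_t npt_from_acc(unsigned int acc)
{
	uint64_t pte = 0;

	if (acc & ACC_READ_MASK)
		pte |= PTE_P;
	if (acc & ACC_WRITE_MASK)
		pte |= PTE_RW;
	if (acc & ACC_USER_MASK)
		pte |= PTE_US;
	/* NX is inverted: set it when execution is NOT allowed. */
	if (!(acc & ACC_EXEC_MASK))
		pte |= PTE_NX;
	return pte;
}
```

For example, an exec-only EPT page (ACC_EXEC_MASK alone) becomes just the XS bit, with no read bit.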

On Intel, ACC_EXEC_MASK is used for kernel-mode execution and is tied to
shadow_xs_mask (when MBEC is disabled, ACC_USER_EXEC_MASK and the XU bit
are computed but ineffective).  update_permission_bitmask() precomputes
all the necessary conditions.  On the AMD side, the U bit maps to
ACC_USER_MASK but nNPT adjusts the permission bitmask to ignore it for
reads and writes when GMET is active.  Although the changes are smaller
in scale than for MBEC, some work is still needed to use GMET for L1
guests, because the page tables have to be created with U=0.
This means that the root page has role.access != ACC_ALL and its
permissions have to be propagated down.
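One way to read "its permissions have to be propagated down": the effective access of a leaf is the intersection of the root's role.access with the access allowed at every level below it, so a root created without ACC_USER_MASK yields U=0 all the way down. The sketch below assumes hypothetical ACC_* values and a hypothetical helper; it is not KVM's actual code path.

```c
#include <stddef.h>

/* Hypothetical ACC_* values for illustration only. */
#define ACC_EXEC_MASK  0x1u
#define ACC_WRITE_MASK 0x2u
#define ACC_USER_MASK  0x4u
#define ACC_READ_MASK  0x8u
#define ACC_ALL (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK | ACC_READ_MASK)

/*
 * Start from the root's role.access and intersect it with the access
 * allowed at each level of the walk; nothing below can regain a bit
 * that the root lacks.
 */
static unsigned int effective_access(unsigned int root_access,
				     const unsigned int *level_access,
				     size_t levels)
{
	unsigned int acc = root_access;

	for (size_t i = 0; i < levels; i++)
		acc &= level_access[i];
	return acc;
}
```

With a GMET root of `ACC_ALL & ~ACC_USER_MASK`, every leaf ends up without the user bit even if the lower levels allow everything.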

Note that with MBEC the user/supervisor distinction depends on the U
bit of the page tables rather than the CPL.  Processors provide this
information to the hypervisor through the "advanced EPT violation
vmexit info" feature, which is a requirement for KVM to use MBEC,
and kvm-intel.ko passes it to the MMU in PFERR_USER_MASK (unlike
kvm-amd.ko which computes it from the CPL).  This needs a small change
to pass the effective XWU permissions of the page tables down to
translate_nested_gpa().
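A rough sketch of how an EPT-violation exit qualification might be turned into a page-fault error code once advanced EPT violation vmexit info is available. The PF_* bits are the architectural #PF error code bits (present=0, write=1, user=2, fetch=4). The exit-qualification bit positions reflect my reading of the Intel SDM and should be checked against it; all macro and function names here are hypothetical and are not the patch's __vmx_handle_ept_violation().

```c
#include <stdbool.h>
#include <stdint.h>

/* Exit-qualification bits (hypothetical names; verify against the SDM). */
#define EQ_ACC_WRITE      (1ull << 1)
#define EQ_ACC_FETCH      (1ull << 2)
#define EQ_PROT_RWX       (0x7ull << 3)	/* GPA readable/writable/executable */
#define EQ_PROT_XU        (1ull << 6)	/* user-mode executable, with MBEC */
#define EQ_GVA_VALID      (1ull << 7)
#define EQ_GVA_TRANSLATED (1ull << 8)
#define EQ_GVA_USER       (1ull << 9)	/* advanced info: user-mode address */

/* Architectural page-fault error code bits. */
#define PF_PRESENT (1u << 0)
#define PF_WRITE   (1u << 1)
#define PF_USER    (1u << 2)
#define PF_FETCH   (1u << 4)

static uint32_t ept_violation_to_pfec(uint64_t eq, bool advanced_info)
{
	uint32_t pfec = 0;

	if (eq & EQ_ACC_WRITE)
		pfec |= PF_WRITE;
	if (eq & EQ_ACC_FETCH)
		pfec |= PF_FETCH;
	/* Some permission was granted, so the EPT entry was present. */
	if (eq & (EQ_PROT_RWX | EQ_PROT_XU))
		pfec |= PF_PRESENT;
	/* User vs. supervisor comes from the guest page tables, not the CPL. */
	if (advanced_info && (eq & EQ_GVA_VALID) &&
	    (eq & EQ_GVA_TRANSLATED) && (eq & EQ_GVA_USER))
		pfec |= PF_USER;
	return pfec;
}
```

Without the advanced information, the user/supervisor distinction is simply unavailable, which is why the text above lists it as a requirement for using MBEC.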

The former "smep_andnot_wp" bit of cpu_role.base, now named "cr4_smep",
is repurposed for nested TDP to indicate that MBEC/GMET is on.  The minor
pessimization for shadow page tables (toggling CR4.SMEP now always forces
building a separate version of the shadow page tables, even though that's
technically unnecessary if CR4.WP=1) is not really worth fretting about;
in practice, guests are not going to flip CR4.SMEP in a way that would
prevent efficient reuse of shadow page tables.

Paolo

v5->v6:
- rename make_spte_executable to change_spte_executable
- rename byte index in update_permission_bitmask to index
- use (u8) casts before "KVM: x86/mmu: introduce ACC_READ_MASK"
- make commit message for "KVM: x86/mmu: split XS/XU bits for EPT" more accurate
- add XU to shadow_acc_track_mask already in "KVM: x86/mmu: split XS/XU bits for EPT"
- fix compilation error
- use alternative code for __vmx_handle_ept_violation suggested by Sean


Jon Kohler (5):
  KVM: TDX/VMX: rework EPT_VIOLATION_EXEC_FOR_RING3_LIN into PROT_MASK
  KVM: x86/mmu: remove SPTE_PERM_MASK
  KVM: x86/mmu: free up bit 10 of PTEs in preparation for MBEC
  KVM: nVMX: advertise MBEC to nested guests
  KVM: nVMX: allow MBEC with EVMCS

Paolo Bonzini (23):
  KVM: x86/mmu: shuffle high bits of SPTEs in preparation for MBEC
  KVM: x86/mmu: remove SPTE_EPT_*
  KVM: x86/mmu: merge make_spte_{non,}executable
  KVM: x86/mmu: rename and clarify BYTE_MASK
  KVM: x86/mmu: separate more EPT/non-EPT permission_fault()
  KVM: x86/mmu: introduce ACC_READ_MASK
  KVM: x86/mmu: pass PFERR_GUEST_PAGE/FINAL_MASK to kvm_translate_gpa
  KVM: x86/mmu: pass pte_access for final nGPA->GPA walk
  KVM: x86: make translate_nested_gpa vendor-specific
  KVM: x86/mmu: split XS/XU bits for EPT
  KVM: x86/mmu: move cr4_smep to base role
  KVM: VMX: enable use of MBEC
  KVM: nVMX: pass advanced EPT violation vmexit info to guest
  KVM: nVMX: pass PFERR_USER_MASK to MMU on EPT violations
  KVM: x86/mmu: add support for MBEC to EPT page table walks
  KVM: x86/mmu: propagate access mask from root pages down
  KVM: x86/mmu: introduce cpu_role bit for availability of PFEC.I/D
  KVM: SVM: add GMET bit definitions
  KVM: x86/mmu: hard code more bits in kvm_init_shadow_npt_mmu
  KVM: x86/mmu: add support for GMET to NPT page table walks
  KVM: SVM: enable GMET and set it in MMU role
  KVM: SVM: work around errata 1218
  KVM: nSVM: enable GMET for guests

-- 
2.54.0
Re: [PATCH v6 00/28] KVM: combined patchset for MBEC/GMET support
Posted by David Riley 1 month ago
Thanks again for the updated version of this patch series.

I have been testing v6 on Intel and AMD platforms again and observed
a regression on Intel when CET and MBEC are both exposed to a Windows
guest.

Environment:
- Kernel: mainline 7.1.0-rc2 (with v6 patches applied)
- QEMU: downstream 11.0.0-1
- Guest: Windows Server 2026 (24H2, Build 26100.1742)
- virtio-win: 0.1.271

Hosts:
Intel: Intel(R) Xeon(R) Gold 6426Y
AMD: Epyc 7302P

Both hosts are running Proxmox VE (based on Debian Trixie).

Windows Guest Setup:
After the initial installation and verification [0], I enabled
Virtualization-Based Security (VBS) and Hypervisor-Protected Code
Integrity (HVCI).

I set the following in the Group Policy Editor (DeviceGuard):
* Select Platform Security Level: Secure Boot
* Virtualization Based Protection of Code Integrity: Enabled without lock
* Require UEFI Memory Attributes Table: Checked


Issue: Host Lockups and Guest Hangs

On the Intel platform, the guest fails to boot when using:
QEMU options: -cpu host,level=30,+vmx-mbec

I observed two behaviors:
* The guest hangs indefinitely during the early boot phase (most frequent).
* The guest fails to boot and ends up in Windows Recovery Mode.

When the guest hangs during early boot, the host experiences hard/soft
lockups:

watchdog: CPU11: Watchdog detected hard LOCKUP on cpu 11
watchdog: BUG: soft lockup - CPU#11 stuck for 28s [CPU 0/KVM:16105]

I also recorded a trace of the virtual guest getting stuck using:
`trace-cmd record -e kvm`

and found the following:

Most frequent RIPs (count, then RIP):
* 987816 rip 0xfffff801b031bf36
* 985670 rip 0xfffff801b031bf35
* 184002 rip 0x7ffd3e35

Sequence of Events:
--
CPU 0/KVM-16105 [001] .....  4327.371276: kvm_cr:  cr_write 4 = 0xb50ef8
...
CPU 0/KVM-16105 [001] .....  4327.373469: kvm_pio: pio_read at 0x608 size 4 count 1 val 0x5bcbb1
CPU 0/KVM-16105 [001] d..1.  4327.373469: kvm_entry:  vcpu 0 rip 0xfffff801b031bf36
CPU 0/KVM-16105 [001] d..1.  4327.373470: kvm_exit:  reason IO_INSTRUCTION rip 0xfffff801b031bf35 info 608000b 0
--
The last three lines repeat in an infinite loop.
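For reading the loop above: the `info 608000b` field of the IO_INSTRUCTION exit is the VMX exit qualification for I/O instructions (bits 2:0 access size minus one, bit 3 direction with 1 meaning IN, bit 4 string, bit 5 REP, bit 6 immediate port operand, bits 31:16 port). A small decoder, with a hypothetical struct and function name, shows it is a 4-byte IN from port 0x608 through DX, matching the `pio_read at 0x608 size 4` line. The entry RIP being one byte past the exit RIP is consistent with the one-byte `in eax, dx` encoding.

```c
#include <stdbool.h>
#include <stdint.h>

struct io_exit {
	uint16_t port;
	unsigned int size;	/* access size in bytes */
	bool in;		/* true for IN, false for OUT */
	bool string;		/* INS/OUTS */
	bool rep;		/* REP prefix present */
	bool imm_port;		/* port given as an immediate instead of DX */
};

static struct io_exit decode_io_exit_qual(uint64_t qual)
{
	struct io_exit io;

	/* Encodings 0, 1 and 3 mean 1, 2 and 4 bytes respectively. */
	io.size = (unsigned int)(qual & 0x7) + 1;
	io.in = qual & (1u << 3);
	io.string = qual & (1u << 4);
	io.rep = qual & (1u << 5);
	io.imm_port = qual & (1u << 6);
	io.port = (uint16_t)(qual >> 16);
	return io;
}
```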

On the AMD platform, the guest had no issue booting when using:
QEMU options: -cpu host

This seems to be because CET is not present on this AMD CPU.
I confirmed this using:
`cpuid -1 -l 7 -s 0`

which shows that:
- CET_SS: CET shadow stack                 = false
- CET_IBT: CET indirect branch tracking    = false

On Intel `cpuid -1 -l 7 -s 0` shows:
- CET_SS: CET shadow stack                 = true
- CET_IBT: CET indirect branch tracking    = true
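The same check can be done programmatically. This sketch uses GCC/Clang's `<cpuid.h>` helper `__get_cpuid_count()` (x86 only) to read CPUID leaf 7, subleaf 0, where shadow stack support is ECX bit 7 and indirect branch tracking is EDX bit 20. Run inside a guest, it reports what the hypervisor (here QEMU) exposes rather than what the host CPU supports. The struct and function names are hypothetical.

```c
#include <cpuid.h>
#include <stdbool.h>

struct cet_support {
	bool ss;	/* CPUID.(EAX=7,ECX=0):ECX[7], CET shadow stack */
	bool ibt;	/* CPUID.(EAX=7,ECX=0):EDX[20], CET indirect branch tracking */
};

static struct cet_support query_cet(void)
{
	unsigned int eax, ebx, ecx, edx;
	struct cet_support s = { false, false };

	/* __get_cpuid_count() returns 0 if leaf 7 is not supported. */
	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return s;
	s.ss = ecx & (1u << 7);
	s.ibt = edx & (1u << 20);
	return s;
}
```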

If I explicitly disable them on Intel using:
QEMU options: -cpu host,level=30,+vmx-mbec,-cet-ss,-cet-ibt
the guest boots without issues.

This regression did not occur previously because I was using QEMU
version 10.2.0-1, which did not yet expose these features for this
particular Intel CPU [1].

Please let me know if you need further information or if there is
something else I could try/test.

[0] https://learn.microsoft.com/en-us/windows/security/hardware-security/enable-virtualization-based-protection-of-code-integrity?tabs=security
[1] https://gitlab.com/qemu-project/qemu/-/commit/5cb89cad7f30be3175dd5abbb79ae5e634476cfa


On 5/5/26 9:50 PM, Paolo Bonzini wrote:
> This version can also be found in the "queue" branch of kvm.git.
> Since it should be final I'm including again for reference the full
> description.

Re: [PATCH v6 00/28] KVM: combined patchset for MBEC/GMET support
Posted by Paolo Bonzini 1 month ago
On Mon, May 11, 2026 at 12:54 PM David Riley <d.riley@proxmox.com> wrote:
> Environment:
> - Kernel: mainline 7.1.0-rc2 (with v6 patches applied)

Using 7.0.4 with the v6 patches applied (except for the AMD ones), I could
reproduce the guest not booting, but not the host lockups. I also have
Windows Server 2025 build 26100, and in my case the host is a Meteor
Lake machine.

I'm running my guest with

$ qemu-kvm \
   -M q35 -drive if=ide,file=win2k25.qcow2 \
   -cpu host,+vmx-mbec,+cet-ss,+cet-ibt -vnc :0 \
   -monitor stdio -m 8192 \
   -bios /usr/share/edk2/ovmf/OVMF.stateless.secboot.fd  \
   -device nec-usb-xhci -usbdevice tablet -smp 8

However, the trace shows that CET is not used at all unless MBEC is
present. In particular (after "trace-cmd record -e kvm ...") I can do:

$ trace-cmd report |grep  -e msr_write.*da0| sed 's/.*kvm_/kvm_/' | sort -u

and it shows as expected this with +vmx-mbec,+cet-ss,+cet-ibt:

kvm_msr:              msr_write da0 = 0x800

but not with -vmx-mbec,+cet-ss,+cet-ibt.  This initialization is
performed by Hyper-V even before VMXON, and the breakage happens even
if Memory Integrity is disabled inside Windows.
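For reference, MSR 0xda0 is IA32_XSS, and 0x800 sets only bit 11, the XSAVE state component for user-mode CET state (CET_U). The sketch below decodes an IA32_XSS value using the supervisor state component numbers as I understand them from the Intel SDM and Linux's xfeature list (PT=8, PASID=10, CET_U=11, CET_S=12, LBR=15); only this handful is listed, and the helper name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* A few supervisor XSAVE state components that can be enabled in IA32_XSS. */
static const struct { unsigned int bit; const char *name; } xss_components[] = {
	{ 8,  "PT" },
	{ 10, "PASID" },
	{ 11, "CET_U" },
	{ 12, "CET_S" },
	{ 15, "LBR" },
};

/* Write a comma-separated list of the known components set in xss (len > 0). */
static void describe_xss(uint64_t xss, char *buf, size_t len)
{
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(xss_components) / sizeof(xss_components[0]); i++) {
		if (!(xss & (1ull << xss_components[i].bit)))
			continue;
		if (buf[0] != '\0')
			strncat(buf, ",", len - strlen(buf) - 1);
		strncat(buf, xss_components[i].name, len - strlen(buf) - 1);
	}
}
```

Decoding 0x800 this way yields just "CET_U".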

Knowing that Hyper-V was not running any nested guest at the time of
the hang, I changed __vmcs_writel() to have

        if (field == SECONDARY_VM_EXEC_CONTROL)
                value &= ~SECONDARY_EXEC_MODE_BASED_EPT_EXEC;

which is admittedly a bit blunt :) but lets Hyper-V use CET, while
basically undoing the effects of this patch for non-nested operation.
This also hung for me.

If possible, could you please check:

1) whether 7.0 + patches (up to 22/28) also causes the host to hang, and
2) whether 7.1 + patches (up to 22/28) also causes the host to hang?

That would tell us whether this is (a) something caused by our different
setups, (b) a regression in 7.1, or (c) something caused by the last 6
patches.

So, while the host hanging is worrisome, this seems more likely to be
caused by the CET enablement than by MBEC.

Thanks,

Paolo
Re: [PATCH v6 00/28] KVM: combined patchset for MBEC/GMET support
Posted by David Riley 3 weeks, 6 days ago
Hi Paolo, Hi Chao, Hi Sean,

I have been testing the v6 patchset (up to 22/28), this time on Arrow
Lake hardware. My results suggest a kernel-version-dependent regression
in host stability.

Environment:
* Host CPU: Intel(R) Core(TM) Ultra 7 265K (Arrow Lake)
* Motherboard: Gigabyte Z890 EAGLE (BIOS F18)
* Host OS: Proxmox VE based on Debian Trixie
* Host Kernel: mainline with patches 1-22/28 applied.
* Guest OS: Windows Server 2026 (24H2, Build 26100.1742) with VBS/Hyper-V enabled.
* QEMU Command: -cpu host,level=30,+vmx-mbec,+cet-ss,+cet-ibt

Results for Kernel 7.1.0-rc3 + v6 patches 1-22:
I can reproduce the guest failing to boot. This setup causes host lockups on
my Arrow Lake machine. In some cases, the guest manages to reach Windows
Recovery, but most of the time it does not.

@Chao, the first line of the dmesg output below shows the hard lockup. Also have
a look at the hrtimer trap results further below.

dmesg output:
[Fri May 15 13:07:37 2026] watchdog: CPU1: Watchdog detected hard LOCKUP on cpu 1
...
[Fri May 15 13:07:37 2026] CPU: 1 UID: 0 PID: 3327 Comm: CPU 0/KVM Tainted: G            E  7.1.0-rc3-v7.1-rc3-v6-p22-00022-g9ba8d1bdd861-dirty #31 PREEMPT(lazy)
[Fri May 15 13:07:37 2026] Tainted: [E]=UNSIGNED_MODULE
[Fri May 15 13:07:37 2026] Hardware name: Gigabyte Technology Co., Ltd. Z890 EAGLE/Z890 EAGLE, BIOS F18 11/27/2025
[Fri May 15 13:07:37 2026] RIP: 0010:vmx_do_nmi_irqoff+0x13/0x20 [kvm_intel]
[Fri May 15 13:07:37 2026] Code: 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 48 83 e4 f0 6a 18 55 9c 6a 10 e8 5d cc ca f1 <c9> c3 cc cc cc cc 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90
[Fri May 15 13:07:37 2026] RSP: 0018:ffffcdf58fdf7c28 EFLAGS: 00000082
[Fri May 15 13:07:37 2026] RAX: 0000000080000200 RBX: ffff8baa8a6f4900 RCX: 0000000000000000
[Fri May 15 13:07:37 2026] RDX: 0000000080000202 RSI: 0000000000000000 RDI: ffff8baa8a6f4900
[Fri May 15 13:07:37 2026] RBP: ffffcdf58fdf7c28 R08: 0000000000000000 R09: 0000000000000000
[Fri May 15 13:07:37 2026] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[Fri May 15 13:07:37 2026] R13: 0000000000000000 R14: 0000000000000004 R15: ffff8bab70170000
[Fri May 15 13:07:37 2026] FS:  0000756cc330a6c0(0000) GS:ffff8bba5a58a000(0000) knlGS:fffff8031fbbd000
[Fri May 15 13:07:37 2026] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Fri May 15 13:07:37 2026] CR2: 00007fffffff0000 CR3: 00000001152c2001 CR4: 0000000008f72ef0
[Fri May 15 13:07:37 2026] PKRU: 55555554
[Fri May 15 13:07:37 2026] Call Trace:
[Fri May 15 13:07:37 2026]  <TASK>
[Fri May 15 13:07:37 2026]  vmx_handle_nmi+0x9a/0x140 [kvm_intel]
[Fri May 15 13:07:37 2026]  vmx_vcpu_enter_exit+0x18f/0x300 [kvm_intel]
[Fri May 15 13:07:37 2026]  vmx_vcpu_run+0x1d2/0x12c0 [kvm_intel]
[Fri May 15 13:07:37 2026]  vt_vcpu_run+0x1a/0x40 [kvm_intel]
[Fri May 15 13:07:37 2026]  kvm_arch_vcpu_ioctl_run+0x69e/0x18e0 [kvm]
[Fri May 15 13:07:37 2026]  ? fire_user_return_notifiers+0x37/0x70
[Fri May 15 13:07:37 2026]  ? __x64_sys_ioctl+0xbf/0x100
[Fri May 15 13:07:37 2026]  kvm_vcpu_ioctl+0x312/0xba0 [kvm]
[Fri May 15 13:07:37 2026]  ? __x64_sys_ioctl+0xbf/0x100
[Fri May 15 13:07:37 2026]  ? kvm_on_user_return+0x4a/0x90 [kvm]
[Fri May 15 13:07:37 2026]  ? fire_user_return_notifiers+0x37/0x70
[Fri May 15 13:07:37 2026]  ? do_syscall_64+0x396/0x14c0
[Fri May 15 13:07:37 2026]  __x64_sys_ioctl+0xa5/0x100
[Fri May 15 13:07:37 2026]  x64_sys_call+0x103b/0x2390
[Fri May 15 13:07:37 2026]  do_syscall_64+0xe6/0x14c0
[Fri May 15 13:07:37 2026]  ? fire_user_return_notifiers+0x37/0x70
[Fri May 15 13:07:37 2026]  ? do_syscall_64+0x396/0x14c0
[Fri May 15 13:07:37 2026]  ? fire_user_return_notifiers+0x37/0x70
[Fri May 15 13:07:37 2026]  ? do_syscall_64+0x396/0x14c0
[Fri May 15 13:07:37 2026]  ? do_syscall_64+0x9b/0x14c0
[Fri May 15 13:07:37 2026] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Fri May 15 13:07:37 2026] RIP: 0033:0x756cc783091b
[Fri May 15 13:07:37 2026] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[Fri May 15 13:07:37 2026] RSP: 002b:0000756cc3305b30 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[Fri May 15 13:07:37 2026] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 0000756cc783091b
[Fri May 15 13:07:37 2026] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000020
[Fri May 15 13:07:37 2026] RBP: 00005a973691c030 R08: 0000000000000000 R09: 0000000000000000
[Fri May 15 13:07:37 2026] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[Fri May 15 13:07:37 2026] R13: 0000000000000001 R14: 0000000000000608 R15: 0000000000000000
[Fri May 15 13:07:37 2026]  </TASK>

Output from trace-cmd when the guest gets stuck:
        CPU 0/KVM-6610  [001] d..2.   709.333183: kvm_apic_accept_irq:  apicid 0 vec 209 (Fixed|edge)
        CPU 0/KVM-6610  [001] d..2.   709.333183: kvm_apicv_accept_irq: apicid 0 vec 209 (Fixed|edge)
        CPU 0/KVM-6610  [001] d..3.   709.333183: kvm_hv_timer_state:   vcpu_id 0 hv_timer 1
        CPU 0/KVM-6610  [001] d..1.   709.333183: kvm_entry:      vcpu 0 rip 0xfffff800b49020ec
        CPU 0/KVM-6610  [001] d..1.   709.333183: kvm_wait_lapic_expire: vcpu 0: delta 460 (late)
        CPU 0/KVM-6610  [001] d..1.   709.348806: kvm_exit:      reason PREEMPTION_TIMER rip 0xfffff800b49020ec info 0 0
        CPU 0/KVM-6610  [001] d..2.   709.348806: kvm_apic_accept_irq:  apicid 0 vec 209 (Fixed|edge)
        CPU 0/KVM-6610  [001] d..2.   709.348806: kvm_apicv_accept_irq: apicid 0 vec 209 (Fixed|edge)
        CPU 0/KVM-6610  [001] d..3.   709.348806: kvm_hv_timer_state:   vcpu_id 0 hv_timer 1
        CPU 0/KVM-6610  [001] d..1.   709.348806: kvm_entry:      vcpu 0 rip 0xfffff800b49020ec
        CPU 0/KVM-6610  [001] d..1.   709.348807: kvm_wait_lapic_expire: vcpu 0: delta -1624 (late)

and the same MSR check as before shows:
trace-cmd report |grep  -e msr_write.*da0| sed 's/.*kvm_/kvm_/' | sort -u
kvm_msr:              msr_write da0 = 0x800


If I run:
sudo modprobe -r kvm_intel
sudo modprobe kvm_intel preemption_timer=0

I am sometimes able to boot into Windows.

Other times it enters an endless loop and boots into Windows Recovery mode:
        CPU 0/KVM-18245 [001] .....  2628.320804: kvm_fpu:     unload
        CPU 0/KVM-18245 [001] .....  2628.320804: kvm_userspace_exit:   reason KVM_EXIT_IO (2)
        CPU 0/KVM-18245 [001] .....  2628.320805: kvm_fpu:     load
        CPU 0/KVM-18245 [001] .....  2628.320805: kvm_pio:     pio_read at 0x608 size 4 count 1 val 0xe1d664
        CPU 0/KVM-18245 [001] d..1.  2628.320805: kvm_entry:      vcpu 0 rip 0xfffff807c131bf36
        CPU 0/KVM-18245 [001] d..1.  2628.320806: kvm_exit:      reason IO_INSTRUCTION rip 0xfffff807c131bf35 info 608000b 0
        CPU 0/KVM-18245 [001] .....  2628.320806: kvm_fpu:     unload
        CPU 0/KVM-18245 [001] .....  2628.320807: kvm_userspace_exit:   reason KVM_EXIT_IO (2)
        CPU 0/KVM-18245 [001] .....  2628.320808: kvm_fpu:     load
        CPU 0/KVM-18245 [001] .....  2628.320808: kvm_pio:     pio_read at 0x608 size 4 count 1 val 0xe1d66e
        CPU 0/KVM-18245 [001] d..1.  2628.320808: kvm_entry:      vcpu 0 rip 0xfffff807c131bf36
        CPU 0/KVM-18245 [001] d..1.  2628.320809: kvm_exit:      reason IO_INSTRUCTION rip 0xfffff807c131bf35 info 608000b 0


but I also observed this (Windows is stuck in the boot stage):
        CPU 0/KVM-35230 [001] d..1.  4945.511627: kvm_entry:      vcpu 0 rip 0xfffff804b53020ec
        CPU 0/KVM-35230 [001] d..1.  4945.511632: kvm_exit:      reason EXTERNAL_INTERRUPT rip 0xfffff804b53020ec info 0 0
        CPU 0/KVM-35230 [001] d..1.  4945.511632: kvm_entry:      vcpu 0 rip 0xfffff804b53020ec
        CPU 0/KVM-35230 [001] d..1.  4945.511634: kvm_exit:      reason EXTERNAL_INTERRUPT rip 0xfffff804b53020ec info 0 0
        CPU 0/KVM-35230 [001] d..1.  4945.511635: kvm_entry:      vcpu 0 rip 0xfffff804b53020ec
        CPU 0/KVM-35230 [001] d..1.  4945.511640: kvm_exit:      reason EXTERNAL_INTERRUPT rip 0xfffff804b53020ec info 0 0
        CPU 0/KVM-35230 [001] d..1.  4945.511640: kvm_entry:      vcpu 0 rip 0xfffff804b53020ec
        CPU 0/KVM-35230 [001] d..1.  4945.730129: kvm_exit:      reason EXTERNAL_INTERRUPT rip 0xfffff804b53020f7 info 0 0
        CPU 0/KVM-35230 [001] .....  4945.730145: kvm_apic_accept_irq:  apicid 0 vec 209 (Fixed|edge)
        CPU 0/KVM-35230 [001] .....  4945.730145: kvm_apicv_accept_irq: apicid 0 vec 209 (Fixed|edge)
        CPU 0/KVM-35230 [001] d..1.  4945.730146: kvm_entry:      vcpu 0 rip 0xfffff804b53020f7
    kvm-pit/35115-35236 [012] .....  4945.730212: kvm_set_irq:     gsi 0 level 1 source 2
    kvm-pit/35115-35236 [012] ...1.  4945.730215: kvm_pic_set_irq:     chip 0 pin 0 (edge|masked)
    kvm-pit/35115-35236 [012] ...1.  4945.730217: kvm_ioapic_set_irq:   pin 2 dst 0 vec 255 (Fixed|physical|edge|masked)
    kvm-pit/35115-35236 [012] .....  4945.730217: kvm_set_irq:     gsi 0 level 0 source 2
    kvm-pit/35115-35236 [012] ...1.  4945.730217: kvm_pic_set_irq:     chip 0 pin 0 (edge|masked)
    kvm-pit/35115-35236 [012] ...1.  4945.730217: kvm_ioapic_set_irq:   pin 2 dst 0 vec 255 (Fixed|physical|edge|masked)
        CPU 0/KVM-35230 [001] d..1.  4945.730559: kvm_exit:      reason EXTERNAL_INTERRUPT rip 0xfffff804b53020ec info 0 0
        CPU 0/KVM-35230 [001] d..1.  4945.730561: kvm_entry:      vcpu 0 rip 0xfffff804b53020ec
        CPU 0/KVM-35230 [001] d..1.  4945.731559: kvm_exit:      reason EXTERNAL_INTERRUPT rip 0xfffff804b53020ec info 0 0
        CPU 0/KVM-35230 [001] d..1.  4945.731560: kvm_entry:      vcpu 0 rip 0xfffff804b53020ec
        CPU 0/KVM-35230 [001] d..1.  4945.732559: kvm_exit:      reason EXTERNAL_INTERRUPT rip 0xfffff804b53020ec info 0 0
        CPU 0/KVM-35230 [001] d..1.  4945.732560: kvm_entry:      vcpu 0 rip 0xfffff804b53020ec
        CPU 0/KVM-35230 [001] d..1.  4945.732934: kvm_exit:      reason EXTERNAL_INTERRUPT rip 0xfffff804b53020ec info 0 0
        CPU 0/KVM-35230 [001] d..1.  4945.732935: kvm_entry:      vcpu 0 rip 0xfffff804b53020ec
        CPU 0/KVM-35230 [001] d..1.  4945.733559: kvm_exit:      reason EXTERNAL_INTERRUPT rip 0xfffff804b53020f7 info 0 0
        CPU 0/KVM-35230 [001] d..1.  4945.733560: kvm_entry:      vcpu 0 rip 0xfffff804b53020f7
        CPU 2/KVM-35232 [006] .....  4945.893095: kvm_halt_poll_ns:     vcpu 2: halt_poll_ns 0 (shrink 10000)
        CPU 2/KVM-35232 [006] .....  4945.893097: kvm_vcpu_wakeup:     wait time 8589893724 ns, polling valid
        CPU 1/KVM-35231 [008] .....  4945.893300: kvm_halt_poll_ns:     vcpu 1: halt_poll_ns 0 (shrink 10000)
        CPU 1/KVM-35231 [008] .....  4945.893302: kvm_vcpu_wakeup:     wait time 8590000199 ns, polling valid
        CPU 3/KVM-35233 [011] .....  4945.893332: kvm_vcpu_wakeup:     wait time 8


Results for Kernel 7.1.0-rc3 + v6 patches 1-22 + hrtimer trap:

I used the mentioned trap from [0]

Before booting the VM, I set up tracing and verified that it was on using:
cat /sys/kernel/tracing/tracing_on
1

After booting the VM (which got stuck again), I checked again and tracing was
off:
cat /sys/kernel/tracing/tracing_on
0

I have the full compressed trace from this trigger event (captured with the
trap). It is quite large, but I can provide it if needed.


Results for Kernel 7.0.0 + v6 patches 1-22:
I used the same setup:
* Guest OS: Windows Server 2026 (24H2, Build 26100.1742) with VBS/Hyper-V
   enabled.
* QEMU Command: -cpu host,level=30,+vmx-mbec,+cet-ss,+cet-ibt

This also results in the Windows guest getting stuck, but there are no
indications of CPU lockups.

The trace-cmd output shows that the guest is stuck in a loop of entries and exits:

        CPU 0/KVM-14938 [001] d..1.  2355.384828: kvm_entry:      vcpu 0 rip 0xfffff805879020ec
        CPU 0/KVM-14938 [001] d..1.  2355.385826: kvm_exit:      reason EXTERNAL_INTERRUPT rip 0xfffff805879020ec info 0 0
        CPU 0/KVM-14938 [001] d..1.  2355.385827: kvm_entry:      vcpu 0 rip 0xfffff805879020ec
        CPU 0/KVM-14938 [001] d..1.  2355.386826: kvm_exit:      reason EXTERNAL_INTERRUPT rip 0xfffff805879020ec info 0 0
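The pattern above can be checked mechanically: every kvm_exit has the same reason and the same guest RIP, roughly 1 ms apart. A minimal C sketch, assuming the exact `trace-cmd report` line layout shown here; parse_kvm_exit() and same_rip_loop() are hypothetical helpers, not part of trace-cmd:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One kvm_exit record as printed by "trace-cmd report". */
struct kvm_exit_rec {
	double ts;              /* timestamp in seconds */
	char reason[32];        /* e.g. "EXTERNAL_INTERRUPT" */
	unsigned long long rip; /* guest RIP at the exit */
};

/* Parse one kvm_exit line; returns 1 on success, 0 otherwise. */
static int parse_kvm_exit(const char *line, struct kvm_exit_rec *out)
{
	static const char tag[] = ": kvm_exit:";
	const char *ev, *p;

	assert(line && out);
	ev = strstr(line, tag);
	if (!ev)
		return 0;

	/* The timestamp is the space-delimited token right before the tag. */
	p = ev;
	while (p > line && p[-1] != ' ')
		p--;
	out->ts = strtod(p, NULL);

	/* After the tag: "reason <NAME> rip <HEX> ..." (%llx accepts "0x"). */
	return sscanf(ev + strlen(tag), " reason %31s rip %llx",
		      out->reason, &out->rip) == 2;
}

/*
 * Returns 1 if all n (> 1) exits share one reason and one RIP, i.e. the
 * vCPU keeps re-entering the guest at the same instruction.
 */
static int same_rip_loop(const struct kvm_exit_rec *recs, size_t n)
{
	size_t i;

	for (i = 1; i < n; i++)
		if (recs[i].rip != recs[0].rip ||
		    strcmp(recs[i].reason, recs[0].reason) != 0)
			return 0;
	return n > 1;
}
```

Feeding it the two kvm_exit lines above yields identical reasons and RIPs and a delta of about 0.001 s between them.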

And in the trace report:
trace-cmd report |grep  -e msr_write.*da0| sed 's/.*kvm_/kvm_/' | sort -u
kvm_msr:              msr_write da0 = 0x800
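For reference, MSR 0xda0 is IA32_XSS, the bitmap of supervisor state components managed by XSAVES/XRSTORS, and 0x800 is bit 11, the CET user-mode state (bit 12 would be CET supervisor state). A minimal C sketch that decodes such a kvm_msr line, assuming the trace format shown above; the helper names are hypothetical and only a few architectural components are listed:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MSR_IA32_XSS 0xda0u

/* IA32_XSS supervisor state-component bits (Intel SDM, XSAVE chapter). */
#define XSS_PT    (1ull << 8)   /* Intel Processor Trace state */
#define XSS_CET_U (1ull << 11)  /* CET user state: IA32_U_CET, IA32_PL3_SSP */
#define XSS_CET_S (1ull << 12)  /* CET supervisor state: IA32_PL0..PL2_SSP */

/* Parse "msr_write <hex index> = <hex value>"; returns 1 on success. */
static int parse_msr_write(const char *line, unsigned int *msr, uint64_t *val)
{
	const char *p;
	unsigned long long v;

	assert(line && msr && val);
	p = strstr(line, "msr_write ");
	if (!p || sscanf(p, "msr_write %x = %llx", msr, &v) != 2)
		return 0;
	*val = v;
	return 1;
}

/* Print the known IA32_XSS components set in xss; returns how many. */
static int print_xss(uint64_t xss)
{
	int n = 0;

	if (xss & XSS_PT)    { printf("PT ");    n++; }
	if (xss & XSS_CET_U) { printf("CET_U "); n++; }
	if (xss & XSS_CET_S) { printf("CET_S "); n++; }
	printf("\n");
	return n;
}
```

Applied to the line above, it reports MSR 0xda0 with only the CET_U component set.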

Hope this helps; feel free to suggest other tests I should run.

Best regards,
David

[0] https://lore.kernel.org/kvm/70cd3e97fbb796e2eb2ff8cd4b7614ada05a5f24.camel@intel.com


On 5/12/26 4:30 PM, Paolo Bonzini wrote:
> If possible, could you please check:
>
> 1) whether 7.0 + patches (up to 22/28) also causes the host to hang?
> 2) whether 7.1 + patches (up to 22/28) also causes the host to hang?
>
> to understand if this is a) something caused by our different setup b)
> a regression in 7.1 c) something caused by the last 6 patches?
>

Re: [PATCH v6 00/28] KVM: combined patchset for MBEC/GMET support
Posted by Sean Christopherson 3 weeks, 6 days ago
On Fri, May 15, 2026, David Riley wrote:
> Hi Paolo, Hi Chao, Hi Sean,
> 
> I have been testing the v6 patchset (up to 22/28) this time on Arrow
> Lake hardware. My results suggest a kernel version dependent regression
> regarding host stability.
> 
> Environment:
> * Host CPU: Intel(R) Core(TM) Ultra 7 265K (Arrow Lake)
> * Motherboard: Gigabyte Z890 EAGLE (BIOS F18)
> * Host OS: Proxmox VE based on Debian Trixie
> * Host Kernel: mainline with patches 1-22/28 applied.
> * Guest OS: Windows Server 2026 (24H2, Build 26100.1742) with VBS/Hyper-V
>   enabled.
> * QEMU Command: -cpu host,level=30,+vmx-mbec,+cet-ss,+cet-ibt
> 
> Results for Kernel 7.1.0-rc3 + v6 patches 1-22:
> I can reproduce the guest failing to boot. This setup causes host lockups on
> my Arrow Lake machine. In some cases, the guest manages to reach Windows
> Recovery, but most of the time it does not.
> 
> @Chao, in the first line you can see the hard lockup. Also have a look at the
> hrtimer trap I tested below.
> 
> dmesg output:
> [Fri May 15 13:07:37 2026] watchdog: CPU1: Watchdog detected hard LOCKUP on cpu 1

...

> If I run:
> sudo modprobe -r kvm_intel
> sudo modprobe kvm_intel preemption_timer=0
> 
> I am able to boot into windows sometimes.

Hmm, this probably confirms it's the hrtimer issue?  When using the VMX preemption
timer, KVM (on Intel) doesn't use an hrtimer to emulate L1's APIC timer.  I _think_
forcing KVM to use an hrtimer would result in hrtimers being reprogrammed
in response to KVM's usage, and thus mask the deferred reprogramming bug?  That
sounds plausible-ish?

> Results for Kernel 7.1.0-rc3 + v6 patches 1-22 + hrtimer trap:
> 
> I used the mentioned trap from [0]

Can you try Peter's fixes?  AIUI, the reporter's hack-a-fix was very far from a
complete fix.  Note, there's a v3 of patch 1 (b4 should take care of that for you,
if you're using b4).

https://lore.kernel.org/all/20260423155611.216805954@infradead.org
Re: [PATCH v6 00/28] KVM: combined patchset for MBEC/GMET support
Posted by David Riley 3 weeks, 2 days ago
Hi Sean,
thanks for the input.

On 5/15/26 8:31 PM, Sean Christopherson wrote:
> [...]
> Hmm, this probably confirms it's the hrtimer issue?  When using the VMX preemption
> timer, KVM (on Intel) doesn't use an hrtimer to emulate L1's APIC timer.  I _think_
> forcing KVM to use an hrtimer would result in hrtimers being reprogrammed
> in response to KVM's usage, and thus mask the deferred reprogramming bug?  That
> sounds plausible-ish?
> [...]
> Can you try Peter's fixes?  AIUI, the reporter's hack-a-fix was very far from a
> complete fix.  Note, there's a v3 of patch 1 (b4 should take care of that for you,
> if you're using b4).
>
> https://lore.kernel.org/all/20260423155611.216805954@infradead.org

I tested it again with the v3 hrtimer patches [0] applied on top of the v6
MBEC/GMET series.

Setup:
* Host CPU: Intel(R) Core(TM) Ultra 7 265K (Arrow Lake)
* Host OS: Proxmox VE (based on Debian Trixie)
* Host Kernel: mainline kernel 7.1.0-rc4 with v6 MBEC/GMET and v3 hrtimer [0]
* QEMU: 11.0.0 (downstream build)
* Guest OS: Windows Server 2026 (24H2, Build 26100.1742) with VBS/Hyper-V
    enabled.

Using:
* QEMU CPU Options: -cpu host,level=30,+vmx-mbec,-cet-ss,-cet-ibt

The CPU lockups no longer occurred and I was able to boot the guest. Keep
in mind that in this case cet-ss and cet-ibt were not passed through to the
guest.

If I launch the same guest using
* QEMU CPU Options: -cpu host,level=30,+vmx-mbec,+cet-ss,+cet-ibt

the VM still gets stuck on boot even with the hrtimer patches applied, but
there are no longer any hard or soft CPU lockups.

I get this trace using: trace-cmd record -e kvm

        CPU 0/KVM-11837 [001] d..2.  1363.314703: kvm_apic_accept_irq:  apicid 0 vec 209 (Fixed|edge)
        CPU 0/KVM-11837 [001] d..2.  1363.314703: kvm_apicv_accept_irq: apicid 0 vec 209 (Fixed|edge)
        CPU 0/KVM-11837 [001] d..3.  1363.314703: kvm_hv_timer_state:   vcpu_id 0 hv_timer 1
        CPU 0/KVM-11837 [001] d..1.  1363.314703: kvm_entry:      vcpu 0 rip 0xfffff801a59020f7
        CPU 0/KVM-11837 [001] d..1.  1363.314703: kvm_wait_lapic_expire: vcpu 0: delta -590 (late)
        CPU 0/KVM-11837 [001] d..1.  1363.314993: kvm_exit:      reason EXTERNAL_INTERRUPT rip 0xfffff801a59020ec info 0 0
        CPU 0/KVM-11837 [001] d..1.  1363.314994: kvm_entry:      vcpu 0 rip 0xfffff801a59020ec
        CPU 0/KVM-11837 [001] d..1.  1363.315993: kvm_exit:      reason EXTERNAL_INTERRUPT rip 0xfffff801a59020ec info 0 0
        CPU 0/KVM-11837 [001] d..1.  1363.315993: kvm_entry:      vcpu 0 rip 0xfffff801a59020ec
        CPU 0/KVM-11837 [001] d..1.  1363.316993: kvm_exit:      reason EXTERNAL_INTERRUPT rip 0xfffff801a59020ec info 0 0
        CPU 0/KVM-11837 [001] d..1.  1363.316994: kvm_entry:      vcpu 0 rip 0xfffff801a59020ec
        CPU 0/KVM-11837 [001] d..1.  1363.317992: kvm_exit:      reason EXTERNAL_INTERRUPT rip 0xfffff801a59020ec info 0 0
        CPU 0/KVM-11837 [001] d..1.  1363.317993: kvm_entry:      vcpu 0 rip 0xfffff801a59020ec
        CPU 0/KVM-11837 [001] d..1.  1363.318992: kvm_exit:      reason EXTERNAL_INTERRUPT rip 0xfffff801a59020ec info 0 0
        CPU 0/KVM-11837 [001] d..1.  1363.319269: kvm_entry:      vcpu 0 rip 0xfffff801a59020ec
        CPU 0/KVM-11837 [001] d..1.  1363.319992: kvm_exit:      reason EXTERNAL_INTERRUPT rip 0xfffff801a59020ec info 0 0
        CPU 0/KVM-11837 [001] d..1.  1363.319994: kvm_entry:      vcpu 0 rip 0xfffff801a59020ec
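Sean's preemption-timer theory can be cross-checked against this trace: the kvm_hv_timer_state event reports hv_timer 1, meaning that at this point KVM was backing the vCPU's APIC timer with the VMX preemption timer ("hv timer") rather than a software hrtimer. A minimal C sketch of a hypothetical helper that extracts this from a trace line, assuming the line layout shown above:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper.  Returns 1 if a kvm_hv_timer_state line reports
 * hv_timer 1 (VMX preemption timer in use), 0 if it reports hv_timer 0
 * (software hrtimer in use), or -1 if the line is not that event.
 */
static int hv_timer_in_use(const char *line, unsigned int *vcpu_id)
{
	static const char tag[] = "kvm_hv_timer_state:";
	const char *ev;
	unsigned int in_use;

	assert(line && vcpu_id);
	ev = strstr(line, tag);
	if (!ev)
		return -1;
	if (sscanf(ev + strlen(tag), " vcpu_id %u hv_timer %u",
		   vcpu_id, &in_use) != 2)
		return -1;
	return in_use != 0;
}
```

Running it over a trace recorded after reloading kvm_intel with preemption_timer=0 would show whether the hv timer is ever used there.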

I did not spot anything useful in the dmesg/journalctl output.

I also ran the same tests on mainline kernel 7.1.0-rc3 with v6 MBEC/GMET
(patches 1-22 of 28) and v3 hrtimer [0] and got the same results.

Best regards,
David

[0] https://lore.kernel.org/all/20260423155611.216805954@infradead.org



Re: [PATCH v6 00/28] KVM: combined patchset for MBEC/GMET support
Posted by Paolo Bonzini 1 month ago
On 5/12/26 16:32, Paolo Bonzini wrote:
> The trace shows that CET is not used at all unless MBEC is
> present. In particular (after "trace-cmd record -e kvm ...") I can do:
> 
> $ trace-cmd report |grep  -e msr_write.*da0| sed 's/.*kvm_/kvm_/' | sort -u
> 
> and it shows as expected this with +vmx-mbec,+cet-ss,+cet-ibt:
> 
> kvm_msr:              msr_write da0 = 0x800
> 
> but not with -vmx-mbec,+cet-ss,+cet-ibt.  This initialization is
> performed by Hyper-V even before VMXON, and the breakage happens even
> if Memory Integrity is disabled inside Windows.
> 
> Knowing that Hyper-V was not running any nested guest at the time of
> the hang, I changed __vmcs_writel() to have
> 
>          if (field == SECONDARY_VM_EXEC_CONTROL)
>                  value &= ~SECONDARY_EXEC_MODE_BASED_EPT_EXEC;

I have now reproduced the guest hang with a one-line change on top of
kvm/master:

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 937aeb474af7..43e0f20e4e26 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -7231,6 +7231,7 @@ static void nested_vmx_setup_secondary_ctls(u32 ept_caps,
         if (enable_ept) {
                 /* nested EPT: emulate EPT also to L1 */
                 msrs->secondary_ctls_high |=
+                       SECONDARY_EXEC_MODE_BASED_EPT_EXEC | /* hem hem */
                         SECONDARY_EXEC_ENABLE_EPT;
                 msrs->ept_caps =
                         VMX_EPT_PAGE_WALK_4_BIT |

(which would break very badly if Hyper-V were to start a nested guest, 
but the trace says it doesn't).
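The one-line change works because secondary_ctls_high holds the allowed 1-settings that L1 reads from IA32_VMX_PROCBASED_CTLS2; setting the MBEC bit (bit 22 of the secondary controls) there advertises MBEC to Hyper-V without any MMU changes. A minimal C sketch of the SDM rule for such capability MSRs, plus the earlier __vmcs_writel() hack that clears the bit before a hardware write; the macro and function names are hypothetical, not KVM's:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Secondary processor-based VM-execution controls (Intel SDM). */
#define SEC_ENABLE_EPT          (1u << 1)
#define SEC_MODE_BASED_EPT_EXEC (1u << 22)   /* MBEC */

static_assert(SEC_MODE_BASED_EPT_EXEC == 0x400000u, "MBEC is bit 22");

/*
 * Capability MSR layout: bits 31:0 are the allowed 0-settings (a 1 means
 * the control must be 1), bits 63:32 the allowed 1-settings (a 0 means
 * the control must be 0).
 */
static bool controls_valid(uint32_t ctls, uint64_t cap_msr)
{
	uint32_t must_be_1 = (uint32_t)cap_msr;
	uint32_t may_be_1 = (uint32_t)(cap_msr >> 32);

	return (ctls & must_be_1) == must_be_1 && (ctls & ~may_be_1) == 0;
}

/* Model of the debug hack: drop MBEC before the value reaches the VMCS. */
static uint32_t strip_mbec(uint32_t ctls)
{
	return ctls & ~SEC_MODE_BASED_EPT_EXEC;
}
```

Without the extra bit in the allowed 1-settings, an L1 that sets MBEC fails the check; with it, the same value passes.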

Can you check what behavior you get from this (actually silly) change? 
It should allow you to exercise Hyper-V's CET paths without the burden 
of the MMU changes.

Paolo
Re: [PATCH v6 00/28] KVM: combined patchset for MBEC/GMET support
Posted by Paolo Bonzini 1 month ago
On 5/11/26 12:53, David Riley wrote:
> 
> watchdog: CPU11: Watchdog detected hard LOCKUP on cpu 11
> watchdog: BUG: soft lockup - CPU#11 stuck for 28s [CPU 0/KVM:16105]

What is the backtrace here?

Thanks,

Paolo
Re: [PATCH v6 00/28] KVM: combined patchset for MBEC/GMET support
Posted by David Riley 1 month ago
Is that enough?

dmesg | grep -A 50 "soft lockup - CPU#11"

[ 5565.326572] watchdog: BUG: soft lockup - CPU#11 stuck for 28s! [CPU 0/KVM:16105]
[ 5565.326576] Modules linked in: tcp_diag(E) inet_diag(E) veth(E) rpcsec_gss_krb5(E) auth_rpcgss(E) nfsv4(E) nfs(E) lockd(E) grace(E) netfs(E) ebtable_filter(E) ebtables(E) ip_set(E) ip6table_raw(E) iptable_raw(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) sunrpc(E) nf_tables(E) softdog(E) bonding(E) tls(E) binfmt_misc(E) nfnetlink_log(E) intel_rapl_msr(E) intel_rapl_common(E) intel_uncore_frequency(E) intel_uncore_frequency_common(E) intel_ifs(E) i10nm_edac(E) skx_edac_common(E) nfit(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) cxl_pci(E) cxl_mem(E) kvm_intel(E) acpi_power_meter(E) ipmi_ssif(E) cxl_acpi(E) cxl_port(E) cxl_pmem(E) kvm(E) pmt_telemetry(E) dax_hmem(E) pmt_discovery(E) irqbypass(E) pmt_class(E) intel_sdsi(E) bnxt_re(E) cxl_core(E) aesni_intel(E) ib_uverbs(E) gf128mul(E) isst_if_mmio(E) isst_if_mbox_pci(E) fwctl(E) rapl(E) cmdlinepart(E) intel_cstate(E) einj(E) pcspkr(E) wmi_bmof(E) spi_nor(E) iaa_crypto(E) isst_if_common(E) ib_core(E) intel_vsec(E) mei_me(E) ast(E) mtd(E) spd5118(E)
[ 5565.326599]  i2c_algo_bit(E) mei(E) ipmi_si(E) acpi_ipmi(E) ipmi_devintf(E) ipmi_msghandler(E) acpi_pad(E) joydev(E) input_leds(E) mac_hid(E) sch_fq_codel(E) msr(E) vhost_net(E) vhost(E) vhost_iotlb(E) tap(E) efi_pstore(E) nfnetlink(E) dmi_sysfs(E) ip_tables(E) x_tables(E) autofs4(E) btrfs(E) libblake2b(E) raid6_pq(E) xor(E) hid_generic(E) usbmouse(E) usbkbd(E) usbhid(E) hid(E) cdc_ether(E) usbnet(E) mii(E) uas(E) usb_storage(E) dm_thin_pool(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) nvme(E) nvme_core(E) xhci_pci(E) i40e(E) nvme_keyring(E) i2c_i801(E) idxd(E) tg3(E) ahci(E) i2c_mux(E) libie(E) idxd_bus(E) spi_intel_pci(E) bnxt_en(E) nvme_auth(E) libie_adminq(E) xhci_hcd(E) i2c_smbus(E) spi_intel(E) libahci(E) i2c_ismt(E) wmi(E) pinctrl_emmitsburg(E)
[ 5565.326618] CPU: 11 UID: 0 PID: 16105 Comm: CPU 0/KVM Tainted: G            EL      7.1.0-rc2-v6-mbec-gmet-00028-g1e3b074acc33 #24 PREEMPT(lazy)
[ 5565.326620] Tainted: [E]=UNSIGNED_MODULE, [L]=SOFTLOCKUP
[ 5565.326620] Hardware name: ****, BIOS 3001 07/03/2025
[ 5565.326621] RIP: 0010:kvm_arch_vcpu_ioctl_run+0x78d/0x18e0 [kvm]
[ 5565.326680] Code: 07 00 48 83 bb 28 08 00 00 00 0f 85 69 0a 00 00 0f 1f 44 00 00 65 c6 05 08 d7 08 c7 01 c6 83 e2 0a 00 00 01 fb 0f 1f 44 00 00 <48> 83 83 48 19 00 00 01 fa 0f 1f 44 00 00 c6 83 e2 0a 00 00 00 0f
[ 5565.326681] RSP: 0018:ff6afb0ed454b9e0 EFLAGS: 00000246
[ 5565.326682] RAX: 0000000000000000 RBX: ff2a147c1758a480 RCX: 0000000000000000
[ 5565.326683] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 5565.326683] RBP: ff6afb0ed454ba90 R08: 0000000000000000 R09: 0000000000000000
[ 5565.326684] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 5565.326684] R13: ff2a147c12163000 R14: 0000000000000000 R15: ff2a147c20d2b140
[ 5565.326685] FS:  00007503727ef6c0(0000) GS:ff2a147bf748a000(0000) knlGS:fffff8013ee33000
[ 5565.326685] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5565.326686] CR2: 00007fffffff0000 CR3: 00000020948ed006 CR4: 0000000000f73ef0
[ 5565.326687] PKRU: 55555554
[ 5565.326687] Call Trace:
[ 5565.326688]  <TASK>
[ 5565.326689]  ? trace_event_buffer_reserve+0xa5/0xe0
[ 5565.326692]  ? trace_event_raw_event_kvm_userspace_exit+0x6c/0xc0 [kvm]
[ 5565.326727]  kvm_vcpu_ioctl+0x312/0xba0 [kvm]
[ 5565.326762]  ? __rb_reserve_next.constprop.0+0x5c/0x420
[ 5565.326765]  ? ring_buffer_lock_reserve+0x155/0x410
[ 5565.326767]  __x64_sys_ioctl+0xa5/0x100
[ 5565.326769]  x64_sys_call+0x103b/0x2390
[ 5565.326771]  do_syscall_64+0xe6/0x14c0
[ 5565.326774]  ? trace_event_buffer_reserve+0xa5/0xe0
[ 5565.326775]  ? trace_event_raw_event_kvm_userspace_exit+0x6c/0xc0 [kvm]
[ 5565.326807]  ? kvm_vcpu_ioctl+0x2a7/0xba0 [kvm]
[ 5565.326841]  ? trace_event_buffer_reserve+0xa5/0xe0
[ 5565.326842]  ? trace_event_raw_event_kvm_userspace_exit+0x6c/0xc0 [kvm]
[ 5565.326874]  ? __x64_sys_ioctl+0xbf/0x100
[ 5565.326875]  ? kvm_on_user_return+0x4a/0x90 [kvm]
[ 5565.326916]  ? fire_user_return_notifiers+0x37/0x70
[ 5565.326918]  ? do_syscall_64+0x396/0x14c0
[ 5565.326920]  ? do_syscall_64+0x396/0x14c0
[ 5565.326922]  ? __x64_sys_ioctl+0xbf/0x100
[ 5565.326923]  ? kvm_on_user_return+0x4a/0x90 [kvm]
[ 5565.326962]  ? fire_user_return_notifiers+0x37/0x70
[ 5565.326963]  ? do_syscall_64+0x396/0x14c0
[ 5565.326965]  ? kvm_on_user_return+0x4a/0x90 [kvm]
[ 5565.327002]  ? fire_user_return_notifiers+0x37/0x70
[ 5565.327004]  ? do_syscall_64+0x396/0x14c0
[ 5565.327005]  ? do_syscall_64+0x396/0x14c0
[ 5565.327007]  ? do_syscall_64+0x396/0x14c0
[ 5565.327008]  ? do_syscall_64+0x396/0x14c0
[ 5565.327009]  ? do_syscall_64+0x9b/0x14c0
[ 5565.327011]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 5565.327012] RIP: 0033:0x75037650f91b


On 5/11/26 12:54 PM, Paolo Bonzini wrote:
> On 5/11/26 12:53, David Riley wrote:
>>
>> watchdog: CPU11: Watchdog detected hard LOCKUP on cpu 11
>> watchdog: BUG: soft lockup - CPU#11 stuck for 28s [CPU 0/KVM:16105]
>
> What is the backtrace here?
>
> Thanks,
>
> Paolo
>
>

Re: [PATCH v6 00/28] KVM: combined patchset for MBEC/GMET support
Posted by Chao Gao 4 weeks, 1 day ago
On Mon, May 11, 2026 at 01:07:33PM +0200, David Riley wrote:
>is that enough?
>
>dmesg | grep -A 50 "soft lockup - CPU#11"

Do you also have a hard lockup trace?

I want to make sure the host lockup is not the issue discussed here:

https://lore.kernel.org/kvm/70cd3e97fbb796e2eb2ff8cd4b7614ada05a5f24.camel@intel.com/
Re: [PATCH v6 00/28] KVM: combined patchset for MBEC/GMET support
Posted by Sean Christopherson 4 weeks ago
On Thu, May 14, 2026, Chao Gao wrote:
> On Mon, May 11, 2026 at 01:07:33PM +0200, David Riley wrote:
> >is that enough?
> >
> >dmesg | grep -A 50 "soft lockup - CPU#11"
> 
> Do you also have a hard lockup trace?
> 
> I want to make sure the host lockup is not the issue discussed here:
> 
> https://lore.kernel.org/kvm/70cd3e97fbb796e2eb2ff8cd4b7614ada05a5f24.camel@intel.com

Ugh, if it is the hrtimer issue, I apologize in advance.  Despite being bitten
by that bug over, and over, and over, I somehow keep forgetting to mention it to
others when they run into problems.  Glad someone is paying attention...
Re: [PATCH v6 00/28] KVM: combined patchset for MBEC/GMET support
Posted by Sean Christopherson 1 month ago
On Tue, May 05, 2026, Paolo Bonzini wrote:
> This version can also be found in the "queue" branch of kvm.git.
> Since it should be final I'm including again for reference the full
> description.

I still have two nits, but on my end the most important thing is to stabilize the
hashes sooner than later, so I can use the resulting merge into kvm/next as the
basis for 7.2 topic branches.  I.e. feel free to ignore the nits for now if that
makes life easier for you, I can always send patches to apply on top.

> v5->v6:
> - rename make_spte_executable to change_spte_executable
> - rename byte index in update_permission_bitmask to index
> - use (u8) casts before "KVM: x86/mmu: introduce ACC_READ_MASK"
> - make commit message for "KVM: x86/mmu: split XS/XU bits for EPT" more accurate
> - add XU to shadow_acc_track_mask already in "KVM: x86/mmu: split XS/XU bits for EPT"
> - fix compilation error
> - use alternative code for __vmx_handle_ept_violation suggested by Sean
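The changelog refers to splitting the EPT XS/XU bits. Per the Intel SDM, when mode-based execute control is enabled, EPT bit 2 grants execute access for supervisor-mode linear addresses and bit 10 for user-mode linear addresses; otherwise bit 2 alone controls execution. A minimal C model of that rule; the names are hypothetical, and the series itself works through KVM's ACC_* access masks and update_permission_bitmask() (both mentioned in the changelog) rather than a helper like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* EPT paging-structure entry permission bits (Intel SDM). */
#define EPT_READ (1ull << 0)
#define EPT_XS   (1ull << 2)   /* execute; supervisor-mode addresses only if MBEC=1 */
#define EPT_XU   (1ull << 10)  /* execute for user-mode addresses, used only if MBEC=1 */

static_assert((EPT_XS & EPT_XU) == 0, "XS and XU are distinct bits");

/*
 * May an instruction fetch through this EPT entry proceed?
 * user_addr: the guest linear address is a user-mode address
 * (U/S = 1 in every guest paging-structure entry used to translate it).
 */
static bool ept_fetch_allowed(uint64_t epte, bool mbec, bool user_addr)
{
	if (!mbec)
		return epte & EPT_XS;   /* bit 2 governs all fetches */
	return epte & (user_addr ? EPT_XU : EPT_XS);
}
```

With MBEC enabled, an entry with only XS set is executable for kernel code but not for user code, and vice versa for XU.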
Re: [PATCH v6 00/28] KVM: combined patchset for MBEC/GMET support
Posted by Paolo Bonzini 1 month ago
On Thu, May 7, 2026 at 4:45 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, May 05, 2026, Paolo Bonzini wrote:
> > This version can also be found in the "queue" branch of kvm.git.
> > Since it should be final I'm including again for reference the full
> > description.
>
> I still have two nits, but on my end the most important thing is to stabilize the
> hashes sooner than later, so I can use the resulting merge into kvm/next as the
> basis for 7.2 topic branches.  I.e. feel free to ignore the nits for now if that
> makes life easier for you, I can always send patches to apply on top.

No no, it's fine. The ff one I just missed... the other I am not sure
if it gains much but I'm not going to argue either way.

I'll push with the stable hashes tomorrow.

Paolo