[PATCH 0/7] KVM: x86: APX reg prep work
Posted by Sean Christopherson 4 weeks ago
Clean up KVM's register tracking and storage in preparation for landing APX,
which expands the maximum number of GPRs from 16 to 32.

This is kinda sorta an RFC, as there are some very opinionated changes.  I.e.
if you dislike something, please speak up.

My thought is to treat R16-R31 as much like other GPRs as possible (though
maybe we don't need to expand regs[] as sketched out in the last patch?).
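[Editor's note: purely as an illustration of the direction, here is a small
user-space model of bitmap-based available/dirty register tracking.  The
struct layout and helper names below are made up for illustration, not the
series' actual code; KVM's real fields live in struct kvm_vcpu_arch.]

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model: regs_avail/regs_dirty become "unsigned
 * long" bitmaps wide enough that R16-R31 fit alongside the existing GPRs.
 */
#define NR_VCPU_REGS 32

struct vcpu_regs {
	unsigned long regs_avail;	/* value in regs[] is up to date */
	unsigned long regs_dirty;	/* regs[] must be written back   */
	unsigned long regs[NR_VCPU_REGS];
};

static bool reg_is_available(const struct vcpu_regs *v, int reg)
{
	return v->regs_avail & (1ul << reg);
}

static void mark_reg_dirty(struct vcpu_regs *v, int reg)
{
	/* A dirty register is by definition also available. */
	v->regs_avail |= 1ul << reg;
	v->regs_dirty |= 1ul << reg;
}

/* Wrapper to reset both masks, e.g. when switching the active VMCS. */
static void reset_reg_masks(struct vcpu_regs *v)
{
	v->regs_avail = 0;
	v->regs_dirty = 0;
}

static int demo_track(void)
{
	struct vcpu_regs v = { 0 };

	mark_reg_dirty(&v, 16);		/* e.g. an emulated write to R16 */
	if (!reg_is_available(&v, 16))
		return 0;
	reset_reg_masks(&v);
	return !reg_is_available(&v, 16);
}
```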

Sean Christopherson (7):
  KVM: x86: Add dedicated storage for guest RIP
  KVM: x86: Drop the "EX" part of "EXREG" to avoid collision with APX
  KVM: nVMX: Do a bitwise-AND of regs_avail when switching active VMCS
  KVM: x86: Add wrapper APIs to reset dirty/available register masks
  KVM: x86: Track available/dirty register masks as "unsigned long"
    values
  KVM: x86: Use a proper bitmap for tracking available/dirty registers
  *** DO NOT MERGE *** KVM: x86: Pretend that APX is supported on 64-bit
    kernels

 arch/x86/include/asm/kvm_host.h | 53 +++++++++++++++++++--------
 arch/x86/kvm/kvm_cache_regs.h   | 64 +++++++++++++++++++++++----------
 arch/x86/kvm/svm/sev.c          |  2 +-
 arch/x86/kvm/svm/svm.c          | 16 ++++-----
 arch/x86/kvm/svm/svm.h          |  2 +-
 arch/x86/kvm/vmx/nested.c       | 10 +++---
 arch/x86/kvm/vmx/tdx.c          | 36 +++++++++----------
 arch/x86/kvm/vmx/vmx.c          | 52 +++++++++++++--------------
 arch/x86/kvm/vmx/vmx.h          | 24 ++++++-------
 arch/x86/kvm/x86.c              | 20 +++++------
 10 files changed, 166 insertions(+), 113 deletions(-)


base-commit: 5128b972fb2801ad9aca54d990a75611ab5283a9
-- 
2.53.0.473.g4a7958ca14-goog
Re: [PATCH 0/7] KVM: x86: APX reg prep work
Posted by Paolo Bonzini 3 weeks, 6 days ago
On 3/11/26 01:33, Sean Christopherson wrote:
> Clean up KVM's register tracking and storage in preparation for landing APX,
> which expands the maximum number of GPRs from 16 to 32.
> 
> This is kinda sorta an RFC, as there are some very opinionated changes.  I.e.
> if you dislike something, please speak up.
> 
> My thought is to treat R16-R31 as much like other GPRs as possible (though
> maybe we don't need to expand regs[] as sketched out in the last patch?).

The cleanups in patches 1-4 are nice.

For APX specifically, in abstract it's nice to treat R16-R31 as much as 
possible as regular GPRs.  On the other hand, the extra 16 regs[] 
entries would be more or less unused, the ugly switch statements 
wouldn't go away.  In other words, most of your remarks to Changseok's 
patches would remain...

Paolo
Re: [PATCH 0/7] KVM: x86: APX reg prep work
Posted by Sean Christopherson 5 days, 2 hours ago
On Wed, Mar 11, 2026, Paolo Bonzini wrote:
> On 3/11/26 01:33, Sean Christopherson wrote:
> > Clean up KVM's register tracking and storage in preparation for landing APX,
> > which expands the maximum number of GPRs from 16 to 32.
> > 
> > This is kinda sorta an RFC, as there are some very opinionated changes.  I.e.
> > if you dislike something, please speak up.
> > 
> > My thought is to treat R16-R31 as much like other GPRs as possible (though
> > maybe we don't need to expand regs[] as sketched out in the last patch?).
> 
> The cleanups in patches 1-4 are nice.
> 
> For APX specifically, in abstract it's nice to treat R16-R31 as much as
> possible as regular GPRs.  On the other hand, the extra 16 regs[] entries
> would be more or less unused, the ugly switch statements wouldn't go away.

Hmm, yeah, but only if XSAVE is the source of truth for guest R16-R31.

Do we know what the compiler and/or kernel rules for using R16-R31 will be?
E.g. if C code is allowed to use R16-R31 at will, then KVM will either need to
swap R16-R31 in assembly, or annotate a pile of functions as "no_egpr" or
whatever.
 
At that point, my vote would be to use regs[] to track R16-R31 for KVM's purposes.
IIUC, we could largely ignore XSAVE state at runtime and just ensure R16-R31 are
copied to/from userspace as needed, same as we do for PKRU.

If R16-R31 aren't generally available for C code, then how exactly is APX going
to be used?

Understanding the usage rules for R16-R31 seems fundamental to figuring out what
to do in KVM...
Re: [PATCH 0/7] KVM: x86: APX reg prep work
Posted by Dave Hansen 4 days, 9 hours ago
On 4/2/26 16:19, Sean Christopherson wrote:
> Do we know what the compiler and/or kernel rules for using R16-R31 will be?
> E.g. if C code is allowed to use R16-R31 at will, then KVM will either need to
> swap R16-R31 in assembly, or annotate a pile of functions as "no_egpr" or
> whatever.

My _assumption_ is that the speedup from using the new GPRs as GPRs in
the kernel is going to be enough for us to support it. This is even
though those kernel binaries won't run on old hardware.

If I'm right, then we're going to have to handle the new GPRs just like
the existing ones and save them on kernel entry before we hit C code.
I'm not sure I want to be messing with XSAVE there. XSAVE requires
munging a header which means even if we used XSAVE we'd need to XSAVE
and then copy things over to pt_regs (assuming we continue using pt_regs).

That doesn't seem like loads of fun because we'll also need to copy out
to the XSAVE UABI spots, like PKRU times 32.
Re: [PATCH 0/7] KVM: x86: APX reg prep work
Posted by Sean Christopherson 1 day, 9 hours ago
On Fri, Apr 03, 2026, Dave Hansen wrote:
> On 4/2/26 16:19, Sean Christopherson wrote:
> > Do we know what the compiler and/or kernel rules for using R16-R31 will be?
> > E.g. if C code is allowed to use R16-R31 at will, then KVM will either need to
> > swap R16-R31 in assembly, or annotate a pile of functions as "no_egpr" or
> > whatever.
> 
> My _assumption_ is that the speedup from using the new GPRs as GPRs in
> the kernel is going to be enough for us to support it. This is even
> though those kernel binaries won't run on old hardware.
> 
> If I'm right, then we're going to have to handle the new GPRs just like
> the existing ones and save them on kernel entry before we hit C code.

Ooof, one nasty wrinkle to prepare for is an NMI that arrives after VM-Exit on
Intel CPUs.  Unless Intel extends VMX to context switch XCR0 at VM-Entry/VM-Exit,
and/or provides GIF-like functionality (which would be awesome!), it will be
possible for an NMI to be taken with the guest's XCR0 loaded, i.e. with XCR0.APX=0
even when APX is fully enabled in the host.

> I'm not sure I want to be messing with XSAVE there. XSAVE requires
> munging a header which means even if we used XSAVE we'd need to XSAVE
> and then copy things over to pt_regs (assuming we continue using pt_regs).
> 
> That doesn't seem like loads of fun because we'll also need to copy out
> to the XSAVE UABI spots, like PKRU times 32.
Re: [PATCH 0/7] KVM: x86: APX reg prep work
Posted by Paolo Bonzini 4 days, 9 hours ago
On 4/3/26 01:19, Sean Christopherson wrote:
> On Wed, Mar 11, 2026, Paolo Bonzini wrote:
>> On 3/11/26 01:33, Sean Christopherson wrote:
>>> Clean up KVM's register tracking and storage in preparation for landing APX,
>>> which expands the maximum number of GPRs from 16 to 32.
>>>
>>> This is kinda sorta an RFC, as there are some very opinionated changes.  I.e.
>>> if you dislike something, please speak up.
>>>
>>> My thought is to treat R16-R31 as much like other GPRs as possible (though
>>> maybe we don't need to expand regs[] as sketched out in the last patch?).
>>
>> The cleanups in patches 1-4 are nice.
>>
>> For APX specifically, in abstract it's nice to treat R16-R31 as much as
>> possible as regular GPRs.  On the other hand, the extra 16 regs[] entries
>> would be more or less unused, the ugly switch statements wouldn't go away.
> 
> Hmm, yeah, but only if XSAVE is the source of truth for guest R16-R31.
> 
> Do we know what the compiler and/or kernel rules for using R16-R31 will be?

On the compiler side, they are enabling APX only with new enough -march, 
or with -march=native if the host has APX. This, by the way, implies 
CONFIG_X86_NATIVE_CPU is currently hosed on Diamond Rapids and newer 
machines.

From what I remember of Chang Seok Bae's presentation at Plumbers last 
year, right now there's no plan to have the kernel use APX, except 
possibly through the usual kernel_fpu_begin/end.

> E.g. if C code is allowed to use R16-R31 at will, then KVM will either need to
> swap R16-R31 in assembly, or annotate a pile of functions as "no_egpr" or
> whatever.

__attribute__((__target__("no-apxf"))), yeah.  Perhaps it could be done 
(kernel-wide) for all noinstr functions, but I think we agree that it's 
not a great idea overall.

> At that point, my vote would be to use regs[] to track R16-R31 for KVM's purposes.
> IIUC, we could largely ignore XSAVE state at runtime and just ensure R16-R31 are
> copied to/from userspace as needed, same as we do for PKRU.

If it can be done efficiently when APX is not available, I suppose 
that's fine.  The KVM code is nicer for sure.

But until the kernel starts using APX, I would do the save/restore near 
kvm_load_xfeatures(), because __vmx_vcpu_run()/__svm_vcpu_run() would 
otherwise have to check whether xcr0.apx is set or not.  This is much simpler:

	// runs with host xcr0, so it can assume it includes APX
	// alternatively it could be a static_call(), to only invoke
	// the function if at least one guest enables APX in xcr0
	if (static_cpu_has(X86_FEATURE_APX))
		kvm_apx_save(vcpu);
	kvm_load_xfeatures(vcpu, true);
	...
	kvm_load_xfeatures(vcpu, false);
	// same as above
	if (static_cpu_has(X86_FEATURE_APX))
		kvm_apx_restore(vcpu);

Writing kvm_apx_save() and kvm_apx_restore() in a .S file already now is 
fine, but I don't care too much.
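[Editor's note: the ordering in the pseudocode above can be modeled as a
compile-and-run sketch, with xsetbv replaced by a plain variable and the
R16-R31 moves elided.  Everything below is a hypothetical stand-in, not
actual KVM code; the point it demonstrates is that the APX helpers run
while the host XCR0, which is assumed to include APX, is still loaded.]

```c
#include <assert.h>
#include <stdbool.h>

#define XFEATURE_MASK_APX (1ull << 19)

/* Stand-ins for the CPU's XCR0 and the host value. */
static unsigned long long cur_xcr0;
static const unsigned long long host_xcr0 = XFEATURE_MASK_APX | 0xff;

struct vcpu {
	unsigned long long xcr0;
};

static void kvm_apx_save(struct vcpu *v)
{
	(void)v;	/* <swap host R16-R31 for the guest's> */
	/* Touching R16-R31 is only legal with XCR0.APX=1. */
	assert(cur_xcr0 & XFEATURE_MASK_APX);
}

static void kvm_apx_restore(struct vcpu *v)
{
	(void)v;	/* <swap guest R16-R31 back for the host's> */
	assert(cur_xcr0 & XFEATURE_MASK_APX);
}

static void kvm_load_xfeatures(struct vcpu *v, bool load_guest)
{
	if (v->xcr0 != host_xcr0)
		cur_xcr0 = load_guest ? v->xcr0 : host_xcr0;
}

static int demo_run(unsigned long long guest_xcr0)
{
	struct vcpu v = { .xcr0 = guest_xcr0 };

	cur_xcr0 = host_xcr0;
	kvm_apx_save(&v);		/* still host XCR0: APX usable */
	kvm_load_xfeatures(&v, true);
	/* ... __vmx_vcpu_run()/__svm_vcpu_run() ... */
	kvm_load_xfeatures(&v, false);
	kvm_apx_restore(&v);		/* host XCR0 again: APX usable */
	return cur_xcr0 == host_xcr0;
}
```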

If the kernel starts using APX we would have to do the xcr0 changes in 
two steps, one in kvm_load_xfeatures() and one (finalizing the APX bit) 
right around the assembly code for world switching.  That xcr0 update 
would have to be in kvm_apx_save/restore, which would have to 1) be 
rewritten in assembly 2) be called from within 
__vmx_vcpu_run/__svm_vcpu_run.

But it would be premature to do the two-step save/restore of xcr0 now.

> If R16-R31 aren't generally available for C code, then how exactly is APX going
> to be used?

Only in userspace, at least for now.

Paolo

> Understanding the usage rules for R16-R31 seems fundamental to figuring what to
> do in KVM...
>
Re: [PATCH 0/7] KVM: x86: APX reg prep work
Posted by Chang S. Bae 4 days, 3 hours ago
On 4/3/2026 9:03 AM, Paolo Bonzini wrote:
> 
> But until the kernel starts using APX, I would do the save/restore near 
> kvm_load_xfeatures(), because __vmx_vcpu_run()/__svm_vcpu_run() would 
> have to check whether xcr0.apx is set or not. 
Right, I'd much prefer this. Then, it requires auditing whether any 
fast-path handler could access EGPRs.

But there are cases, such as the new {RD|WR}MSR (MSR_IMM) instructions, 
where the fastpath handlers appear to access GPRs. Because of this, the 
EGPR saving/restoring needs to happen earlier.
Re: [PATCH 0/7] KVM: x86: APX reg prep work
Posted by Paolo Bonzini 3 days, 20 hours ago
On Sat, Apr 4, 2026 at 12:05 AM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>
> On 4/3/2026 9:03 AM, Paolo Bonzini wrote:
> >
> > But until the kernel starts using APX, I would do the save/restore near
> > kvm_load_xfeatures(), because __vmx_vcpu_run()/__svm_vcpu_run() would
> > have to check whether xcr0.apx is set or not.
> Right, I'd much prefer this. Then, it requires to audit whether any
> fast-path handler could access EGPRs.
>
> But there are cases with the new {RD|WR}MSR (MSR_IMM) instructions that
> appear to access GPRs. Because of this, the EGPR saving/restoring needs
> to happen earlier.

You're right about fast paths... so something like the attached patch.
It is not too bad to translate into assembly, where it could use
alternatives (in the same way as
RESTORE_GUEST_SPEC_CTRL/RESTORE_GUEST_SPEC_CTRL_BODY) in place of
static_cpu_has(). Maybe it's best to bite the bullet and do it
already...

Paolo
Re: [PATCH 0/7] KVM: x86: APX reg prep work
Posted by Sean Christopherson 1 day, 10 hours ago
+Andrew

On Sat, Apr 04, 2026, Paolo Bonzini wrote:
> On Sat, Apr 4, 2026 at 12:05 AM Chang S. Bae <chang.seok.bae@intel.com> wrote:
> >
> > On 4/3/2026 9:03 AM, Paolo Bonzini wrote:
> > >
> > > But until the kernel starts using APX, I would do the save/restore near
> > > kvm_load_xfeatures(), because __vmx_vcpu_run()/__svm_vcpu_run() would
> > > have to check whether xcr0.apx is set or not.
> > Right, I'd much prefer this. Then, it requires to audit whether any
> > fast-path handler could access EGPRs.
> >
> > But there are cases with the new {RD|WR}MSR (MSR_IMM) instructions that
> > appear to access GPRs. Because of this, the EGPR saving/restoring needs
> > to happen earlier.
> 
> You're right about fast paths...

Ya, potential fastpath usage is why I wanted to just context switch around
entry/exit.

> so something like the attached patch.
> It is not too bad to translate into assembly, where it could use
> alternatives (in the same way as
> RESTORE_GUEST_SPEC_CTRL/RESTORE_GUEST_SPEC_CTRL_BODY) in place of
> static_cpu_has(). Maybe it's best to bite the bullet and do it
> already...

My strong vote is to context switch in assembly, but _conditionally_ context
switch R16-R31.  All of this started from Andrew's comment:

 : You can't unconditionally use PUSH2/POP2 in the VMExit, because at that
 : point in time it's the guest's XCR0 in context.  If the guest has APX
 : disabled, PUSH2 in the VMExit path will #UD.
 : 
 : You either need two VMExit handlers, one APX and one non-APX and choose
 : based on the guest XCR0 value, or you need a branch prior to regaining
 : speculative safety, or you need to save/restore XCR0 as the first
 : action.  It's horrible any way you look at it.

But that second paragraph isn't quite correct, at least not for KVM.  Specifically,
"need a branch prior to regaining speculative safety" isn't correct, as that holds
true if and only if "regaining speculative safety" requires executing code that
might access R16-R31.  If we massage __vmx_vcpu_run() to restore SPEC_CTRL in
assembly, same as __svm_vcpu_run(), then __{svm,vmx}_vcpu_run() can simply context
switch R16-R31 if and only if APX is enabled in XCR0.

KVM always intercepts XCR0 writes (when XCR0 isn't context switched by "hardware",
i.e. ignoring SEV-ES+ and TDX guests), and IIUC all access to R16-R31 is gated on
XCR0.APX=1.  So unless I'm missing something (or hardware is flawed and lets the
guest speculatively consume R16-R31, which would be sad), it's perfectly safe to
run the guest with host state in R16-R31.

That would avoid pointlessly context switching 16 registers when APX is not being
used by the guest, and would avoid having to write XCR0 in the fastpath.
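[Editor's note: a minimal user-space sketch of that conditional swap, where
hw_egprs stands in for R16-R31 and everything is hypothetical, not the
eventual assembly.  The registers are swapped iff the guest's XCR0 has APX
set, and XCR0 itself is never written:]

```c
#include <assert.h>
#include <string.h>

#define XFEATURE_MASK_APX (1ull << 19)

static unsigned long hw_egprs[16];	/* stand-in for R16-R31 */

struct vcpu {
	unsigned long long xcr0;
	unsigned long egprs[16];	/* KVM's cached guest R16-R31 */
};

static void vcpu_enter_exit(struct vcpu *v)
{
	unsigned long host_egprs[16];

	/* Swap in guest values only if the guest can architecturally
	 * see them, i.e. only if its XCR0 enables APX. */
	if (v->xcr0 & XFEATURE_MASK_APX) {
		memcpy(host_egprs, hw_egprs, sizeof(hw_egprs));
		memcpy(hw_egprs, v->egprs, sizeof(hw_egprs));
	}

	/* <VM-Enter ... VM-Exit> */

	if (v->xcr0 & XFEATURE_MASK_APX) {
		memcpy(v->egprs, hw_egprs, sizeof(hw_egprs));
		memcpy(hw_egprs, host_egprs, sizeof(hw_egprs));
	}
}

static int demo_no_apx_guest(void)
{
	struct vcpu v = { .xcr0 = 0x7 };	/* XCR0.APX=0 */

	hw_egprs[0] = 0x1234;			/* host value in "R16" */
	vcpu_enter_exit(&v);
	/* Host "R16" survives, and nothing was copied either way. */
	return hw_egprs[0] == 0x1234 && v.egprs[0] == 0;
}
```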

> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 959fcc01ee0f..9a1766037b6f 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -887,6 +887,7 @@ struct kvm_vcpu_arch {
>  	struct fpu_guest guest_fpu;
>  
>  	u64 xcr0;
> +	u64 early_xcr0;

...

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0757b93e528d..69abfdd946dd 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1220,9 +1220,13 @@ static void kvm_load_xfeatures(struct kvm_vcpu *vcpu, bool load_guest)
>  	if (!kvm_is_cr4_bit_set(vcpu, X86_CR4_OSXSAVE))
>  		return;
>  
> -	if (vcpu->arch.xcr0 != kvm_host.xcr0)
> +	/*
> +	 * Do not load the definitive XCR0 yet; vcpu->arch.early_xcr0 keeps
> +	 * APX enabled so that the kernel can move to and from r16...r31.
> +	 */
> +	if (vcpu->arch.early_xcr0 != kvm_host.xcr0)
>  		xsetbv(XCR_XFEATURE_ENABLED_MASK,
> -		       load_guest ? vcpu->arch.xcr0 : kvm_host.xcr0);
> +		       load_guest ? vcpu->arch.early_xcr0 : kvm_host.xcr0);

Even _if_ we want to play XCR0 games, tracking early_xcr0 is unnecessary.  This
can be:

	/*
	 * XCR0 is context switched around VM-Enter/VM-Exit if APX is enabled
	 * in the host but not in the guest.
	 */
	if (vcpu->arch.xcr0 != kvm_host.xcr0 &&
	    (!cpu_feature_enabled(X86_FEATURE_APX) ||
	     vcpu->arch.xcr0 & XFEATURE_MASK_APX))
		xsetbv(XCR_XFEATURE_ENABLED_MASK,
		       load_guest ? vcpu->arch.xcr0 : kvm_host.xcr0);

And then __kvm_load_guest_apx()

	<context switch R16-R31>

	if (cpu_feature_enabled(X86_FEATURE_APX) &&
	    !(vcpu->arch.xcr0 & XFEATURE_MASK_APX))
		xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.xcr0);

And __kvm_save_guest_apx() would reverse the order of __kvm_load_guest_apx().
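[Editor's note: the ordering constraint can be demonstrated with a small
user-space model in which XSETBV is a variable and egpr_access() stands in
for the mov to/from r16..r31, "faulting" (returning -1) when XCR0.APX=0.
Only the __kvm_{load,save}_guest_apx names come from the mail; everything
else is illustrative:]

```c
#include <assert.h>

#define XFEATURE_MASK_APX (1ull << 19)

/* Stand-in for the CPU's XCR0; starts as the host value, APX enabled. */
static unsigned long long cur_xcr0 = XFEATURE_MASK_APX | 0xff;

static int egpr_access(void)
{
	/* "Hardware" rule: touching R16-R31 with XCR0.APX=0 is a #UD. */
	return (cur_xcr0 & XFEATURE_MASK_APX) ? 0 : -1;
}

static int __kvm_load_guest_apx(unsigned long long guest_xcr0)
{
	if (egpr_access())		/* <context switch R16-R31> */
		return -1;
	/* The "real" XCR0 is loaded only *after* accessing R16+. */
	if (!(guest_xcr0 & XFEATURE_MASK_APX))
		cur_xcr0 = guest_xcr0;
	return 0;
}

static int __kvm_save_guest_apx(unsigned long long host_xcr0)
{
	/* Reverse order: re-enable APX first, then touch R16-R31. */
	cur_xcr0 = host_xcr0;
	return egpr_access();		/* <context switch R16-R31> */
}
```

Flipping either pair of statements makes egpr_access() fail for an
APX-less guest, which is exactly the ordering bug being pointed out.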

> @@ -11056,6 +11061,49 @@ static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
>  	kvm_x86_call(set_apic_access_page_addr)(vcpu);
>  }
>  
> +/*
> + * Assuming the kernel does not use APX for now.  When
> + * the kernel starts using APX this needs to move into
> + * assembly, and KVM_GET/SET_XSAVE needs to fill in
> + * EGPRs from vcpu->arch.regs.
> + */
> +void __kvm_load_guest_apx(struct kvm_vcpu *vcpu)
> +{
> +	if (vcpu->arch.early_xcr0 != vcpu->arch.xcr0)
> +		xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.xcr0);

This is wrong.  The "real" xcr0 needs to be loaded *after* accessing R16+.

> +	if (!(vcpu->arch.xcr0 & XFEATURE_MASK_APX))
> +		return;
> +
> +	WARN_ON_ONCE(!irqs_disabled());
> +
> +	asm("mov %[r16], %%r16\n"
> +	    "mov %[r17], %%r17\n" // ...
> +	    : : [r16] "m" (vcpu->arch.regs[16]),
> +	        [r17] "m" (vcpu->arch.regs[17]));
> +}
Re: [PATCH 0/7] KVM: x86: APX reg prep work
Posted by Paolo Bonzini 1 day, 3 hours ago
Il lun 6 apr 2026, 17:28 Sean Christopherson <seanjc@google.com> ha scritto:
> > You're right about fast paths...
>
> Ya, potential fastpath usage is why I wanted to just context switch around
> entry/exit.
>
> > so something like the attached patch.
> > It is not too bad to translate into assembly, where it could use
> > alternatives (in the same way as
> > RESTORE_GUEST_SPEC_CTRL/RESTORE_GUEST_SPEC_CTRL_BODY) in place of
> > static_cpu_has(). Maybe it's best to bite the bullet and do it
> > already...
>
> My strong vote is to context switch in assembly, but _conditionally_ context
> switch R16-R31.
>
> But that second paragraph isn't quite correct, at least not for KVM.  Specifically,
> "need a branch prior to regaining speculative safety" isn't correct, as that holds
> true if and only if "regaining speculative safety" requires executing code that
> might access R16-R31.  If we massage __vmx_vcpu_run() to restore SPEC_CTRL in
> assembly, same as __svm_vcpu_run(), then __{svm,vmx}_vcpu_run() can simply context
> switch R16-R31 if and only if APX is enabled in XCR0.

I might even have patches for that lying around (the SPEC_CTRL part).

> KVM always intercepts XCR0 writes (when XCR0 isn't context switched by "hardware",
> i.e. ignoring SEV-ES+ and TDX guests), and IIUC all access to R16-R31 is gated on
> XCR0.APX=1

Right, fortunately.

> .  So unless I'm missing something (or hardware is flawed and lets the
> guest speculative consume R16-R31, which would be sad), it's perfectly safe to
> run the guest with host state in R16-R31.
>
> That would avoid pointlessly context switching 16 registers when APX is not being
> used by the guest, and would avoid having to write XCR0 in the fastpath.

For now yes, but once/if the kernel starts using the registers there's
no way out of writing XCR0 for APX-disabled guests in the fast path.

If we ignore that, we can keep guest XCR0 all the time for now, and
that would be:
- moving SPEC_CTRL to assembly
- not changing XCR0 handling at all
- using XCR0, in addition to just static_cpu_has(X86_FEATURE_APX), to
make the r16-r31 swap conditional

> > -     if (vcpu->arch.xcr0 != kvm_host.xcr0)
> > +     /*
> > +      * Do not load the definitive XCR0 yet; vcpu->arch.early_xcr0 keeps
> > +      * APX enabled so that the kernel can move to and from r16...r31.
> > +      */
> > +     if (vcpu->arch.early_xcr0 != kvm_host.xcr0)
> >               xsetbv(XCR_XFEATURE_ENABLED_MASK,
> > -                    load_guest ? vcpu->arch.xcr0 : kvm_host.xcr0);
> > +                    load_guest ? vcpu->arch.early_xcr0 : kvm_host.xcr0);
>
> Even _if_ we want to play XCR0 games,

(which depends on whether we want to be ready for kernel usage of APX, right?)

> tracking early_xcr0 is unnecessary.  This can be:
>
>         /*
>          * XCR0 is context switched around VM-Enter/VM-Exit if APX is enabled
>          * in the host but not in the guest.
>          */
>         if (vcpu->arch.xcr0 != kvm_host.xcr0 &&
>             (!cpu_feature_enabled(X86_FEATURE_APX) ||
>              vcpu->arch.xcr0 & XFEATURE_MASK_APX))
>                 xsetbv(XCR_XFEATURE_ENABLED_MASK,
>                        load_guest ? vcpu->arch.xcr0 : kvm_host.xcr0);

This is a bit more complex, however, because in the end early_xcr0 is
precomputing the same conditions and optimizations. For example...

> > +void __kvm_load_guest_apx(struct kvm_vcpu *vcpu)
> > +{
> > +     if (vcpu->arch.early_xcr0 != vcpu->arch.xcr0)
> > +             xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.xcr0);
>
> This is wrong.  The "real" xcr0 needs to be loaded *after* accessing R16+.

... this is actually the same optimization you mention above: the real
xcr0 only needs to be loaded if APX is off in the guest, and in that
case you don't need to load r16-r31. So you can load xcr0 first, and
then any components (for now only APX) that need to be swapped.

> > +     if (!(vcpu->arch.xcr0 & XFEATURE_MASK_APX))
> > +             return;

... Because the loads are conditional on APX being enabled in the real xcr0.
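[Editor's note: a tiny standalone model of that alternative ordering,
with cur_xcr0 and egpr_swap() as hypothetical stand-ins for XSETBV and
the r16..r31 moves, and #UD modeled as a failure return.  Loading the
real xcr0 first is safe precisely because the swap is conditional on APX
being enabled in that xcr0:]

```c
#include <assert.h>

#define XFEATURE_MASK_APX (1ull << 19)

static unsigned long long cur_xcr0 = XFEATURE_MASK_APX | 0xff; /* host */

static int egpr_swap(void)
{
	return (cur_xcr0 & XFEATURE_MASK_APX) ? 0 : -1;	/* -1 == #UD */
}

/* Load the real guest XCR0 first; the swap is conditional on it. */
static int load_guest_apx(unsigned long long guest_xcr0)
{
	cur_xcr0 = guest_xcr0;
	if (guest_xcr0 & XFEATURE_MASK_APX)
		return egpr_swap();
	return 0;
}
```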

Paolo

> > +
> > +     WARN_ON_ONCE(!irqs_disabled());
> > +
> > +     asm("mov %[r16], %%r16\n"
> > +         "mov %[r17], %%r17\n" // ...
> > +         : : [r16] "m" (vcpu->arch.regs[16]),
> > +             [r17] "m" (vcpu->arch.regs[17]));
> > +}
>
Re: [PATCH 0/7] KVM: x86: APX reg prep work
Posted by Sean Christopherson 1 day, 3 hours ago
On Mon, Apr 06, 2026, Paolo Bonzini wrote:
> Il lun 6 apr 2026, 17:28 Sean Christopherson <seanjc@google.com> ha scritto:
> > > You're right about fast paths...
> >
> > Ya, potential fastpath usage is why I wanted to just context switch around
> > entry/exit.
> >
> > > so something like the attached patch.
> > > It is not too bad to translate into assembly, where it could use
> > > alternatives (in the same way as
> > > RESTORE_GUEST_SPEC_CTRL/RESTORE_GUEST_SPEC_CTRL_BODY) in place of
> > > static_cpu_has(). Maybe it's best to bite the bullet and do it
> > > already...
> >
> > My strong vote is to context switch in assembly, but _conditionally_ context
> > switch R16-R31.
> >
> > But that second paragraph isn't quite correct, at least not for KVM.  Specifically,
> > "need a branch prior to regaining speculative safety" isn't correct, as that holds
> > true if and only if "regaining speculative safety" requires executing code that
> > might access R16-R31.  If we massage __vmx_vcpu_run() to restore SPEC_CTRL in
> > assembly, same as __svm_vcpu_run(), then __{svm,vmx}_vcpu_run() can simply context
> > switch R16-R31 if and only if APX is enabled in XCR0.
> 
> I might even have patches for that lying around (the SPEC_CTRL part).
> 
> > KVM always intercepts XCR0 writes (when XCR0 isn't context switched by "hardware",
> > i.e. ignoring SEV-ES+ and TDX guests), and IIUC all access to R16-R31 is gated on
> > XCR0.APX=1
> 
> Right, fortunately.
> 
> > .  So unless I'm missing something (or hardware is flawed and lets the
> > guest speculative consume R16-R31, which would be sad), it's perfectly safe to
> > run the guest with host state in R16-R31.
> >
> > That would avoid pointlessly context switching 16 registers when APX is not being
> > used by the guest, and would avoid having to write XCR0 in the fastpath.
> 
> For now yes, but once/if the kernel starts using the registers there's
> no way out of writing XCR0 for APX-disabled guests in the fast path.

Why's that?  So long as KVM uses vcpu->arch.regs[R16-R31] as the source of truth
when emulating anything, there's no danger of taking a #UD in the host due to
accessing R16-R31 with XCR0.APX=0.  There's not even any danger of consuming stale
guest state, e.g. in case KVM screws up and accesses R16-R31 instead of generating #UD,
as the value in regs[] will still be the guest's last written value.

If we wanted to be paranoid, we could add sanity checks to ensure R16-R31 don't show
up in hardware-provided informational fields, but to some extent that's orthogonal
to how KVM maintains guest values.

> If we ignore that, we can keep guest XCR0 all the time for now, and
> that would be:
> - move SPEC_CTRL to assembly
> - not changing XCR0 handling at all
> - use XCR0 in addition to just static_cpu_has(X86_FEATURE_APX) to make
> r16-r31 swap conditional
> 
> > > -     if (vcpu->arch.xcr0 != kvm_host.xcr0)
> > > +     /*
> > > +      * Do not load the definitive XCR0 yet; vcpu->arch.early_xcr0 keeps
> > > +      * APX enabled so that the kernel can move to and from r16...r31.
> > > +      */
> > > +     if (vcpu->arch.early_xcr0 != kvm_host.xcr0)
> > >               xsetbv(XCR_XFEATURE_ENABLED_MASK,
> > > -                    load_guest ? vcpu->arch.xcr0 : kvm_host.xcr0);
> > > +                    load_guest ? vcpu->arch.early_xcr0 : kvm_host.xcr0);
> >
> > Even _if_ we want to play XCR0 games,
> 
> (which depends on whether we want to be ready for kernel usage of APX, right?)

No?
Re: [PATCH 0/7] KVM: x86: APX reg prep work
Posted by Paolo Bonzini 18 hours ago
Il mar 7 apr 2026, 00:00 Sean Christopherson <seanjc@google.com> ha scritto:
>
> > > .  So unless I'm missing something (or hardware is flawed and lets the
> > > guest speculative consume R16-R31, which would be sad), it's perfectly safe to
> > > run the guest with host state in R16-R31.
> > >
> > > That would avoid pointlessly context switching 16 registers when APX is not being
> > > used by the guest, and would avoid having to write XCR0 in the fastpath.
> >
> > For now yes, but once/if the kernel starts using the registers there's
> > no way out of writing XCR0 for APX-disabled guests in the fast path.
>
> Why's that?  So long as KVM uses vcpu->arch.regs[R16-R31] as the source of truth
> when emulating anything, there's no danger of taking a #UD in the host due to
> accessing R16-R31 with XCR0.APX=0.

Yes I agree with that. But the unavoidable part is the XSETBV because
only the assembly code can run with XCR0.APX=0. As soon as you go back
to C, including during the fast path, you have to ensure XCR0.APX=1
again if the kernel is compiled with -mapxf.

For now, I agree that early_xcr0 isn't needed and you can run all the
time with XCR0.APX=0.

Paolo
Re: [PATCH 0/7] KVM: x86: APX reg prep work
Posted by Sean Christopherson 12 hours ago
On Tue, Apr 07, 2026, Paolo Bonzini wrote:
> Il mar 7 apr 2026, 00:00 Sean Christopherson <seanjc@google.com> ha scritto:
> >
> > > > .  So unless I'm missing something (or hardware is flawed and lets the
> > > > guest speculative consume R16-R31, which would be sad), it's perfectly safe to
> > > > run the guest with host state in R16-R31.
> > > >
> > > > That would avoid pointlessly context switching 16 registers when APX is not being
> > > > used by the guest, and would avoid having to write XCR0 in the fastpath.
> > >
> > > For now yes, but once/if the kernel starts using the registers there's
> > > no way out of writing XCR0 for APX-disabled guests in the fast path.
> >
> > Why's that?  So long as KVM uses vcpu->arch.regs[R16-R31] as the source of truth
> > when emulating anything, there's no danger of taking a #UD in the host due to
> > accessing R16-R31 with XCR0.APX=0.
> 
> Yes I agree with that. But the unavoidable part is the XSETBV because
> only the assembly code can run with XCR0.APX=0. As soon as you go back
> to C, including during the fast path, you have to ensure XCR0.APX=1
> again if the kernel is compiled with -mapxf.

/facepalm

I got so focused on register state that I completely forgot about actually
using the registers...
Re: [PATCH 0/7] KVM: x86: APX reg prep work
Posted by Chang S. Bae 3 weeks, 5 days ago
On 3/11/2026 12:01 PM, Paolo Bonzini wrote:
> 
>   On the other hand, the extra 16 regs[] 
> entries would be more or less unused, the ugly switch statements 
> wouldn't go away.  In other words, most of your remarks to Changseok's 
> patches would remain...

I think so...

If the host kernel ever starts using EGPRs, the state would need to be 
switched in the entry code. At that point, they would likely be saved 
somewhere other than the XSAVE buffer. In turn, the guest state would 
also need to be saved to regs[] on VM exit.

However, that is a what-if scenario at best. The host kernel still 
manages EGPR context switching through XSAVE. Saving EGPRs into regs[] 
would introduce the oddity of synchronizing between two buffers, regs[] 
and gfpu->fpstate, which looks like unnecessary complexity.

So while ugly, the switch statements are a bit of a trade-off here. Also 
bits 16-31 in the extended regs_avail will remain unset with APX=y.

Re: [PATCH 0/7] KVM: x86: APX reg prep work
Posted by Sean Christopherson 3 weeks, 5 days ago
On Thu, Mar 12, 2026, Chang S. Bae wrote:
> On 3/11/2026 12:01 PM, Paolo Bonzini wrote:
> > 
> >   On the other hand, the extra 16 regs[] entries would be more or less
> > unused, the ugly switch statements wouldn't go away.  In other words,
> > most of your remarks to Changseok's patches would remain...
> 
> I think so...
> 
> If the host kernel ever starts using EGPRs, the state would need to be
> switched in the entry code. At that point, they would likely be saved
> somewhere other than XSAVE buffer. In turn, the guest state would also need
> to be saved to regs[] on VM exit.
> 
> However, that is sort of what-if scenarios at best. The host kernel still
> manages EGPR context switching through XSAVE. Saving EGPRs into regs[] would
> introduce an oddity to synchronize between two buffers: regs[] and
> gfpu->fpstate, which looks like unnecessary complexity.
> 
> So while ugly, the switch statements are a bit of a trade-off here. Also
> bits 16-31 in the extended regs_avail will remain unset with APX=y.

Have you measured performance/latency overhead if KVM goes straight to context
switching R16-R31 at entry/exit?  With PUSH2/POP2, it's "only" 8 more instructions
on each side.

If the overhead is in the noise, I'd be very strongly inclined to say KVM should
swap at entry/exit regardless of kernel behavior so that we don't have to special
case accesses on the back end.
Re: [PATCH 0/7] KVM: x86: APX reg prep work
Posted by Chang S. Bae 1 week, 6 days ago
On 3/12/2026 10:47 AM, Sean Christopherson wrote:
> On Thu, Mar 12, 2026, Chang S. Bae wrote:
>>
>> However, that is sort of what-if scenarios at best. The host kernel still
>> manages EGPR context switching through XSAVE. Saving EGPRs into regs[] would
>> introduce an oddity to synchronize between two buffers: regs[] and
>> gfpu->fpstate, which looks like unnecessary complexity.

No, this looks ugly. If guest EGPR state is saved in vcpu->arch.regs[], 
the APX area in the XSAVE buffer isn't necessary:

When the KVM API exposes state in XSAVE format, the frontend can handle 
this separately. Alongside the uABI <-> guest fpstate copy functions, 
new copy functions may deal with the state between uABI <-> vCPU cache.

Further, one could think of an exclusion such as:

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 76153dfb58c9..5404f9399eea 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -794,9 +794,10 @@ static u64 __init guest_default_mask(void)
{
	/*
	 * Exclude dynamic features, which require userspace opt-in even
-	 * for KVM guests.
+	 * for KVM guests, and APX as extended general-purpose register
+	 * states are saved in the KVM cache separately.
	 */
-	return ~(u64)XFEATURE_MASK_USER_DYNAMIC;
+	return ~((u64)XFEATURE_MASK_USER_DYNAMIC | XFEATURE_MASK_APX);
}

But this default bitmask feeds into the permission bits:

	fpu->guest_perm.__state_perm    = guest_default_cfg.features;
	fpu->guest_perm.__state_size    = guest_default_cfg.size;

This policy looks clear and sensible: permission is granted only if 
space is reserved to save the state. If there is a strong desire to save 
memory, I think it should go through a more thorough review to revisit 
this policy.

> Have you measured performance/latency overhead if KVM goes straight to context
> switching R16-R31 at entry/exit?  With PUSH2/POP2, it's "only" 8 more instructions
> on each side.

Yup, when I checked a prototype in the lab, it appeared to be in the noise, 
with less than 1% overall variance.

> If the overhead is in the noise, I'd be very strongly inclined to say KVM should
> swap at entry/exit regardless of kernel behavior so that we don't have to special
> case accesses on the back end.

Note: The hardware request being discussed looks to be ongoing. I don't 
know the decision yet, but at least for now let me add you to the 
off-list thread for your information.

Right now, I think the entry path can live with guest XCR0 in this 
regard. Since XSETBV is trapped/emulated, the shadow XCR0 remains in 
sync. The entry function can take an additional flag reflecting guest 
XCR0.APX, and gate EGPR accesses accordingly.

Then the behavior stays aligned with the architecture:
   * On initial enable, EGPRs are zeroed on entry following XSETBV exit
   * If APX is disabled and later re-enabled, regs[] retains the state
     while XCR0.APX=0 and restores it when returning from the re-enabling
     XSETBV exit.
Re: [PATCH 0/7] KVM: x86: APX reg prep work
Posted by Sean Christopherson 5 days, 2 hours ago
On Wed, Mar 25, 2026, Chang S. Bae wrote:
> On 3/12/2026 10:47 AM, Sean Christopherson wrote:
> > On Thu, Mar 12, 2026, Chang S. Bae wrote:
> > > 
> > > However, that is sort of what-if scenarios at best. The host kernel still
> > > manages EGPR context switching through XSAVE. Saving EGPRs into regs[] would
> > > introduce an oddity to synchronize between two buffers: regs[] and
> > > gfpu->fpstate, which looks like unnecessary complexity.
> 
> No, this looks ugly. 

Sorry, you lost me.  What looks ugly?

> If guest EGPR state is saved in vcpu->arch.regs[], the APX area there isn't
> necessary:
> 
> When the KVM API exposes state in XSAVE format, the frontend can handle this
> separately. Alongside uABI <-> guest fpstate copy functions, new copy
> functions may deal with the state between uABI <-> VCPU cache.
> 
> Further, one could think of exclusion as such:
> 
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 76153dfb58c9..5404f9399eea 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -794,9 +794,10 @@ static u64 __init guest_default_mask(void)
> {
> 	/*
> 	 * Exclude dynamic features, which require userspace opt-in even
> -	 * for KVM guests.
> +	 * for KVM guests, and APX as extended general-purpose register
> +	 * states are saved in the KVM cache separately.
> 	 */
> -	return ~(u64)XFEATURE_MASK_USER_DYNAMIC;
> +	return ~((u64)XFEATURE_MASK_USER_DYNAMIC | XFEATURE_MASK_APX);
> }
> 
> But this default bitmask feeds into the permission bits:
> 
> 	fpu->guest_perm.__state_perm    = guest_default_cfg.features;
> 	fpu->guest_perm.__state_size    = guest_default_cfg.size;
> 
> This policy looks clear and sensible: permission is granted only if space is
> reserved to save the state. If there is a strong desire to save memory, I
> think it should go through a more thorough review to revisit this policy.

And I'm lost again.
Re: [PATCH 0/7] KVM: x86: APX reg prep work
Posted by Chang S. Bae 5 days, 1 hour ago
On 4/2/2026 4:07 PM, Sean Christopherson wrote:
> On Wed, Mar 25, 2026, Chang S. Bae wrote:
>> On 3/12/2026 10:47 AM, Sean Christopherson wrote:
>>> On Thu, Mar 12, 2026, Chang S. Bae wrote:
>>>>
>>>> However, that is sort of what-if scenarios at best. The host kernel still
>>>> manages EGPR context switching through XSAVE. Saving EGPRs into regs[] would
>>>> introduce an oddity to synchronize between two buffers: regs[] and
>>>> gfpu->fpstate, which looks like unnecessary complexity.
>>
>> No, this looks ugly.
> 
> Sorry, you lost me.  What looks ugly?

Oh, that was directed at my own comment above: keeping regs[] <-> guest 
fpstate in sync would be unnecessarily complex without a clear use case 
(continued below).

>> If guest EGPR state is saved in vcpu->arch.regs[], the APX area there isn't
>> necessary:
>>
>> When the KVM API exposes state in XSAVE format, the frontend can handle this
>> separately. Alongside uABI <-> guest fpstate copy functions, new copy
>> functions may deal with the state between uABI <-> VCPU cache.
>>
>> Further, one could think of exclusion as such:
>>
>> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
>> index 76153dfb58c9..5404f9399eea 100644
>> --- a/arch/x86/kernel/fpu/xstate.c
>> +++ b/arch/x86/kernel/fpu/xstate.c
>> @@ -794,9 +794,10 @@ static u64 __init guest_default_mask(void)
>> {
>> 	/*
>> 	 * Exclude dynamic features, which require userspace opt-in even
>> -	 * for KVM guests.
>> +	 * for KVM guests, and APX as extended general-purpose register
>> +	 * states are saved in the KVM cache separately.
>> 	 */
>> -	return ~(u64)XFEATURE_MASK_USER_DYNAMIC;
>> +	return ~((u64)XFEATURE_MASK_USER_DYNAMIC | XFEATURE_MASK_APX);
>> }
>>
>> But this default bitmask feeds into the permission bits:
>>
>> 	fpu->guest_perm.__state_perm    = guest_default_cfg.features;
>> 	fpu->guest_perm.__state_size    = guest_default_cfg.size;
>>
>> This policy looks clear and sensible: permission is granted only if space is
>> reserved to save the state. If there is a strong desire to save memory, I
>> think it should go through a more thorough review to revisit this policy.
> 
> And I'm lost again.

Here I committed to pursuing the approach of saving/restoring EGPRs via 
regs[] on VM entry/exit. That raises a couple of follow-up questions:

   1. What about APX area in guest fpstate?
   2. How to support the state for KVM ABI?

This surely departs from the "XSAVE is the single source of truth" 
model. So:

   Leave the APX area in guest fpstate unused.

   Copy APX state directly between regs[] and uABI to preserve the
   XSAVE-based ABI, as in the attached diff.

That's all I'm saying.

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index fffbf087937d..b3ab2ac827e6 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -59,6 +59,16 @@ void __init kvm_init_xstate_sizes(void)
 	}
 }
 
+u32 xstate_size(unsigned int xfeature)
+{
+	return xstate_sizes[xfeature].eax;
+}
+
+u32 xstate_offset(unsigned int xfeature)
+{
+	return xstate_sizes[xfeature].ebx;
+}
+
 u32 xstate_required_size(u64 xstate_bv, bool compacted)
 {
 	u32 ret = XSAVE_HDR_SIZE + XSAVE_HDR_OFFSET;
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 039b8e6f40ba..5ace99dd152b 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -64,6 +64,8 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
 
 void __init kvm_init_xstate_sizes(void);
 u32 xstate_required_size(u64 xstate_bv, bool compacted);
+u32 xstate_size(unsigned int xfeature);
+u32 xstate_offset(unsigned int xfeature);
 
 int cpuid_query_maxphyaddr(struct kvm_vcpu *vcpu);
 int cpuid_query_maxguestphyaddr(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c1e1b3030786..1f064a32b8b7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -108,6 +108,12 @@ EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_host);
 #define emul_to_vcpu(ctxt) \
 	((struct kvm_vcpu *)(ctxt)->vcpu)
 
+#ifdef CONFIG_KVM_APX
+#define VCPU_EGPRS_PTR(vcpu)   &(vcpu)->arch.regs[VCPU_REGS_R16]
+#else
+#define VCPU_EGPRS_PTR(vcpu)   NULL
+#endif
+
 /* EFER defaults:
  * - enable syscall per default because its emulated by KVM
  * - enable LME and LMA per default on 64 bit KVM
@@ -5804,10 +5810,33 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
 	return 0;
 }
 
+static void kvm_copy_vcpu_regs_to_uabi(struct kvm_vcpu *vcpu, struct kvm_xsave *uabi_xsave)
+{
+	union fpregs_state *xstate = (union fpregs_state *)uabi_xsave->region;
+	void *uabi_apx = (void*)uabi_xsave->region + xstate_offset(XFEATURE_APX);
+	void *vcpu_egprs = VCPU_EGPRS_PTR(vcpu);
+
+	if (!vcpu_egprs)
+		return;
 
-static int kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu,
-					 u8 *state, unsigned int size)
+	memcpy(uabi_apx, vcpu_egprs, xstate_size(XFEATURE_APX));
+	xstate->xsave.header.xfeatures |= XFEATURE_MASK_APX;
+}
+
+static void kvm_copy_uabi_to_vcpu_regs(struct kvm_vcpu *vcpu, struct kvm_xsave *uabi_xsave)
 {
+	union fpregs_state *xstate = (union fpregs_state *)uabi_xsave->region;
+	void *uabi_apx = (void*)uabi_xsave->region + xstate_offset(XFEATURE_APX);
+	void *vcpu_egprs = VCPU_EGPRS_PTR(vcpu);
+
+	if (vcpu_egprs && xstate->xsave.header.xfeatures & XFEATURE_MASK_APX)
+		memcpy(vcpu_egprs, uabi_apx, xstate_size(XFEATURE_APX));
+}
+
+static int kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu, struct kvm_xsave *guest_xsave,
+					 unsigned int size)
+{
+
 	/*
 	 * Only copy state for features that are enabled for the guest.  The
 	 * state itself isn't problematic, but setting bits in the header for
@@ -5826,15 +5855,23 @@ static int kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu,
 	if (fpstate_is_confidential(&vcpu->arch.guest_fpu))
 		return vcpu->kvm->arch.has_protected_state ? -EINVAL : 0;
 
-	fpu_copy_guest_fpstate_to_uabi(&vcpu->arch.guest_fpu, state, size,
+	/*
+	 * The generic XSAVE copy function zeros out areas not present in
+	 * guest fpstate. Those not in fpstate but in somewhere else,
+	 * like EGPRs, should be copied after this.
+	 */
+	fpu_copy_guest_fpstate_to_uabi(&vcpu->arch.guest_fpu, guest_xsave->region, size,
 				       supported_xcr0, vcpu->arch.pkru);
+
+	kvm_copy_vcpu_regs_to_uabi(vcpu, guest_xsave);
+
 	return 0;
 }
 
 static int kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
 					struct kvm_xsave *guest_xsave)
 {
-	return kvm_vcpu_ioctl_x86_get_xsave2(vcpu, (void *)guest_xsave->region,
+	return kvm_vcpu_ioctl_x86_get_xsave2(vcpu, guest_xsave,
 					     sizeof(guest_xsave->region));
 }
 
@@ -5853,6 +5890,8 @@ static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
 	 */
 	xstate->xsave.header.xfeatures &= ~vcpu->arch.guest_fpu.fpstate->xfd;
 
+	kvm_copy_uabi_to_vcpu_regs(vcpu, guest_xsave);
+
 	return fpu_copy_uabi_to_guest_fpstate(&vcpu->arch.guest_fpu,
 					      guest_xsave->region,
 					      kvm_caps.supported_xcr0,
@@ -6464,7 +6503,7 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		if (!u.xsave)
 			break;
 
-		r = kvm_vcpu_ioctl_x86_get_xsave2(vcpu, u.buffer, size);
+		r = kvm_vcpu_ioctl_x86_get_xsave2(vcpu, u.xsave, size);
 		if (r < 0)
 			break;
 
-- 
2.51.0

Re: [PATCH 0/7] KVM: x86: APX reg prep work
Posted by Andrew Cooper 3 weeks, 5 days ago
> Have you measured performance/latency overhead if KVM goes straight to context
> switching R16-R31 at entry/exit?  With PUSH2/POP2, it's "only" 8 more instructions
> on each side.
>
> If the overhead is in the noise, I'd be very strongly inclined to say KVM should
> swap at entry/exit regardless of kernel behavior so that we don't have to special
> case accesses on the back end.

I tried raising this point at Plumbers but I don't think it came through
well.

You can't unconditionally use PUSH2/POP2 in the VMExit, because at that
point in time it's the guest's XCR0 in context.  If the guest has APX
disabled, PUSH2 in the VMExit path will #UD.

You either need two VMExit handlers, one APX and one non-APX and choose
based on the guest XCR0 value, or you need a branch prior to regaining
speculative safety, or you need to save/restore XCR0 as the first
action.  It's horrible any way you look at it.

I've asked both Intel and AMD for changes to VT-x/SVM to have a proper
host/guest split of XCR0 which hardware manages on entry/exit.  It's the
only viable option in my opinion, but it's still an unknown period of
time away and not going to exist in the first APX-capable hardware.

~Andrew
Re: [PATCH 0/7] KVM: x86: APX reg prep work
Posted by Sean Christopherson 3 weeks, 5 days ago
On Thu, Mar 12, 2026, Andrew Cooper wrote:
> > Have you measured performance/latency overhead if KVM goes straight to context
> > switching R16-R31 at entry/exit?  With PUSH2/POP2, it's "only" 8 more instructions
> > on each side.
> >
> > If the overhead is in the noise, I'd be very strongly inclined to say KVM should
> > swap at entry/exit regardless of kernel behavior so that we don't have to special
> > case accesses on the back end.
> 
> I tried raising this point at plumbers but I don't think it came through
> well.
> 
> You can't unconditionally use PUSH2/POP2 in the VMExit, because at that
> point in time it's the guest's XCR0 in context.  If the guest has APX
> disabled, PUSH2 in the VMExit path will #UD.

Oh good gravy, so that's what the spec means by "inherited XCR0-sensitivity".

> You either need two VMExit handlers, one APX and one non-APX and choose
> based on the guest XCR0 value, or you need a branch prior to regaining
> speculative safety, or you need to save/restore XCR0 as the first
> action.  It's horrible any way you look at it.

Yeah, no kidding.  And now that KVM loads host XCR0 outside of the fastpath,
moving it back in just to load APX registers and take on all that complexity
makes zero sense.

> I've asked both Intel and AMD for changes to VT-x/SVM to have a proper
> host/guest split of XCR0 which hardware manages on entry/exit.  It's the
> only viable option in my opinion, but it's still an unknown period of
> time away and not going to exist in the first APX-capable hardware.

+1, especially since hardware already swaps XCR0 for SEV-ES+ guests.

Thanks Andy!
Re: [PATCH 0/7] KVM: x86: APX reg prep work
Posted by Andrew Cooper 3 weeks, 5 days ago
On 12/03/2026 6:29 pm, Sean Christopherson wrote:
> On Thu, Mar 12, 2026, Andrew Cooper wrote:
>>> Have you measured performance/latency overhead if KVM goes straight to context
>>> switching R16-R31 at entry/exit?  With PUSH2/POP2, it's "only" 8 more instructions
>>> on each side.
>>>
>>> If the overhead is in the noise, I'd be very strongly inclined to say KVM should
>>> swap at entry/exit regardless of kernel behavior so that we don't have to special
>>> case accesses on the back end.
>> I tried raising this point at plumbers but I don't think it came through
>> well.
>>
>> You can't unconditionally use PUSH2/POP2 in the VMExit, because at that
>> point in time it's the guest's XCR0 in context.  If the guest has APX
>> disabled, PUSH2 in the VMExit path will #UD.
> Oh good gravy, so that's what the spec means by "inherited XCR0-sensitivity".
>
>> You either need two VMExit handlers, one APX and one non-APX and choose
>> based on the guest XCR0 value, or you need a branch prior to regaining
>> speculative safety, or you need to save/restore XCR0 as the first
>> action.  It's horrible any way you look at it.
> Yeah, no kidding.  And now that KVM loads host XCR0 outside of the fastpath,
> moving it back in just to load APX registers and take on all that complexity
> makes zero sense.
>
>> I've asked both Intel and AMD for changes to VT-x/SVM to have a proper
>> host/guest split of XCR0 which hardware manages on entry/exit.  It's the
>> only viable option in my opinion, but it's still an unknown period of
>> time away and not going to exist in the first APX-capable hardware.
> +1, especially hardware already swaps XCR0 for SEV-ES+ guests.
>
> Thanks Andy!

To be clear, I've got tumbleweeds from one, and "oh yeah, we'll think
about that" from the other.  Some extra requests for this would not go
amiss.

~Andrew