[v2] x86, fpu/kvm: fix crash with AMX

[PATCH v2 0/4] x86, fpu/kvm: fix crash with AMX

Posted by Paolo Bonzini 1 month, 1 week ago

Fix a possible host panic, due to an unexpected #NM, when a KVM guest
is using AMX features.

The guest's XFD value, which is stored in fpstate->xfd, is used for both
guest execution and host XSAVE operations.  However, the guest-configured
XFD setting can disable features that were enabled when the guest executed
XSAVE, and this causes a #NM when executing XRSTOR on the guest FPU state.

This can happen in two cases: due to a KVM_SET_XSAVE that includes a
disabled component, or if an interrupt causes XSAVE to be executed
before the call to fpu_update_guest_xfd().

The first patch fixes both cases, the rest is improvements to selftests
in order to cover this test and also verify that #NM faults are injected
corectly.

v1 had extra patches to export higher-level functions for KVM in place
of switch_fpu_return() and fpregs_assert_state_consistent().  Those
were part of refactoring how KVM loaded guest state when KVM_RUN is
issued, but are not needed anymore with this v2 fix and I will submit
them separately.

Tested on a Sapphire Rapids machine, reviews and acks are welcome so
that I can submit it to Linus via the KVM tree.

Paolo



Paolo Bonzini (2):
  selftests: kvm: replace numbered sync points with actions
  selftests: kvm: try getting XFD and XSAVE state out of sync

Sean Christopherson (2):
  x86/fpu: Clear XSTATE_BV[i] in save state whenever XFD[i]=1
  selftests: kvm: Verify TILELOADD actually #NM faults when XFD[18]=1

 arch/x86/kernel/fpu/core.c                 |  32 ++++-
 arch/x86/kvm/x86.c                         |   9 ++
 tools/testing/selftests/kvm/x86/amx_test.c | 144 ++++++++++++---------
 3 files changed, 123 insertions(+), 62 deletions(-)

-- 
2.52.0

Re: [PATCH v2 0/4] x86, fpu/kvm: fix crash with AMX

Posted by Sean Christopherson 1 month ago

On Thu, Jan 01, 2026, Paolo Bonzini wrote:
> Fix a possible host panic, due to an unexpected #NM, when a KVM guest
> is using AMX features.
> 
> The guest's XFD value, which is stored in fpstate->xfd, is used for both
> guest execution and host XSAVE operations.  However, the guest-configured
> XFD setting can disable features that were enabled when the guest executed
> XSAVE, and this causes a #NM when executing XRSTOR on the guest FPU state.
> 
> This can happen in two cases: due to a KVM_SET_XSAVE that includes a
> disabled component, or if an interrupt causes XSAVE to be executed
> before the call to fpu_update_guest_xfd().
> 
> The first patch fixes both cases, the rest is improvements to selftests
> in order to cover this test and also verify that #NM faults are injected
> corectly.
> 
> v1 had extra patches to export higher-level functions for KVM in place
> of switch_fpu_return() and fpregs_assert_state_consistent().  Those
> were part of refactoring how KVM loaded guest state when KVM_RUN is
> issued, but are not needed anymore with this v2 fix and I will submit
> them separately.
> 
> Tested on a Sapphire Rapids machine, reviews and acks are welcome so
> that I can submit it to Linus via the KVM tree.

Tested on EMR with with my simulated IRQ hack.  Other than ongoing complaints
about the prints in the selftest, LGTM :-)

Re: [PATCH v2 0/4] x86, fpu/kvm: fix crash with AMX

Posted by Borislav Petkov 3 weeks, 1 day ago

On Thu, Jan 01, 2026 at 10:05:12AM +0100, Paolo Bonzini wrote:
> Fix a possible host panic, due to an unexpected #NM, when a KVM guest
> is using AMX features.
> 
> The guest's XFD value, which is stored in fpstate->xfd, is used for both
> guest execution and host XSAVE operations. 

This already sounds weird. Why?

Why don't we carry separate XFD copies - guest and host - which we use for the
guest and the host, respectively?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH v2 0/4] x86, fpu/kvm: fix crash with AMX

Posted by Paolo Bonzini 3 weeks, 1 day ago

Il gio 15 gen 2026, 13:22 Borislav Petkov <bp@alien8.de> ha scritto:
>
> On Thu, Jan 01, 2026 at 10:05:12AM +0100, Paolo Bonzini wrote:
> > Fix a possible host panic, due to an unexpected #NM, when a KVM guest
> > is using AMX features.
> >
> > The guest's XFD value, which is stored in fpstate->xfd, is used for both
> > guest execution and host XSAVE operations.
>
> This already sounds weird. Why?

Because the state of disabled components is undefined anyway. There's
no point in making all host XSAVEs more expensive, even when the TMM
registers aren't in use by the guest (which is going to be most of the
time, likely).

> Why don't we carry separate XFD copies - guest and host - which we use for the
> guest and the host, respectively?

That was exactly what I did in v1, but it's more code and less efficient too.

Paolo

>
> --
> Regards/Gruss,
>     Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
>

Re: [PATCH v2 0/4] x86, fpu/kvm: fix crash with AMX

Posted by Sean Christopherson 3 weeks, 1 day ago

On Thu, Jan 15, 2026, Paolo Bonzini wrote:
> Il gio 15 gen 2026, 13:22 Borislav Petkov <bp@alien8.de> ha scritto:
> >
> > On Thu, Jan 01, 2026 at 10:05:12AM +0100, Paolo Bonzini wrote:
> > > Fix a possible host panic, due to an unexpected #NM, when a KVM guest
> > > is using AMX features.
> > >
> > > The guest's XFD value, which is stored in fpstate->xfd, is used for both
> > > guest execution and host XSAVE operations.
> >
> > This already sounds weird. Why?
> 
> Because the state of disabled components is undefined anyway. There's
> no point in making all host XSAVEs more expensive, even when the TMM
> registers aren't in use by the guest (which is going to be most of the
> time, likely).
> 
> > Why don't we carry separate XFD copies - guest and host - which we use for the
> > guest and the host, respectively?
> 
> That was exactly what I did in v1, but it's more code and less efficient too.

And creates a weird ABI for KVM:

 : This also creates a nasty, subtle asymmetry in KVM's ABI.  Notably, the comment
 : above is wrong.  XSAVE does NOT run with fpstate->xfd, it runs with whatever
 : happens to be in hardware.  For non-guest tasks, fpstate->xfd is guaranteed to
 : be resident in hardware when save_fpregs_to_fpstate() runs, but for guest tasks,
 : it will usually be the _guest's_ value.  So in the common case, KVM_GET_XSAVE2
 : would not return the same data set by KVM_SET_XSAVE.
 : 
 : In theory we could ensure KVM saved exactly what is resident in hardware, but
 : that's quite tricky (and costly!) as it would require doing xfd_update_state()
 : before _every_ save_fpregs_to_fpstate(), e.g. not just in fpu_swap_kvm_fpstate().
 : E.g. if the host kernel used the FPU from IRQ context (spoiler alert!), then KVM
 : wouldn't have a chance to swap in the maximal XFD[18]=0 value (i.e. the userspace
 : task's XFD).

And IMO papered over the true bug, which is that the xstate snapshot can become
inconsistent relative to KVM's tracking of guest XFD:

 : Lastly, the fix is effectively papering over another bug, which I'm pretty sure
 : is the underlying issue that was originally encountered.  Assuming QEMU doesn't
 : intercept MSR_IA32_XFD for its own purposes, the only sequence I've come up with
 : that would result in KVM trying to load XTILE data with XFD[18]=1, without a
 : colluding userspace VMM (Paolo's selftest) is:
 : 
 :   1. vCPU loads non-init XTILE data without ever setting XFD to a non-zero value
 :      (KVM only disables XFD interception on writes with a non-zero value).
 :   2. Guest executes WRMSR(MSR_IA32_XFD) to set XFD[18] = 1
 :   3. VM-Exit due to the WRMSR
 :   4. Host IRQ arrives and triggers kernel_fpu_begin()
 :   5. save_fpregs_to_fpstate() saves guest FPU with XFD[18]=0
 :   6. fpu_update_guest_xfd() stuffs guest_fpu->fpstate->xfd = XFD[18]=1
 :   7. vcpu_enter_guest() attempts to load XTILE data with XFD[18]=1
 : 
 : Note!  There's no KVM_SET_XSAVE2 in the above, i.e. this doesn't require userspace
 : to trigger save/restore for live migration or whatever, the only timing condition
 : is the arrival of an IRQ that uses kernel FPU during the XFD 0=>1 VM-Exit.

https://lore.kernel.org/all/aVMEcaZD_SzKzRvr@google.com

Re: [PATCH v2 0/4] x86, fpu/kvm: fix crash with AMX

Posted by Borislav Petkov 3 weeks, 1 day ago

On Thu, Jan 15, 2026 at 08:39:51AM -0800, Sean Christopherson wrote:
>  :   1. vCPU loads non-init XTILE data without ever setting XFD to a non-zero value
>  :      (KVM only disables XFD interception on writes with a non-zero value).
>  :   2. Guest executes WRMSR(MSR_IA32_XFD) to set XFD[18] = 1
>  :   3. VM-Exit due to the WRMSR
>  :   4. Host IRQ arrives and triggers kernel_fpu_begin()
>  :   5. save_fpregs_to_fpstate() saves guest FPU with XFD[18]=0
>  :   6. fpu_update_guest_xfd() stuffs guest_fpu->fpstate->xfd = XFD[18]=1
>  :   7. vcpu_enter_guest() attempts to load XTILE data with XFD[18]=1

I don't know, maybe I'm missing an important aspect but if not, I'm wondering
how you folks are not seeing the big honking discrepancy here.

*Anything* poking in MSRs under the kernel's feet where the kernel doesn't
know about that poking, is bound to cause trouble. And this is no exception.

Step 5. above should use the updated XFD[18]=1. The guest just disabled that
state! Anything else is bonkers.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH v2 0/4] x86, fpu/kvm: fix crash with AMX

Posted by Sean Christopherson 3 weeks, 1 day ago

On Thu, Jan 15, 2026, Borislav Petkov wrote:
> On Thu, Jan 15, 2026 at 08:39:51AM -0800, Sean Christopherson wrote:
> >  :   1. vCPU loads non-init XTILE data without ever setting XFD to a non-zero value
> >  :      (KVM only disables XFD interception on writes with a non-zero value).
> >  :   2. Guest executes WRMSR(MSR_IA32_XFD) to set XFD[18] = 1
> >  :   3. VM-Exit due to the WRMSR
> >  :   4. Host IRQ arrives and triggers kernel_fpu_begin()
> >  :   5. save_fpregs_to_fpstate() saves guest FPU with XFD[18]=0
> >  :   6. fpu_update_guest_xfd() stuffs guest_fpu->fpstate->xfd = XFD[18]=1
> >  :   7. vcpu_enter_guest() attempts to load XTILE data with XFD[18]=1
> 
> I don't know, maybe I'm missing an important aspect but if not, I'm wondering
> how you folks are not seeing the big honking discrepancy here.
> 
> *Anything* poking in MSRs under the kernel's feet where the kernel doesn't
> know about that poking, is bound to cause trouble. And this is no exception.

KVM isn't poking the MSR, KVM is literally calling a kernel API, fpu_update_guest_xfd(),
to ask/tell the kernel to update the guest's XFD.  It's the FPU code that's buggy,
because it doesn't ensure the state _it_ saved _without KVM's knowledge_ is
consistent with new XFD.

> Step 5. above should use the updated XFD[18]=1. The guest just disabled that
> state! Anything else is bonkers.

As I explained in my previous reply, that's easier said than done:

  In theory we could ensure KVM saved exactly what is resident in hardware, but
  that's quite tricky (and costly!) as it would require doing xfd_update_state()
  before _every_ save_fpregs_to_fpstate(), e.g. not just in fpu_swap_kvm_fpstate().
  E.g. if the host kernel used the FPU from IRQ context (spoiler alert!), then KVM
  wouldn't have a chance to swap in the maximal XFD[18]=0 value (i.e. the userspace
  task's XFD).

Re: [PATCH v2 0/4] x86, fpu/kvm: fix crash with AMX

Posted by Borislav Petkov 3 weeks ago

On Thu, Jan 01, 2026 at 10:05:12AM +0100, Paolo Bonzini wrote:
> Tested on a Sapphire Rapids machine, reviews and acks are welcome so
> that I can submit it to Linus via the KVM tree.

So I wanted to give this a thorough review after yesterday's discussion and
tried to apply the patch but it wouldn't apply. So I took a look at the code
it touches just to find out that the patch is already in Linus' tree!

Why?

Can you folks please explain to me how is this the process we've all agreed
upon?

Where does it say that people should sneak patches behind the maintainers'
backs without even getting an Ack from them?

By that logic, we can just as well sneak KVM patches behind your back and
you're supposed to be fine with it. Right?

Or should we try to adhere to the development rules we all have agreed upon
and work together in a fair and correct way?

I'd probably vote for latter, after we all sit down and agree upon something.

What I don't want is sneaking patches behind our backs and I'm sure you won't
like this either so let's please stop this.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH v2 0/4] x86, fpu/kvm: fix crash with AMX

Posted by Paolo Bonzini 2 weeks, 2 days ago

Il ven 16 gen 2026, 13:23 Borislav Petkov <bp@alien8.de> ha scritto:
>
> On Thu, Jan 01, 2026 at 10:05:12AM +0100, Paolo Bonzini wrote:
> > Tested on a Sapphire Rapids machine, reviews and acks are welcome so
> > that I can submit it to Linus via the KVM tree.
>
> So I wanted to give this a thorough review after yesterday's discussion and
> tried to apply the patch but it wouldn't apply. So I took a look at the code
> it touches just to find out that the patch is already in Linus' tree!
>
> Why?
>
> Can you folks please explain to me how is this the process we've all agreed
> upon?

It's a fix for a host crash that literally adds a single AND to a
function that's called fpu_update_*guest*_xfd. The patch doesn't have
any effect unless KVM is in use, and on any task that isn't the task
currently in KVM_RUN (other than by not crashing the system). So,
because of the effect of the bug and the small size/impact of the
patch, and the fact that there are really just two approaches and both
had been discussed extensively on list, I accepted the small
possibility that the patches would be rejected and would have to be
reverted.

If I really wanted to sneak something in, I could have written this
patch entirely in arch/x86/kvm. It would be possible, though the code
would be worse and inefficient. Sean wouldn't have let me :) but
anyway that didn't even cross my mind of course, because sneaking
something past you guys wasn't something I had in mind either. In fact
I instead plan to make that impossible, by making fpregs_lock() not
public and reducing the API exposed to KVM. I certainly will not send
that change to Linus without acks, even though it would also affect
only KVM in practice.

> By that logic, we can just as well sneak KVM patches behind your back and
> you're supposed to be fine with it. Right?

I would be ok with a Cc and sending the patch to Linus after a couple
weeks, yes, for a patch of similarly small and well-defined impact.
For example I didn't have a problem when commit b1e1296d7c6a ("kvm:
explicitly set FOLL_HONOR_NUMA_FAULT in hva_to_pfn_slow()",
2023-08-21) was sent without my ack.

Paolo

>
> Or should we try to adhere to the development rules we all have agreed upon
> and work together in a fair and correct way?
>
> I'd probably vote for latter, after we all sit down and agree upon something.
>
> What I don't want is sneaking patches behind our backs and I'm sure you won't
> like this either so let's please stop this.

Re: [PATCH v2 0/4] x86, fpu/kvm: fix crash with AMX

Posted by Borislav Petkov 2 weeks, 1 day ago

On Wed, Jan 21, 2026 at 12:35:50PM +0100, Paolo Bonzini wrote:
> It's a fix for a host crash that literally adds a single AND to a
> function that's called fpu_update_*guest*_xfd. The patch doesn't have
> any effect unless KVM is in use,

No Paolo, *exactly* *because* arch/x86/ and KVM are so closely intertwined in
some areas, we should sync on changes there. And judging by our questions
on this thread, one of the aspects were whether the handling of the guest
state is adequate enough. And if it is not then we have to rethink it and
accomodate it.

What we definitely should NOT do is solo efforts without even an ACK.

We've had this before with the X86_FEATURE gunk and we're back at it with the
FPU.

> and on any task that isn't the task currently in KVM_RUN (other than by not
> crashing the system). So, because of the effect of the bug and the small
> size/impact of the patch, and the fact that there are really just two
> approaches and both had been discussed extensively on list,

Not by us.

> I accepted the small possibility that the patches would be rejected and
> would have to be reverted.

And all that smoke and effort just because you can't simply wait for us to
take a look. And what happened? We agreed and it is all good.

So what was all that rush all about?

> If I really wanted to sneak something in, I could have written this
> patch entirely in arch/x86/kvm. It would be possible, though the code
> would be worse and inefficient. Sean wouldn't have let me :) but

In my experience, syncing stuff with Sean who takes what and giving each other
immutable branches to use, works wonderfully. Why can't we simply stick to
that workflow?

> anyway that didn't even cross my mind of course, because sneaking
> something past you guys wasn't something I had in mind either. In fact
> I instead plan to make that impossible, by making fpregs_lock() not
> public and reducing the API exposed to KVM. I certainly will not send
> that change to Linus without acks, even though it would also affect
> only KVM in practice.

So how about we do only that from now on?

> I would be ok with a Cc and sending the patch to Linus after a couple
> weeks, yes, for a patch of similarly small and well-defined impact.
> For example I didn't have a problem when commit b1e1296d7c6a ("kvm:
> explicitly set FOLL_HONOR_NUMA_FAULT in hva_to_pfn_slow()",
> 2023-08-21) was sent without my ack.

It happens. We talked with akpm recently and we'll separate the
responsibilities much better and by the looks of it, it is already much better
this way. I'd suggest you try the same.

What is really annoying and counter-productive are the unsynchronized solo
efforts so let's not do those please.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH v2 0/4] x86, fpu/kvm: fix crash with AMX

Posted by Paolo Bonzini 2 weeks, 1 day ago

On Thu, Jan 22, 2026 at 12:13 PM Borislav Petkov <bp@alien8.de> wrote:
>
> On Wed, Jan 21, 2026 at 12:35:50PM +0100, Paolo Bonzini wrote:
> > It's a fix for a host crash that literally adds a single AND to a
> > function that's called fpu_update_*guest*_xfd. The patch doesn't have
> > any effect unless KVM is in use,
>
> No Paolo, *exactly* *because* arch/x86/ and KVM are so closely intertwined in
> some areas, we should sync on changes there. And judging by our questions
> on this thread, one of the aspects were whether the handling of the guest
> state is adequate enough. And if it is not then we have to rethink it and
> accomodate it.
>
> What we definitely should NOT do is solo efforts without even an ACK.

I agree - as I wrote below, I judged that this was _not_ solo
considering that (while not including any x86 maintainers) there were
multiple people intervening and building on each other's analysis.
Yes, there was no x86 maintainer, I obviously knew that, but my
judgment call was that all these people together had looked at the
code more than it deserved. In the previous mail I said the
probability of a disagreement was small, it was even practically
nonexistent.

I don't think you can say that this is routine, for example in commit
eb4441864e03 ("KVM: SEV: sync FPU and AVX state at LAUNCH_UPDATE_VMSA
time", 2024-04-11) I explicitly sought an ack for just an
EXPORT_SYMBOL change. Knowing that x86 maintainers want to tightly
control the API boundary of arch/x86/kernel/fpu, I considered that to
require the attention of you guys *even more* than a code change!

> We've had this before with the X86_FEATURE gunk and we're back at it with the
> FPU.

I agree that causing conflicts on X86_FEATURE (years ago?) was a
mistake, that said I don't think it's a great example. I still see
occasional changes to cpufeatures.h go in via Sean without ack---and
in fact I check them explicitly when I get his pull requests and look
at what tip is doing with cpufeatures.h in the same merge window. :)

> > If I really wanted to sneak something in, I could have written this
> > patch entirely in arch/x86/kvm. It would be possible, though the code
> > would be worse and inefficient. Sean wouldn't have let me :) but
>
> In my experience, syncing stuff with Sean who takes what and giving each other
> immutable branches to use, works wonderfully. Why can't we simply stick to
> that workflow?

I think it's a perfectly fine workflow across releases i.e. to prepare
for the merge window; points of contact for -rc patches are rare and
using branches to sync is unlikely to be necessary.

I appreciate a lot the support that Thomas and other arch/x86/ people
put in to help Linux run well and without hacks as a hypervisor. At
the same time I think it's fine for both sides to acknowledge that in
extremely rare cases the lines can be blurred. So rare that I cannot
think of another case in the past and it's no problem for me to say
"never again", but then it would be like saying that the Earth is
spherical...

Paolo

Re: [PATCH v2 0/4] x86, fpu/kvm: fix crash with AMX

Posted by Borislav Petkov 2 weeks ago

On Thu, Jan 22, 2026 at 01:00:27PM +0100, Paolo Bonzini wrote:
> I agree - as I wrote below, I judged that this was _not_ solo
> considering that (while not including any x86 maintainers) there were
> multiple people intervening and building on each other's analysis.
> Yes, there was no x86 maintainer, I obviously knew that, but my
> judgment call was that all these people together had looked at the
> code more than it deserved. In the previous mail I said the
> probability of a disagreement was small, it was even practically
> nonexistent.

This all doesn't matter. Because when it comes down to cleaning up the mess
people have left behind, it is always we who end up mopping after everyone.
And everyone esle skedaddles into their next feature enablement.

And I do appreciate more than anyone when people make an effort to review
patches. You still need a maintainer ack though.

And it is not hard - you just need to ping us/send us a private mail even call
us if you want. :-)

> I don't think you can say that this is routine, for example in commit
> eb4441864e03 ("KVM: SEV: sync FPU and AVX state at LAUNCH_UPDATE_VMSA
> time", 2024-04-11) I explicitly sought an ack for just an
> EXPORT_SYMBOL change. Knowing that x86 maintainers want to tightly
> control the API boundary of arch/x86/kernel/fpu, I considered that to
> require the attention of you guys *even more* than a code change!

Much appreciated, this is how it should always work. So let's make that the
default workflow please.

> I appreciate a lot the support that Thomas and other arch/x86/ people
> put in to help Linux run well and without hacks as a hypervisor. At
> the same time I think it's fine for both sides to acknowledge that in
> extremely rare cases the lines can be blurred.

If we don't reply for a week or so, sure. But if you really need an x86
maintainer ack, I'm sure you'll get one in time.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette