[PATCH 0/2] KVM: SVM: Fix missing LAPIC TPR sync into VMCB::V_TPR with AVIC on

Maciej S. Szmigiero posted 2 patches 1 month, 2 weeks ago
There is a newer version of this series
arch/x86/kvm/svm/avic.c                       |  23 ++
.../testing/selftests/kvm/include/x86/apic.h  |   5 +
.../selftests/kvm/x86/xapic_state_test.c      | 265 +++++++++++++++++-
3 files changed, 290 insertions(+), 3 deletions(-)
[PATCH 0/2] KVM: SVM: Fix missing LAPIC TPR sync into VMCB::V_TPR with AVIC on
Posted by Maciej S. Szmigiero 1 month, 2 weeks ago
From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>

When AVIC is enabled the normal pre-VMRUN LAPIC TPR to VMCB::V_TPR sync in
sync_lapic_to_cr8() is inhibited so any changed TPR in the LAPIC state would
*not* get copied into the V_TPR field of VMCB.

AVIC does sync between these two fields, however it does so only on
explicit guest writes to one of these fields, not on a bare VMRUN.

This is especially true when it is the userspace setting LAPIC state via
KVM_SET_LAPIC ioctl() since userspace does not have access to the guest
VMCB.

Practice shows that it is the V_TPR that is actually used by the AVIC to
decide whether to issue pending interrupts to the CPU (not TPR in TASKPRI),
so any leftover value in V_TPR will cause serious interrupt delivery issues
in the guest when AVIC is enabled.

Fix this issue by explicitly copying LAPIC TPR to VMCB::V_TPR in
avic_apicv_post_state_restore(), which gets called from KVM_SET_LAPIC and
similar code paths when AVIC is enabled.

Add also a relevant set of tests to xapic_state_test so hopefully
we'll be protected against getting such regressions in the future.


Yes, this breaks real guests when AVIC is enabled.
Specifically, the one OS that sometimes needs different handling and its
name begins with letter 'W'.


  KVM: SVM: Sync TPR from LAPIC into VMCB::V_TPR when setting LAPIC regs
  KVM: selftests: Test TPR / CR8 sync and interrupt masking

 arch/x86/kvm/svm/avic.c                       |  23 ++
 .../testing/selftests/kvm/include/x86/apic.h  |   5 +
 .../selftests/kvm/x86/xapic_state_test.c      | 265 +++++++++++++++++-
 3 files changed, 290 insertions(+), 3 deletions(-)
Re: [PATCH 0/2] KVM: SVM: Fix missing LAPIC TPR sync into VMCB::V_TPR with AVIC on
Posted by Naveen N Rao 1 month, 1 week ago
On Tue, Aug 19, 2025 at 03:32:13PM +0200, Maciej S. Szmigiero wrote:
> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
> 
> When AVIC is enabled the normal pre-VMRUN LAPIC TPR to VMCB::V_TPR sync in
> sync_lapic_to_cr8() is inhibited so any changed TPR in the LAPIC state would
> *not* get copied into the V_TPR field of VMCB.
> 
> AVIC does sync between these two fields, however it does so only on
> explicit guest writes to one of these fields, not on a bare VMRUN.
> 
> This is especially true when it is the userspace setting LAPIC state via
> KVM_SET_LAPIC ioctl() since userspace does not have access to the guest
> VMCB.

Dumb question: why is the VMM updating TPR? Is this related to live 
migration or such?

I think I do see the problem described here, but when AVIC is 
temporarily inhibited. So, trying to understand if there are other flows 
involving the VMM where TPR could be updated outside of the guest.

> 
> Practice shows that it is the V_TPR that is actually used by the AVIC to
> decide whether to issue pending interrupts to the CPU (not TPR in TASKPRI),
> so any leftover value in V_TPR will cause serious interrupt delivery issues
> in the guest when AVIC is enabled.
> 
> Fix this issue by explicitly copying LAPIC TPR to VMCB::V_TPR in
> avic_apicv_post_state_restore(), which gets called from KVM_SET_LAPIC and
> similar code paths when AVIC is enabled.
> 
> Add also a relevant set of tests to xapic_state_test so hopefully
> we'll be protected against getting such regressions in the future.

Do the new tests reproduce this issue?

> 
> 
> Yes, this breaks real guests when AVIC is enabled.
> Specifically, the one OS that sometimes needs different handling and its
> name begins with letter 'W'.

Indeed, Linux does not use TPR AFAIK.


- Naveen
Re: [PATCH 0/2] KVM: SVM: Fix missing LAPIC TPR sync into VMCB::V_TPR with AVIC on
Posted by Maciej S. Szmigiero 1 month, 1 week ago
On 21.08.2025 10:18, Naveen N Rao wrote:
> On Tue, Aug 19, 2025 at 03:32:13PM +0200, Maciej S. Szmigiero wrote:
>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>>
>> When AVIC is enabled the normal pre-VMRUN LAPIC TPR to VMCB::V_TPR sync in
>> sync_lapic_to_cr8() is inhibited so any changed TPR in the LAPIC state would
>> *not* get copied into the V_TPR field of VMCB.
>>
>> AVIC does sync between these two fields, however it does so only on
>> explicit guest writes to one of these fields, not on a bare VMRUN.
>>
>> This is especially true when it is the userspace setting LAPIC state via
>> KVM_SET_LAPIC ioctl() since userspace does not have access to the guest
>> VMCB.
> 
> Dumb question: why is the VMM updating TPR? Is this related to live
> migration or such?

In this case, VMM is resetting LAPIC state on machine reset.

> I think I do see the problem described here, but when AVIC is
> temporarily inhibited. So, trying to understand if there are other flows
> involving the VMM where TPR could be updated outside of the guest.
> 
>>
>> Practice shows that it is the V_TPR that is actually used by the AVIC to
>> decide whether to issue pending interrupts to the CPU (not TPR in TASKPRI),
>> so any leftover value in V_TPR will cause serious interrupt delivery issues
>> in the guest when AVIC is enabled.
>>
>> Fix this issue by explicitly copying LAPIC TPR to VMCB::V_TPR in
>> avic_apicv_post_state_restore(), which gets called from KVM_SET_LAPIC and
>> similar code paths when AVIC is enabled.
>>
>> Add also a relevant set of tests to xapic_state_test so hopefully
>> we'll be protected against getting such regressions in the future.
> 
> Do the new tests reproduce this issue?

Yes, and also check quite a bit more of TPR-related functionality.

>>
>>
>> Yes, this breaks real guests when AVIC is enabled.
>> Specifically, the one OS that sometimes needs different handling and its
>> name begins with letter 'W'.
> 
> Indeed, Linux does not use TPR AFAIK.
> 
> 
> - Naveen
> 

Thanks,
Maciej
Re: [PATCH 0/2] KVM: SVM: Fix missing LAPIC TPR sync into VMCB::V_TPR with AVIC on
Posted by Alejandro Jimenez 1 month, 1 week ago

On 8/21/25 7:42 AM, Maciej S. Szmigiero wrote:
> On 21.08.2025 10:18, Naveen N Rao wrote:
>> On Tue, Aug 19, 2025 at 03:32:13PM +0200, Maciej S. Szmigiero wrote:
>>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>>>
>>> When AVIC is enabled the normal pre-VMRUN LAPIC TPR to VMCB::V_TPR 
>>> sync in
>>> sync_lapic_to_cr8() is inhibited so any changed TPR in the LAPIC 
>>> state would
>>> *not* get copied into the V_TPR field of VMCB.
>>>
>>> AVIC does sync between these two fields, however it does so only on
>>> explicit guest writes to one of these fields, not on a bare VMRUN.
>>>
>>> This is especially true when it is the userspace setting LAPIC state via
>>> KVM_SET_LAPIC ioctl() since userspace does not have access to the guest
>>> VMCB.
>>
>> Dumb question: why is the VMM updating TPR? Is this related to live
>> migration or such?
> 
> In this case, VMM is resetting LAPIC state on machine reset.
> 
>> I think I do see the problem described here, but when AVIC is
>> temporarily inhibited. So, trying to understand if there are other flows
>> involving the VMM where TPR could be updated outside of the guest.
>>
>>>
>>> Practice shows that it is the V_TPR that is actually used by the AVIC to
>>> decide whether to issue pending interrupts to the CPU (not TPR in 
>>> TASKPRI),
>>> so any leftover value in V_TPR will cause serious interrupt delivery 
>>> issues
>>> in the guest when AVIC is enabled.
>>>
>>> Fix this issue by explicitly copying LAPIC TPR to VMCB::V_TPR in
>>> avic_apicv_post_state_restore(), which gets called from KVM_SET_LAPIC 
>>> and
>>> similar code paths when AVIC is enabled.
>>>
>>> Add also a relevant set of tests to xapic_state_test so hopefully
>>> we'll be protected against getting such regressions in the future.
>>
>> Do the new tests reproduce this issue?
> 
> Yes, and also check quite a bit more of TPR-related functionality.
> 
>>>
>>>
>>> Yes, this breaks real guests when AVIC is enabled.
>>> Specifically, the one OS that sometimes needs different handling and its
>>> name begins with letter 'W'.
>>
>> Indeed, Linux does not use TPR AFAIK.
>>
>>

I believe it does, during the local APIC initialization. When Maciej 
determined the root cause of this issue, I was wondering why we have not 
seen it earlier in Linux. I found that Linux takes a defensive approach 
and drains all pending interrupts during lapic initialization. 
Essentially, for each CPU, Linux will:
- temporarily disable the Local APIC (via Spurious Int Vector Reg)
- set the TPR to accept all "regular" interrupts i.e. tpr=0x10
- drain all pending interrupts in ISR and/or IRR
- attempt the above draining step a max of 512 times
- then re-enable APIC and continue initialization

The relevant code is in setup_local_APIC()
https://elixir.bootlin.com/linux/v6.16/source/arch/x86/kernel/apic/apic.c#L1533-L1545

So without Maciej's proposed change, other OSs that are not as resilient 
could also be affected by this issue.

Alejandro

>> - Naveen
>>
> 
> Thanks,
> Maciej
> 
>
Re: [PATCH 0/2] KVM: SVM: Fix missing LAPIC TPR sync into VMCB::V_TPR with AVIC on
Posted by Sean Christopherson 1 month, 1 week ago
On Thu, Aug 21, 2025, Alejandro Jimenez wrote:
> On 8/21/25 7:42 AM, Maciej S. Szmigiero wrote:
> > On 21.08.2025 10:18, Naveen N Rao wrote:
> > > > Yes, this breaks real guests when AVIC is enabled.
> > > > Specifically, the one OS that sometimes needs different handling and its
> > > > name begins with letter 'W'.
> > > 
> > > Indeed, Linux does not use TPR AFAIK.
> 
> I believe it does, 

Heh, yes, Linux technically "uses" the TPR in that it does a one-time write to
it.  But what Naveen really meant is that Linux doesn't actively use TPR to
manage what IRQs are masked/allowed, whereas Windows heavily uses TPR to do
exactly that.  Specifically, what matters is that Linux doesn't use TPR to _mask_
IRQs, and so clobbering it to '0' on migration is largely benign.

> during the local APIC initialization. When Maciej
> determined the root cause of this issue, I was wondering why we have not
> seen it earlier in Linux. I found that Linux takes a defensive approach and
> drains all pending interrupts during lapic initialization. Essentially, for
> each CPU, Linux will:
> - temporarily disable the Local APIC (via Spurious Int Vector Reg)
> - set the TPR to accept all "regular" interrupts i.e. tpr=0x10
> - drain all pending interrupts in ISR and/or IRR
> - attempt the above draining step a max of 512 times
> - then re-enable APIC and continue initialization
> 
> The relevant code is in setup_local_APIC()
> https://elixir.bootlin.com/linux/v6.16/source/arch/x86/kernel/apic/apic.c#L1533-L1545
> 
> So without Maciej's proposed change, other OSs that are not as resilient
> could also be affected by this issue.
> 
> Alejandro
> 
> > > - Naveen
> > > 
> > 
> > Thanks,
> > Maciej
> > 
> > 
>