[PATCH 1/2] KVM: SVM: Sync TPR from LAPIC into VMCB::V_TPR when setting LAPIC regs

Maciej S. Szmigiero posted 2 patches 1 month, 2 weeks ago
There is a newer version of this series
[PATCH 1/2] KVM: SVM: Sync TPR from LAPIC into VMCB::V_TPR when setting LAPIC regs
Posted by Maciej S. Szmigiero 1 month, 2 weeks ago
From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>

When AVIC is enabled the normal pre-VMRUN sync in sync_lapic_to_cr8() is
inhibited so any changed TPR in the LAPIC state would not get copied into
the V_TPR field of VMCB.

AVIC does sync between these two fields, however it does so only on
explicit guest writes to one of these fields, not on a bare VMRUN.

This is especially true when it is the userspace setting LAPIC state via
KVM_SET_LAPIC ioctl() since userspace does not have access to the guest
VMCB.

Practice shows that it is the V_TPR that is actually used by the AVIC to
decide whether to issue pending interrupts to the CPU (not TPR in TASKPRI),
so any leftover value in V_TPR will cause serious interrupt delivery issues
in the guest when AVIC is enabled.

Fix this issue by explicitly copying LAPIC TPR to VMCB::V_TPR in
avic_apicv_post_state_restore(), which gets called from KVM_SET_LAPIC and
similar code paths when AVIC is enabled.

Fixes: 3bbf3565f48c ("svm: Do not intercept CR8 when enable AVIC")
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
---
 arch/x86/kvm/svm/avic.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index a34c5c3b164e..877bc3db2c6e 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -725,8 +725,31 @@ int avic_init_vcpu(struct vcpu_svm *svm)
 
 void avic_apicv_post_state_restore(struct kvm_vcpu *vcpu)
 {
+	struct vcpu_svm *svm = to_svm(vcpu);
+	u64 cr8;
+
 	avic_handle_dfr_update(vcpu);
 	avic_handle_ldr_update(vcpu);
+
+	/* Running nested should have inhibited AVIC. */
+	if (WARN_ON_ONCE(nested_svm_virtualize_tpr(vcpu)))
+		return;
+
+	/*
+	 * Sync TPR from LAPIC TASKPRI into V_TPR field of the VMCB.
+	 *
+	 * When AVIC is enabled the normal pre-VMRUN sync in sync_lapic_to_cr8()
+	 * is inhibited so any set TPR LAPIC state would not get reflected
+	 * in V_TPR.
+	 *
+	 * Practice shows that it is the V_TPR that is actually used by the
+	 * AVIC to decide whether to issue pending interrupts to the CPU, not
+	 * TPR in TASKPRI.
+	 */
+	cr8 = kvm_get_cr8(vcpu);
+	svm->vmcb->control.int_ctl &= ~V_TPR_MASK;
+	svm->vmcb->control.int_ctl |= cr8 & V_TPR_MASK;
+	WARN_ON_ONCE(!vmcb_is_dirty(svm->vmcb, VMCB_INTR));
 }
 
 static void svm_ir_list_del(struct kvm_kernel_irqfd *irqfd)
Re: [PATCH 1/2] KVM: SVM: Sync TPR from LAPIC into VMCB::V_TPR when setting LAPIC regs
Posted by Sean Christopherson 1 month, 1 week ago
On Tue, Aug 19, 2025, Maciej S. Szmigiero wrote:
> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
> 
> When AVIC is enabled the normal pre-VMRUN sync in sync_lapic_to_cr8() is
> inhibited so any changed TPR in the LAPIC state would not get copied into
> the V_TPR field of VMCB.
> 
> AVIC does sync between these two fields, however it does so only on
> explicit guest writes to one of these fields, not on a bare VMRUN.
> 
> This is especially true when it is the userspace setting LAPIC state via
> KVM_SET_LAPIC ioctl() since userspace does not have access to the guest
> VMCB.
> 
> Practice shows that it is the V_TPR that is actually used by the AVIC to
> decide whether to issue pending interrupts to the CPU (not TPR in TASKPRI),
> so any leftover value in V_TPR will cause serious interrupt delivery issues
> in the guest when AVIC is enabled.
> 
> Fix this issue by explicitly copying LAPIC TPR to VMCB::V_TPR in
> avic_apicv_post_state_restore(), which gets called from KVM_SET_LAPIC and
> similar code paths when AVIC is enabled.
> 
> Fixes: 3bbf3565f48c ("svm: Do not intercept CR8 when enable AVIC")
> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
> ---
>  arch/x86/kvm/svm/avic.c | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index a34c5c3b164e..877bc3db2c6e 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -725,8 +725,31 @@ int avic_init_vcpu(struct vcpu_svm *svm)
>  
>  void avic_apicv_post_state_restore(struct kvm_vcpu *vcpu)
>  {
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +	u64 cr8;
> +
>  	avic_handle_dfr_update(vcpu);
>  	avic_handle_ldr_update(vcpu);
> +
> +	/* Running nested should have inhibited AVIC. */
> +	if (WARN_ON_ONCE(nested_svm_virtualize_tpr(vcpu)))
> +		return;


> +
> +	/*
> +	 * Sync TPR from LAPIC TASKPRI into V_TPR field of the VMCB.
> +	 *
> +	 * When AVIC is enabled the normal pre-VMRUN sync in sync_lapic_to_cr8()
> +	 * is inhibited so any set TPR LAPIC state would not get reflected
> +	 * in V_TPR.

Hmm, I think that code is straight up wrong.  There's no justification, just a
claim:

  commit 3bbf3565f48ce3999b5a12cde946f81bd4475312
  Author:     Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
  AuthorDate: Wed May 4 14:09:51 2016 -0500
  Commit:     Paolo Bonzini <pbonzini@redhat.com>
  CommitDate: Wed May 18 18:04:31 2016 +0200

    svm: Do not intercept CR8 when enable AVIC
    
    When enable AVIC:
        * Do not intercept CR8 since this should be handled by AVIC HW.
        * Also, we don't need to sync cr8/V_TPR and APIC backing page.   <======
    
    Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
    [Rename svm_in_nested_interrupt_shadow to svm_nested_virtualize_tpr. - Paolo]
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

That claim assumes APIC[TPR] will _never_ be modified by anything other than
hardware.  That's obviously false for state restore from userspace, and it's also
technically false at steady state, e.g. if KVM managed to trigger emulation of a
store to the APIC page, then KVM would bypass the automatic harware sync.

There's also the comically ancient KVM_SET_VAPIC_ADDR, which AFAICT appears to
be largely dead code with respect to vTPR (nothing sets KVM_APIC_CHECK_VAPIC
except for the initial ioctl), but could again set APIC[TPR] without updating
V_TPR.

So, rather than manually do the update during state restore, my vote is to restore
the sync logic.  And if we want to optimize that code (seems unnecessary), then
we should hook all TPR writes.

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d9931c6c4bc6..1bfebe40854f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4046,8 +4046,7 @@ static inline void sync_lapic_to_cr8(struct kvm_vcpu *vcpu)
        struct vcpu_svm *svm = to_svm(vcpu);
        u64 cr8;
 
-       if (nested_svm_virtualize_tpr(vcpu) ||
-           kvm_vcpu_apicv_active(vcpu))
+       if (nested_svm_virtualize_tpr(vcpu))
                return;
 
        cr8 = kvm_get_cr8(vcpu);


> +	 *
> +	 * Practice shows that it is the V_TPR that is actually used by the
> +	 * AVIC to decide whether to issue pending interrupts to the CPU, not
> +	 * TPR in TASKPRI.

FWIW, the APM kinda sorta alludes to this:

  Enabling AVIC also affects CR8 behavior independent of V_INTR_MASKING enable
  (bit 24): writes to CR8 affect the V_TPR and update the backing page and reads
  from CR8 return V_TPR.


> +	 */
> +	cr8 = kvm_get_cr8(vcpu);
> +	svm->vmcb->control.int_ctl &= ~V_TPR_MASK;
> +	svm->vmcb->control.int_ctl |= cr8 & V_TPR_MASK;
> +	WARN_ON_ONCE(!vmcb_is_dirty(svm->vmcb, VMCB_INTR));


>  }
>  
>  static void svm_ir_list_del(struct kvm_kernel_irqfd *irqfd)
Re: [PATCH 1/2] KVM: SVM: Sync TPR from LAPIC into VMCB::V_TPR when setting LAPIC regs
Posted by Maciej S. Szmigiero 1 month, 1 week ago
On 21.08.2025 22:38, Sean Christopherson wrote:
> On Tue, Aug 19, 2025, Maciej S. Szmigiero wrote:
>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>>
>> When AVIC is enabled the normal pre-VMRUN sync in sync_lapic_to_cr8() is
>> inhibited so any changed TPR in the LAPIC state would not get copied into
>> the V_TPR field of VMCB.
>>
>> AVIC does sync between these two fields, however it does so only on
>> explicit guest writes to one of these fields, not on a bare VMRUN.
>>
>> This is especially true when it is the userspace setting LAPIC state via
>> KVM_SET_LAPIC ioctl() since userspace does not have access to the guest
>> VMCB.
>>
>> Practice shows that it is the V_TPR that is actually used by the AVIC to
>> decide whether to issue pending interrupts to the CPU (not TPR in TASKPRI),
>> so any leftover value in V_TPR will cause serious interrupt delivery issues
>> in the guest when AVIC is enabled.
>>
>> Fix this issue by explicitly copying LAPIC TPR to VMCB::V_TPR in
>> avic_apicv_post_state_restore(), which gets called from KVM_SET_LAPIC and
>> similar code paths when AVIC is enabled.
>>
>> Fixes: 3bbf3565f48c ("svm: Do not intercept CR8 when enable AVIC")
>> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
>> ---
>>   arch/x86/kvm/svm/avic.c | 23 +++++++++++++++++++++++
>>   1 file changed, 23 insertions(+)
>>
>> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
>> index a34c5c3b164e..877bc3db2c6e 100644
>> --- a/arch/x86/kvm/svm/avic.c
>> +++ b/arch/x86/kvm/svm/avic.c
>> @@ -725,8 +725,31 @@ int avic_init_vcpu(struct vcpu_svm *svm)
>>   
>>   void avic_apicv_post_state_restore(struct kvm_vcpu *vcpu)
>>   {
>> +	struct vcpu_svm *svm = to_svm(vcpu);
>> +	u64 cr8;
>> +
>>   	avic_handle_dfr_update(vcpu);
>>   	avic_handle_ldr_update(vcpu);
>> +
>> +	/* Running nested should have inhibited AVIC. */
>> +	if (WARN_ON_ONCE(nested_svm_virtualize_tpr(vcpu)))
>> +		return;
> 
> 
>> +
>> +	/*
>> +	 * Sync TPR from LAPIC TASKPRI into V_TPR field of the VMCB.
>> +	 *
>> +	 * When AVIC is enabled the normal pre-VMRUN sync in sync_lapic_to_cr8()
>> +	 * is inhibited so any set TPR LAPIC state would not get reflected
>> +	 * in V_TPR.
> 
> Hmm, I think that code is straight up wrong.  There's no justification, just a
> claim:
> 
>    commit 3bbf3565f48ce3999b5a12cde946f81bd4475312
>    Author:     Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
>    AuthorDate: Wed May 4 14:09:51 2016 -0500
>    Commit:     Paolo Bonzini <pbonzini@redhat.com>
>    CommitDate: Wed May 18 18:04:31 2016 +0200
> 
>      svm: Do not intercept CR8 when enable AVIC
>      
>      When enable AVIC:
>          * Do not intercept CR8 since this should be handled by AVIC HW.
>          * Also, we don't need to sync cr8/V_TPR and APIC backing page.   <======
>      
>      Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>      [Rename svm_in_nested_interrupt_shadow to svm_nested_virtualize_tpr. - Paolo]
>      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> 
> That claim assumes APIC[TPR] will _never_ be modified by anything other than
> hardware.  That's obviously false for state restore from userspace, and it's also
> technically false at steady state, e.g. if KVM managed to trigger emulation of a
> store to the APIC page, then KVM would bypass the automatic harware sync.
> 
> There's also the comically ancient KVM_SET_VAPIC_ADDR, which AFAICT appears to
> be largely dead code with respect to vTPR (nothing sets KVM_APIC_CHECK_VAPIC
> except for the initial ioctl), but could again set APIC[TPR] without updating
> V_TPR.
> 
> So, rather than manually do the update during state restore, my vote is to restore
> the sync logic.  And if we want to optimize that code (seems unnecessary), then
> we should hook all TPR writes.
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index d9931c6c4bc6..1bfebe40854f 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4046,8 +4046,7 @@ static inline void sync_lapic_to_cr8(struct kvm_vcpu *vcpu)
>          struct vcpu_svm *svm = to_svm(vcpu);
>          u64 cr8;
>   
> -       if (nested_svm_virtualize_tpr(vcpu) ||
> -           kvm_vcpu_apicv_active(vcpu))
> +       if (nested_svm_virtualize_tpr(vcpu))
>                  return;
>   
>          cr8 = kvm_get_cr8(vcpu);
> 
> 

So you want to just do an unconditional LAPIC -> V_TPR sync at each VMRUN
and not try to patch every code flow where these possibly could get de-synced
to do such sync only on demand, correct?

By the way, the original Suravee's submission for the aforementioned patch
did *not* inhibit that sync when AVIC is on [1].

Something similar to this sync inhibit only showed in v4 [2],
probably upon Radim's comment on v3 [3] that:
> I think we can exit early with svm_vcpu_avic_enabled().

But the initial sync inhibit condition in v4 was essentially
nested_svm_virtualize_tpr() && svm_vcpu_avic_enabled(),
which suggests there was some confusion what was exactly meant
by the reviewer comment.

The final sync inhibit condition only showed in v5 [4].
No further discussion happened on that point.

Thanks,
Maciej

[1]: https://lore.kernel.org/kvm/1455285574-27892-9-git-send-email-suravee.suthikulpanit@amd.com/
[2]: https://lore.kernel.org/kvm/1460017232-17429-11-git-send-email-Suravee.Suthikulpanit@amd.com/
[3]: https://lore.kernel.org/kvm/20160318211048.GB26119@potion.brq.redhat.com/
[4]: https://lore.kernel.org/kvm/1462388992-25242-13-git-send-email-Suravee.Suthikulpanit@amd.com/
Re: [PATCH 1/2] KVM: SVM: Sync TPR from LAPIC into VMCB::V_TPR when setting LAPIC regs
Posted by Sean Christopherson 1 month, 1 week ago
On Sat, Aug 23, 2025, Maciej S. Szmigiero wrote:
> On 21.08.2025 22:38, Sean Christopherson wrote:
> > On Tue, Aug 19, 2025, Maciej S. Szmigiero wrote:
> > > +	/*
> > > +	 * Sync TPR from LAPIC TASKPRI into V_TPR field of the VMCB.
> > > +	 *
> > > +	 * When AVIC is enabled the normal pre-VMRUN sync in sync_lapic_to_cr8()
> > > +	 * is inhibited so any set TPR LAPIC state would not get reflected
> > > +	 * in V_TPR.
> > 
> > Hmm, I think that code is straight up wrong.  There's no justification, just a
> > claim:
> > 
> >    commit 3bbf3565f48ce3999b5a12cde946f81bd4475312
> >    Author:     Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
> >    AuthorDate: Wed May 4 14:09:51 2016 -0500
> >    Commit:     Paolo Bonzini <pbonzini@redhat.com>
> >    CommitDate: Wed May 18 18:04:31 2016 +0200
> > 
> >      svm: Do not intercept CR8 when enable AVIC
> >      When enable AVIC:
> >          * Do not intercept CR8 since this should be handled by AVIC HW.
> >          * Also, we don't need to sync cr8/V_TPR and APIC backing page.   <======
> >      Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> >      [Rename svm_in_nested_interrupt_shadow to svm_nested_virtualize_tpr. - Paolo]
> >      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > 
> > That claim assumes APIC[TPR] will _never_ be modified by anything other than
> > hardware.  That's obviously false for state restore from userspace, and it's also
> > technically false at steady state, e.g. if KVM managed to trigger emulation of a
> > store to the APIC page, then KVM would bypass the automatic harware sync.
> > 
> > There's also the comically ancient KVM_SET_VAPIC_ADDR, which AFAICT appears to
> > be largely dead code with respect to vTPR (nothing sets KVM_APIC_CHECK_VAPIC
> > except for the initial ioctl), but could again set APIC[TPR] without updating
> > V_TPR.
> > 
> > So, rather than manually do the update during state restore, my vote is to restore
> > the sync logic.  And if we want to optimize that code (seems unnecessary), then
> > we should hook all TPR writes.
> > 
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index d9931c6c4bc6..1bfebe40854f 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -4046,8 +4046,7 @@ static inline void sync_lapic_to_cr8(struct kvm_vcpu *vcpu)
> >          struct vcpu_svm *svm = to_svm(vcpu);
> >          u64 cr8;
> > -       if (nested_svm_virtualize_tpr(vcpu) ||
> > -           kvm_vcpu_apicv_active(vcpu))
> > +       if (nested_svm_virtualize_tpr(vcpu))
> >                  return;
> >          cr8 = kvm_get_cr8(vcpu);
> > 
> > 
> 
> So you want to just do an unconditional LAPIC -> V_TPR sync at each VMRUN
> and not try to patch every code flow where these possibly could get de-synced
> to do such sync only on demand, correct?

Yep.  For a fix, I definitely want to go with the bare minimum.  If we want to
optimize the sync, that can be done on top, and it can be done irrespective of
AVIC.  E.g. for guests that don't modify TPR, the sync is almost pure overhead
too.

> By the way, the original Suravee's submission for the aforementioned patch
> did *not* inhibit that sync when AVIC is on [1].
> 
> Something similar to this sync inhibit only showed in v4 [2],
> probably upon Radim's comment on v3 [3] that:
> > I think we can exit early with svm_vcpu_avic_enabled().
> 
> But the initial sync inhibit condition in v4 was essentially
> nested_svm_virtualize_tpr() && svm_vcpu_avic_enabled(),
> which suggests there was some confusion what was exactly meant
> by the reviewer comment.

Hmm, I suspect that was just a goof.  My guess is that Radim and Suravee simply
forgot to consider TPR writes that aren't handled by hardware.

> The final sync inhibit condition only showed in v5 [4].
> No further discussion happened on that point.
> 
> Thanks,
> Maciej
> 
> [1]: https://lore.kernel.org/kvm/1455285574-27892-9-git-send-email-suravee.suthikulpanit@amd.com/
> [2]: https://lore.kernel.org/kvm/1460017232-17429-11-git-send-email-Suravee.Suthikulpanit@amd.com/
> [3]: https://lore.kernel.org/kvm/20160318211048.GB26119@potion.brq.redhat.com/
> [4]: https://lore.kernel.org/kvm/1462388992-25242-13-git-send-email-Suravee.Suthikulpanit@amd.com/
>
Re: [PATCH 1/2] KVM: SVM: Sync TPR from LAPIC into VMCB::V_TPR when setting LAPIC regs
Posted by Naveen N Rao 1 month, 1 week ago
On Thu, Aug 21, 2025 at 01:38:17PM -0700, Sean Christopherson wrote:
> On Tue, Aug 19, 2025, Maciej S. Szmigiero wrote:
> > From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
> > 
> > When AVIC is enabled the normal pre-VMRUN sync in sync_lapic_to_cr8() is
> > inhibited so any changed TPR in the LAPIC state would not get copied into
> > the V_TPR field of VMCB.
> > 
> > AVIC does sync between these two fields, however it does so only on
> > explicit guest writes to one of these fields, not on a bare VMRUN.
> > 
> > This is especially true when it is the userspace setting LAPIC state via
> > KVM_SET_LAPIC ioctl() since userspace does not have access to the guest
> > VMCB.
> > 
> > Practice shows that it is the V_TPR that is actually used by the AVIC to
> > decide whether to issue pending interrupts to the CPU (not TPR in TASKPRI),
> > so any leftover value in V_TPR will cause serious interrupt delivery issues
> > in the guest when AVIC is enabled.
> > 
> > Fix this issue by explicitly copying LAPIC TPR to VMCB::V_TPR in
> > avic_apicv_post_state_restore(), which gets called from KVM_SET_LAPIC and
> > similar code paths when AVIC is enabled.
> > 
> > Fixes: 3bbf3565f48c ("svm: Do not intercept CR8 when enable AVIC")
> > Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
> > ---
> >  arch/x86/kvm/svm/avic.c | 23 +++++++++++++++++++++++
> >  1 file changed, 23 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > index a34c5c3b164e..877bc3db2c6e 100644
> > --- a/arch/x86/kvm/svm/avic.c
> > +++ b/arch/x86/kvm/svm/avic.c
> > @@ -725,8 +725,31 @@ int avic_init_vcpu(struct vcpu_svm *svm)
> >  
> >  void avic_apicv_post_state_restore(struct kvm_vcpu *vcpu)
> >  {
> > +	struct vcpu_svm *svm = to_svm(vcpu);
> > +	u64 cr8;
> > +
> >  	avic_handle_dfr_update(vcpu);
> >  	avic_handle_ldr_update(vcpu);
> > +
> > +	/* Running nested should have inhibited AVIC. */
> > +	if (WARN_ON_ONCE(nested_svm_virtualize_tpr(vcpu)))
> > +		return;
> 
> 
> > +
> > +	/*
> > +	 * Sync TPR from LAPIC TASKPRI into V_TPR field of the VMCB.
> > +	 *
> > +	 * When AVIC is enabled the normal pre-VMRUN sync in sync_lapic_to_cr8()
> > +	 * is inhibited so any set TPR LAPIC state would not get reflected
> > +	 * in V_TPR.
> 
> Hmm, I think that code is straight up wrong.  There's no justification, just a
> claim:
> 
>   commit 3bbf3565f48ce3999b5a12cde946f81bd4475312
>   Author:     Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
>   AuthorDate: Wed May 4 14:09:51 2016 -0500
>   Commit:     Paolo Bonzini <pbonzini@redhat.com>
>   CommitDate: Wed May 18 18:04:31 2016 +0200
> 
>     svm: Do not intercept CR8 when enable AVIC
>     
>     When enable AVIC:
>         * Do not intercept CR8 since this should be handled by AVIC HW.
>         * Also, we don't need to sync cr8/V_TPR and APIC backing page.   <======
>     
>     Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>     [Rename svm_in_nested_interrupt_shadow to svm_nested_virtualize_tpr. - Paolo]
>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> 
> That claim assumes APIC[TPR] will _never_ be modified by anything other than
> hardware. 

It also isn't clear to me why only sync_lapic_to_cr8() was gated when 
AVIC was enabled, while sync_cr8_to_lapic() continues to copy V_TRP to 
the backing page. If AVIC is enabled, then the AVIC hardware updates 
both the backing page and V_TPR on a guest write to TPR.

> That's obviously false for state restore from userspace, and it's also
> technically false at steady state, e.g. if KVM managed to trigger emulation of a
> store to the APIC page, then KVM would bypass the automatic harware sync.

Do you mean emulation due to AVIC being inhibited? I initially thought 
this could be a problem, but in this scenario, AVIC would be disabled on 
the next VMRUN, so we will end up sync'ing TPR from the lapic to V_TPR.

> 
> There's also the comically ancient KVM_SET_VAPIC_ADDR, which AFAICT appears to
> be largely dead code with respect to vTPR (nothing sets KVM_APIC_CHECK_VAPIC
> except for the initial ioctl), but could again set APIC[TPR] without updating
> V_TPR.
>
> So, rather than manually do the update during state restore, my vote 
> is to restore the sync logic.  And if we want to optimize that code 
> (seems unnecessary), then we should hook all TPR writes.

I guess you mean apic_set_tpr()? We will need to hook into that in 
addition to updating avic_apicv_post_state_restore() since KVM_SET_LAPIC 
just does a memcpy of the register state.

> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index d9931c6c4bc6..1bfebe40854f 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4046,8 +4046,7 @@ static inline void sync_lapic_to_cr8(struct kvm_vcpu *vcpu)
>         struct vcpu_svm *svm = to_svm(vcpu);
>         u64 cr8;
>  
> -       if (nested_svm_virtualize_tpr(vcpu) ||
> -           kvm_vcpu_apicv_active(vcpu))
> +       if (nested_svm_virtualize_tpr(vcpu))
>                 return;
>  
>         cr8 = kvm_get_cr8(vcpu);

I agree that this is a simpler fix, so would be good to do for backport 
ease.

The code in sync_lapic_to_cr8 ends up being a function call to 
kvm_get_cr8() and ~6 instructions, which isn't that much. But if we can 
gate sync'ing V_TPR to the backing page in sync_cr8_to_lapic() as well, 
then it might be good to do so.


- Naveen
Re: [PATCH 1/2] KVM: SVM: Sync TPR from LAPIC into VMCB::V_TPR when setting LAPIC regs
Posted by Sean Christopherson 1 month, 1 week ago
On Fri, Aug 22, 2025, Naveen N Rao wrote:
> On Thu, Aug 21, 2025 at 01:38:17PM -0700, Sean Christopherson wrote:
> > On Tue, Aug 19, 2025, Maciej S. Szmigiero wrote:
> > > +	/*
> > > +	 * Sync TPR from LAPIC TASKPRI into V_TPR field of the VMCB.
> > > +	 *
> > > +	 * When AVIC is enabled the normal pre-VMRUN sync in sync_lapic_to_cr8()
> > > +	 * is inhibited so any set TPR LAPIC state would not get reflected
> > > +	 * in V_TPR.
> > 
> > Hmm, I think that code is straight up wrong.  There's no justification, just a
> > claim:
> > 
> >   commit 3bbf3565f48ce3999b5a12cde946f81bd4475312
> >   Author:     Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
> >   AuthorDate: Wed May 4 14:09:51 2016 -0500
> >   Commit:     Paolo Bonzini <pbonzini@redhat.com>
> >   CommitDate: Wed May 18 18:04:31 2016 +0200
> > 
> >     svm: Do not intercept CR8 when enable AVIC
> >     
> >     When enable AVIC:
> >         * Do not intercept CR8 since this should be handled by AVIC HW.
> >         * Also, we don't need to sync cr8/V_TPR and APIC backing page.   <======
> >     
> >     Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> >     [Rename svm_in_nested_interrupt_shadow to svm_nested_virtualize_tpr. - Paolo]
> >     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > 
> > That claim assumes APIC[TPR] will _never_ be modified by anything other than
> > hardware. 
> 
> It also isn't clear to me why only sync_lapic_to_cr8() was gated when 
> AVIC was enabled, while sync_cr8_to_lapic() continues to copy V_TRP to 
> the backing page. If AVIC is enabled, then the AVIC hardware updates 
> both the backing page and V_TPR on a guest write to TPR.
> 
> > That's obviously false for state restore from userspace, and it's also
> > technically false at steady state, e.g. if KVM managed to trigger emulation of a
> > store to the APIC page, then KVM would bypass the automatic harware sync.
> 
> Do you mean emulation due to AVIC being inhibited? I initially thought 
> this could be a problem, but in this scenario, AVIC would be disabled on 
> the next VMRUN, so we will end up sync'ing TPR from the lapic to V_TPR.

No, if emulation is triggered when AVIC isn't inhibited.  E.g. a contrived but
entirely possible situation would be if MOVBE isn't supported in hardware, KVM
is emulating MOVBE for emulation, and the guest sets the TPR via MOVBE.  The MOVBE
#UDs, KVM emulates in response to the #UD, and Bob's your uncle.

There are other scenarios where KVM would emulate, though they're even more
contrived.

> > There's also the comically ancient KVM_SET_VAPIC_ADDR, which AFAICT appears to
> > be largely dead code with respect to vTPR (nothing sets KVM_APIC_CHECK_VAPIC
> > except for the initial ioctl), but could again set APIC[TPR] without updating
> > V_TPR.
> >
> > So, rather than manually do the update during state restore, my vote 
> > is to restore the sync logic.  And if we want to optimize that code 
> > (seems unnecessary), then we should hook all TPR writes.
> 
> I guess you mean apic_set_tpr()? 

Yep.

> We will need to hook into that in addition to updating
> avic_apicv_post_state_restore() since KVM_SET_LAPIC just does a memcpy of the
> register state.

Yeah, or explicitly call the hook, e.g. like kvm_apic_set_state() does for
hwapic_isr_update().  But I don't think we should add a hook unless someone
proves that unconditionally synchronizing before VMRUN affects performance.

> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index d9931c6c4bc6..1bfebe40854f 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -4046,8 +4046,7 @@ static inline void sync_lapic_to_cr8(struct kvm_vcpu *vcpu)
> >         struct vcpu_svm *svm = to_svm(vcpu);
> >         u64 cr8;
> >  
> > -       if (nested_svm_virtualize_tpr(vcpu) ||
> > -           kvm_vcpu_apicv_active(vcpu))
> > +       if (nested_svm_virtualize_tpr(vcpu))
> >                 return;
> >  
> >         cr8 = kvm_get_cr8(vcpu);
> 
> I agree that this is a simpler fix, so would be good to do for backport 
> ease.
> 
> The code in sync_lapic_to_cr8 ends up being a function call to 
> kvm_get_cr8() and ~6 instructions, which isn't that much. But if we can 
> gate sync'ing V_TPR to the backing page in sync_cr8_to_lapic() as well, 
> then it might be good to do so.
> 
> 
> - Naveen
>