[PATCH v3 20/62] KVM: SVM: Add a comment to explain why avic_vcpu_blocking() ignores IRQ blocking

Sean Christopherson posted 62 patches 4 months ago
[PATCH v3 20/62] KVM: SVM: Add a comment to explain why avic_vcpu_blocking() ignores IRQ blocking
Posted by Sean Christopherson 4 months ago
Add a comment to explain why KVM clears IsRunning when putting a vCPU,
even though leaving IsRunning=1 would be ok from a functional perspective.
Per Maxim's experiments, a misbehaving VM could spam the AVIC doorbell so
fast as to induce a 50%+ loss in performance.

Link: https://lore.kernel.org/all/8d7e0d0391df4efc7cb28557297eb2ec9904f1e5.camel@redhat.com
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/avic.c | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index bf8b59556373..3cf929ac117f 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -1121,19 +1121,24 @@ void avic_vcpu_blocking(struct kvm_vcpu *vcpu)
 	if (!kvm_vcpu_apicv_active(vcpu))
 		return;
 
-       /*
-        * Unload the AVIC when the vCPU is about to block, _before_
-        * the vCPU actually blocks.
-        *
-        * Any IRQs that arrive before IsRunning=0 will not cause an
-        * incomplete IPI vmexit on the source, therefore vIRR will also
-        * be checked by kvm_vcpu_check_block() before blocking.  The
-        * memory barrier implicit in set_current_state orders writing
-        * IsRunning=0 before reading the vIRR.  The processor needs a
-        * matching memory barrier on interrupt delivery between writing
-        * IRR and reading IsRunning; the lack of this barrier might be
-        * the cause of errata #1235).
-        */
+	/*
+	 * Unload the AVIC when the vCPU is about to block, _before_ the vCPU
+	 * actually blocks.
+	 *
+	 * Note, any IRQs that arrive before IsRunning=0 will not cause an
+	 * incomplete IPI vmexit on the source; kvm_vcpu_check_block() handles
+	 * this by checking vIRR one last time before blocking.  The memory
+	 * barrier implicit in set_current_state orders writing IsRunning=0
+	 * before reading the vIRR.  The processor needs a matching memory
+	 * barrier on interrupt delivery between writing IRR and reading
+	 * IsRunning; the lack of this barrier might be the cause of errata #1235).
+	 *
+	 * Clear IsRunning=0 even if guest IRQs are disabled, i.e. even if KVM
+	 * doesn't need to detect events for scheduling purposes.  The doorbell
+	 * used to signal running vCPUs cannot be blocked, i.e. will perturb the
+	 * CPU and cause noisy neighbor problems if the VM is sending interrupts
+	 * to the vCPU while it's scheduled out.
+	 */
 	avic_vcpu_put(vcpu);
 }
 
-- 
2.50.0.rc1.591.g9c95f17f64-goog
Re: [PATCH v3 20/62] KVM: SVM: Add a comment to explain why avic_vcpu_blocking() ignores IRQ blocking
Posted by Naveen N Rao 3 months, 2 weeks ago
On Wed, Jun 11, 2025 at 03:45:23PM -0700, Sean Christopherson wrote:
> Add a comment to explain why KVM clears IsRunning when putting a vCPU,
> even though leaving IsRunning=1 would be ok from a functional perspective.
> Per Maxim's experiments, a misbehaving VM could spam the AVIC doorbell so
> fast as to induce a 50%+ loss in performance.
> 
> Link: https://lore.kernel.org/all/8d7e0d0391df4efc7cb28557297eb2ec9904f1e5.camel@redhat.com
> Cc: Maxim Levitsky <mlevitsk@redhat.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/svm/avic.c | 31 ++++++++++++++++++-------------
>  1 file changed, 18 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index bf8b59556373..3cf929ac117f 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -1121,19 +1121,24 @@ void avic_vcpu_blocking(struct kvm_vcpu *vcpu)
>  	if (!kvm_vcpu_apicv_active(vcpu))
>  		return;
>  
> -       /*
> -        * Unload the AVIC when the vCPU is about to block, _before_
> -        * the vCPU actually blocks.
> -        *
> -        * Any IRQs that arrive before IsRunning=0 will not cause an
> -        * incomplete IPI vmexit on the source, therefore vIRR will also
> -        * be checked by kvm_vcpu_check_block() before blocking.  The
> -        * memory barrier implicit in set_current_state orders writing
> -        * IsRunning=0 before reading the vIRR.  The processor needs a
> -        * matching memory barrier on interrupt delivery between writing
> -        * IRR and reading IsRunning; the lack of this barrier might be
> -        * the cause of errata #1235).
> -        */
> +	/*
> +	 * Unload the AVIC when the vCPU is about to block, _before_ the vCPU
> +	 * actually blocks.
> +	 *
> +	 * Note, any IRQs that arrive before IsRunning=0 will not cause an
> +	 * incomplete IPI vmexit on the source; kvm_vcpu_check_block() handles
> +	 * this by checking vIRR one last time before blocking.  The memory
> +	 * barrier implicit in set_current_state orders writing IsRunning=0
> +	 * before reading the vIRR.  The processor needs a matching memory
> +	 * barrier on interrupt delivery between writing IRR and reading
> +	 * IsRunning; the lack of this barrier might be the cause of errata #1235).
> +	 *
> +	 * Clear IsRunning=0 even if guest IRQs are disabled, i.e. even if KVM
> +	 * doesn't need to detect events for scheduling purposes.  The doorbell

Nit: just IsRunning (you can drop the =0 part).

Trying to understand the significance of IRQs being disabled here. Is 
that a path KVM tries to optimize? Theoretically, it looks like we want 
to clear IsRunning regardless of whether the vCPU is blocked so as to 
prevent the guest from spamming the host with AVIC doorbells -- compared 
to always keeping IsRunning set so as to speed up VM entry/exit.

> +	 * used to signal running vCPUs cannot be blocked, i.e. will perturb the
> +	 * CPU and cause noisy neighbor problems if the VM is sending interrupts
> +	 * to the vCPU while it's scheduled out.
> +	 */
>  	avic_vcpu_put(vcpu);
>  }

Otherwise, this LGTM.
Acked-by: Naveen N Rao (AMD) <naveen@kernel.org>


Thanks,
Naveen
Re: [PATCH v3 20/62] KVM: SVM: Add a comment to explain why avic_vcpu_blocking() ignores IRQ blocking
Posted by Sean Christopherson 3 months, 2 weeks ago
On Mon, Jun 23, 2025, Naveen N Rao wrote:
> On Wed, Jun 11, 2025 at 03:45:23PM -0700, Sean Christopherson wrote:
> > Add a comment to explain why KVM clears IsRunning when putting a vCPU,
> > even though leaving IsRunning=1 would be ok from a functional perspective.
> > Per Maxim's experiments, a misbehaving VM could spam the AVIC doorbell so
> > fast as to induce a 50%+ loss in performance.
> > 
> > Link: https://lore.kernel.org/all/8d7e0d0391df4efc7cb28557297eb2ec9904f1e5.camel@redhat.com
> > Cc: Maxim Levitsky <mlevitsk@redhat.com>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  arch/x86/kvm/svm/avic.c | 31 ++++++++++++++++++-------------
> >  1 file changed, 18 insertions(+), 13 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > index bf8b59556373..3cf929ac117f 100644
> > --- a/arch/x86/kvm/svm/avic.c
> > +++ b/arch/x86/kvm/svm/avic.c
> > @@ -1121,19 +1121,24 @@ void avic_vcpu_blocking(struct kvm_vcpu *vcpu)
> >  	if (!kvm_vcpu_apicv_active(vcpu))
> >  		return;
> >  
> > -       /*
> > -        * Unload the AVIC when the vCPU is about to block, _before_
> > -        * the vCPU actually blocks.
> > -        *
> > -        * Any IRQs that arrive before IsRunning=0 will not cause an
> > -        * incomplete IPI vmexit on the source, therefore vIRR will also
> > -        * be checked by kvm_vcpu_check_block() before blocking.  The
> > -        * memory barrier implicit in set_current_state orders writing
> > -        * IsRunning=0 before reading the vIRR.  The processor needs a
> > -        * matching memory barrier on interrupt delivery between writing
> > -        * IRR and reading IsRunning; the lack of this barrier might be
> > -        * the cause of errata #1235).
> > -        */
> > +	/*
> > +	 * Unload the AVIC when the vCPU is about to block, _before_ the vCPU
> > +	 * actually blocks.
> > +	 *
> > +	 * Note, any IRQs that arrive before IsRunning=0 will not cause an
> > +	 * incomplete IPI vmexit on the source; kvm_vcpu_check_block() handles
> > +	 * this by checking vIRR one last time before blocking.  The memory
> > +	 * barrier implicit in set_current_state orders writing IsRunning=0
> > +	 * before reading the vIRR.  The processor needs a matching memory
> > +	 * barrier on interrupt delivery between writing IRR and reading
> > +	 * IsRunning; the lack of this barrier might be the cause of errata #1235).
> > +	 *
> > +	 * Clear IsRunning=0 even if guest IRQs are disabled, i.e. even if KVM
> > +	 * doesn't need to detect events for scheduling purposes.  The doorbell
> 
> Nit: just IsRunning (you can drop the =0 part).

Hmm, not really.  It could be:

	/* Note, any IRQs that arrive while IsRunning is set will not cause an

or

	/* Note, any IRQs that arrive while IsRunning=1 will not cause an

but that's just regurgitating the spec.  The slightly more interesting scenario
that's being described here is what will happen if an IRQ arrives _just_ before
the below code toggle IsRunning from 1 => 0.

> Trying to understand the significance of IRQs being disabled here. Is 
> that a path KVM tries to optimize?

Yep.  KVM doesn't need a notification for the undelivered (virtual) IRQ, because
it won't be handled by the vCPU until the vCPU enables IRQs, and thus it's not a
valid wake event for the vCPU.

So, *if* spurious doorbells didn't affect performance or functionality, then
ideally KVM would leave IsRunning=1, e.g. so that the IOMMU doesn't need to
generate GA log events, and so that other vCPUs aren't forced to VM-Exit when
sending an IPI.  Unfortunately, spurious doorbells are quite intrusive, and so
KVM "needs" to clear IsRunning.

> Theoretically, it looks like we want to clear IsRunning regardless of whether
> the vCPU is blocked so as to prevent the guest from spamming the host with
> AVIC doorbells -- compared to always keeping IsRunning set so as to speed up
> VM entry/exit.

Yep, exactly.

> > +	 * used to signal running vCPUs cannot be blocked, i.e. will perturb the
> > +	 * CPU and cause noisy neighbor problems if the VM is sending interrupts
> > +	 * to the vCPU while it's scheduled out.
> > +	 */
> >  	avic_vcpu_put(vcpu);
> >  }
> 
> Otherwise, this LGTM.
> Acked-by: Naveen N Rao (AMD) <naveen@kernel.org>
> 
> 
> Thanks,
> Naveen
>
Re: [PATCH v3 20/62] KVM: SVM: Add a comment to explain why avic_vcpu_blocking() ignores IRQ blocking
Posted by Naveen N Rao 3 months, 2 weeks ago
On Mon, Jun 23, 2025 at 09:18:45AM -0700, Sean Christopherson wrote:
> On Mon, Jun 23, 2025, Naveen N Rao wrote:
> > On Wed, Jun 11, 2025 at 03:45:23PM -0700, Sean Christopherson wrote:
> > > Add a comment to explain why KVM clears IsRunning when putting a vCPU,
> > > even though leaving IsRunning=1 would be ok from a functional perspective.
> > > Per Maxim's experiments, a misbehaving VM could spam the AVIC doorbell so
> > > fast as to induce a 50%+ loss in performance.
> > > 
> > > Link: https://lore.kernel.org/all/8d7e0d0391df4efc7cb28557297eb2ec9904f1e5.camel@redhat.com
> > > Cc: Maxim Levitsky <mlevitsk@redhat.com>
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > ---
> > >  arch/x86/kvm/svm/avic.c | 31 ++++++++++++++++++-------------
> > >  1 file changed, 18 insertions(+), 13 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > > index bf8b59556373..3cf929ac117f 100644
> > > --- a/arch/x86/kvm/svm/avic.c
> > > +++ b/arch/x86/kvm/svm/avic.c
> > > @@ -1121,19 +1121,24 @@ void avic_vcpu_blocking(struct kvm_vcpu *vcpu)
> > >  	if (!kvm_vcpu_apicv_active(vcpu))
> > >  		return;
> > >  
> > > -       /*
> > > -        * Unload the AVIC when the vCPU is about to block, _before_
> > > -        * the vCPU actually blocks.
> > > -        *
> > > -        * Any IRQs that arrive before IsRunning=0 will not cause an
> > > -        * incomplete IPI vmexit on the source, therefore vIRR will also
> > > -        * be checked by kvm_vcpu_check_block() before blocking.  The
> > > -        * memory barrier implicit in set_current_state orders writing
> > > -        * IsRunning=0 before reading the vIRR.  The processor needs a
> > > -        * matching memory barrier on interrupt delivery between writing
> > > -        * IRR and reading IsRunning; the lack of this barrier might be
> > > -        * the cause of errata #1235).
> > > -        */
> > > +	/*
> > > +	 * Unload the AVIC when the vCPU is about to block, _before_ the vCPU
> > > +	 * actually blocks.
> > > +	 *
> > > +	 * Note, any IRQs that arrive before IsRunning=0 will not cause an
> > > +	 * incomplete IPI vmexit on the source; kvm_vcpu_check_block() handles
> > > +	 * this by checking vIRR one last time before blocking.  The memory
> > > +	 * barrier implicit in set_current_state orders writing IsRunning=0
> > > +	 * before reading the vIRR.  The processor needs a matching memory
> > > +	 * barrier on interrupt delivery between writing IRR and reading
> > > +	 * IsRunning; the lack of this barrier might be the cause of errata #1235).
> > > +	 *
> > > +	 * Clear IsRunning=0 even if guest IRQs are disabled, i.e. even if KVM
> > > +	 * doesn't need to detect events for scheduling purposes.  The doorbell
> > 
> > Nit: just IsRunning (you can drop the =0 part).
> 
> Hmm, not really.  It could be:
> 
> 	/* Note, any IRQs that arrive while IsRunning is set will not cause an
> 
> or
> 
> 	/* Note, any IRQs that arrive while IsRunning=1 will not cause an
> 
> but that's just regurgitating the spec.  The slightly more interesting scenario
> that's being described here is what will happen if an IRQ arrives _just_ before
> the below code toggle IsRunning from 1 => 0.

Oh, you're right. I was referring to the last paragraph starting with 
"Clear IsRunning=0 even if", which to me feels like a double negation. 
It reads better if it is "Clear IsRunning even if ..."

> 
> > Trying to understand the significance of IRQs being disabled here. Is 
> > that a path KVM tries to optimize?
> 
> Yep.  KVM doesn't need a notification for the undelivered (virtual) IRQ, because
> it won't be handled by the vCPU until the vCPU enables IRQs, and thus it's not a
> valid wake event for the vCPU.
> 
> So, *if* spurious doorbells didn't affect performance or functionality, then
> ideally KVM would leave IsRunning=1, e.g. so that the IOMMU doesn't need to
> generate GA log events, and so that other vCPUs aren't forced to VM-Exit when
> sending an IPI.  Unfortunately, spurious doorbells are quite intrusive, and so
> KVM "needs" to clear IsRunning.

Ok, got that. kvm_arch_vcpu_blocking() looks like it exists precisely 
for this reason.

Thanks,
Naveen