From: Jason Wang <jasowang@redhat.com>
We used to have PV version of send_IPI_mask and
send_IPI_mask_allbutself. This patch implements PV send_IPI method to
reduce the number of vmexits.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Tested-by: Cindy Lu <lulu@redhat.com>
---
arch/x86/kernel/kvm.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 921c1c783bc1..b920cfd10441 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -557,6 +557,11 @@ static void __send_ipi_mask(const struct cpumask *mask, int vector)
 	local_irq_restore(flags);
 }
 
+static void kvm_send_ipi(int cpu, int vector)
+{
+	__send_ipi_mask(cpumask_of(cpu), vector);
+}
+
 static void kvm_send_ipi_mask(const struct cpumask *mask, int vector)
 {
 	__send_ipi_mask(mask, vector);
@@ -628,6 +633,7 @@ late_initcall(setup_efi_kvm_sev_migration);
  */
 static __init void kvm_setup_pv_ipi(void)
 {
+	apic_update_callback(send_IPI, kvm_send_ipi);
 	apic_update_callback(send_IPI_mask, kvm_send_ipi_mask);
 	apic_update_callback(send_IPI_mask_allbutself, kvm_send_ipi_mask_allbutself);
 	pr_info("setup PV IPIs\n");
--
2.45.0
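
For context, the PV IPI callbacks above are only installed when the hypervisor
advertises the feature, so guests where PV IPIs are unwanted keep the native
path. A rough sketch of that gating (paraphrasing the surrounding kvm.c code;
details may differ from the exact kernel source):

/* Paraphrased sketch of the existing PV-IPI gating in arch/x86/kernel/kvm.c */
static bool pv_ipi_supported(void)
{
	/* Only worthwhile if the host advertises it and there is someone to IPI */
	return kvm_para_has_feature(KVM_FEATURE_PV_SEND_IPI) &&
	       num_possible_cpus() > 1;
}

static void kvm_apic_init(void)
{
	if (pv_ipi_supported())
		kvm_setup_pv_ipi();
}
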
On Fri, Jul 18, 2025 at 2:25 PM Cindy Lu <lulu@redhat.com> wrote:
>
> From: Jason Wang <jasowang@redhat.com>
>
> We used to have PV version of send_IPI_mask and
> send_IPI_mask_allbutself. This patch implements PV send_IPI method to
> reduce the number of vmexits.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> Tested-by: Cindy Lu <lulu@redhat.com>

I think a question here is are we able to see performance improvement
in any kind of setup?

Thanks

On Fri, Jul 18, 2025 at 03:52:30PM +0800, Jason Wang wrote:
>On Fri, Jul 18, 2025 at 2:25 PM Cindy Lu <lulu@redhat.com> wrote:
>>
>> From: Jason Wang <jasowang@redhat.com>
>>
>> We used to have PV version of send_IPI_mask and
>> send_IPI_mask_allbutself. This patch implements PV send_IPI method to
>> reduce the number of vmexits.

It won't reduce the number of VM-exits; in fact, it may increase them on CPUs
that support IPI virtualization.

With IPI virtualization enabled, *unicast* and physical-addressing IPIs won't
cause a VM-exit. Instead, the microcode posts interrupts directly to the
target vCPU. The PV version always causes a VM-exit.

>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> Tested-by: Cindy Lu <lulu@redhat.com>
>
>I think a question here is are we able to see performance improvement
>in any kind of setup?

It may result in a negative performance impact.

>
>Thanks
>

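For reference, the hypercall path that the new kvm_send_ipi() callback reuses
looks roughly like the sketch below (simplified, not the exact
arch/x86/kernel/kvm.c code; the real __send_ipi_mask() also handles APIC IDs
beyond the bitmap width via a "min" offset). It shows why the PV version
always takes a VM-exit: every IPI, unicast included, is one KVM_HC_SEND_IPI
hypercall.

/*
 * Simplified sketch of the PV send-IPI path: pack the target APIC IDs into a
 * bitmap and hand them to the host in a single KVM_HC_SEND_IPI hypercall.
 * Unicast or multicast, the guest takes exactly one VM-exit here.
 */
static void pv_send_ipi_sketch(const struct cpumask *mask, int vector)
{
	unsigned long ipi_bitmap[2] = { 0, 0 };
	unsigned long icr = APIC_DM_FIXED | vector;
	int cpu;

	for_each_cpu(cpu, mask)
		__set_bit(per_cpu(x86_cpu_to_apicid, cpu), ipi_bitmap);

	/* args: bitmap low word, bitmap high word, lowest APIC ID (0 here), ICR value */
	kvm_hypercall4(KVM_HC_SEND_IPI, ipi_bitmap[0], ipi_bitmap[1], 0, icr);
}
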
On Fri, Jul 18, 2025 at 7:01 PM Chao Gao <chao.gao@intel.com> wrote:
>
> On Fri, Jul 18, 2025 at 03:52:30PM +0800, Jason Wang wrote:
> >On Fri, Jul 18, 2025 at 2:25 PM Cindy Lu <lulu@redhat.com> wrote:
> >>
> >> From: Jason Wang <jasowang@redhat.com>
> >>
> >> We used to have PV version of send_IPI_mask and
> >> send_IPI_mask_allbutself. This patch implements PV send_IPI method to
> >> reduce the number of vmexits.
>
> It won't reduce the number of VM-exits; in fact, it may increase them on CPUs
> that support IPI virtualization.

Sure, but I wonder if it reduces the vmexits when there's no APICV or
L2 VM. I thought it can reduce the 2 vmexits to 1?

>
> With IPI virtualization enabled, *unicast* and physical-addressing IPIs won't
> cause a VM-exit.

Right.

> Instead, the microcode posts interrupts directly to the target
> vCPU. The PV version always causes a VM-exit.

Yes, but it applies to all PV IPI I think.

>
> >>
> >> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >> Tested-by: Cindy Lu <lulu@redhat.com>
> >
> >I think a question here is are we able to see performance improvement
> >in any kind of setup?
>
> It may result in a negative performance impact.

Userspace can check and enable PV IPI for the case where it suits. For
example, HyperV did something like:

void __init hv_apic_init(void)
{
	if (ms_hyperv.hints & HV_X64_CLUSTER_IPI_RECOMMENDED) {
		pr_info("Hyper-V: Using IPI hypercalls\n");
		/*
		 * Set the IPI entry points.
		 */
		orig_apic = *apic;

		apic_update_callback(send_IPI, hv_send_ipi);
		apic_update_callback(send_IPI_mask, hv_send_ipi_mask);
		apic_update_callback(send_IPI_mask_allbutself, hv_send_ipi_mask_allbutself);
		apic_update_callback(send_IPI_allbutself, hv_send_ipi_allbutself);
		apic_update_callback(send_IPI_all, hv_send_ipi_all);
		apic_update_callback(send_IPI_self, hv_send_ipi_self);
	}

send_IPI_mask is there.

Thanks

>
> >
> >Thanks
> >
> >

On Fri, Jul 18, 2025 at 07:15:37PM +0800, Jason Wang wrote:
>On Fri, Jul 18, 2025 at 7:01 PM Chao Gao <chao.gao@intel.com> wrote:
>>
>> On Fri, Jul 18, 2025 at 03:52:30PM +0800, Jason Wang wrote:
>> >On Fri, Jul 18, 2025 at 2:25 PM Cindy Lu <lulu@redhat.com> wrote:
>> >>
>> >> From: Jason Wang <jasowang@redhat.com>
>> >>
>> >> We used to have PV version of send_IPI_mask and
>> >> send_IPI_mask_allbutself. This patch implements PV send_IPI method to
>> >> reduce the number of vmexits.
>>
>> It won't reduce the number of VM-exits; in fact, it may increase them on CPUs
>> that support IPI virtualization.
>
>Sure, but I wonder if it reduces the vmexits when there's no APICV or
>L2 VM. I thought it can reduce the 2 vmexits to 1?

Even without APICv, there is just 1 vmexit due to APIC write (xAPIC mode)
or MSR write (x2APIC mode).

>
>> With IPI virtualization enabled, *unicast* and physical-addressing IPIs won't
>> cause a VM-exit.
>
>Right.
>
>> Instead, the microcode posts interrupts directly to the target
>> vCPU. The PV version always causes a VM-exit.
>
>Yes, but it applies to all PV IPI I think.

For multi-cast IPIs, a single hypercall (PV IPI) outperforms multiple ICR
writes, even when IPI virtualization is enabled.

>
>> >>
>> >> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> >> Tested-by: Cindy Lu <lulu@redhat.com>
>> >
>> >I think a question here is are we able to see performance improvement
>> >in any kind of setup?
>>
>> It may result in a negative performance impact.
>
>Userspace can check and enable PV IPI for the case where it suits.

Yeah, we need to identify the cases. One example may be for TDX guests, using
a PV approach (TDVMCALL) can avoid the #VE cost.

>
>For example, HyperV did something like:
>
>void __init hv_apic_init(void)
>{
>	if (ms_hyperv.hints & HV_X64_CLUSTER_IPI_RECOMMENDED) {
>		pr_info("Hyper-V: Using IPI hypercalls\n");
>		/*
>		 * Set the IPI entry points.
>		 */
>		orig_apic = *apic;
>
>		apic_update_callback(send_IPI, hv_send_ipi);
>		apic_update_callback(send_IPI_mask, hv_send_ipi_mask);
>		apic_update_callback(send_IPI_mask_allbutself, hv_send_ipi_mask_allbutself);
>		apic_update_callback(send_IPI_allbutself, hv_send_ipi_allbutself);
>		apic_update_callback(send_IPI_all, hv_send_ipi_all);
>		apic_update_callback(send_IPI_self, hv_send_ipi_self);
>	}
>
>send_IPI_mask is there.
>
>Thanks
>
>>
>> >
>> >Thanks
>> >
>> >

On Fri, Jul 18, 2025, Chao Gao wrote:
> On Fri, Jul 18, 2025 at 07:15:37PM +0800, Jason Wang wrote:
> >On Fri, Jul 18, 2025 at 7:01 PM Chao Gao <chao.gao@intel.com> wrote:
> >>
> >> On Fri, Jul 18, 2025 at 03:52:30PM +0800, Jason Wang wrote:
> >> >On Fri, Jul 18, 2025 at 2:25 PM Cindy Lu <lulu@redhat.com> wrote:
> >> >>
> >> >> From: Jason Wang <jasowang@redhat.com>
> >> >>
> >> >> We used to have PV version of send_IPI_mask and
> >> >> send_IPI_mask_allbutself. This patch implements PV send_IPI method to
> >> >> reduce the number of vmexits.
> >>
> >> It won't reduce the number of VM-exits; in fact, it may increase them on CPUs
> >> that support IPI virtualization.
> >
> >Sure, but I wonder if it reduces the vmexits when there's no APICV or
> >L2 VM. I thought it can reduce the 2 vmexits to 1?
>
> Even without APICv, there is just 1 vmexit due to APIC write (xAPIC mode)
> or MSR write (x2APIC mode).

xAPIC will have two exits: ICR2 and then ICR.  If xAPIC vs. x2APIC is stable
when kvm_setup_pv_ipi() runs, maybe key off of that?

> >> With IPI virtualization enabled, *unicast* and physical-addressing IPIs won't
> >> cause a VM-exit.
> >
> >Right.
> >
> >> Instead, the microcode posts interrupts directly to the target
> >> vCPU. The PV version always causes a VM-exit.
> >
> >Yes, but it applies to all PV IPI I think.
>
> For multi-cast IPIs, a single hypercall (PV IPI) outperforms multiple ICR
> writes, even when IPI virtualization is enabled.

FWIW, I doubt _all_ multi-cast IPIs outperform IPI virtualization.  My guess is
there's a threshold in the number of targets where the cost of sending multiple
virtual IPIs becomes more expensive than the VM-Exit and software processing,
and I assume/hope that threshold isn't '2'.

> >> >> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >> >> Tested-by: Cindy Lu <lulu@redhat.com>
> >> >
> >> >I think a question here is are we able to see performance improvement
> >> >in any kind of setup?
> >>
> >> It may result in a negative performance impact.
> >
> >Userspace can check and enable PV IPI for the case where it suits.
>
> Yeah, we need to identify the cases. One example may be for TDX guests, using
> a PV approach (TDVMCALL) can avoid the #VE cost.

TDX doesn't need a PV approach.  Or rather, TDX already has an "architectural"
PV approach.  Make a TDVMCALL to request emulation of WRMSR(ICR).  Don't plumb
more KVM logic into it.

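For illustration only, a minimal sketch of what "keying off" the APIC mode
might look like, assuming x2apic_enabled() reflects the guest's mode by the
time kvm_setup_pv_ipi() runs. As the follow-ups note, the guest still cannot
tell whether APICv or IPI virtualization is active on the host, so this is a
hypothetical variant, not the patch as posted.

/*
 * Hypothetical variant of kvm_setup_pv_ipi(): only install the unicast PV
 * callback in xAPIC mode, where a native IPI costs two exits (ICR2 then ICR)
 * and the single-exit hypercall wins.  In x2APIC mode a native IPI is already
 * one MSR-write exit (or none with IPI virtualization), so keep the native path.
 */
static __init void kvm_setup_pv_ipi(void)
{
	if (!x2apic_enabled())
		apic_update_callback(send_IPI, kvm_send_ipi);
	apic_update_callback(send_IPI_mask, kvm_send_ipi_mask);
	apic_update_callback(send_IPI_mask_allbutself, kvm_send_ipi_mask_allbutself);
	pr_info("setup PV IPIs\n");
}
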
>> >> >> From: Jason Wang <jasowang@redhat.com>
>> >> >>
>> >> >> We used to have PV version of send_IPI_mask and
>> >> >> send_IPI_mask_allbutself. This patch implements PV send_IPI method to
>> >> >> reduce the number of vmexits.
>> >>
>> >> It won't reduce the number of VM-exits; in fact, it may increase them on CPUs
>> >> that support IPI virtualization.
>> >
>> >Sure, but I wonder if it reduces the vmexits when there's no APICV or
>> >L2 VM. I thought it can reduce the 2 vmexits to 1?
>>
>> Even without APICv, there is just 1 vmexit due to APIC write (xAPIC mode)
>> or MSR write (x2APIC mode).
>
>xAPIC will have two exits: ICR2 and then ICR.

ah, yes.

>If xAPIC vs. x2APIC is stable when
>kvm_setup_pv_ipi() runs, maybe key off of that?

But the guest doesn't know if APICv is enabled or even IPI virtualization
is enabled.

>
>> >> With IPI virtualization enabled, *unicast* and physical-addressing IPIs won't
>> >> cause a VM-exit.
>> >
>> >Right.
>> >
>> >> Instead, the microcode posts interrupts directly to the target
>> >> vCPU. The PV version always causes a VM-exit.
>> >
>> >Yes, but it applies to all PV IPI I think.
>>
>> For multi-cast IPIs, a single hypercall (PV IPI) outperforms multiple ICR
>> writes, even when IPI virtualization is enabled.
>
>FWIW, I doubt _all_ multi-cast IPIs outperform IPI virtualization. My guess is
>there's a threshold in the number of targets where the cost of sending multiple
>virtual IPIs becomes more expensive than the VM-Exit and software processing,
>and I assume/hope that threshold isn't '2'.

Yes. Determining the threshold is tricky, and it's likely not a constant value
across different CPU generations.

>
>> >> >> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> >> >> Tested-by: Cindy Lu <lulu@redhat.com>
>> >> >
>> >> >I think a question here is are we able to see performance improvement
>> >> >in any kind of setup?
>> >>
>> >> It may result in a negative performance impact.
>> >
>> >Userspace can check and enable PV IPI for the case where it suits.
>>
>> Yeah, we need to identify the cases. One example may be for TDX guests, using
>> a PV approach (TDVMCALL) can avoid the #VE cost.
>
>TDX doesn't need a PV approach. Or rather, TDX already has an "architectural"
>PV approach. Make a TDVMCALL to request emulation of WRMSR(ICR). Don't plumb
>more KVM logic into it.

Agree. It should be an optimization for TDX guests, regardless of the
underlying hypervisor.

On Fri, Jul 18, 2025, Chao Gao wrote:
> >> >> >> From: Jason Wang <jasowang@redhat.com>
> >If xAPIC vs. x2APIC is stable when
> >kvm_setup_pv_ipi() runs, maybe key off of that?
>
> But the guest doesn't know if APICv is enabled or even IPI virtualization
> is enabled.

Oh yeah, duh.

Given that KVM emulates x2APIC irrespective of hardware support, and that Linux
leans heavily towards x2APIC (thanks MMIO stale data!), my vote is to leave
things as they are.