[PATCH] spapr/xive: Allocate IPIs from the vCPU contexts

Cédric Le Goater posted 1 patch 3 years, 8 months ago
Failed in applying to current master (apply log)
hw/intc/spapr_xive_kvm.c | 50 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 50 insertions(+)
[PATCH] spapr/xive: Allocate IPIs from the vCPU contexts
Posted by Cédric Le Goater 3 years, 8 months ago
When QEMU switches to the XIVE interrupt mode, it performs a
kvmppc_xive_source_reset() which creates all the guest interrupts at
the level of the KVM device. These interrupts are backed by real HW
interrupts from the IPI interrupt pool of the XIVE controller.

Currently, this is done from the QEMU main thread, which results in
allocating all interrupts from the chip on which QEMU is running. IPIs
are not distributed across the system and the load is not well
balanced across the interrupt controllers.

Change the vCPU IPI allocation to run from the vCPU context in order
to allocate the associated XIVE IPI interrupt on the chip on which the
vCPU is running. This gives a chance to a better distribution of the
IPIs when the guest has a lot of vCPUs. When the vCPUs are pinned, it
makes the IPI local to the chip of the vCPU which reduces rerouting
between interrupt controllers and gives better performance.

This is only possible for running vCPUs. The IPIs of hot plugable
vCPUs will still be allocated in the context of the QEMU main thread.

Device interrupts are treated the same. To improve placement, we would
need some information on the chip owning the virtual source or HW
source in case of passthrough. This requires changes in PAPR.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/spapr_xive_kvm.c | 50 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
index c6958f2da218..553fd7fd8f56 100644
--- a/hw/intc/spapr_xive_kvm.c
+++ b/hw/intc/spapr_xive_kvm.c
@@ -223,6 +223,47 @@ void kvmppc_xive_sync_source(SpaprXive *xive, uint32_t lisn, Error **errp)
                       NULL, true, errp);
 }
 
+/*
+ * Allocate the IPIs from the vCPU context. This will allocate the
+ * XIVE IPI interrupt on the chip on which the vCPU is running. This
+ * gives a better distribution of IPIs when the guest has a lot of
+ * vCPUs. When the vCPU are pinned, the IPIs are local which reduces
+ * rerouting between interrupt controllers and gives better
+ * performance.
+ */
+typedef struct {
+    SpaprXive *xive;
+    int ipi;
+    Error *err;
+    int rc;
+} XiveInitIPI;
+
+static void kvmppc_xive_reset_ipi_on_cpu(CPUState *cs, run_on_cpu_data arg)
+{
+    XiveInitIPI *s = arg.host_ptr;
+    uint64_t state = 0;
+
+    s->rc = kvm_device_access(s->xive->fd, KVM_DEV_XIVE_GRP_SOURCE, s->ipi,
+                              &state, true, &s->err);
+}
+
+static int kvmppc_xive_reset_ipi(SpaprXive *xive, int ipi, Error **errp)
+{
+    PowerPCCPU *cpu = spapr_find_cpu(ipi);
+    XiveInitIPI s = {
+        .xive = xive,
+        .ipi  = ipi,
+        .err  = NULL,
+        .rc   = 0,
+    };
+
+    run_on_cpu(CPU(cpu), kvmppc_xive_reset_ipi_on_cpu, RUN_ON_CPU_HOST_PTR(&s));
+    if (s.err) {
+        error_propagate(errp, s.err);
+    }
+    return s.rc;
+}
+
 /*
  * At reset, the interrupt sources are simply created and MASKED. We
  * only need to inform the KVM XIVE device about their type: LSI or
@@ -230,11 +271,20 @@ void kvmppc_xive_sync_source(SpaprXive *xive, uint32_t lisn, Error **errp)
  */
 int kvmppc_xive_source_reset_one(XiveSource *xsrc, int srcno, Error **errp)
 {
+    MachineState *machine = MACHINE(qdev_get_machine());
     SpaprXive *xive = SPAPR_XIVE(xsrc->xive);
     uint64_t state = 0;
 
     assert(xive->fd != -1);
 
+    /*
+     * IPIs are special. Allocate the IPIs from the vCPU context for
+     * those running. Hotplugged CPUs will the QEMU context.
+     */
+    if (srcno < machine->smp.cpus) {
+        return kvmppc_xive_reset_ipi(xive, srcno, errp);
+    }
+
     if (xive_source_irq_is_lsi(xsrc, srcno)) {
         state |= KVM_XIVE_LEVEL_SENSITIVE;
         if (xsrc->status[srcno] & XIVE_STATUS_ASSERTED) {
-- 
2.25.4


Re: [PATCH] spapr/xive: Allocate IPIs from the vCPU contexts
Posted by Cédric Le Goater 3 years, 8 months ago
This works as expected with a 128 vCPUs guest with pinned vcpus. The
first 64 IPIs are allocated on the first chip and the remaining 64
on the second chip.

Still, this is more an RFC. We have time before the end of the merge
window.

Thanks,

C.  


On 8/14/20 5:03 PM, Cédric Le Goater wrote:
> When QEMU switches to the XIVE interrupt mode, it performs a
> kvmppc_xive_source_reset() which creates all the guest interrupts at
> the level of the KVM device. These interrupts are backed by real HW
> interrupts from the IPI interrupt pool of the XIVE controller.
> 
> Currently, this is done from the QEMU main thread, which results in
> allocating all interrupts from the chip on which QEMU is running. IPIs
> are not distributed across the system and the load is not well
> balanced across the interrupt controllers.
> 
> Change the vCPU IPI allocation to run from the vCPU context in order
> to allocate the associated XIVE IPI interrupt on the chip on which the
> vCPU is running. This gives a chance to a better distribution of the
> IPIs when the guest has a lot of vCPUs. When the vCPUs are pinned, it
> makes the IPI local to the chip of the vCPU which reduces rerouting
> between interrupt controllers and gives better performance.
> 
> This is only possible for running vCPUs. The IPIs of hot plugable
> vCPUs will still be allocated in the context of the QEMU main thread.
> 
> Device interrupts are treated the same. To improve placement, we would
> need some information on the chip owning the virtual source or HW
> source in case of passthrough. This requires changes in PAPR.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/spapr_xive_kvm.c | 50 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 50 insertions(+)
> 
> diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
> index c6958f2da218..553fd7fd8f56 100644
> --- a/hw/intc/spapr_xive_kvm.c
> +++ b/hw/intc/spapr_xive_kvm.c
> @@ -223,6 +223,47 @@ void kvmppc_xive_sync_source(SpaprXive *xive, uint32_t lisn, Error **errp)
>                        NULL, true, errp);
>  }
>  
> +/*
> + * Allocate the IPIs from the vCPU context. This will allocate the
> + * XIVE IPI interrupt on the chip on which the vCPU is running. This
> + * gives a better distribution of IPIs when the guest has a lot of
> + * vCPUs. When the vCPU are pinned, the IPIs are local which reduces
> + * rerouting between interrupt controllers and gives better
> + * performance.
> + */
> +typedef struct {
> +    SpaprXive *xive;
> +    int ipi;
> +    Error *err;
> +    int rc;
> +} XiveInitIPI;
> +
> +static void kvmppc_xive_reset_ipi_on_cpu(CPUState *cs, run_on_cpu_data arg)
> +{
> +    XiveInitIPI *s = arg.host_ptr;
> +    uint64_t state = 0;
> +
> +    s->rc = kvm_device_access(s->xive->fd, KVM_DEV_XIVE_GRP_SOURCE, s->ipi,
> +                              &state, true, &s->err);
> +}
> +
> +static int kvmppc_xive_reset_ipi(SpaprXive *xive, int ipi, Error **errp)
> +{
> +    PowerPCCPU *cpu = spapr_find_cpu(ipi);
> +    XiveInitIPI s = {
> +        .xive = xive,
> +        .ipi  = ipi,
> +        .err  = NULL,
> +        .rc   = 0,
> +    };
> +
> +    run_on_cpu(CPU(cpu), kvmppc_xive_reset_ipi_on_cpu, RUN_ON_CPU_HOST_PTR(&s));
> +    if (s.err) {
> +        error_propagate(errp, s.err);
> +    }
> +    return s.rc;
> +}
> +
>  /*
>   * At reset, the interrupt sources are simply created and MASKED. We
>   * only need to inform the KVM XIVE device about their type: LSI or
> @@ -230,11 +271,20 @@ void kvmppc_xive_sync_source(SpaprXive *xive, uint32_t lisn, Error **errp)
>   */
>  int kvmppc_xive_source_reset_one(XiveSource *xsrc, int srcno, Error **errp)
>  {
> +    MachineState *machine = MACHINE(qdev_get_machine());
>      SpaprXive *xive = SPAPR_XIVE(xsrc->xive);
>      uint64_t state = 0;
>  
>      assert(xive->fd != -1);
>  
> +    /*
> +     * IPIs are special. Allocate the IPIs from the vCPU context for
> +     * those running. Hotplugged CPUs will the QEMU context.
> +     */
> +    if (srcno < machine->smp.cpus) {
> +        return kvmppc_xive_reset_ipi(xive, srcno, errp);
> +    }
> +
>      if (xive_source_irq_is_lsi(xsrc, srcno)) {
>          state |= KVM_XIVE_LEVEL_SENSITIVE;
>          if (xsrc->status[srcno] & XIVE_STATUS_ASSERTED) {
> 


Re: [PATCH] spapr/xive: Allocate IPIs from the vCPU contexts
Posted by David Gibson 3 years, 8 months ago
On Fri, Aug 14, 2020 at 05:08:13PM +0200, Cédric Le Goater wrote:
> 
> This works as expected with a 128 vCPUs guest with pinned vcpus. The
> first 64 IPIs are allocated on the first chip and the remaining 64
> on the second chip.
> 
> Still, this is more an RFC. We have time before the end of the merge
> window.

It looks reasonable to me.  AFAICT it makes things better than they
were, and even if we can improve it further, that won't break
migration or other interfaces we need to preserve.

> 
> Thanks,
> 
> C.  
> 
> 
> On 8/14/20 5:03 PM, Cédric Le Goater wrote:
> > When QEMU switches to the XIVE interrupt mode, it performs a
> > kvmppc_xive_source_reset() which creates all the guest interrupts at
> > the level of the KVM device. These interrupts are backed by real HW
> > interrupts from the IPI interrupt pool of the XIVE controller.
> > 
> > Currently, this is done from the QEMU main thread, which results in
> > allocating all interrupts from the chip on which QEMU is running. IPIs
> > are not distributed across the system and the load is not well
> > balanced across the interrupt controllers.
> > 
> > Change the vCPU IPI allocation to run from the vCPU context in order
> > to allocate the associated XIVE IPI interrupt on the chip on which the
> > vCPU is running. This gives a chance to a better distribution of the
> > IPIs when the guest has a lot of vCPUs. When the vCPUs are pinned, it
> > makes the IPI local to the chip of the vCPU which reduces rerouting
> > between interrupt controllers and gives better performance.
> > 
> > This is only possible for running vCPUs. The IPIs of hot plugable
> > vCPUs will still be allocated in the context of the QEMU main thread.
> > 
> > Device interrupts are treated the same. To improve placement, we would
> > need some information on the chip owning the virtual source or HW
> > source in case of passthrough. This requires changes in PAPR.
> > 
> > Signed-off-by: Cédric Le Goater <clg@kaod.org>
> > ---
> >  hw/intc/spapr_xive_kvm.c | 50 ++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 50 insertions(+)
> > 
> > diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
> > index c6958f2da218..553fd7fd8f56 100644
> > --- a/hw/intc/spapr_xive_kvm.c
> > +++ b/hw/intc/spapr_xive_kvm.c
> > @@ -223,6 +223,47 @@ void kvmppc_xive_sync_source(SpaprXive *xive, uint32_t lisn, Error **errp)
> >                        NULL, true, errp);
> >  }
> >  
> > +/*
> > + * Allocate the IPIs from the vCPU context. This will allocate the
> > + * XIVE IPI interrupt on the chip on which the vCPU is running. This
> > + * gives a better distribution of IPIs when the guest has a lot of
> > + * vCPUs. When the vCPU are pinned, the IPIs are local which reduces
> > + * rerouting between interrupt controllers and gives better
> > + * performance.
> > + */
> > +typedef struct {
> > +    SpaprXive *xive;
> > +    int ipi;
> > +    Error *err;
> > +    int rc;
> > +} XiveInitIPI;
> > +
> > +static void kvmppc_xive_reset_ipi_on_cpu(CPUState *cs, run_on_cpu_data arg)
> > +{
> > +    XiveInitIPI *s = arg.host_ptr;
> > +    uint64_t state = 0;
> > +
> > +    s->rc = kvm_device_access(s->xive->fd, KVM_DEV_XIVE_GRP_SOURCE, s->ipi,
> > +                              &state, true, &s->err);
> > +}
> > +
> > +static int kvmppc_xive_reset_ipi(SpaprXive *xive, int ipi, Error **errp)
> > +{
> > +    PowerPCCPU *cpu = spapr_find_cpu(ipi);
> > +    XiveInitIPI s = {
> > +        .xive = xive,
> > +        .ipi  = ipi,
> > +        .err  = NULL,
> > +        .rc   = 0,
> > +    };
> > +
> > +    run_on_cpu(CPU(cpu), kvmppc_xive_reset_ipi_on_cpu, RUN_ON_CPU_HOST_PTR(&s));
> > +    if (s.err) {
> > +        error_propagate(errp, s.err);
> > +    }
> > +    return s.rc;
> > +}
> > +
> >  /*
> >   * At reset, the interrupt sources are simply created and MASKED. We
> >   * only need to inform the KVM XIVE device about their type: LSI or
> > @@ -230,11 +271,20 @@ void kvmppc_xive_sync_source(SpaprXive *xive, uint32_t lisn, Error **errp)
> >   */
> >  int kvmppc_xive_source_reset_one(XiveSource *xsrc, int srcno, Error **errp)
> >  {
> > +    MachineState *machine = MACHINE(qdev_get_machine());
> >      SpaprXive *xive = SPAPR_XIVE(xsrc->xive);
> >      uint64_t state = 0;
> >  
> >      assert(xive->fd != -1);
> >  
> > +    /*
> > +     * IPIs are special. Allocate the IPIs from the vCPU context for
> > +     * those running. Hotplugged CPUs will the QEMU context.
> > +     */
> > +    if (srcno < machine->smp.cpus) {
> > +        return kvmppc_xive_reset_ipi(xive, srcno, errp);
> > +    }
> > +
> >      if (xive_source_irq_is_lsi(xsrc, srcno)) {
> >          state |= KVM_XIVE_LEVEL_SENSITIVE;
> >          if (xsrc->status[srcno] & XIVE_STATUS_ASSERTED) {
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
Re: [PATCH] spapr/xive: Allocate IPIs from the vCPU contexts
Posted by Cédric Le Goater 3 years, 8 months ago
On 8/16/20 6:30 AM, David Gibson wrote:
> On Fri, Aug 14, 2020 at 05:08:13PM +0200, Cédric Le Goater wrote:
>>
>> This works as expected with a 128 vCPUs guest with pinned vcpus. The
>> first 64 IPIs are allocated on the first chip and the remaining 64
>> on the second chip.
>>
>> Still, this is more an RFC. We have time before the end of the merge
>> window.
> 
> It looks reasonable to me.  AFAICT it makes things better than they
> were, and even if we can improve it further, that won't break
> migration or other interfaces we need to preserve.

Yeah. What I don't like is this test below. I am not sure that 
machine->smp.cpus is the correct way to test the number of currently
active vCPUs. 

>>> +    if (srcno < machine->smp.cpus) {
>>> +        return kvmppc_xive_reset_ipi(xive, srcno, errp);
>>> +    }
>>> +
>>>      if (xive_source_irq_is_lsi(xsrc, srcno)) {
>>>          state |= KVM_XIVE_LEVEL_SENSITIVE;
>>>          if (xsrc->status[srcno] & XIVE_STATUS_ASSERTED) {

Re: [PATCH] spapr/xive: Allocate IPIs from the vCPU contexts
Posted by David Gibson 3 years, 8 months ago
On Sun, Aug 16, 2020 at 03:38:20PM +0200, Cédric Le Goater wrote:
> On 8/16/20 6:30 AM, David Gibson wrote:
> > On Fri, Aug 14, 2020 at 05:08:13PM +0200, Cédric Le Goater wrote:
> >>
> >> This works as expected with a 128 vCPUs guest with pinned vcpus. The
> >> first 64 IPIs are allocated on the first chip and the remaining 64
> >> on the second chip.
> >>
> >> Still, this is more an RFC. We have time before the end of the merge
> >> window.
> > 
> > It looks reasonable to me.  AFAICT it makes things better than they
> > were, and even if we can improve it further, that won't break
> > migration or other interfaces we need to preserve.
> 
> Yeah. What I don't like is this test below. I am not sure that 
> machine->smp.cpus is the correct way to test the number of currently
> active vCPUs.

Ah, yeah.  It should be correct at initial startup, but might not be
after a bunch of hotplugs/unplugs, which could result in a sparse set
of active vcpus.

Usually this code will be called during initial setup, but I think
there are some edge cases where it won't be (e.g. boot a XICS kernel,
do some vcpu plugs/unplugs, reboot into a XIVE kernel).

So I think we need to explicitly check for each vcpu # if it's
currently active.  Using... spapr_find_cpu(), I guess?

> 
> >>> +    if (srcno < machine->smp.cpus) {
> >>> +        return kvmppc_xive_reset_ipi(xive, srcno, errp);
> >>> +    }
> >>> +
> >>>      if (xive_source_irq_is_lsi(xsrc, srcno)) {
> >>>          state |= KVM_XIVE_LEVEL_SENSITIVE;
> >>>          if (xsrc->status[srcno] & XIVE_STATUS_ASSERTED) {
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
Re: [PATCH] spapr/xive: Allocate IPIs from the vCPU contexts
Posted by Cédric Le Goater 3 years, 8 months ago
On 8/20/20 2:42 AM, David Gibson wrote:
> On Sun, Aug 16, 2020 at 03:38:20PM +0200, Cédric Le Goater wrote:
>> On 8/16/20 6:30 AM, David Gibson wrote:
>>> On Fri, Aug 14, 2020 at 05:08:13PM +0200, Cédric Le Goater wrote:
>>>>
>>>> This works as expected with a 128 vCPUs guest with pinned vcpus. The
>>>> first 64 IPIs are allocated on the first chip and the remaining 64
>>>> on the second chip.
>>>>
>>>> Still, this is more an RFC. We have time before the end of the merge
>>>> window.
>>>
>>> It looks reasonable to me.  AFAICT it makes things better than they
>>> were, and even if we can improve it further, that won't break
>>> migration or other interfaces we need to preserve.
>>
>> Yeah. What I don't like is this test below. I am not sure that 
>> machine->smp.cpus is the correct way to test the number of currently
>> active vCPUs.
> 
> Ah, yeah.  It should be correct at initial startup, but might not be
> after a bunch of hotplugs/unplugs, which could result in a sparse set
> of active vcpus.

Yes. I have a v2 ready to be sent in which the vCPU IPI is allocated
under kvmppc_xive_cpu_connect().

Thanks,

C. 

> 
> Usually this code will be called during initial setup, but I think
> there are some edge cases where it won't be (e.g. boot a XICS kernel,
> do some vcpu plugs/unplugs, reboot into a XIVE kernel).
> 
> So I think we need to explicitly check for each vcpu # if it's
> currently active.  Using... spapr_find_cpu(), I guess?
> 
>>
>>>>> +    if (srcno < machine->smp.cpus) {
>>>>> +        return kvmppc_xive_reset_ipi(xive, srcno, errp);
>>>>> +    }
>>>>> +
>>>>>      if (xive_source_irq_is_lsi(xsrc, srcno)) {
>>>>>          state |= KVM_XIVE_LEVEL_SENSITIVE;
>>>>>          if (xsrc->status[srcno] & XIVE_STATUS_ASSERTED) {
>>
>