[Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9)

Cédric Le Goater posted 26 patches 8 years, 4 months ago
git fetch https://github.com/patchew-project/qemu tags/patchew/1499274819-15607-1-git-send-email-clg@kaod.org
[Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9)
Posted by Cédric Le Goater 8 years, 4 months ago
On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
negotiation process determines whether the guest operates with an
interrupt controller using the XICS legacy model, as found on POWER8,
or in XIVE exploitation mode, the newer POWER9 interrupt model. This
patchset is a first proposal to add XIVE support in the sPAPR machine.

The first patches introduce the XIVE exploitation mode in CAS.
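
As an illustration only (not code from the patchset), the outcome of
the negotiation can be pictured as a single option bit that the guest
sets and the machine honours when it can. Every name below, including
the bit position, is a hypothetical placeholder; the sketch assumes an
option vector stored as a byte array with the most significant bit
first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names: these are not the identifiers used by the patches. */
typedef enum { DEMO_IRQ_MODE_XICS, DEMO_IRQ_MODE_XIVE } DemoIrqMode;

/* Illustrative bit position inside the option vector (an assumption). */
#define DEMO_OV_XIVE_EXPLOIT_BIT 7u

/* Test one bit of a byte-array option vector, most significant bit first. */
static bool demo_ov_test(const uint8_t *ov, size_t len, unsigned bit)
{
    assert(ov != NULL);
    if (bit / 8 >= len) {
        return false;   /* bit lies beyond the vector: treat as not set */
    }
    return (ov[bit / 8] >> (7 - bit % 8)) & 1u;
}

/*
 * Pick the interrupt model once negotiation is over: XIVE only when the
 * guest asked for it and the machine can provide it, XICS otherwise.
 */
static DemoIrqMode demo_cas_select_mode(const uint8_t *guest_ov, size_t len,
                                        bool machine_has_xive)
{
    bool guest_wants_xive =
        demo_ov_test(guest_ov, len, DEMO_OV_XIVE_EXPLOIT_BIT);

    return (guest_wants_xive && machine_has_xive) ? DEMO_IRQ_MODE_XIVE
                                                  : DEMO_IRQ_MODE_XICS;
}
```

Falling back to XICS whenever either side lacks XIVE support keeps
POWER8-style guests working unchanged in this sketch.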

Next come models for the XIVE interrupt controller, source and presenter.
We try to reuse the ICS and ICP models of XICS because the sPAPR
machine is tied to the XICSFabric interface and should be using a
common framework to be able to switch from one controller model to
another. To be discussed of course.

Then comes support for the hypervisor calls (hcalls) which are used
to configure the interrupt sources and the event/notification queues
of the guest.

Finally, the last patches try to integrate the XIVE interrupt model
into the sPAPR machine, not without a couple of serious hacks needed
to have something to test. See 'Caveats' below for more details.

This is a first draft and I expect a lot of rewriting before it
reaches mainline QEMU. Nevertheless, it compiles, boots and can be
used for some testing.

Code is here:

  https://github.com/legoater/qemu/commits/xive
  https://github.com/legoater/linux/commits/xive

Pre-compiled kernel (4.12) and initrd images can be found here:

  http://kaod.org/qemu/ppc-xive/
       
Caveats:

 - Unnecessary complexity 

   I started working on XIVE by looking at OPAL because I had the
   ambition to provide a common framework for the PowerNV and sPAPR
   machines. This is still the goal, but the XIVE support for the
   PowerNV machine will be *much* more complex and we could probably
   use something simpler for sPAPR. This is why there is some
   clumsiness in the IRQ allocator and, at the end of the patchset,
   in the IPI interrupt source.

 - Switching interrupt model after CAS. 

   We now need a way to configure the guest with the interrupt model
   negotiated in CAS.

   But, currently, the sPAPR machine makes use of the controller very
   early in the initialization sequence. The interrupt source is used
   to allocate IRQ numbers and populate the device tree, and the
   interrupt presenter objects are created along with the CPUs.

   One approach would be to support the reset of the ICP and the ICS
   objects of the guest. We could use a bitmap to allocate the IRQ
   numbers needed to populate the device tree and then instantiate the
   correct ICS with the bitmap as a parameter. The ICPs could be
   allocated later in the boot process, maybe on demand, when a CPU
   is first notified.

 - Migration not addressed

 - Hotplug not addressed

 - KVM support

   The guest needs to be run with kernel_irqchip=off on a POWER9
   system.

 - LSI

   Lightly tested.
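
To make the bitmap idea from the 'Switching interrupt model after CAS'
caveat more concrete, here is a minimal sketch, not taken from the
patchset. It assumes that IRQ numbers are handed out from a
model-independent bitmap early on, and that the interrupt source for
whichever model was negotiated is built from that bitmap later. All
type and function names are hypothetical, and the 64-entry limit is
only there to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NR_IRQS 64u    /* illustrative limit: one 64-bit word */

/* Model-independent record of which IRQ numbers are in use. */
typedef struct {
    uint64_t used;          /* bit n set => IRQ (base + n) is allocated */
    uint32_t base;          /* first IRQ number covered by the bitmap */
} DemoIrqBitmap;

static void demo_irq_bitmap_init(DemoIrqBitmap *bm, uint32_t base)
{
    assert(bm != NULL);
    bm->used = 0;
    bm->base = base;
}

/* Allocate the lowest free IRQ number, or return -1 when none is left. */
static int demo_irq_alloc(DemoIrqBitmap *bm)
{
    assert(bm != NULL);
    for (unsigned n = 0; n < DEMO_NR_IRQS; n++) {
        if (!(bm->used & (UINT64_C(1) << n))) {
            bm->used |= UINT64_C(1) << n;
            return (int)(bm->base + n);
        }
    }
    return -1;
}

static bool demo_irq_is_allocated(const DemoIrqBitmap *bm, uint32_t irq)
{
    assert(bm != NULL);
    if (irq < bm->base || irq - bm->base >= DEMO_NR_IRQS) {
        return false;
    }
    return (bm->used >> (irq - bm->base)) & 1u;
}

typedef enum { DEMO_MODEL_XICS, DEMO_MODEL_XIVE } DemoModel;

/* Stand-in for an interrupt source object of either model. */
typedef struct {
    DemoModel model;
    uint32_t base;
    bool valid[DEMO_NR_IRQS];
} DemoSource;

/* Build the source for the negotiated model from the bitmap. */
static void demo_source_init(DemoSource *src, DemoModel model,
                             const DemoIrqBitmap *bm)
{
    assert(src != NULL && bm != NULL);
    src->model = model;
    src->base = bm->base;
    for (unsigned n = 0; n < DEMO_NR_IRQS; n++) {
        src->valid[n] = (bm->used >> n) & 1u;
    }
}
```

With this split, the device tree could be populated from the bitmap
alone, before the interrupt model is known.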
   
Thanks,

C. 

Cédric Le Goater (26):
  spapr: introduce the XIVE_EXPLOIT option in CAS
  spapr: populate device tree depending on XIVE_EXPLOIT option
  target/ppc/POWER9: add POWERPC_EXCP_POWER9
  ppc/xive: introduce a skeleton for the XIVE interrupt controller model
  ppc/xive: define XIVE internal tables
  ppc/xive: introduce a XIVE interrupt source model
  ppc/xive: add MMIO handlers to the XIVE interrupt source
  ppc/xive: add flags to the XIVE interrupt source
  ppc/xive: add an overall memory region for the ESBs
  ppc/xive: record interrupt source MMIO address for hcalls
  ppc/xics: introduce a print_info() handler to the ICS and ICP objects
  ppc/xive: add a print_info() handler for the interrupt source
  ppc/xive: introduce a XIVE interrupt presenter model
  ppc/xive: add MMIO handlers to the XIVE interrupt presenter model
  ppc/xive: push EQ data in OS event queues
  ppc/xive: notify CPU when interrupt priority is more privileged
  ppc/xive: add hcalls support
  ppc/xive: add device tree support
  ppc/xive: introduce a helper to map the XIVE memory regions
  ppc/xive: introduce a helper to create XIVE interrupt source objects
  ppc/xive: introduce routines to allocate IRQ numbers
  ppc/xive: create an XIVE interrupt source to handle IPIs
  spapr: add a XIVE object to the sPAPR machine
  spapr: include the XIVE interrupt source for IPIs
  spapr: print the XIVE interrupt source for IPIs in the monitor
  spapr: force XIVE exploitation mode for POWER9 (HACK)

 default-configs/ppc64-softmmu.mak |    2 +
 hw/intc/Makefile.objs             |    2 +
 hw/intc/xics.c                    |   36 +-
 hw/intc/xive-internal.h           |  218 ++++++++
 hw/intc/xive.c                    | 1024 +++++++++++++++++++++++++++++++++++++
 hw/intc/xive_spapr.c              |  796 ++++++++++++++++++++++++++++
 hw/ppc/spapr.c                    |  141 ++++-
 include/hw/ppc/spapr.h            |   17 +-
 include/hw/ppc/spapr_ovec.h       |    1 +
 include/hw/ppc/xics.h             |    2 +
 include/hw/ppc/xive.h             |   80 +++
 target/ppc/cpu-qom.h              |    2 +
 target/ppc/excp_helper.c          |    9 +-
 target/ppc/translate.c            |    3 +-
 target/ppc/translate_init.c       |    2 +-
 15 files changed, 2306 insertions(+), 29 deletions(-)
 create mode 100644 hw/intc/xive-internal.h
 create mode 100644 hw/intc/xive.c
 create mode 100644 hw/intc/xive_spapr.c
 create mode 100644 include/hw/ppc/xive.h

-- 
2.7.5


Re: [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9)
Posted by David Gibson 8 years, 3 months ago
On Wed, Jul 05, 2017 at 07:13:13PM +0200, Cédric Le Goater wrote:
> On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
> negotiation process determines whether the guest operates with an
> interrupt controller using the XICS legacy model, as found on POWER8,
> or in XIVE exploitation mode, the newer POWER9 interrupt model. This
> patchset is a first proposal to add XIVE support in the sPAPR machine.
> 
> The first patches introduce the XIVE exploitation mode in CAS.
> 
> Follow models for the XIVE interrupt controller, source and presenter.
> We try to reuse the ICS and ICP models of XICS because the sPAPR
> machine is tied to the XICSFabric interface and should be using a
> common framework to be able to switch from one controller model to
> another. To be discussed of course.
> 
> Then comes support for the Hypervisor's call which are used to
> configure the interrupt sources and the event/notification queues of
> the guest.
> 
> Finally, the last patches try to integrate the XIVE interrupt model in
> the sPAPR machine and this not without a couple of serious hacks to
> have something to test. See 'Caveats' below for more details.
> 
> This is a first draft and I expect a lot of rewrite before it reaches
> mainline QEMU. Nevertheless, it compiles, boots and can be used for
> some testing.

1 & 2 are straightforward enough that I've applied them already.  The
rest will take longer to review, obviously.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


Re: [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9)
Posted by Cédric Le Goater 8 years, 3 months ago
On 07/10/2017 12:24 PM, David Gibson wrote:
> On Wed, Jul 05, 2017 at 07:13:13PM +0200, Cédric Le Goater wrote:
>> On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
>> negotiation process determines whether the guest operates with an
>> interrupt controller using the XICS legacy model, as found on POWER8,
>> or in XIVE exploitation mode, the newer POWER9 interrupt model. This
>> patchset is a first proposal to add XIVE support in the sPAPR machine.
>>
>> The first patches introduce the XIVE exploitation mode in CAS.
>>
>> Follow models for the XIVE interrupt controller, source and presenter.
>> We try to reuse the ICS and ICP models of XICS because the sPAPR
>> machine is tied to the XICSFabric interface and should be using a
>> common framework to be able to switch from one controller model to
>> another. To be discussed of course.
>>
>> Then comes support for the Hypervisor's call which are used to
>> configure the interrupt sources and the event/notification queues of
>> the guest.
>>
>> Finally, the last patches try to integrate the XIVE interrupt model in
>> the sPAPR machine and this not without a couple of serious hacks to
>> have something to test. See 'Caveats' below for more details.
>>
>> This is a first draft and I expect a lot of rewrite before it reaches
>> mainline QEMU. Nevertheless, it compiles, boots and can be used for
>> some testing.
> 
> 1 & 2 are straightforward enough that I've applied them already.  The
> rest will take longer to review, obviously.

For sure ... I don't expect anything soon. This is really a first
draft to show the differences with XICS in the overall mechanics.
The guest boots and performance is OK, but the integration with the
sPAPR machine is a mess. I also think the IRQ allocator is too
complex for what sPAPR needs and the XIVE ICP object is useless. The
changelogs are too short.

I have continued working on CAS support and have found a solution
which allows a guest to switch interrupt controller models
(XICS <-> XIVE), under TCG and under KVM with kernel_irqchip=off.

The XIVE ICP lives under ICPState for ease of use. As for the ICS, 
two different objects, XIVE and XICS, are maintained under the 
sPAPR machine in which the 'irqs' array needs to be synced when 
changing model. It's not too much of a hack I think and it is 
migration friendly. We will see when discussed.
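
A rough sketch of that syncing step follows, for illustration only and
not the code on the branch. It assumes each source object keeps a
per-IRQ state array and that the entries are copied from the active
object to the other one when the model changes. The structure layout,
the flag values and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NR_IRQS 8

/* Hypothetical per-IRQ flags; the real 'irqs' entries may hold more. */
#define DEMO_IRQ_VALID 0x1u     /* IRQ number has been claimed */
#define DEMO_IRQ_LSI   0x2u     /* level-sensitive interrupt */

typedef struct {
    uint8_t flags;
} DemoIrqState;

typedef struct {
    DemoIrqState irqs[DEMO_NR_IRQS];
} DemoSource;

typedef enum { DEMO_MODEL_XICS, DEMO_MODEL_XIVE } DemoModel;

/* The machine keeps both source objects; only one is active at a time. */
typedef struct {
    DemoSource xics;
    DemoSource xive;
    DemoModel active;
} DemoMachine;

static DemoSource *demo_source_for(DemoMachine *m, DemoModel model)
{
    return model == DEMO_MODEL_XIVE ? &m->xive : &m->xics;
}

/* Switch models, carrying the per-IRQ state over to the new source. */
static void demo_switch_model(DemoMachine *m, DemoModel next)
{
    assert(m != NULL);
    if (m->active == next) {
        return;                 /* nothing to sync */
    }

    DemoSource *from = demo_source_for(m, m->active);
    DemoSource *to = demo_source_for(m, next);

    for (size_t i = 0; i < DEMO_NR_IRQS; i++) {
        to->irqs[i] = from->irqs[i];
    }
    m->active = next;
}
```

Because both objects always exist in this sketch, a migration stream
could describe them the same way whichever model is active.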

I have pushed these changes to GitHub and I am now exploring the
abyssal zone of migration and CPU hotplug.

Cheers,

C.


Re: [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9)
Posted by David Gibson 8 years, 3 months ago
On Wed, Jul 05, 2017 at 07:13:13PM +0200, Cédric Le Goater wrote:
> On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
> negotiation process determines whether the guest operates with an
> interrupt controller using the XICS legacy model, as found on POWER8,
> or in XIVE exploitation mode, the newer POWER9 interrupt model. This
> patchset is a first proposal to add XIVE support in the sPAPR machine.
> 
> The first patches introduce the XIVE exploitation mode in CAS.
> 
> Follow models for the XIVE interrupt controller, source and presenter.
> We try to reuse the ICS and ICP models of XICS because the sPAPR
> machine is tied to the XICSFabric interface and should be using a
> common framework to be able to switch from one controller model to
> another. To be discussed of course.
> 
> Then comes support for the Hypervisor's call which are used to
> configure the interrupt sources and the event/notification queues of
> the guest.
> 
> Finally, the last patches try to integrate the XIVE interrupt model in
> the sPAPR machine and this not without a couple of serious hacks to
> have something to test. See 'Caveats' below for more details.
> 
> This is a first draft and I expect a lot of rewrite before it reaches
> mainline QEMU. Nevertheless, it compiles, boots and can be used for
> some testing.

So, this is probably obvious, but I'm not considering this a candidate
for qemu 2.10 (seeing as the soft freeze was yesterday).  I'll still
try to review and, once ready, queue for 2.11.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


Re: [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9)
Posted by Benjamin Herrenschmidt 8 years, 3 months ago
On Wed, 2017-07-19 at 13:00 +1000, David Gibson wrote:
> So, this is probably obvious, but I'm not considering this a candidate
> for qemu 2.10 (seeing as the soft freeze was yesterday).  I'll still
> try to review and, once ready, queue for 2.11.

Right. I need to review still and we need to make sure we have the
right plumbing for migration etc... and of course I need to do the
KVM bits. So it's definitely not 2.10 material.

Cheers,
Ben.


Re: [Qemu-devel] [RFC PATCH 00/26] guest exploitation of the XIVE interrupt controller (POWER9)
Posted by Cédric Le Goater 8 years, 3 months ago
On 07/19/2017 05:55 AM, Benjamin Herrenschmidt wrote:
> On Wed, 2017-07-19 at 13:00 +1000, David Gibson wrote:
>> So, this is probably obvious, but I'm not considering this a candidate
>> for qemu 2.10 (seeing as the soft freeze was yesterday).  I'll still
>> try to review and, once ready, queue for 2.11.
> 
> Right. I need to review still and we need to make sure we have the
> right plumbing for migration etc... and of course I need to do the
> KVM bits. So it's definitely not 2.10 material.

Yes. This is clearly not for 2.10. This is just an RFC to get
some feedback on the approach and on some ugly hacks I have
put in place.

I have given KVM a quick look and it should be addressed before
we start merging anything. I think PowerNV should wait a bit. 

As for TCG, my branch supports reset, changing model XICS <-> XIVE,
migration and CPU hotplug. KVM with kernel_irqchip=off is also
supported. Most of the issues have a solution, but now we need to
discuss them.

I was out last week. Catching up.

Thanks,

C.