docs/index.rst | 1 + docs/specs/index.rst | 13 +++ docs/specs/ppc-spapr-xive.rst | 174 +++++++++++++++++++++++++++++ docs/specs/ppc-xive.rst | 199 ++++++++++++++++++++++++++++++++++ MAINTAINERS | 1 + 5 files changed, 388 insertions(+) create mode 100644 docs/specs/index.rst create mode 100644 docs/specs/ppc-spapr-xive.rst create mode 100644 docs/specs/ppc-xive.rst
This documents the overall XIVE architecture and the XIVE support for
sPAPR guest machines (pseries).
It also provides documentation on the 'info pic' command.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
Changes since v2:
- fixed typos.
Changes since v1:
- reorganized into different files and directories. I don't think the
'info pic' documentation belongs to 'interop' nor 'devel' and so
the ppc-spapr-xive.rst file seemed like the best place for it.
docs/index.rst | 1 +
docs/specs/index.rst | 13 +++
docs/specs/ppc-spapr-xive.rst | 174 +++++++++++++++++++++++++++++
docs/specs/ppc-xive.rst | 199 ++++++++++++++++++++++++++++++++++
MAINTAINERS | 1 +
5 files changed, 388 insertions(+)
create mode 100644 docs/specs/index.rst
create mode 100644 docs/specs/ppc-spapr-xive.rst
create mode 100644 docs/specs/ppc-xive.rst
diff --git a/docs/index.rst b/docs/index.rst
index 3690955dd1f5..baa5791c174b 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -12,4 +12,5 @@ Welcome to QEMU's documentation!
interop/index
devel/index
+ specs/index
diff --git a/docs/specs/index.rst b/docs/specs/index.rst
new file mode 100644
index 000000000000..2e927519c2e7
--- /dev/null
+++ b/docs/specs/index.rst
@@ -0,0 +1,13 @@
+. This is the top level page for the 'specs' manual
+
+
+QEMU full-system emulation guest hardware specifications
+========================================================
+
+
+Contents:
+
+.. toctree::
+ :maxdepth: 2
+
+ xive
diff --git a/docs/specs/ppc-spapr-xive.rst b/docs/specs/ppc-spapr-xive.rst
new file mode 100644
index 000000000000..539ce7ca4e90
--- /dev/null
+++ b/docs/specs/ppc-spapr-xive.rst
@@ -0,0 +1,174 @@
+XIVE for sPAPR (pseries machines)
+=================================
+
+The POWER9 processor comes with a new interrupt controller
+architecture, called XIVE as "eXternal Interrupt Virtualization
+Engine". It supports a larger number of interrupt sources and offers
+virtualization features which enables the HW to deliver interrupts
+directly to virtual processors without hypervisor assistance.
+
+A QEMU ``pseries`` machine (which is PAPR compliant) using POWER9
+processors can run under two interrupt modes:
+
+- *Legacy Compatibility Mode*
+
+ the hypervisor provides identical interfaces and similar
+ functionality to PAPR+ Version 2.7. This is the default mode
+
+ It is also referred as *XICS* in QEMU.
+
+- *XIVE native exploitation mode*
+
+ the hypervisor provides new interfaces to manage the XIVE control
+ structures, and provides direct control for interrupt management
+ through MMIO pages.
+
+Which interrupt modes can be used by the machine is negotiated with
+the guest O/S during the Client Architecture Support negotiation
+sequence. The two modes are mutually exclusive.
+
+Both interrupt mode share the same IRQ number space. See below for the
+layout.
+
+CAS Negotiation
+---------------
+
+QEMU advertises the supported interrupt modes in the device tree
+property "ibm,arch-vec-5-platform-support" in byte 23 and the OS
+Selection for XIVE is indicated in the "ibm,architecture-vec-5"
+property byte 23.
+
+The interrupt modes supported by the machine depend on the CPU type
+(POWER9 is required for XIVE) but also on the machine property
+``ic-mode`` which can be set on the command line. It can take the
+following values: ``xics``, ``xive``, ``dual`` and currently ``xics``
+is the default but it may change in the future.
+
+The choosen interrupt mode is activated after a reconfiguration done
+in a machine reset.
+
+XIVE Device tree properties
+---------------------------
+
+The properties for the PAPR interrupt controller node when the *XIVE
+native exploitation mode* is selected shoud contain:
+
+- ``device_type``
+
+ value should be "power-ivpe".
+
+- ``compatible``
+
+ value should be "ibm,power-ivpe".
+
+- ``reg``
+
+ contains the base address and size of the thread interrupt
+ managnement areas (TIMA), for the User level and for the Guest OS
+ level. Only the Guest OS level is taken into account today.
+
+- ``ibm,xive-eq-sizes``
+
+ the size of the event queues. One cell per size supported, contains
+ log2 of size, in ascending order.
+
+- ``ibm,xive-lisn-ranges``
+
+ the IRQ interrupt number ranges assigned to the guest for the IPIs.
+
+The root node also exports :
+
+- ``ibm,plat-res-int-priorities``
+
+ contains a list of priorities that the hypervisor has reserved for
+ its own use.
+
+IRQ number space
+----------------
+
+IRQ Number space of the ``pseries`` machine is 8K wide and is the same
+for both interrupt mode. The different ranges are defined as follow :
+
+- ``0x0000 .. 0x0FFF`` 4K CPU IPIs (only used under XIVE)
+- ``0x1000 .. 0x1000`` 1 EPOW
+- ``0x1001 .. 0x1001`` 1 HOTPLUG
+- ``0x1100 .. 0x11FF`` 256 VIO devices
+- ``0x1200 .. 0x127F`` 32 PHBs devices
+- ``0x1280 .. 0x12FF`` unused
+- ``0x1300 .. 0x1FFF`` PHB MSIs
+
+Monitoring XIVE
+---------------
+
+The state of the XIVE interrupt controller can be queried through the
+monitor commands ``info pic``. The output comes in two parts.
+
+First, the state of the thread interrupt context registers is dumped
+for each CPU :
+
+::
+
+ (qemu) info pic
+ CPU[0000]: QW NSR CPPR IPB LSMFB ACK# INC AGE PIPR W2
+ CPU[0000]: USER 00 00 00 00 00 00 00 00 00000000
+ CPU[0000]: OS 00 ff 00 00 ff 00 ff ff 80000400
+ CPU[0000]: POOL 00 00 00 00 00 00 00 00 00000000
+ CPU[0000]: PHYS 00 00 00 00 00 00 00 ff 00000000
+ ...
+
+In the case of a ``pseries`` machine, QEMU acts as the hypervisor and only
+the O/S and USER register rings make sense. ``W2`` contains the vCPU CAM
+line which is set to the VP identifier.
+
+Then comes the routing information which aggregates the EAS and the
+END configuration:
+
+::
+
+ ...
+ LISN PQ EISN CPU/PRIO EQ
+ 00000000 MSI -- 00000010 0/6 380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
+ 00000001 MSI -- 00000010 1/6 305/16384 @1fc230000 ^1 [ 80000010 ... ]
+ 00000002 MSI -- 00000010 2/6 220/16384 @1fc2f0000 ^1 [ 80000010 ... ]
+ 00000003 MSI -- 00000010 3/6 201/16384 @1fc390000 ^1 [ 80000010 ... ]
+ 00000004 MSI -Q M 00000000
+ 00000005 MSI -Q M 00000000
+ 00000006 MSI -Q M 00000000
+ 00000007 MSI -Q M 00000000
+ 00001000 MSI -- 00000012 0/6 380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
+ 00001001 MSI -- 00000013 0/6 380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
+ 00001100 MSI -- 00000100 1/6 305/16384 @1fc230000 ^1 [ 80000010 ... ]
+ 00001101 MSI -Q M 00000000
+ 00001200 LSI -Q M 00000000
+ 00001201 LSI -Q M 00000000
+ 00001202 LSI -Q M 00000000
+ 00001203 LSI -Q M 00000000
+ 00001300 MSI -- 00000102 1/6 305/16384 @1fc230000 ^1 [ 80000010 ... ]
+ 00001301 MSI -- 00000103 2/6 220/16384 @1fc2f0000 ^1 [ 80000010 ... ]
+ 00001302 MSI -- 00000104 3/6 201/16384 @1fc390000 ^1 [ 80000010 ... ]
+
+The source information and configuration:
+
+- The ``LISN`` column outputs the interrupt number of the source in
+ range ``[ 0x0 ... 0x1FFF ]`` and its type : ``MSI`` or ``LSI``
+- The ``PQ`` column reflects the state of the PQ bits of the source :
+
+ - ``--`` source is ready to take events
+ - ``P-`` an event was sent and an EOI is PENDING
+ - ``PQ`` an event was QUEUED
+ - ``-Q`` source is OFF
+
+ a ``M`` indicates that source is *MASKED* at the EAS level,
+
+The targeting configuration :
+
+- The ``EISN`` column is the event data that will be queued in the event
+ queue of the O/S.
+- The ``CPU/PRIO`` column is the tuple defining the CPU number and
+ priority queue serving the source.
+- The ``EQ`` column outputs :
+
+ - the current index of the event queue/ the max number of entries
+ - the O/S event queue address
+ - the toggle bit
+ - the last entries that were pushed in the event queue.
diff --git a/docs/specs/ppc-xive.rst b/docs/specs/ppc-xive.rst
new file mode 100644
index 000000000000..b997dc062910
--- /dev/null
+++ b/docs/specs/ppc-xive.rst
@@ -0,0 +1,199 @@
+================================
+POWER9 XIVE interrupt controller
+================================
+
+The POWER9 processor comes with a new interrupt controller
+architecture, called XIVE as "eXternal Interrupt Virtualization
+Engine".
+
+Compared to the previous architecture, the main characteristics of
+XIVE are to support a larger number of interrupt sources and to
+deliver interrupts directly to virtual processors without hypervisor
+assistance. This removes the context switches required for the
+delivery process.
+
+
+XIVE architecture
+=================
+
+The XIVE IC is composed of three sub-engines, each taking care of a
+processing layer of external interrupts:
+
+- Interrupt Virtualization Source Engine (IVSE), or Source Controller
+ (SC). These are found in PCI PHBs, in the PSI host bridge
+ controller, but also inside the main controller for the core IPIs
+ and other sub-chips (NX, CAP, NPU) of the chip/processor. They are
+ configured to feed the IVRE with events.
+- Interrupt Virtualization Routing Engine (IVRE) or Virtualization
+ Controller (VC). It handles event coalescing and perform interrupt
+ routing by matching an event source number with an Event
+ Notification Descriptor (END).
+- Interrupt Virtualization Presentation Engine (IVPE) or Presentation
+ Controller (PC). It maintains the interrupt context state of each
+ thread and handles the delivery of the external interrupt to the
+ thread.
+
+::
+
+ XIVE Interrupt Controller
+ +------------------------------------+ IPIs
+ | +---------+ +---------+ +--------+ | +-------+
+ | |IVRE | |Common Q | |IVPE |----> | CORES |
+ | | esb | | | | |----> | |
+ | | eas | | Bridge | | tctx |----> | |
+ | |SC end | | | | nvt | | | |
+ +------+ | +---------+ +----+----+ +--------+ | +-+-+-+-+
+ | RAM | +------------------|-----------------+ | | |
+ | | | | | |
+ | | | | | |
+ | | +--------------------v------------------------v-v-v--+ other
+ | <--+ Power Bus +--> chips
+ | esb | +---------+-----------------------+------------------+
+ | eas | | |
+ | end | +--|------+ |
+ | nvt | +----+----+ | +----+----+
+ +------+ |IVSE | | |IVSE |
+ | | | | |
+ | PQ-bits | | | PQ-bits |
+ | local |-+ | in VC |
+ +---------+ +---------+
+ PCIe NX,NPU,CAPI
+
+
+ PQ-bits: 2 bits source state machine (P:pending Q:queued)
+ esb: Event State Buffer (Array of PQ bits in an IVSE)
+ eas: Event Assignment Structure
+ end: Event Notification Descriptor
+ nvt: Notification Virtual Target
+ tctx: Thread interrupt Context registers
+
+
+
+XIVE internal tables
+--------------------
+
+Each of the sub-engines uses a set of tables to redirect interrupts
+from event sources to CPU threads.
+
+::
+
+ +-------+
+ User or O/S | EQ |
+ or +------>|entries|
+ Hypervisor | | .. |
+ Memory | +-------+
+ | ^
+ | |
+ +-------------------------------------------------+
+ | |
+ Hypervisor +------+ +---+--+ +---+--+ +------+
+ Memory | ESB | | EAT | | ENDT | | NVTT |
+ (skiboot) +----+-+ +----+-+ +----+-+ +------+
+ ^ | ^ | ^ | ^
+ | | | | | | |
+ +-------------------------------------------------+
+ | | | | | | |
+ | | | | | | |
+ +----|--|--------|--|--------|--|-+ +-|-----+ +------+
+ | | | | | | | | | | tctx| |Thread|
+ IPI or ---+ + v + v + v |---| + .. |-----> |
+ HW events | | | | | |
+ | IVRE | | IVPE | +------+
+ +---------------------------------+ +-------+
+
+
+The IVSE have a 2-bits state machine, P for pending and Q for queued,
+for each source that allows events to be triggered. They are stored in
+an Event State Buffer (ESB) array and can be controlled by MMIOs.
+
+If the event is let through, the IVRE looks up in the Event Assignment
+Structure (EAS) table for an Event Notification Descriptor (END)
+configured for the source. Each Event Notification Descriptor defines
+a notification path to a CPU and an in-memory Event Queue, in which
+will be enqueued an EQ data for the O/S to pull.
+
+The IVPE determines if a Notification Virtual Target (NVT) can handle
+the event by scanning the thread contexts of the VCPUs dispatched on
+the processor HW threads. It maintains the interrupt context state of
+each thread in a NVT table.
+
+XIVE thread interrupt context
+-----------------------------
+
+The XIVE presenter can generate four different exceptions to its
+HW threads:
+
+- hypervisor exception
+- O/S exception
+- Event-Based Branch (user level)
+- msgsnd (doorbell)
+
+Each exception has a state independent from the others called a Thread
+Interrupt Management context. This context is a set of registers which
+lets the thread handle priority management and interrupt
+acknowledgment among other things. The most important ones being :
+
+- Interrupt Priority Register (PIPR)
+- Interrupt Pending Buffer (IPB)
+- Current Processor Priority (CPPR)
+- Notification Source Register (NSR)
+
+TIMA
+~~~~
+
+The Thread Interrupt Management registers are accessible through a
+specific MMIO region, called the Thread Interrupt Management Area
+(TIMA), four aligned pages, each exposing a different view of the
+registers. First page (page address ending in ``0b00``) gives access
+to the entire context and is reserved for the ring 0 view for the
+physical thread context. The second (page address ending in ``0b01``)
+is for the hypervisor, ring 1 view. The third (page address ending in
+``0b10``) is for the operating system, ring 2 view. The fourth (page
+address ending in ``0b11``) is for user level, ring 3 view.
+
+Interrupt flow from an O/S perspective
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+After an event data has been enqueued in the O/S Event Queue, the IVPE
+raises the bit corresponding to the priority of the pending interrupt
+in the register IBP (Interrupt Pending Buffer) to indicate that an
+event is pending in one of the 8 priority queues. The Pending
+Interrupt Priority Register (PIPR) is also updated using the IPB. This
+register represent the priority of the most favored pending
+notification.
+
+The PIPR is then compared to the the Current Processor Priority
+Register (CPPR). If it is more favored (numerically less than), the
+CPU interrupt line is raised and the EO bit of the Notification Source
+Register (NSR) is updated to notify the presence of an exception for
+the O/S. The O/S acknowledges the interrupt with a special load in the
+Thread Interrupt Management Area.
+
+The O/S handles the interrupt and when done, performs an EOI using a
+MMIO operation on the ESB management page of the associate source.
+
+Overview of the QEMU models for XIVE
+====================================
+
+The XiveSource models the IVSE in general, internal and external. It
+handles the source ESBs and the MMIO interface to control them.
+
+The XiveNotifier is a small helper interface interconnecting the
+XiveSource to the XiveRouter.
+
+The XiveRouter is an abstract model acting as a combined IVRE and
+IVPE. It routes event notifications using the EAS and END tables to
+the IVPE sub-engine which does a CAM scan to find a CPU to deliver the
+exception. Storage should be provided by the inheriting classes.
+
+XiveEnDSource is a special source object. It exposes the END ESB MMIOs
+of the Event Queues which are used for coalescing event notifications
+and for escalation. Not used on the field, only to sync the EQ cache
+in OPAL.
+
+Finally, the XiveTCTX contains the interrupt state context of a thread,
+four sets of registers, one for each exception that can be delivered
+to a CPU. These contexts are scanned by the IVPE to find a matching VP
+when a notification is triggered. It also models the Thread Interrupt
+Management Area (TIMA), which exposes the thread context registers to
+the CPU for interrupt management.
diff --git a/MAINTAINERS b/MAINTAINERS
index 9424a490d647..82ac0415b493 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1708,6 +1708,7 @@ L: qemu-ppc@nongnu.org
S: Supported
F: hw/*/*xive*
F: include/hw/*/*xive*
+F: docs/*/*xive*
Subsystems
----------
--
2.20.1
On Tue, May 21, 2019 at 10:24:11AM +0200, Cédric Le Goater wrote: > This documents the overall XIVE architecture and the XIVE support for > sPAPR guest machines (pseries). > > It also provides documentation on the 'info pic' command. > > Signed-off-by: Cédric Le Goater <clg@kaod.org> > --- > > Changes since v2: > > - fixed typos. > > Changes since v1: > > - reorganized into different files and directories. I don't think the > 'info pic' documentation belongs to 'interop' nor 'devel' and so > the ppc-spapr-xive.rst file seemed like the best place for it. > > docs/index.rst | 1 + > docs/specs/index.rst | 13 +++ > docs/specs/ppc-spapr-xive.rst | 174 +++++++++++++++++++++++++++++ > docs/specs/ppc-xive.rst | 199 ++++++++++++++++++++++++++++++++++ > MAINTAINERS | 1 + > 5 files changed, 388 insertions(+) > create mode 100644 docs/specs/index.rst > create mode 100644 docs/specs/ppc-spapr-xive.rst > create mode 100644 docs/specs/ppc-xive.rst Reviewed-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> > > diff --git a/docs/index.rst b/docs/index.rst > index 3690955dd1f5..baa5791c174b 100644 > --- a/docs/index.rst > +++ b/docs/index.rst > @@ -12,4 +12,5 @@ Welcome to QEMU's documentation! > > interop/index > devel/index > + specs/index > > diff --git a/docs/specs/index.rst b/docs/specs/index.rst > new file mode 100644 > index 000000000000..2e927519c2e7 > --- /dev/null > +++ b/docs/specs/index.rst > @@ -0,0 +1,13 @@ > +. This is the top level page for the 'specs' manual > + > + > +QEMU full-system emulation guest hardware specifications > +======================================================== > + > + > +Contents: > + > +.. toctree:: > + :maxdepth: 2 > + > + xive > diff --git a/docs/specs/ppc-spapr-xive.rst b/docs/specs/ppc-spapr-xive.rst > new file mode 100644 > index 000000000000..539ce7ca4e90 > --- /dev/null > +++ b/docs/specs/ppc-spapr-xive.rst > @@ -0,0 +1,174 @@ > +XIVE for sPAPR (pseries machines) > +================================= > + > +The POWER9 processor comes with a new interrupt controller > +architecture, called XIVE as "eXternal Interrupt Virtualization > +Engine". It supports a larger number of interrupt sources and offers > +virtualization features which enables the HW to deliver interrupts > +directly to virtual processors without hypervisor assistance. > + > +A QEMU ``pseries`` machine (which is PAPR compliant) using POWER9 > +processors can run under two interrupt modes: > + > +- *Legacy Compatibility Mode* > + > + the hypervisor provides identical interfaces and similar > + functionality to PAPR+ Version 2.7. This is the default mode > + > + It is also referred as *XICS* in QEMU. > + > +- *XIVE native exploitation mode* > + > + the hypervisor provides new interfaces to manage the XIVE control > + structures, and provides direct control for interrupt management > + through MMIO pages. > + > +Which interrupt modes can be used by the machine is negotiated with > +the guest O/S during the Client Architecture Support negotiation > +sequence. The two modes are mutually exclusive. > + > +Both interrupt mode share the same IRQ number space. See below for the > +layout. > + > +CAS Negotiation > +--------------- > + > +QEMU advertises the supported interrupt modes in the device tree > +property "ibm,arch-vec-5-platform-support" in byte 23 and the OS > +Selection for XIVE is indicated in the "ibm,architecture-vec-5" > +property byte 23. > + > +The interrupt modes supported by the machine depend on the CPU type > +(POWER9 is required for XIVE) but also on the machine property > +``ic-mode`` which can be set on the command line. It can take the > +following values: ``xics``, ``xive``, ``dual`` and currently ``xics`` > +is the default but it may change in the future. > + > +The choosen interrupt mode is activated after a reconfiguration done > +in a machine reset. > + > +XIVE Device tree properties > +--------------------------- > + > +The properties for the PAPR interrupt controller node when the *XIVE > +native exploitation mode* is selected shoud contain: > + > +- ``device_type`` > + > + value should be "power-ivpe". > + > +- ``compatible`` > + > + value should be "ibm,power-ivpe". > + > +- ``reg`` > + > + contains the base address and size of the thread interrupt > + managnement areas (TIMA), for the User level and for the Guest OS > + level. Only the Guest OS level is taken into account today. > + > +- ``ibm,xive-eq-sizes`` > + > + the size of the event queues. One cell per size supported, contains > + log2 of size, in ascending order. > + > +- ``ibm,xive-lisn-ranges`` > + > + the IRQ interrupt number ranges assigned to the guest for the IPIs. > + > +The root node also exports : > + > +- ``ibm,plat-res-int-priorities`` > + > + contains a list of priorities that the hypervisor has reserved for > + its own use. > + > +IRQ number space > +---------------- > + > +IRQ Number space of the ``pseries`` machine is 8K wide and is the same > +for both interrupt mode. The different ranges are defined as follow : > + > +- ``0x0000 .. 0x0FFF`` 4K CPU IPIs (only used under XIVE) > +- ``0x1000 .. 0x1000`` 1 EPOW > +- ``0x1001 .. 0x1001`` 1 HOTPLUG > +- ``0x1100 .. 0x11FF`` 256 VIO devices > +- ``0x1200 .. 0x127F`` 32 PHBs devices > +- ``0x1280 .. 0x12FF`` unused > +- ``0x1300 .. 0x1FFF`` PHB MSIs > + > +Monitoring XIVE > +--------------- > + > +The state of the XIVE interrupt controller can be queried through the > +monitor commands ``info pic``. The output comes in two parts. > + > +First, the state of the thread interrupt context registers is dumped > +for each CPU : > + > +:: > + > + (qemu) info pic > + CPU[0000]: QW NSR CPPR IPB LSMFB ACK# INC AGE PIPR W2 > + CPU[0000]: USER 00 00 00 00 00 00 00 00 00000000 > + CPU[0000]: OS 00 ff 00 00 ff 00 ff ff 80000400 > + CPU[0000]: POOL 00 00 00 00 00 00 00 00 00000000 > + CPU[0000]: PHYS 00 00 00 00 00 00 00 ff 00000000 > + ... > + > +In the case of a ``pseries`` machine, QEMU acts as the hypervisor and only > +the O/S and USER register rings make sense. ``W2`` contains the vCPU CAM > +line which is set to the VP identifier. > + > +Then comes the routing information which aggregates the EAS and the > +END configuration: > + > +:: > + > + ... > + LISN PQ EISN CPU/PRIO EQ > + 00000000 MSI -- 00000010 0/6 380/16384 @1fe3e0000 ^1 [ 80000010 ... ] > + 00000001 MSI -- 00000010 1/6 305/16384 @1fc230000 ^1 [ 80000010 ... ] > + 00000002 MSI -- 00000010 2/6 220/16384 @1fc2f0000 ^1 [ 80000010 ... ] > + 00000003 MSI -- 00000010 3/6 201/16384 @1fc390000 ^1 [ 80000010 ... ] > + 00000004 MSI -Q M 00000000 > + 00000005 MSI -Q M 00000000 > + 00000006 MSI -Q M 00000000 > + 00000007 MSI -Q M 00000000 > + 00001000 MSI -- 00000012 0/6 380/16384 @1fe3e0000 ^1 [ 80000010 ... ] > + 00001001 MSI -- 00000013 0/6 380/16384 @1fe3e0000 ^1 [ 80000010 ... ] > + 00001100 MSI -- 00000100 1/6 305/16384 @1fc230000 ^1 [ 80000010 ... ] > + 00001101 MSI -Q M 00000000 > + 00001200 LSI -Q M 00000000 > + 00001201 LSI -Q M 00000000 > + 00001202 LSI -Q M 00000000 > + 00001203 LSI -Q M 00000000 > + 00001300 MSI -- 00000102 1/6 305/16384 @1fc230000 ^1 [ 80000010 ... ] > + 00001301 MSI -- 00000103 2/6 220/16384 @1fc2f0000 ^1 [ 80000010 ... ] > + 00001302 MSI -- 00000104 3/6 201/16384 @1fc390000 ^1 [ 80000010 ... ] > + > +The source information and configuration: > + > +- The ``LISN`` column outputs the interrupt number of the source in > + range ``[ 0x0 ... 0x1FFF ]`` and its type : ``MSI`` or ``LSI`` > +- The ``PQ`` column reflects the state of the PQ bits of the source : > + > + - ``--`` source is ready to take events > + - ``P-`` an event was sent and an EOI is PENDING > + - ``PQ`` an event was QUEUED > + - ``-Q`` source is OFF > + > + a ``M`` indicates that source is *MASKED* at the EAS level, > + > +The targeting configuration : > + > +- The ``EISN`` column is the event data that will be queued in the event > + queue of the O/S. > +- The ``CPU/PRIO`` column is the tuple defining the CPU number and > + priority queue serving the source. > +- The ``EQ`` column outputs : > + > + - the current index of the event queue/ the max number of entries > + - the O/S event queue address > + - the toggle bit > + - the last entries that were pushed in the event queue. > diff --git a/docs/specs/ppc-xive.rst b/docs/specs/ppc-xive.rst > new file mode 100644 > index 000000000000..b997dc062910 > --- /dev/null > +++ b/docs/specs/ppc-xive.rst > @@ -0,0 +1,199 @@ > +================================ > +POWER9 XIVE interrupt controller > +================================ > + > +The POWER9 processor comes with a new interrupt controller > +architecture, called XIVE as "eXternal Interrupt Virtualization > +Engine". > + > +Compared to the previous architecture, the main characteristics of > +XIVE are to support a larger number of interrupt sources and to > +deliver interrupts directly to virtual processors without hypervisor > +assistance. This removes the context switches required for the > +delivery process. > + > + > +XIVE architecture > +================= > + > +The XIVE IC is composed of three sub-engines, each taking care of a > +processing layer of external interrupts: > + > +- Interrupt Virtualization Source Engine (IVSE), or Source Controller > + (SC). These are found in PCI PHBs, in the PSI host bridge > + controller, but also inside the main controller for the core IPIs > + and other sub-chips (NX, CAP, NPU) of the chip/processor. They are > + configured to feed the IVRE with events. > +- Interrupt Virtualization Routing Engine (IVRE) or Virtualization > + Controller (VC). It handles event coalescing and perform interrupt > + routing by matching an event source number with an Event > + Notification Descriptor (END). > +- Interrupt Virtualization Presentation Engine (IVPE) or Presentation > + Controller (PC). It maintains the interrupt context state of each > + thread and handles the delivery of the external interrupt to the > + thread. > + > +:: > + > + XIVE Interrupt Controller > + +------------------------------------+ IPIs > + | +---------+ +---------+ +--------+ | +-------+ > + | |IVRE | |Common Q | |IVPE |----> | CORES | > + | | esb | | | | |----> | | > + | | eas | | Bridge | | tctx |----> | | > + | |SC end | | | | nvt | | | | > + +------+ | +---------+ +----+----+ +--------+ | +-+-+-+-+ > + | RAM | +------------------|-----------------+ | | | > + | | | | | | > + | | | | | | > + | | +--------------------v------------------------v-v-v--+ other > + | <--+ Power Bus +--> chips > + | esb | +---------+-----------------------+------------------+ > + | eas | | | > + | end | +--|------+ | > + | nvt | +----+----+ | +----+----+ > + +------+ |IVSE | | |IVSE | > + | | | | | > + | PQ-bits | | | PQ-bits | > + | local |-+ | in VC | > + +---------+ +---------+ > + PCIe NX,NPU,CAPI > + > + > + PQ-bits: 2 bits source state machine (P:pending Q:queued) > + esb: Event State Buffer (Array of PQ bits in an IVSE) > + eas: Event Assignment Structure > + end: Event Notification Descriptor > + nvt: Notification Virtual Target > + tctx: Thread interrupt Context registers > + > + > + > +XIVE internal tables > +-------------------- > + > +Each of the sub-engines uses a set of tables to redirect interrupts > +from event sources to CPU threads. > + > +:: > + > + +-------+ > + User or O/S | EQ | > + or +------>|entries| > + Hypervisor | | .. | > + Memory | +-------+ > + | ^ > + | | > + +-------------------------------------------------+ > + | | > + Hypervisor +------+ +---+--+ +---+--+ +------+ > + Memory | ESB | | EAT | | ENDT | | NVTT | > + (skiboot) +----+-+ +----+-+ +----+-+ +------+ > + ^ | ^ | ^ | ^ > + | | | | | | | > + +-------------------------------------------------+ > + | | | | | | | > + | | | | | | | > + +----|--|--------|--|--------|--|-+ +-|-----+ +------+ > + | | | | | | | | | | tctx| |Thread| > + IPI or ---+ + v + v + v |---| + .. |-----> | > + HW events | | | | | | > + | IVRE | | IVPE | +------+ > + +---------------------------------+ +-------+ > + > + > +The IVSE have a 2-bits state machine, P for pending and Q for queued, > +for each source that allows events to be triggered. They are stored in > +an Event State Buffer (ESB) array and can be controlled by MMIOs. > + > +If the event is let through, the IVRE looks up in the Event Assignment > +Structure (EAS) table for an Event Notification Descriptor (END) > +configured for the source. Each Event Notification Descriptor defines > +a notification path to a CPU and an in-memory Event Queue, in which > +will be enqueued an EQ data for the O/S to pull. > + > +The IVPE determines if a Notification Virtual Target (NVT) can handle > +the event by scanning the thread contexts of the VCPUs dispatched on > +the processor HW threads. It maintains the interrupt context state of > +each thread in a NVT table. > + > +XIVE thread interrupt context > +----------------------------- > + > +The XIVE presenter can generate four different exceptions to its > +HW threads: > + > +- hypervisor exception > +- O/S exception > +- Event-Based Branch (user level) > +- msgsnd (doorbell) > + > +Each exception has a state independent from the others called a Thread > +Interrupt Management context. This context is a set of registers which > +lets the thread handle priority management and interrupt > +acknowledgment among other things. The most important ones being : > + > +- Interrupt Priority Register (PIPR) > +- Interrupt Pending Buffer (IPB) > +- Current Processor Priority (CPPR) > +- Notification Source Register (NSR) > + > +TIMA > +~~~~ > + > +The Thread Interrupt Management registers are accessible through a > +specific MMIO region, called the Thread Interrupt Management Area > +(TIMA), four aligned pages, each exposing a different view of the > +registers. First page (page address ending in ``0b00``) gives access > +to the entire context and is reserved for the ring 0 view for the > +physical thread context. The second (page address ending in ``0b01``) > +is for the hypervisor, ring 1 view. The third (page address ending in > +``0b10``) is for the operating system, ring 2 view. The fourth (page > +address ending in ``0b11``) is for user level, ring 3 view. > + > +Interrupt flow from an O/S perspective > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +After an event data has been enqueued in the O/S Event Queue, the IVPE > +raises the bit corresponding to the priority of the pending interrupt > +in the register IBP (Interrupt Pending Buffer) to indicate that an > +event is pending in one of the 8 priority queues. The Pending > +Interrupt Priority Register (PIPR) is also updated using the IPB. This > +register represent the priority of the most favored pending > +notification. > + > +The PIPR is then compared to the the Current Processor Priority > +Register (CPPR). If it is more favored (numerically less than), the > +CPU interrupt line is raised and the EO bit of the Notification Source > +Register (NSR) is updated to notify the presence of an exception for > +the O/S. The O/S acknowledges the interrupt with a special load in the > +Thread Interrupt Management Area. > + > +The O/S handles the interrupt and when done, performs an EOI using a > +MMIO operation on the ESB management page of the associate source. > + > +Overview of the QEMU models for XIVE > +==================================== > + > +The XiveSource models the IVSE in general, internal and external. It > +handles the source ESBs and the MMIO interface to control them. > + > +The XiveNotifier is a small helper interface interconnecting the > +XiveSource to the XiveRouter. > + > +The XiveRouter is an abstract model acting as a combined IVRE and > +IVPE. It routes event notifications using the EAS and END tables to > +the IVPE sub-engine which does a CAM scan to find a CPU to deliver the > +exception. Storage should be provided by the inheriting classes. > + > +XiveEnDSource is a special source object. It exposes the END ESB MMIOs > +of the Event Queues which are used for coalescing event notifications > +and for escalation. Not used on the field, only to sync the EQ cache > +in OPAL. > + > +Finally, the XiveTCTX contains the interrupt state context of a thread, > +four sets of registers, one for each exception that can be delivered > +to a CPU. These contexts are scanned by the IVPE to find a matching VP > +when a notification is triggered. It also models the Thread Interrupt > +Management Area (TIMA), which exposes the thread context registers to > +the CPU for interrupt management. > diff --git a/MAINTAINERS b/MAINTAINERS > index 9424a490d647..82ac0415b493 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -1708,6 +1708,7 @@ L: qemu-ppc@nongnu.org > S: Supported > F: hw/*/*xive* > F: include/hw/*/*xive* > +F: docs/*/*xive* > > Subsystems > ---------- > -- > 2.20.1 > >
On Tue, 21 May 2019 10:24:11 +0200 Cédric Le Goater <clg@kaod.org> wrote: > This documents the overall XIVE architecture and the XIVE support for > sPAPR guest machines (pseries). > > It also provides documentation on the 'info pic' command. > > Signed-off-by: Cédric Le Goater <clg@kaod.org> > --- Reviewed-by: Greg Kurz <groug@kaod.org> > > Changes since v2: > > - fixed typos. > > Changes since v1: > > - reorganized into different files and directories. I don't think the > 'info pic' documentation belongs to 'interop' nor 'devel' and so > the ppc-spapr-xive.rst file seemed like the best place for it. > > docs/index.rst | 1 + > docs/specs/index.rst | 13 +++ > docs/specs/ppc-spapr-xive.rst | 174 +++++++++++++++++++++++++++++ > docs/specs/ppc-xive.rst | 199 ++++++++++++++++++++++++++++++++++ > MAINTAINERS | 1 + > 5 files changed, 388 insertions(+) > create mode 100644 docs/specs/index.rst > create mode 100644 docs/specs/ppc-spapr-xive.rst > create mode 100644 docs/specs/ppc-xive.rst > > diff --git a/docs/index.rst b/docs/index.rst > index 3690955dd1f5..baa5791c174b 100644 > --- a/docs/index.rst > +++ b/docs/index.rst > @@ -12,4 +12,5 @@ Welcome to QEMU's documentation! > > interop/index > devel/index > + specs/index > > diff --git a/docs/specs/index.rst b/docs/specs/index.rst > new file mode 100644 > index 000000000000..2e927519c2e7 > --- /dev/null > +++ b/docs/specs/index.rst > @@ -0,0 +1,13 @@ > +. This is the top level page for the 'specs' manual > + > + > +QEMU full-system emulation guest hardware specifications > +======================================================== > + > + > +Contents: > + > +.. toctree:: > + :maxdepth: 2 > + > + xive > diff --git a/docs/specs/ppc-spapr-xive.rst b/docs/specs/ppc-spapr-xive.rst > new file mode 100644 > index 000000000000..539ce7ca4e90 > --- /dev/null > +++ b/docs/specs/ppc-spapr-xive.rst > @@ -0,0 +1,174 @@ > +XIVE for sPAPR (pseries machines) > +================================= > + > +The POWER9 processor comes with a new interrupt controller > +architecture, called XIVE as "eXternal Interrupt Virtualization > +Engine". It supports a larger number of interrupt sources and offers > +virtualization features which enables the HW to deliver interrupts > +directly to virtual processors without hypervisor assistance. > + > +A QEMU ``pseries`` machine (which is PAPR compliant) using POWER9 > +processors can run under two interrupt modes: > + > +- *Legacy Compatibility Mode* > + > + the hypervisor provides identical interfaces and similar > + functionality to PAPR+ Version 2.7. This is the default mode > + > + It is also referred as *XICS* in QEMU. > + > +- *XIVE native exploitation mode* > + > + the hypervisor provides new interfaces to manage the XIVE control > + structures, and provides direct control for interrupt management > + through MMIO pages. > + > +Which interrupt modes can be used by the machine is negotiated with > +the guest O/S during the Client Architecture Support negotiation > +sequence. The two modes are mutually exclusive. > + > +Both interrupt mode share the same IRQ number space. See below for the > +layout. > + > +CAS Negotiation > +--------------- > + > +QEMU advertises the supported interrupt modes in the device tree > +property "ibm,arch-vec-5-platform-support" in byte 23 and the OS > +Selection for XIVE is indicated in the "ibm,architecture-vec-5" > +property byte 23. > + > +The interrupt modes supported by the machine depend on the CPU type > +(POWER9 is required for XIVE) but also on the machine property > +``ic-mode`` which can be set on the command line. It can take the > +following values: ``xics``, ``xive``, ``dual`` and currently ``xics`` > +is the default but it may change in the future. > + > +The choosen interrupt mode is activated after a reconfiguration done > +in a machine reset. > + > +XIVE Device tree properties > +--------------------------- > + > +The properties for the PAPR interrupt controller node when the *XIVE > +native exploitation mode* is selected shoud contain: > + > +- ``device_type`` > + > + value should be "power-ivpe". > + > +- ``compatible`` > + > + value should be "ibm,power-ivpe". > + > +- ``reg`` > + > + contains the base address and size of the thread interrupt > + managnement areas (TIMA), for the User level and for the Guest OS > + level. Only the Guest OS level is taken into account today. > + > +- ``ibm,xive-eq-sizes`` > + > + the size of the event queues. One cell per size supported, contains > + log2 of size, in ascending order. > + > +- ``ibm,xive-lisn-ranges`` > + > + the IRQ interrupt number ranges assigned to the guest for the IPIs. > + > +The root node also exports : > + > +- ``ibm,plat-res-int-priorities`` > + > + contains a list of priorities that the hypervisor has reserved for > + its own use. > + > +IRQ number space > +---------------- > + > +IRQ Number space of the ``pseries`` machine is 8K wide and is the same > +for both interrupt mode. The different ranges are defined as follow : > + > +- ``0x0000 .. 0x0FFF`` 4K CPU IPIs (only used under XIVE) > +- ``0x1000 .. 0x1000`` 1 EPOW > +- ``0x1001 .. 0x1001`` 1 HOTPLUG > +- ``0x1100 .. 0x11FF`` 256 VIO devices > +- ``0x1200 .. 0x127F`` 32 PHBs devices > +- ``0x1280 .. 0x12FF`` unused > +- ``0x1300 .. 0x1FFF`` PHB MSIs > + > +Monitoring XIVE > +--------------- > + > +The state of the XIVE interrupt controller can be queried through the > +monitor commands ``info pic``. The output comes in two parts. > + > +First, the state of the thread interrupt context registers is dumped > +for each CPU : > + > +:: > + > + (qemu) info pic > + CPU[0000]: QW NSR CPPR IPB LSMFB ACK# INC AGE PIPR W2 > + CPU[0000]: USER 00 00 00 00 00 00 00 00 00000000 > + CPU[0000]: OS 00 ff 00 00 ff 00 ff ff 80000400 > + CPU[0000]: POOL 00 00 00 00 00 00 00 00 00000000 > + CPU[0000]: PHYS 00 00 00 00 00 00 00 ff 00000000 > + ... > + > +In the case of a ``pseries`` machine, QEMU acts as the hypervisor and only > +the O/S and USER register rings make sense. ``W2`` contains the vCPU CAM > +line which is set to the VP identifier. > + > +Then comes the routing information which aggregates the EAS and the > +END configuration: > + > +:: > + > + ... > + LISN PQ EISN CPU/PRIO EQ > + 00000000 MSI -- 00000010 0/6 380/16384 @1fe3e0000 ^1 [ 80000010 ... ] > + 00000001 MSI -- 00000010 1/6 305/16384 @1fc230000 ^1 [ 80000010 ... ] > + 00000002 MSI -- 00000010 2/6 220/16384 @1fc2f0000 ^1 [ 80000010 ... ] > + 00000003 MSI -- 00000010 3/6 201/16384 @1fc390000 ^1 [ 80000010 ... ] > + 00000004 MSI -Q M 00000000 > + 00000005 MSI -Q M 00000000 > + 00000006 MSI -Q M 00000000 > + 00000007 MSI -Q M 00000000 > + 00001000 MSI -- 00000012 0/6 380/16384 @1fe3e0000 ^1 [ 80000010 ... ] > + 00001001 MSI -- 00000013 0/6 380/16384 @1fe3e0000 ^1 [ 80000010 ... ] > + 00001100 MSI -- 00000100 1/6 305/16384 @1fc230000 ^1 [ 80000010 ... ] > + 00001101 MSI -Q M 00000000 > + 00001200 LSI -Q M 00000000 > + 00001201 LSI -Q M 00000000 > + 00001202 LSI -Q M 00000000 > + 00001203 LSI -Q M 00000000 > + 00001300 MSI -- 00000102 1/6 305/16384 @1fc230000 ^1 [ 80000010 ... ] > + 00001301 MSI -- 00000103 2/6 220/16384 @1fc2f0000 ^1 [ 80000010 ... ] > + 00001302 MSI -- 00000104 3/6 201/16384 @1fc390000 ^1 [ 80000010 ... ] > + > +The source information and configuration: > + > +- The ``LISN`` column outputs the interrupt number of the source in > + range ``[ 0x0 ... 0x1FFF ]`` and its type : ``MSI`` or ``LSI`` > +- The ``PQ`` column reflects the state of the PQ bits of the source : > + > + - ``--`` source is ready to take events > + - ``P-`` an event was sent and an EOI is PENDING > + - ``PQ`` an event was QUEUED > + - ``-Q`` source is OFF > + > + a ``M`` indicates that source is *MASKED* at the EAS level, > + > +The targeting configuration : > + > +- The ``EISN`` column is the event data that will be queued in the event > + queue of the O/S. > +- The ``CPU/PRIO`` column is the tuple defining the CPU number and > + priority queue serving the source. > +- The ``EQ`` column outputs : > + > + - the current index of the event queue/ the max number of entries > + - the O/S event queue address > + - the toggle bit > + - the last entries that were pushed in the event queue. > diff --git a/docs/specs/ppc-xive.rst b/docs/specs/ppc-xive.rst > new file mode 100644 > index 000000000000..b997dc062910 > --- /dev/null > +++ b/docs/specs/ppc-xive.rst > @@ -0,0 +1,199 @@ > +================================ > +POWER9 XIVE interrupt controller > +================================ > + > +The POWER9 processor comes with a new interrupt controller > +architecture, called XIVE as "eXternal Interrupt Virtualization > +Engine". > + > +Compared to the previous architecture, the main characteristics of > +XIVE are to support a larger number of interrupt sources and to > +deliver interrupts directly to virtual processors without hypervisor > +assistance. This removes the context switches required for the > +delivery process. > + > + > +XIVE architecture > +================= > + > +The XIVE IC is composed of three sub-engines, each taking care of a > +processing layer of external interrupts: > + > +- Interrupt Virtualization Source Engine (IVSE), or Source Controller > + (SC). These are found in PCI PHBs, in the PSI host bridge > + controller, but also inside the main controller for the core IPIs > + and other sub-chips (NX, CAP, NPU) of the chip/processor. They are > + configured to feed the IVRE with events. > +- Interrupt Virtualization Routing Engine (IVRE) or Virtualization > + Controller (VC). It handles event coalescing and perform interrupt > + routing by matching an event source number with an Event > + Notification Descriptor (END). > +- Interrupt Virtualization Presentation Engine (IVPE) or Presentation > + Controller (PC). It maintains the interrupt context state of each > + thread and handles the delivery of the external interrupt to the > + thread. > + > +:: > + > + XIVE Interrupt Controller > + +------------------------------------+ IPIs > + | +---------+ +---------+ +--------+ | +-------+ > + | |IVRE | |Common Q | |IVPE |----> | CORES | > + | | esb | | | | |----> | | > + | | eas | | Bridge | | tctx |----> | | > + | |SC end | | | | nvt | | | | > + +------+ | +---------+ +----+----+ +--------+ | +-+-+-+-+ > + | RAM | +------------------|-----------------+ | | | > + | | | | | | > + | | | | | | > + | | +--------------------v------------------------v-v-v--+ other > + | <--+ Power Bus +--> chips > + | esb | +---------+-----------------------+------------------+ > + | eas | | | > + | end | +--|------+ | > + | nvt | +----+----+ | +----+----+ > + +------+ |IVSE | | |IVSE | > + | | | | | > + | PQ-bits | | | PQ-bits | > + | local |-+ | in VC | > + +---------+ +---------+ > + PCIe NX,NPU,CAPI > + > + > + PQ-bits: 2 bits source state machine (P:pending Q:queued) > + esb: Event State Buffer (Array of PQ bits in an IVSE) > + eas: Event Assignment Structure > + end: Event Notification Descriptor > + nvt: Notification Virtual Target > + tctx: Thread interrupt Context registers > + > + > + > +XIVE internal tables > +-------------------- > + > +Each of the sub-engines uses a set of tables to redirect interrupts > +from event sources to CPU threads. > + > +:: > + > + +-------+ > + User or O/S | EQ | > + or +------>|entries| > + Hypervisor | | .. | > + Memory | +-------+ > + | ^ > + | | > + +-------------------------------------------------+ > + | | > + Hypervisor +------+ +---+--+ +---+--+ +------+ > + Memory | ESB | | EAT | | ENDT | | NVTT | > + (skiboot) +----+-+ +----+-+ +----+-+ +------+ > + ^ | ^ | ^ | ^ > + | | | | | | | > + +-------------------------------------------------+ > + | | | | | | | > + | | | | | | | > + +----|--|--------|--|--------|--|-+ +-|-----+ +------+ > + | | | | | | | | | | tctx| |Thread| > + IPI or ---+ + v + v + v |---| + .. |-----> | > + HW events | | | | | | > + | IVRE | | IVPE | +------+ > + +---------------------------------+ +-------+ > + > + > +The IVSE have a 2-bits state machine, P for pending and Q for queued, > +for each source that allows events to be triggered. They are stored in > +an Event State Buffer (ESB) array and can be controlled by MMIOs. > + > +If the event is let through, the IVRE looks up in the Event Assignment > +Structure (EAS) table for an Event Notification Descriptor (END) > +configured for the source. Each Event Notification Descriptor defines > +a notification path to a CPU and an in-memory Event Queue, in which > +will be enqueued an EQ data for the O/S to pull. > + > +The IVPE determines if a Notification Virtual Target (NVT) can handle > +the event by scanning the thread contexts of the VCPUs dispatched on > +the processor HW threads. It maintains the interrupt context state of > +each thread in a NVT table. > + > +XIVE thread interrupt context > +----------------------------- > + > +The XIVE presenter can generate four different exceptions to its > +HW threads: > + > +- hypervisor exception > +- O/S exception > +- Event-Based Branch (user level) > +- msgsnd (doorbell) > + > +Each exception has a state independent from the others called a Thread > +Interrupt Management context. This context is a set of registers which > +lets the thread handle priority management and interrupt > +acknowledgment among other things. The most important ones being : > + > +- Interrupt Priority Register (PIPR) > +- Interrupt Pending Buffer (IPB) > +- Current Processor Priority (CPPR) > +- Notification Source Register (NSR) > + > +TIMA > +~~~~ > + > +The Thread Interrupt Management registers are accessible through a > +specific MMIO region, called the Thread Interrupt Management Area > +(TIMA), four aligned pages, each exposing a different view of the > +registers. First page (page address ending in ``0b00``) gives access > +to the entire context and is reserved for the ring 0 view for the > +physical thread context. The second (page address ending in ``0b01``) > +is for the hypervisor, ring 1 view. The third (page address ending in > +``0b10``) is for the operating system, ring 2 view. The fourth (page > +address ending in ``0b11``) is for user level, ring 3 view. > + > +Interrupt flow from an O/S perspective > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +After an event data has been enqueued in the O/S Event Queue, the IVPE > +raises the bit corresponding to the priority of the pending interrupt > +in the register IBP (Interrupt Pending Buffer) to indicate that an > +event is pending in one of the 8 priority queues. The Pending > +Interrupt Priority Register (PIPR) is also updated using the IPB. This > +register represent the priority of the most favored pending > +notification. > + > +The PIPR is then compared to the the Current Processor Priority > +Register (CPPR). If it is more favored (numerically less than), the > +CPU interrupt line is raised and the EO bit of the Notification Source > +Register (NSR) is updated to notify the presence of an exception for > +the O/S. The O/S acknowledges the interrupt with a special load in the > +Thread Interrupt Management Area. > + > +The O/S handles the interrupt and when done, performs an EOI usi xd->saved_p = true;ng a > +MMIO operation on the ESB management page of the associate source. > + > +Overview of the QEMU models for XIVE > +==================================== > + > +The XiveSource models the IVSE in general, internal and external. It > +handles the source ESBs and the MMIO interface to control them. > + > +The XiveNotifier is a small helper interface interconnecting the > +XiveSource to the XiveRouter. > + > +The XiveRouter is an abstract model acting as a combined IVRE and > +IVPE. It routes event notifications using the EAS and END tables to > +the IVPE sub-engine which does a CAM scan to find a CPU to deliver the > +exception. Storage should be provided by the inheriting classes. > + > +XiveEnDSource is a special source object. It exposes the END ESB MMIOs > +of the Event Queues which are used for coalescing event notifications > +and for escalation. Not used on the field, only to sync the EQ cache > +in OPAL. > + > +Finally, the XiveTCTX contains the interrupt state context of a thread, > +four sets of registers, one for each exception that can be delivered > +to a CPU. These contexts are scanned by the IVPE to find a matching VP > +when a notification is triggered. It also models the Thread Interrupt > +Management Area (TIMA), which exposes the thread context registers to > +the CPU for interrupt management. > diff --git a/MAINTAINERS b/MAINTAINERS > index 9424a490d647..82ac0415b493 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -1708,6 +1708,7 @@ L: qemu-ppc@nongnu.org > S: Supported > F: hw/*/*xive* > F: include/hw/*/*xive* > +F: docs/*/*xive* > > Subsystems > ----------
On Tue, May 21, 2019 at 10:24:11AM +0200, Cédric Le Goater wrote: > This documents the overall XIVE architecture and the XIVE support for > sPAPR guest machines (pseries). > > It also provides documentation on the 'info pic' command. > > Signed-off-by: Cédric Le Goater <clg@kaod.org> Applied, thanks. > --- > > Changes since v2: > > - fixed typos. > > Changes since v1: > > - reorganized into different files and directories. I don't think the > 'info pic' documentation belongs to 'interop' nor 'devel' and so > the ppc-spapr-xive.rst file seemed like the best place for it. > > docs/index.rst | 1 + > docs/specs/index.rst | 13 +++ > docs/specs/ppc-spapr-xive.rst | 174 +++++++++++++++++++++++++++++ > docs/specs/ppc-xive.rst | 199 ++++++++++++++++++++++++++++++++++ > MAINTAINERS | 1 + > 5 files changed, 388 insertions(+) > create mode 100644 docs/specs/index.rst > create mode 100644 docs/specs/ppc-spapr-xive.rst > create mode 100644 docs/specs/ppc-xive.rst > > diff --git a/docs/index.rst b/docs/index.rst > index 3690955dd1f5..baa5791c174b 100644 > --- a/docs/index.rst > +++ b/docs/index.rst > @@ -12,4 +12,5 @@ Welcome to QEMU's documentation! > > interop/index > devel/index > + specs/index > > diff --git a/docs/specs/index.rst b/docs/specs/index.rst > new file mode 100644 > index 000000000000..2e927519c2e7 > --- /dev/null > +++ b/docs/specs/index.rst > @@ -0,0 +1,13 @@ > +. This is the top level page for the 'specs' manual > + > + > +QEMU full-system emulation guest hardware specifications > +======================================================== > + > + > +Contents: > + > +.. toctree:: > + :maxdepth: 2 > + > + xive > diff --git a/docs/specs/ppc-spapr-xive.rst b/docs/specs/ppc-spapr-xive.rst > new file mode 100644 > index 000000000000..539ce7ca4e90 > --- /dev/null > +++ b/docs/specs/ppc-spapr-xive.rst > @@ -0,0 +1,174 @@ > +XIVE for sPAPR (pseries machines) > +================================= > + > +The POWER9 processor comes with a new interrupt controller > +architecture, called XIVE as "eXternal Interrupt Virtualization > +Engine". It supports a larger number of interrupt sources and offers > +virtualization features which enables the HW to deliver interrupts > +directly to virtual processors without hypervisor assistance. > + > +A QEMU ``pseries`` machine (which is PAPR compliant) using POWER9 > +processors can run under two interrupt modes: > + > +- *Legacy Compatibility Mode* > + > + the hypervisor provides identical interfaces and similar > + functionality to PAPR+ Version 2.7. This is the default mode > + > + It is also referred as *XICS* in QEMU. > + > +- *XIVE native exploitation mode* > + > + the hypervisor provides new interfaces to manage the XIVE control > + structures, and provides direct control for interrupt management > + through MMIO pages. > + > +Which interrupt modes can be used by the machine is negotiated with > +the guest O/S during the Client Architecture Support negotiation > +sequence. The two modes are mutually exclusive. > + > +Both interrupt mode share the same IRQ number space. See below for the > +layout. > + > +CAS Negotiation > +--------------- > + > +QEMU advertises the supported interrupt modes in the device tree > +property "ibm,arch-vec-5-platform-support" in byte 23 and the OS > +Selection for XIVE is indicated in the "ibm,architecture-vec-5" > +property byte 23. > + > +The interrupt modes supported by the machine depend on the CPU type > +(POWER9 is required for XIVE) but also on the machine property > +``ic-mode`` which can be set on the command line. It can take the > +following values: ``xics``, ``xive``, ``dual`` and currently ``xics`` > +is the default but it may change in the future. > + > +The choosen interrupt mode is activated after a reconfiguration done > +in a machine reset. > + > +XIVE Device tree properties > +--------------------------- > + > +The properties for the PAPR interrupt controller node when the *XIVE > +native exploitation mode* is selected shoud contain: > + > +- ``device_type`` > + > + value should be "power-ivpe". > + > +- ``compatible`` > + > + value should be "ibm,power-ivpe". > + > +- ``reg`` > + > + contains the base address and size of the thread interrupt > + managnement areas (TIMA), for the User level and for the Guest OS > + level. Only the Guest OS level is taken into account today. > + > +- ``ibm,xive-eq-sizes`` > + > + the size of the event queues. One cell per size supported, contains > + log2 of size, in ascending order. > + > +- ``ibm,xive-lisn-ranges`` > + > + the IRQ interrupt number ranges assigned to the guest for the IPIs. > + > +The root node also exports : > + > +- ``ibm,plat-res-int-priorities`` > + > + contains a list of priorities that the hypervisor has reserved for > + its own use. > + > +IRQ number space > +---------------- > + > +IRQ Number space of the ``pseries`` machine is 8K wide and is the same > +for both interrupt mode. The different ranges are defined as follow : > + > +- ``0x0000 .. 0x0FFF`` 4K CPU IPIs (only used under XIVE) > +- ``0x1000 .. 0x1000`` 1 EPOW > +- ``0x1001 .. 0x1001`` 1 HOTPLUG > +- ``0x1100 .. 0x11FF`` 256 VIO devices > +- ``0x1200 .. 0x127F`` 32 PHBs devices > +- ``0x1280 .. 0x12FF`` unused > +- ``0x1300 .. 0x1FFF`` PHB MSIs > + > +Monitoring XIVE > +--------------- > + > +The state of the XIVE interrupt controller can be queried through the > +monitor commands ``info pic``. The output comes in two parts. > + > +First, the state of the thread interrupt context registers is dumped > +for each CPU : > + > +:: > + > + (qemu) info pic > + CPU[0000]: QW NSR CPPR IPB LSMFB ACK# INC AGE PIPR W2 > + CPU[0000]: USER 00 00 00 00 00 00 00 00 00000000 > + CPU[0000]: OS 00 ff 00 00 ff 00 ff ff 80000400 > + CPU[0000]: POOL 00 00 00 00 00 00 00 00 00000000 > + CPU[0000]: PHYS 00 00 00 00 00 00 00 ff 00000000 > + ... > + > +In the case of a ``pseries`` machine, QEMU acts as the hypervisor and only > +the O/S and USER register rings make sense. ``W2`` contains the vCPU CAM > +line which is set to the VP identifier. > + > +Then comes the routing information which aggregates the EAS and the > +END configuration: > + > +:: > + > + ... > + LISN PQ EISN CPU/PRIO EQ > + 00000000 MSI -- 00000010 0/6 380/16384 @1fe3e0000 ^1 [ 80000010 ... ] > + 00000001 MSI -- 00000010 1/6 305/16384 @1fc230000 ^1 [ 80000010 ... ] > + 00000002 MSI -- 00000010 2/6 220/16384 @1fc2f0000 ^1 [ 80000010 ... ] > + 00000003 MSI -- 00000010 3/6 201/16384 @1fc390000 ^1 [ 80000010 ... ] > + 00000004 MSI -Q M 00000000 > + 00000005 MSI -Q M 00000000 > + 00000006 MSI -Q M 00000000 > + 00000007 MSI -Q M 00000000 > + 00001000 MSI -- 00000012 0/6 380/16384 @1fe3e0000 ^1 [ 80000010 ... ] > + 00001001 MSI -- 00000013 0/6 380/16384 @1fe3e0000 ^1 [ 80000010 ... ] > + 00001100 MSI -- 00000100 1/6 305/16384 @1fc230000 ^1 [ 80000010 ... ] > + 00001101 MSI -Q M 00000000 > + 00001200 LSI -Q M 00000000 > + 00001201 LSI -Q M 00000000 > + 00001202 LSI -Q M 00000000 > + 00001203 LSI -Q M 00000000 > + 00001300 MSI -- 00000102 1/6 305/16384 @1fc230000 ^1 [ 80000010 ... ] > + 00001301 MSI -- 00000103 2/6 220/16384 @1fc2f0000 ^1 [ 80000010 ... ] > + 00001302 MSI -- 00000104 3/6 201/16384 @1fc390000 ^1 [ 80000010 ... ] > + > +The source information and configuration: > + > +- The ``LISN`` column outputs the interrupt number of the source in > + range ``[ 0x0 ... 0x1FFF ]`` and its type : ``MSI`` or ``LSI`` > +- The ``PQ`` column reflects the state of the PQ bits of the source : > + > + - ``--`` source is ready to take events > + - ``P-`` an event was sent and an EOI is PENDING > + - ``PQ`` an event was QUEUED > + - ``-Q`` source is OFF > + > + a ``M`` indicates that source is *MASKED* at the EAS level, > + > +The targeting configuration : > + > +- The ``EISN`` column is the event data that will be queued in the event > + queue of the O/S. > +- The ``CPU/PRIO`` column is the tuple defining the CPU number and > + priority queue serving the source. > +- The ``EQ`` column outputs : > + > + - the current index of the event queue/ the max number of entries > + - the O/S event queue address > + - the toggle bit > + - the last entries that were pushed in the event queue. > diff --git a/docs/specs/ppc-xive.rst b/docs/specs/ppc-xive.rst > new file mode 100644 > index 000000000000..b997dc062910 > --- /dev/null > +++ b/docs/specs/ppc-xive.rst > @@ -0,0 +1,199 @@ > +================================ > +POWER9 XIVE interrupt controller > +================================ > + > +The POWER9 processor comes with a new interrupt controller > +architecture, called XIVE as "eXternal Interrupt Virtualization > +Engine". > + > +Compared to the previous architecture, the main characteristics of > +XIVE are to support a larger number of interrupt sources and to > +deliver interrupts directly to virtual processors without hypervisor > +assistance. This removes the context switches required for the > +delivery process. > + > + > +XIVE architecture > +================= > + > +The XIVE IC is composed of three sub-engines, each taking care of a > +processing layer of external interrupts: > + > +- Interrupt Virtualization Source Engine (IVSE), or Source Controller > + (SC). These are found in PCI PHBs, in the PSI host bridge > + controller, but also inside the main controller for the core IPIs > + and other sub-chips (NX, CAP, NPU) of the chip/processor. They are > + configured to feed the IVRE with events. > +- Interrupt Virtualization Routing Engine (IVRE) or Virtualization > + Controller (VC). It handles event coalescing and perform interrupt > + routing by matching an event source number with an Event > + Notification Descriptor (END). > +- Interrupt Virtualization Presentation Engine (IVPE) or Presentation > + Controller (PC). It maintains the interrupt context state of each > + thread and handles the delivery of the external interrupt to the > + thread. > + > +:: > + > + XIVE Interrupt Controller > + +------------------------------------+ IPIs > + | +---------+ +---------+ +--------+ | +-------+ > + | |IVRE | |Common Q | |IVPE |----> | CORES | > + | | esb | | | | |----> | | > + | | eas | | Bridge | | tctx |----> | | > + | |SC end | | | | nvt | | | | > + +------+ | +---------+ +----+----+ +--------+ | +-+-+-+-+ > + | RAM | +------------------|-----------------+ | | | > + | | | | | | > + | | | | | | > + | | +--------------------v------------------------v-v-v--+ other > + | <--+ Power Bus +--> chips > + | esb | +---------+-----------------------+------------------+ > + | eas | | | > + | end | +--|------+ | > + | nvt | +----+----+ | +----+----+ > + +------+ |IVSE | | |IVSE | > + | | | | | > + | PQ-bits | | | PQ-bits | > + | local |-+ | in VC | > + +---------+ +---------+ > + PCIe NX,NPU,CAPI > + > + > + PQ-bits: 2 bits source state machine (P:pending Q:queued) > + esb: Event State Buffer (Array of PQ bits in an IVSE) > + eas: Event Assignment Structure > + end: Event Notification Descriptor > + nvt: Notification Virtual Target > + tctx: Thread interrupt Context registers > + > + > + > +XIVE internal tables > +-------------------- > + > +Each of the sub-engines uses a set of tables to redirect interrupts > +from event sources to CPU threads. > + > +:: > + > + +-------+ > + User or O/S | EQ | > + or +------>|entries| > + Hypervisor | | .. | > + Memory | +-------+ > + | ^ > + | | > + +-------------------------------------------------+ > + | | > + Hypervisor +------+ +---+--+ +---+--+ +------+ > + Memory | ESB | | EAT | | ENDT | | NVTT | > + (skiboot) +----+-+ +----+-+ +----+-+ +------+ > + ^ | ^ | ^ | ^ > + | | | | | | | > + +-------------------------------------------------+ > + | | | | | | | > + | | | | | | | > + +----|--|--------|--|--------|--|-+ +-|-----+ +------+ > + | | | | | | | | | | tctx| |Thread| > + IPI or ---+ + v + v + v |---| + .. |-----> | > + HW events | | | | | | > + | IVRE | | IVPE | +------+ > + +---------------------------------+ +-------+ > + > + > +The IVSE have a 2-bits state machine, P for pending and Q for queued, > +for each source that allows events to be triggered. They are stored in > +an Event State Buffer (ESB) array and can be controlled by MMIOs. > + > +If the event is let through, the IVRE looks up in the Event Assignment > +Structure (EAS) table for an Event Notification Descriptor (END) > +configured for the source. Each Event Notification Descriptor defines > +a notification path to a CPU and an in-memory Event Queue, in which > +will be enqueued an EQ data for the O/S to pull. > + > +The IVPE determines if a Notification Virtual Target (NVT) can handle > +the event by scanning the thread contexts of the VCPUs dispatched on > +the processor HW threads. It maintains the interrupt context state of > +each thread in a NVT table. > + > +XIVE thread interrupt context > +----------------------------- > + > +The XIVE presenter can generate four different exceptions to its > +HW threads: > + > +- hypervisor exception > +- O/S exception > +- Event-Based Branch (user level) > +- msgsnd (doorbell) > + > +Each exception has a state independent from the others called a Thread > +Interrupt Management context. This context is a set of registers which > +lets the thread handle priority management and interrupt > +acknowledgment among other things. The most important ones being : > + > +- Interrupt Priority Register (PIPR) > +- Interrupt Pending Buffer (IPB) > +- Current Processor Priority (CPPR) > +- Notification Source Register (NSR) > + > +TIMA > +~~~~ > + > +The Thread Interrupt Management registers are accessible through a > +specific MMIO region, called the Thread Interrupt Management Area > +(TIMA), four aligned pages, each exposing a different view of the > +registers. First page (page address ending in ``0b00``) gives access > +to the entire context and is reserved for the ring 0 view for the > +physical thread context. The second (page address ending in ``0b01``) > +is for the hypervisor, ring 1 view. The third (page address ending in > +``0b10``) is for the operating system, ring 2 view. The fourth (page > +address ending in ``0b11``) is for user level, ring 3 view. > + > +Interrupt flow from an O/S perspective > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +After an event data has been enqueued in the O/S Event Queue, the IVPE > +raises the bit corresponding to the priority of the pending interrupt > +in the register IBP (Interrupt Pending Buffer) to indicate that an > +event is pending in one of the 8 priority queues. The Pending > +Interrupt Priority Register (PIPR) is also updated using the IPB. This > +register represent the priority of the most favored pending > +notification. > + > +The PIPR is then compared to the the Current Processor Priority > +Register (CPPR). If it is more favored (numerically less than), the > +CPU interrupt line is raised and the EO bit of the Notification Source > +Register (NSR) is updated to notify the presence of an exception for > +the O/S. The O/S acknowledges the interrupt with a special load in the > +Thread Interrupt Management Area. > + > +The O/S handles the interrupt and when done, performs an EOI using a > +MMIO operation on the ESB management page of the associate source. > + > +Overview of the QEMU models for XIVE > +==================================== > + > +The XiveSource models the IVSE in general, internal and external. It > +handles the source ESBs and the MMIO interface to control them. > + > +The XiveNotifier is a small helper interface interconnecting the > +XiveSource to the XiveRouter. > + > +The XiveRouter is an abstract model acting as a combined IVRE and > +IVPE. It routes event notifications using the EAS and END tables to > +the IVPE sub-engine which does a CAM scan to find a CPU to deliver the > +exception. Storage should be provided by the inheriting classes. > + > +XiveEnDSource is a special source object. It exposes the END ESB MMIOs > +of the Event Queues which are used for coalescing event notifications > +and for escalation. Not used on the field, only to sync the EQ cache > +in OPAL. > + > +Finally, the XiveTCTX contains the interrupt state context of a thread, > +four sets of registers, one for each exception that can be delivered > +to a CPU. These contexts are scanned by the IVPE to find a matching VP > +when a notification is triggered. It also models the Thread Interrupt > +Management Area (TIMA), which exposes the thread context registers to > +the CPU for interrupt management. > diff --git a/MAINTAINERS b/MAINTAINERS > index 9424a490d647..82ac0415b493 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -1708,6 +1708,7 @@ L: qemu-ppc@nongnu.org > S: Supported > F: hw/*/*xive* > F: include/hw/*/*xive* > +F: docs/*/*xive* > > Subsystems > ---------- -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
On 21/05/2019 18:24, Cédric Le Goater wrote: > This documents the overall XIVE architecture and the XIVE support for > sPAPR guest machines (pseries). > > It also provides documentation on the 'info pic' command. > > Signed-off-by: Cédric Le Goater <clg@kaod.org> > --- > > Changes since v2: > > - fixed typos. > > Changes since v1: > > - reorganized into different files and directories. I don't think the > 'info pic' documentation belongs to 'interop' nor 'devel' and so > the ppc-spapr-xive.rst file seemed like the best place for it. > > docs/index.rst | 1 + > docs/specs/index.rst | 13 +++ > docs/specs/ppc-spapr-xive.rst | 174 +++++++++++++++++++++++++++++ > docs/specs/ppc-xive.rst | 199 ++++++++++++++++++++++++++++++++++ > MAINTAINERS | 1 + > 5 files changed, 388 insertions(+) > create mode 100644 docs/specs/index.rst > create mode 100644 docs/specs/ppc-spapr-xive.rst > create mode 100644 docs/specs/ppc-xive.rst > > diff --git a/docs/index.rst b/docs/index.rst > index 3690955dd1f5..baa5791c174b 100644 > --- a/docs/index.rst > +++ b/docs/index.rst > @@ -12,4 +12,5 @@ Welcome to QEMU's documentation! > > interop/index > devel/index > + specs/index > > diff --git a/docs/specs/index.rst b/docs/specs/index.rst > new file mode 100644 > index 000000000000..2e927519c2e7 > --- /dev/null > +++ b/docs/specs/index.rst > @@ -0,0 +1,13 @@ > +. This is the top level page for the 'specs' manual > + > + > +QEMU full-system emulation guest hardware specifications > +======================================================== > + > + > +Contents: > + > +.. toctree:: > + :maxdepth: 2 > + > + xive > diff --git a/docs/specs/ppc-spapr-xive.rst b/docs/specs/ppc-spapr-xive.rst > new file mode 100644 > index 000000000000..539ce7ca4e90 > --- /dev/null > +++ b/docs/specs/ppc-spapr-xive.rst > @@ -0,0 +1,174 @@ > +XIVE for sPAPR (pseries machines) > +================================= > + > +The POWER9 processor comes with a new interrupt controller > +architecture, called XIVE as "eXternal Interrupt Virtualization > +Engine". It supports a larger number of interrupt sources and offers > +virtualization features which enables the HW to deliver interrupts > +directly to virtual processors without hypervisor assistance. > + > +A QEMU ``pseries`` machine (which is PAPR compliant) using POWER9 > +processors can run under two interrupt modes: > + > +- *Legacy Compatibility Mode* > + > + the hypervisor provides identical interfaces and similar > + functionality to PAPR+ Version 2.7. This is the default mode > + > + It is also referred as *XICS* in QEMU. > + > +- *XIVE native exploitation mode* > + > + the hypervisor provides new interfaces to manage the XIVE control > + structures, and provides direct control for interrupt management > + through MMIO pages. > + > +Which interrupt modes can be used by the machine is negotiated with > +the guest O/S during the Client Architecture Support negotiation > +sequence. The two modes are mutually exclusive. > + > +Both interrupt mode share the same IRQ number space. See below for the > +layout. > + > +CAS Negotiation > +--------------- > + > +QEMU advertises the supported interrupt modes in the device tree > +property "ibm,arch-vec-5-platform-support" in byte 23 and the OS > +Selection for XIVE is indicated in the "ibm,architecture-vec-5" > +property byte 23. > + > +The interrupt modes supported by the machine depend on the CPU type > +(POWER9 is required for XIVE) but also on the machine property > +``ic-mode`` which can be set on the command line. It can take the > +following values: ``xics``, ``xive``, ``dual`` and currently ``xics`` > +is the default but it may change in the future. > + > +The choosen interrupt mode is activated after a reconfiguration done > +in a machine reset. > + > +XIVE Device tree properties > +--------------------------- > + > +The properties for the PAPR interrupt controller node when the *XIVE > +native exploitation mode* is selected shoud contain: > + > +- ``device_type`` > + > + value should be "power-ivpe". > + > +- ``compatible`` > + > + value should be "ibm,power-ivpe". > + > +- ``reg`` > + > + contains the base address and size of the thread interrupt > + managnement areas (TIMA), for the User level and for the Guest OS > + level. Only the Guest OS level is taken into account today. > + > +- ``ibm,xive-eq-sizes`` > + > + the size of the event queues. One cell per size supported, contains > + log2 of size, in ascending order. > + > +- ``ibm,xive-lisn-ranges`` > + > + the IRQ interrupt number ranges assigned to the guest for the IPIs. > + > +The root node also exports : > + > +- ``ibm,plat-res-int-priorities`` > + > + contains a list of priorities that the hypervisor has reserved for > + its own use. > + > +IRQ number space > +---------------- > + > +IRQ Number space of the ``pseries`` machine is 8K wide and is the same > +for both interrupt mode. The different ranges are defined as follow : > + > +- ``0x0000 .. 0x0FFF`` 4K CPU IPIs (only used under XIVE) > +- ``0x1000 .. 0x1000`` 1 EPOW > +- ``0x1001 .. 0x1001`` 1 HOTPLUG "0x1002 .. 0x10FF unused" since you have "unused" below too. > +- ``0x1100 .. 0x11FF`` 256 VIO devices > +- ``0x1200 .. 0x127F`` 32 PHBs devices > +- ``0x1280 .. 0x12FF`` unused > +- ``0x1300 .. 0x1FFF`` PHB MSIs > + > +Monitoring XIVE > +--------------- > + > +The state of the XIVE interrupt controller can be queried through the > +monitor commands ``info pic``. The output comes in two parts. > + > +First, the state of the thread interrupt context registers is dumped > +for each CPU : > + > +:: > + > + (qemu) info pic > + CPU[0000]: QW NSR CPPR IPB LSMFB ACK# INC AGE PIPR W2 > + CPU[0000]: USER 00 00 00 00 00 00 00 00 00000000 > + CPU[0000]: OS 00 ff 00 00 ff 00 ff ff 80000400 > + CPU[0000]: POOL 00 00 00 00 00 00 00 00 00000000 > + CPU[0000]: PHYS 00 00 00 00 00 00 00 ff 00000000 > + ... > + > +In the case of a ``pseries`` machine, QEMU acts as the hypervisor and only > +the O/S and USER register rings make sense. ``W2`` contains the vCPU CAM > +line which is set to the VP identifier. > + > +Then comes the routing information which aggregates the EAS and the > +END configuration: > + > +:: > + > + ... > + LISN PQ EISN CPU/PRIO EQ > + 00000000 MSI -- 00000010 0/6 380/16384 @1fe3e0000 ^1 [ 80000010 ... ] > + 00000001 MSI -- 00000010 1/6 305/16384 @1fc230000 ^1 [ 80000010 ... ] > + 00000002 MSI -- 00000010 2/6 220/16384 @1fc2f0000 ^1 [ 80000010 ... ] > + 00000003 MSI -- 00000010 3/6 201/16384 @1fc390000 ^1 [ 80000010 ... ] > + 00000004 MSI -Q M 00000000 > + 00000005 MSI -Q M 00000000 > + 00000006 MSI -Q M 00000000 > + 00000007 MSI -Q M 00000000 > + 00001000 MSI -- 00000012 0/6 380/16384 @1fe3e0000 ^1 [ 80000010 ... ] > + 00001001 MSI -- 00000013 0/6 380/16384 @1fe3e0000 ^1 [ 80000010 ... ] > + 00001100 MSI -- 00000100 1/6 305/16384 @1fc230000 ^1 [ 80000010 ... ] > + 00001101 MSI -Q M 00000000 > + 00001200 LSI -Q M 00000000 > + 00001201 LSI -Q M 00000000 > + 00001202 LSI -Q M 00000000 > + 00001203 LSI -Q M 00000000 > + 00001300 MSI -- 00000102 1/6 305/16384 @1fc230000 ^1 [ 80000010 ... ] > + 00001301 MSI -- 00000103 2/6 220/16384 @1fc2f0000 ^1 [ 80000010 ... ] > + 00001302 MSI -- 00000104 3/6 201/16384 @1fc390000 ^1 [ 80000010 ... ] > + > +The source information and configuration: > + > +- The ``LISN`` column outputs the interrupt number of the source in > + range ``[ 0x0 ... 0x1FFF ]`` and its type : ``MSI`` or ``LSI`` > +- The ``PQ`` column reflects the state of the PQ bits of the source : > + > + - ``--`` source is ready to take events > + - ``P-`` an event was sent and an EOI is PENDING > + - ``PQ`` an event was QUEUED > + - ``-Q`` source is OFF > + > + a ``M`` indicates that source is *MASKED* at the EAS level, > + > +The targeting configuration : > + > +- The ``EISN`` column is the event data that will be queued in the event > + queue of the O/S. > +- The ``CPU/PRIO`` column is the tuple defining the CPU number and > + priority queue serving the source. > +- The ``EQ`` column outputs : > + > + - the current index of the event queue/ the max number of entries > + - the O/S event queue address My qemu does not print these: 00001302 MSI -- 00000104 0/6 12145/16384 ^1 [ 80000107 80000107 80000107 80000107 80000107 ^00000000 ] > + - the toggle bit > + - the last entries that were pushed in the event queue. > diff --git a/docs/specs/ppc-xive.rst b/docs/specs/ppc-xive.rst > new file mode 100644 > index 000000000000..b997dc062910 > --- /dev/null > +++ b/docs/specs/ppc-xive.rst > @@ -0,0 +1,199 @@ > +================================ > +POWER9 XIVE interrupt controller > +================================ > + > +The POWER9 processor comes with a new interrupt controller > +architecture, called XIVE as "eXternal Interrupt Virtualization > +Engine". > + > +Compared to the previous architecture, the main characteristics of > +XIVE are to support a larger number of interrupt sources and to > +deliver interrupts directly to virtual processors without hypervisor > +assistance. This removes the context switches required for the > +delivery process. > + > + > +XIVE architecture > +================= > + > +The XIVE IC is composed of three sub-engines, each taking care of a > +processing layer of external interrupts: > + > +- Interrupt Virtualization Source Engine (IVSE), or Source Controller > + (SC). These are found in PCI PHBs, in the PSI host bridge The text suggests IVSE==SC but the picture has SC in the IVRE box, does it mean that IVRE itself has a source or it should be "PC"? Also out of curiosity - what is "PSI"? > + controller, but also inside the main controller for the core IPIs > + and other sub-chips (NX, CAP, NPU) of the chip/processor. They are > + configured to feed the IVRE with events. > +- Interrupt Virtualization Routing Engine (IVRE) or Virtualization > + Controller (VC). It handles event coalescing and perform interrupt > + routing by matching an event source number with an Event > + Notification Descriptor (END). > +- Interrupt Virtualization Presentation Engine (IVPE) or Presentation > + Controller (PC). It maintains the interrupt context state of each > + thread and handles the delivery of the external interrupt to the > + thread. > + > +:: > + > + XIVE Interrupt Controller > + +------------------------------------+ IPIs > + | +---------+ +---------+ +--------+ | +-------+ > + | |IVRE | |Common Q | |IVPE |----> | CORES | > + | | esb | | | | |----> | | > + | | eas | | Bridge | | tctx |----> | | > + | |SC end | | | | nvt | | | | > + +------+ | +---------+ +----+----+ +--------+ | +-+-+-+-+ > + | RAM | +------------------|-----------------+ | | | > + | | | | | | > + | | | | | | > + | | +--------------------v------------------------v-v-v--+ other > + | <--+ Power Bus +--> chips > + | esb | +---------+-----------------------+------------------+ > + | eas | | | > + | end | +--|------+ | > + | nvt | +----+----+ | +----+----+ > + +------+ |IVSE | | |IVSE | > + | | | | | > + | PQ-bits | | | PQ-bits | > + | local |-+ | in VC | > + +---------+ +---------+ > + PCIe NX,NPU,CAPI > + > + > + PQ-bits: 2 bits source state machine (P:pending Q:queued) > + esb: Event State Buffer (Array of PQ bits in an IVSE) > + eas: Event Assignment Structure > + end: Event Notification Descriptor > + nvt: Notification Virtual Target > + tctx: Thread interrupt Context registers [aik@yc02goos ~]$ ~/s/qemu/qmp qemu.mon.user2223.6_0_0_1 hmp info mtree | grep xive 0006010000000000-000601003fffffff (prio 0, i/o): xive.esb 0006010040000000-000601007fffffff (prio 0, i/o): xive.end [disabled] 0006030203180000-00060302031bffff (prio 0, i/o): xive.tima 0006010000000000-000601003fffffff (prio 0, i/o): xive.esb 0006010040000000-000601007fffffff (prio 0, i/o): xive.end [disabled] 0006030203180000-00060302031bffff (prio 0, i/o): xive.tima 0006010000000000-000601003fffffff (prio 0, i/o): xive.esb 0006010040000000-000601007fffffff (prio 0, i/o): xive.end [disabled] 0006030203180000-00060302031bffff (prio 0, i/o): xive.tima "xive.end" is always disabled, why create it then (I understand this all very little)? > + > + > + > +XIVE internal tables > +-------------------- > + > +Each of the sub-engines uses a set of tables to redirect interrupts > +from event sources to CPU threads. > + > +:: > + > + +-------+ > + User or O/S | EQ | > + or +------>|entries| > + Hypervisor | | .. | > + Memory | +-------+ > + | ^ > + | | > + +-------------------------------------------------+ > + | | > + Hypervisor +------+ +---+--+ +---+--+ +------+ > + Memory | ESB | | EAT | | ENDT | | NVTT | > + (skiboot) +----+-+ +----+-+ +----+-+ +------+ > + ^ | ^ | ^ | ^ > + | | | | | | | > + +-------------------------------------------------+ > + | | | | | | | > + | | | | | | | > + +----|--|--------|--|--------|--|-+ +-|-----+ +------+ > + | | | | | | | | | | tctx| |Thread| > + IPI or ---+ + v + v + v |---| + .. |-----> | > + HW events | | | | | | > + | IVRE | | IVPE | +------+ > + +---------------------------------+ +-------+ > + > + > +The IVSE have a 2-bits state machine, P for pending and Q for queued, > +for each source that allows events to be triggered. They are stored in > +an Event State Buffer (ESB) array and can be controlled by MMIOs. > + > +If the event is let through, the IVRE looks up in the Event Assignment > +Structure (EAS) table for an Event Notification Descriptor (END) > +configured for the source. Each Event Notification Descriptor defines > +a notification path to a CPU and an in-memory Event Queue, in which > +will be enqueued an EQ data for the O/S to pull. > + > +The IVPE determines if a Notification Virtual Target (NVT) can handle > +the event by scanning the thread contexts of the VCPUs dispatched on > +the processor HW threads. It maintains the interrupt context state of > +each thread in a NVT table. > + > +XIVE thread interrupt context > +----------------------------- > + > +The XIVE presenter can generate four different exceptions to its > +HW threads: > + > +- hypervisor exception > +- O/S exception > +- Event-Based Branch (user level) > +- msgsnd (doorbell) > + > +Each exception has a state independent from the others called a Thread > +Interrupt Management context. This context is a set of registers which > +lets the thread handle priority management and interrupt > +acknowledgment among other things. The most important ones being : > + > +- Interrupt Priority Register (PIPR) > +- Interrupt Pending Buffer (IPB) > +- Current Processor Priority (CPPR) > +- Notification Source Register (NSR) > + > +TIMA > +~~~~ > + > +The Thread Interrupt Management registers are accessible through a > +specific MMIO region, called the Thread Interrupt Management Area > +(TIMA), four aligned pages, each exposing a different view of the > +registers. First page (page address ending in ``0b00``) gives access > +to the entire context and is reserved for the ring 0 view for the > +physical thread context. The second (page address ending in ``0b01``) > +is for the hypervisor, ring 1 view. The third (page address ending in > +``0b10``) is for the operating system, ring 2 view. The fourth (page > +address ending in ``0b11``) is for user level, ring 3 view. > + > +Interrupt flow from an O/S perspective > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +After an event data has been enqueued in the O/S Event Queue, the IVPE > +raises the bit corresponding to the priority of the pending interrupt > +in the register IBP (Interrupt Pending Buffer) to indicate that an > +event is pending in one of the 8 priority queues. The Pending > +Interrupt Priority Register (PIPR) is also updated using the IPB. This > +register represent the priority of the most favored pending > +notification. > + > +The PIPR is then compared to the the Current Processor Priority > +Register (CPPR). If it is more favored (numerically less than), the > +CPU interrupt line is raised and the EO bit of the Notification Source > +Register (NSR) is updated to notify the presence of an exception for > +the O/S. The O/S acknowledges the interrupt with a special load in the > +Thread Interrupt Management Area. > + > +The O/S handles the interrupt and when done, performs an EOI using a > +MMIO operation on the ESB management page of the associate source. > + > +Overview of the QEMU models for XIVE > +==================================== > + > +The XiveSource models the IVSE in general, internal and external. It > +handles the source ESBs and the MMIO interface to control them. > + > +The XiveNotifier is a small helper interface interconnecting the > +XiveSource to the XiveRouter. > + > +The XiveRouter is an abstract model acting as a combined IVRE and > +IVPE. It routes event notifications using the EAS and END tables to > +the IVPE sub-engine which does a CAM scan to find a CPU to deliver the > +exception. Storage should be provided by the inheriting classes. > + > +XiveEnDSource is a special source object. It exposes the END ESB MMIOs > +of the Event Queues which are used for coalescing event notifications > +and for escalation. Not used on the field, only to sync the EQ cache > +in OPAL. > + > +Finally, the XiveTCTX contains the interrupt state context of a thread, > +four sets of registers, one for each exception that can be delivered > +to a CPU. These contexts are scanned by the IVPE to find a matching VP > +when a notification is triggered. It also models the Thread Interrupt > +Management Area (TIMA), which exposes the thread context registers to > +the CPU for interrupt management. > diff --git a/MAINTAINERS b/MAINTAINERS > index 9424a490d647..82ac0415b493 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -1708,6 +1708,7 @@ L: qemu-ppc@nongnu.org > S: Supported > F: hw/*/*xive* > F: include/hw/*/*xive* > +F: docs/*/*xive* > > Subsystems > ---------- > -- Alexey
On 31/05/2019 09:03, Alexey Kardashevskiy wrote: > > > On 21/05/2019 18:24, Cédric Le Goater wrote: >> This documents the overall XIVE architecture and the XIVE support for >> sPAPR guest machines (pseries). >> >> It also provides documentation on the 'info pic' command. >> >> Signed-off-by: Cédric Le Goater <clg@kaod.org> >> --- >> >> Changes since v2: >> >> - fixed typos. >> >> Changes since v1: >> >> - reorganized into different files and directories. I don't think the >> 'info pic' documentation belongs to 'interop' nor 'devel' and so >> the ppc-spapr-xive.rst file seemed like the best place for it. >> >> docs/index.rst | 1 + >> docs/specs/index.rst | 13 +++ >> docs/specs/ppc-spapr-xive.rst | 174 +++++++++++++++++++++++++++++ >> docs/specs/ppc-xive.rst | 199 ++++++++++++++++++++++++++++++++++ >> MAINTAINERS | 1 + >> 5 files changed, 388 insertions(+) >> create mode 100644 docs/specs/index.rst >> create mode 100644 docs/specs/ppc-spapr-xive.rst >> create mode 100644 docs/specs/ppc-xive.rst >> >> diff --git a/docs/index.rst b/docs/index.rst >> index 3690955dd1f5..baa5791c174b 100644 >> --- a/docs/index.rst >> +++ b/docs/index.rst >> @@ -12,4 +12,5 @@ Welcome to QEMU's documentation! >> >> interop/index >> devel/index >> + specs/index >> >> diff --git a/docs/specs/index.rst b/docs/specs/index.rst >> new file mode 100644 >> index 000000000000..2e927519c2e7 >> --- /dev/null >> +++ b/docs/specs/index.rst >> @@ -0,0 +1,13 @@ >> +. This is the top level page for the 'specs' manual >> + >> + >> +QEMU full-system emulation guest hardware specifications >> +======================================================== >> + >> + >> +Contents: >> + >> +.. toctree:: >> + :maxdepth: 2 >> + >> + xive >> diff --git a/docs/specs/ppc-spapr-xive.rst b/docs/specs/ppc-spapr-xive.rst >> new file mode 100644 >> index 000000000000..539ce7ca4e90 >> --- /dev/null >> +++ b/docs/specs/ppc-spapr-xive.rst >> @@ -0,0 +1,174 @@ >> +XIVE for sPAPR (pseries machines) >> +================================= >> + >> +The POWER9 processor comes with a new interrupt controller >> +architecture, called XIVE as "eXternal Interrupt Virtualization >> +Engine". It supports a larger number of interrupt sources and offers >> +virtualization features which enables the HW to deliver interrupts >> +directly to virtual processors without hypervisor assistance. >> + >> +A QEMU ``pseries`` machine (which is PAPR compliant) using POWER9 >> +processors can run under two interrupt modes: >> + >> +- *Legacy Compatibility Mode* >> + >> + the hypervisor provides identical interfaces and similar >> + functionality to PAPR+ Version 2.7. This is the default mode >> + >> + It is also referred as *XICS* in QEMU. >> + >> +- *XIVE native exploitation mode* >> + >> + the hypervisor provides new interfaces to manage the XIVE control >> + structures, and provides direct control for interrupt management >> + through MMIO pages. >> + >> +Which interrupt modes can be used by the machine is negotiated with >> +the guest O/S during the Client Architecture Support negotiation >> +sequence. The two modes are mutually exclusive. >> + >> +Both interrupt mode share the same IRQ number space. See below for the >> +layout. >> + >> +CAS Negotiation >> +--------------- >> + >> +QEMU advertises the supported interrupt modes in the device tree >> +property "ibm,arch-vec-5-platform-support" in byte 23 and the OS >> +Selection for XIVE is indicated in the "ibm,architecture-vec-5" >> +property byte 23. >> + >> +The interrupt modes supported by the machine depend on the CPU type >> +(POWER9 is required for XIVE) but also on the machine property >> +``ic-mode`` which can be set on the command line. It can take the >> +following values: ``xics``, ``xive``, ``dual`` and currently ``xics`` >> +is the default but it may change in the future. >> + >> +The choosen interrupt mode is activated after a reconfiguration done >> +in a machine reset. >> + >> +XIVE Device tree properties >> +--------------------------- >> + >> +The properties for the PAPR interrupt controller node when the *XIVE >> +native exploitation mode* is selected shoud contain: >> + >> +- ``device_type`` >> + >> + value should be "power-ivpe". >> + >> +- ``compatible`` >> + >> + value should be "ibm,power-ivpe". >> + >> +- ``reg`` >> + >> + contains the base address and size of the thread interrupt >> + managnement areas (TIMA), for the User level and for the Guest OS >> + level. Only the Guest OS level is taken into account today. >> + >> +- ``ibm,xive-eq-sizes`` >> + >> + the size of the event queues. One cell per size supported, contains >> + log2 of size, in ascending order. >> + >> +- ``ibm,xive-lisn-ranges`` >> + >> + the IRQ interrupt number ranges assigned to the guest for the IPIs. >> + >> +The root node also exports : >> + >> +- ``ibm,plat-res-int-priorities`` >> + >> + contains a list of priorities that the hypervisor has reserved for >> + its own use. >> + >> +IRQ number space >> +---------------- >> + >> +IRQ Number space of the ``pseries`` machine is 8K wide and is the same >> +for both interrupt mode. The different ranges are defined as follow : >> + >> +- ``0x0000 .. 0x0FFF`` 4K CPU IPIs (only used under XIVE) >> +- ``0x1000 .. 0x1000`` 1 EPOW >> +- ``0x1001 .. 0x1001`` 1 HOTPLUG > > "0x1002 .. 0x10FF unused" since you have "unused" below too. yes. It's worth adding. >> +- ``0x1100 .. 0x11FF`` 256 VIO devices >> +- ``0x1200 .. 0x127F`` 32 PHBs devices and this range above is for the static IRQ numbers of PHBs, such as LSIs. >> +- ``0x1280 .. 0x12FF`` unused >> +- ``0x1300 .. 0x1FFF`` PHB MSIs > > > > >> + >> +Monitoring XIVE >> +--------------- >> + >> +The state of the XIVE interrupt controller can be queried through the >> +monitor commands ``info pic``. The output comes in two parts. >> + >> +First, the state of the thread interrupt context registers is dumped >> +for each CPU : >> + >> +:: >> + >> + (qemu) info pic >> + CPU[0000]: QW NSR CPPR IPB LSMFB ACK# INC AGE PIPR W2 >> + CPU[0000]: USER 00 00 00 00 00 00 00 00 00000000 >> + CPU[0000]: OS 00 ff 00 00 ff 00 ff ff 80000400 >> + CPU[0000]: POOL 00 00 00 00 00 00 00 00 00000000 >> + CPU[0000]: PHYS 00 00 00 00 00 00 00 ff 00000000 >> + ... >> + >> +In the case of a ``pseries`` machine, QEMU acts as the hypervisor and only >> +the O/S and USER register rings make sense. ``W2`` contains the vCPU CAM >> +line which is set to the VP identifier. >> + >> +Then comes the routing information which aggregates the EAS and the >> +END configuration: >> + >> +:: >> + >> + ... >> + LISN PQ EISN CPU/PRIO EQ >> + 00000000 MSI -- 00000010 0/6 380/16384 @1fe3e0000 ^1 [ 80000010 ... ] >> + 00000001 MSI -- 00000010 1/6 305/16384 @1fc230000 ^1 [ 80000010 ... ] >> + 00000002 MSI -- 00000010 2/6 220/16384 @1fc2f0000 ^1 [ 80000010 ... ] >> + 00000003 MSI -- 00000010 3/6 201/16384 @1fc390000 ^1 [ 80000010 ... ] >> + 00000004 MSI -Q M 00000000 >> + 00000005 MSI -Q M 00000000 >> + 00000006 MSI -Q M 00000000 >> + 00000007 MSI -Q M 00000000 >> + 00001000 MSI -- 00000012 0/6 380/16384 @1fe3e0000 ^1 [ 80000010 ... ] >> + 00001001 MSI -- 00000013 0/6 380/16384 @1fe3e0000 ^1 [ 80000010 ... ] >> + 00001100 MSI -- 00000100 1/6 305/16384 @1fc230000 ^1 [ 80000010 ... ] >> + 00001101 MSI -Q M 00000000 >> + 00001200 LSI -Q M 00000000 >> + 00001201 LSI -Q M 00000000 >> + 00001202 LSI -Q M 00000000 >> + 00001203 LSI -Q M 00000000 >> + 00001300 MSI -- 00000102 1/6 305/16384 @1fc230000 ^1 [ 80000010 ... ] >> + 00001301 MSI -- 00000103 2/6 220/16384 @1fc2f0000 ^1 [ 80000010 ... ] >> + 00001302 MSI -- 00000104 3/6 201/16384 @1fc390000 ^1 [ 80000010 ... ] >> + >> +The source information and configuration: >> + >> +- The ``LISN`` column outputs the interrupt number of the source in >> + range ``[ 0x0 ... 0x1FFF ]`` and its type : ``MSI`` or ``LSI`` >> +- The ``PQ`` column reflects the state of the PQ bits of the source : >> + >> + - ``--`` source is ready to take events >> + - ``P-`` an event was sent and an EOI is PENDING >> + - ``PQ`` an event was QUEUED >> + - ``-Q`` source is OFF >> + >> + a ``M`` indicates that source is *MASKED* at the EAS level, >> + >> +The targeting configuration : >> + >> +- The ``EISN`` column is the event data that will be queued in the event >> + queue of the O/S. >> +- The ``CPU/PRIO`` column is the tuple defining the CPU number and >> + priority queue serving the source. >> +- The ``EQ`` column outputs : >> + >> + - the current index of the event queue/ the max number of entries >> + - the O/S event queue address > > > My qemu does not print these: > > 00001302 MSI -- 00000104 0/6 12145/16384 ^1 [ 80000107 80000107 > 80000107 80000107 80000107 ^00000000 ] Please update your git tree :) The patch adding the EQ address is merged in mainline now : https://git.qemu.org/?p=qemu.git;a=blob;f=hw/intc/spapr_xive.c;h=62e0ef8fa5b467298899327ca2f4d5efec37a4c7;hb=HEAD#l140 > > >> + - the toggle bit >> + - the last entries that were pushed in the event queue. >> diff --git a/docs/specs/ppc-xive.rst b/docs/specs/ppc-xive.rst >> new file mode 100644 >> index 000000000000..b997dc062910 >> --- /dev/null >> +++ b/docs/specs/ppc-xive.rst >> @@ -0,0 +1,199 @@ >> +================================ >> +POWER9 XIVE interrupt controller >> +================================ >> + >> +The POWER9 processor comes with a new interrupt controller >> +architecture, called XIVE as "eXternal Interrupt Virtualization >> +Engine". >> + >> +Compared to the previous architecture, the main characteristics of >> +XIVE are to support a larger number of interrupt sources and to >> +deliver interrupts directly to virtual processors without hypervisor >> +assistance. This removes the context switches required for the >> +delivery process. >> + >> + >> +XIVE architecture >> +================= >> + >> +The XIVE IC is composed of three sub-engines, each taking care of a >> +processing layer of external interrupts: >> + >> +- Interrupt Virtualization Source Engine (IVSE), or Source Controller >> + (SC). These are found in PCI PHBs, in the PSI host bridge > > The text suggests IVSE==SC but the picture has SC in the IVRE box, does > it mean that IVRE itself has a source or it should be "PC"? > > Also out of curiosity - what is "PSI"? The Processor Service Interface (PSI) Controller is a bridge unit which connects the different interfaces to the Power Processor, LPC, FSI, BMC, I2C, etc. > >> + controller, but also inside the main controller for the core IPIs >> + and other sub-chips (NX, CAP, NPU) of the chip/processor. They are >> + configured to feed the IVRE with events. >> +- Interrupt Virtualization Routing Engine (IVRE) or Virtualization >> + Controller (VC). It handles event coalescing and perform interrupt >> + routing by matching an event source number with an Event >> + Notification Descriptor (END). >> +- Interrupt Virtualization Presentation Engine (IVPE) or Presentation >> + Controller (PC). It maintains the interrupt context state of each >> + thread and handles the delivery of the external interrupt to the >> + thread. >> + >> +:: >> + >> + XIVE Interrupt Controller >> + +------------------------------------+ IPIs >> + | +---------+ +---------+ +--------+ | +-------+ >> + | |IVRE | |Common Q | |IVPE |----> | CORES | >> + | | esb | | | | |----> | | >> + | | eas | | Bridge | | tctx |----> | | >> + | |SC end | | | | nvt | | | | >> + +------+ | +---------+ +----+----+ +--------+ | +-+-+-+-+ >> + | RAM | +------------------|-----------------+ | | | >> + | | | | | | >> + | | | | | | >> + | | +--------------------v------------------------v-v-v--+ other >> + | <--+ Power Bus +--> chips >> + | esb | +---------+-----------------------+------------------+ >> + | eas | | | >> + | end | +--|------+ | >> + | nvt | +----+----+ | +----+----+ >> + +------+ |IVSE | | |IVSE | >> + | | | | | >> + | PQ-bits | | | PQ-bits | >> + | local |-+ | in VC | >> + +---------+ +---------+ >> + PCIe NX,NPU,CAPI >> + >> + >> + PQ-bits: 2 bits source state machine (P:pending Q:queued) >> + esb: Event State Buffer (Array of PQ bits in an IVSE) >> + eas: Event Assignment Structure >> + end: Event Notification Descriptor >> + nvt: Notification Virtual Target >> + tctx: Thread interrupt Context registers > > > > [aik@yc02goos ~]$ ~/s/qemu/qmp qemu.mon.user2223.6_0_0_1 hmp info mtree > | grep xive > 0006010000000000-000601003fffffff (prio 0, i/o): xive.esb > 0006010040000000-000601007fffffff (prio 0, i/o): xive.end [disabled] > 0006030203180000-00060302031bffff (prio 0, i/o): xive.tima > 0006010000000000-000601003fffffff (prio 0, i/o): xive.esb > 0006010040000000-000601007fffffff (prio 0, i/o): xive.end [disabled] > 0006030203180000-00060302031bffff (prio 0, i/o): xive.tima > 0006010000000000-000601003fffffff (prio 0, i/o): xive.esb > 0006010040000000-000601007fffffff (prio 0, i/o): xive.end [disabled] > 0006030203180000-00060302031bffff (prio 0, i/o): xive.tima Do you really have 3 times the same mapping ? I think we fix this recently. > "xive.end" is always disabled, why create it then (I understand this all > very little)? The xive.end memory region contains the ESB pages of the END structure, which has two pairs of PQ bits, for notification and escalation. It provides a way to coalesce the event notifications at the END level. No OS makes use of this feature. I agree we could have removed the mapping but we kept it as a reminder. Thanks, C. > > >> + >> + >> + >> +XIVE internal tables >> +-------------------- >> + >> +Each of the sub-engines uses a set of tables to redirect interrupts >> +from event sources to CPU threads. >> + >> +:: >> + >> + +-------+ >> + User or O/S | EQ | >> + or +------>|entries| >> + Hypervisor | | .. | >> + Memory | +-------+ >> + | ^ >> + | | >> + +-------------------------------------------------+ >> + | | >> + Hypervisor +------+ +---+--+ +---+--+ +------+ >> + Memory | ESB | | EAT | | ENDT | | NVTT | >> + (skiboot) +----+-+ +----+-+ +----+-+ +------+ >> + ^ | ^ | ^ | ^ >> + | | | | | | | >> + +-------------------------------------------------+ >> + | | | | | | | >> + | | | | | | | >> + +----|--|--------|--|--------|--|-+ +-|-----+ +------+ >> + | | | | | | | | | | tctx| |Thread| >> + IPI or ---+ + v + v + v |---| + .. |-----> | >> + HW events | | | | | | >> + | IVRE | | IVPE | +------+ >> + +---------------------------------+ +-------+ >> + >> + >> +The IVSE have a 2-bits state machine, P for pending and Q for queued, >> +for each source that allows events to be triggered. They are stored in >> +an Event State Buffer (ESB) array and can be controlled by MMIOs. >> + >> +If the event is let through, the IVRE looks up in the Event Assignment >> +Structure (EAS) table for an Event Notification Descriptor (END) >> +configured for the source. Each Event Notification Descriptor defines >> +a notification path to a CPU and an in-memory Event Queue, in which >> +will be enqueued an EQ data for the O/S to pull. >> + >> +The IVPE determines if a Notification Virtual Target (NVT) can handle >> +the event by scanning the thread contexts of the VCPUs dispatched on >> +the processor HW threads. It maintains the interrupt context state of >> +each thread in a NVT table. >> + >> +XIVE thread interrupt context >> +----------------------------- >> + >> +The XIVE presenter can generate four different exceptions to its >> +HW threads: >> + >> +- hypervisor exception >> +- O/S exception >> +- Event-Based Branch (user level) >> +- msgsnd (doorbell) >> + >> +Each exception has a state independent from the others called a Thread >> +Interrupt Management context. This context is a set of registers which >> +lets the thread handle priority management and interrupt >> +acknowledgment among other things. The most important ones being : >> + >> +- Interrupt Priority Register (PIPR) >> +- Interrupt Pending Buffer (IPB) >> +- Current Processor Priority (CPPR) >> +- Notification Source Register (NSR) >> + >> +TIMA >> +~~~~ >> + >> +The Thread Interrupt Management registers are accessible through a >> +specific MMIO region, called the Thread Interrupt Management Area >> +(TIMA), four aligned pages, each exposing a different view of the >> +registers. First page (page address ending in ``0b00``) gives access >> +to the entire context and is reserved for the ring 0 view for the >> +physical thread context. The second (page address ending in ``0b01``) >> +is for the hypervisor, ring 1 view. The third (page address ending in >> +``0b10``) is for the operating system, ring 2 view. The fourth (page >> +address ending in ``0b11``) is for user level, ring 3 view. >> + >> +Interrupt flow from an O/S perspective >> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> + >> +After an event data has been enqueued in the O/S Event Queue, the IVPE >> +raises the bit corresponding to the priority of the pending interrupt >> +in the register IBP (Interrupt Pending Buffer) to indicate that an >> +event is pending in one of the 8 priority queues. The Pending >> +Interrupt Priority Register (PIPR) is also updated using the IPB. This >> +register represent the priority of the most favored pending >> +notification. >> + >> +The PIPR is then compared to the the Current Processor Priority >> +Register (CPPR). If it is more favored (numerically less than), the >> +CPU interrupt line is raised and the EO bit of the Notification Source >> +Register (NSR) is updated to notify the presence of an exception for >> +the O/S. The O/S acknowledges the interrupt with a special load in the >> +Thread Interrupt Management Area. >> + >> +The O/S handles the interrupt and when done, performs an EOI using a >> +MMIO operation on the ESB management page of the associate source. >> + >> +Overview of the QEMU models for XIVE >> +==================================== >> + >> +The XiveSource models the IVSE in general, internal and external. It >> +handles the source ESBs and the MMIO interface to control them. >> + >> +The XiveNotifier is a small helper interface interconnecting the >> +XiveSource to the XiveRouter. >> + >> +The XiveRouter is an abstract model acting as a combined IVRE and >> +IVPE. It routes event notifications using the EAS and END tables to >> +the IVPE sub-engine which does a CAM scan to find a CPU to deliver the >> +exception. Storage should be provided by the inheriting classes. >> + >> +XiveEnDSource is a special source object. It exposes the END ESB MMIOs >> +of the Event Queues which are used for coalescing event notifications >> +and for escalation. Not used on the field, only to sync the EQ cache >> +in OPAL. >> + >> +Finally, the XiveTCTX contains the interrupt state context of a thread, >> +four sets of registers, one for each exception that can be delivered >> +to a CPU. These contexts are scanned by the IVPE to find a matching VP >> +when a notification is triggered. It also models the Thread Interrupt >> +Management Area (TIMA), which exposes the thread context registers to >> +the CPU for interrupt management. >> diff --git a/MAINTAINERS b/MAINTAINERS >> index 9424a490d647..82ac0415b493 100644 >> --- a/MAINTAINERS >> +++ b/MAINTAINERS >> @@ -1708,6 +1708,7 @@ L: qemu-ppc@nongnu.org >> S: Supported >> F: hw/*/*xive* >> F: include/hw/*/*xive* >> +F: docs/*/*xive* >> >> Subsystems >> ---------- >> >
© 2016 - 2024 Red Hat, Inc.