Implement the irqchip kernel driver for the Arm GICv5 architecture,
as described in the GICv5 beta0 specification, available at:
https://developer.arm.com/documentation/aes0070
The GICv5 architecture is composed of multiple components:
- one or more IRS (Interrupt Routing Service)
- zero or more ITS (Interrupt Translation Service)
- zero or more IWB (Interrupt Wire Bridge)
The GICv5 host kernel driver is organized into units corresponding
to GICv5 components.
The GICv5 architecture defines the following interrupt types:
- PPI (PE-Private Peripheral Interrupt)
- SPI (Shared Peripheral Interrupt)
- LPI (Logical Peripheral Interrupt)
This series adds the sysreg entries required to automatically generate
the GICv5 register handling code, one patch per register.
This patch series is split into patches matching *logical* entities,
to make the review easier.
Logical entities:
- PPI
- IRS/SPI
- LPI/IPI
- SMP enablement
- ITS
The salient points of the driver are summarized below.
=============
1. Testing
=============
Patchset tested with an architecturally compliant FVP model with
the following setup:
- 1 IRS
- 1 and 2 ITSes
- 1 and 2 IWBs
configured with different parameters that vary the IRS(IST) and
ITS(DT/ITT) table levels and INTID/DEVICEID/EVENTID bits.
A Trusted-Firmware (TF-A) prototype was used for device tree
bindings and component initializations.
================
2. Driver design
================
=====================
2.1 GICv5 DT bindings
=====================
The DT bindings attempt to map directly to the GICv5 component
hierarchy, with a top level node corresponding to the GICv5 "system",
with IRS child nodes that in turn have ITS child nodes.
The IWB is defined in a separate schema; its relationship with the ITS
is explicit through the msi-parent property required to define the IWB
deviceID.
===================
2.2 GICv5 top level
===================
The top-level GICv5 irqchip driver implements separate IRQ
domains - one for each interrupt type, PPI (PE-Private Peripheral
Interrupt), SPI (Shared Peripheral Interrupt) and LPI (Logical
Peripheral Interrupt).
The top-level exception handler routes the IRQ to the relevant IRQ
domain for handling according to the interrupt type detected when the
IRQ is acknowledged.
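As an illustration of the routing described above, here is a minimal
userspace sketch in C. The field layout (type in bits [31:29], ID in
bits [23:0]), the type values and every name below are assumptions made
for this example; they are not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout, for illustration only: type in [31:29], ID in [23:0]. */
#define HWIRQ_TYPE_SHIFT	29
#define HWIRQ_TYPE_MASK		0x7u
#define HWIRQ_ID_MASK		0x00ffffffu

/* Hypothetical type encodings. */
enum irq_type { TYPE_PPI = 1, TYPE_LPI = 2, TYPE_SPI = 3 };

/* One domain per interrupt type, plus a "no domain" value. */
enum irq_domain_id { DOMAIN_NONE, DOMAIN_PPI, DOMAIN_SPI, DOMAIN_LPI };

/*
 * Decode an acknowledged interrupt: store its ID in *id and return the
 * domain that should handle it, or DOMAIN_NONE for an unknown type.
 */
static enum irq_domain_id route_irq(uint32_t hwirq, uint32_t *id)
{
	uint32_t type = (hwirq >> HWIRQ_TYPE_SHIFT) & HWIRQ_TYPE_MASK;

	assert(id);
	*id = hwirq & HWIRQ_ID_MASK;

	switch (type) {
	case TYPE_PPI:
		return DOMAIN_PPI;
	case TYPE_SPI:
		return DOMAIN_SPI;
	case TYPE_LPI:
		return DOMAIN_LPI;
	default:
		return DOMAIN_NONE;
	}
}
```

With this layout, route_irq((3u << 29) | 42, &id) selects the SPI domain
and leaves 42 in id.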
All IRQs are set to the same priority value.
The driver assumes that the GICv5 components implement enough
physical address bits to address the full system RAM, as required
by the architecture; it does not check whether the physical address
ranges of memory allocated for IRS/ITS tables are within the GICv5
physical address range.
Components are probed by relying on the early DT irqchip probing
scheme. The probing is carried out hierarchically, starting from
the top level.
The IWB driver has been dropped owing to issues encountered with
core code DOMAIN_BUS_WIRED_TO_MSI bus token handling:
https://lore.kernel.org/lkml/87tt6310hu.wl-maz@kernel.org/
=============
2.3 GICv5 IRS
=============
The GICv5 IRS driver probes and manages SPI interrupts by detecting their
presence and by providing the top-level driver the information required
to set up the SPI interrupt domain.
The GICv5 IRS driver also parses from firmware the Interrupt AFFinity
IDs (IAFFIDs) that identify cores and sets up IRS IRQ routing.
The GICv5 IRS driver allocates memory to handle the IRS tables.
The IRS LPI interrupts state is kept in an Interrupt State Table (IST)
and it is managed through CPU instructions.
The IRS driver allocates the IST table, which, depending on the available
HW features, can be either 1- or 2-level.
If the IST is 2-level, memory for the level-2 table entries
is allocated on demand (i.e. when LPIs are requested), using an IRS
mechanism to make a level-1 entry valid on demand after the IST
has already been enabled.
Chunks of memory allocated for IST entries can be smaller or larger than
PAGE_SIZE and are required to be physically contiguous within an IST level
(i.e. a linear IST is a single memory block, a 2-level IST is made up of a
block of memory for the L1 table, whose entries point at different L2 tables
that are in turn allocated as memory chunks).
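The on-demand level-2 allocation can be sketched in userspace C as
follows. This is only a model of the idea: the 16-entry L2 size, the
entry type and all names are assumptions for the example, calloc()
stands in for the kernel allocator, and the IRS mechanism that marks a
level-1 entry valid in hardware is not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define L2_BITS		4			/* assumed: 16 entries per L2 table */
#define L2_ENTRIES	(1u << L2_BITS)

struct ist {
	uint32_t l1_entries;
	uint32_t **l1;		/* single contiguous block of L1 entries */
};

/* Allocate the L1 table only; every L1 entry starts out invalid (NULL). */
static int ist_init(struct ist *ist, uint32_t lpi_bits)
{
	assert(lpi_bits >= L2_BITS && lpi_bits < 32);
	ist->l1_entries = 1u << (lpi_bits - L2_BITS);
	ist->l1 = calloc(ist->l1_entries, sizeof(*ist->l1));
	return ist->l1 ? 0 : -1;
}

/* Ensure the L2 table covering @lpi exists, allocating it on first use. */
static int ist_alloc_l2(struct ist *ist, uint32_t lpi)
{
	uint32_t idx = lpi >> L2_BITS;

	if (idx >= ist->l1_entries)
		return -1;
	if (ist->l1[idx])
		return 0;	/* L2 table already present */

	/* Each L2 table is its own contiguous chunk. */
	ist->l1[idx] = calloc(L2_ENTRIES, sizeof(**ist->l1));
	return ist->l1[idx] ? 0 : -1;
}

/* Release every L2 table and then the L1 table. */
static void ist_free(struct ist *ist)
{
	for (uint32_t i = 0; i < ist->l1_entries; i++)
		free(ist->l1[i]);
	free(ist->l1);
	ist->l1 = NULL;
}
```

Requesting LPI 35 and then LPI 40 in this model allocates a single L2
table (index 2) and leaves every other level-1 entry untouched.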
LPI INTIDs are allocated in software using an IDA. The IDA does not
support allocating ranges, which is cumbersome because it forces us
to allocate IDs one by one where the LPIs could otherwise be allocated
in chunks.
An IDA was chosen because it is essentially a dynamic bitmap that
carries out memory allocation automatically.
Other drivers/subsystems made different choices to allocate ranges;
an IDA was chosen since it is part of the core kernel and an IDA
range API is in the making.
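To make the one-by-one allocation cost concrete, here is a userspace
stand-in in C. It is not the kernel IDA: a fixed 64-bit bitmap replaces
the dynamic one, and all names are hypothetical. It only shows that,
without a range API, a request for N IDs becomes N single allocations
with manual rollback, and that the returned IDs need not be contiguous.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_IDS	64u	/* assumed pool size for the example */

struct id_pool {
	uint64_t used;	/* bit i set means ID i is allocated */
};

/* Return the lowest free ID, or -1 if the pool is exhausted. */
static int id_alloc(struct id_pool *p)
{
	for (unsigned int i = 0; i < MAX_IDS; i++) {
		if (!(p->used & (UINT64_C(1) << i))) {
			p->used |= UINT64_C(1) << i;
			return (int)i;
		}
	}
	return -1;
}

static void id_free(struct id_pool *p, unsigned int id)
{
	assert(id < MAX_IDS);
	p->used &= ~(UINT64_C(1) << id);
}

/* Allocate @n IDs one at a time; undo the partial work on failure. */
static int id_alloc_many(struct id_pool *p, int *ids, unsigned int n)
{
	for (unsigned int i = 0; i < n; i++) {
		ids[i] = id_alloc(p);
		if (ids[i] < 0) {
			while (i--)
				id_free(p, (unsigned int)ids[i]);
			return -1;
		}
	}
	return 0;
}
```

If ID 1 is already in use and three IDs are requested from this pool,
the caller gets 0, 2 and 3.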
IPIs are implemented using LPIs and a hierarchical domain is created
specifically for IPIs using the LPI domain as a parent.
arm64 IPI management core code is augmented with a new API to handle
IPIs that are not per-cpu interrupts and to force the affinity of the
LPI backing an IPI to a specific and immutable value.
=============
2.4 GICv5 ITS
=============
The ITS driver reuses the existing GICv3/v4 MSI-parent infrastructure
and on top builds an IRQ domain needed to enable message based IRQs.
ITS tables - DT (Device Table) and ITT (Interrupt Translation Table) - are
allocated according to the number of required deviceIDs and eventIDs on
a per-device basis. The ITS driver relies on the kmalloc() interface
because memory must be physically contiguous within a table level
and can be smaller or larger than PAGE_SIZE.
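As a rough illustration of why a table can end up smaller or larger
than PAGE_SIZE, the sketch below sizes a linear ITT from the number of
events a device needs. The 8-byte entry size, the 2^16 upper bound and
the function names are assumptions for this example, not values taken
from the driver or the specification.

```c
#include <assert.h>
#include <stddef.h>

#define ITT_ENTRY_SIZE	8u		/* assumed entry size in bytes */
#define MAX_EVENTS	(1u << 16)	/* assumed upper bound for the example */

/* Smallest number of eventID bits able to index @nvec events. */
static unsigned int event_id_bits(unsigned int nvec)
{
	unsigned int bits = 0;

	assert(nvec >= 1 && nvec <= MAX_EVENTS);
	while ((1u << bits) < nvec)
		bits++;
	return bits;
}

/* Size in bytes of a linear ITT covering @nvec events. */
static size_t itt_linear_size(unsigned int nvec)
{
	return ((size_t)1 << event_id_bits(nvec)) * ITT_ENTRY_SIZE;
}
```

Under these assumptions a device with 3 events needs a 32-byte table,
while one with 1024 events needs 8192 bytes, i.e. more than a 4KiB page.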
=============
2.5 GICv5 IWB
=============
The IWB driver has been dropped owing to issues encountered with
core code DOMAIN_BUS_WIRED_TO_MSI bus token handling:
https://lore.kernel.org/lkml/87tt6310hu.wl-maz@kernel.org/
===================
3. Acknowledgements
===================
The patchset was co-developed with T.Hayes and S.Bischoff from
Arm - thank you so much for your help.
A big thank you to M.Zyngier for his fundamental help/advice.
If you have some time to help us review this series and get it into
shape, thank you very much.
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
---
Changes in v3:
- Reintroduced v1 patch split to simplify review
- Reworked IRS/ITS iopoll loop, split in atomic/non-atomic
- Cleaned-up IRS/ITS code with macros addressing review comments
- Dropped IWB driver waiting for IRQ core code to be fixed for DOMAIN_BUS_WIRED_TO_MSI
https://lore.kernel.org/lkml/87tt6310hu.wl-maz@kernel.org/
- Moved headers to arch/arm64 and include/linux/irqchip
- Reworked GSB barriers definition
- Added extensive GSB/ISB barriers comments
- Limited error checking on IRS/ITS code - introduced a couple of fatal
BUG_ON checks
- Link to v2: https://lore.kernel.org/r/20250424-gicv5-host-v2-0-545edcaf012b@kernel.org
Changes in v2:
- Squashed patches [18-21] into a single logical entity
- Replaced maple tree with IDA for LPI IDs allocation
- Changed coding style to tip-maintainer guidelines
- Tried to consolidate poll wait mechanism into fewer functions
- Added comments related to _relaxed accessors, barriers and kmalloc
limitations
- Removed IPI affinity check hotplug callback
- Applied DT schema changes requested, moved IWB into a separate schema
- Fixed DT examples
- Fixed guard() usage
- Link to v1: https://lore.kernel.org/r/20250408-gicv5-host-v1-0-1f26db465f8d@kernel.org
---
Lorenzo Pieralisi (24):
dt-bindings: interrupt-controller: Add Arm GICv5
arm64/sysreg: Add GCIE field to ID_AA64PFR2_EL1
arm64/sysreg: Add ICC_PPI_PRIORITY<n>_EL1
arm64/sysreg: Add ICC_ICSR_EL1
arm64/sysreg: Add ICC_PPI_HMR<n>_EL1
arm64/sysreg: Add ICC_PPI_ENABLER<n>_EL1
arm64/sysreg: Add ICC_PPI_{C/S}ACTIVER<n>_EL1
arm64/sysreg: Add ICC_PPI_{C/S}PENDR<n>_EL1
arm64/sysreg: Add ICC_CR0_EL1
arm64/sysreg: Add ICC_PCR_EL1
arm64/sysreg: Add ICC_IDR0_EL1
arm64/sysreg: Add ICH_HFGRTR_EL2
arm64/sysreg: Add ICH_HFGWTR_EL2
arm64/sysreg: Add ICH_HFGITR_EL2
arm64: Disable GICv5 read/write/instruction traps
arm64: cpucaps: Rename GICv3 CPU interface capability
arm64: cpucaps: Add GICv5 CPU interface (GCIE) capability
arm64: Add support for GICv5 GSB barriers
irqchip/gic-v5: Add GICv5 PPI support
irqchip/gic-v5: Add GICv5 IRS/SPI support
irqchip/gic-v5: Add GICv5 LPI/IPI support
irqchip/gic-v5: Enable GICv5 SMP booting
irqchip/gic-v5: Add GICv5 ITS support
arm64: Kconfig: Enable GICv5
Marc Zyngier (1):
arm64: smp: Support non-SGIs for IPIs
.../interrupt-controller/arm,gic-v5-iwb.yaml | 76 ++
.../bindings/interrupt-controller/arm,gic-v5.yaml | 196 ++++
MAINTAINERS | 10 +
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/barrier.h | 3 +
arch/arm64/include/asm/el2_setup.h | 45 +
arch/arm64/include/asm/smp.h | 24 +-
arch/arm64/include/asm/sysreg.h | 83 +-
arch/arm64/kernel/cpufeature.c | 17 +-
arch/arm64/kernel/smp.c | 156 ++-
arch/arm64/tools/cpucaps | 3 +-
arch/arm64/tools/sysreg | 495 +++++++-
drivers/irqchip/Kconfig | 12 +
drivers/irqchip/Makefile | 4 +-
drivers/irqchip/irq-gic-common.h | 2 -
...3-its-msi-parent.c => irq-gic-its-msi-parent.c} | 3 +-
drivers/irqchip/irq-gic-its-msi-parent.h | 13 +
drivers/irqchip/irq-gic-v3-its.c | 3 +-
drivers/irqchip/irq-gic-v5-irs.c | 819 ++++++++++++++
drivers/irqchip/irq-gic-v5-its.c | 1176 ++++++++++++++++++++
drivers/irqchip/irq-gic-v5.c | 1046 +++++++++++++++++
drivers/irqchip/irq-gic.c | 2 +-
include/linux/irqchip/arm-gic-v5.h | 387 +++++++
23 files changed, 4507 insertions(+), 69 deletions(-)
---
base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8
change-id: 20250408-gicv5-host-749f316afe84
Best regards,
--
Lorenzo Pieralisi <lpieralisi@kernel.org>
On Tue, 06 May 2025 13:23:29 +0100,
Lorenzo Pieralisi <lpieralisi@kernel.org> wrote:
>
> =============
> 2.5 GICv5 IWB
> =============
>
> The IWB driver has been dropped owing to issues encountered with
> core code DOMAIN_BUS_WIRED_TO_MSI bus token handling:
>
> https://lore.kernel.org/lkml/87tt6310hu.wl-maz@kernel.org/

This problem does not have much to do with DOMAIN_BUS_WIRED_TO_MSI.

The issues are that:

- the core code calls into the .prepare domain on a per-interrupt
  basis instead of on a per *device* basis. This is a complete
  violation of the MSI API, because .prepare is when you are supposed
  to perform resource reservation (in the GICv3 parlance, that's ITT
  allocation + MAPD command).

- the same function calls .prepare for a *single* interrupt,
  effectively telling the irqchip "my device has only one interrupt".
  Because I'm super generous (and don't like wasting precious bytes),
  I allocate 32 LPIs at the minimum. Only snag is that I could do with
  300+ interrupts, and calling repeatedly doesn't help at all, since
  we cannot *grow* an ITT.

So this code needs to be taken to the backyard and beaten into shape
before we can make use of it. My D05 (with its collection of MBIGENs)
only works by accident at the moment, as I found out yesterday, and
GICv5 IWB is in the same boat, since it reuses the msi-parent thing,
and therefore the same heuristic.

I guess not having the IWB immediately isn't too big a deal, but I
really didn't expect to find this...

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
On Tue, May 06, 2025 at 03:05:39PM +0100, Marc Zyngier wrote:
> On Tue, 06 May 2025 13:23:29 +0100,
> Lorenzo Pieralisi <lpieralisi@kernel.org> wrote:
> >
> > =============
> > 2.5 GICv5 IWB
> > =============
> >
> > The IWB driver has been dropped owing to issues encountered with
> > core code DOMAIN_BUS_WIRED_TO_MSI bus token handling:
> >
> > https://lore.kernel.org/lkml/87tt6310hu.wl-maz@kernel.org/
>
> This problem does not have much to do with DOMAIN_BUS_WIRED_TO_MSI.
>
> The issues are that:
>
> - the core code calls into the .prepare domain on a per-interrupt
>   basis instead of on a per *device* basis. This is a complete
>   violation of the MSI API, because .prepare is when you are supposed
>   to perform resource reservation (in the GICv3 parlance, that's ITT
>   allocation + MAPD command).
>
> - the same function calls .prepare for a *single* interrupt,
>   effectively telling the irqchip "my device has only one interrupt".
>   Because I'm super generous (and don't like wasting precious bytes),
>   I allocate 32 LPIs at the minimum. Only snag is that I could do with
>   300+ interrupts, and calling repeatedly doesn't help at all, since
>   we cannot *grow* an ITT.

On the IWB driver code that I could not post I noticed that it is
true that the .prepare callback is called on a per-interrupt basis
but the vector size is the domain size (ie number of wires) which
is correct AFAICS, so the ITT size should be fine I don't get why
it would need to grow.

The difference with this series is that on v3 LPIs are allocated
on .prepare(), we allocate them on .alloc().

So yes, calling .prepare on a per-interrupt basis looks like a bug
but if we allow reusing a deviceID (ie the "shared" thingy) it could
be harmless.

> So this code needs to be taken to the backyard and beaten into shape
> before we can make use of it. My D05 (with its collection of MBIGENs)
> only works by accident at the moment, as I found out yesterday, and
> GICv5 IWB is in the same boat, since it reuses the msi-parent thing,
> and therefore the same heuristic.
>
> I guess not having the IWB immediately isn't too big a deal, but I
> really didn't expect to find this...

To be honest, it was expected. We found these snags while designing
the code (that explains how IWB was structured in v1 - by the way)
but we didn't know if the behaviour above was by construction, we
always thought "we must be making a mistake".

The same goes for the fixed eventID but I would not resume that
discussion again, there are things that are impossible to know
unless you are aware of the background story behind them.

Thanks,
Lorenzo
On Wed, 07 May 2025 08:54:36 +0100,
Lorenzo Pieralisi <lpieralisi@kernel.org> wrote:
>
> On Tue, May 06, 2025 at 03:05:39PM +0100, Marc Zyngier wrote:
> > On Tue, 06 May 2025 13:23:29 +0100,
> > Lorenzo Pieralisi <lpieralisi@kernel.org> wrote:
> > >
> > > =============
> > > 2.5 GICv5 IWB
> > > =============
> > >
> > > The IWB driver has been dropped owing to issues encountered with
> > > core code DOMAIN_BUS_WIRED_TO_MSI bus token handling:
> > >
> > > https://lore.kernel.org/lkml/87tt6310hu.wl-maz@kernel.org/
> >
> > This problem does not have much to do with DOMAIN_BUS_WIRED_TO_MSI.
> >
> > The issues are that:
> >
> > - the core code calls into the .prepare domain on a per-interrupt
> >   basis instead of on a per *device* basis. This is a complete
> >   violation of the MSI API, because .prepare is when you are supposed
> >   to perform resource reservation (in the GICv3 parlance, that's ITT
> >   allocation + MAPD command).
> >
> > - the same function calls .prepare for a *single* interrupt,
> >   effectively telling the irqchip "my device has only one interrupt".
> >   Because I'm super generous (and don't like wasting precious bytes),
> >   I allocate 32 LPIs at the minimum. Only snag is that I could do with
> >   300+ interrupts, and calling repeatedly doesn't help at all, since
> >   we cannot *grow* an ITT.
>
> On the IWB driver code that I could not post I noticed that it is
> true that the .prepare callback is called on a per-interrupt basis
> but the vector size is the domain size (ie number of wires) which
> is correct AFAICS, so the ITT size should be fine I don't get why
> it would need to grow.

Look again. The only reason you are getting something that *looks*
correct is that its_pmsi_prepare() has this nugget:

	/* Allocate at least 32 MSIs, and always as a power of 2 */
	nvec = max_t(int, 32, roundup_pow_of_two(nvec));

and that the IWB is, conveniently, in sets of 32. However, the caller
of this function (__msi_domain_alloc_irqs()) passes a nvec value that
is always exactly *1* when allocating an interrupt.

So you're just lucky that I picked a minimum ITT size that matches the
IWB on your model.

Configure your IWB to be, let's say, 256 interrupts and use the last
one, and you'll have a very different behaviour.

> The difference with this series is that on v3 LPIs are allocated
> on .prepare(), we allocate them on .alloc().

Absolutely not. Even on v3, we never allocate LPIs in .prepare(). We
allocate the ITT, perform the MAPD, and that's it. That's why it's
called *prepare*.

> So yes, calling .prepare on a per-interrupt basis looks like a bug
> but if we allow reusing a deviceID (ie the "shared" thingy) it could
> be harmless.

Harmless? No. It is really *bad*. It means you lose any sort of sane
tracking of what owns the ITT and how you can free things. Seeing a
devid twice is the admission that we have no idea of what is going on.

GICv3 is already in that sorry state, but I am hopeful that GICv5 can
be a bit less crap.

> > So this code needs to be taken to the backyard and beaten into shape
> > before we can make use of it. My D05 (with its collection of MBIGENs)
> > only works by accident at the moment, as I found out yesterday, and
> > GICv5 IWB is in the same boat, since it reuses the msi-parent thing,
> > and therefore the same heuristic.
> >
> > I guess not having the IWB immediately isn't too big a deal, but I
> > really didn't expect to find this...
>
> To be honest, it was expected. We found these snags while designing
> the code (that explains how IWB was structured in v1 - by the way)
> but we didn't know if the behaviour above was by construction, we
> always thought "we must be making a mistake".

Then why didn't you report it? We could have caught this very early
on, before the fscked-up code was in a stable release...

	M.

-- 
Without deviation from the norm, progress is not possible.
On Wed, May 07, 2025 at 10:09:44AM +0100, Marc Zyngier wrote:
> On Wed, 07 May 2025 08:54:36 +0100,
> Lorenzo Pieralisi <lpieralisi@kernel.org> wrote:
> >
> > On Tue, May 06, 2025 at 03:05:39PM +0100, Marc Zyngier wrote:
> > > On Tue, 06 May 2025 13:23:29 +0100,
> > > Lorenzo Pieralisi <lpieralisi@kernel.org> wrote:
> > > >
> > > > =============
> > > > 2.5 GICv5 IWB
> > > > =============
> > > >
> > > > The IWB driver has been dropped owing to issues encountered with
> > > > core code DOMAIN_BUS_WIRED_TO_MSI bus token handling:
> > > >
> > > > https://lore.kernel.org/lkml/87tt6310hu.wl-maz@kernel.org/
> > >
> > > This problem does not have much to do with DOMAIN_BUS_WIRED_TO_MSI.
> > >
> > > The issues are that:
> > >
> > > - the core code calls into the .prepare domain on a per-interrupt
> > >   basis instead of on a per *device* basis. This is a complete
> > >   violation of the MSI API, because .prepare is when you are supposed
> > >   to perform resource reservation (in the GICv3 parlance, that's ITT
> > >   allocation + MAPD command).
> > >
> > > - the same function calls .prepare for a *single* interrupt,
> > >   effectively telling the irqchip "my device has only one interrupt".
> > >   Because I'm super generous (and don't like wasting precious bytes),
> > >   I allocate 32 LPIs at the minimum. Only snag is that I could do with
> > >   300+ interrupts, and calling repeatedly doesn't help at all, since
> > >   we cannot *grow* an ITT.
> >
> > On the IWB driver code that I could not post I noticed that it is
> > true that the .prepare callback is called on a per-interrupt basis
> > but the vector size is the domain size (ie number of wires) which
> > is correct AFAICS, so the ITT size should be fine I don't get why
> > it would need to grow.
>
> Look again. The only reason you are getting something that *looks*
> correct is that its_pmsi_prepare() has this nugget:
>
> 	/* Allocate at least 32 MSIs, and always as a power of 2 */
> 	nvec = max_t(int, 32, roundup_pow_of_two(nvec));
>
> and that the IWB is, conveniently, in sets of 32. However, the caller
> of this function (__msi_domain_alloc_irqs()) passes a nvec value that
> is always exactly *1* when allocating an interrupt.

nvec is one but this does not work for the reason above, it works
because of AFAICS (for the IWB set-up I have):

	msi_info = msi_get_domain_info(domain);
	if (msi_info->hwsize > nvec)
		nvec = msi_info->hwsize;

>
> So you're just lucky that I picked a minimum ITT size that matches the
> IWB on your model.

Not really, we test with wires above 32, we end up calling .prepare()
with the precise number of wires, don't know why that does not work
for the MBIgen (possibly because the interrupt-controller platform
devices are children of the "main" MBIgen platform device ? The IWB
one is created by OF code, MBIgen has to create children, maybe that's
what is going wrong with the device/domain hierarchy ?).

> Configure your IWB to be, let's say, 256 interrupts and use the last
> one, and you'll have a very different behaviour.

See above.

> > The difference with this series is that on v3 LPIs are allocated
> > on .prepare(), we allocate them on .alloc().
>
> Absolutely not. Even on v3, we never allocate LPIs in .prepare(). We
> allocate the ITT, perform the MAPD, and that's it. That's why it's
> called *prepare*.

I supposed that's what its_lpi_alloc() does in its_create_device() but
OK, won't mention that any further.

> > So yes, calling .prepare on a per-interrupt basis looks like a bug
> > but if we allow reusing a deviceID (ie the "shared" thingy) it could
> > be harmless.
>
> Harmless? No. It is really *bad*. It means you lose any sort of sane
> tracking of what owns the ITT and how you can free things. Seeing a
> devid twice is the admission that we have no idea of what is going on.
>
> GICv3 is already in that sorry state, but I am hopeful that GICv5 can
> be a bit less crap.

Well, GICv5 will have to cope with designs, hopefully deviceIDs sharing
is a thing of the past I am not eulogizing the concept :)

> > > So this code needs to be taken to the backyard and beaten into shape
> > > before we can make use of it. My D05 (with its collection of MBIGENs)
> > > only works by accident at the moment, as I found out yesterday, and
> > > GICv5 IWB is in the same boat, since it reuses the msi-parent thing,
> > > and therefore the same heuristic.
> > >
> > > I guess not having the IWB immediately isn't too big a deal, but I
> > > really didn't expect to find this...
> >
> > To be honest, it was expected. We found these snags while designing
> > the code (that explains how IWB was structured in v1 - by the way)
> > but we didn't know if the behaviour above was by construction, we
> > always thought "we must be making a mistake".
>
> Then why didn't you report it? We could have caught this very early
> on, before the fscked-up code was in a stable release...

We spotted it late March - planned to discuss the IWB design while
reviewing v5.

Thanks,
Lorenzo