This series introduces GICv4 direct LPI injection for Xen.

Direct LPI injection relies on the GIC tracking the mapping between physical and
virtual CPUs. Each VCPU requires a VPE that is created and registered with the
GIC via the `VMAPP` ITS command. The GIC is then informed of the current
VPE-to-PCPU placement by programming `VPENDBASER` and `VPROPBASER` in the
appropriate redistributor. LPIs are associated with VPEs through the `VMAPTI`
ITS command, after which the GIC handles delivery without trapping into the
hypervisor for each interrupt.

When a VPE is not scheduled but has pending interrupts, the GIC raises a per-VPE
doorbell LPI. Doorbells are owned by the hypervisor and prompt rescheduling so
the VPE can drain its pending LPIs.

Because GICv4 lacks a native doorbell invalidation mechanism, this series
includes a helper that invalidates doorbell LPIs via synthetic “proxy” devices,
following the approach used until GICv4.1.

This work is based mostly on the work of Penny Zheng <penny.zheng@arm.com> and
Luca Fancellu <luca.fancellu@arm.com>, and also on Linux patches by
Marc Zyngier.

Some patches are still a little rough and need styling fixes and more testing,
as all of them had to be carved line by line out of a giant ~4000-line patch.
This RFC is mostly intended to find out whether the proposed approach is
suitable and acceptable to everyone. There is also still an open question of
how to handle Signed-off-by lines for Penny and Luca, since they have not
indicated their preference yet.
Mykyta Poturai (19):
  arm/gicv4 add management structure definitions
  arm/gicv4-its: Add GICv4 ITS command definitions
  arm/its: Export struct its_device
  arm/its: Add vlpi configuration
  arm/irq: Add hw flag to pending_irq
  arm/gicv4-its: Add VLPI map/unmap operations
  xen/domain: Alloc enough pages for VCPU struct
  arm/gic: Keep track of GIC features
  arm/its: Implement LPI invalidation
  arm/its: Keep track of BASER regs
  arm/its: Add ITS VM and VPE allocation/teardown
  arm/gic: Add VPENDBASER/VPROPBASER accessors
  arm/gic: VPE scheduling
  arm/its: VPE affinity changes
  arm: Add gicv4 to domain creation
  arm/gic: Fix LR group handling for GICv4
  arm/gicv4: Handle doorbells
  arm/gic: Add VPE proxy support
  arm/gicv4: Add GICv4 to the build system

 xen/arch/arm/Kconfig                   |    6 +
 xen/arch/arm/Makefile                  |    1 +
 xen/arch/arm/dom0less-build.c          |    1 +
 xen/arch/arm/domain.c                  |   16 +
 xen/arch/arm/gic-v2.c                  |    2 +-
 xen/arch/arm/gic-v3-its.c              |  339 +++++--
 xen/arch/arm/gic-v3-lpi.c              |  169 +++-
 xen/arch/arm/gic-v3.c                  |  215 ++++-
 xen/arch/arm/gic-v4-its.c              | 1136 ++++++++++++++++++++++++
 xen/arch/arm/gic-vgic.c                |    6 +
 xen/arch/arm/include/asm/gic.h         |    4 +-
 xen/arch/arm/include/asm/gic_v3_defs.h |   22 +
 xen/arch/arm/include/asm/gic_v3_its.h  |  139 ++-
 xen/arch/arm/include/asm/gic_v4_its.h  |  114 +++
 xen/arch/arm/include/asm/vgic.h        |   79 +-
 xen/arch/arm/vgic-v3-its.c             |   60 +-
 xen/arch/arm/vgic.c                    |   37 +-
 xen/common/domain.c                    |   14 +-
 xen/include/public/arch-arm.h          |    2 +
 19 files changed, 2174 insertions(+), 188 deletions(-)
 create mode 100644 xen/arch/arm/gic-v4-its.c
 create mode 100644 xen/arch/arm/include/asm/gic_v4_its.h

-- 
2.51.2
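As a reading aid for the flow described in the cover letter, the residency and doorbell behaviour can be modelled as a tiny state machine. This is only a sketch under stated assumptions, not code from the series: `struct vpe_model` and every `vpe_*` function are hypothetical names, and the model flips fields where the real code would issue `VMAPP`/`VMAPTI` ITS commands and program `VPENDBASER`/`VPROPBASER`.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_PCPU (-1)

/* Hypothetical model of one VPE; none of these names come from the series. */
struct vpe_model {
    bool mapped;                /* models VMAPP: the GIC knows this VPE */
    int resident_pcpu;          /* pCPU the VPE is loaded on, or NO_PCPU */
    unsigned int pending_vlpis; /* VLPIs left pending while non-resident */
    bool doorbell_fired;        /* models the per-VPE doorbell LPI */
};

/* Models registering the VPE with the GIC (VMAPP in the cover letter). */
static void vpe_map(struct vpe_model *vpe)
{
    vpe->mapped = true;
}

/*
 * Models schedule-in: the redistributor of 'pcpu' is told about the VPE
 * (VPENDBASER/VPROPBASER in the cover letter), after which the guest can
 * drain whatever was pending.
 */
static void vpe_make_resident(struct vpe_model *vpe, int pcpu)
{
    assert(vpe->mapped && pcpu >= 0);
    vpe->resident_pcpu = pcpu;
    vpe->pending_vlpis = 0;
    vpe->doorbell_fired = false;
}

/* Models schedule-out: the VPE is no longer loaded on any pCPU. */
static void vpe_make_non_resident(struct vpe_model *vpe)
{
    vpe->resident_pcpu = NO_PCPU;
}

/*
 * Models a device LPI already routed to this VPE (VMAPTI in the cover
 * letter). Returns true when it is delivered directly, with no hypervisor
 * involvement; otherwise it is left pending and the doorbell is raised so
 * the hypervisor can reschedule the VPE.
 */
static bool vpe_raise_vlpi(struct vpe_model *vpe)
{
    if (!vpe->mapped)
        return false;
    if (vpe->resident_pcpu != NO_PCPU)
        return true;
    vpe->pending_vlpis++;
    vpe->doorbell_fired = true;
    return false;
}
```

In this model, a VLPI raised while the VPE is resident returns true and touches no hypervisor-visible state, while one raised after `vpe_make_non_resident()` only bumps the pending count and sets the doorbell flag until the next `vpe_make_resident()`.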
Hi Mykyta,

We have a number of series from you which have not been merged yet, and
reviewing them all in parallel might be challenging.

Would you mind giving us a status and maybe priorities for them?

I could list the following series:
- GICv4
- CPU Hotplug on arm
- PCI enumeration on arm
- IPMMU for pci on arm
- dom0less for pci passthrough on arm
- SR-IOV for pvh
- SMMU for pci on arm
- MSI injection on arm
- suspend to ram on arm

There might be others; feel free to complete the list.

On GICv4...

> On 2 Feb 2026, at 17:14, Mykyta Poturai <Mykyta_Poturai@epam.com> wrote:
>
> This series introduces GICv4 direct LPI injection for Xen.
>
> [...]

I would like to ask how much of a performance benefit you get with this.
Adding GICv4 support adds a lot of code which will have to be maintained
and tested, and there should be a good improvement to justify this.

Did you run any benchmarks? What are the results?

When we started working on this at Arm, we came to the conclusion that the
benefit did not justify the added complexity in Xen, which is why the work
was stopped in favor of other features that we thought would be more
beneficial to Xen (like PCI passthrough or SMMUv3).

Cheers
Bertrand
Hi Bertrand,

On Tue, Feb 3, 2026 at 12:02 PM Bertrand Marquis <Bertrand.Marquis@arm.com> wrote:
>
> [...]
>
> I would like to ask how much of a performance benefit you get with this.
> Adding GICv4 support adds a lot of code which will have to be maintained
> and tested, and there should be a good improvement to justify this.
>
> Did you run any benchmarks? What are the results?
>
> When we started working on this at Arm, we came to the conclusion that the
> benefit did not justify the added complexity in Xen, which is why the work
> was stopped in favor of other features that we thought would be more
> beneficial to Xen (like PCI passthrough or SMMUv3).

I have been asked to run benchmarks for this series, so here is a short
update from my side.

Test setup:
- AWS c7g bare metal
- Linux bare-metal reference and Xen dom0 runs
- fio random-read workloads on an NVMe-backed EBS volume (gp3, 160G, 80k IOPS)
- Main workloads:
  - 4k, iodepth=1
  - 16k, iodepth=1
  - 4k, iodepth=4
  - 4k, iodepth=1, numjobs=4
- 5 repetitions per configuration, looking mainly at median values
- The main Xen comparison used the default scheduler (credit2), with direct
  LPIs OFF vs ON

Summary:
- With credit2, enabling direct LPIs gave a small but repeatable IOPS
  improvement across all tested workloads, roughly in the 0.8-1.1% range.
- Mean completion latency also improved consistently.
- The clearest gain was in tail latency. In the 4k randread, iodepth=1,
  numjobs=4 case, p99.9 improved by about 41% and p99.99 by about 34% with
  direct LPIs enabled.
- In this setup, switching from credit2 to null did not materially change
  median throughput, so the observed improvement appears to come primarily
  from the interrupt delivery path rather than from the scheduler choice.

A few caveats:
- This was a low-contention setup with only dom0 using 8 CPUs, so it did not
  exercise heavy VCPU migration or scheduler pressure.
- I also tried an artificially constrained NVMe host queue depth
  configuration, but I am treating that only as a stress/control case and
  not as the main result.

A full benchmark report is available here:
https://github.com/xakep-amatop/giv4-benchmark/blob/main/report.pdf

The same repository also contains the raw benchmark result archives used for
the analysis.

So, based on these measurements, there does appear to be a measurable benefit
from direct LPI injection, with the strongest effect showing up in tail
latency rather than in median throughput.

If you need any additional benchmark results or specific test cases, please
let me know.

Best regards,
Mykola

>
> Cheers
> Bertrand
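The summary above reports medians over 5 repetitions and percentage differences between the OFF and ON runs. A minimal sketch of that arithmetic follows, assuming nothing about the tooling actually used for the report: `median` and `pct_change` are hypothetical helper names, and no values from the report are reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* qsort() comparator for doubles, ascending. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;

    return (x > y) - (x < y);
}

/* Median of n samples (n > 0). Sorts the caller's array in place. */
static double median(double *v, size_t n)
{
    assert(n > 0);
    qsort(v, n, sizeof(*v), cmp_double);
    return (n % 2) ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
}

/*
 * Relative change in percent when going from 'baseline' to 'candidate'.
 * A positive result means the candidate value is larger: that is an
 * improvement for IOPS, but a regression for latency.
 */
static double pct_change(double baseline, double candidate)
{
    assert(baseline != 0.0);
    return (candidate - baseline) / baseline * 100.0;
}
```

For a throughput comparison one would feed the per-repetition IOPS of the OFF and ON runs through `median()` and pass the two medians to `pct_change()`; for latency percentiles the sign is read the other way round, since lower is better.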
Hi Mykyta,

Thank you for this patch series. I'll go through it and follow up with
comments shortly.

On Tue, Feb 3, 2026 at 12:02 PM Bertrand Marquis <Bertrand.Marquis@arm.com> wrote:
>
> [...]
>
> I would like to ask how much of a performance benefit you get with this.
> Adding GICv4 support adds a lot of code which will have to be maintained
> and tested, and there should be a good improvement to justify this.
>
> Did you run any benchmarks? What are the results?

One more benchmarking note (and rationale): for meaningful performance
testing it may be necessary to disable WFI trapping (boot Xen with
`vwfi=native`).

If WFI is trapped, each guest idle instruction causes a VM exit, and Xen
typically deschedules the vCPU. This makes the vCPU "non-resident" more
often, so subsequent wakeups (e.g. vSGI/vLPI) tend to go through a slower
host-mediated path (waking the vCPU via the scheduler and performing extra
state transitions) instead of letting the hardware wake the guest and
deliver to it quickly while it is still resident.

For this reason it may be worth conditionally recommending (or even
auto-selecting) `vwfi=native` when direct injection is enabled for a
vCPU, so measurements reflect the actual delivery fast-path rather than
exit/scheduling overhead.

---

One more suggestion: it may be worth adding this as a small patch in the
series (or at least documenting it prominently).

When direct injection is enabled for a vCPU, trapping WFI can skew both
behaviour and benchmarks by pushing the vCPU into a "non-resident" state
more often and forcing wakeups to go through the host/scheduler path. A
conditional recommendation (or even auto-selecting `vwfi=native` in that
mode) would help keep the fast-path measurable and predictable.

Best regards,
Mykola

>
> When we started working on this at Arm, we came to the conclusion that the
> benefit did not justify the added complexity in Xen, which is why the work
> was stopped in favor of other features that we thought would be more
> beneficial to Xen (like PCI passthrough or SMMUv3).
>
> Cheers
> Bertrand
Hi Mykola,

On 13/02/2026 11:36, Mykola Kvach wrote:
> For this reason it may be worth conditionally recommending (or even
> auto-selecting) `vwfi=native` when direct injection is enabled for a
> vCPU, so measurements reflect the actual delivery fast-path rather than
> exit/scheduling overhead.

I don't think the answer is that straightforward. "vwfi=native" is
beneficial when you have a single vCPU scheduled per pCPU. But if you
have multiple vCPUs running, then you may impair the overall performance
of the system, as the scheduler will not be able to run another vCPU even
if the current vCPU is doing nothing (it is waiting for an interrupt).

As a data point, Xen didn't initially trap WFI/WFE. But we noticed a
significant slowdown during boot if all the vCPUs of a guest were running
on the same pCPU. The difference was quite noticeable.

So instead of recommending to always set "vwfi=native", I would consider
an approach where Xen decides whether WFI/WFE is trapped based on the
number of vCPUs that can be scheduled on a given pCPU. This could be
adjusted on demand.

Cheers,

-- 
Julien Grall
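The adaptive approach suggested above could look roughly like the following. This is a sketch of the policy only, under stated assumptions: `should_trap_wfi` and `apply_wfi_policy` are hypothetical helpers that do not exist in Xen, and how the per-pCPU vCPU count would be obtained from the scheduler is left open. The only architectural facts relied on are the HCR_EL2 bit positions of TWI (bit 13) and TWE (bit 14).

```c
#include <stdbool.h>
#include <stdint.h>

/* HCR_EL2 trap controls as defined by the Arm architecture. */
#define HCR_TWI (UINT64_C(1) << 13) /* trap guest WFI to EL2 */
#define HCR_TWE (UINT64_C(1) << 14) /* trap guest WFE to EL2 */

/*
 * Hypothetical policy: trap WFI/WFE only when more than one vCPU can be
 * scheduled on this pCPU, so that an idle vCPU yields to another one.
 * With at most one vCPU there is nothing else to run, so WFI/WFE can be
 * left native.
 */
static bool should_trap_wfi(unsigned int nr_vcpus_on_pcpu)
{
    return nr_vcpus_on_pcpu > 1;
}

/* Returns the HCR_EL2 value with TWI/TWE set or cleared per the policy. */
static uint64_t apply_wfi_policy(uint64_t hcr, unsigned int nr_vcpus_on_pcpu)
{
    if (should_trap_wfi(nr_vcpus_on_pcpu))
        return hcr | (HCR_TWI | HCR_TWE);

    return hcr & ~(HCR_TWI | HCR_TWE);
}
```

Re-evaluating this whenever the set of vCPUs that may run on a pCPU changes is what "adjusted on demand" would amount to in this model; all other HCR_EL2 bits pass through unchanged.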
Hi Julien,

On Fri, Feb 13, 2026 at 1:48 PM Julien Grall <julien@xen.org> wrote:
>
> Hi Mykola,
>
> On 13/02/2026 11:36, Mykola Kvach wrote:
> > For this reason it may be worth conditionally recommending (or even
> > auto-selecting) `vwfi=native` when direct injection is enabled for a
> > vCPU, so measurements reflect the actual delivery fast-path rather than
> > exit/scheduling overhead.
>
> I don't think the answer is that straightforward. "vwfi=native" is
> beneficial when you have a single vCPU scheduled per pCPU. But if you
> have multiple vCPUs running, then you may impair the overall performance
> of the system, as the scheduler will not be able to run another vCPU even
> if the current vCPU is doing nothing (it is waiting for an interrupt).
>
> As a data point, Xen didn't initially trap WFI/WFE. But we noticed a
> significant slowdown during boot if all the vCPUs of a guest were running
> on the same pCPU. The difference was quite noticeable.
>
> So instead of recommending to always set "vwfi=native", I would consider
> an approach where Xen decides whether WFI/WFE is trapped based on the
> number of vCPUs that can be scheduled on a given pCPU. This could be
> adjusted on demand.

Thanks for the clarification. I agree: recommending vwfi=native
unconditionally is not correct.

What I meant applies specifically to benchmarking direct injection in a
1:1 vCPU:pCPU setup (or with vCPUs pinned), where trapping WFI/WFE adds
extra exits and can hide the fast-path benefit.

For general setups with oversubscription, vwfi=trap is the right default,
because it lets Xen schedule another runnable vCPU instead of leaving a
pCPU effectively idle while the guest sits in WFI.

I like your suggestion: make WFI/WFE trapping adaptive, based on whether
the current pCPU has other runnable vCPUs.

Best regards,
Mykola
On 03.02.26 12:01, Bertrand Marquis wrote:
> Hi Mykyta,
>
> We have a number of series from you which have not been merged yet, and
> reviewing them all in parallel might be challenging.
>
> Would you mind giving us a status and maybe priorities for them?
>
> I could list the following series:
> - GICv4
> - CPU Hotplug on arm
> - PCI enumeration on arm
> - IPMMU for pci on arm
> - dom0less for pci passthrough on arm
> - SR-IOV for pvh
> - SMMU for pci on arm
> - MSI injection on arm
> - suspend to ram on arm
>
> There might be others; feel free to complete the list.
>
> [...]
>
> Cheers
> Bertrand
>

Hi Bertrand,

Current priorities are:

- CPU hotplug
- Suspend to RAM
- GICv4 (we will follow up with benchmarks)
- SR-IOV

MSI injection, dom0less for PCI and PCI enumeration are low priority for now.

Suspend to RAM is handled by Mykola Kvach.

SMMU and IPMMU support are already merged, AFAIU.

-- 
Mykyta
Hi Mykyta,

> On 3 Feb 2026, at 13:24, Mykyta Poturai <Mykyta_Poturai@epam.com> wrote:
>
> [...]
>
> Hi Bertrand,
>
> Current priorities are:
>
> - CPU hotplug
> - Suspend to RAM
> - GICv4 (we will follow up with benchmarks)
> - SR-IOV
>

OK.

Let's focus on what is already posted and under review before GICv4.
I will follow up on your CPU hotplug review, and suspend to RAM is already
well advanced, so we should focus on finishing those first.

Cheers
Bertrand

>
> MSI injection, dom0less for PCI and PCI enumeration are low priority for now.
>
> Suspend to RAM is handled by Mykola Kvach.
>
> SMMU and IPMMU support are already merged, AFAIU.
>
> --
> Mykyta