hw/arm/virt, hw/pci: PCI pre-enumeration and fixed BAR allocation

[RFC PATCH 0/8] hw/arm/virt, hw/pci: PCI pre-enumeration and fixed BAR allocation

Posted by Tushar Dave 2 weeks, 2 days ago

This RFC introduces a mechanism to specify Guest Physical Addresses
(GPAs) for PCI BARs, allowing explicit placement of guest MMIO BAR
addresses to match host physical addresses for assigned devices.

On some platforms, P2P DMA is performed between devices within the same
IOMMU group. The PCI fabric ACS is configured to permit direct P2P
without going through the host bridge in order to achieve the required
performance.

To support this multi-device IOMMU group P2P scenario in virtualization,
the VM may need to use the same MMIO BAR addresses as the host physical
address layout.

This series implements a per-device PCI property, "fixed-bars", which
allows users to specify fixed BAR addresses. The property is generic
and is available on any PCI-capable machine. It is a comma-separated
list of BAR assignments:

        barN@<addr>[,barM@<addr>]*
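
As an illustration of the syntax above, here is a minimal parser sketch. It is
not the code from this series: the FixedBars type and parse_fixed_bars() are
hypothetical names, and the choices to accept only BAR indices 0..5 and to
reject duplicate indices or a trailing comma are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PCI_NUM_BARS 6  /* BAR0..BAR5 of a type-0 config header */

/* Hypothetical container: addr[i] is meaningful only when set[i] is true. */
typedef struct FixedBars {
    bool set[PCI_NUM_BARS];
    uint64_t addr[PCI_NUM_BARS];
} FixedBars;

/*
 * Parse "barN@<addr>[,barM@<addr>]*" into *out.
 * Returns 0 on success, -1 on malformed input or a repeated BAR index
 * (rejecting repeats is an assumption of this sketch).
 */
static int parse_fixed_bars(const char *s, FixedBars *out)
{
    assert(s != NULL && out != NULL);
    memset(out, 0, sizeof(*out));

    if (*s == '\0') {
        return -1;                      /* at least one assignment needed */
    }
    while (*s) {
        char *end;
        unsigned long long addr;
        int idx;

        if (strncmp(s, "bar", 3) != 0) {
            return -1;
        }
        s += 3;
        if (*s < '0' || *s >= '0' + PCI_NUM_BARS) {
            return -1;                  /* index must be a digit in 0..5 */
        }
        idx = *s++ - '0';
        if (*s++ != '@') {
            return -1;
        }
        /* Require a leading digit so strtoull cannot skip space or a sign. */
        if (*s < '0' || *s > '9') {
            return -1;
        }
        errno = 0;
        addr = strtoull(s, &end, 0);    /* base 0 accepts 0x-prefixed hex */
        if (end == s || errno == ERANGE || out->set[idx]) {
            return -1;
        }
        out->set[idx] = true;
        out->addr[idx] = addr;

        s = end;
        if (*s == ',') {
            s++;
            if (*s == '\0') {
                return -1;              /* trailing comma */
            }
        } else if (*s != '\0') {
            return -1;                  /* junk after the address */
        }
    }
    return 0;
}
```

Passing the string from the example usage,
"bar2@0x6b8000000000,bar4@0x6c8000000000", would fill entries 2 and 4 and
leave the other BARs unset.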

The virt machine builds on this with two additional machine properties:

pci-pre-enum
    When enabled, QEMU performs PCI enumeration and resource assignment
    before handing control to firmware (e.g. EDK2). This includes
    programming 64-bit prefetchable BARs according to fixed-bars
    assignments, and programming bridge prefetchable windows.
    A "pci-enum-done" device-tree property is set so firmware preserves
    the configuration.

pcie-mmio-window
    Defines the MMIO64 window for PCIe devices. When using fixed-bars,
    this allows the aperture to be resized or repositioned so all
    assigned BARs fall within a valid address range.
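
To make the containment requirement concrete, here is a small sketch of
parsing a window and checking that a BAR lies inside it. It assumes the
property value is "<base>:<size>"; the cover letter does not spell out the
format, so that reading is a guess. MmioWindow, parse_mmio_window() and
window_contains() are hypothetical names, not functions from this series.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical representation of the MMIO64 aperture. */
typedef struct MmioWindow {
    uint64_t base;
    uint64_t size;
} MmioWindow;

/* Parse one unsigned number that must be followed by the character 'term'. */
static int parse_u64(const char *s, char term, uint64_t *val, const char **next)
{
    char *end;
    unsigned long long v;

    if (*s < '0' || *s > '9') {
        return -1;                      /* no sign or leading whitespace */
    }
    errno = 0;
    v = strtoull(s, &end, 0);
    if (end == s || errno == ERANGE || *end != term) {
        return -1;
    }
    *val = v;
    *next = end;
    return 0;
}

/* ASSUMPTION: the value is "<base>:<size>". Returns 0 on success, else -1. */
static int parse_mmio_window(const char *s, MmioWindow *w)
{
    uint64_t base, size;
    const char *p;

    assert(s != NULL && w != NULL);
    if (parse_u64(s, ':', &base, &p) < 0 ||
        parse_u64(p + 1, '\0', &size, &p) < 0) {
        return -1;
    }
    /* Reject empty windows and windows that wrap past 2^64. */
    if (size == 0 || size - 1 > UINT64_MAX - base) {
        return -1;
    }
    w->base = base;
    w->size = size;
    return 0;
}

/* True if [addr, addr + len) lies entirely inside the window. */
static bool window_contains(const MmioWindow *w, uint64_t addr, uint64_t len)
{
    uint64_t off;

    if (len == 0 || addr < w->base) {
        return false;
    }
    off = addr - w->base;
    /* Compare against the remaining space to avoid overflowing addr + len. */
    return off < w->size && len <= w->size - off;
}
```

Under this reading, the window 0x400000000000:0x400000000000 from the example
usage spans 0x400000000000 up to (but not including) 0x800000000000, which
covers both bar2@0x6b8000000000 and bar4@0x6c8000000000.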

Why QEMU programs PCI resources rather than EDK2:

To support fixed BAR placement, QEMU performs PCI bus enumeration and
resource assignment prior to firmware execution. EDK2 already provides
a PCD-controlled mechanism (PcdPciDisableBusEnumeration) that allows
the platform to skip PCI enumeration and resource allocation. This
series leverages that mechanism so that, when enabled, firmware runs in
a discovery-only mode and preserves the configuration established by
QEMU.

When pci-pre-enum is enabled, QEMU runs PCI enumeration and resource
allocation, prioritizing fixed BARs specified via fixed-bars. If
allocation fails due to alignment, overlap, or address space constraints,
QEMU terminates with an error. Otherwise, all BARs and bridge windows are
fully programmed before firmware execution.
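
The alignment and overlap failure cases can be sketched as below. This is an
illustration only, not the allocator from this series: BarRange, BarCheck and
check_fixed_bars() are hypothetical names. The alignment rule follows from
how PCI BARs work (the size is a power of two and the address is naturally
aligned to it).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one requested BAR placement. */
typedef struct BarRange {
    uint64_t addr;
    uint64_t size;
} BarRange;

typedef enum BarCheck {
    BAR_OK,
    BAR_ERR_ALIGN,
    BAR_ERR_OVERLAP,
} BarCheck;

static bool is_pow2(uint64_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* A BAR of size 2^k must sit at an address that is a multiple of 2^k. */
static bool bar_aligned(const BarRange *b)
{
    return is_pow2(b->size) && (b->addr & (b->size - 1)) == 0;
}

/* Half-open ranges [addr, addr + size); written to avoid overflow. */
static bool bars_overlap(const BarRange *a, const BarRange *b)
{
    if (a->addr < b->addr) {
        return b->addr - a->addr < a->size;
    }
    return a->addr - b->addr < b->size;
}

/* Return the first problem found among n requested placements. */
static BarCheck check_fixed_bars(const BarRange *bars, size_t n)
{
    assert(bars != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (!bar_aligned(&bars[i])) {
            return BAR_ERR_ALIGN;
        }
    }
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (bars_overlap(&bars[i], &bars[j])) {
                return BAR_ERR_OVERLAP;
            }
        }
    }
    return BAR_OK;
}
```

The pairwise overlap loop is quadratic, which is acceptable for the handful
of BARs on a command line; a real allocator would more likely sort the ranges
first.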

There is certainly room for improvement, but this RFC aims to gather
feedback on the overall approach chosen to address this problem.

We use the virt machine in this series as the concrete example
consuming the fixed-BAR model. Other machines may require their own
machine-specific mechanism (such as pcie-mmio-window) if they want to
adopt the same approach.

Example usage:

  -machine virt,...,pcie-mmio-window=0x400000000000:0x400000000000,pci-pre-enum=on \
  -device vfio-pci,host=0009:06:00.0,id=dev0 \
  -set device.dev0.fixed-bars=bar2@0x6b8000000000,bar4@0x6c8000000000

Testing:
This series was tested on NVIDIA GB300 platforms with a recent Linux
kernel. GPUDirect P2P between a GPU and a CX8 NIC requires a PCIe
topology in the VM that mirrors bare metal (e.g. both devices under the
same switch and ACS tuned for the minimal P2P paths needed for GPUDirect
RDMA).

TODO:
- The fixed BAR allocator handles 64-bit prefetchable BARs and related
  bridge prefetch windows only. Programming PIO, 32-bit MMIO, and
  64-bit non-prefetchable BARs, and sizing bridge windows for those
  resource types, is left for follow-up patches.
- SR-IOV virtual functions are not included when sizing bridge prefetch
  apertures and may require additional work.
- Add ACPI _DSM so the fixed BARs are preserved.


A git branch with this series applied is available at:
https://github.com/tdavenvidia/upstream-qemu/commits/upstream_May_08_26/

The related EDK2 change is available at:
https://github.com/tdavenvidia/edk2/commits/upstream_May_08_26/

Tushar Dave (8):
  hw/pci: add fixed-bars property to allow fixed BAR addresses
  hw/pci: enumerate PCI bus and program bridge bus numbers
  hw/pci: introduce allocator for fixed BAR placement
  hw/pci: pack remaining BARs and update bridge windows
  hw/pci: allocate remaining BARs for buses without fixed BARs
  hw/pci: finalize bridge prefetch windows after BAR allocation
  hw/arm/virt: add pcie-mmio-window machine property
  hw/arm/virt: add pci-pre-enum machine property

 hw/arm/virt.c               |  157 ++++-
 hw/pci/meson.build          |    2 +
 hw/pci/pci-enumerate.c      |  144 +++++
 hw/pci/pci-enumerate.h      |   15 +
 hw/pci/pci-resource.c       | 1099 +++++++++++++++++++++++++++++++++++
 hw/pci/pci-resource.h       |   82 +++
 hw/pci/pci.c                |  108 ++++
 include/hw/arm/virt.h       |    3 +
 include/hw/pci/pci_device.h |   10 +
 9 files changed, 1615 insertions(+), 5 deletions(-)
 create mode 100644 hw/pci/pci-enumerate.c
 create mode 100644 hw/pci/pci-enumerate.h
 create mode 100644 hw/pci/pci-resource.c
 create mode 100644 hw/pci/pci-resource.h

-- 
2.34.1

Re: [RFC PATCH 0/8] hw/arm/virt, hw/pci: PCI pre-enumeration and fixed BAR allocation

Posted by Michael S. Tsirkin 1 week, 6 days ago

On Fri, May 08, 2026 at 01:37:09PM -0500, Tushar Dave wrote:
> This RFC introduces a mechanism to specify Guest Physical Addresses
> (GPAs) for PCI BARs, allowing explicit placement of guest MMIO BAR
> addresses to match host physical addresses for assigned devices.
> 
> On some platforms, P2P DMA is performed between devices within the same
> IOMMU group. The PCI fabric ACS is configured to permit direct P2P
> without going through the host bridge in order to achieve the required
> performance.

Pass this info to guest firmware, let it set bars any way it wants?

Re: [RFC PATCH 0/8] hw/arm/virt, hw/pci: PCI pre-enumeration and fixed BAR allocation

Posted by Tushar Dave 1 week, 6 days ago

On 5/11/2026 4:09 AM, Michael S. Tsirkin wrote:
> On Fri, May 08, 2026 at 01:37:09PM -0500, Tushar Dave wrote:
>> This RFC introduces a mechanism to specify Guest Physical Addresses
>> (GPAs) for PCI BARs, allowing explicit placement of guest MMIO BAR
>> addresses to match host physical addresses for assigned devices.
>>
>> On some platforms, P2P DMA is performed between devices within the same
>> IOMMU group. The PCI fabric ACS is configured to permit direct P2P
>> without going through the host bridge in order to achieve the required
>> performance.
> 
> Pass this info to guest firmware, let it set bars any way it wants?

We do use firmware: the series relies on the existing EDK2-supported
mode enabled by PcdPciDisableBusEnumeration, in which firmware is
expected to preserve the PCI topology and BAR programming established
by the hypervisor.

In our case, the hypervisor is QEMU, which performs PCI enumeration
and resource assignment before handing control to firmware. EDK2
then explicitly refrains from re-enumerating or reallocating PCI
BARs, as this is already a supported firmware behavior.

-Tushar

Re: [RFC PATCH 0/8] hw/arm/virt, hw/pci: PCI pre-enumeration and fixed BAR allocation

Posted by Michael S. Tsirkin 1 week, 5 days ago

On Mon, May 11, 2026 at 01:10:43PM -0500, Tushar Dave wrote:
> 
> 
> On 5/11/2026 4:09 AM, Michael S. Tsirkin wrote:
> > On Fri, May 08, 2026 at 01:37:09PM -0500, Tushar Dave wrote:
> >> This RFC introduces a mechanism to specify Guest Physical Addresses
> >> (GPAs) for PCI BARs, allowing explicit placement of guest MMIO BAR
> >> addresses to match host physical addresses for assigned devices.
> >>
> >> On some platforms, P2P DMA is performed between devices within the same
> >> IOMMU group. The PCI fabric ACS is configured to permit direct P2P
> >> without going through the host bridge in order to achieve the required
> >> performance.
> > 
> > Pass this info to guest firmware, let it set bars any way it wants?
> 
> We are using firmware, relying on the existing EDK2-supported mode
> enabled by PcdPciDisableBusEnumeration, where firmware is expected
> to preserve the PCI topology and BAR programming established by
> the hypervisor.
> 
> In our case, the hypervisor is QEMU, which performs PCI enumeration
> and resource assignment before handing control to firmware. EDK2
> then explicitly refrains from re-enumerating or reallocating PCI
> BARs, as this is already a supported firmware behavior.
> 
> -Tushar


I see no advantage in performing PCI enumeration in QEMU when firmware
is already doing an adequate job of it. If you want firmware to map
specific devices at specific addresses, pass that info along to it.

-- 
MST

Re: [RFC PATCH 0/8] hw/arm/virt, hw/pci: PCI pre-enumeration and fixed BAR allocation

Posted by Peter Maydell 1 week, 6 days ago

On Fri, 8 May 2026 at 19:37, Tushar Dave <tdave@nvidia.com> wrote:
>
> This RFC introduces a mechanism to specify Guest Physical Addresses
> (GPAs) for PCI BARs, allowing explicit placement of guest MMIO BAR
> addresses to match host physical addresses for assigned devices.
>
> On some platforms, P2P DMA is performed between devices within the same
> IOMMU group. The PCI fabric ACS is configured to permit direct P2P
> without going through the host bridge in order to achieve the required
> performance.
>
> To support this multi-device IOMMU group P2P scenario in virtualization,
> the VM may need to use the same MMIO BAR addresses as the host physical
> address layout.

This feels like something's wrong in the design. A VM doesn't
necessarily have the same memory layout as the host: the
VM hardware is all about making that possible.

> Why QEMU programs PCI resources rather than EDK2:
>
> To support fixed BAR placement, QEMU performs PCI bus enumeration and
> resource assignment prior to firmware execution. EDK2 already provides
> a PCD-controlled mechanism (PcdPciDisableBusEnumeration) that allows
> the platform to skip PCI enumeration and resource allocation. This
> series leverages that mechanism so that, when enabled, firmware runs in
> a discovery-only mode and preserves the configuration established by
> QEMU.

I'm definitely not enthusiastic about having QEMU do PCI bus
enumeration. This isn't the way the hardware does it, and it's a
lot of code that's duplicating what the guest already has (there's
over a thousand lines of code in this patchset).

> We use the virt machine in this series as the concrete example
> consuming the fixed-BAR model. Other machines may require their own
> machine-specific mechanism (such as pcie-mmio-window) if they want to
> adopt the same approach.
>
> Example usage:
>
>   -machine virt,...,pcie-mmio-window=0x400000000000:0x400000000000,pci-pre-enum=on \
>   -device vfio-pci,host=0009:06:00.0,id=dev0 \
>   -set device.dev0.fixed-bars=bar2@0x6b8000000000,bar4@0x6c8000000000

...and you end up with enormous command lines like this full of
magic numbers relating to address space layout.

I think it would be better to find a way of doing this that
doesn't have the "VM address space layout has to match the
host layout" restriction.

-- PMM