[PATCH RESEND 0/6] Add support for configuring PCI high memory MMIO size
Posted by Matthew R. Ochs via Devel 8 months, 1 week ago
Resending: Series has been rebased onto the latest upstream.

This patch series adds support for configuring the PCI high memory MMIO
window size for aarch64 virt machine types. This feature has been merged
into the QEMU upstream master branch [1] and will be available in QEMU 10.0.
It allows users to configure the size of the high memory MMIO window above
4GB, which is particularly useful for systems with large PCI memory
requirements.
    
The feature is exposed through the domain XML as a new PCI feature:
<features>
  <pci>
    <highmem-mmio-size unit='G'>512</highmem-mmio-size>
  </pci>
</features>

When enabled, this configures the size of the PCI high memory MMIO window
via QEMU's highmem-mmio-size machine property. The feature is only
available for aarch64 virt machine types and requires QEMU support.
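
For reference, the XML above would map to a QEMU command line containing
something like the following (a sketch; the exact formatting of the
-machine argument is generated by qemu_command.c):

    -machine virt,highmem-mmio-size=512G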

This series depends on [2] and should be applied on top of those patches.

For your convenience, this series is also available on Github [3].

[1] https://github.com/qemu/qemu/commit/f10104aeae3a17f181d5bb37b7fd7dad7fe86cba
[2] https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/Z4NQ3CVQYLNGZRBC35CUHOQ2EXJROPYG/
[3] git fetch https://github.com/nvmochs/libvirt.git pci_highmem_mmio_size

Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>

Matthew R. Ochs (6):
  domain: Add PCI configuration feature infrastructure
  schema: Add PCI configuration feature schema
  conf: Add PCI configuration XML parsing and formatting
  qemu: Add capability for PCI high memory MMIO size
  qemu: Add command line support for PCI high memory MMIO size
  tests: Add tests for machine PCI features

 src/conf/domain_conf.c                        | 103 ++++++++++++++++++
 src/conf/domain_conf.h                        |   6 +
 src/conf/schemas/domaincommon.rng             |   9 ++
 src/qemu/qemu_capabilities.c                  |   2 +
 src/qemu/qemu_capabilities.h                  |   1 +
 src/qemu/qemu_command.c                       |   6 +
 src/qemu/qemu_validate.c                      |  15 +++
 .../caps_10.0.0_aarch64.replies               |  10 ++
 .../caps_10.0.0_aarch64.xml                   |   1 +
 ...rch64-virt-machine-pci.aarch64-latest.args |  31 ++++++
 ...arch64-virt-machine-pci.aarch64-latest.xml |  30 +++++
 .../aarch64-virt-machine-pci.xml              |  20 ++++
 tests/qemuxmlconftest.c                       |   2 +
 13 files changed, 236 insertions(+)
 create mode 100644 tests/qemuxmlconfdata/aarch64-virt-machine-pci.aarch64-latest.args
 create mode 100644 tests/qemuxmlconfdata/aarch64-virt-machine-pci.aarch64-latest.xml
 create mode 100644 tests/qemuxmlconfdata/aarch64-virt-machine-pci.xml

-- 
2.46.0
Re: [PATCH RESEND 0/6] Add support for configuring PCI high memory MMIO size
Posted by Daniel P. Berrangé via Devel 7 months, 1 week ago
On Fri, Apr 11, 2025 at 08:40:54AM -0700, Matthew R. Ochs via Devel wrote:
> Resending: Series has been rebased onto the latest upstream.
> 
> This patch series adds support for configuring the PCI high memory MMIO
> window size for aarch64 virt machine types. This feature has been merged
> into the QEMU upstream master branch [1] and will be available in QEMU 10.0.
> It allows users to configure the size of the high memory MMIO window above
> 4GB, which is particularly useful for systems with large PCI memory
> requirements.
>     
> The feature is exposed through the domain XML as a new PCI feature:
> <features>
>   <pci>
>     <highmem-mmio-size unit='G'>512</highmem-mmio-size>
>   </pci>
> </features>

As a schema design comment: IIUC, the MMIO size we're configuring
is conceptually a characteristic associated with the PCI(e) host
and the memory layout it defines for PCI(e) devices to use.

Checking through our schema I find we already have support
for

    <controller type='pci' index='0' model='pci-root'>
      <pcihole64 unit='KiB'>1048576</pcihole64>
    </controller>

this makes me think that we should model this new attribute
in a similar way, eg so we can support:

    <controller type='pci' index='0' model='pci-root'>
      <pcihole64 unit='KiB'>1048576</pcihole64>
      <pcimmio64 unit='TiB'>2</pcimmio64>
    </controller>

(pci-root or pcie-root are interchangeable).

This 'pcimmio64' value can then be mapped to whatever hypervisor
or architecture specific setting is appropriate, avoiding exposing
the QEMU arm 'highmem-mmio-size' naming convention.
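
A minimal RelaxNG sketch of that shape, assuming 'pcimmio64' reuses the
existing scaledInteger define in the same way 'pcihole64' does:

    <optional>
      <element name="pcimmio64">
        <ref name="scaledInteger"/>
      </element>
    </optional>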


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [PATCH RESEND 0/6] Add support for configuring PCI high memory MMIO size
Posted by Matt Ochs via Devel 7 months, 1 week ago
> On May 9, 2025, at 9:59 AM, Daniel P. Berrangé <berrange@redhat.com> wrote:
> 
> On Fri, Apr 11, 2025 at 08:40:54AM -0700, Matthew R. Ochs via Devel wrote:
>> Resending: Series has been rebased onto the latest upstream.
>> 
>> This patch series adds support for configuring the PCI high memory MMIO
>> window size for aarch64 virt machine types. This feature has been merged
>> into the QEMU upstream master branch [1] and will be available in QEMU 10.0.
>> It allows users to configure the size of the high memory MMIO window above
>> 4GB, which is particularly useful for systems with large PCI memory
>> requirements.
>> 
>> The feature is exposed through the domain XML as a new PCI feature:
>> <features>
>>  <pci>
>>    <highmem-mmio-size unit='G'>512</highmem-mmio-size>
>>  </pci>
>> </features>
> 
> As a schema design comment: IIUC, the MMIO size we're configuring
> is conceptually a characteristic associated with the PCI(e) host
> and the memory layout it defines for PCI(e) devices to use.

Correct.

> Checking through our schema I find we already have support
> for
> 
>    <controller type='pci' index='0' model='pci-root'>
>      <pcihole64 unit='KiB'>1048576</pcihole64>
>    </controller>
> 
> this makes me think that we should model this new attribute
> in a similar way, eg so we can support:
> 
>    <controller type='pci' index='0' model='pci-root'>
>      <pcihole64 unit='KiB'>1048576</pcihole64>
>      <pcimmio64 unit='TiB'>2</pcimmio64>
>    </controller>
> 
> (pci-root or pcie-root are interchangeable).
> 
> This 'pcimmio64' value can then be mapped to whatever hypervisor
> or architecture specific setting is appropriate, avoiding exposing
> the QEMU arm 'highmem-mmio-size' naming convention.

Thanks for the feedback, this sounds like a better approach.

Would it make sense to just use the existing pcihole64 since [I think]
it more or less represents the same concept (setting 64bit MMIO window)?

Or perhaps that would be too messy or x86-centric and it’s better to go
with what you proposed (pcimmio64)?

Re: [PATCH RESEND 0/6] Add support for configuring PCI high memory MMIO size
Posted by Daniel P. Berrangé via Devel 7 months, 1 week ago
On Fri, May 09, 2025 at 07:29:04PM +0000, Matt Ochs wrote:
> > On May 9, 2025, at 9:59 AM, Daniel P. Berrangé <berrange@redhat.com> wrote:
> > 
> > On Fri, Apr 11, 2025 at 08:40:54AM -0700, Matthew R. Ochs via Devel wrote:
> >> Resending: Series has been rebased onto the latest upstream.
> >> 
> >> This patch series adds support for configuring the PCI high memory MMIO
> >> window size for aarch64 virt machine types. This feature has been merged
> >> into the QEMU upstream master branch [1] and will be available in QEMU 10.0.
> >> It allows users to configure the size of the high memory MMIO window above
> >> 4GB, which is particularly useful for systems with large PCI memory
> >> requirements.
> >> 
> >> The feature is exposed through the domain XML as a new PCI feature:
> >> <features>
> >>  <pci>
> >>    <highmem-mmio-size unit='G'>512</highmem-mmio-size>
> >>  </pci>
> >> </features>
> > 
> > As a schema design comment: IIUC, the MMIO size we're configuring
> > is conceptually a characteristic associated with the PCI(e) host
> > and the memory layout it defines for PCI(e) devices to use.
> 
> Correct.
> 
> > Checking through our schema I find we already have support
> > for
> > 
> >    <controller type='pci' index='0' model='pci-root'>
> >      <pcihole64 unit='KiB'>1048576</pcihole64>
> >    </controller>
> > 
> > this makes me think that we should model this new attribute
> > in a similar way, eg so we can support:
> > 
> >    <controller type='pci' index='0' model='pci-root'>
> >      <pcihole64 unit='KiB'>1048576</pcihole64>
> >      <pcimmio64 unit='TiB'>2</pcimmio64>
> >    </controller>
> > 
> > (pci-root or pcie-root are interchangeable).
> > 
> > This 'pcimmio64' value can then be mapped to whatever hypervisor
> > or architecture specific setting is appropriate, avoiding exposing
> > the QEMU arm 'highmem-mmio-size' naming convention.
> 
> Thanks for the feedback, this sounds like a better approach.
> 
> Would it make sense to just use the existing pcihole64 since [I think]
> it more or less represents the same concept (setting 64bit MMIO window)?

I'm not sure. I've been struggling to reproduce an effect when setting
the existing -global q35-pcihost.pci-hole64-size=1048576K property
on x86, and also wondering how it interacts with the previously
mentioned -fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=262144

Possibly the former only works with SeaBIOS, and the latter only
works with EDK2, but I've not figured out how to prove this.

I'm curious if there's a good way to identify the guest memory
map impact, as I'm not finding a clear marker in 'dmesg' that
correlates?

> Or perhaps that would be too messy or x86-centric and it’s better to go
> with what you proposed (pcimmio64)?

If the 'pcihole64' setting really is setting the MMIO64 window, then it
would be preferable to reuse the existing setting field.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [PATCH RESEND 0/6] Add support for configuring PCI high memory MMIO size
Posted by Matt Ochs via Devel 7 months, 1 week ago
> On May 12, 2025, at 5:19 AM, Daniel P. Berrangé <berrange@redhat.com> wrote:
> On Fri, May 09, 2025 at 07:29:04PM +0000, Matt Ochs wrote:
>> 
>> Would it make sense to just use the existing pcihole64 since [I think]
>> it more or less represents the same concept (setting 64bit MMIO window)?
> 
> I'm not sure. I've been struggling to reproduce an effect when setting
> the existing -global q35-pcihost.pci-hole64-size=1048576K property
> on x86, and also wondering how it interacts with the previously
> mentioned -fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=262144
> 
> Possibly the former only works with SeaBIOS, and the latter only
> works with EDK2, but I've not figured out how to prove this.

The qemu docs mention opt/ovmf is specifically for OVMF firmware:
https://github.com/qemu/qemu/blob/7be29f2f1a3f5b037d27eedbd5df9f441e8c8c16/docs/specs/fw_cfg.rst#L279

The pcihole64 setting can be used with OVMF (see below) and with SeaBIOS:
https://github.com/libvirt/libvirt/blob/master/docs/formatdomain.rst (see pcihole64)

The X-PciMmio64Mb parameter isn't directly supported in libvirt IIUC. The libvirt
XML would need to directly pass qemu command line arguments to use it.
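
For illustration, passing it would look something like this (a sketch
using the qemu XML namespace; the 262144 value is in MiB, i.e. 256 GiB):

    <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
      ...
      <qemu:commandline>
        <qemu:arg value='-fw_cfg'/>
        <qemu:arg value='name=opt/ovmf/X-PciMmio64Mb,string=262144'/>
      </qemu:commandline>
    </domain>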

> 
> I'm curious if there's a good way to identify the guest memory
> map impact, as I'm not finding a clear marker in 'dmesg' that
> correlates?

We were able to test this by using an OVMF build without the dynamic
MMIO window sizing patch (i.e. a version older than edk2-stable202211)
and without the guest kernel parameters that allow the MMIO window to
be recalculated by deferring resource allocation to the guest kernel
(i.e. pci=realloc and pci=nocrs are not set). With this we could
reproduce a 4-GPU VM launch with guest BARs not mapped properly due
to running out of space/resources. The BAR mapping failures are clear
in dmesg, with no BAR region mappings for the GPUs in /proc/iomem or
in lspci output.
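
For example, checks along these lines (illustrative commands; exact
dmesg wording varies by kernel version):

    dmesg | grep -i 'BAR'         # e.g. "BAR 1: no space for [mem ...]"
    grep '0000:' /proc/iomem      # per-device BAR windows (absent on failure)
    lspci -vv | grep -i 'Region'  # failed BARs show as "<ignored>"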

From there we added the pcihole64 attribute to the VM's libvirt definition,
setting a 2 TB hole size, and the VM booted with the guest GPU BARs mapped
properly per dmesg and the GPU BAR mappings visible in /proc/iomem and
lspci output.

Lastly, we observed the same behavior by removing the pcihole64 attribute
and setting the X-PciMmio64Mb configuration to 2 TB.

> 
>> Or perhaps that would be too messy or x86-centric and it’s better to go
>> with what you proposed (pcimmio64)?
> 
> If the 'pcihole64' setting really is setting the MMIO64 window, then it
> would be preferable to reuse the existing setting field.

Per the tests above, pcihole64 is setting the MMIO64 window. The only concern
I have with using it is that, to date, it has been an x86-centric attribute tied
closely to the qemu -global parameter. I don’t think this is a show-stopper, but
it will require some code changes to allow it to work with the virt machine and
connect it up to a different qemu parameter for that machine.


-matt


Re: [PATCH RESEND 0/6] Add support for configuring PCI high memory MMIO size
Posted by Daniel P. Berrangé via Devel 7 months, 1 week ago
On Mon, May 12, 2025 at 07:33:37PM +0000, Matt Ochs wrote:
> > On May 12, 2025, at 5:19 AM, Daniel P. Berrangé <berrange@redhat.com> wrote:
> > On Fri, May 09, 2025 at 07:29:04PM +0000, Matt Ochs wrote:
> >> 
> >> Would it make sense to just use the existing pcihole64 since [I think]
> >> it more or less represents the same concept (setting 64bit MMIO window)?
> > 
> > I'm not sure. I've been struggling to reproduce an effect when setting
> > the existing -global q35-pcihost.pci-hole64-size=1048576K property
> > on x86, and also wondering how it interacts with the previously
> > mentioned -fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=262144
> > 
> > Possibly the former only works with SeaBIOS, and the latter only
> > works with EDK2, but I've not figured out how to prove this.
> 
> The qemu docs mention opt/ovmf is specifically for OVMF firmware:
> https://github.com/qemu/qemu/blob/7be29f2f1a3f5b037d27eedbd5df9f441e8c8c16/docs/specs/fw_cfg.rst#L279
> 
> The pcihole64 setting can be used with OVMF (see below) and with SeaBIOS:
> https://github.com/libvirt/libvirt/blob/master/docs/formatdomain.rst (see pcihole64)
> 
> The X-PciMmio64Mb parameter isn't directly supported in libvirt IIUC. The libvirt
> XML would need to directly pass qemu command line arguments to use it.

I'm wondering what the semantic difference is between setting
the pcihole64 property and the X-PciMmio64Mb fwcfg, in the context
of OVMF.

The fact that both exist suggests that there is a meaningful
difference, which in turn would mean libvirt might need separate
XML attributes for each, which in turn influences how we might
choose to design the aarch64 solution.

> 
> > 
> > I'm curious if there's a good way to identify the guest memory
> > map impact, as I'm not finding a clear marker in 'dmesg' that
> > correlates?
> 
> We were able to test this by using an OVMF build without the dynamic
> MMIO window sizing patch (i.e. a version older than edk2-stable202211)
> and without the guest kernel parameters that allow the MMIO window to
> be recalculated by deferring resource allocation to the guest kernel
> (i.e. pci=realloc and pci=nocrs are not set). With this we could
> reproduce a 4-GPU VM launch with guest BARs not mapped properly due
> to running out of space/resources. The BAR mapping failures are clear
> in dmesg, with no BAR region mappings for the GPUs in /proc/iomem or
> in lspci output.
> 
> From there we added the pcihole64 attribute to the VM's libvirt definition,
> setting a 2 TB hole size, and the VM booted with the guest GPU BARs mapped
> properly per dmesg and the GPU BAR mappings visible in /proc/iomem and
> lspci output.
> 
> Lastly, we observed the same behavior by removing the pcihole64 attribute
> and setting the X-PciMmio64Mb configuration to 2 TB.
> 
> > 
> >> Or perhaps that would be too messy or x86-centric and it’s better to go
> >> with what you proposed (pcimmio64)?
> > 
> > If the 'pcihole64' setting really is setting the MMIO64 window, then it
> > would be preferable to reuse the existing setting field.
> 
> Per the tests above, pcihole64 is setting the MMIO64 window. The only concern
> I have with using it is that, to date, it has been an x86-centric attribute tied
> closely to the qemu -global parameter. I don’t think this is a show-stopper, but
> it will require some code changes to allow it to work with the virt machine and
> connect it up to a different qemu parameter for that machine.
> 
> 
> -matt
> 
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [PATCH RESEND 0/6] Add support for configuring PCI high memory MMIO size
Posted by Matt Ochs via Devel 7 months, 1 week ago
> On May 13, 2025, at 3:10 AM, Daniel P. Berrangé <berrange@redhat.com> wrote:
> 
> On Mon, May 12, 2025 at 07:33:37PM +0000, Matt Ochs wrote:
>>> On May 12, 2025, at 5:19 AM, Daniel P. Berrangé <berrange@redhat.com> wrote:
>>> On Fri, May 09, 2025 at 07:29:04PM +0000, Matt Ochs wrote:
>>>> 
>>>> Would it make sense to just use the existing pcihole64 since [I think]
>>>> it more or less represents the same concept (setting 64bit MMIO window)?
>>> 
>>> I'm not sure. I've been struggling to reproduce an effect when setting
>>> the existing -global q35-pcihost.pci-hole64-size=1048576K property
>>> on x86, and also wondering how it interacts with the previously
>>> mentioned -fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=262144
>>> 
>>> Possibly the former only works with SeaBIOS, and the latter only
>>> works with EDK2, but I've not figured out how to prove this.
>> 
>> The qemu docs mention opt/ovmf is specifically for OVMF firmware:
>> https://github.com/qemu/qemu/blob/7be29f2f1a3f5b037d27eedbd5df9f441e8c8c16/docs/specs/fw_cfg.rst#L279
>> 
>> The pcihole64 setting can be used with OVMF (see below) and with SeaBIOS:
>> https://github.com/libvirt/libvirt/blob/master/docs/formatdomain.rst (see pcihole64)
>> 
>> The X-PciMmio64Mb parameter isn't directly supported in libvirt IIUC. The libvirt
>> XML would need to directly pass qemu command line arguments to use it.
> 
> I'm wondering what the semantic difference is between setting
> the pcihole64 property and the X-PciMmio64Mb fwcfg, in the context
> of OVMF.
> 
> The fact that both exist suggests that there is a meaningful
> difference, which in turn would mean libvirt might need separate
> XML attributes for each, which in turn influences how we might
> choose to design the aarch64 solution.

AFAICT, these are the key differences between the two…

 - pcihole64 is a QEMU property

   It tells QEMU how much address space to reserve for 64-bit
   PCI MMIO. It is about the host’s reservation and what is exposed
   to the guest.

  - X-PciMmio64Mb is an OVMF/firmware override

    It tells OVMF to use a specific size for the MMIO64 window,
    regardless of what QEMU might have reserved or exposed by
    default. Moreover, as indicated by the X- prefix, this is an
    “experimental” option that isn’t widely documented and is used
    as a workaround for situations where the default window sizing
    logic in OVMF is insufficient.

Since highmem-mmio-size is also a QEMU property that deals with host-side
reservation for the MMIO64 window, it seems more in line with pcihole64.
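
Putting the three knobs discussed in this thread side by side (values
illustrative, all sized for a 2 TiB window):

    -global q35-pcihost.pci-hole64-size=2T              (x86 q35 host bridge property)
    -fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=2097152  (OVMF firmware override, in MiB)
    -machine virt,highmem-mmio-size=2T                  (aarch64 virt machine property)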

Re: [PATCH RESEND 0/6] Add support for configuring PCI high memory MMIO size
Posted by Daniel P. Berrangé via Devel 7 months, 2 weeks ago
On Fri, Apr 11, 2025 at 08:40:54AM -0700, Matthew R. Ochs via Devel wrote:
> Resending: Series has been rebased onto the latest upstream.
> 
> This patch series adds support for configuring the PCI high memory MMIO
> window size for aarch64 virt machine types. This feature has been merged
> into the QEMU upstream master branch [1] and will be available in QEMU 10.0.
> It allows users to configure the size of the high memory MMIO window above
> 4GB, which is particularly useful for systems with large PCI memory
> requirements.
>     
> The feature is exposed through the domain XML as a new PCI feature:
> <features>
>   <pci>
>     <highmem-mmio-size unit='G'>512</highmem-mmio-size>
>   </pci>
> </features>
> 
> When enabled, this configures the size of the PCI high memory MMIO window
> via QEMU's highmem-mmio-size machine property. The feature is only
> available for aarch64 virt machine types and requires QEMU support.

This isn't my area of expertise, but could you give any more background
on why we need to /manually/ set such a property on Arm only? Is there
something that prevents us from making QEMU "do the right thing"?

As a general rule these kinds of obscure tunables are not very user
friendly. Since they are obscure, most mgmt app developers are not
going to be aware of them and so may well not provide any way to set
them, and even if they can be set, it still requires someone or
something to remember to actually set them... which usually ends up
only happening /after/ the end user has complained that their setup
is broken. Overall this leads to a poor user experience IME.

IOW, if there is any plausible way we can make QEMU work suitably
out of the box, that'd be preferable to requiring a manually set
obscure tunable like this.

> This series depends on [2] and should be applied on top of those patches.
> 
> For your convenience, this series is also available on Github [3].
> 
> [1] https://github.com/qemu/qemu/commit/f10104aeae3a17f181d5bb37b7fd7dad7fe86cba
> [2] https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/Z4NQ3CVQYLNGZRBC35CUHOQ2EXJROPYG/
> [3] git fetch https://github.com/nvmochs/libvirt.git pci_highmem_mmio_size
> 
> Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
> 
> Matthew R. Ochs (6):
>   domain: Add PCI configuration feature infrastructure
>   schema: Add PCI configuration feature schema
>   conf: Add PCI configuration XML parsing and formatting
>   qemu: Add capability for PCI high memory MMIO size
>   qemu: Add command line support for PCI high memory MMIO size
>   tests: Add tests for machine PCI features
> 
>  src/conf/domain_conf.c                        | 103 ++++++++++++++++++
>  src/conf/domain_conf.h                        |   6 +
>  src/conf/schemas/domaincommon.rng             |   9 ++
>  src/qemu/qemu_capabilities.c                  |   2 +
>  src/qemu/qemu_capabilities.h                  |   1 +
>  src/qemu/qemu_command.c                       |   6 +
>  src/qemu/qemu_validate.c                      |  15 +++
>  .../caps_10.0.0_aarch64.replies               |  10 ++
>  .../caps_10.0.0_aarch64.xml                   |   1 +
>  ...rch64-virt-machine-pci.aarch64-latest.args |  31 ++++++
>  ...arch64-virt-machine-pci.aarch64-latest.xml |  30 +++++
>  .../aarch64-virt-machine-pci.xml              |  20 ++++
>  tests/qemuxmlconftest.c                       |   2 +
>  13 files changed, 236 insertions(+)
>  create mode 100644 tests/qemuxmlconfdata/aarch64-virt-machine-pci.aarch64-latest.args
>  create mode 100644 tests/qemuxmlconfdata/aarch64-virt-machine-pci.aarch64-latest.xml
>  create mode 100644 tests/qemuxmlconfdata/aarch64-virt-machine-pci.xml
> 
> -- 
> 2.46.0
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [PATCH RESEND 0/6] Add support for configuring PCI high memory MMIO size
Posted by Matt Ochs via Devel 7 months, 2 weeks ago
Hi Daniel,

Thanks for your feedback!

> On May 7, 2025, at 11:51 AM, Daniel P. Berrangé <berrange@redhat.com> wrote:
> On Fri, Apr 11, 2025 at 08:40:54AM -0700, Matthew R. Ochs via Devel wrote:
>> Resending: Series has been rebased onto the latest upstream.
>> 
>> This patch series adds support for configuring the PCI high memory MMIO
>> window size for aarch64 virt machine types. This feature has been merged
>> into the QEMU upstream master branch [1] and will be available in QEMU 10.0.
>> It allows users to configure the size of the high memory MMIO window above
>> 4GB, which is particularly useful for systems with large PCI memory
>> requirements.
>> 
>> The feature is exposed through the domain XML as a new PCI feature:
>> <features>
>>  <pci>
>>    <highmem-mmio-size unit='G'>512</highmem-mmio-size>
>>  </pci>
>> </features>
>> 
>> When enabled, this configures the size of the PCI high memory MMIO window
>> via QEMU's highmem-mmio-size machine property. The feature is only
>> available for aarch64 virt machine types and requires QEMU support.
> 
> This isn't my area of expertise, but could you give any more background
> on why we need to /manually/ set such a property on Arm only? Is there
> something that prevents us from making QEMU "do the right thing"?

The highmem-mmio-size property is only available for the arm64 “virt”
machine. It is only needed when a VM configuration will exceed the 512G
default for the PCI highmem region. There are some GPU devices that exist
today that have very large BARs and require more than 512G when
multiple devices are passed through to a VM.
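
As a purely hypothetical sizing illustration: four passthrough GPUs
that each expose a 128G BAR would by themselves fill the entire 512G
default window, before counting any other devices in the topology.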

Regarding making QEMU “do the right thing”, we could add logic to
libvirt to detect when these known devices are present in the VM
configuration and automatically set an appropriate size for the
parameter. However, I was under the impression that this type of solution
was preferred to be handled at the mgmt app layer.


-matt

Re: [PATCH RESEND 0/6] Add support for configuring PCI high memory MMIO size
Posted by Daniel P. Berrangé via Devel 7 months, 2 weeks ago
On Wed, May 07, 2025 at 08:44:05PM +0000, Matt Ochs wrote:
> Hi Daniel,
> 
> Thanks for your feedback!
> 
> > On May 7, 2025, at 11:51 AM, Daniel P. Berrangé <berrange@redhat.com> wrote:
> > On Fri, Apr 11, 2025 at 08:40:54AM -0700, Matthew R. Ochs via Devel wrote:
> >> Resending: Series has been rebased onto the latest upstream.
> >> 
> >> This patch series adds support for configuring the PCI high memory MMIO
> >> window size for aarch64 virt machine types. This feature has been merged
> >> into the QEMU upstream master branch [1] and will be available in QEMU 10.0.
> >> It allows users to configure the size of the high memory MMIO window above
> >> 4GB, which is particularly useful for systems with large PCI memory
> >> requirements.
> >> 
> >> The feature is exposed through the domain XML as a new PCI feature:
> >> <features>
> >>  <pci>
> >>    <highmem-mmio-size unit='G'>512</highmem-mmio-size>
> >>  </pci>
> >> </features>
> >> 
> >> When enabled, this configures the size of the PCI high memory MMIO window
> >> via QEMU's highmem-mmio-size machine property. The feature is only
> >> available for aarch64 virt machine types and requires QEMU support.
> > 
> > This isn't my area of expertise, but could you give any more background
> > on why we need to /manually/ set such a property on Arm only? Is there
> > something that prevents us from making QEMU "do the right thing"?
> 
> The highmem-mmio-size property is only available for the arm64 “virt”
> machine. It is only needed when a VM configuration will exceed the 512G
> default for the PCI highmem region. There are some GPU devices that exist
> today that have very large BARs and require more than 512G when
> multiple devices are passed through to a VM.
> 
> Regarding making QEMU “do the right thing”, we could add logic to
> libvirt to detect when these known devices are present in the VM
> configuration and automatically set an appropriate size for the
> parameter. However, I was under the impression that this type of solution
> was preferred to be handled at the mgmt app layer.

I wasn't suggesting putting logic in libvirt, actually. I'm querying why
QEMU's memory map is set up such that this PCI assignment can't work by
default with a standard QEMU configuration?

Can you confirm this works correctly on x86 QEMU with the q35 machine type
by default? If so, what prevents the QEMU 'virt' machine for aarch64 from
being changed to also work?

Libvirt can't detect when the devices are present in the VM config
because this mmio setting is a cold-boot option, while PCI devices
are often hot-plugged into an existing VM.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|