Resending: Series has been re-based over latest upstream.
This patch series adds support for configuring the PCI high memory MMIO
window size for aarch64 virt machine types. This feature has been merged
into the QEMU upstream master branch [1] and will be available in QEMU 10.0.
It allows users to configure the size of the high memory MMIO window above
4GB, which is particularly useful for systems with large PCI memory
requirements.
The feature is exposed through the domain XML as a new PCI feature:
  <features>
    <pci>
      <highmem-mmio-size unit='G'>512</highmem-mmio-size>
    </pci>
  </features>
When enabled, this configures the size of the PCI high memory MMIO window
via QEMU's highmem-mmio-size machine property. The feature is only
available for aarch64 virt machine types and requires QEMU support.
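For illustration, the XML above is expected to map onto the highmem-mmio-size
property of QEMU's -machine argument, roughly along these lines (surrounding
machine options are omitted, and whether the size is emitted with a unit
suffix or as a raw byte count is an implementation detail of the series):

    -machine virt,highmem-mmio-size=512G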
This series depends on [2] and should be applied on top of those patches.
For your convenience, this series is also available on Github [3].
[1] https://github.com/qemu/qemu/commit/f10104aeae3a17f181d5bb37b7fd7dad7fe86cba
[2] https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/Z4NQ3CVQYLNGZRBC35CUHOQ2EXJROPYG/
[3] git fetch https://github.com/nvmochs/libvirt.git pci_highmem_mmio_size
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Matthew R. Ochs (6):
domain: Add PCI configuration feature infrastructure
schema: Add PCI configuration feature schema
conf: Add PCI configuration XML parsing and formatting
qemu: Add capability for PCI high memory MMIO size
qemu: Add command line support for PCI high memory MMIO size
tests: Add tests for machine PCI features
src/conf/domain_conf.c | 103 ++++++++++++++++++
src/conf/domain_conf.h | 6 +
src/conf/schemas/domaincommon.rng | 9 ++
src/qemu/qemu_capabilities.c | 2 +
src/qemu/qemu_capabilities.h | 1 +
src/qemu/qemu_command.c | 6 +
src/qemu/qemu_validate.c | 15 +++
.../caps_10.0.0_aarch64.replies | 10 ++
.../caps_10.0.0_aarch64.xml | 1 +
...rch64-virt-machine-pci.aarch64-latest.args | 31 ++++++
...arch64-virt-machine-pci.aarch64-latest.xml | 30 +++++
.../aarch64-virt-machine-pci.xml | 20 ++++
tests/qemuxmlconftest.c | 2 +
13 files changed, 236 insertions(+)
create mode 100644 tests/qemuxmlconfdata/aarch64-virt-machine-pci.aarch64-latest.args
create mode 100644 tests/qemuxmlconfdata/aarch64-virt-machine-pci.aarch64-latest.xml
create mode 100644 tests/qemuxmlconfdata/aarch64-virt-machine-pci.xml
--
2.46.0
On Fri, Apr 11, 2025 at 08:40:54AM -0700, Matthew R. Ochs via Devel wrote:
> Resending: Series has been re-based over latest upstream.
>
> This patch series adds support for configuring the PCI high memory MMIO
> window size for aarch64 virt machine types. This feature has been merged
> into the QEMU upstream master branch [1] and will be available in QEMU 10.0.
> It allows users to configure the size of the high memory MMIO window above
> 4GB, which is particularly useful for systems with large amounts of PCI
> memory requirements.
>
> The feature is exposed through the domain XML as a new PCI feature:
> <features>
> <pci>
> <highmem-mmio-size unit='G'>512</highmem-mmio-size>
> </pci>
> </features>
As a schema design comment: IIUC, the MMIO size we're configuring
is conceptually a characteristic associated with the PCI(e) host
and the memory layout it defines for PCI(e) devices to use.
Checking through our schema I find we already have support
for
  <controller type='pci' index='0' model='pci-root'>
    <pcihole64 unit='KiB'>1048576</pcihole64>
  </controller>
this makes me think that we should model this new attribute
in a similar way, eg so we can support:
  <controller type='pci' index='0' model='pci-root'>
    <pcihole64 unit='KiB'>1048576</pcihole64>
    <pcimmio64 unit='TiB'>2</pcimmio64>
  </controller>
(pci-root or pcie-root are interchangeable).
This 'pcimmio64' value can then be mapped to whatever hypervisor
or architecture specific setting is appropriate, avoiding exposing
the QEMU arm 'highmem-mmio-size' naming convention.
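(For illustration only: a rough RelaxNG sketch of such an element, assuming it
would reuse the same scaledInteger pattern the existing pcihole64 element is
built on, might look like:

    <optional>
      <element name="pcimmio64">
        <ref name="scaledInteger"/>
      </element>
    </optional>

The actual schema wiring would of course follow whatever the series settles on.)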
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
> On May 9, 2025, at 9:59 AM, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> As a schema design comment: IIUC, the MMIO size we're configuring
> is conceptually a characteristic associated with the PCI(e) host
> and the memory layout it defines for PCI(e) devices to use.

Correct.

> Checking through our schema I find we already have support
> for
>
>   <controller type='pci' index='0' model='pci-root'>
>     <pcihole64 unit='KiB'>1048576</pcihole64>
>   </controller>
>
> this makes me think that we should model this new attribute
> in a similar way, eg so we can support:
>
>   <controller type='pci' index='0' model='pci-root'>
>     <pcihole64 unit='KiB'>1048576</pcihole64>
>     <pcimmio64 unit='TiB'>2</pcimmio64>
>   </controller>
>
> (pci-root or pcie-root are interchangeable).
>
> This 'pcimmio64' value can then be mapped to whatever hypervisor
> or architecture specific setting is appropriate, avoiding exposing
> the QEMU arm 'highmem-mmio-size' naming convention.

Thanks for the feedback, this sounds like a better approach.

Would it make sense to just use the existing pcihole64 since [I think]
it more or less represents the same concept (setting 64bit MMIO window)?
Or perhaps that would be too messy or x86-centric and it's better to go
with what you proposed (pcimmio64)?
On Fri, May 09, 2025 at 07:29:04PM +0000, Matt Ochs wrote:
> Thanks for the feedback, this sounds like a better approach.
>
> Would it make sense to just use the existing pcihole64 since [I think]
> it more or less represents the same concept (setting 64bit MMIO window)?

I'm not sure. I've been struggling to reproduce an effect with setting
the existing -global q35-pcihost.pci-hole64-size=1048576K settings
on x86, and also wondering how it interacts with the previously
mentioned -fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=262144

Possibly the former only works with SeaBIOS, and the latter only
works with EDK2, but I've not figured out how to prove this.

I'm curious if there's a good way to identify the guest memory
map impact, as I'm not finding a clear marker in 'dmesg' that
correlates?

> Or perhaps that would be too messy or x86-centric and it's better to go
> with what you proposed (pcimmio64)?

If the 'pcihole64' setting really is setting the MMIO64 window, then it
would be preferable to re-use the existing setting field.

With regards,
Daniel
> On May 12, 2025, at 5:19 AM, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> I'm not sure. I've been struggling to reproduce an effect with setting
> the existing -global q35-pcihost.pci-hole64-size=1048576K settings
> on x86, and also wondering how it interacts with the previously
> mentioned -fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=262144
>
> Possibly the former only works with SeaBIOS, and the latter only
> works with EDK2, but I've not figured out how to prove this.

The qemu docs mention opt/ovmf is specifically for OVMF firmware:
https://github.com/qemu/qemu/blob/7be29f2f1a3f5b037d27eedbd5df9f441e8c8c16/docs/specs/fw_cfg.rst#L279

The pcihole64 setting can be used with OVMF (see below) and with SEABIOS:
https://github.com/libvirt/libvirt/blob/master/docs/formatdomain.rst (see pcihole64)

The X-PciMmio64Mb parameter isn't directly supported in libvirt IIUC. The libvirt
XML would need to directly pass qemu command line arguments to use it.

> I'm curious if there's a good way to identify the guest memory
> map impact, as I'm not finding a clear marker in 'dmesg' that
> correlates?

We were able to test this by using OVMF without the dynamic MMIO
window size patch (i.e. a version older than edk2-stable202211) and
guest kernel parameters that do not allow re-calculating the MMIO
window size by deferring guest resource allocations to the guest
kernel (i.e. pci=realloc and pci=nocrs are not set). With this we could
reproduce a 4-GPU VM launch with guest BARs not mapped properly
due to running out of space/resources. The BAR mapping failures are
clear in dmesg, with no BAR region mappings in /proc/iomem or in the
output of lspci for the GPUs.

From there we added the pcihole64 attribute to the VM's libvirt definition,
setting a 2 TB hole size, and the VM booted with guest GPU BARs mapped
properly in dmesg, plus GPU BAR mappings visible in /proc/iomem and
lspci output.

Lastly, we observed the same behavior by removing the pcihole64 attribute
and setting the X-PciMmio64Mb configuration to 2 TB.

> If the 'pcihole64' setting really is setting the MMIO64 window, then it
> would be preferable to re-use the existing setting field.

Per the tests above, pcihole64 is setting the MMIO64 window. The only
concern I have with using it is that to date, it has been an x86-centric
attribute and tied closely with the qemu -global parameter. I don't think
this is a show-stopper, but it will require some code changes to allow it
to work with the virt machine and connect it up to a different qemu
parameter for that machine.

-matt
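For reference, a rough sketch of the two configurations exercised in the test
above (values and element placement are illustrative, not copied from the
actual test domain; 2147483648 KiB and 2097152 MB both correspond to 2 TiB):

    <controller type='pci' index='0' model='pcie-root'>
      <pcihole64 unit='KiB'>2147483648</pcihole64>
    </controller>

and, alternatively, the firmware override passed through libvirt's qemu
command-line namespace:

    <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
      ...
      <qemu:commandline>
        <qemu:arg value='-fw_cfg'/>
        <qemu:arg value='name=opt/ovmf/X-PciMmio64Mb,string=2097152'/>
      </qemu:commandline>
    </domain>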
On Mon, May 12, 2025 at 07:33:37PM +0000, Matt Ochs wrote:
> The qemu docs mention opt/ovmf is specifically for OVMF firmware:
> https://github.com/qemu/qemu/blob/7be29f2f1a3f5b037d27eedbd5df9f441e8c8c16/docs/specs/fw_cfg.rst#L279
>
> The pcihole64 setting can be used with OVMF (see below) and with SEABIOS:
> https://github.com/libvirt/libvirt/blob/master/docs/formatdomain.rst (see pcihole64)
>
> The X-PciMmio64Mb parameter isn't directly supported in libvirt IIUC. The libvirt
> XML would need to directly pass qemu command line arguments to use it.

I'm wondering what the semantic difference is between setting
the pcihole64 property and the X-PciMmio64Mb fwcfg, in the context
of OVMF.

The fact that both exist, suggests that there is a meaningful
difference, which in turn would mean libvirt might need separate
XML attributes for each, which in turn influences how we might
choose to design the aarch64 solution.

With regards,
Daniel
> On May 13, 2025, at 3:10 AM, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Mon, May 12, 2025 at 07:33:37PM +0000, Matt Ochs wrote:
>>> On May 12, 2025, at 5:19 AM, Daniel P. Berrangé <berrange@redhat.com> wrote:
>>> On Fri, May 09, 2025 at 07:29:04PM +0000, Matt Ochs wrote:
>>>>
>>>> Would it make sense to just use the existing pcihole64 since [I think]
>>>> it more or less represents the same concept (setting 64bit MMIO window)?
>>>
>>> I'm not sure. I've been struggling to reproduce an effect with setting
>>> the existing -global q35-pcihost.pci-hole64-size=1048576K settings
>>> on x86, and also wondering how it interacts with the previously
>>> mentioned -fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=262144
>>>
>>> Possibly the former only works with SeaBIOS, and the latter only
>>> works with EDK2, but I've not figured out how to prove this.
>>
>> The qemu docs mention opt/ovmf is specifically for OVMF firmware:
>> https://github.com/qemu/qemu/blob/7be29f2f1a3f5b037d27eedbd5df9f441e8c8c16/docs/specs/fw_cfg.rst#L279
>>
>> The pcihole64 setting can be used with OVMF (see below) and with SEABIOS:
>> https://github.com/libvirt/libvirt/blob/master/docs/formatdomain.rst (see pcihole64)
>>
>> The X-PciMmio64Mb parameter isn't directly supported in libvirt IIUC. The libvirt
>> XML would need to directly pass qemu command line arguments to use it.
>
> I'm wondering what the semantic difference is between setting
> the pcihole64 property and the X-PciMmio64Mb fwcfg, in the context
> of OVMF.
>
> The fact that both exist, suggests that there is a meaningful
> difference, which in turn would mean libvirt might need separate
> XML attributes for each, which in turn influences how we might
> choose to design the aarch64 solution.
AFAICT, these are the key differences between the two:
- pcihole64 is a QEMU property
It tells QEMU how much address space to reserve for 64-bit
PCI MMIO. It is about the host’s reservation and what is exposed
to the guest.
- X-PciMmio64Mb is an OVMF/firmware override
It tells OVMF to use a specific size for the MMIO64 window,
regardless of what QEMU might have reserved or exposed by
default. Moreover, as indicated by the X- prefix, this is an
“experimental” option that isn't widely documented and is used
as a workaround for situations where the default window-sizing
logic present in OVMF is insufficient.
Since highmem-mmio-size is also a QEMU property that deals with host-side
reservation for the MMIO64 window, it seems more in line with pcihole64.
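To put the three knobs discussed in this thread side by side (sizes are
illustrative; the first two lines are the forms quoted earlier in the thread,
the third assumes the new property is passed on -machine for the aarch64 virt
machine):

    -global q35-pcihost.pci-hole64-size=2T              (x86 q35 host bridge property, libvirt pcihole64)
    -fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=2097152  (OVMF-specific firmware override, size in MB)
    -machine virt,highmem-mmio-size=2T                  (aarch64 virt machine property, this series)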
On Fri, Apr 11, 2025 at 08:40:54AM -0700, Matthew R. Ochs via Devel wrote:
> When enabled, this configures the size of the PCI high memory MMIO window
> via QEMU's highmem-mmio-size machine property. The feature is only
> available for aarch64 virt machine types and requires QEMU support.

This isn't my area of expertise, but could you give any more background
on why we need to /manually/ set such a property on Arm only? Is there
something that prevents us making QEMU "do the right thing"?

As a general rule these kinds of obscure tunables are not very user
friendly. Since they are obscure, most mgmt app developers are not
going to be aware of them, so may well not provide any way to set them,
and even if they can be set, it still requires someone or something to
remember to actually set it... which usually ends up only happening
/after/ the end user has complained their setup is broken. Overall this
leads to a poor user experience IME.

IOW, if there is any plausible way we can make QEMU work suitably out
of the box, that'd be preferable to requiring a manually set obscure
tunable like this.

With regards,
Daniel
Hi Daniel,

Thanks for your feedback!

> On May 7, 2025, at 11:51 AM, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> This isn't my area of expertise, but could you give any more background
> on why we need to /manually/ set such a property on Arm only? Is there
> something that prevents us making QEMU "do the right thing"?

The highmem-mmio-size property is only available for the arm64 “virt”
machine. It is only needed when a VM configuration will exceed the 512G
default for the PCI highmem region. There are some GPU devices that exist
today that have very large BARs and require more than 512G when
multiple devices are passed through to a VM.

Regarding making QEMU “do the right thing”, we could add logic to
libvirt to detect when these known devices are present in the VM
configuration and automatically set an appropriate size for the
parameter. However, I was under the impression that this type of solution
was preferred to be handled at the mgmt app layer.

-matt
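As a purely illustrative calculation (the device figures are hypothetical, not
taken from this thread): four passthrough GPUs exposing a 128 GiB BAR each
already need 4 x 128 GiB = 512 GiB of 64-bit MMIO space for those BARs alone,
so once the devices' remaining BARs and alignment padding are added the guest
no longer fits in the default 512G window and a larger highmem-mmio-size has
to be set at cold boot.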
On Wed, May 07, 2025 at 08:44:05PM +0000, Matt Ochs wrote:
> The highmem-mmio-size property is only available for the arm64 “virt”
> machine. It is only needed when a VM configuration will exceed the 512G
> default for the PCI highmem region. There are some GPU devices that exist
> today that have very large BARs and require more than 512G when
> multiple devices are passed through to a VM.
>
> Regarding making QEMU “do the right thing”, we could add logic to
> libvirt to detect when these known devices are present in the VM
> configuration and automatically set an appropriate size for the
> parameter. However, I was under the impression that this type of solution
> was preferred to be handled at the mgmt app layer.

I wasn't suggesting to put logic in libvirt actually. I'm querying why
QEMU's memory map is set up such that this PCI assignment can't work by
default with a standard QEMU configuration.

Can you confirm this works correctly on x86 QEMU with the q35 machine
type by default? If so, what prevents QEMU's 'virt' machine for aarch64
being changed to also work?

Libvirt can't detect when the devices are present in the VM config
because this mmio setting is a cold boot option, while PCI devices are
often hot-plugged to an existing VM.

With regards,
Daniel