[PATCH v5 0/6] qemu: acpi-generic-initiator support

Andrea Righi via Devel posted 6 patches 1 month ago
git fetch https://github.com/patchew-project/libvirt tags/patchew/20250806124415.107369-1-arighi@nvidia.com
There is a newer version of this series
NEWS.rst                                           |   8 ++
docs/formatdomain.rst                              |  36 ++++++
src/ch/ch_domain.c                                 |   1 +
src/conf/domain_conf.c                             | 138 +++++++++++++++++++++
src/conf/domain_conf.h                             |  14 +++
src/conf/domain_postparse.c                        |   1 +
src/conf/domain_validate.c                         |  37 ++++++
src/conf/numa_conf.c                               |   3 +
src/conf/schemas/domaincommon.rng                  |  14 +++
src/conf/virconftypes.h                            |   2 +
src/hyperv/hyperv_driver.c                         |   1 +
src/libxl/libxl_driver.c                           |   6 +
src/lxc/lxc_driver.c                               |   6 +
src/qemu/qemu_alias.c                              |  11 ++
src/qemu/qemu_capabilities.c                       |   2 +
src/qemu/qemu_capabilities.h                       |   1 +
src/qemu/qemu_command.c                            |  50 ++++++--
src/qemu/qemu_domain.c                             |   2 +
src/qemu/qemu_domain_address.c                     |   4 +
src/qemu/qemu_driver.c                             |   3 +
src/qemu/qemu_hotplug.c                            |   5 +
src/qemu/qemu_postparse.c                          |   1 +
src/qemu/qemu_validate.c                           |  18 +++
src/test/test_driver.c                             |   4 +
tests/qemucapabilitiesdata/caps_10.0.0_aarch64.xml |   1 +
.../caps_10.0.0_x86_64+amdsev.xml                  |   1 +
tests/qemucapabilitiesdata/caps_10.0.0_x86_64.xml  |   1 +
.../caps_10.1.0_x86_64+inteltdx.xml                |   1 +
tests/qemucapabilitiesdata/caps_10.1.0_x86_64.xml  |   1 +
tests/qemucapabilitiesdata/caps_9.0.0_x86_64.xml   |   1 +
tests/qemucapabilitiesdata/caps_9.1.0_riscv64.xml  |   1 +
tests/qemucapabilitiesdata/caps_9.1.0_x86_64.xml   |   1 +
.../caps_9.2.0_aarch64+hvf.xml                     |   1 +
.../caps_9.2.0_x86_64+amdsev.xml                   |   1 +
tests/qemucapabilitiesdata/caps_9.2.0_x86_64.xml   |   1 +
.../acpi-generic-initiator.x86_64-latest.args      |  55 ++++++++
.../acpi-generic-initiator.x86_64-latest.xml       |   1 +
tests/qemuxmlconfdata/acpi-generic-initiator.xml   |  94 ++++++++++++++
tests/qemuxmlconftest.c                            |   1 +
39 files changed, 523 insertions(+), 7 deletions(-)
create mode 100644 tests/qemuxmlconfdata/acpi-generic-initiator.x86_64-latest.args
create mode 120000 tests/qemuxmlconfdata/acpi-generic-initiator.x86_64-latest.xml
create mode 100644 tests/qemuxmlconfdata/acpi-generic-initiator.xml
[PATCH v5 0/6] qemu: acpi-generic-initiator support
Posted by Andrea Righi via Devel 1 month ago
= Overview =

This patch set introduces support for acpi-generic-initiator devices,
supported by QEMU [1].

The acpi-generic-initiator object is required to support Multi-Instance GPU
(MIG) configurations on NVIDIA GPUs [2]. MIG enables partitioning of GPU
resources into multiple isolated instances, each requiring a dedicated NUMA
node definition.

= Implementation =

This patch set implements the libvirt counterpart to the QEMU feature,
enabling users to configure acpi-generic-initiator objects within libvirt
domain XML.

This includes:
 - adding XML syntax to define acpi-generic-initiator objects,
 - resolving the acpi-generic-initiator definitions into the proper QEMU
   command-line arguments,
 - ensuring compatibility with existing NUMA configuration.

= Example =

 - Domain XML:
```
...
<cpu mode='host-passthrough' check='none'>
  <numa>
    <cell id='0' cpus='0-15' memory='8388608' unit='KiB'/>
    <cell id='1' memory='0' unit='KiB'/>
    <cell id='2' memory='0' unit='KiB'/>
    <cell id='3' memory='0' unit='KiB'/>
    <cell id='4' memory='0' unit='KiB'/>
    <cell id='5' memory='0' unit='KiB'/>
    <cell id='6' memory='0' unit='KiB'/>
    <cell id='7' memory='0' unit='KiB'/>
    <cell id='8' memory='0' unit='KiB'/>
  </numa>
</cpu>
...
<devices>
...
  <hostdev mode='subsystem' type='pci' managed='no'>
    <source>
      <address domain='0x0009' bus='0x01' slot='0x00' function='0x0'/>
    </source>
    <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
  </hostdev>
  <acpi-generic-initiator>
    <pci-dev>hostdev0</pci-dev>
    <numa-node>1</numa-node>
  </acpi-generic-initiator>
  <acpi-generic-initiator>
    <pci-dev>hostdev0</pci-dev>
    <numa-node>2</numa-node>
  </acpi-generic-initiator>
  <acpi-generic-initiator>
    <pci-dev>hostdev0</pci-dev>
    <numa-node>3</numa-node>
  </acpi-generic-initiator>
  <acpi-generic-initiator>
    <pci-dev>hostdev0</pci-dev>
    <numa-node>4</numa-node>
  </acpi-generic-initiator>
  <acpi-generic-initiator>
    <pci-dev>hostdev0</pci-dev>
    <numa-node>5</numa-node>
  </acpi-generic-initiator>
  <acpi-generic-initiator>
    <pci-dev>hostdev0</pci-dev>
    <numa-node>6</numa-node>
  </acpi-generic-initiator>
  <acpi-generic-initiator>
    <pci-dev>hostdev0</pci-dev>
    <numa-node>7</numa-node>
  </acpi-generic-initiator>
  <acpi-generic-initiator>
    <pci-dev>hostdev0</pci-dev>
    <numa-node>8</numa-node>
  </acpi-generic-initiator>
</devices>
```

 - Generated QEMU command line options:
```
... /usr/bin/qemu-system-aarch64 \
...
-object '{"qom-type":"memory-backend-ram","id":"ram-node0","size":8589934592}' \
-numa node,nodeid=0,cpus=0-15,memdev=ram-node0 \
-numa node,nodeid=1 \
-numa node,nodeid=2 \
-numa node,nodeid=3 \
-numa node,nodeid=4 \
-numa node,nodeid=5 \
-numa node,nodeid=6 \
-numa node,nodeid=7 \
-numa node,nodeid=8 \
...
-device '{"driver":"vfio-pci","host":"0009:01:00.0","id":"hostdev0","bus":"pci.3","addr":"0x0"}' \
...
-object acpi-generic-initiator,id=gi0,pci-dev=hostdev0,node=1 \
-object acpi-generic-initiator,id=gi1,pci-dev=hostdev0,node=2 \
-object acpi-generic-initiator,id=gi2,pci-dev=hostdev0,node=3 \
-object acpi-generic-initiator,id=gi3,pci-dev=hostdev0,node=4 \
-object acpi-generic-initiator,id=gi4,pci-dev=hostdev0,node=5 \
-object acpi-generic-initiator,id=gi5,pci-dev=hostdev0,node=6 \
-object acpi-generic-initiator,id=gi6,pci-dev=hostdev0,node=7 \
-object acpi-generic-initiator,id=gi7,pci-dev=hostdev0,node=8
```
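
The mapping from the XML above to the `-object` options can be sketched in a few lines of Python. This is only an illustration of the translation, not the code in qemu_command.c: the function name `generic_initiator_args` is hypothetical, and the `giN` ids are assumed to be assigned in document order, as in the example output.

```python
import xml.etree.ElementTree as ET


def generic_initiator_args(devices_xml: str) -> list[str]:
    """Build one acpi-generic-initiator object string per XML element.

    Hypothetical helper: ids are assumed to be gi0, gi1, ... in document order.
    """
    root = ET.fromstring(devices_xml.strip())
    args = []
    for index, gi in enumerate(root.iter("acpi-generic-initiator")):
        pci_dev = gi.findtext("pci-dev")
        node = gi.findtext("numa-node")
        if pci_dev is None or node is None:
            raise ValueError("acpi-generic-initiator needs <pci-dev> and <numa-node>")
        # int() rejects anything that is not a plain NUMA node number
        args.append(
            f"acpi-generic-initiator,id=gi{index},"
            f"pci-dev={pci_dev.strip()},node={int(node)}"
        )
    return args


EXAMPLE = """
<devices>
  <acpi-generic-initiator>
    <pci-dev>hostdev0</pci-dev>
    <numa-node>1</numa-node>
  </acpi-generic-initiator>
  <acpi-generic-initiator>
    <pci-dev>hostdev0</pci-dev>
    <numa-node>2</numa-node>
  </acpi-generic-initiator>
</devices>
"""

for arg in generic_initiator_args(EXAMPLE):
    print("-object", arg)
```

Each `<acpi-generic-initiator>` element yields exactly one `-object` option, which is why binding one device to eight nodes takes eight elements.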

= References =

[1] https://lore.kernel.org/all/20231225045603.7654-2-ankita@nvidia.com/
[2] https://www.nvidia.com/en-in/technologies/multi-instance-gpu/

ChangeLog v4 -> v5:
 - Integrate suggestions and changes from Michal's review
 - Update qemu capabilities
 - Rebase to v11.6.0

ChangeLog v3 -> v4:
 - add acpi-generic-initiator documentation
 - refactor virDomainAcpiInitiatorDef to use info->alias and drop the name
   attribute
 - auto-generate alias for the acpi-generic-initiator devices via
   qemuAssignDeviceAliases()
 - use g_autoptr() when possible
 - add a new entry to NEWS.rst

ChangeLog v2 -> v3:
  - replaced <text/> with proper types in the XML schema
  - avoid mixing g_free() and VIR_FREE()
  - use virXMLPropString() instead of looping all XML nodes
  - report proper errors with virReportError()
  - use virBufferEscapeString() to process strings passed by the user
  - fix broken formatting of function headers
  - misc coding style fixes

ChangeLog v1 -> v2:
  - split parser and driver changes in separate patches
  - introduce a new qemu capability flag
  - introduce test in qemuxmlconftest

Andrea Righi (5):
      conf: Introduce acpi-generic-initiator device
      qemu: Allow to define NUMA nodes without memory or CPUs assigned
      qemu: capabilies: Introduce QEMU_CAPS_ACPI_GENERIC_INITIATOR
      qemu: Support acpi-generic-initiator
      NEWS: Mention new acpi-generic-initiator device

Michal Prívozník (1):
      qemu_validate: Validate acpi-generic-initiator

Re: [PATCH v5 0/6] qemu: acpi-generic-initiator support
Posted by Daniel P. Berrangé via Devel 1 week, 3 days ago
On Wed, Aug 06, 2025 at 02:42:10PM +0200, Andrea Righi via Devel wrote:
> = Overview =
> 
> This patch set introduces support for acpi-generic-initiator devices,
> supported by QEMU [1].
> 
> The acpi-generic-initiator object is required to support Multi-Instance GPU
> (MIG) configurations on NVIDIA GPUs [2]. MIG enables partitioning of GPU
> resources into multiple isolated instances, each requiring a dedicated NUMA
> node definition.


Ok, this took me a while to understand, but after looking at the actual
QEMU code for acpi-generic-initiator it is finally clear how ridiculously
simple the entire use case is.

We can have multiple virtual NUMA nodes, which traditionally would have
virtual CPUs and RAM assigned. Virtual PCI devices could be indirectly
associated with a NUMA node by having them placed on a PXB which has
affinity with a *single* NUMA node.

For the NVIDIA GPU use case, however, the PCI device itself needs to have
direct affinity with *multiple* NUMA nodes. Those nodes would not have
any CPUs or memory associated with them typically. Conceptually that
is an easy thing to model in the XML.


The 'acpi-generic-initiator' object exposed by QEMU is a direct
reflection of how the NUMA affinity is mapped at the ACPI table
level. This is an inappropriate low level impl detail to expose
at a high level, as well as being an insanely verbose way to
configure what is really just a bitmask (of NUMA node IDs)
against a device.

IOW, we should not expose any of this acpi-generic-initiator stuff
in libvirt XML at all.

In the virDomainDeviceInfo struct we record 'acpiIndex' which
is a property that sets an ACPI table index for PCI devices.
This maps to the XML:

   <acpi index='8'/>

We should extend virDomainDeviceInfo to hold 'virBitmap *acpiNodeset'
to record the NUMA affinity of PCI devices (if any), and expose
this as a bitset on the existing <acpi> element, e.g.

   <acpi nodeset='3-5,8-10,11,15' index='8'/>

Or possibly 'numaNodeset' as the attr name.
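
To make the equivalence concrete, here is a minimal Python sketch (not libvirt code) of how such a nodeset string could be expanded into one acpi-generic-initiator object per node. The names `parse_nodeset` and `initiator_objects` are hypothetical, the `giN` ids are an assumption, and the parser only handles comma-separated single IDs and inclusive ranges.

```python
def parse_nodeset(nodeset: str) -> list[int]:
    """Expand a string such as '3-5,8' into a sorted list of node IDs.

    Illustrative only: supports single IDs and inclusive 'first-last' ranges.
    """
    nodes = set()
    for part in nodeset.split(","):
        part = part.strip()
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            if first > last:
                raise ValueError(f"invalid range: {part}")
            nodes.update(range(first, last + 1))
        else:
            nodes.add(int(part))
    return sorted(nodes)


def initiator_objects(alias: str, nodeset: str) -> list[str]:
    """Return one QEMU object string per node in the nodeset (ids assumed)."""
    return [
        f"acpi-generic-initiator,id=gi{i},pci-dev={alias},node={node}"
        for i, node in enumerate(parse_nodeset(nodeset))
    ]


for obj in initiator_objects("hostdev0", "3-5,8-10,11,15"):
    print("-object", obj)
```

With this shape, a single attribute on the device replaces the list of separate per-node elements.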

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [PATCH v5 0/6] qemu: acpi-generic-initiator support
Posted by Andrea Righi via Devel 18 hours ago
Hi Daniel,

On Wed, Aug 27, 2025 at 02:17:43PM +0100, Daniel P. Berrangé wrote:
> On Wed, Aug 06, 2025 at 02:42:10PM +0200, Andrea Righi via Devel wrote:
> > = Overview =
> > 
> > This patch set introduces support for acpi-generic-initiator devices,
> > supported by QEMU [1].
> > 
> > The acpi-generic-initiator object is required to support Multi-Instance GPU
> > (MIG) configurations on NVIDIA GPUs [2]. MIG enables partitioning of GPU
> > resources into multiple isolated instances, each requiring a dedicated NUMA
> > node definition.
> 
> 
> Ok, this took me a while to understand, but after looking at the actual
> QEMU code for acpi-generic-initiator it is finally clear how ridiculously
> simple the entire use case is.
> 
> We can have multiple virtual NUMA nodes, which traditionally would have
> virtual CPUs and RAM assigned. Virtual PCI devices could be indirectly
> associated with a NUMA node by having them placed on a PXB which has
> affinity with a *single* NUMA node.
> 
> For the NVIDIA GPU use case, however, the PCI device itself needs to have
> direct affinity with *multiple* NUMA nodes. Those nodes would not have
> any CPUs or memory associated with them typically. Conceptually that
> is an easy thing to model in the XML.
> 
> 
> The 'acpi-generic-initiator' object exposed by QEMU is a direct
> reflection of how the NUMA affinity is mapped at the ACPI table
> level. This is an inappropriate low level impl detail to expose
> at a high level, as well as being an insanely verbose way to
> configure what is really just a bitmask (of NUMA node IDs)
> against a device.
> 
> IOW, we should not expose any of this acpi-generic-initiator stuff
> in libvirt XML at all.
> 
> In the virDomainDeviceInfo struct we record 'acpiIndex' which
> is a property that sets an ACPI table index for PCI devices.
> This maps to the XML:
> 
>    <acpi index='8'/>
> 
> We should extend virDomainDeviceInfo to hold 'virBitmap *acpiNodeset'
> to record the NUMA affinity of PCI devices (if any), and expose
> this as a bitset on the existing <acpi> element, e.g.
> 
>    <acpi nodeset='3-5,8-10,11,15' index='8'/>
> 
> Or possibly 'numaNodeset' as the attr name.

Thanks for taking a look. Makes sense, I like this new syntax. I'll send a
new patch set implementing it.

-Andrea