From: Wim ten Have <wim.ten.have@oracle.com>

This patch extends guest domain administration, adding support to
automatically advertise the host NUMA node architecture inside a guest
by creating a vNUMA copy of it.

The mechanism is enabled by setting the check='numa' attribute under
the CPU 'host-passthrough' topology:

    <cpu mode='host-passthrough' check='numa' .../>

When enabled, the mechanism renders the NUMA architecture reported in
the host capabilities into the guest, evenly balances the guest's
reserved vcpus and memory amongst the composed vNUMA cells, and pins
each cell's vcpus to the physical cpusets of the corresponding host
NUMA node. This way the host NUMA topology remains in effect under the
partitioned guest domain.

The example below auto partitions the physical NUMA detail listed by
'lscpu' on the host into a guest domain vNUMA description.

  [root@host ]# lscpu
  Architecture:          x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Byte Order:            Little Endian
  CPU(s):                240
  On-line CPU(s) list:   0-239
  Thread(s) per core:    2
  Core(s) per socket:    15
  Socket(s):             8
  NUMA node(s):          8
  Vendor ID:             GenuineIntel
  CPU family:            6
  Model:                 62
  Model name:            Intel(R) Xeon(R) CPU E7-8895 v2 @ 2.80GHz
  Stepping:              7
  CPU MHz:               3449.555
  CPU max MHz:           3600.0000
  CPU min MHz:           1200.0000
  BogoMIPS:              5586.28
  Virtualization:        VT-x
  L1d cache:             32K
  L1i cache:             32K
  L2 cache:              256K
  L3 cache:              38400K
  NUMA node0 CPU(s):     0-14,120-134
  NUMA node1 CPU(s):     15-29,135-149
  NUMA node2 CPU(s):     30-44,150-164
  NUMA node3 CPU(s):     45-59,165-179
  NUMA node4 CPU(s):     60-74,180-194
  NUMA node5 CPU(s):     75-89,195-209
  NUMA node6 CPU(s):     90-104,210-224
  NUMA node7 CPU(s):     105-119,225-239
  Flags:                 ...
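[Editor's note: the even vcpu/memory balance described above can be checked against the example's numbers with a short sketch. This is illustrative only, not code from the patch; the `partition` helper is hypothetical.]

```python
# Illustrative sketch (not from the patch): 16 vcpus and 64 GiB of guest
# memory split evenly across the host's 8 NUMA nodes.
def partition(vcpus, mem_kib, nodes):
    cells = []
    for cell in range(nodes):
        cells.append({
            "id": cell,
            # vcpus are interleaved across cells: cell 0 gets vcpus 0 and 8,
            # cell 1 gets 1 and 9, and so on.
            "cpus": [v for v in range(vcpus) if v % nodes == cell],
            "memory_kib": mem_kib // nodes,
        })
    return cells

cells = partition(16, 67108864, 8)
print(cells[0])  # {'id': 0, 'cpus': [0, 8], 'memory_kib': 8388608}

# One plausible reading of the resulting guest <topology>: one socket per
# vNUMA cell, host threads-per-core preserved, cores derived from vcpus.
sockets, threads = 8, 2
cores = 16 // (sockets * threads)  # -> 1, matching sockets='8' cores='1' threads='2'
```

The per-cell memory, 67108864 // 8 = 8388608 KiB, matches the `memory='8388608'` attribute in each `<cell>` of the rewritten XML below.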
Without the auto partition rendering enabled, the guest 'anuma' reads:
"<cpu mode='host-passthrough' check='none'/>"

  <domain type='kvm'>
    <name>anuma</name>
    <uuid>3f439f5f-1156-4d48-9491-945a2c0abc6d</uuid>
    <memory unit='KiB'>67108864</memory>
    <currentMemory unit='KiB'>67108864</currentMemory>
    <vcpu placement='static'>16</vcpu>
    <os>
      <type arch='x86_64' machine='pc-q35-2.11'>hvm</type>
      <boot dev='hd'/>
    </os>
    <features>
      <acpi/>
      <apic/>
      <vmport state='off'/>
    </features>
    <cpu mode='host-passthrough' check='none'/>
    <clock offset='utc'>
      <timer name='rtc' tickpolicy='catchup'/>
      <timer name='pit' tickpolicy='delay'/>
      <timer name='hpet' present='no'/>
    </clock>
    <on_poweroff>destroy</on_poweroff>
    <on_reboot>restart</on_reboot>
    <on_crash>destroy</on_crash>
    <pm>
      <suspend-to-mem enabled='no'/>
      <suspend-to-disk enabled='no'/>
    </pm>
    <devices>
      <emulator>/usr/bin/qemu-system-x86_64</emulator>
      <disk type='file' device='disk'>
        <driver name='qemu' type='qcow2'/>
        <source file='/var/lib/libvirt/images/anuma.qcow2'/>
        ...

With auto partitioning enabled, the guest 'anuma' XML is rewritten as
listed below:
"<cpu mode='host-passthrough' check='numa'>"

  <domain type='kvm'>
    <name>anuma</name>
    <uuid>3f439f5f-1156-4d48-9491-945a2c0abc6d</uuid>
    <memory unit='KiB'>67108864</memory>
    <currentMemory unit='KiB'>67108864</currentMemory>
    <vcpu placement='static'>16</vcpu>
    <cputune>
      <vcpupin vcpu='0' cpuset='0-14,120-134'/>
      <vcpupin vcpu='1' cpuset='15-29,135-149'/>
      <vcpupin vcpu='2' cpuset='30-44,150-164'/>
      <vcpupin vcpu='3' cpuset='45-59,165-179'/>
      <vcpupin vcpu='4' cpuset='60-74,180-194'/>
      <vcpupin vcpu='5' cpuset='75-89,195-209'/>
      <vcpupin vcpu='6' cpuset='90-104,210-224'/>
      <vcpupin vcpu='7' cpuset='105-119,225-239'/>
      <vcpupin vcpu='8' cpuset='0-14,120-134'/>
      <vcpupin vcpu='9' cpuset='15-29,135-149'/>
      <vcpupin vcpu='10' cpuset='30-44,150-164'/>
      <vcpupin vcpu='11' cpuset='45-59,165-179'/>
      <vcpupin vcpu='12' cpuset='60-74,180-194'/>
      <vcpupin vcpu='13' cpuset='75-89,195-209'/>
      <vcpupin vcpu='14' cpuset='90-104,210-224'/>
      <vcpupin vcpu='15' cpuset='105-119,225-239'/>
    </cputune>
    <os>
      <type arch='x86_64' machine='pc-q35-2.11'>hvm</type>
      <boot dev='hd'/>
    </os>
    <features>
      <acpi/>
      <apic/>
      <vmport state='off'/>
    </features>
    <cpu mode='host-passthrough' check='numa'>
      <topology sockets='8' cores='1' threads='2'/>
      <numa>
        <cell id='0' cpus='0,8' memory='8388608' unit='KiB'>
          <distances>
            <sibling id='0' value='10'/>
            <sibling id='1' value='21'/>
            <sibling id='2' value='31'/>
            <sibling id='3' value='21'/>
            <sibling id='4' value='21'/>
            <sibling id='5' value='31'/>
            <sibling id='6' value='31'/>
            <sibling id='7' value='31'/>
          </distances>
        </cell>
        <cell id='1' cpus='1,9' memory='8388608' unit='KiB'>
          <distances>
            <sibling id='0' value='21'/>
            <sibling id='1' value='10'/>
            <sibling id='2' value='21'/>
            <sibling id='3' value='31'/>
            <sibling id='4' value='31'/>
            <sibling id='5' value='21'/>
            <sibling id='6' value='31'/>
            <sibling id='7' value='31'/>
          </distances>
        </cell>
        <cell id='2' cpus='2,10' memory='8388608' unit='KiB'>
          <distances>
            <sibling id='0' value='31'/>
            <sibling id='1' value='21'/>
            <sibling id='2' value='10'/>
            <sibling id='3' value='21'/>
            <sibling id='4' value='31'/>
            <sibling id='5' value='31'/>
            <sibling id='6' value='21'/>
            <sibling id='7' value='31'/>
          </distances>
        </cell>
        <cell id='3' cpus='3,11' memory='8388608' unit='KiB'>
          <distances>
            <sibling id='0' value='21'/>
            <sibling id='1' value='31'/>
            <sibling id='2' value='21'/>
            <sibling id='3' value='10'/>
            <sibling id='4' value='31'/>
            <sibling id='5' value='31'/>
            <sibling id='6' value='31'/>
            <sibling id='7' value='21'/>
          </distances>
        </cell>
        <cell id='4' cpus='4,12' memory='8388608' unit='KiB'>
          <distances>
            <sibling id='0' value='21'/>
            <sibling id='1' value='31'/>
            <sibling id='2' value='31'/>
            <sibling id='3' value='31'/>
            <sibling id='4' value='10'/>
            <sibling id='5' value='21'/>
            <sibling id='6' value='21'/>
            <sibling id='7' value='31'/>
          </distances>
        </cell>
        <cell id='5' cpus='5,13' memory='8388608' unit='KiB'>
          <distances>
            <sibling id='0' value='31'/>
            <sibling id='1' value='21'/>
            <sibling id='2' value='31'/>
            <sibling id='3' value='31'/>
            <sibling id='4' value='21'/>
            <sibling id='5' value='10'/>
            <sibling id='6' value='31'/>
            <sibling id='7' value='21'/>
          </distances>
        </cell>
        <cell id='6' cpus='6,14' memory='8388608' unit='KiB'>
          <distances>
            <sibling id='0' value='31'/>
            <sibling id='1' value='31'/>
            <sibling id='2' value='21'/>
            <sibling id='3' value='31'/>
            <sibling id='4' value='21'/>
            <sibling id='5' value='31'/>
            <sibling id='6' value='10'/>
            <sibling id='7' value='21'/>
          </distances>
        </cell>
        <cell id='7' cpus='7,15' memory='8388608' unit='KiB'>
          <distances>
            <sibling id='0' value='31'/>
            <sibling id='1' value='31'/>
            <sibling id='2' value='31'/>
            <sibling id='3' value='21'/>
            <sibling id='4' value='31'/>
            <sibling id='5' value='21'/>
            <sibling id='6' value='21'/>
            <sibling id='7' value='10'/>
          </distances>
        </cell>
      </numa>
    </cpu>
    <clock offset='utc'>
      <timer name='rtc' tickpolicy='catchup'/>
      <timer name='pit' tickpolicy='delay'/>
      <timer name='hpet' present='no'/>
    </clock>
    <on_poweroff>destroy</on_poweroff>
    <on_reboot>restart</on_reboot>
    <on_crash>destroy</on_crash>
    <pm>
      <suspend-to-mem enabled='no'/>
      <suspend-to-disk enabled='no'/>
    </pm>
    <devices>
      <emulator>/usr/bin/qemu-system-x86_64</emulator>
      <disk type='file' device='disk'>
        <driver name='qemu' type='qcow2'/>
        <source file='/var/lib/libvirt/images/anuma.qcow2'/>
        ...

Finally, the virtual vNUMA detail listed by 'lscpu' in the auto
partitioned guest 'anuma':
  [root@anuma ~]# lscpu
  Architecture:          x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Byte Order:            Little Endian
  CPU(s):                16
  On-line CPU(s) list:   0-15
  Thread(s) per core:    2
  Core(s) per socket:    1
  Socket(s):             8
  NUMA node(s):          8
  Vendor ID:             GenuineIntel
  CPU family:            6
  Model:                 62
  Model name:            Intel(R) Xeon(R) CPU E7-8895 v2 @ 2.80GHz
  Stepping:              7
  CPU MHz:               2793.268
  BogoMIPS:              5586.53
  Virtualization:        VT-x
  Hypervisor vendor:     KVM
  Virtualization type:   full
  L1d cache:             32K
  L1i cache:             32K
  L2 cache:              4096K
  L3 cache:              16384K
  NUMA node0 CPU(s):     0,8
  NUMA node1 CPU(s):     1,9
  NUMA node2 CPU(s):     2,10
  NUMA node3 CPU(s):     3,11
  NUMA node4 CPU(s):     4,12
  NUMA node5 CPU(s):     5,13
  NUMA node6 CPU(s):     6,14
  NUMA node7 CPU(s):     7,15
  Flags:                 ...

Wim ten Have (2):
  domain: auto partition guests providing the host NUMA topology
  qemuxml2argv: add tests that exercise vNUMA auto partition topology

 docs/formatdomain.html.in              |   7 +
 docs/schemas/cputypes.rng              |   1 +
 src/conf/cpu_conf.c                    |   3 +-
 src/conf/cpu_conf.h                    |   1 +
 src/conf/domain_conf.c                 | 166 ++++++++++++++++++
 .../cpu-host-passthrough-nonuma.args   |  25 +++
 .../cpu-host-passthrough-nonuma.xml    |  18 ++
 .../cpu-host-passthrough-numa.args     |  29 +++
 .../cpu-host-passthrough-numa.xml      |  18 ++
 tests/qemuxml2argvtest.c               |   2 +
 10 files changed, 269 insertions(+), 1 deletion(-)
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-nonuma.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-nonuma.xml
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa.xml

-- 
2.17.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
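[Editor's note: the `<vcpupin>` lines in the rewritten XML follow directly from the host node cpusets shown in the host 'lscpu' output in the cover letter. A short sketch of that mapping — the `vcpupin` helper is hypothetical, not code from the patch:]

```python
# Hypothetical sketch: derive each guest vcpu's pinning cpuset from the
# host NUMA node cpusets reported by 'lscpu' on the host.
host_nodes = [
    "0-14,120-134", "15-29,135-149", "30-44,150-164", "45-59,165-179",
    "60-74,180-194", "75-89,195-209", "90-104,210-224", "105-119,225-239",
]

def vcpupin(vcpu, nodes=host_nodes):
    # vcpu v sits in vNUMA cell v % len(nodes) and is pinned to that
    # host node's physical cpuset.
    return nodes[vcpu % len(nodes)]

print(vcpupin(0))   # 0-14,120-134
print(vcpupin(14))  # 90-104,210-224
```

With 16 vcpus over 8 nodes, vcpus 0 and 8 share node 0's cpuset, 1 and 9 share node 1's, and so on, matching the `<cputune>` block above.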
On Tue, Sep 25, 2018 at 12:02:40 +0200, Wim Ten Have wrote:
> From: Wim ten Have <wim.ten.have@oracle.com>
>
> This patch extends guest domain administration, adding support to
> automatically advertise the host NUMA node architecture inside a
> guest by creating a vNUMA copy of it.

I'm pretty sure someone would find this useful and such configuration is
perfectly valid in libvirt. But I don't think there is a compelling
reason to add some magic into the domain XML which would automatically
expand to such configuration. It's basically a NUMA placement policy and
libvirt generally tries to avoid including any kind of policies and
rather just provide all the mechanisms and knobs which can be used by
applications to implement any policy they like.

> The mechanism is enabled by setting the check='numa' attribute under
> the CPU 'host-passthrough' topology:
>
>     <cpu mode='host-passthrough' check='numa' .../>

Anyway, this is definitely not the right place for such an option. The
'check' attribute is described as

    "Since 3.2.0, an optional check attribute can be used to request a
    specific way of checking whether the virtual CPU matches the
    specification."

and the new 'numa' value does not fit in there in any way.

Moreover, the code does the automatic NUMA placement at the moment
libvirt parses the domain XML, which is not the right place since it
would break migration, snapshots, and save/restore features.

We have existing placement attributes for the vcpu and numatune/memory
elements which would have been a much better place for implementing such
a feature. And even the cpu/numa element could have been enhanced to
support similar configuration.

Jirka
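[Editor's note: for reference, the existing `placement` attributes Jirka mentions look like this in domain XML — a minimal illustrative fragment (the vcpu count is an example value); with placement='auto', libvirt asks numad for placement advice instead of expanding anything at parse time:]

    <vcpu placement='auto'>16</vcpu>
    <numatune>
      <memory mode='strict' placement='auto'/>
    </numatune>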
On Tue, 25 Sep 2018 14:37:15 +0200
Jiri Denemark <jdenemar@redhat.com> wrote:

> On Tue, Sep 25, 2018 at 12:02:40 +0200, Wim Ten Have wrote:
> > From: Wim ten Have <wim.ten.have@oracle.com>
> >
> > This patch extends guest domain administration, adding support to
> > automatically advertise the host NUMA node architecture inside a
> > guest by creating a vNUMA copy of it.
>
> I'm pretty sure someone would find this useful and such configuration
> is perfectly valid in libvirt. But I don't think there is a compelling
> reason to add some magic into the domain XML which would automatically
> expand to such configuration. It's basically a NUMA placement policy
> and libvirt generally tries to avoid including any kind of policies
> and rather just provide all the mechanisms and knobs which can be used
> by applications to implement any policy they like.
>
> > The mechanism is enabled by setting the check='numa' attribute under
> > the CPU 'host-passthrough' topology:
> >
> >     <cpu mode='host-passthrough' check='numa' .../>
>
> Anyway, this is definitely not the right place for such an option. The
> 'check' attribute is described as
>
>     "Since 3.2.0, an optional check attribute can be used to request
>     a specific way of checking whether the virtual CPU matches the
>     specification."
>
> and the new 'numa' value does not fit in there in any way.
>
> Moreover, the code does the automatic NUMA placement at the moment
> libvirt parses the domain XML, which is not the right place since it
> would break migration, snapshots, and save/restore features.

Howdy, thanks for your fast response. I was out of office for a while,
unable to reply earlier.

The beef of this code indeed does not belong under the domain code and
should rather move into the NUMA-specific code; check='numa' is simply
badly chosen.
Also, whilst OOO it occurred to me that besides auto partitioning the
host into a vNUMA replica, there is probably another configuration
target we may introduce: reserving a single NUMA node out of the nodes
reserved for a guest to configure. So my plan is to come back asap with
reworked code.

> We have existing placement attributes for the vcpu and numatune/memory
> elements which would have been a much better place for implementing
> such a feature. And even the cpu/numa element could have been enhanced
> to support similar configuration.

Going over the libvirt documentation I am more drawn to the vcpu area.
As said, let me rework and return with a better approach/RFC asap.

Rgds,
- Wim10H.

> Jirka