[PATCH 1/3] qemu: Don't try to 'fix up' cpuset.mems after QEMU's memory allocation

Michal Privoznik posted 3 patches 1 year, 5 months ago
Posted by Michal Privoznik 1 year, 5 months ago
In an ideal world, my plan was perfect. We allow the union of all
host nodes in cpuset.mems and, once QEMU has allocated its memory,
we 'fix up' the restriction of its emulator thread by writing the
original value we wanted to set all along. But in fact, we can't
do that because it triggers memory movement. For instance,
consider the following <numatune/>:

  <numatune>
    <memory mode="strict" nodeset="0"/>
    <memnode cellid="1" mode="strict" nodeset="1"/>
  </numatune>

  <numa>
    <cell id="0" cpus="0-1" memory="1024000" unit="KiB" />
    <cell id="1" cpus="2-3" memory="1048576" unit="KiB"/>
  </numa>

This is meant to create a 1:1 mapping between guest and host NUMA
nodes. So we start QEMU with cpuset.mems set to "0-1" (so that it
can allocate memory even for guest node #1 and have the memory
come from host node #1) and then set cpuset.mems to "0" (because
that's where we wanted the emulator thread to live).
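
The "0-1" value above is simply the union of every nodeset named in
<numatune/>. A minimal sketch of that computation follows, assuming
nodesets are represented as plain bitmasks. The names nodemask,
nodeset_union() and nodeset_format() are hypothetical and exist only
for illustration; libvirt itself uses its own bitmap type and helpers,
which are not reproduced here.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical representation (not libvirt's): bit N set means
 * host NUMA node N is allowed. */
typedef unsigned long nodemask;

/* Union of the default <memory nodeset=.../> with every
 * <memnode nodeset=.../> from the domain XML. */
static nodemask
nodeset_union(nodemask dflt, const nodemask *memnodes, size_t n)
{
    nodemask all = dflt;
    size_t i;

    for (i = 0; i < n; i++)
        all |= memnodes[i];
    return all;
}

/* Format a mask as a range list such as "0-1,4", the syntax used
 * for cpuset.mems. Returns 0 on success, -1 if buf is too small. */
static int
nodeset_format(nodemask mask, char *buf, size_t buflen)
{
    const int nbits = (int)(sizeof(mask) * CHAR_BIT);
    size_t used = 0;
    int bit = 0;

    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';

    while (bit < nbits) {
        int start, end, rc;

        if (!(mask & (1UL << bit))) {
            bit++;
            continue;
        }

        /* Extend the run while the next bit is also set. */
        start = bit;
        while (bit + 1 < nbits && (mask & (1UL << (bit + 1))))
            bit++;
        end = bit++;

        if (start == end)
            rc = snprintf(buf + used, buflen - used, "%s%d",
                          used ? "," : "", start);
        else
            rc = snprintf(buf + used, buflen - used, "%s%d-%d",
                          used ? "," : "", start, end);
        if (rc < 0 || (size_t)rc >= buflen - used)
            return -1;
        used += (size_t)rc;
    }
    return 0;
}
```

For the XML above, the default nodeset is node 0 and the single
<memnode/> names node 1, so nodeset_union(1UL << 0, ...) with one
entry of 1UL << 1 yields the mask 0x3, which nodeset_format() renders
as "0-1".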

But this in turn triggers movement of all memory (even memory
that is already allocated) to host NUMA node #0. Therefore, we
have to keep cpuset.mems untouched and rely on .host-nodes passed
on the QEMU command line.

The placement still suffers because of cpuset.mems set for vcpus
or iothreads, but that's fixed in the next commit.

Fixes: 3ec6d586bc3ec7a8cf406b1b6363e87d50aa159c
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/qemu/qemu_hotplug.c |  7 +++----
 src/qemu/qemu_process.c | 15 ++++-----------
 src/qemu/qemu_process.h |  3 +--
 3 files changed, 8 insertions(+), 17 deletions(-)

diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index 806aecb29d..ba9e44945b 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -2295,7 +2295,7 @@ qemuDomainAttachMemory(virQEMUDriver *driver,
     if (qemuDomainAdjustMaxMemLock(vm) < 0)
         goto removedef;
 
-    if (qemuProcessSetupEmulator(vm, true) < 0)
+    if (qemuProcessSetupEmulator(vm) < 0)
         goto removedef;
     restoreemulatorcgroup = true;
 
@@ -2336,11 +2336,10 @@ qemuDomainAttachMemory(virQEMUDriver *driver,
             VIR_WARN("Unable to remove memory device from /dev");
         if (releaseaddr)
             qemuDomainReleaseMemoryDeviceSlot(vm, mem);
+        if (restoreemulatorcgroup)
+            qemuProcessSetupEmulator(vm);
     }
 
-    if (restoreemulatorcgroup)
-        qemuProcessSetupEmulator(vm, false);
-
     virDomainMemoryDefFree(mem);
     return ret;
 
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index a4a4a17a9b..1d3cdeff9a 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -2596,9 +2596,7 @@ qemuProcessSetupPid(virDomainObj *vm,
              mem_mode == VIR_DOMAIN_NUMATUNE_MEM_RESTRICTIVE)) {
 
             /* QEMU allocates its memory from the emulator thread. Thus it
-             * needs to access union of all host nodes configured. This is
-             * going to be replaced with proper value later in the startup
-             * process. */
+             * needs to access union of all host nodes configured. */
             if (unionMems &&
                 nameval == VIR_CGROUP_THREAD_EMULATOR &&
                 mem_mode != VIR_DOMAIN_NUMATUNE_MEM_RESTRICTIVE) {
@@ -2702,15 +2700,14 @@ qemuProcessSetupPid(virDomainObj *vm,
 
 
 int
-qemuProcessSetupEmulator(virDomainObj *vm,
-                         bool unionMems)
+qemuProcessSetupEmulator(virDomainObj *vm)
 {
     return qemuProcessSetupPid(vm, vm->pid, VIR_CGROUP_THREAD_EMULATOR,
                                0, vm->def->cputune.emulatorpin,
                                vm->def->cputune.emulator_period,
                                vm->def->cputune.emulator_quota,
                                vm->def->cputune.emulatorsched,
-                               unionMems);
+                               true);
 }
 
 
@@ -7764,7 +7761,7 @@ qemuProcessLaunch(virConnectPtr conn,
         goto cleanup;
 
     VIR_DEBUG("Setting emulator tuning/settings");
-    if (qemuProcessSetupEmulator(vm, true) < 0)
+    if (qemuProcessSetupEmulator(vm) < 0)
         goto cleanup;
 
     VIR_DEBUG("Setting cgroup for external devices (if required)");
@@ -7827,10 +7824,6 @@ qemuProcessLaunch(virConnectPtr conn,
     if (qemuConnectAgent(driver, vm) < 0)
         goto cleanup;
 
-    VIR_DEBUG("Fixing up emulator tuning/settings");
-    if (qemuProcessSetupEmulator(vm, false) < 0)
-        goto cleanup;
-
     VIR_DEBUG("setting up hotpluggable cpus");
     if (qemuDomainHasHotpluggableStartupVcpus(vm->def)) {
         if (qemuDomainRefreshVcpuInfo(vm, asyncJob, false) < 0)
diff --git a/src/qemu/qemu_process.h b/src/qemu/qemu_process.h
index 1c4c0678ab..cae1b49756 100644
--- a/src/qemu/qemu_process.h
+++ b/src/qemu/qemu_process.h
@@ -236,5 +236,4 @@ void qemuProcessCleanupMigrationJob(virQEMUDriver *driver,
 void qemuProcessRefreshDiskProps(virDomainDiskDef *disk,
                                  struct qemuDomainDiskInfo *info);
 
-int qemuProcessSetupEmulator(virDomainObj *vm,
-                             bool unionMems);
+int qemuProcessSetupEmulator(virDomainObj *vm);
-- 
2.39.3
Re: [PATCH 1/3] qemu: Don't try to 'fix up' cpuset.mems after QEMU's memory allocation
Posted by Martin Kletzander 1 year, 5 months ago
On Wed, Jun 07, 2023 at 04:40:59PM +0200, Michal Privoznik wrote:
>In ideal world, my plan was perfect. We allow union of all host
>nodes in cpuset.mems and once QEMU has allocated its memory, we
>'fix up' restriction of its emulator thread by writing the
>original value we wanted to set all along. But in fact, we can't
>do it because that triggers memory movement. For instance,
>consider the following <numatune/>:
>
>  <numatune>
>    <memory mode="strict" nodeset="0"/>
>    <memnode cellid="1" mode="strict" nodeset="1"/>
>  </numatune>
>
>  <numa>
>    <cell id="0" cpus="0-1" memory="1024000" unit="KiB" />
>    <cell id="1" cpus="2-3" memory="1048576" unit="KiB"/>
>  </numa>
>
>This is meant to create 1:1 mapping between guest and host NUMA
>nodes. So we start QEMU with cpuset.mems set to "0-1" (so that it
>can allocate memory even for guest node #1 and have the memory
>come from host node #1) and then, set cpuset.mems to "0" (because
>that's where we wanted emulator thread to live).
>
>But this in turn triggers movement of all memory (even the
>allocated one) to host NUMA node #0. Therefore, we have to just
>keep cpuset.mems untouched and rely on .host-nodes passed on the
>QEMU cmd line.
>
>The placement still suffers because of cpuset.mems set for vcpus
>or iothreads, but that's fixed in next commit.
>
>Fixes: 3ec6d586bc3ec7a8cf406b1b6363e87d50aa159c
>Signed-off-by: Michal Privoznik <mprivozn@redhat.com>

Reviewed-by: Martin Kletzander <mkletzan@redhat.com>