[libvirt] [PATCH] conf: fix populating of fake NUMA in multi-node hosts
Posted by Daniel P. Berrangé 4 years, 4 months ago
If the host OS doesn't have NUMA present, we fall back to
populating fake NUMA info, and the code thus assumes only a
single NUMA node.
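
For context, a minimal sketch of how the fake path gets selected.
virNumaIsAvailable() is a real libvirt helper and
virCapabilitiesHostNUMAInitFake() is named in the diff below, but
the dispatcher shape and the virCapabilitiesHostNUMAInitReal()
counterpart are assumptions; the actual code in
src/conf/capabilities.c differs in detail:

    /* Illustrative sketch: the fake path is taken whenever real NUMA
     * information is unavailable.  virNumaIsAvailable() reports false
     * both when the kernel has no NUMA support and when libvirt was
     * built without numactl-devel. */
    if (virNumaIsAvailable())
        return virCapabilitiesHostNUMAInitReal(caps);
    return virCapabilitiesHostNUMAInitFake(caps);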

Unfortunately we also fall back to fake NUMA if numactl-devel
was not present at build time, and in that case the host can
still have multiple NUMA nodes. We then allocate entries for
all CPUs, but only the CPUs in the first node have any data
filled in, resulting in capabilities like:

    <topology>
      <cells num='1'>
        <cell id='0'>
          <memory unit='KiB'>15977572</memory>
          <cpus num='48'>
            <cpu id='0' socket_id='0' core_id='0' siblings='0'/>
            <cpu id='1' socket_id='0' core_id='0' siblings='1'/>
            <cpu id='2' socket_id='0' core_id='1' siblings='2'/>
            <cpu id='3' socket_id='0' core_id='1' siblings='3'/>
            <cpu id='4' socket_id='0' core_id='2' siblings='4'/>
            <cpu id='5' socket_id='0' core_id='2' siblings='5'/>
            <cpu id='6' socket_id='0' core_id='3' siblings='6'/>
            <cpu id='7' socket_id='0' core_id='3' siblings='7'/>
            <cpu id='8' socket_id='0' core_id='4' siblings='8'/>
            <cpu id='9' socket_id='0' core_id='4' siblings='9'/>
            <cpu id='10' socket_id='0' core_id='5' siblings='10'/>
            <cpu id='11' socket_id='0' core_id='5' siblings='11'/>
            <cpu id='0'/>
            <cpu id='0'/>
            <cpu id='0'/>
            <cpu id='0'/>
            <cpu id='0'/>
            <cpu id='0'/>
            <cpu id='0'/>
            <cpu id='0'/>
            <cpu id='0'/>
            <cpu id='0'/>
            <cpu id='0'/>
          </cpus>
        </cell>
      </cells>
    </topology>
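
The arithmetic behind those bare <cpu id='0'/> entries: the cpus
array is sized for every possible CPU across all nodes, while the
old loop only walked one node's worth of sockets/cores/threads.
A self-contained sketch of the mismatch, using the values from the
example host above (the multiplication matches the
VIR_NODEINFO_MAXCPUS semantics in libvirt-host.h):

    #include <stdio.h>

    int main(void)
    {
        /* Topology of the example host: 4 nodes, 1 socket per node,
         * 6 cores per socket, 2 threads per core. */
        int nodes = 4, sockets = 1, cores = 6, threads = 2;

        /* VIR_NODEINFO_MAXCPUS multiplies in the node count, so the
         * cpus array gets 48 zero-initialised slots. */
        int maxcpus = nodes * sockets * cores * threads;

        /* The old loop iterated sockets/cores/threads only, filling
         * just the first 12 slots; the remaining 36 stayed zeroed
         * and serialised as <cpu id='0'/>. */
        int filled = sockets * cores * threads;

        printf("allocated %d, filled %d, empty %d\n",
               maxcpus, filled, maxcpus - filled);
        return 0;
    }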

With this new code we get something slightly less broken:

    <topology>
      <cells num='4'>
        <cell id='0'>
          <memory unit='KiB'>15977572</memory>
          <cpus num='12'>
            <cpu id='0' socket_id='0' core_id='0' siblings='0-1'/>
            <cpu id='1' socket_id='0' core_id='0' siblings='0-1'/>
            <cpu id='2' socket_id='0' core_id='1' siblings='2-3'/>
            <cpu id='3' socket_id='0' core_id='1' siblings='2-3'/>
            <cpu id='4' socket_id='0' core_id='2' siblings='4-5'/>
            <cpu id='5' socket_id='0' core_id='2' siblings='4-5'/>
            <cpu id='6' socket_id='0' core_id='3' siblings='6-7'/>
            <cpu id='7' socket_id='0' core_id='3' siblings='6-7'/>
            <cpu id='8' socket_id='0' core_id='4' siblings='8-9'/>
            <cpu id='9' socket_id='0' core_id='4' siblings='8-9'/>
            <cpu id='10' socket_id='0' core_id='5' siblings='10-11'/>
            <cpu id='11' socket_id='0' core_id='5' siblings='10-11'/>
          </cpus>
        </cell>
        <cell id='0'>
          <memory unit='KiB'>15977572</memory>
          <cpus num='12'>
            <cpu id='12' socket_id='0' core_id='0' siblings='12-13'/>
            <cpu id='13' socket_id='0' core_id='0' siblings='12-13'/>
            <cpu id='14' socket_id='0' core_id='1' siblings='14-15'/>
            <cpu id='15' socket_id='0' core_id='1' siblings='14-15'/>
            <cpu id='16' socket_id='0' core_id='2' siblings='16-17'/>
            <cpu id='17' socket_id='0' core_id='2' siblings='16-17'/>
            <cpu id='18' socket_id='0' core_id='3' siblings='18-19'/>
            <cpu id='19' socket_id='0' core_id='3' siblings='18-19'/>
            <cpu id='20' socket_id='0' core_id='4' siblings='20-21'/>
            <cpu id='21' socket_id='0' core_id='4' siblings='20-21'/>
            <cpu id='22' socket_id='0' core_id='5' siblings='22-23'/>
            <cpu id='23' socket_id='0' core_id='5' siblings='22-23'/>
          </cpus>
        </cell>
      </cells>
    </topology>

The topology at least now reflects what 'virsh nodeinfo' reports.
The main remaining flaw is that the CPU "id" values won't match
the IDs the Linux host actually uses.
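
To make the id mismatch concrete: the fake path hands out ids
densely in node/socket/core/thread order, which the closed-form
expression below reproduces (a sketch of the patched loop's
enumeration, not code from the patch; fake_id() is a made-up
helper):

    #include <stdio.h>

    /* Dense enumeration equivalent to the patched loop's id++. */
    static int fake_id(int n, int s, int c, int t,
                       int sockets, int cores, int threads)
    {
        return ((n * sockets + s) * cores + c) * threads + t;
    }

    int main(void)
    {
        /* First CPU of node 1 on the example host (1 socket,
         * 6 cores, 2 threads) gets fake id 12, matching cell 2 in
         * the XML above, but the kernel may number that CPU
         * differently, e.g. when thread siblings are interleaved
         * across the CPU range. */
        printf("%d\n", fake_id(1, 0, 0, 0, 1, 6, 2));
        return 0;
    }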

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 src/conf/capabilities.c | 67 ++++++++++++++++++++++-------------------
 1 file changed, 36 insertions(+), 31 deletions(-)

diff --git a/src/conf/capabilities.c b/src/conf/capabilities.c
index 81a2004dba..2a183d7070 100644
--- a/src/conf/capabilities.c
+++ b/src/conf/capabilities.c
@@ -1593,7 +1593,7 @@ virCapabilitiesHostNUMAInitFake(virCapsHostNUMAPtr caps)
     virNodeInfo nodeinfo;
     virCapsHostNUMACellCPUPtr cpus;
     int ncpus;
-    int s, c, t;
+    int n, s, c, t;
     int id, cid;
     int onlinecpus G_GNUC_UNUSED;
     bool tmp;
@@ -1602,47 +1602,52 @@ virCapabilitiesHostNUMAInitFake(virCapsHostNUMAPtr caps)
         return -1;
 
     ncpus = VIR_NODEINFO_MAXCPUS(nodeinfo);
-    onlinecpus = nodeinfo.cpus;
 
-    if (VIR_ALLOC_N(cpus, ncpus) < 0)
-        return -1;
 
-    id = cid = 0;
-    for (s = 0; s < nodeinfo.sockets; s++) {
-        for (c = 0; c < nodeinfo.cores; c++) {
-            for (t = 0; t < nodeinfo.threads; t++) {
-                if (virHostCPUGetOnline(id, &tmp) < 0)
-                    goto error;
-                if (tmp) {
-                    cpus[cid].id = id;
-                    cpus[cid].socket_id = s;
-                    cpus[cid].core_id = c;
-                    if (!(cpus[cid].siblings = virBitmapNew(ncpus)))
+    id = 0;
+    for (n = 0; n < nodeinfo.nodes; n++) {
+        int nodecpus = nodeinfo.sockets * nodeinfo.cores * nodeinfo.threads;
+        cid = 0;
+
+        if (VIR_ALLOC_N(cpus, nodecpus) < 0)
+            return -1;
+
+        for (s = 0; s < nodeinfo.sockets; s++) {
+            for (c = 0; c < nodeinfo.cores; c++) {
+                g_autoptr(virBitmap) siblings = virBitmapNew(ncpus);
+                for (t = 0; t < nodeinfo.threads; t++)
+                    ignore_value(virBitmapSetBit(siblings, id + t));
+
+                for (t = 0; t < nodeinfo.threads; t++) {
+                    if (virHostCPUGetOnline(id, &tmp) < 0)
                         goto error;
-                    ignore_value(virBitmapSetBit(cpus[cid].siblings, id));
-                    cid++;
+                    if (tmp) {
+                        cpus[cid].id = id;
+                        cpus[cid].socket_id = s;
+                        cpus[cid].core_id = c;
+                        if (!(cpus[cid].siblings = virBitmapNew(ncpus)))
+                            goto error;
+                        virBitmapCopy(cpus[cid].siblings, siblings);
+                        cid++;
+                    }
+
+                    id++;
                 }
-
-                id++;
             }
         }
-    }
 
-    virCapabilitiesHostNUMAAddCell(caps, 0,
-                                   nodeinfo.memory,
-#ifdef __linux__
-                                   onlinecpus, cpus,
-#else
-                                   ncpus, cpus,
-#endif
-                                   0, NULL,
-                                   0, NULL);
+        virCapabilitiesHostNUMAAddCell(caps, 0,
+                                       nodeinfo.memory,
+                                       cid, cpus,
+                                       0, NULL,
+                                       0, NULL);
+    }
 
     return 0;
 
  error:
-    for (; id >= 0; id--)
-        virBitmapFree(cpus[id].siblings);
+    for (; cid >= 0; cid--)
+        virBitmapFree(cpus[cid].siblings);
     VIR_FREE(cpus);
     return -1;
 }
-- 
2.23.0

Re: [libvirt] [PATCH] conf: fix populating of fake NUMA in multi-node hosts
Posted by Michal Prívozník 4 years, 4 months ago
On 12/18/19 12:58 PM, Daniel P. Berrangé wrote:
> If the host OS doesn't have NUMA present, we fall back to
> populating fake NUMA info, and the code thus assumes only a
> single NUMA node.
> 
> [...]
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>  src/conf/capabilities.c | 67 ++++++++++++++++++++++-------------------
>  1 file changed, 36 insertions(+), 31 deletions(-)

Reviewed-by: Michal Privoznik <mprivozn@redhat.com>

Michal
