[libvirt] [PATCH v3 0/4] numa: describe sibling nodes distances
Posted by Wim Ten Have 6 years, 6 months ago
From: Wim ten Have <wim.ten.have@oracle.com>

This patch series extends guest domain administration, adding support to
advertise node sibling distances when configuring HVM NUMA guests.

NUMA (non-uniform memory access) is a method of configuring a cluster of
nodes within a single multiprocessing system such that each node has its
own processor-local memory yet can still share it with the other nodes,
improving performance and the system's ability to be expanded.

A NUMA system can be illustrated as shown below. Within this 4-node
system, every socket is equipped with its own distinct memory. The whole
typically resembles an SMP (symmetric multiprocessing) system: a
"tightly-coupled", "share everything" system in which multiple processors
work under a single operating system and can access each other's
memory over multiple "Bus Interconnect" paths.

        +-----+-----+-----+         +-----+-----+-----+
        |  M  | CPU | CPU |         | CPU | CPU |  M  |
        |  E  |     |     |         |     |     |  E  |
        |  M  +- Socket0 -+         +- Socket3 -+  M  |
        |  O  |     |     |         |     |     |  O  |
        |  R  | CPU | CPU <---------> CPU | CPU |  R  |
        |  Y  |     |     |         |     |     |  Y  |
        +-----+--^--+-----+         +-----+--^--+-----+
                 |                           |
                 |      Bus Interconnect     |
                 |                           |
        +-----+--v--+-----+         +-----+--v--+-----+
        |  M  |     |     |         |     |     |  M  |
        |  E  | CPU | CPU <---------> CPU | CPU |  E  |
        |  M  |     |     |         |     |     |  M  |
        |  O  +- Socket1 -+         +- Socket2 -+  O  |
        |  R  |     |     |         |     |     |  R  |
        |  Y  | CPU | CPU |         | CPU | CPU |  Y  |
        +-----+-----+-----+         +-----+-----+-----+

Contrast this with a flat SMP system (not illustrated), which has an
inherent limitation: as sockets are added, the bus (data and address
path) gets overloaded under high activity and easily becomes a
performance bottleneck. NUMA adds an intermediate level of memory,
shared amongst a few cores per socket as illustrated above, so that data
accesses do not have to travel over a single bus.

Unfortunately, the way NUMA does this adds its own limitation. As
visualized in the illustration above, this occurs when data stored in
memory associated with Socket2 is accessed by a CPU (core) in Socket0.
The processors use the "Bus Interconnect" to create gateways between the
sockets (nodes), enabling inter-socket access to memory. These "Bus
Interconnect" hops add data access delays when a CPU (core) accesses
memory associated with a remote socket (node).

For terminology we refer to sockets as "nodes", where access to each
other's distinct resources, such as memory, makes them "siblings" with a
designated "distance" between them. A specific design is described under
the ACPI (Advanced Configuration and Power Interface) specification,
within the chapter explaining the system's SLIT (System Locality Distance
Information Table).

These patches extend core libvirt's XML description of a virtual machine's
hardware to include NUMA distance information for sibling nodes, which
is then passed to Xen guests via libxl. QEMU recently landed support for
constructing the SLIT with commit 0f203430dd ("numa: Allow setting NUMA
distance for different NUMA nodes"), so these core libvirt extensions
can also help other drivers support this feature.
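
For reference only (this is QEMU's own interface, not something these
patches generate), node distances are set on the QEMU command line with
the '-numa dist' option introduced by that commit. A minimal two-node
fragment might look as follows, with "..." standing in for the rest of
the command line:

    # local distances (src == dst) default to 10
    qemu-system-x86_64 ... \
        -numa node,nodeid=0 -numa node,nodeid=1 \
        -numa dist,src=0,dst=1,val=21 \
        -numa dist,src=1,dst=0,val=21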

The XML changes allow describing, per <cell> (node/socket), the <distances>
to its <sibling> node identifiers; these are propagated through the NUMA
domain configuration code, finally adding support to libxl.
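
As a rough sketch of the grammar this introduces (the define name below
is hypothetical and the real schema change in docs/schemas/cputypes.rng
may be structured differently), the new element boils down to:

    <!-- hypothetical RelaxNG fragment; names are illustrative only -->
    <define name="numaDistances">
      <element name="distances">
        <oneOrMore>
          <element name="sibling">
            <attribute name="id">
              <ref name="unsignedInt"/>
            </attribute>
            <attribute name="value">
              <ref name="unsignedInt"/>
            </attribute>
          </element>
        </oneOrMore>
      </element>
    </define>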

[below is an example illustrating a 4 node/socket <cell> setup]

    <cpu>
      <numa>
        <cell id='0' cpus='0,4-7' memory='2097152' unit='KiB'>
          <distances>
            <sibling id='0' value='10'/>
            <sibling id='1' value='21'/>
            <sibling id='2' value='31'/>
            <sibling id='3' value='41'/>
          </distances>
        </cell>
        <cell id='1' cpus='1,8-10,12-15' memory='2097152' unit='KiB'>
          <distances>
            <sibling id='0' value='21'/>
            <sibling id='1' value='10'/>
            <sibling id='2' value='21'/>
            <sibling id='3' value='31'/>
          </distances>
        </cell>
        <cell id='2' cpus='2,11' memory='2097152' unit='KiB'>
          <distances>
            <sibling id='0' value='31'/>
            <sibling id='1' value='21'/>
            <sibling id='2' value='10'/>
            <sibling id='3' value='21'/>
          </distances>
        </cell>
        <cell id='3' cpus='3' memory='2097152' unit='KiB'>
          <distances>
            <sibling id='0' value='41'/>
            <sibling id='1' value='31'/>
            <sibling id='2' value='21'/>
            <sibling id='3' value='10'/>
          </distances>
        </cell>
      </numa>
    </cpu>

If no <distances> are given to describe the SLIT data between different
<cell>s, this patch makes libxl default to a scheme using 10 for the
local node and 21 for any remote node/socket, which is what a guest OS
assumes when no SLIT is specified. While the SLIT is optional, libxl
requires that distances are set nonetheless.
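
For illustration, the XML example above would correspond to an xl domain
configuration roughly along the lines below (a sketch assuming the
xl.cfg vnuma list syntax; the physical node placement, sizes and exact
strings emitted by the xenconfig conversion are illustrative only):

    # hypothetical xl.cfg fragment for the 4-node example above
    vnuma = [ ["pnode=0","size=2048","vcpus=0,4-7","vdistances=10,21,31,41"],
              ["pnode=1","size=2048","vcpus=1,8-10,12-15","vdistances=21,10,21,31"],
              ["pnode=2","size=2048","vcpus=2,11","vdistances=31,21,10,21"],
              ["pnode=3","size=2048","vcpus=3","vdistances=41,31,21,10"] ]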

On Linux systems the SLIT detail can be listed with the help of the
'numactl -H' command. An HVM guest as described above would report the
output below.

    [root@f25 ~]# numactl -H
    available: 4 nodes (0-3)
    node 0 cpus: 0 4 5 6 7
    node 0 size: 1988 MB
    node 0 free: 1743 MB
    node 1 cpus: 1 8 9 10 12 13 14 15
    node 1 size: 1946 MB
    node 1 free: 1885 MB
    node 2 cpus: 2 11
    node 2 size: 2011 MB
    node 2 free: 1912 MB
    node 3 cpus: 3
    node 3 size: 2010 MB
    node 3 free: 1980 MB
    node distances:
    node   0   1   2   3
      0:  10  21  31  41
      1:  21  10  21  31
      2:  31  21  10  21
      3:  41  31  21  10

Wim ten Have (4):
  numa: describe siblings distances within cells
  libxl: vnuma support
  xenconfig: add domxml conversions for xen-xl
  xlconfigtest: add tests for numa cell sibling distances

 docs/formatdomain.html.in                          |  70 ++++-
 docs/schemas/basictypes.rng                        |   9 +
 docs/schemas/cputypes.rng                          |  18 ++
 src/conf/cpu_conf.c                                |   2 +-
 src/conf/numa_conf.c                               | 323 +++++++++++++++++++-
 src/conf/numa_conf.h                               |  25 +-
 src/libvirt_private.syms                           |   6 +
 src/libxl/libxl_conf.c                             | 120 ++++++++
 src/libxl/libxl_driver.c                           |   3 +-
 src/xenconfig/xen_xl.c                             | 333 +++++++++++++++++++++
 .../test-fullvirt-vnuma-nodistances.cfg            |  26 ++
 .../test-fullvirt-vnuma-nodistances.xml            |  53 ++++
 tests/xlconfigdata/test-fullvirt-vnuma.cfg         |  26 ++
 tests/xlconfigdata/test-fullvirt-vnuma.xml         |  81 +++++
 tests/xlconfigtest.c                               |   4 +
 15 files changed, 1089 insertions(+), 10 deletions(-)
 create mode 100644 tests/xlconfigdata/test-fullvirt-vnuma-nodistances.cfg
 create mode 100644 tests/xlconfigdata/test-fullvirt-vnuma-nodistances.xml
 create mode 100644 tests/xlconfigdata/test-fullvirt-vnuma.cfg
 create mode 100644 tests/xlconfigdata/test-fullvirt-vnuma.xml

-- 
2.9.5

Re: [libvirt] [PATCH v3 0/4] numa: describe sibling nodes distances
Posted by Jim Fehlig 6 years, 6 months ago
Hi Wim,

I'll be away for a few weeks and won't be able to review this in detail until 
later in the month. I see Martin provided some feedback on patch 1, which is 
awesome since I'd prefer broader agreement on that patch rather than just my 
single 'ack'.

BTW, the new code in patch2 can also be tested now that we have domXML <-> 
libxl_domain_config conversion tests :-). See tests/libxlxml2domconfigtest.c

Regards,
Jim
