[libvirt][PATCH v5 0/3] introduce 'restrictive' mode in numatune

Luyao Zhong posted 3 patches 3 years ago
Test syntax-check failed
Patches applied successfully
git fetch https://github.com/patchew-project/libvirt tags/patchew/20210413063808.333318-1-luyao.zhong@intel.com
[libvirt][PATCH v5 0/3] introduce 'restrictive' mode in numatune
Posted by Luyao Zhong 3 years ago
Before this patch set, numatune only has three memory modes:
strict, interleave and preferred. These memory policies are
ultimately set by the mbind() system call.

A memory policy could be 'hard coded' into the kernel, but none of
the above policies fits our requirement in this case. mbind() supports
the default memory policy, but it requires a NULL nodemask. So
restricting the allowed memory nodes has to be done by cgroups in this case.
So we introduce a new option for mode in numatune named 'restrictive'.

<numatune>
   <memory mode="restrictive" nodeset="1-4,^3"/>
   <memnode cellid="0" mode="restrictive" nodeset="1"/>
   <memnode cellid="2" mode="restrictive" nodeset="2"/>
</numatune>

The config above means we only use cgroups to restrict the allowed
memory nodes and do not set any specific memory policy explicitly.
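
To make the nodeset notation in the example concrete, here is a minimal
sketch of how a string such as "1-4,^3" expands into node IDs. The
function name parse_nodeset is hypothetical and this is not libvirt's
parser; it assumes items are applied left to right and that "^" excludes
a single node only.

```python
from typing import List, Set


def parse_nodeset(spec: str) -> List[int]:
    """Expand a nodeset string such as "1-4,^3" into sorted node IDs.

    Assumption: items are applied left to right, and "^N" removes the
    single node N from what has been collected so far.
    """
    nodes: Set[int] = set()
    for item in spec.split(","):
        item = item.strip()
        if not item:
            raise ValueError(f"empty element in nodeset {spec!r}")
        if item.startswith("^"):
            # Exclusion: drop one node from the set built so far.
            nodes.discard(int(item[1:]))
        elif "-" in item:
            # Inclusive range "low-high".
            low, high = (int(part) for part in item.split("-", 1))
            if low > high:
                raise ValueError(f"invalid range {item!r}")
            nodes.update(range(low, high + 1))
        else:
            nodes.add(int(item))
    return sorted(nodes)


print(parse_nodeset("1-4,^3"))
```

Under these assumptions the <memory> element above allows nodes 1, 2 and
4, and each <memnode> nodeset selects a single node from that set.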

For this new "restrictive" mode, there is a concrete use case: a new
kernel feature, not merged yet, that we call memory tiering
(https://lwn.net/Articles/802544/).
If memory tiering is enabled on the host, DRAM is top tier memory and
PMEM (persistent memory) is second tier memory; PMEM shows up as a NUMA node
without CPUs. Pages can be migrated between the DRAM node and the PMEM node based on
DRAM pressure and how cold/hot they are. *This memory policy* is implemented
in the kernel. So we need a default mode here, but from libvirt's perspective
the "default" mode is "strict", not the MPOL_DEFAULT
(https://man7.org/linux/man-pages/man2/mbind.2.html) defined in the kernel.
And to make memory tiering work well, the cgroups setting is necessary, since
it ensures that pages can only be migrated between the DRAM and PMEM
nodes that we specified (NUMA affinity support).

Just using cgroups with multiple nodes in the nodeset lets the kernel decide
which node (out of those in the restricted set) to allocate on, whereas specifying
"strict" basically allocates sequentially (on the first node until it is full,
then on the next one and so on).

In short, if a user requires the default mode (MPOL_DEFAULT), meaning they want
the kernel to decide the memory allocation while cgroups restrict the memory
nodes, the "restrictive" mode will be useful.
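
The difference between the four modes can be summarised with a small
sketch. The MPOL_* values are the ones from <linux/mempolicy.h>; the
mapping from numatune mode to mbind() policy and the helper name
placement_plan are assumptions made for illustration, not code from
this series.

```python
from typing import Dict, Optional

# Policy constants as defined in <linux/mempolicy.h>.
MPOL_DEFAULT = 0
MPOL_PREFERRED = 1
MPOL_BIND = 2
MPOL_INTERLEAVE = 3

# Assumed mapping from numatune mode to the policy passed to mbind().
# "restrictive" maps to None: no policy is set explicitly, so the
# kernel's default policy stays in effect.
MBIND_POLICY: Dict[str, Optional[int]] = {
    "strict": MPOL_BIND,
    "interleave": MPOL_INTERLEAVE,
    "preferred": MPOL_PREFERRED,
    "restrictive": None,
}


def placement_plan(mode: str, nodeset: str) -> Dict[str, object]:
    """Describe which mechanism would carry the nodeset for a given mode.

    Hypothetical helper: it only models the distinction described in
    this cover letter and ignores any cgroup settings that may accompany
    the other modes.
    """
    if mode not in MBIND_POLICY:
        raise ValueError(f"unknown numatune mode {mode!r}")
    policy = MBIND_POLICY[mode]
    if policy is None:
        # restrictive: the nodeset is enforced through cgroups only.
        return {"mbind_policy": None, "cpuset_mems": nodeset}
    # Other modes: the nodeset becomes the nodemask of an explicit policy.
    return {"mbind_policy": policy, "nodemask": nodeset}


print(placement_plan("restrictive", "1-4,^3"))
print(placement_plan("strict", "1-4,^3"))
```

In this model "restrictive" is the only mode that leaves the policy
choice to the kernel while still limiting the nodes through cgroups.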

BR,
Luyao

Luyao Zhong (3):
  docs: add docs for 'restrictive' option for mode in numatune
  schema: add 'restrictive' config option for mode in numatune
  qemu: add parser and formatter for 'restrictive' mode in numatune

 docs/formatdomain.rst                         |  7 +++-
 docs/schemas/domaincommon.rng                 |  2 +
 include/libvirt/libvirt-domain.h              |  1 +
 src/conf/numa_conf.c                          |  9 ++++
 src/qemu/qemu_command.c                       |  6 ++-
 src/qemu/qemu_process.c                       | 27 ++++++++++++
 src/util/virnuma.c                            |  3 ++
 .../numatune-memnode-invalid-mode.err         |  1 +
 .../numatune-memnode-invalid-mode.xml         | 33 +++++++++++++++
 ...emnode-restrictive-mode.x86_64-latest.args | 38 +++++++++++++++++
 .../numatune-memnode-restrictive-mode.xml     | 33 +++++++++++++++
 tests/qemuxml2argvtest.c                      |  2 +
 ...memnode-restrictive-mode.x86_64-latest.xml | 41 +++++++++++++++++++
 tests/qemuxml2xmltest.c                       |  1 +
 14 files changed, 201 insertions(+), 3 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/numatune-memnode-invalid-mode.err
 create mode 100644 tests/qemuxml2argvdata/numatune-memnode-invalid-mode.xml
 create mode 100644 tests/qemuxml2argvdata/numatune-memnode-restrictive-mode.x86_64-latest.args
 create mode 100644 tests/qemuxml2argvdata/numatune-memnode-restrictive-mode.xml
 create mode 100644 tests/qemuxml2xmloutdata/numatune-memnode-restrictive-mode.x86_64-latest.xml

-- 
2.25.4

Re: [libvirt][PATCH v5 0/3] introduce 'restrictive' mode in numatune
Posted by Martin Kletzander 3 years ago
On Tue, Apr 13, 2021 at 02:38:05PM +0800, Luyao Zhong wrote:
>Before this patch set, numatune only has three memory modes:
>strict, interleave and preferred. These memory policies are
>ultimately set by the mbind() system call.
>
>A memory policy could be 'hard coded' into the kernel, but none of
>the above policies fits our requirement in this case. mbind() supports
>the default memory policy, but it requires a NULL nodemask. So
>restricting the allowed memory nodes has to be done by cgroups in this case.
>So we introduce a new option for mode in numatune named 'restrictive'.
>
><numatune>
>   <memory mode="restrictive" nodeset="1-4,^3"/>
>   <memnode cellid="0" mode="restrictive" nodeset="1"/>
>   <memnode cellid="2" mode="restrictive" nodeset="2"/>
></numatune>
>
>The config above means we only use cgroups to restrict the allowed
>memory nodes and do not set any specific memory policy explicitly.
>
>For this new "restrictive" mode, there is a concrete use case: a new
>kernel feature, not merged yet, that we call memory tiering
>(https://lwn.net/Articles/802544/).
>If memory tiering is enabled on the host, DRAM is top tier memory and
>PMEM (persistent memory) is second tier memory; PMEM shows up as a NUMA node
>without CPUs. Pages can be migrated between the DRAM node and the PMEM node based on
>DRAM pressure and how cold/hot they are. *This memory policy* is implemented
>in the kernel. So we need a default mode here, but from libvirt's perspective
>the "default" mode is "strict", not the MPOL_DEFAULT
>(https://man7.org/linux/man-pages/man2/mbind.2.html) defined in the kernel.
>And to make memory tiering work well, the cgroups setting is necessary, since
>it ensures that pages can only be migrated between the DRAM and PMEM
>nodes that we specified (NUMA affinity support).
>
>Just using cgroups with multiple nodes in the nodeset lets the kernel decide
>which node (out of those in the restricted set) to allocate on, whereas specifying
>"strict" basically allocates sequentially (on the first node until it is full,
>then on the next one and so on).
>
>In short, if a user requires the default mode (MPOL_DEFAULT), meaning they want
>the kernel to decide the memory allocation while cgroups restrict the memory
>nodes, the "restrictive" mode will be useful.
>

I applied the changes locally and adapted them to some changes that
happened in the meantime.  I also split the patches differently, as we
usually add conf, docs and schemas (driver-unrelated code) and some
possible tests in one patch and then add support for each applicable
driver in separate patches.  I reworded some comments and fixed two
memory leaks.  I will resend the series later to see if we have
everything in order.

If we disagree on the naming, we can still change it before the release,
but I do not think that is something that should stall the patches.

Thanks.

>BR,
>Luyao
>
>Luyao Zhong (3):
>  docs: add docs for 'restrictive' option for mode in numatune
>  schema: add 'restrictive' config option for mode in numatune
>  qemu: add parser and formatter for 'restrictive' mode in numatune
>
> docs/formatdomain.rst                         |  7 +++-
> docs/schemas/domaincommon.rng                 |  2 +
> include/libvirt/libvirt-domain.h              |  1 +
> src/conf/numa_conf.c                          |  9 ++++
> src/qemu/qemu_command.c                       |  6 ++-
> src/qemu/qemu_process.c                       | 27 ++++++++++++
> src/util/virnuma.c                            |  3 ++
> .../numatune-memnode-invalid-mode.err         |  1 +
> .../numatune-memnode-invalid-mode.xml         | 33 +++++++++++++++
> ...emnode-restrictive-mode.x86_64-latest.args | 38 +++++++++++++++++
> .../numatune-memnode-restrictive-mode.xml     | 33 +++++++++++++++
> tests/qemuxml2argvtest.c                      |  2 +
> ...memnode-restrictive-mode.x86_64-latest.xml | 41 +++++++++++++++++++
> tests/qemuxml2xmltest.c                       |  1 +
> 14 files changed, 201 insertions(+), 3 deletions(-)
> create mode 100644 tests/qemuxml2argvdata/numatune-memnode-invalid-mode.err
> create mode 100644 tests/qemuxml2argvdata/numatune-memnode-invalid-mode.xml
> create mode 100644 tests/qemuxml2argvdata/numatune-memnode-restrictive-mode.x86_64-latest.args
> create mode 100644 tests/qemuxml2argvdata/numatune-memnode-restrictive-mode.xml
> create mode 100644 tests/qemuxml2xmloutdata/numatune-memnode-restrictive-mode.x86_64-latest.xml
>
>-- 
>2.25.4
>