[PATCH 1/2] mm: memory-tiers, numa_emu: enable to create memory tiers using fake numa nodes

Akinobu Mita posted 2 patches 1 week, 4 days ago
Posted by Akinobu Mita 1 week, 4 days ago
This makes it possible to create memory tiers using fake numa nodes
generated by numa emulation.

The new "numa_emulation.adistance" kernel parameter allows you to set the
abstract distance for each NUMA node.

For example, if the system is booted with the parameters
"numa=fake=2 numa_emulation.adistance=576,704", it will configure memory
tiers with node0 having the default DRAM adistance value (576) and node1
having a higher adistance value (704), which places node1 in a separate,
lower memory tier.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
---
 mm/numa_emulation.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/mm/numa_emulation.c b/mm/numa_emulation.c
index 703c8fa05048..a4266da21344 100644
--- a/mm/numa_emulation.c
+++ b/mm/numa_emulation.c
@@ -6,6 +6,9 @@
 #include <linux/errno.h>
 #include <linux/topology.h>
 #include <linux/memblock.h>
+#include <linux/memory-tiers.h>
+#include <linux/module.h>
+#include <linux/node.h>
 #include <linux/numa_memblks.h>
 #include <asm/numa.h>
 #include <acpi/acpi_numa.h>
@@ -344,6 +347,27 @@ static int __init setup_emu2phys_nid(int *dfl_phys_nid)
 	return max_emu_nid;
 }
 
+static int adistance[MAX_NUMNODES];
+module_param_array(adistance, int, NULL, 0400);
+MODULE_PARM_DESC(adistance, "Abstract distance values for each NUMA node");
+
+static int emu_calculate_adistance(struct notifier_block *self,
+				unsigned long nid, void *data)
+{
+	if (adistance[nid]) {
+		int *adist = data;
+
+		*adist = adistance[nid];
+		return NOTIFY_STOP;
+	}
+	return NOTIFY_OK;
+}
+
+static struct notifier_block emu_adist_nb = {
+	.notifier_call = emu_calculate_adistance,
+	.priority = INT_MIN,
+};
+
 /**
  * numa_emulation - Emulate NUMA nodes
  * @numa_meminfo: NUMA configuration to massage
@@ -532,6 +556,8 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
 		}
 	}
 
+	register_mt_adistance_algorithm(&emu_adist_nb);
+
 	/* free the copied physical distance table */
 	memblock_free(phys_dist, phys_size);
 	return;
-- 
2.43.0
Re: [PATCH 1/2] mm: memory-tiers, numa_emu: enable to create memory tiers using fake numa nodes
Posted by Andrew Morton 2 days, 15 hours ago
On Mon,  8 Dec 2025 18:40:27 +0900 Akinobu Mita <akinobu.mita@gmail.com> wrote:

> This makes it possible to create memory tiers using fake numa nodes
> generated by numa emulation.
> 
> The new "numa_emulation.adistance" kernel parameter allows you to set the
> abstract distance for each NUMA node.
> 
> For example, if the system is booted with the parameters
> "numa=fake=2 numa_emulation.adistance=576,704", it will configure memory
> tiers with node0 having the default DRAM adistance value and node1 having
> a lower adistance value.

Confusing.  I'd have thought that this commandline would give node0 a
distance of 576 and node1 a distance of 704?  But the text talks about
some third "default" distance, of unknown value.

Can we please clear all this up?

Also, we have little documentation for this stuff. 
fake-numa-for-cpusets.rst and kernel-parameters.txt.  Can you please
find somewhere appropriate to document this new user-facing feature? 
Maybe a new Documentation file?
Re: [PATCH 1/2] mm: memory-tiers, numa_emu: enable to create memory tiers using fake numa nodes
Posted by Akinobu Mita 1 day, 21 hours ago
On Wed, Dec 17, 2025 at 5:24 Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Mon,  8 Dec 2025 18:40:27 +0900 Akinobu Mita <akinobu.mita@gmail.com> wrote:
>
> > This makes it possible to create memory tiers using fake numa nodes
> > generated by numa emulation.
> >
> > The new "numa_emulation.adistance" kernel parameter allows you to set the
> > abstract distance for each NUMA node.
> >
> > For example, if the system is booted with the parameters
> > "numa=fake=2 numa_emulation.adistance=576,704", it will configure memory
> > tiers with node0 having the default DRAM adistance value and node1 having
> > a lower adistance value.
>
> Confusing.  I'd have thought that this commandline would give node0 a
> distance of 576 and node1 a distance of 704?  But the text talks about
> some third "default" distance, of unknown value.
>
> Can we please clear all this up?

The DRAM abstract distance is defined by MEMTIER_ADISTANCE_DRAM
in linux/memory-tiers.h and has a value of 576.

Each memory tier covers an abstract distance chunk size of 128,
so nodes with abstract distances between 512 and 639 are classified
into the DRAM tier.

Here, the abstract distances of node0 and node1 are set to 576 and 704,
respectively, so they are classified into different tiers.
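
The arithmetic above can be sketched in a few lines of userspace C. This
is only an illustration of the chunking rule described here, not the
kernel's code: CHUNK_SIZE, tier_range_start() and same_tier() are
hypothetical names chosen for this sketch, and the only values taken from
the discussion are the chunk size of 128 and the adistances 576 and 704.

```c
#include <assert.h>
#include <stdbool.h>

/* Abstract-distance span covered by one memory tier (128, per the text). */
#define CHUNK_SIZE 128

/*
 * Round an abstract distance down to the start of the 128-wide range
 * that contains it. Hypothetical helper for this sketch only.
 */
static int tier_range_start(int adist)
{
	assert(adist >= 0);
	return adist - (adist % CHUNK_SIZE);
}

/* Two nodes land in the same tier when their ranges start at the same value. */
static bool same_tier(int adist_a, int adist_b)
{
	return tier_range_start(adist_a) == tier_range_start(adist_b);
}
```

Under these assumptions, 576 falls in the range starting at 512 (512 to
639) and 704 falls in the range starting at 640 (640 to 767), so
same_tier(576, 704) is false and the two fake nodes end up in different
tiers.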

> Also, we have little documentation for this stuff.
> fake-numa-for-cpusets.rst and kernel-parameters.txt.  Can you please
> find somewhere appropriate to document this new user-facing feature?
> Maybe a new Documentation file?

Looks good.
I'll create a new Documentation/mm/numa_emulation.rst and
document at least this new parameter.