[PATCH v2] NUMA: Introduce NODE_DATA->node_present_pages(RAM pages)

Bernhard Kaindl posted 1 patch 1 month, 2 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20241002102331.437048-1-bernhardkaindl7@gmail.com
There is a newer version of this series
xen/arch/x86/x86_64/mm.c |  3 +++
xen/common/numa.c        | 28 +++++++++++++++++++++++++++-
xen/include/xen/numa.h   |  3 +++
3 files changed, 33 insertions(+), 1 deletion(-)
[PATCH v2] NUMA: Introduce NODE_DATA->node_present_pages(RAM pages)
Posted by Bernhard Kaindl 1 month, 2 weeks ago
From: Bernhard Kaindl <bernhard.kaindl@cloud.com>

Changes in v2:
- Remove update of numainfo call, only calculate RAM for each node.
- Calculate RAM based on page boundaries, coding style fixes

Some admin tools like 'xl info -n' like to display the total memory
for each NUMA node. The Xen backend[1] of hwloc comes to mind too.

The total amount of RAM on a NUMA node is not needed by Xen internally:
Xen only uses NODE_DATA->node_spanned_pages, but that can be confusing
for users as it includes memory holes (can be as large as 2GB on x86).

Calculate the RAM per NUMA node by iterating over arch_get_ram_range()
which returns the e820 RAM entries on x86 and update it on memory_add().

Use NODE_DATA->node_present_pages (like in the Linux kernel) to hold
this info and in a later commit, find a way for tools to read it.

[1] hwloc with Xen backend: https://github.com/xenserver-next/hwloc/

Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
---
 xen/arch/x86/x86_64/mm.c |  3 +++
 xen/common/numa.c        | 28 +++++++++++++++++++++++++++-
 xen/include/xen/numa.h   |  3 +++
 3 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index b2a280fba3..66b9bed057 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -1334,6 +1334,9 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
     share_hotadd_m2p_table(&info);
     transfer_pages_to_heap(&info);
 
+    /* Update the node's present pages (like the total_pages of the system) */
+    NODE_DATA(node)->node_present_pages += epfn - spfn;
+
     return 0;
 
 destroy_m2p:
diff --git a/xen/common/numa.c b/xen/common/numa.c
index 28a09766fa..1b0f8180a0 100644
--- a/xen/common/numa.c
+++ b/xen/common/numa.c
@@ -499,15 +499,41 @@ int __init compute_hash_shift(const struct node *nodes,
     return shift;
 }
 
-/* Initialize NODE_DATA given nodeid and start/end */
+/**
+ * @brief Initialize a NUMA node's NODE_DATA given nodeid and start/end addrs.
+ *
+ * This function sets up the boot memory for a given NUMA node by calculating
+ * the node's start and end page frame numbers (PFNs) and determining
+ * the number of present RAM pages within the node's memory range.
+ *
+ * @param nodeid The identifier of the node to initialize.
+ * @param start The starting physical address of the node's memory range.
+ * @param end The ending physical address of the node's memory range.
+ */
 void __init setup_node_bootmem(nodeid_t nodeid, paddr_t start, paddr_t end)
 {
     unsigned long start_pfn = paddr_to_pfn(start);
     unsigned long end_pfn = paddr_to_pfn(end);
+    unsigned long ram_start_pfn, ram_end_pfn;
+    unsigned int idx = 0;
+    int err;
+    paddr_t ram_start, ram_end;
 
     NODE_DATA(nodeid)->node_start_pfn = start_pfn;
     NODE_DATA(nodeid)->node_spanned_pages = end_pfn - start_pfn;
 
+    /* Calculate the numer of present RAM pages within the node: */
+    NODE_DATA(nodeid)->node_present_pages = 0;
+    while ( (err = arch_get_ram_range(idx++, &ram_start, &ram_end)) != -ENOENT )
+    {
+        if ( err || ram_start >= end || ram_end <= start )
+            continue;  /* Not RAM (err != 0) or range is outside the node */
+
+        /* Add RAM that in the node's memory range (within start and end): */
+        ram_end_pfn = min(ram_end, end) >> PAGE_SHIFT;
+        ram_start_pfn = (PAGE_ALIGN(max(ram_start, start))) >> PAGE_SHIFT;
+        NODE_DATA(nodeid)->node_present_pages += ram_end_pfn - ram_start_pfn;
+    }
     node_set_online(nodeid);
 }
 
diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
index fd1511a6fb..c860f3ad1c 100644
--- a/xen/include/xen/numa.h
+++ b/xen/include/xen/numa.h
@@ -71,6 +71,7 @@ extern nodeid_t *memnodemap;
 struct node_data {
     unsigned long node_start_pfn;
     unsigned long node_spanned_pages;
+    unsigned long node_present_pages;
 };
 
 extern struct node_data node_data[];
@@ -91,6 +92,7 @@ static inline nodeid_t mfn_to_nid(mfn_t mfn)
 
 #define node_start_pfn(nid)     (NODE_DATA(nid)->node_start_pfn)
 #define node_spanned_pages(nid) (NODE_DATA(nid)->node_spanned_pages)
+#define node_present_pages(nid) (NODE_DATA(nid)->node_present_pages)
 #define node_end_pfn(nid)       (NODE_DATA(nid)->node_start_pfn + \
                                  NODE_DATA(nid)->node_spanned_pages)
 
@@ -123,6 +125,7 @@ extern void numa_set_processor_nodes_parsed(nodeid_t node);
 extern mfn_t first_valid_mfn;
 
 #define node_spanned_pages(nid) (max_page - mfn_x(first_valid_mfn))
+#define node_present_pages(nid) total_pages
 #define node_start_pfn(nid) mfn_x(first_valid_mfn)
 #define __node_distance(a, b) 20
 
-- 
2.43.0
Re: [PATCH v2] NUMA: Introduce NODE_DATA->node_present_pages(RAM pages)
Posted by Jan Beulich 1 month, 2 weeks ago
On 02.10.2024 12:23, Bernhard Kaindl wrote:
> From: Bernhard Kaindl <bernhard.kaindl@cloud.com>
> 
> Changes in v2:
> - Remove update of numainfo call, only calculate RAM for each node.
> - Calculate RAM based on page boundaries, coding style fixes

Nit: This shouldn't be part of the description (eventual commit message);
it belongs ...

> Some admin tools like 'xl info -n' like to display the total memory
> for each NUMA node. The Xen backend[1] of hwloc comes to mind too.
> 
> The total amount of RAM on a NUMA node is not needed by Xen internally:
> Xen only uses NODE_DATA->node_spanned_pages, but that can be confusing
> for users as it includes memory holes (can be as large as 2GB on x86).
> 
> Calculate the RAM per NUMA node by iterating over arch_get_ram_range()
> which returns the e820 RAM entries on x86 and update it on memory_add().
> 
> Use NODE_DATA->node_present_pages (like in the Linux kernel) to hold
> this info and in a later commit, find a way for tools to read it.
> 
> [1] hwloc with Xen backend: https://github.com/xenserver-next/hwloc/
> 
> Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
> ---

... below this first (or any later) such separator.

> --- a/xen/common/numa.c
> +++ b/xen/common/numa.c
> @@ -499,15 +499,41 @@ int __init compute_hash_shift(const struct node *nodes,
>      return shift;
>  }
>  
> -/* Initialize NODE_DATA given nodeid and start/end */
> +/**
> + * @brief Initialize a NUMA node's NODE_DATA given nodeid and start/end addrs.
> + *
> + * This function sets up the boot memory for a given NUMA node by calculating
> + * the node's start and end page frame numbers (PFNs) and determining
> + * the number of present RAM pages within the node's memory range.
> + *
> + * @param nodeid The identifier of the node to initialize.
> + * @param start The starting physical address of the node's memory range.
> + * @param end The ending physical address of the node's memory range.
> + */
>  void __init setup_node_bootmem(nodeid_t nodeid, paddr_t start, paddr_t end)
>  {
>      unsigned long start_pfn = paddr_to_pfn(start);
>      unsigned long end_pfn = paddr_to_pfn(end);
> +    unsigned long ram_start_pfn, ram_end_pfn;

These two variables would better live inside the loop, as it's only there
that they're used.

> +    unsigned int idx = 0;
> +    int err;
> +    paddr_t ram_start, ram_end;
>  
>      NODE_DATA(nodeid)->node_start_pfn = start_pfn;
>      NODE_DATA(nodeid)->node_spanned_pages = end_pfn - start_pfn;
>  
> +    /* Calculate the numer of present RAM pages within the node: */

Nit: number

> +    NODE_DATA(nodeid)->node_present_pages = 0;
> +    while ( (err = arch_get_ram_range(idx++, &ram_start, &ram_end)) != -ENOENT )
> +    {
> +        if ( err || ram_start >= end || ram_end <= start )
> +            continue;  /* Not RAM (err != 0) or range is outside the node */
> +
> +        /* Add RAM that in the node's memory range (within start and end): */

Nit: I think you either want "that's" or s/that in/from/ (or something
substantially similar).

> +        ram_end_pfn = min(ram_end, end) >> PAGE_SHIFT;
> +        ram_start_pfn = (PAGE_ALIGN(max(ram_start, start))) >> PAGE_SHIFT;

You effectively do the comparisons twice - once in the if() and then a
second time in the min() / max(). Can't you simply check ram_start_pfn
against ram_end_pfn, totaling to one less comparison?

Also please don't open-code PFN_DOWN() / PFN_UP().

Jan