[PATCH v4 4/6] xen/x86: use arch_get_ram_range to get information from E820 map

Wei Chen posted 6 patches 3 years, 5 months ago
There is a newer version of this series
[PATCH v4 4/6] xen/x86: use arch_get_ram_range to get information from E820 map
Posted by Wei Chen 3 years, 5 months ago
The sanity check of nodes_cover_memory is also a requirement of
other architectures that support NUMA. But now, the code of
nodes_cover_memory is tied to the x86 E820. In this case, we
introduce arch_get_ram_range to decouple architecture specific
memory map from this function. This means, other architectures
like Arm can also use it to check its node and memory coverage
from bootmem info.

Depends arch_get_ram_range, we make nodes_cover_memory become
architecture independent. We also use neutral words to replace
SRAT and E820 in the print message of this function. This will
to make the massage seems more common.

As arch_get_ram_range use unsigned int for index, we also adjust
the index in nodes_cover_memory from int to unsigned int.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
v3 -> v4:
1. Move function comment to header file.
2. Use bool for found, and add a new "err" for the return
   value of arch_get_ram_range.
3. Use -ENODATA instead of -EINVAL for non-RAM type ranges.
v2 -> v3:
1. Rename arch_get_memory_map to arch_get_ram_range.
2. Use -ENOENT instead of -ENODEV to indicate end of memory map.
3. Add description to code comment that arch_get_ram_range returns
   RAM range in [start, end) format.
v1 -> v2:
1. Use arch_get_memory_map to replace arch_get_memory_bank_range
   and arch_get_memory_bank_number.
2. Remove the !start || !end check, because caller guarantee
   these two pointers will not be NULL.
---
 xen/arch/x86/numa.c    | 15 +++++++++++++++
 xen/arch/x86/srat.c    | 30 ++++++++++++++++++------------
 xen/include/xen/numa.h | 13 +++++++++++++
 3 files changed, 46 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
index 90b2a22591..fa8caaa084 100644
--- a/xen/arch/x86/numa.c
+++ b/xen/arch/x86/numa.c
@@ -9,6 +9,7 @@
 #include <xen/nodemask.h>
 #include <xen/numa.h>
 #include <asm/acpi.h>
+#include <asm/e820.h>
 
 #ifndef Dprintk
 #define Dprintk(x...)
@@ -93,3 +94,17 @@ unsigned int __init arch_get_dma_bitsize(void)
                  flsl(node_start_pfn(node) + node_spanned_pages(node) / 4 - 1)
                  + PAGE_SHIFT, 32);
 }
+
+int __init arch_get_ram_range(unsigned int idx, paddr_t *start, paddr_t *end)
+{
+    if ( idx >= e820.nr_map )
+        return -ENOENT;
+
+    if ( e820.map[idx].type != E820_RAM )
+        return -ENODATA;
+
+    *start = e820.map[idx].addr;
+    *end = *start + e820.map[idx].size;
+
+    return 0;
+}
diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
index 4c7f0c547e..bd9694db24 100644
--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -428,37 +428,43 @@ acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
    Make sure the PXMs cover all memory. */
 static int __init nodes_cover_memory(void)
 {
-	int i;
+	unsigned int i;
 
-	for (i = 0; i < e820.nr_map; i++) {
-		int j, found;
+	for (i = 0; ; i++) {
+		int err;
+		unsigned int j;
+		bool found;
 		paddr_t start, end;
 
-		if (e820.map[i].type != E820_RAM) {
-			continue;
-		}
+		/* Try to loop memory map from index 0 to end to get RAM ranges. */
+		err = arch_get_ram_range(i, &start, &end);
 
-		start = e820.map[i].addr;
-		end = e820.map[i].addr + e820.map[i].size;
+		/* Reach the end of arch's memory map */
+		if (err == -ENOENT)
+			break;
+
+		/* Index relate entry is not RAM, skip it. */
+		if (err)
+			continue;
 
 		do {
-			found = 0;
+			found = false;
 			for_each_node_mask(j, memory_nodes_parsed)
 				if (start < nodes[j].end
 				    && end > nodes[j].start) {
 					if (start >= nodes[j].start) {
 						start = nodes[j].end;
-						found = 1;
+						found = true;
 					}
 					if (end <= nodes[j].end) {
 						end = nodes[j].start;
-						found = 1;
+						found = true;
 					}
 				}
 		} while (found && start < end);
 
 		if (start < end) {
-			printk(KERN_ERR "SRAT: No PXM for e820 range: "
+			printk(KERN_ERR "NUMA: No NODE for RAM range: "
 				"[%"PRIpaddr", %"PRIpaddr"]\n", start, end - 1);
 			return 0;
 		}
diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
index af0abfc8cf..38be7db960 100644
--- a/xen/include/xen/numa.h
+++ b/xen/include/xen/numa.h
@@ -81,6 +81,19 @@ static inline nodeid_t __attribute_pure__ phys_to_nid(paddr_t addr)
 #define node_end_pfn(nid)       (NODE_DATA(nid)->node_start_pfn + \
                                  NODE_DATA(nid)->node_spanned_pages)
 
+/*
+ * This function provides the ability for caller to get one RAM entry
+ * from architectural memory map by index.
+ *
+ * This function will return zero if it can return a proper RAM entry.
+ * otherwise it will return -ENOENT for out of scope index, or return
+ * -ENODATA for non-RAM type memory entry.
+ *
+ * Note: the range is exclusive at the end, e.g. [start, end).
+ */
+extern int arch_get_ram_range(unsigned int idx,
+                              paddr_t *start, paddr_t *end);
+
 #endif
 
 #endif /* _XEN_NUMA_H */
-- 
2.25.1
Re: [PATCH v4 4/6] xen/x86: use arch_get_ram_range to get information from E820 map
Posted by Jan Beulich 3 years, 5 months ago
On 02.09.2022 05:31, Wei Chen wrote:
> The sanity check of nodes_cover_memory is also a requirement of
> other architectures that support NUMA. But now, the code of
> nodes_cover_memory is tied to the x86 E820. In this case, we
> introduce arch_get_ram_range to decouple architecture specific
> memory map from this function. This means, other architectures
> like Arm can also use it to check its node and memory coverage
> from bootmem info.
> 
> Depends arch_get_ram_range, we make nodes_cover_memory become
> architecture independent. We also use neutral words to replace
> SRAT and E820 in the print message of this function. This will
> to make the massage seems more common.
> 
> As arch_get_ram_range use unsigned int for index, we also adjust
> the index in nodes_cover_memory from int to unsigned int.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
albeit still with a couple of suggestions:

> --- a/xen/arch/x86/srat.c
> +++ b/xen/arch/x86/srat.c
> @@ -428,37 +428,43 @@ acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
>     Make sure the PXMs cover all memory. */
>  static int __init nodes_cover_memory(void)
>  {
> -	int i;
> +	unsigned int i;
>  
> -	for (i = 0; i < e820.nr_map; i++) {
> -		int j, found;
> +	for (i = 0; ; i++) {
> +		int err;
> +		unsigned int j;
> +		bool found;
>  		paddr_t start, end;
>  
> -		if (e820.map[i].type != E820_RAM) {
> -			continue;
> -		}
> +		/* Try to loop memory map from index 0 to end to get RAM ranges. */
> +		err = arch_get_ram_range(i, &start, &end);
>  
> -		start = e820.map[i].addr;
> -		end = e820.map[i].addr + e820.map[i].size;
> +		/* Reach the end of arch's memory map */
> +		if (err == -ENOENT)
> +			break;

Such a comment ahead of an if() is often better put as a question, e.g.
"Reached the end of the memory map?" here or, if you dislike using a
question, "Exit the loop at the end of the memory map".

> +		/* Index relate entry is not RAM, skip it. */
> +		if (err)
> +			continue;

And then perhaps "Skip non-RAM entries" here.

> --- a/xen/include/xen/numa.h
> +++ b/xen/include/xen/numa.h
> @@ -81,6 +81,19 @@ static inline nodeid_t __attribute_pure__ phys_to_nid(paddr_t addr)
>  #define node_end_pfn(nid)       (NODE_DATA(nid)->node_start_pfn + \
>                                   NODE_DATA(nid)->node_spanned_pages)
>  
> +/*
> + * This function provides the ability for caller to get one RAM entry
> + * from architectural memory map by index.
> + *
> + * This function will return zero if it can return a proper RAM entry.
> + * otherwise it will return -ENOENT for out of scope index, or return
> + * -ENODATA for non-RAM type memory entry.

The way you've implemented things, -ENODATA isn't special anymore, so
better wouldn't be called out as special here. May I suggest to at
least insert "e.g." in front of it? (An alternative would be to check
for -ENODATA in nodes_cover_memory() again, followed by ASSERT(!err).)

> + * Note: the range is exclusive at the end, e.g. [start, end).

Perhaps better [*start, *end) to match ...

> + */
> +extern int arch_get_ram_range(unsigned int idx,
> +                              paddr_t *start, paddr_t *end);

... this?

Jan
RE: [PATCH v4 4/6] xen/x86: use arch_get_ram_range to get information from E820 map
Posted by Wei Chen 3 years, 5 months ago
Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 2022年9月8日 20:14
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: nd <nd@arm.com>; Andrew Cooper <andrew.cooper3@citrix.com>; Roger Pau
> Monné <roger.pau@citrix.com>; Wei Liu <wl@xen.org>; George Dunlap
> <george.dunlap@citrix.com>; Julien Grall <julien@xen.org>; Stefano
> Stabellini <sstabellini@kernel.org>; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v4 4/6] xen/x86: use arch_get_ram_range to get
> information from E820 map
> 
> On 02.09.2022 05:31, Wei Chen wrote:
> > The sanity check of nodes_cover_memory is also a requirement of
> > other architectures that support NUMA. But now, the code of
> > nodes_cover_memory is tied to the x86 E820. In this case, we
> > introduce arch_get_ram_range to decouple architecture specific
> > memory map from this function. This means, other architectures
> > like Arm can also use it to check its node and memory coverage
> > from bootmem info.
> >
> > Depends arch_get_ram_range, we make nodes_cover_memory become
> > architecture independent. We also use neutral words to replace
> > SRAT and E820 in the print message of this function. This will
> > to make the massage seems more common.
> >
> > As arch_get_ram_range use unsigned int for index, we also adjust
> > the index in nodes_cover_memory from int to unsigned int.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> albeit still with a couple of suggestions:
> 

Thanks, I will adjust the code comments to address your suggestions.

> > --- a/xen/arch/x86/srat.c
> > +++ b/xen/arch/x86/srat.c
> > @@ -428,37 +428,43 @@ acpi_numa_memory_affinity_init(const struct
> acpi_srat_mem_affinity *ma)
> >     Make sure the PXMs cover all memory. */
> >  static int __init nodes_cover_memory(void)
> >  {
> > -	int i;
> > +	unsigned int i;
> >
> > -	for (i = 0; i < e820.nr_map; i++) {
> > -		int j, found;
> > +	for (i = 0; ; i++) {
> > +		int err;
> > +		unsigned int j;
> > +		bool found;
> >  		paddr_t start, end;
> >
> > -		if (e820.map[i].type != E820_RAM) {
> > -			continue;
> > -		}
> > +		/* Try to loop memory map from index 0 to end to get RAM
> ranges. */
> > +		err = arch_get_ram_range(i, &start, &end);
> >
> > -		start = e820.map[i].addr;
> > -		end = e820.map[i].addr + e820.map[i].size;
> > +		/* Reach the end of arch's memory map */
> > +		if (err == -ENOENT)
> > +			break;
> 
> Such a comment ahead of an if() is often better put as a question, e.g.
> "Reached the end of the memory map?" here or, if you dislike using a
> question, "Exit the loop at the end of the memory map".
> 
> > +		/* Index relate entry is not RAM, skip it. */
> > +		if (err)
> > +			continue;
> 
> And then perhaps "Skip non-RAM entries" here.
> 
> > --- a/xen/include/xen/numa.h
> > +++ b/xen/include/xen/numa.h
> > @@ -81,6 +81,19 @@ static inline nodeid_t __attribute_pure__
> phys_to_nid(paddr_t addr)
> >  #define node_end_pfn(nid)       (NODE_DATA(nid)->node_start_pfn + \
> >                                   NODE_DATA(nid)->node_spanned_pages)
> >
> > +/*
> > + * This function provides the ability for caller to get one RAM entry
> > + * from architectural memory map by index.
> > + *
> > + * This function will return zero if it can return a proper RAM entry.
> > + * otherwise it will return -ENOENT for out of scope index, or return
> > + * -ENODATA for non-RAM type memory entry.
> 
> The way you've implemented things, -ENODATA isn't special anymore, so
> better wouldn't be called out as special here. May I suggest to at
> least insert "e.g." in front of it? (An alternative would be to check
> for -ENODATA in nodes_cover_memory() again, followed by ASSERT(!err).)
> 
> > + * Note: the range is exclusive at the end, e.g. [start, end).
> 
> Perhaps better [*start, *end) to match ...
> 
> > + */
> > +extern int arch_get_ram_range(unsigned int idx,
> > +                              paddr_t *start, paddr_t *end);
> 
> ... this?
> 
> Jan