The test ensures that the enumerators' output is consistent with
cpumask_local_spread().

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 lib/test_bitmap.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)
diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index a8005ad3bd58..1b5f805f6879 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -12,6 +12,7 @@
 #include <linux/printk.h>
 #include <linux/slab.h>
 #include <linux/string.h>
+#include <linux/topology.h>
 #include <linux/uaccess.h>
 
 #include "../tools/testing/selftests/kselftest_module.h"
@@ -751,6 +752,33 @@ static void __init test_for_each_set_bit_wrap(void)
 	}
 }
 
+static void __init test_for_each_numa(void)
+{
+	unsigned int cpu, node;
+
+	for (node = 0; node < sched_domains_numa_levels; node++) {
+		const struct cpumask *m, *p = cpu_none_mask;
+		unsigned int c = 0;
+
+		rcu_read_lock();
+		for_each_numa_hop_mask(m, node) {
+			for_each_cpu_andnot(cpu, m, p)
+				expect_eq_uint(cpumask_local_spread(c++, node), cpu);
+			p = m;
+		}
+		rcu_read_unlock();
+	}
+
+	for (node = 0; node < sched_domains_numa_levels; node++) {
+		unsigned int hop, c = 0;
+
+		rcu_read_lock();
+		for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
+			expect_eq_uint(cpumask_local_spread(c++, node), cpu);
+		rcu_read_unlock();
+	}
+}
+
 static void __init test_for_each_set_bit(void)
 {
 	DECLARE_BITMAP(orig, 500);
@@ -1237,6 +1265,7 @@ static void __init selftest(void)
 	test_for_each_clear_bitrange_from();
 	test_for_each_set_clump8();
 	test_for_each_set_bit_wrap();
+	test_for_each_numa();
 }
 
 KSTM_MODULE_LOADERS(test_bitmap);
--
2.34.1
On 19/04/23 22:19, Yury Norov wrote:
> +	for (node = 0; node < sched_domains_numa_levels; node++) {
> +		unsigned int hop, c = 0;
> +
> +		rcu_read_lock();
> +		for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
> +			expect_eq_uint(cpumask_local_spread(c++, node), cpu);
> +		rcu_read_unlock();
> +	}
I'm not fond of the export of sched_domains_numa_levels, especially
considering it's just there for tests.
Furthermore, is there any value in testing parity with
cpumask_local_spread()? Rather, shouldn't we check that using this API does
yield CPUs of increasing NUMA distance?
Something like
for_each_node(node) {
	unsigned int prev_cpu, hop = 0;

	cpu = cpumask_first(cpumask_of_node(node));
	prev_cpu = cpu;

	rcu_read_lock();

	/* Assert distance is monotonically increasing */
	for_each_numa_cpu(cpu, hop, node, cpu_online_mask) {
		expect_ge_uint(cpu_to_node(cpu), cpu_to_node(prev_cpu));
		prev_cpu = cpu;
	}

	rcu_read_unlock();
}
Hi Valentin,

Thanks for the review!
On Mon, Apr 24, 2023 at 06:09:52PM +0100, Valentin Schneider wrote:
> On 19/04/23 22:19, Yury Norov wrote:
> > +	for (node = 0; node < sched_domains_numa_levels; node++) {
> > +		unsigned int hop, c = 0;
> > +
> > +		rcu_read_lock();
> > +		for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
> > +			expect_eq_uint(cpumask_local_spread(c++, node), cpu);
> > +		rcu_read_unlock();
> > +	}
>
> I'm not fond of the export of sched_domains_numa_levels, especially
> considering it's just there for tests.
>
> Furthermore, is there any value in testing parity with
> cpumask_local_spread()?
I wanted to emphasize that the new NUMA-aware functions are coherent with
each other, just like find_nth_bit() is coherent with find_next_bit().

But all that coherence looks important only in the non-NUMA case, because
client code may depend on the fact that the next CPU is never less than the
current one. This doesn't hold for the NUMA iterators anyway...
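For example, on a hypothetical two-node box (not from the patch) with
CPUs 0-3 on node 0 and CPUs 4-7 on node 1, a walk starting from node 1
visits its local CPUs first:

	unsigned int cpu, hop;

	rcu_read_lock();
	/* On the box above: visits 4, 5, 6, 7, then 0, 1, 2, 3 */
	for_each_numa_cpu(cpu, hop, 1, cpu_online_mask)
		pr_info("hop %u: cpu %u\n", hop, cpu);
	rcu_read_unlock();

so the sequence is not monotonically increasing the way a plain
for_each_cpu() walk is.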
> Rather, shouldn't we check that using this API does
> yield CPUs of increasing NUMA distance?
>
> Something like
>
> for_each_node(node) {
> 	unsigned int prev_cpu, hop = 0;
>
> 	cpu = cpumask_first(cpumask_of_node(node));
> 	prev_cpu = cpu;
>
> 	rcu_read_lock();
>
> 	/* Assert distance is monotonically increasing */
> 	for_each_numa_cpu(cpu, hop, node, cpu_online_mask) {
> 		expect_ge_uint(cpu_to_node(cpu), cpu_to_node(prev_cpu));
> 		prev_cpu = cpu;
> 	}
>
> 	rcu_read_unlock();
> }
Your version of the test looks more straightforward. I need to think
about it some more, but it looks like I can take it in v3.
Thanks,
Yury
On 25/04/23 22:50, Yury Norov wrote:
> Hi Valentin,
>
> Thanks for the review!
>
> On Mon, Apr 24, 2023 at 06:09:52PM +0100, Valentin Schneider wrote:
>> On 19/04/23 22:19, Yury Norov wrote:
>> > +	for (node = 0; node < sched_domains_numa_levels; node++) {
>> > +		unsigned int hop, c = 0;
>> > +
>> > +		rcu_read_lock();
>> > +		for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
>> > +			expect_eq_uint(cpumask_local_spread(c++, node), cpu);
>> > +		rcu_read_unlock();
>> > +	}
>>
>> I'm not fond of the export of sched_domains_numa_levels, especially
>> considering it's just there for tests.
>>
>> Furthermore, is there any value in testing parity with
>> cpumask_local_spread()?
>
> I wanted to emphasize that the new NUMA-aware functions are coherent with
> each other, just like find_nth_bit() is coherent with find_next_bit().
>
> But all that coherence looks important only in the non-NUMA case, because
> client code may depend on the fact that the next CPU is never less than the
> current one. This doesn't hold for the NUMA iterators anyway...
>
Ah right, I see your point. But yes, distance-ordered walks break this
assumption.
>> Rather, shouldn't we check that using this API does
>> yield CPUs of increasing NUMA distance?
>>
>> Something like
>>
>> for_each_node(node) {
>> 	unsigned int prev_cpu, hop = 0;
>>
>> 	cpu = cpumask_first(cpumask_of_node(node));
>> 	prev_cpu = cpu;
>>
>> 	rcu_read_lock();
>>
>> 	/* Assert distance is monotonically increasing */
>> 	for_each_numa_cpu(cpu, hop, node, cpu_online_mask) {
>> 		expect_ge_uint(cpu_to_node(cpu), cpu_to_node(prev_cpu));
>> 		prev_cpu = cpu;
>> 	}
>>
>> 	rcu_read_unlock();
>> }
>
> Your version of the test looks more straightforward. I need to think
> about it some more, but it looks like I can take it in v3.
>
I realized I only wrote half the relevant code - comparing node IDs is
meaningless, I meant to compare distances as we walk through the
CPUs... I tested the below against a few NUMA topologies and it seems to be
sane:
diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 6becb044a66f0..8f8512d139d58 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -174,11 +174,23 @@ __check_eq_str(const char *srcfile, unsigned int line,
 	return eq;
 }
 
-#define __expect_eq(suffix, ...)					\
+static bool __init
+__check_ge_uint(const char *srcfile, unsigned int line,
+		const unsigned int a, unsigned int b)
+{
+	if (a < b) {
+		pr_err("[%s:%u] expected a(%u) >= b(%u)\n",
+		       srcfile, line, a, b);
+		return false;
+	}
+	return true;
+}
+
+#define __expect_op(op, suffix, ...)					\
 ({									\
 	int result = 0;							\
 	total_tests++;							\
-	if (!__check_eq_ ## suffix(__FILE__, __LINE__,			\
+	if (!__check_## op ## _ ## suffix(__FILE__, __LINE__,		\
 			##__VA_ARGS__)) {				\
 		failed_tests++;						\
 		result = 1;						\
@@ -186,6 +198,9 @@ __check_eq_str(const char *srcfile, unsigned int line,
 	result;								\
 })
 
+#define __expect_eq(suffix, ...)	__expect_op(eq, suffix, ##__VA_ARGS__)
+#define __expect_ge(suffix, ...)	__expect_op(ge, suffix, ##__VA_ARGS__)
+
 #define expect_eq_uint(...)		__expect_eq(uint, ##__VA_ARGS__)
 #define expect_eq_bitmap(...)		__expect_eq(bitmap, ##__VA_ARGS__)
 #define expect_eq_pbl(...)		__expect_eq(pbl, ##__VA_ARGS__)
@@ -193,6 +208,8 @@ __check_eq_str(const char *srcfile, unsigned int line,
 #define expect_eq_clump8(...)		__expect_eq(clump8, ##__VA_ARGS__)
 #define expect_eq_str(...)		__expect_eq(str, ##__VA_ARGS__)
 
+#define expect_ge_uint(...)		__expect_ge(uint, ##__VA_ARGS__)
+
 static void __init test_zero_clear(void)
 {
 	DECLARE_BITMAP(bmap, 1024);
@@ -756,12 +773,23 @@ static void __init test_for_each_numa(void)
 {
 	unsigned int cpu, node;
 
-	for (node = 0; node < sched_domains_numa_levels; node++) {
-		unsigned int hop, c = 0;
+	for_each_node(node) {
+		unsigned int start_cpu, prev_dist, hop = 0;
+
+		cpu = cpumask_first(cpumask_of_node(node));
+		prev_dist = node_distance(node, node);
+		start_cpu = cpu;
 
 		rcu_read_lock();
-		for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
-			expect_eq_uint(cpumask_local_spread(c++, node), cpu);
+
+		/* Assert distance is monotonically increasing */
+		for_each_numa_cpu(cpu, hop, node, cpu_online_mask) {
+			unsigned int dist = node_distance(cpu_to_node(cpu), cpu_to_node(start_cpu));
+
+			expect_ge_uint(dist, prev_dist);
+			prev_dist = dist;
+		}
+
 		rcu_read_unlock();
 	}
 }
> I realized I only wrote half the relevant code - comparing node IDs is
> meaningless, I meant to compare distances as we walk through the
> CPUs... I tested the below against a few NUMA topologies and it seems to be
> sane:
>
> @@ -756,12 +773,23 @@ static void __init test_for_each_numa(void)
>  {
>  	unsigned int cpu, node;
>
> -	for (node = 0; node < sched_domains_numa_levels; node++) {
> -		unsigned int hop, c = 0;
> +	for_each_node(node) {
> +		unsigned int start_cpu, prev_dist, hop = 0;
> +
> +		cpu = cpumask_first(cpumask_of_node(node));
> +		prev_dist = node_distance(node, node);
> +		start_cpu = cpu;
>
>  		rcu_read_lock();
> -		for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
> -			expect_eq_uint(cpumask_local_spread(c++, node), cpu);
> +
> +		/* Assert distance is monotonically increasing */
> +		for_each_numa_cpu(cpu, hop, node, cpu_online_mask) {
> +			unsigned int dist = node_distance(cpu_to_node(cpu), cpu_to_node(start_cpu));
Interestingly, node_distance() is an arch-specific function. The generic
implementation is quite useless:

#define node_distance(from,to) ((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)

In particular, arm64 takes the above. With node_distance() implemented
like that, we can barely test anything...
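To illustrate the degeneration (sketch only; LOCAL_DISTANCE is 10 and
REMOTE_DISTANCE is 20 in include/linux/topology.h): with that fallback,
dist below is 10 for CPUs on the starting node and 20 for everything
else, so the assertion can only catch a remote CPU being visited before
a local one:

	for_each_numa_cpu(cpu, hop, node, cpu_online_mask) {
		/* Generic fallback: 10 on 'node', 20 elsewhere,
		 * regardless of how many hops away the CPU is. */
		unsigned int dist = node_distance(cpu_to_node(cpu), node);

		expect_ge_uint(dist, prev_dist);
		prev_dist = dist;
	}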
Taking that into account, I think it's better to test the iterator against
cpumask_local_spread(), like in v2. I'll add a comment about that in v3.
> +
> +			expect_ge_uint(dist, prev_dist);
> +			prev_dist = dist;
> +		}
> +
>  		rcu_read_unlock();
>  	}
>  }
On 26/04/23 13:51, Yury Norov wrote:
>> I realized I only wrote half the relevant code - comparing node IDs is
>> meaningless, I meant to compare distances as we walk through the
>> CPUs... I tested the below against a few NUMA topologies and it seems to be
>> sane:
>>
>> @@ -756,12 +773,23 @@ static void __init test_for_each_numa(void)
>>  {
>>  	unsigned int cpu, node;
>>
>> -	for (node = 0; node < sched_domains_numa_levels; node++) {
>> -		unsigned int hop, c = 0;
>> +	for_each_node(node) {
>> +		unsigned int start_cpu, prev_dist, hop = 0;
>> +
>> +		cpu = cpumask_first(cpumask_of_node(node));
>> +		prev_dist = node_distance(node, node);
>> +		start_cpu = cpu;
>>
>>  		rcu_read_lock();
>> -		for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
>> -			expect_eq_uint(cpumask_local_spread(c++, node), cpu);
>> +
>> +		/* Assert distance is monotonically increasing */
>> +		for_each_numa_cpu(cpu, hop, node, cpu_online_mask) {
>> +			unsigned int dist = node_distance(cpu_to_node(cpu), cpu_to_node(start_cpu));
>
> Interestingly, node_distance() is an arch-specific function. The generic
> implementation is quite useless:
>
> #define node_distance(from,to) ((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)
>
> In particular, arm64 takes the above. With node_distance() implemented
> like that, we can barely test anything...
>
riscv and arm64 rely on drivers/base/arch_numa.c to provide
__node_distance() (cf. CONFIG_GENERIC_ARCH_NUMA).

x86, sparc, powerpc and ia64 define __node_distance();
loongarch and mips define their own node_distance().

So all of those archs will have a usable node_distance(), the others won't,
and that means the scheduler can't do anything about it - the scheduler
relies on node_distance() to understand the topology!
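For reference, the override plumbing looks roughly like this (paraphrased,
not verbatim source):

	/* arch override, e.g. x86's asm/topology.h under CONFIG_NUMA: */
	#define node_distance(a, b) __node_distance(a, b)

	/* generic fallback in include/linux/topology.h, used only when
	 * nothing above defined node_distance: */
	#ifndef node_distance
	#define node_distance(from, to) \
		((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)
	#endif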