[PATCH] spapr_numa.c: fix FORM1 distance-less nodes

Daniel Henrique Barboza posted 1 patch 2 years, 6 months ago
Test checkpatch passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20211109183553.1844689-1-danielhb413@gmail.com
Maintainers: Greg Kurz <groug@kaod.org>, Daniel Henrique Barboza <danielhb413@gmail.com>, "Cédric Le Goater" <clg@kaod.org>, David Gibson <david@gibson.dropbear.id.au>
There is a newer version of this series
hw/ppc/spapr_numa.c | 62 ++++++++++++++++++++++-----------------------
1 file changed, 31 insertions(+), 31 deletions(-)
[PATCH] spapr_numa.c: fix FORM1 distance-less nodes
Posted by Daniel Henrique Barboza 2 years, 6 months ago
Commit 71e6fae3a99 fixed an issue with FORM2 affinity guests with NUMA
nodes in which the distance info is absent in
machine_state->numa_state->nodes. This happens when QEMU adds a default
NUMA node and when the user adds NUMA nodes without specifying the
distances.

During the discussions of the forementioned patch [1] it was found that
FORM1 guests were behaving in a strange way in the same scenario, with
the kernel seeing the distances between the nodes as '160', as we can
see in this example with 4 NUMA nodes without distance information:

$ numactl -H
available: 4 nodes (0-3)
(...)
node distances:
node   0   1   2   3
  0:  10  160  160  160
  1:  160  10  160  160
  2:  160  160  10  160
  3:  160  160  160  10

Turns out that we have the same problem with FORM1 guests - we are
calculating associativity domain using zeroed values. And as it also
turns out, the solution from 71e6fae3a99 applies for FORM1 as well.

This patch creates a wrapper called 'get_numa_distance' that contains
the logic used in FORM2 to define node distances when this information
is absent. This helper is then used in all places where we need to read
distance information from machine_state->numa_state->nodes. That way
we'll guarantee that the NUMA node distance is always being curated
before being used.

After this patch, the FORM1 guest mentioned above will have the
following topology:

$ numactl -H
available: 4 nodes (0-3)
(...)
node distances:
node   0   1   2   3
  0:  10  20  20  20
  1:  20  10  20  20
  2:  20  20  10  20
  3:  20  20  20  10

This is compatible with what FORM2 guests and other archs do in this
case.

[1] https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg01960.html

CC: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
CC: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 hw/ppc/spapr_numa.c | 62 ++++++++++++++++++++++-----------------------
 1 file changed, 31 insertions(+), 31 deletions(-)

diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
index 56ab2a5fb6..e9ef7e7646 100644
--- a/hw/ppc/spapr_numa.c
+++ b/hw/ppc/spapr_numa.c
@@ -66,16 +66,41 @@ static const uint32_t *get_associativity(SpaprMachineState *spapr, int node_id)
     return spapr->FORM1_assoc_array[node_id];
 }
 
+/*
+ * Wrapper that returns node distance from ms->numa_state->nodes
+ * after handling edge cases where the distance might be absent.
+ */
+static int get_numa_distance(MachineState *ms, int src, int dst)
+{
+    NodeInfo *numa_info = ms->numa_state->nodes;
+    int ret = numa_info[src].distance[dst];
+
+    if (ret != 0) {
+        return ret;
+    }
+
+    /*
+     * In case QEMU adds a default NUMA single node when the user
+     * did not add any, or where the user did not supply distances,
+     * the distance will be absent (zero). Return local/remote
+     * distance in this case.
+     */
+    if (src == dst) {
+        return NUMA_DISTANCE_MIN;
+    }
+
+    return NUMA_DISTANCE_DEFAULT;
+}
+
 static bool spapr_numa_is_symmetrical(MachineState *ms)
 {
-    int src, dst;
     int nb_numa_nodes = ms->numa_state->num_nodes;
-    NodeInfo *numa_info = ms->numa_state->nodes;
+    int src, dst;
 
     for (src = 0; src < nb_numa_nodes; src++) {
         for (dst = src; dst < nb_numa_nodes; dst++) {
-            if (numa_info[src].distance[dst] !=
-                numa_info[dst].distance[src]) {
+            if (get_numa_distance(ms, src, dst) !=
+                get_numa_distance(ms, dst, src)) {
                 return false;
             }
         }
@@ -133,7 +158,6 @@ static uint8_t spapr_numa_get_numa_level(uint8_t distance)
 static void spapr_numa_define_FORM1_domains(SpaprMachineState *spapr)
 {
     MachineState *ms = MACHINE(spapr);
-    NodeInfo *numa_info = ms->numa_state->nodes;
     int nb_numa_nodes = ms->numa_state->num_nodes;
     int src, dst, i, j;
 
@@ -170,7 +194,7 @@ static void spapr_numa_define_FORM1_domains(SpaprMachineState *spapr)
              * The PPC kernel expects the associativity domains of node 0 to
              * be always 0, and this algorithm will grant that by default.
              */
-            uint8_t distance = numa_info[src].distance[dst];
+            uint8_t distance = get_numa_distance(ms, src, dst);
             uint8_t n_level = spapr_numa_get_numa_level(distance);
             uint32_t assoc_src;
 
@@ -498,7 +522,6 @@ static void spapr_numa_FORM2_write_rtas_tables(SpaprMachineState *spapr,
                                                void *fdt, int rtas)
 {
     MachineState *ms = MACHINE(spapr);
-    NodeInfo *numa_info = ms->numa_state->nodes;
     int nb_numa_nodes = ms->numa_state->num_nodes;
     int distance_table_entries = nb_numa_nodes * nb_numa_nodes;
     g_autofree uint32_t *lookup_index_table = NULL;
@@ -540,30 +563,7 @@ static void spapr_numa_FORM2_write_rtas_tables(SpaprMachineState *spapr,
 
     for (src = 0; src < nb_numa_nodes; src++) {
         for (dst = 0; dst < nb_numa_nodes; dst++) {
-            /*
-             * We need to be explicit with the local distance
-             * value to cover the case where the user didn't added any
-             * NUMA nodes, but QEMU adds the default NUMA node without
-             * adding the numa_info to retrieve distance info from.
-             */
-            distance_table[i] = numa_info[src].distance[dst];
-            if (distance_table[i] == 0) {
-                /*
-                 * In case QEMU adds a default NUMA single node when the user
-                 * did not add any, or where the user did not supply distances,
-                 * the value will be 0 here. Populate the table with a fallback
-                 * simple local / remote distance.
-                 */
-                if (src == dst) {
-                    distance_table[i] = NUMA_DISTANCE_MIN;
-                } else {
-                    distance_table[i] = numa_info[src].distance[dst];
-                    if (distance_table[i] < NUMA_DISTANCE_MIN) {
-                        distance_table[i] = NUMA_DISTANCE_DEFAULT;
-                    }
-                }
-            }
-            i++;
+            distance_table[i++] = get_numa_distance(ms, src, dst);
         }
     }
 
-- 
2.31.1


Re: [PATCH] spapr_numa.c: fix FORM1 distance-less nodes
Posted by Richard Henderson 2 years, 6 months ago
On 11/9/21 7:35 PM, Daniel Henrique Barboza wrote:
> Commit 71e6fae3a99 fixed an issue with FORM2 affinity guests with NUMA
> nodes in which the distance info is absent in
> machine_state->numa_state->nodes. This happens when QEMU adds a default
> NUMA node and when the user adds NUMA nodes without specifying the
> distances.
> 
> During the discussions of the forementioned patch [1] it was found that
> FORM1 guests were behaving in a strange way in the same scenario, with
> the kernel seeing the distances between the nodes as '160', as we can
> see in this example with 4 NUMA nodes without distance information:
> 
> $ numactl -H
> available: 4 nodes (0-3)
> (...)
> node distances:
> node   0   1   2   3
>    0:  10  160  160  160
>    1:  160  10  160  160
>    2:  160  160  10  160
>    3:  160  160  160  10
> 
> Turns out that we have the same problem with FORM1 guests - we are
> calculating associativity domain using zeroed values. And as it also
> turns out, the solution from 71e6fae3a99 applies for FORM1 as well.
> 
> This patch creates a wrapper called 'get_numa_distance' that contains
> the logic used in FORM2 to define node distances when this information
> is absent. This helper is then used in all places where we need to read
> distance information from machine_state->numa_state->nodes. That way
> we'll guarantee that the NUMA node distance is always being curated
> before being used.
> 
> After this patch, the FORM1 guest mentioned above will have the
> following topology:
> 
> $ numactl -H
> available: 4 nodes (0-3)
> (...)
> node distances:
> node   0   1   2   3
>    0:  10  20  20  20
>    1:  20  10  20  20
>    2:  20  20  10  20
>    3:  20  20  20  10
> 
> This is compatible with what FORM2 guests and other archs do in this
> case.
> 
> [1] https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg01960.html
> 
> CC: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> CC: Nicholas Piggin <npiggin@gmail.com>
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> ---
>   hw/ppc/spapr_numa.c | 62 ++++++++++++++++++++++-----------------------
>   1 file changed, 31 insertions(+), 31 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

Re: [PATCH] spapr_numa.c: fix FORM1 distance-less nodes
Posted by Daniel Henrique Barboza 2 years, 6 months ago

On 11/10/21 06:58, Richard Henderson wrote:
> On 11/9/21 7:35 PM, Daniel Henrique Barboza wrote:
>> Commit 71e6fae3a99 fixed an issue with FORM2 affinity guests with NUMA
>> nodes in which the distance info is absent in
>> machine_state->numa_state->nodes. This happens when QEMU adds a default
>> NUMA node and when the user adds NUMA nodes without specifying the
>> distances.
>>
>> During the discussions of the forementioned patch [1] it was found that
>> FORM1 guests were behaving in a strange way in the same scenario, with
>> the kernel seeing the distances between the nodes as '160', as we can
>> see in this example with 4 NUMA nodes without distance information:
>>
>> $ numactl -H
>> available: 4 nodes (0-3)
>> (...)
>> node distances:
>> node   0   1   2   3
>>    0:  10  160  160  160
>>    1:  160  10  160  160
>>    2:  160  160  10  160
>>    3:  160  160  160  10
>>
>> Turns out that we have the same problem with FORM1 guests - we are
>> calculating associativity domain using zeroed values. And as it also
>> turns out, the solution from 71e6fae3a99 applies for FORM1 as well.
>>
>> This patch creates a wrapper called 'get_numa_distance' that contains
>> the logic used in FORM2 to define node distances when this information
>> is absent. This helper is then used in all places where we need to read
>> distance information from machine_state->numa_state->nodes. That way
>> we'll guarantee that the NUMA node distance is always being curated
>> before being used.
>>
>> After this patch, the FORM1 guest mentioned above will have the
>> following topology:
>>
>> $ numactl -H
>> available: 4 nodes (0-3)
>> (...)
>> node distances:
>> node   0   1   2   3
>>    0:  10  20  20  20
>>    1:  20  10  20  20
>>    2:  20  20  10  20
>>    3:  20  20  20  10
>>
>> This is compatible with what FORM2 guests and other archs do in this
>> case.
>>
>> [1] https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg01960.html
>>
>> CC: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> CC: Nicholas Piggin <npiggin@gmail.com>
>> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
>> ---
>>   hw/ppc/spapr_numa.c | 62 ++++++++++++++++++++++-----------------------
>>   1 file changed, 31 insertions(+), 31 deletions(-)
> 
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

Thanks!

I forgot to add the "Fixes:" tag at the end of the commit msg though, so I
ended up sending a v2 adding it. Your r-b is kept in the v2.


Daniel


> 
> 
> r~