[PATCH v7 7/7] spapr_numa.c: handle auto NUMA node with no distance info

Daniel Henrique Barboza posted 7 patches 4 years, 4 months ago
Maintainers: Greg Kurz <groug@kaod.org>, David Gibson <david@gibson.dropbear.id.au>
There is a newer version of this series
[PATCH v7 7/7] spapr_numa.c: handle auto NUMA node with no distance info
Posted by Daniel Henrique Barboza 4 years, 4 months ago
numa_complete_configuration() in hw/core/numa.c always adds a NUMA node
for the pSeries machine if none was specified, but without node distance
information for the single node created. This added node is also not
accounted for in numa_state->num_nodes, which returns zero.

NUMA FORM1 affinity code didn't rely on numa_state information to do its
job, but FORM2 does. As is now, this is the result of a pSeries guest
with NUMA FORM2 affinity when no NUMA nodes is specified:

$ numactl -H available: 1 nodes (0) node 0 cpus: 0 node 0 size: 16222 MB
node 0 free: 15681 MB No distance information available.

This can be amended in spapr_numa_FORM2_write_rtas_tables(). We're
always expecting at least one NUMA node, and we're going to enforce that
the local distance (the distance to the node to itself) is always 10.
This allows for the proper creation of the NUMA distance tables, fixing
the output of 'numactl -H' in the guest:

$ numactl -H available: 1 nodes (0) node 0 cpus: 0 node 0 size: 16222 MB
node 0 free: 15685 MB node distances: node   0
  0:  10

CC: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---

Igor,

CCing you as a FYI. I wasn't sure whether there is a reason for
numa_complete_configuration() not adding distance info an update 'num_nodes'
for the auto-generated NUMA node, I decided to handle this case in
pseries side instead.



 hw/ppc/spapr_numa.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
index 659513b405..d8caf5f6bd 100644
--- a/hw/ppc/spapr_numa.c
+++ b/hw/ppc/spapr_numa.c
@@ -499,7 +499,7 @@ static void spapr_numa_FORM2_write_rtas_tables(SpaprMachineState *spapr,
 {
     MachineState *ms = MACHINE(spapr);
     NodeInfo *numa_info = ms->numa_state->nodes;
-    int nb_numa_nodes = ms->numa_state->num_nodes;
+    int nb_numa_nodes = ms->numa_state->num_nodes ?: 1;
     int distance_table_entries = nb_numa_nodes * nb_numa_nodes;
     g_autofree uint32_t *lookup_index_table = NULL;
     g_autofree uint32_t *distance_table = NULL;
@@ -539,6 +539,17 @@ static void spapr_numa_FORM2_write_rtas_tables(SpaprMachineState *spapr,
 
     for (src = 0; src < nb_numa_nodes; src++) {
         for (dst = 0; dst < nb_numa_nodes; dst++) {
+            /*
+             * We need to be explicit with the local distance
+             * value to cover the case where the user didn't added any
+             * NUMA nodes, but QEMU adds the default NUMA node without
+             * adding the numa_info to retrieve the info from.
+             */
+            if (src == dst) {
+                node_distances[i++] = 10;
+                continue;
+            }
+
             node_distances[i++] = numa_info[src].distance[dst];
         }
     }
-- 
2.31.1


Re: [PATCH v7 7/7] spapr_numa.c: handle auto NUMA node with no distance info
Posted by Igor Mammedov 4 years, 4 months ago
On Wed, 15 Sep 2021 22:30:04 -0300
Daniel Henrique Barboza <danielhb413@gmail.com> wrote:

> numa_complete_configuration() in hw/core/numa.c always adds a NUMA node
> for the pSeries machine if none was specified, but without node distance
> information for the single node created.

distance is optional feature, hence generic auto create magic doesn't
do anything with it (it does bare minimum for memhotplug to work).
I'd like to drop auto-generated node altogether and ask user to explicitly
provide needed -numa options (now with deprecation it's possible) if it's required.

> This added node is also not
> accounted for in numa_state->num_nodes, which returns zero.

that's probably a bug, parse_numa_node() should always increments on success,
can you check why it doesn't happen in your case?

> NUMA FORM1 affinity code didn't rely on numa_state information to do its
> job, but FORM2 does. As is now, this is the result of a pSeries guest
> with NUMA FORM2 affinity when no NUMA nodes is specified:
> 
> $ numactl -H available: 1 nodes (0) node 0 cpus: 0 node 0 size: 16222 MB
> node 0 free: 15681 MB No distance information available.
> 
> This can be amended in spapr_numa_FORM2_write_rtas_tables(). We're
> always expecting at least one NUMA node, and we're going to enforce that
> the local distance (the distance to the node to itself) is always 10.
> This allows for the proper creation of the NUMA distance tables, fixing
> the output of 'numactl -H' in the guest:
> 
> $ numactl -H available: 1 nodes (0) node 0 cpus: 0 node 0 size: 16222 MB
> node 0 free: 15685 MB node distances: node   0
>   0:  10
> 
> CC: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> ---
> 
> Igor,
> 
> CCing you as a FYI. I wasn't sure whether there is a reason for
> numa_complete_configuration() not adding distance info an update 'num_nodes'
> for the auto-generated NUMA node, I decided to handle this case in
> pseries side instead.
> 
> 
> 
>  hw/ppc/spapr_numa.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
> index 659513b405..d8caf5f6bd 100644
> --- a/hw/ppc/spapr_numa.c
> +++ b/hw/ppc/spapr_numa.c
> @@ -499,7 +499,7 @@ static void spapr_numa_FORM2_write_rtas_tables(SpaprMachineState *spapr,
>  {
>      MachineState *ms = MACHINE(spapr);
>      NodeInfo *numa_info = ms->numa_state->nodes;
> -    int nb_numa_nodes = ms->numa_state->num_nodes;
> +    int nb_numa_nodes = ms->numa_state->num_nodes ?: 1;
>      int distance_table_entries = nb_numa_nodes * nb_numa_nodes;
>      g_autofree uint32_t *lookup_index_table = NULL;
>      g_autofree uint32_t *distance_table = NULL;
> @@ -539,6 +539,17 @@ static void spapr_numa_FORM2_write_rtas_tables(SpaprMachineState *spapr,
>  
>      for (src = 0; src < nb_numa_nodes; src++) {
>          for (dst = 0; dst < nb_numa_nodes; dst++) {
> +            /*
> +             * We need to be explicit with the local distance
> +             * value to cover the case where the user didn't added any
> +             * NUMA nodes, but QEMU adds the default NUMA node without
> +             * adding the numa_info to retrieve the info from.
> +             */
> +            if (src == dst) {
> +                node_distances[i++] = 10;
> +                continue;
> +            }
> +
>              node_distances[i++] = numa_info[src].distance[dst];
>          }
>      }


Re: [PATCH v7 7/7] spapr_numa.c: handle auto NUMA node with no distance info
Posted by Daniel Henrique Barboza 4 years, 4 months ago

On 9/17/21 05:25, Igor Mammedov wrote:
> On Wed, 15 Sep 2021 22:30:04 -0300
> Daniel Henrique Barboza <danielhb413@gmail.com> wrote:
> 
>> numa_complete_configuration() in hw/core/numa.c always adds a NUMA node
>> for the pSeries machine if none was specified, but without node distance
>> information for the single node created.
> 
> distance is optional feature, hence generic auto create magic doesn't
> do anything with it (it does bare minimum for memhotplug to work).
> I'd like to drop auto-generated node altogether and ask user to explicitly
> provide needed -numa options (now with deprecation it's possible) if it's required.

This means that there's a need to handle the case of local distance for a
single NUMA node. Thanks for confirming that.

> 
>> This added node is also not
>> accounted for in numa_state->num_nodes, which returns zero.
> 
> that's probably a bug, parse_numa_node() should always increments on success,
> can you check why it doesn't happen in your case?

After checking the problem further I realized that this isn't true.
numa_state->num_nodes is correctly reporting size = 1 in this case as
well.

The problem is just the absence of node distance information for the
auto-generated NUMA node. I'll fix both the code and commit message
in the next version.


Thanks,


Daniel

> 
>> NUMA FORM1 affinity code didn't rely on numa_state information to do its
>> job, but FORM2 does. As is now, this is the result of a pSeries guest
>> with NUMA FORM2 affinity when no NUMA nodes is specified:
>>
>> $ numactl -H available: 1 nodes (0) node 0 cpus: 0 node 0 size: 16222 MB
>> node 0 free: 15681 MB No distance information available.
>>
>> This can be amended in spapr_numa_FORM2_write_rtas_tables(). We're
>> always expecting at least one NUMA node, and we're going to enforce that
>> the local distance (the distance to the node to itself) is always 10.
>> This allows for the proper creation of the NUMA distance tables, fixing
>> the output of 'numactl -H' in the guest:
>>
>> $ numactl -H available: 1 nodes (0) node 0 cpus: 0 node 0 size: 16222 MB
>> node 0 free: 15685 MB node distances: node   0
>>    0:  10
>>
>> CC: Igor Mammedov <imammedo@redhat.com>
>> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
>> ---
>>
>> Igor,
>>
>> CCing you as a FYI. I wasn't sure whether there is a reason for
>> numa_complete_configuration() not adding distance info an update 'num_nodes'
>> for the auto-generated NUMA node, I decided to handle this case in
>> pseries side instead.
>>
>>
>>
>>   hw/ppc/spapr_numa.c | 13 ++++++++++++-
>>   1 file changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
>> index 659513b405..d8caf5f6bd 100644
>> --- a/hw/ppc/spapr_numa.c
>> +++ b/hw/ppc/spapr_numa.c
>> @@ -499,7 +499,7 @@ static void spapr_numa_FORM2_write_rtas_tables(SpaprMachineState *spapr,
>>   {
>>       MachineState *ms = MACHINE(spapr);
>>       NodeInfo *numa_info = ms->numa_state->nodes;
>> -    int nb_numa_nodes = ms->numa_state->num_nodes;
>> +    int nb_numa_nodes = ms->numa_state->num_nodes ?: 1;
>>       int distance_table_entries = nb_numa_nodes * nb_numa_nodes;
>>       g_autofree uint32_t *lookup_index_table = NULL;
>>       g_autofree uint32_t *distance_table = NULL;
>> @@ -539,6 +539,17 @@ static void spapr_numa_FORM2_write_rtas_tables(SpaprMachineState *spapr,
>>   
>>       for (src = 0; src < nb_numa_nodes; src++) {
>>           for (dst = 0; dst < nb_numa_nodes; dst++) {
>> +            /*
>> +             * We need to be explicit with the local distance
>> +             * value to cover the case where the user didn't added any
>> +             * NUMA nodes, but QEMU adds the default NUMA node without
>> +             * adding the numa_info to retrieve the info from.
>> +             */
>> +            if (src == dst) {
>> +                node_distances[i++] = 10;
>> +                continue;
>> +            }
>> +
>>               node_distances[i++] = numa_info[src].distance[dst];
>>           }
>>       }
>