[PATCH v3] hw/riscv: qemu crash when NUMA nodes exceed available CPUs

Yin Wang posted 1 patch 11 months, 4 weeks ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20230512080346.1272337-1-yin.wang@intel.com
Maintainers: Palmer Dabbelt <palmer@dabbelt.com>, Alistair Francis <alistair.francis@wdc.com>, Bin Meng <bin.meng@windriver.com>, Weiwei Li <liweiwei@iscas.ac.cn>, Daniel Henrique Barboza <dbarboza@ventanamicro.com>, Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
There is a newer version of this series
[PATCH v3] hw/riscv: qemu crash when NUMA nodes exceed available CPUs
Posted by Yin Wang 11 months, 4 weeks ago
Command "qemu-system-riscv64 -machine virt
-m 2G -smp 1 -numa node,mem=1G -numa node,mem=1G"
would trigger this problem. Backtrace:
 #0  0x0000555555b5b1a4 in riscv_numa_get_default_cpu_node_id  at ../hw/riscv/numa.c:211
 #1  0x00005555558ce510 in machine_numa_finish_cpu_init  at ../hw/core/machine.c:1230
 #2  0x00005555558ce9d3 in machine_run_board_init  at ../hw/core/machine.c:1346
 #3  0x0000555555aaedc3 in qemu_init_board  at ../softmmu/vl.c:2513
 #4  0x0000555555aaf064 in qmp_x_exit_preconfig  at ../softmmu/vl.c:2609
 #5  0x0000555555ab1916 in qemu_init  at ../softmmu/vl.c:3617
 #6  0x000055555585463b in main  at ../softmmu/main.c:47
This commit fixes the issue by adding parameter checks.

Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com>
Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn>
Signed-off-by: Yin Wang <yin.wang@intel.com>
---
 hw/riscv/numa.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/hw/riscv/numa.c b/hw/riscv/numa.c
index 4720102561..a1bb312cd0 100644
--- a/hw/riscv/numa.c
+++ b/hw/riscv/numa.c
@@ -207,6 +207,12 @@ int64_t riscv_numa_get_default_cpu_node_id(const MachineState *ms, int idx)
 {
     int64_t nidx = 0;
 
+    if (ms->numa_state->num_nodes > ms->smp.cpus) {
+        error_report("Number of CPUs used by NUMA nodes (%d)"
+                     " cannot exceed the number of available CPUs (%d).",
+                     ms->numa_state->num_nodes, ms->smp.max_cpus);
+        exit(EXIT_FAILURE);
+    }
     if (ms->numa_state->num_nodes) {
         nidx = idx / (ms->smp.cpus / ms->numa_state->num_nodes);
         if (ms->numa_state->num_nodes <= nidx) {
-- 
2.34.1
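
Editorial sketch (not part of the patch): frame #0 of the backtrace points at line 211 of hw/riscv/numa.c, which in the hunk above is the expression idx / (ms->smp.cpus / ms->numa_state->num_nodes). With -smp 1 and two NUMA nodes the inner integer division yields 0, so the outer division appears to be a divide-by-zero. The standalone C below reproduces only that arithmetic. The struct topo type and both helper names are hypothetical stand-ins, not QEMU code.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the two MachineState fields the patch reads
 * (ms->smp.cpus and ms->numa_state->num_nodes). This is NOT the QEMU type.
 */
struct topo {
    unsigned int cpus;   /* plays the role of ms->smp.cpus */
    int num_nodes;       /* plays the role of ms->numa_state->num_nodes */
};

/* Divisor used by the unpatched mapping: CPUs per node, rounded down. */
static unsigned int cpus_per_node(const struct topo *t)
{
    assert(t->num_nodes > 0);
    return t->cpus / (unsigned int)t->num_nodes;
}

/*
 * Returns 1 when evaluating idx / cpus_per_node() would divide by zero,
 * i.e. when there are more NUMA nodes than CPUs.
 */
static int mapping_would_trap(const struct topo *t)
{
    return t->num_nodes > 0 && cpus_per_node(t) == 0;
}
```

For the reported command line (cpus = 1, num_nodes = 2), cpus_per_node() returns 0 and mapping_would_trap() returns 1. With cpus = 4 and num_nodes = 2 the divisor is 2 and the mapping is safe.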
Re: [PATCH v3] hw/riscv: qemu crash when NUMA nodes exceed available CPUs
Posted by Daniel Henrique Barboza 11 months, 4 weeks ago

On 5/12/23 05:03, Yin Wang wrote:
> Command "qemu-system-riscv64 -machine virt
> -m 2G -smp 1 -numa node,mem=1G -numa node,mem=1G"
> would trigger this problem. Backtrace:
>   #0  0x0000555555b5b1a4 in riscv_numa_get_default_cpu_node_id  at ../hw/riscv/numa.c:211
>   #1  0x00005555558ce510 in machine_numa_finish_cpu_init  at ../hw/core/machine.c:1230
>   #2  0x00005555558ce9d3 in machine_run_board_init  at ../hw/core/machine.c:1346
>   #3  0x0000555555aaedc3 in qemu_init_board  at ../softmmu/vl.c:2513
>   #4  0x0000555555aaf064 in qmp_x_exit_preconfig  at ../softmmu/vl.c:2609
>   #5  0x0000555555ab1916 in qemu_init  at ../softmmu/vl.c:3617
>   #6  0x000055555585463b in main  at ../softmmu/main.c:47
> This commit fixes the issue by adding parameter checks.
> 
> Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com>
> Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn>
> Signed-off-by: Yin Wang <yin.wang@intel.com>
> ---
>   hw/riscv/numa.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/hw/riscv/numa.c b/hw/riscv/numa.c
> index 4720102561..a1bb312cd0 100644
> --- a/hw/riscv/numa.c
> +++ b/hw/riscv/numa.c
> @@ -207,6 +207,12 @@ int64_t riscv_numa_get_default_cpu_node_id(const MachineState *ms, int idx)
>   {
>       int64_t nidx = 0;
>   
> +    if (ms->numa_state->num_nodes > ms->smp.cpus) {
> +        error_report("Number of CPUs used by NUMA nodes (%d)"
> +                     " cannot exceed the number of available CPUs (%d).",
> +                     ms->numa_state->num_nodes, ms->smp.max_cpus);
> +        exit(EXIT_FAILURE);
> +    }


IMO you should just say

"Number of NUMA nodes (%d) cannot exceed the number of available CPUs (%d)."

First, because "ms->numa_state->num_nodes" is the number of NUMA nodes, not the number
of CPUs used by NUMA nodes (which can be higher). Second, it gets right to the point:
we do not support CPU-less NUMA nodes in the 'virt' machine.


Assuming you agree with this change:


Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>

>       if (ms->numa_state->num_nodes) {
>           nidx = idx / (ms->smp.cpus / ms->numa_state->num_nodes);
>           if (ms->numa_state->num_nodes <= nidx) {
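
Editorial sketch (not part of the thread): one way the check could look with the message wording suggested in the review. This is a standalone illustration under stated assumptions, not the code that was eventually merged. The function name check_numa_nodes is hypothetical. It returns an error code and fills a buffer instead of calling error_report() and exit(EXIT_FAILURE), so that it compiles outside QEMU. It also prints the CPU count with %u and reports the same value it compares against, which is an assumption on my part rather than something requested in the review.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of QEMU.
 * Returns 0 when every NUMA node can be given at least one CPU.
 * Returns -1 and writes a diagnostic into buf when there are more
 * NUMA nodes than CPUs (the case that crashed in the report).
 */
static int check_numa_nodes(int num_nodes, unsigned int cpus,
                            char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    if (num_nodes > 0 && (unsigned int)num_nodes > cpus) {
        /* Wording follows the reviewer's suggestion. */
        snprintf(buf, len,
                 "Number of NUMA nodes (%d) cannot exceed"
                 " the number of available CPUs (%u).",
                 num_nodes, cpus);
        return -1;
    }

    buf[0] = '\0';   /* no error: leave an empty message */
    return 0;
}
```

Called with num_nodes = 2 and cpus = 1, as in the reported command line, the helper returns -1 and the buffer holds the suggested message with both counts filled in. With cpus = 4 it returns 0 and the buffer is empty.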