From nobody Tue Apr 30 09:21:21 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=linux.ibm.com ARC-Seal: i=1; a=rsa-sha256; t=1564562504; cv=none; d=zoho.com; s=zohoarc; b=I6Co5hX/qkoCUL4V4v/J7ba7Hw4J0xmYNcH84Whvx1Q6lhH58ZzatABXyglTUJ79fajjqvhJycck9rstHIt3t8l/hXytbmHg/zeFC4kS0UMT/Zai19k8xgo9WYlDr4zQPJLbaPCfa0JwFk27ExWkU0Oe+OJ2ya6svZSPYGHaMPQ= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zoho.com; s=zohoarc; t=1564562504; h=Content-Transfer-Encoding:Cc:Date:From:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:Sender:Subject:To:ARC-Authentication-Results; bh=+Fu3j7OszI2V3e6Vc9UkkY4Ag+Fp5dulwHmT0Ynqvik=; b=DisekT7JZPNx16hqlx+9zFM4ELs10n5NJcUAS762hBPLCQGZ9hvh5AQ3HgARiXZsnRQzZ/fPbDR1dGa5RZrzVcjAqm4EgdrZV0m46BXGlW+N9A4wrUSAlfZ0OWiQbKAPB6oe0NyWucq/wZuRtOjuDfGxZcDnoUy2cnNBvdP736g= ARC-Authentication-Results: i=1; mx.zoho.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail header.from= (p=none dis=none) header.from= Return-Path: Received: from lists.gnu.org (209.51.188.17 [209.51.188.17]) by mx.zohomail.com with SMTPS id 1564562504552678.9740293838528; Wed, 31 Jul 2019 01:41:44 -0700 (PDT) Received: from localhost ([::1]:38832 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.86_2) (envelope-from ) id 1hskB0-0003zR-4k for importer@patchew.org; Wed, 31 Jul 2019 04:41:38 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:45234) by lists.gnu.org with esmtp (Exim 4.86_2) (envelope-from ) id 1hskAO-00037C-H7 for qemu-devel@nongnu.org; Wed, 31 Jul 2019 04:41:03 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1hskAL-0002g0-Od for qemu-devel@nongnu.org; Wed, 31 Jul 2019 04:41:00 -0400 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:5526) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1hskAL-0002fd-HC for qemu-devel@nongnu.org; Wed, 31 Jul 2019 04:40:57 -0400 Received: from pps.filterd (m0098404.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x6V8biYi011710 for ; Wed, 31 Jul 2019 04:40:56 -0400 Received: from e06smtp05.uk.ibm.com (e06smtp05.uk.ibm.com [195.75.94.101]) by mx0a-001b2d01.pphosted.com with ESMTP id 2u37n5040v-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Wed, 31 Jul 2019 04:40:56 -0400 Received: from localhost by e06smtp05.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Wed, 31 Jul 2019 09:40:49 +0100 Received: from b06cxnps4074.portsmouth.uk.ibm.com (9.149.109.196) by e06smtp05.uk.ibm.com (192.168.101.135) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Wed, 31 Jul 2019 09:40:44 +0100 Received: from d06av22.portsmouth.uk.ibm.com (d06av22.portsmouth.uk.ibm.com [9.149.105.58]) by b06cxnps4074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id x6V8ehCC50790460 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 31 Jul 2019 08:40:43 GMT Received: from d06av22.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 768AC4C040; Wed, 31 Jul 2019 08:40:43 +0000 (GMT) Received: from d06av22.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 8ED864C064; Wed, 31 Jul 2019 08:40:42 +0000 (GMT) Received: from ozlabs.au.ibm.com (unknown [9.192.253.14]) by d06av22.portsmouth.uk.ibm.com (Postfix) with ESMTP; Wed, 31 Jul 2019 08:40:42 +0000 (GMT) Received: from volution.fritz.box (unknown [9.102.40.183]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.au.ibm.com (Postfix) with ESMTPSA id E55F6A0224; Wed, 31 Jul 2019 18:40:40 +1000 (AEST) From: Daniel Black To: qemu-devel@nongnu.org Date: Wed, 31 Jul 2019 18:40:33 +1000 X-Mailer: git-send-email 2.21.0 MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-TM-AS-GCONF: 00 x-cbid: 19073108-0020-0000-0000-00000358E5D0 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 19073108-0021-0000-0000-000021ACEE49 Message-Id: <20190731084034.9498-1-daniel@linux.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2019-07-31_04:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1011 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1906280000 definitions=main-1907310093 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [generic] X-Received-From: 148.163.156.1 Subject: [Qemu-devel] [PATCH] numa: numa nodeid need not be sequential X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Peter Maydell , Radoslaw Biernacki , Eduardo Habkost , "Michael S. Tsirkin" , Leif Lindholm , Shannon Zhao , "open list:ARM TCG CPUs" , "open list:sPAPR" , Daniel Black , Paolo Bonzini , Igor Mammedov , David Gibson , Richard Henderson Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" Content-Type: text/plain; charset="utf-8" Replace all nodeid assumptions with lookups in the numa_info array and remove aspects that assume a sequential numbering of nodes. This enables non-sequential NUMA node number topoligies to be created. Default assignments of CPU->nodeid (get_default_cpu_node_id) now return a nodeid from the numa_info array. x86 will use the node is as the Proximity Domain (which the linux kernel will map down to sequential node numbers). In Linux kernel output look at the RAT: and PXM: references in the kernel early boot. Small enhancements where made to error messages to be more explicit about errors in node specification. Signed-off-by: Daniel Black --- Below is a TCG test script that covers ppc64le/x86_64/aarch64 architecutes: Uncomment run* lines as needed. Manually complete steps at end to verify: Caution, brutal killalls contained within. #!/bin/bash set -x -v QEMUHOME=3D${HOME}/repos/qemu/ # optional but make it easy to install/run numactl --hardware #ALPINE_NET=3D"" ALPINE_NET=3D"ip=3Ddhcp alpine_repo=3Dhttp://dl-cdn.alpinelinux.org/alpine/= edge/main/" ALPINE_HOME=3D${HOME}/repos/alpine/alpine-netboot-3.10.1- # x86 / armv7 - no CONFIG_NUMA=3Dy support in kernel # Kernel configs: https://git.alpinelinux.org/aports/tree/main/linux-vanill= a/ # s390x - no numa support in QEMU for ARCH in x86_64 aarch64 ppc64le do if [ ! -d ${ALPINE_HOME}${ARCH} ] then mkdir ${ALPINE_HOME}${ARCH} wget http://dl-cdn.alpinelinux.org/alpine/v3.10/releases/${ARCH}/alpine-= netboot-3.10.1-${ARCH}.tar.gz -O - | tar -zxf - -C ${ALPINE_HOME}${ARCH} fi done if [ ! -x ${ALPINE_HOME}i386 ] then ln -s ${ALPINE_HOME}x86 ${ALPINE_HOME}i386 fi if [ ! -x ${ALPINE_HOME}arm ] then ln -s ${ALPINE_HOME}armv7 ${ALPINE_HOME}arm fi if [ ! -x ${ALPINE_HOME}ppc64 ] then ln -s ${ALPINE_HOME}ppc64le ${ALPINE_HOME}ppc64 fi # Note "virtual" kernels don't have numa enabled run() { NUMA=3D$1 ARCH=3D$2 ARGS=3D$3 CONSOLE=3D$4 ${QEMUHOME}/${ARCH}-softmmu/qemu-system-${ARCH} \ ${ARGS} \ -kernel ${ALPINE_HOME}${ARCH}/boot/vmlinuz-vanilla \ -initrd ${ALPINE_HOME}${ARCH}/boot/initramfs-vanilla \ -append "${CONSOLE} ${ALPINE_NET}" \ -m 2G \ ${NUMA} echo } # This ends up as odd: # ends up with both CPUs are on same node # as 0 and 8 % 2 (nodes) are the same # in short - don't run legacy with gaps with # odd numa node numbers (like 0 and 8). run_legacy() { run "-smp 2,cores=3D3,sockets=3D2,maxcpus=3D6 \ -numa node,mem=3D1G \ -numa node,mem=3D1G,nodeid=3D8 \ -numa dist,src=3D0,dst=3D8,val=3D21" "$@" } run_memdev_implicit_core() { run "-smp cpus=3D6,maxcpus=3D8,cores=3D4,sockets=3D2 \ -object memory-backend-ram,id=3Dram0,size=3D1G \ -object memory-backend-ram,id=3Dram1,size=3D1G \ -numa node,memdev=3Dram0,nodeid=3D0 \ -numa node,memdev=3Dram1,nodeid=3D8 \ -numa dist,src=3D0,dst=3D8,val=3D21" "$@" } run_memdev_explicit_core() { run "-smp cpus=3D6,maxcpus=3D8,cores=3D4,sockets=3D2 \ -object memory-backend-ram,id=3Dram0,size=3D1G \ -object memory-backend-ram,id=3Dram1,size=3D1G \ -numa node,memdev=3Dram0,cpus=3D0-3,nodeid=3D0 \ -numa node,memdev=3Dram1,cpus=3D4-7,nodeid=3D8 \ -numa dist,src=3D0,dst=3D8,val=3D21" "$@" } for arch in x86_64 ppc64 aarch64 s390x; do killall qemu-system-$arch; done killall vncviewer # i386 Alpine kernels don't have NUMA #run_memdev_implicit_core i386 "-machine pc -nographic" console=3DttyS0 # armv7 kernel's don't have NUMA #run_legacy arm "-machine virt -cpu cortex-a15 -nographic" console=3DttyAMA0 # GOOD run_legacy x86_64 "-machine pc -nographic" console=3DttyS0 run_memdev_implicit_core x86_64 "-machine pc -nographic" console=3DttyS0 run_memdev_explicit_core x86_64 "-machine pc -nographic" console=3DttyS0 # GOOD #run_legacy aarch64 "-machine virt -cpu cortex-a57 -nographic" console=3Dtt= yAMA0 #run_memdev_implicit_core aarch64 "-machine virt -cpu cortex-a57 -nographi= c" console=3DttyAMA0 #run_memdev_explicit_core aarch64 "-machine virt -cpu cortex-a57 -nographi= c" console=3DttyAMA0 # PPC not doing numa distance (not a regression) #(sleep 1; vncviewer :0) & # GOOD # run_legacy ppc64 "-machine pseries -cpu POWER9 -display vnc=3D:0" "numa= =3Ddebug" # run_memdev_implicit_core ppc64 "-machine pseries -cpu POWER9 -display vnc= =3D:0" "numa=3Ddebug" # run_memdev_explicit_core ppc64 "-machine pseries -cpu POWER9 -display vnc= =3D:0" "numa=3Ddebug" # ON P8 ppc64le host: # run_memdev_implicit_core ppc64 "-machine pseries -cpu host -accel kvm -di= splay vnc=3D:0" "numa=3Ddebug" # Couldn't boot Alpine ARM kernel on this machine type: # arm sbsa ref - appears to be a BMC so not really a numa target? # seems ok looking at the results of sbsa_ref_get_default_cpu_node_id howev= er it display no # output when booting # run_legacy aarch64 "-machine sbsa-ref -nographic" console=3DttyAMA0 # Then run: # sh -c 'apk add numactl-tools && numactl --hardware' # # alternately examine results in: # ls -la /sys/devices/system/node/node*/cpu* # more /sys/devices/system/node/node*/distance # # x86 node numbers are renumbered by kernel. To view # acpi mappings: # dmesg | grep PXM: hw/acpi/aml-build.c | 37 +++++++++++++++++++---- hw/arm/boot.c | 2 +- hw/arm/sbsa-ref.c | 6 ++-- hw/arm/virt-acpi-build.c | 4 +-- hw/arm/virt.c | 6 ++-- hw/core/numa.c | 63 +++++++++++++++++----------------------- hw/i386/acpi-build.c | 10 ++++--- hw/i386/pc.c | 2 +- hw/ppc/spapr.c | 9 +++--- include/sysemu/numa.h | 1 + 10 files changed, 79 insertions(+), 61 deletions(-) diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c index 555c24f21d..4abfac61aa 100644 --- a/hw/acpi/aml-build.c +++ b/hw/acpi/aml-build.c @@ -1722,22 +1722,47 @@ void build_srat_memory(AcpiSratMemoryAffinity *numa= mem, uint64_t base, numamem->range_length =3D cpu_to_le64(len); } =20 +static int find_numa_node(int node) +{ + int i; + for (i =3D 0; i < nb_numa_nodes && numa_info[i].nodeid !=3D node; i++)= { + /* empty */ + } + return i; +} /* * ACPI spec 5.2.17 System Locality Distance Information Table * (Revision 2.0 or later) */ void build_slit(GArray *table_data, BIOSLinker *linker) { - int slit_start, i, j; + int slit_start, i, j, src, dst, largest; slit_start =3D table_data->len; =20 acpi_data_push(table_data, sizeof(AcpiTableHeader)); =20 - build_append_int_noprefix(table_data, nb_numa_nodes, 8); - for (i =3D 0; i < nb_numa_nodes; i++) { - for (j =3D 0; j < nb_numa_nodes; j++) { - assert(numa_info[i].distance[j]); - build_append_int_noprefix(table_data, numa_info[i].distance[j]= , 1); + for (largest =3D 0, i =3D 0; i < nb_numa_nodes; i++) + if (largest < numa_info[i].nodeid) { + largest =3D numa_info[i].nodeid; + } + + /* number of entries is largest + 1 as nodes start at 0 */ + build_append_int_noprefix(table_data, largest + 1, 8); + + for (i =3D 0; i <=3D largest; i++) { + src =3D find_numa_node(i); + for (j =3D 0; j <=3D largest; j++) { + dst =3D find_numa_node(j); + + if (dst =3D=3D nb_numa_nodes || src =3D=3D nb_numa_nodes) { + /* 255 is unreachable. Linux expects 10 in self-maps entri= es */ + build_append_int_noprefix(table_data, + i =3D=3D j ? NUMA_DISTANCE_MIN := 255, 1); + } else { + assert(numa_info[src].distance[dst]); + build_append_int_noprefix(table_data, + numa_info[src].distance[dst], 1); + } } } =20 diff --git a/hw/arm/boot.c b/hw/arm/boot.c index 1fb24fbef2..1a68fe957f 100644 --- a/hw/arm/boot.c +++ b/hw/arm/boot.c @@ -603,7 +603,7 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_inf= o *binfo, for (i =3D 0; i < nb_numa_nodes; i++) { mem_len =3D numa_info[i].node_mem; rc =3D fdt_add_memory_node(fdt, acells, mem_base, - scells, mem_len, i); + scells, mem_len, numa_info[i].nodeid); if (rc < 0) { fprintf(stderr, "couldn't add /memory@%"PRIx64" node\n", mem_base); diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c index 9c67d5c6f9..d33c42ac11 100644 --- a/hw/arm/sbsa-ref.c +++ b/hw/arm/sbsa-ref.c @@ -165,8 +165,8 @@ static void create_fdt(SBSAMachineState *sms) for (i =3D 0; i < nb_numa_nodes; i++) { for (j =3D 0; j < nb_numa_nodes; j++) { idx =3D (i * nb_numa_nodes + j) * 3; - matrix[idx + 0] =3D cpu_to_be32(i); - matrix[idx + 1] =3D cpu_to_be32(j); + matrix[idx + 0] =3D cpu_to_be32(numa_info[i].nodeid); + matrix[idx + 1] =3D cpu_to_be32(numa_info[j].nodeid); matrix[idx + 2] =3D cpu_to_be32(numa_info[i].distance[j]); } } @@ -761,7 +761,7 @@ sbsa_ref_cpu_index_to_props(MachineState *ms, unsigned = cpu_index) static int64_t sbsa_ref_get_default_cpu_node_id(const MachineState *ms, int idx) { - return idx % nb_numa_nodes; + return numa_info[idx % nb_numa_nodes].nodeid; } =20 static void sbsa_ref_instance_init(Object *obj) diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c index 0afb372769..3be79dc735 100644 --- a/hw/arm/virt-acpi-build.c +++ b/hw/arm/virt-acpi-build.c @@ -535,8 +535,8 @@ build_srat(GArray *table_data, BIOSLinker *linker, Virt= MachineState *vms) for (i =3D 0; i < nb_numa_nodes; ++i) { if (numa_info[i].node_mem > 0) { numamem =3D acpi_data_push(table_data, sizeof(*numamem)); - build_srat_memory(numamem, mem_base, numa_info[i].node_mem, i, - MEM_AFFINITY_ENABLED); + build_srat_memory(numamem, mem_base, numa_info[i].node_mem, + numa_info[i].nodeid, MEM_AFFINITY_ENABLED); mem_base +=3D numa_info[i].node_mem; } } diff --git a/hw/arm/virt.c b/hw/arm/virt.c index d9496c9363..c380659bd4 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -238,8 +238,8 @@ static void create_fdt(VirtMachineState *vms) for (i =3D 0; i < nb_numa_nodes; i++) { for (j =3D 0; j < nb_numa_nodes; j++) { idx =3D (i * nb_numa_nodes + j) * 3; - matrix[idx + 0] =3D cpu_to_be32(i); - matrix[idx + 1] =3D cpu_to_be32(j); + matrix[idx + 0] =3D cpu_to_be32(numa_info[i].nodeid); + matrix[idx + 1] =3D cpu_to_be32(numa_info[j].nodeid); matrix[idx + 2] =3D cpu_to_be32(numa_info[i].distance[j]); } } @@ -1844,7 +1844,7 @@ virt_cpu_index_to_props(MachineState *ms, unsigned cp= u_index) =20 static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int id= x) { - return idx % nb_numa_nodes; + return numa_info[idx % nb_numa_nodes].nodeid; } =20 static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms) diff --git a/hw/core/numa.c b/hw/core/numa.c index a11431483c..b3fff6d1eb 100644 --- a/hw/core/numa.c +++ b/hw/core/numa.c @@ -47,13 +47,19 @@ QemuOptsList qemu_numa_opts =3D { =20 static int have_memdevs; static int have_mem; -static int max_numa_nodeid; /* Highest specified NUMA node ID, plus one. - * For all nodes, nodeid < max_numa_nodeid - */ int nb_numa_nodes; bool have_numa_distance; NodeInfo numa_info[MAX_NODES]; =20 +static int find_numa(uint16_t node) +{ + for (int i =3D 0; i < nb_numa_nodes; i++) { + if (numa_info[i].present && numa_info[i].nodeid =3D=3D node) { + return i; + } + } + return MAX_NODES; +} =20 static void parse_numa_node(MachineState *ms, NumaNodeOptions *node, Error **errp) @@ -64,19 +70,15 @@ static void parse_numa_node(MachineState *ms, NumaNodeO= ptions *node, MachineClass *mc =3D MACHINE_GET_CLASS(ms); unsigned int max_cpus =3D ms->smp.max_cpus; =20 - if (node->has_nodeid) { - nodenr =3D node->nodeid; - } else { - nodenr =3D nb_numa_nodes; - } + nodenr =3D nb_numa_nodes; =20 - if (nodenr >=3D MAX_NODES) { + if (nb_numa_nodes >=3D MAX_NODES) { error_setg(errp, "Max number of NUMA nodes reached: %" PRIu16 "", nodenr); return; } =20 - if (numa_info[nodenr].present) { + if (node->has_nodeid && find_numa(node->nodeid) !=3D MAX_NODES) { error_setg(errp, "Duplicate NUMA nodeid: %" PRIu16, nodenr); return; } @@ -95,7 +97,7 @@ static void parse_numa_node(MachineState *ms, NumaNodeOpt= ions *node, return; } props =3D mc->cpu_index_to_instance_props(ms, cpus->value); - props.node_id =3D nodenr; + props.node_id =3D node->has_nodeid ? node->nodeid : nodenr; props.has_node_id =3D true; machine_set_cpu_numa_node(ms, &props, &err); if (err) { @@ -132,24 +134,24 @@ static void parse_numa_node(MachineState *ms, NumaNod= eOptions *node, numa_info[nodenr].node_memdev =3D MEMORY_BACKEND(o); } numa_info[nodenr].present =3D true; - max_numa_nodeid =3D MAX(max_numa_nodeid, nodenr + 1); + numa_info[nodenr].nodeid =3D node->has_nodeid ? node->nodeid : nb_numa= _nodes; nb_numa_nodes++; } =20 static void parse_numa_distance(NumaDistOptions *dist, Error **errp) { - uint16_t src =3D dist->src; - uint16_t dst =3D dist->dst; + uint16_t src =3D find_numa(dist->src); + uint16_t dst =3D find_numa(dist->dst); uint8_t val =3D dist->val; =20 - if (src >=3D MAX_NODES || dst >=3D MAX_NODES) { - error_setg(errp, "Parameter '%s' expects an integer between 0 and = %d", - src >=3D MAX_NODES ? "src" : "dst", MAX_NODES - 1); + if (src >=3D MAX_NODES || !numa_info[src].present) { + error_setg(errp, "Source NUMA node is missing. " + "Please use '-numa node' option to declare it first."); return; } =20 - if (!numa_info[src].present || !numa_info[dst].present) { - error_setg(errp, "Source/Destination NUMA node is missing. " + if (dst >=3D MAX_NODES || !numa_info[dst].present) { + error_setg(errp, "Destination NUMA node is missing. " "Please use '-numa node' option to declare it first."); return; } @@ -193,7 +195,7 @@ void set_numa_options(MachineState *ms, NumaOptions *ob= ject, Error **errp) error_setg(&err, "Missing mandatory node-id property"); goto end; } - if (!numa_info[object->u.cpu.node_id].present) { + if (find_numa(object->u.cpu.node_id) =3D=3D MAX_NODES) { error_setg(&err, "Invalid node-id=3D%" PRId64 ", NUMA node mus= t be " "defined with -numa node,nodeid=3DID before it's used with= " "-numa cpu,node-id=3DID", object->u.cpu.node_id); @@ -261,7 +263,7 @@ static void validate_numa_distance(void) error_report("The distance between node %d and %d is " "missing, at least one distance value " "between each nodes should be provided.", - src, dst); + numa_info[src].nodeid, numa_info[dst].nod= eid); exit(EXIT_FAILURE); } } @@ -279,9 +281,10 @@ static void validate_numa_distance(void) for (src =3D 0; src < nb_numa_nodes; src++) { for (dst =3D 0; dst < nb_numa_nodes; dst++) { if (src !=3D dst && numa_info[src].distance[dst] =3D=3D 0)= { - error_report("At least one asymmetrical pair of " + error_report("At least one asymmetrical pair (%d, %d) = of " "distances is given, please provide distances " - "for both directions of all node pairs."); + "for both directions of all node pairs.", + numa_info[src].nodeid, numa_info[dst].nodeid); exit(EXIT_FAILURE); } } @@ -371,20 +374,6 @@ void numa_complete_configuration(MachineState *ms) parse_numa_node(ms, &node, &error_abort); } =20 - assert(max_numa_nodeid <=3D MAX_NODES); - - /* No support for sparse NUMA node IDs yet: */ - for (i =3D max_numa_nodeid - 1; i >=3D 0; i--) { - /* Report large node IDs first, to make mistakes easier to spot */ - if (!numa_info[i].present) { - error_report("numa: Node ID missing: %d", i); - exit(1); - } - } - - /* This must be always true if all nodes are present: */ - assert(nb_numa_nodes =3D=3D max_numa_nodeid); - if (nb_numa_nodes > 0) { uint64_t numa_total; =20 diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index d281ffa89e..7f7b31861c 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -2351,6 +2351,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, Ma= chineState *machine) numa_start =3D table_data->len; =20 for (i =3D 1; i < pcms->numa_nodes + 1; ++i) { + int nodeid =3D numa_info[i - 1].nodeid; mem_base =3D next_base; mem_len =3D pcms->node_mem[i - 1]; next_base =3D mem_base + mem_len; @@ -2361,7 +2362,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, Ma= chineState *machine) mem_len -=3D next_base - HOLE_640K_START; if (mem_len > 0) { numamem =3D acpi_data_push(table_data, sizeof *numamem); - build_srat_memory(numamem, mem_base, mem_len, i - 1, + build_srat_memory(numamem, mem_base, mem_len, nodeid, MEM_AFFINITY_ENABLED); } =20 @@ -2380,7 +2381,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, Ma= chineState *machine) mem_len -=3D next_base - pcms->below_4g_mem_size; if (mem_len > 0) { numamem =3D acpi_data_push(table_data, sizeof *numamem); - build_srat_memory(numamem, mem_base, mem_len, i - 1, + build_srat_memory(numamem, mem_base, mem_len, nodeid, MEM_AFFINITY_ENABLED); } mem_base =3D 1ULL << 32; @@ -2390,7 +2391,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, Ma= chineState *machine) =20 if (mem_len > 0) { numamem =3D acpi_data_push(table_data, sizeof *numamem); - build_srat_memory(numamem, mem_base, mem_len, i - 1, + build_srat_memory(numamem, mem_base, mem_len, nodeid, MEM_AFFINITY_ENABLED); } } @@ -2411,7 +2412,8 @@ build_srat(GArray *table_data, BIOSLinker *linker, Ma= chineState *machine) if (hotplugabble_address_space_size) { numamem =3D acpi_data_push(table_data, sizeof *numamem); build_srat_memory(numamem, machine->device_memory->base, - hotplugabble_address_space_size, pcms->numa_node= s - 1, + hotplugabble_address_space_size, + numa_info[pcms->numa_nodes - 1].nodeid, MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED= ); } =20 diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 549c437050..9e96d8a02c 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -2847,7 +2847,7 @@ static int64_t pc_get_default_cpu_node_id(const Machi= neState *ms, int idx) x86_topo_ids_from_apicid(ms->possible_cpus->cpus[idx].arch_id, pcms->smp_dies, ms->smp.cores, ms->smp.threads, &topo); - return topo.pkg_id % nb_numa_nodes; + return numa_info[topo.pkg_id % nb_numa_nodes].nodeid; } =20 static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 821f0d4a49..09cff62de8 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -424,7 +424,8 @@ static int spapr_populate_memory(SpaprMachineState *spa= pr, void *fdt) if (!mem_start) { /* spapr_machine_init() checks for rma_size <=3D node0_size * already */ - spapr_populate_memory_node(fdt, i, 0, spapr->rma_size); + spapr_populate_memory_node(fdt, nodes[i].nodeid, 0, + spapr->rma_size); mem_start +=3D spapr->rma_size; node_size -=3D spapr->rma_size; } @@ -436,7 +437,7 @@ static int spapr_populate_memory(SpaprMachineState *spa= pr, void *fdt) sizetmp =3D 1ULL << ctzl(mem_start); } =20 - spapr_populate_memory_node(fdt, i, mem_start, sizetmp); + spapr_populate_memory_node(fdt, nodes[i].nodeid, mem_start, si= zetmp); node_size -=3D sizetmp; mem_start +=3D sizetmp; } @@ -2542,7 +2543,7 @@ static void spapr_validate_node_memory(MachineState *= machine, Error **errp) error_setg(errp, "Node %d memory size 0x%" PRIx64 " is not aligned to %" PRIu64 " MiB", - i, numa_info[i].node_mem, + numa_info[i].nodeid, numa_info[i].node_mem, SPAPR_MEMORY_BLOCK_SIZE / MiB); return; } @@ -4139,7 +4140,7 @@ spapr_cpu_index_to_props(MachineState *machine, unsig= ned cpu_index) =20 static int64_t spapr_get_default_cpu_node_id(const MachineState *ms, int i= dx) { - return idx / ms->smp.cores % nb_numa_nodes; + return numa_info[idx / ms->smp.cores % nb_numa_nodes].nodeid; } =20 static const CPUArchIdList *spapr_possible_cpu_arch_ids(MachineState *mach= ine) diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h index 01a263eba2..6b0b49a699 100644 --- a/include/sysemu/numa.h +++ b/include/sysemu/numa.h @@ -13,6 +13,7 @@ struct NodeInfo { uint64_t node_mem; struct HostMemoryBackend *node_memdev; bool present; + uint16_t nodeid; uint8_t distance[MAX_NODES]; }; =20 --=20 2.21.0