From nobody Wed May 1 23:55:52 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zoho.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1493260581041352.85637465862715; Wed, 26 Apr 2017 19:36:21 -0700 (PDT) Received: from localhost ([::1]:57949 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1d3ZI3-0001pT-3t for importer@patchew.org; Wed, 26 Apr 2017 22:36:19 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:45258) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1d3ZHG-0001Xd-82 for qemu-devel@nongnu.org; Wed, 26 Apr 2017 22:35:32 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1d3ZHC-0006ML-6w for qemu-devel@nongnu.org; Wed, 26 Apr 2017 22:35:30 -0400 Received: from mga03.intel.com ([134.134.136.65]:11697) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1d3ZHB-0006KK-Pz for qemu-devel@nongnu.org; Wed, 26 Apr 2017 22:35:26 -0400 Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 26 Apr 2017 19:35:23 -0700 Received: from he.bj.intel.com (HELO localhost) ([10.238.135.175]) by fmsmga005.fm.intel.com with ESMTP; 26 Apr 2017 19:35:21 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.37,256,1488873600"; d="scan'208";a="94318892" From: He Chen To: qemu-devel@nongnu.org Date: Thu, 27 Apr 2017 10:35:58 +0800 Message-Id: <1493260558-20728-1-git-send-email-he.chen@linux.intel.com> X-Mailer: git-send-email 2.7.4 X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 134.134.136.65 Subject: [Qemu-devel] [PATCH v9] Allow setting NUMA distance for different NUMA nodes X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrew Jones , Eduardo Habkost , "Michael S . Tsirkin" , Markus Armbruster , Paolo Bonzini , Igor Mammedov , Richard Henderson Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" This patch is going to add SLIT table support in QEMU, and provides additional option `dist` for command `-numa` to allow user set vNUMA distance by QEMU command. With this patch, when a user wants to create a guest that contains several vNUMA nodes and also wants to set distance among those nodes, the QEMU command would like: ``` -numa node,nodeid=3D0,cpus=3D0 \ -numa node,nodeid=3D1,cpus=3D1 \ -numa node,nodeid=3D2,cpus=3D2 \ -numa node,nodeid=3D3,cpus=3D3 \ -numa dist,src=3D0,dst=3D1,val=3D21 \ -numa dist,src=3D0,dst=3D2,val=3D31 \ -numa dist,src=3D0,dst=3D3,val=3D41 \ -numa dist,src=3D1,dst=3D2,val=3D21 \ -numa dist,src=3D1,dst=3D3,val=3D31 \ -numa dist,src=3D2,dst=3D3,val=3D21 \ ``` Signed-off-by: He Chen Reviewed-by: Andrew Jones Reviewed-by: Igor Mammedov --- Changes since v8: * numa_{node, distance}_parse --> parse_numa_{node, distance} * Comments refinement. Changes since v7: * Remove unnecessary node present check. * Minor improvement on prompt message. Changes since v6: * Split validate_numa_distance into 2 separate functions. * Add comments before validate and complete numa distance functions. Changes since v5: * Made the generation of the SLIT dependent on `have_numa_distance`. * Doc refinement. --- hw/acpi/aml-build.c | 26 +++++++++ hw/i386/acpi-build.c | 4 ++ include/hw/acpi/aml-build.h | 1 + include/sysemu/numa.h | 2 + include/sysemu/sysemu.h | 4 ++ numa.c | 137 ++++++++++++++++++++++++++++++++++++++++= +++- qapi-schema.json | 30 +++++++++- qemu-options.hx | 16 +++++- 8 files changed, 215 insertions(+), 5 deletions(-) diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c index c6f2032..be496c8 100644 --- a/hw/acpi/aml-build.c +++ b/hw/acpi/aml-build.c @@ -24,6 +24,7 @@ #include "hw/acpi/aml-build.h" #include "qemu/bswap.h" #include "qemu/bitops.h" +#include "sysemu/numa.h" =20 static GArray *build_alloc_array(void) { @@ -1609,3 +1610,28 @@ void build_srat_memory(AcpiSratMemoryAffinity *numam= em, uint64_t base, numamem->base_addr =3D cpu_to_le64(base); numamem->range_length =3D cpu_to_le64(len); } + +/* + * ACPI spec 5.2.17 System Locality Distance Information Table + * (Revision 2.0 or later) + */ +void build_slit(GArray *table_data, BIOSLinker *linker) +{ + int slit_start, i, j; + slit_start =3D table_data->len; + + acpi_data_push(table_data, sizeof(AcpiTableHeader)); + + build_append_int_noprefix(table_data, nb_numa_nodes, 8); + for (i =3D 0; i < nb_numa_nodes; i++) { + for (j =3D 0; j < nb_numa_nodes; j++) { + assert(numa_info[i].distance[j]); + build_append_int_noprefix(table_data, numa_info[i].distance[j]= , 1); + } + } + + build_header(linker, table_data, + (void *)(table_data->data + slit_start), + "SLIT", + table_data->len - slit_start, 1, NULL, NULL); +} diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index 2073108..2458ebc 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -2678,6 +2678,10 @@ void acpi_build(AcpiBuildTables *tables, MachineStat= e *machine) if (pcms->numa_nodes) { acpi_add_table(table_offsets, tables_blob); build_srat(tables_blob, tables->linker, machine); + if (have_numa_distance) { + acpi_add_table(table_offsets, tables_blob); + build_slit(tables_blob, tables->linker); + } } if (acpi_get_mcfg(&mcfg)) { acpi_add_table(table_offsets, tables_blob); diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h index 00c21f1..329a0d0 100644 --- a/include/hw/acpi/aml-build.h +++ b/include/hw/acpi/aml-build.h @@ -389,4 +389,5 @@ GCC_FMT_ATTR(2, 3); void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base, uint64_t len, int node, MemoryAffinityFlags flags); =20 +void build_slit(GArray *table_data, BIOSLinker *linker); #endif diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h index 8f09dcf..0ea1bc0 100644 --- a/include/sysemu/numa.h +++ b/include/sysemu/numa.h @@ -8,6 +8,7 @@ #include "hw/boards.h" =20 extern int nb_numa_nodes; /* Number of NUMA nodes */ +extern bool have_numa_distance; =20 struct numa_addr_range { ram_addr_t mem_start; @@ -21,6 +22,7 @@ typedef struct node_info { struct HostMemoryBackend *node_memdev; bool present; QLIST_HEAD(, numa_addr_range) addr; /* List to store address ranges */ + uint8_t distance[MAX_NODES]; } NodeInfo; =20 extern NodeInfo numa_info[MAX_NODES]; diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h index 576c7ce..6999545 100644 --- a/include/sysemu/sysemu.h +++ b/include/sysemu/sysemu.h @@ -169,6 +169,10 @@ extern int mem_prealloc; =20 #define MAX_NODES 128 #define NUMA_NODE_UNASSIGNED MAX_NODES +#define NUMA_DISTANCE_MIN 10 +#define NUMA_DISTANCE_DEFAULT 20 +#define NUMA_DISTANCE_MAX 254 +#define NUMA_DISTANCE_UNREACHABLE 255 =20 #define MAX_OPTION_ROMS 16 typedef struct QEMUOptionRom { diff --git a/numa.c b/numa.c index 6fc2393..2b3fc69 100644 --- a/numa.c +++ b/numa.c @@ -51,6 +51,7 @@ static int max_numa_nodeid; /* Highest specified NUMA nod= e ID, plus one. * For all nodes, nodeid < max_numa_nodeid */ int nb_numa_nodes; +bool have_numa_distance; NodeInfo numa_info[MAX_NODES]; =20 void numa_set_mem_node_id(ram_addr_t addr, uint64_t size, uint32_t node) @@ -140,7 +141,7 @@ uint32_t numa_get_node(ram_addr_t addr, Error **errp) return -1; } =20 -static void numa_node_parse(NumaNodeOptions *node, QemuOpts *opts, Error *= *errp) +static void parse_numa_node(NumaNodeOptions *node, QemuOpts *opts, Error *= *errp) { uint16_t nodenr; uint16List *cpus =3D NULL; @@ -212,6 +213,43 @@ static void numa_node_parse(NumaNodeOptions *node, Qem= uOpts *opts, Error **errp) max_numa_nodeid =3D MAX(max_numa_nodeid, nodenr + 1); } =20 +static void parse_numa_distance(NumaDistOptions *dist, Error **errp) +{ + uint16_t src =3D dist->src; + uint16_t dst =3D dist->dst; + uint8_t val =3D dist->val; + + if (src >=3D MAX_NODES || dst >=3D MAX_NODES) { + error_setg(errp, + "Invalid node %" PRIu16 + ", max possible could be %" PRIu16, + MAX(src, dst), MAX_NODES); + return; + } + + if (!numa_info[src].present || !numa_info[dst].present) { + error_setg(errp, "Source/Destination NUMA node is missing. " + "Please use '-numa node' option to declare it first."); + return; + } + + if (val < NUMA_DISTANCE_MIN) { + error_setg(errp, "NUMA distance (%" PRIu8 ") is invalid, " + "it shouldn't be less than %d.", + val, NUMA_DISTANCE_MIN); + return; + } + + if (src =3D=3D dst && val !=3D NUMA_DISTANCE_MIN) { + error_setg(errp, "Local distance of node %d should be %d.", + src, NUMA_DISTANCE_MIN); + return; + } + + numa_info[src].distance[dst] =3D val; + have_numa_distance =3D true; +} + static int parse_numa(void *opaque, QemuOpts *opts, Error **errp) { NumaOptions *object =3D NULL; @@ -229,12 +267,18 @@ static int parse_numa(void *opaque, QemuOpts *opts, E= rror **errp) =20 switch (object->type) { case NUMA_OPTIONS_TYPE_NODE: - numa_node_parse(&object->u.node, opts, &err); + parse_numa_node(&object->u.node, opts, &err); if (err) { goto end; } nb_numa_nodes++; break; + case NUMA_OPTIONS_TYPE_DIST: + parse_numa_distance(&object->u.dist, &err); + if (err) { + goto end; + } + break; default: abort(); } @@ -294,6 +338,75 @@ static void validate_numa_cpus(void) g_free(seen_cpus); } =20 +/* If all node pair distances are symmetric, then only distances + * in one direction are enough. If there is even one asymmetric + * pair, though, then all distances must be provided. The + * distance from a node to itself is always NUMA_DISTANCE_MIN, + * so providing it is never necessary. + */ +static void validate_numa_distance(void) +{ + int src, dst; + bool is_asymmetrical =3D false; + + for (src =3D 0; src < nb_numa_nodes; src++) { + for (dst =3D src; dst < nb_numa_nodes; dst++) { + if (numa_info[src].distance[dst] =3D=3D 0 && + numa_info[dst].distance[src] =3D=3D 0) { + if (src !=3D dst) { + error_report("The distance between node %d and %d is " + "missing, at least one distance value " + "between each nodes should be provided.", + src, dst); + exit(EXIT_FAILURE); + } + } + + if (numa_info[src].distance[dst] !=3D 0 && + numa_info[dst].distance[src] !=3D 0 && + numa_info[src].distance[dst] !=3D + numa_info[dst].distance[src]) { + is_asymmetrical =3D true; + } + } + } + + if (is_asymmetrical) { + for (src =3D 0; src < nb_numa_nodes; src++) { + for (dst =3D 0; dst < nb_numa_nodes; dst++) { + if (src !=3D dst && numa_info[src].distance[dst] =3D=3D 0)= { + error_report("At least one asymmetrical pair of " + "distances is given, please provide distances " + "for both directions of all node pairs."); + exit(EXIT_FAILURE); + } + } + } + } +} + +static void complete_init_numa_distance(void) +{ + int src, dst; + + /* Fixup NUMA distance by symmetric policy because if it is an + * asymmetric distance table, it should be a complete table and + * there would not be any missing distance except local node, which + * is verified by validate_numa_distance above. + */ + for (src =3D 0; src < nb_numa_nodes; src++) { + for (dst =3D 0; dst < nb_numa_nodes; dst++) { + if (numa_info[src].distance[dst] =3D=3D 0) { + if (src =3D=3D dst) { + numa_info[src].distance[dst] =3D NUMA_DISTANCE_MIN; + } else { + numa_info[src].distance[dst] =3D numa_info[dst].distan= ce[src]; + } + } + } + } +} + void parse_numa_opts(MachineClass *mc) { int i; @@ -390,6 +503,26 @@ void parse_numa_opts(MachineClass *mc) } =20 validate_numa_cpus(); + + /* QEMU needs at least all unique node pair distances to build + * the whole NUMA distance table. QEMU treats the distance table + * as symmetric by default, i.e. distance A->B =3D=3D distance B->= A. + * Thus, QEMU is able to complete the distance table + * initialization even though only distance A->B is provided and + * distance B->A is not. QEMU knows the distance of a node to + * itself is always 10, so A->A distances may be omitted. When + * the distances of two nodes of a pair differ, i.e. distance + * A->B !=3D distance B->A, then that means the distance table is + * asymmetric. In this case, the distances for both directions + * of all node pairs are required. + */ + if (have_numa_distance) { + /* Validate enough NUMA distance information was provided. */ + validate_numa_distance(); + + /* Validation succeeded, now fill in any missing distances. */ + complete_init_numa_distance(); + } } else { numa_set_mem_node_id(0, ram_size, 0); } diff --git a/qapi-schema.json b/qapi-schema.json index 250e4dc..92fcd18 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -5673,10 +5673,14 @@ ## # @NumaOptionsType: # +# @node: NUMA nodes configuration +# +# @dist: NUMA distance configuration (since 2.10) +# # Since: 2.1 ## { 'enum': 'NumaOptionsType', - 'data': [ 'node' ] } + 'data': [ 'node', 'dist' ] } =20 ## # @NumaOptions: @@ -5689,7 +5693,8 @@ 'base': { 'type': 'NumaOptionsType' }, 'discriminator': 'type', 'data': { - 'node': 'NumaNodeOptions' }} + 'node': 'NumaNodeOptions', + 'dist': 'NumaDistOptions' }} =20 ## # @NumaNodeOptions: @@ -5718,6 +5723,27 @@ '*memdev': 'str' }} =20 ## +# @NumaDistOptions: +# +# Set the distance between 2 NUMA nodes. +# +# @src: source NUMA node. +# +# @dst: destination NUMA node. +# +# @val: NUMA distance from source node to destination node. +# When a node is unreachable from another node, set the distance +# between them to 255. +# +# Since: 2.10 +## +{ 'struct': 'NumaDistOptions', + 'data': { + 'src': 'uint16', + 'dst': 'uint16', + 'val': 'uint8' }} + +## # @HostMemPolicy: # # Host memory policy types diff --git a/qemu-options.hx b/qemu-options.hx index 99af8ed..7823db8 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -139,12 +139,15 @@ ETEXI =20 DEF("numa", HAS_ARG, QEMU_OPTION_numa, "-numa node[,mem=3Dsize][,cpus=3Dfirstcpu[-lastcpu]][,nodeid=3Dnode]\n" - "-numa node[,memdev=3Did][,cpus=3Dfirstcpu[-lastcpu]][,nodeid=3Dnode]\= n", QEMU_ARCH_ALL) + "-numa node[,memdev=3Did][,cpus=3Dfirstcpu[-lastcpu]][,nodeid=3Dnode]\= n" + "-numa dist,src=3Dsource,dst=3Ddestination,val=3Ddistance\n", QEMU_ARC= H_ALL) STEXI @item -numa node[,mem=3D@var{size}][,cpus=3D@var{firstcpu}[-@var{lastcpu}]= ][,nodeid=3D@var{node}] @itemx -numa node[,memdev=3D@var{id}][,cpus=3D@var{firstcpu}[-@var{lastcpu= }]][,nodeid=3D@var{node}] +@itemx -numa dist,src=3D@var{source},dst=3D@var{destination},val=3D@var{di= stance} @findex -numa Define a NUMA node and assign RAM and VCPUs to it. +Set the NUMA distance from a source node to a destination node. =20 @var{firstcpu} and @var{lastcpu} are CPU indexes. Each @samp{cpus} option represent a contiguous range of CPU indexes @@ -167,6 +170,17 @@ split equally between them. @samp{mem} and @samp{memdev} are mutually exclusive. Furthermore, if one node uses @samp{memdev}, all of them have to use it. =20 +@var{source} and @var{destination} are NUMA node IDs. +@var{distance} is the NUMA distance from @var{source} to @var{destination}. +The distance from a node to itself is always 10. If any pair of nodes is +given a distance, then all pairs must be given distances. Although, when +distances are only given in one direction for each pair of nodes, then +the distances in the opposite directions are assumed to be the same. If, +however, an asymmetrical pair of distances is given for even one node +pair, then all node pairs must be provided distance values for both +directions, even when they are symmetrical. When a node is unreachable +from another node, set the pair's distance to 255. + Note that the -@option{numa} option doesn't allocate any of the specified resources, it just assigns existing resources to NUMA nodes. This means that one still has to use the @option{-m}, --=20 2.7.4