From: Laurent Vivier
To: Eduardo Habkost
Cc: Laurent Vivier, Thomas Huth, qemu-devel@nongnu.org, qemu-ppc@nongnu.org, Paolo Bonzini, David Gibson
Date: Tue, 2 May 2017 18:29:55 +0200
Subject: [Qemu-devel] [PATCH v4 1/1] numa: equally distribute memory on nodes
Message-Id: <20170502162955.1610-2-lvivier@redhat.com>
In-Reply-To: <20170502162955.1610-1-lvivier@redhat.com>
References: <20170502162955.1610-1-lvivier@redhat.com>

When there is not enough memory to give each node the minimum allowed
amount, all the memory is put on the last node.

This is because we put

(ram_size / nb_numa_nodes) &
~((1 << mc->numa_mem_align_shift) - 1);

on each node, and in this case the value is 0.
This is especially likely with pseries,
where memory must be aligned to 256MB.

To avoid this problem, this patch uses an error diffusion algorithm [1]
to distribute the memory equally across the nodes.

We introduce a numa_auto_assign_ram() function in MachineClass
to keep compatibility between machine type versions.
The legacy function is used with pseries-2.9, pc-q35-2.9 and
pc-i440fx-2.9 (and earlier); the new one is used with all others.

Example:

qemu-system-ppc64 -S -nographic -nodefaults -monitor stdio -m 1G -smp 8 \
                  -numa node -numa node -numa node \
                  -numa node -numa node -numa node

Before:

(qemu) info numa
6 nodes
node 0 cpus: 0 6
node 0 size: 0 MB
node 1 cpus: 1 7
node 1 size: 0 MB
node 2 cpus: 2
node 2 size: 0 MB
node 3 cpus: 3
node 3 size: 0 MB
node 4 cpus: 4
node 4 size: 0 MB
node 5 cpus: 5
node 5 size: 1024 MB

After:
(qemu) info numa
6 nodes
node 0 cpus: 0 6
node 0 size: 0 MB
node 1 cpus: 1 7
node 1 size: 256 MB
node 2 cpus: 2
node 2 size: 0 MB
node 3 cpus: 3
node 3 size: 256 MB
node 4 cpus: 4
node 4 size: 256 MB
node 5 cpus: 5
node 5 size: 256 MB

[1] https://en.wikipedia.org/wiki/Error_diffusion

Signed-off-by: Laurent Vivier
---
 hw/core/machine.c       |  2 ++
 hw/i386/pc_piix.c       |  2 ++
 hw/i386/pc_q35.c        |  2 ++
 hw/ppc/spapr.c          |  1 +
 include/hw/boards.h     |  2 ++
 include/qemu/typedefs.h |  1 +
 include/sysemu/numa.h   |  9 +++++++--
 numa.c                  | 49 ++++++++++++++++++++++++++++++++++++++-----------
 8 files changed, 55 insertions(+), 13 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index ada9eea..2482c63 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -17,6 +17,7 @@
 #include "qapi/visitor.h"
 #include "hw/sysbus.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/numa.h"
 #include "qemu/error-report.h"
 #include "qemu/cutils.h"
 
@@ -400,6 +401,7 @@ static void machine_class_init(ObjectClass *oc, void *data)
      * On Linux, each node's border has to be 8MB aligned
      */
     mc->numa_mem_align_shift = 23;
+    mc->numa_auto_assign_ram = numa_default_auto_assign_ram;
 
     object_class_property_add_str(oc, "accel",
         machine_get_accel, machine_set_accel, &error_abort);
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 9f102aa..d468b96 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -54,6 +54,7 @@
 #endif
 #include "migration/migration.h"
 #include "kvm_i386.h"
+#include "sysemu/numa.h"
 
 #define MAX_IDE_BUS 2
 
@@ -442,6 +443,7 @@ static void pc_i440fx_2_9_machine_options(MachineClass *m)
     pc_i440fx_machine_options(m);
     m->alias = "pc";
     m->is_default = 1;
+    m->numa_auto_assign_ram = numa_legacy_auto_assign_ram;
 }
 
 DEFINE_I440FX_MACHINE(v2_9, "pc-i440fx-2.9", NULL,
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index dd792a8..66303a7 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -47,6 +47,7 @@
 #include "hw/usb.h"
 #include "qemu/error-report.h"
 #include "migration/migration.h"
+#include "sysemu/numa.h"
 
 /* ICH9 AHCI has 6 ports */
 #define MAX_SATA_PORTS 6
@@ -305,6 +306,7 @@ static void pc_q35_2_9_machine_options(MachineClass *m)
 {
     pc_q35_machine_options(m);
     m->alias = "q35";
+    m->numa_auto_assign_ram = numa_legacy_auto_assign_ram;
 }
 
 DEFINE_Q35_MACHINE(v2_9, "pc-q35-2.9", NULL,
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 80d12d0..bdc31ce 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3242,6 +3242,7 @@ static void spapr_machine_2_9_class_options(MachineClass *mc)
 {
     spapr_machine_2_10_class_options(mc);
     SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_2_9);
+    mc->numa_auto_assign_ram = numa_legacy_auto_assign_ram;
 }
 
 DEFINE_SPAPR_MACHINE(2_9, "2.9", false);
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 31d9c72..99458eb 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -136,6 +136,8 @@ struct MachineClass {
     int minimum_page_bits;
     bool has_hotpluggable_cpus;
     int numa_mem_align_shift;
+    void (*numa_auto_assign_ram)(MachineClass *mc, NodeInfo *nodes,
+                                 int nb_nodes, ram_addr_t size);
 
     HotplugHandler *(*get_hotplug_handler)(MachineState *machine,
                                            DeviceState *dev);
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index f08d327..7d85057 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -97,5 +97,6 @@ typedef struct SSIBus SSIBus;
 typedef struct uWireSlave uWireSlave;
 typedef struct VirtIODevice VirtIODevice;
 typedef struct Visitor Visitor;
+typedef struct node_info NodeInfo;
 
 #endif /* QEMU_TYPEDEFS_H */
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 8f09dcf..6270384 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -15,13 +15,13 @@ struct numa_addr_range {
     QLIST_ENTRY(numa_addr_range) entry;
 };
 
-typedef struct node_info {
+struct node_info {
     uint64_t node_mem;
     unsigned long *node_cpu;
     struct HostMemoryBackend *node_memdev;
     bool present;
     QLIST_HEAD(, numa_addr_range) addr; /* List to store address ranges */
-} NodeInfo;
+};
 
 extern NodeInfo numa_info[MAX_NODES];
 void parse_numa_opts(MachineClass *mc);
@@ -31,6 +31,11 @@ extern QemuOptsList qemu_numa_opts;
 void numa_set_mem_node_id(ram_addr_t addr, uint64_t size, uint32_t node);
 void numa_unset_mem_node_id(ram_addr_t addr, uint64_t size, uint32_t node);
 uint32_t numa_get_node(ram_addr_t addr, Error **errp);
+void numa_legacy_auto_assign_ram(MachineClass *mc, NodeInfo *nodes,
+                                 int nb_nodes, ram_addr_t size);
+void numa_default_auto_assign_ram(MachineClass *mc, NodeInfo *nodes,
+                                  int nb_nodes, ram_addr_t size);
+
 
 /* on success returns node index in numa_info,
  * on failure returns nb_numa_nodes */
diff --git a/numa.c b/numa.c
index 6fc2393..750fd95 100644
--- a/numa.c
+++ b/numa.c
@@ -294,6 +294,42 @@ static void validate_numa_cpus(void)
     g_free(seen_cpus);
 }
 
+void numa_legacy_auto_assign_ram(MachineClass *mc, NodeInfo *nodes,
+                                 int nb_nodes, ram_addr_t size)
+{
+    int i;
+    uint64_t usedmem = 0;
+
+    /* Align each node according to the alignment
+     * requirements of the machine class
+     */
+
+    for (i = 0; i < nb_nodes - 1; i++) {
+        nodes[i].node_mem = (size / nb_nodes) &
+                            ~((1 << mc->numa_mem_align_shift) - 1);
+        usedmem += nodes[i].node_mem;
+    }
+    nodes[i].node_mem = size - usedmem;
+}
+
+void numa_default_auto_assign_ram(MachineClass *mc, NodeInfo *nodes,
+                                  int nb_nodes, ram_addr_t size)
+{
+    int i;
+    uint64_t usedmem = 0, node_mem;
+    uint64_t granularity = size / nb_nodes;
+    uint64_t propagate = 0;
+
+    for (i = 0; i < nb_nodes - 1; i++) {
+        node_mem = (granularity + propagate) &
+                   ~((1 << mc->numa_mem_align_shift) - 1);
+        propagate = granularity + propagate - node_mem;
+        nodes[i].node_mem = node_mem;
+        usedmem += node_mem;
+    }
+    nodes[i].node_mem = size - usedmem;
+}
+
 void parse_numa_opts(MachineClass *mc)
 {
     int i;
@@ -336,17 +372,8 @@ void parse_numa_opts(MachineClass *mc)
             }
         }
         if (i == nb_numa_nodes) {
-            uint64_t usedmem = 0;
-
-            /* Align each node according to the alignment
-             * requirements of the machine class
-             */
-            for (i = 0; i < nb_numa_nodes - 1; i++) {
-                numa_info[i].node_mem = (ram_size / nb_numa_nodes) &
-                                        ~((1 << mc->numa_mem_align_shift) - 1);
-                usedmem += numa_info[i].node_mem;
-            }
-            numa_info[i].node_mem = ram_size - usedmem;
+            assert(mc->numa_auto_assign_ram);
+            mc->numa_auto_assign_ram(mc, numa_info, nb_numa_nodes, ram_size);
         }
 
         numa_total = 0;
-- 
2.9.3
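
For readers who want to try the error diffusion step outside of QEMU, here
is a minimal standalone sketch of the same arithmetic. The function name
distribute_ram() and its plain-array interface are hypothetical and are not
part of the patch; it only assumes that each node's share is rounded down to
a multiple of (1 << align_shift) and that the rounding error is carried to
the next node, as numa_default_auto_assign_ram() does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for illustration only (not a QEMU function).
 * Spreads `size` bytes over `nb_nodes` entries of `node_mem`.
 * Each share is rounded down to a multiple of (1 << align_shift);
 * the part lost to rounding is carried over to the next node.
 * The last node receives whatever has not been assigned yet.
 */
void distribute_ram(uint64_t *node_mem, int nb_nodes,
                    uint64_t size, unsigned align_shift)
{
    uint64_t mask = ~((UINT64_C(1) << align_shift) - 1);
    uint64_t granularity, propagate = 0, usedmem = 0;
    int i;

    assert(nb_nodes > 0);
    granularity = size / nb_nodes;      /* ideal, unaligned share */

    for (i = 0; i < nb_nodes - 1; i++) {
        uint64_t want = granularity + propagate;

        node_mem[i] = want & mask;          /* aligned share */
        propagate = want - node_mem[i];     /* error carried forward */
        usedmem += node_mem[i];
    }
    node_mem[i] = size - usedmem;
}
```

Called with six nodes, 1G of RAM and align_shift = 28 (256MB), this sketch
yields 0, 256, 0, 256, 256 and 256 MB, which matches the "After" output in
the commit message.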