Date: Thu, 16 Nov 2017 18:40:52 +0200
From: "Michael S. Tsirkin"
To: qemu-devel@nongnu.org
Cc: Peter Maydell, Thomas Huth, Takao Indoh, Eduardo Habkost,
 Thadeu Lima de Souza Cascardo, Izumi Taku, Dou Liyang,
 David Hildenbrand, Alistair Francis, Paolo Bonzini, Marcel Apfelbaum,
 Igor Mammedov, Richard Henderson
Message-ID: <1510850407-17266-7-git-send-email-mst@redhat.com>
References: <1510850407-17266-1-git-send-email-mst@redhat.com>
In-Reply-To: <1510850407-17266-1-git-send-email-mst@redhat.com>
Subject: [Qemu-devel] [PULL 6/9] NUMA: Enable adding NUMA node implicitly

From: Dou Liyang

Linux and Windows need the ACPI SRAT table to make memory hotplug work
properly. However, QEMU currently doesn't create the SRAT table if no
NUMA options are present on the CLI, which breaks both Linux and Windows
guests in certain conditions:

 * Windows: won't enable memory hotplug without an SRAT table at all
 * Linux: if QEMU is started with all initial memory below 4 GiB and no
   SRAT table is present, the guest kernel will use nommu DMA ops, which
   breaks 32-bit hw drivers when memory is hotplugged and the guest tries
   to use it with those drivers.

Fix the above issues by automatically creating a NUMA node when QEMU is
started with memory hotplug enabled but without '-numa' options on the CLI.
(PS: auto-create the NUMA node only for new machine types so as not to
break migration).

This provides an SRAT table to guests without explicit -numa options on
the CLI and allows:

 * Windows: to enable memory hotplug
 * Linux: to switch to SWIOTLB DMA ops, which bounce DMA transfers to
   32-bit allocated buffers that legacy drivers/hw can handle.

[Rewritten by Igor]

Reported-by: Thadeu Lima de Souza Cascardo
Suggested-by: Igor Mammedov
Signed-off-by: Dou Liyang
Cc: Paolo Bonzini
Cc: Richard Henderson
Cc: Eduardo Habkost
Cc: "Michael S. Tsirkin"
Cc: Marcel Apfelbaum
Cc: Igor Mammedov
Cc: David Hildenbrand
Cc: Thomas Huth
Cc: Alistair Francis
Cc: Takao Indoh
Cc: Izumi Taku
Reviewed-by: Igor Mammedov
Reviewed-by: Michael S. Tsirkin
Signed-off-by: Michael S. Tsirkin
Acked-by: Thadeu Lima de Souza Cascardo
---
 include/hw/boards.h |  1 +
 hw/i386/pc.c        |  1 +
 hw/i386/pc_piix.c   |  1 +
 hw/i386/pc_q35.c    |  1 +
 numa.c              | 21 ++++++++++++++++++++-
 vl.c                |  3 +--
 6 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/include/hw/boards.h b/include/hw/boards.h
index 62f160e..156b16f 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -197,6 +197,7 @@ struct MachineClass {
     bool ignore_memory_transaction_failures;
     int numa_mem_align_shift;
     const char **valid_cpu_types;
+    bool auto_enable_numa_with_memhp;
     void (*numa_auto_assign_ram)(MachineClass *mc, NodeInfo *nodes,
                                  int nb_nodes, ram_addr_t size);
 
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index fafe5ba..c3afe5b 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -2347,6 +2347,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
     mc->cpu_index_to_instance_props = pc_cpu_index_to_props;
     mc->get_default_cpu_node_id = pc_get_default_cpu_node_id;
     mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids;
+    mc->auto_enable_numa_with_memhp = true;
     mc->has_hotpluggable_cpus = true;
     mc->default_boot_order = "cad";
     mc->hot_add_cpu = pc_hot_add_cpu;
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index f79d5cb..5e47528 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -446,6 +446,7 @@ static void pc_i440fx_2_10_machine_options(MachineClass *m)
     m->is_default = 0;
     m->alias = NULL;
     SET_MACHINE_COMPAT(m, PC_COMPAT_2_10);
+    m->auto_enable_numa_with_memhp = false;
 }
 
 DEFINE_I440FX_MACHINE(v2_10, "pc-i440fx-2.10", NULL,
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index da3ea60..d606004 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -318,6 +318,7 @@ static void pc_q35_2_10_machine_options(MachineClass *m)
     m->alias = NULL;
     SET_MACHINE_COMPAT(m, PC_COMPAT_2_10);
     m->numa_auto_assign_ram = numa_legacy_auto_assign_ram;
+    m->auto_enable_numa_with_memhp = false;
 }
 
 DEFINE_Q35_MACHINE(v2_10, "pc-q35-2.10", NULL,
diff --git a/numa.c b/numa.c
index 8d78d95..7151b24 100644
--- a/numa.c
+++ b/numa.c
@@ -216,6 +216,7 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
     }
     numa_info[nodenr].present = true;
     max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
+    nb_numa_nodes++;
 }
 
 static void parse_numa_distance(NumaDistOptions *dist, Error **errp)
@@ -282,7 +283,6 @@ static int parse_numa(void *opaque, QemuOpts *opts, Error **errp)
         if (err) {
             goto end;
         }
-        nb_numa_nodes++;
         break;
     case NUMA_OPTIONS_TYPE_DIST:
         parse_numa_distance(&object->u.dist, &err);
@@ -433,6 +433,25 @@ void parse_numa_opts(MachineState *ms)
         exit(1);
     }
 
+    /*
+     * If memory hotplug is enabled (slots > 0) but without '-numa'
+     * options explicitly on CLI, guests will break.
+     *
+     *   Windows: won't enable memory hotplug without SRAT table at all
+     *
+     *   Linux: if QEMU is started with initial memory all below 4Gb
+     *   and no SRAT table present, guest kernel will use nommu DMA ops,
+     *   which breaks 32bit hw drivers when memory is hotplugged and
+     *   guest tries to use it with those drivers.
+     *
+     * Enable NUMA implicitly by adding a new NUMA node automatically.
+     */
+    if (ms->ram_slots > 0 && nb_numa_nodes == 0 &&
+        mc->auto_enable_numa_with_memhp) {
+        NumaNodeOptions node = { };
+        parse_numa_node(ms, &node, NULL);
+    }
+
     assert(max_numa_nodeid <= MAX_NODES);
 
     /* No support for sparse NUMA node IDs yet: */
diff --git a/vl.c b/vl.c
index 7372424..1ad1c04 100644
--- a/vl.c
+++ b/vl.c
@@ -4690,8 +4690,6 @@ int main(int argc, char **argv, char **envp)
     default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS);
     default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS);
 
-    parse_numa_opts(current_machine);
-
     if (qemu_opts_foreach(qemu_find_opts("mon"),
                           mon_init_func, NULL, NULL)) {
         exit(1);
@@ -4741,6 +4739,7 @@ int main(int argc, char **argv, char **envp)
     current_machine->boot_order = boot_order;
     current_machine->cpu_model = cpu_model;
 
+    parse_numa_opts(current_machine);
 
     /* parse features once if machine provides default cpu_type */
     if (machine_class->default_cpu_type) {
-- 
MST
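
Illustration (not part of the patch): the condition added to
parse_numa_opts() can be modelled in isolation. This is a minimal sketch
under stated assumptions. The toy_machine_class and toy_machine_state
types and the toy_* helpers are hypothetical stand-ins, not QEMU APIs.
Only the three-part condition and the counter increment are taken from
the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the one MachineClass field the patch adds. */
struct toy_machine_class {
    bool auto_enable_numa_with_memhp; /* patch sets false for 2.10 types */
};

/* Hypothetical stand-in for the state the new check reads. */
struct toy_machine_state {
    int ram_slots;     /* memory hotplug slots; > 0 means hotplug enabled */
    int nb_numa_nodes; /* nodes configured so far via '-numa node' */
};

/*
 * Models the tail of parse_numa_node() after the patch: the node counter
 * is bumped there, so implicit and explicit nodes are counted alike.
 */
static void toy_add_numa_node(struct toy_machine_state *ms)
{
    assert(ms != NULL);
    ms->nb_numa_nodes++;
}

/*
 * Models the check added to parse_numa_opts(). Returns true if an
 * implicit node was created.
 */
static bool toy_maybe_add_implicit_node(const struct toy_machine_class *mc,
                                        struct toy_machine_state *ms)
{
    assert(mc != NULL && ms != NULL);
    if (ms->ram_slots > 0 && ms->nb_numa_nodes == 0 &&
        mc->auto_enable_numa_with_memhp) {
        toy_add_numa_node(ms);
        return true;
    }
    return false;
}
```

In this model, a new machine type with hotplug slots and no explicit
nodes ends up with exactly one node. A 2.10-style class with the flag
cleared, a configuration without slots, or one that already has nodes is
left unchanged.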