From: David Gibson
To: qemu-ppc@nongnu.org
Cc: lvivier@redhat.com, agraf@suse.de, thuth@redhat.com, aik@ozlabs.ru, mdroth@linux.vnet.ibm.com, qemu-devel@nongnu.org, David Gibson
Date: Tue, 14 Mar 2017 16:34:18 +1100
Message-Id: <20170314053422.10922-2-david@gibson.dropbear.id.au>
In-Reply-To: <20170314053422.10922-1-david@gibson.dropbear.id.au>
References: <20170314053422.10922-1-david@gibson.dropbear.id.au>
Subject: [Qemu-devel] [PATCHv2 for-2.10 1/5] pseries: Stubs for HPT resizing

This introduces stub implementations of the H_RESIZE_HPT_PREPARE and
H_RESIZE_HPT_COMMIT hypercalls which we hope to add in a PAPR
extension to allow runtime resizing of a guest's hash page table. It
also adds a new machine property for controlling whether this new
facility is available.

For now we only allow resizing with TCG; allowing it with KVM will
require kernel changes as well.

Finally, it adds a new string to the "ibm,hypertas-functions" property
in the device tree, advertising to the guest the availability of the
HPT resizing hypercalls. This is a tentative suggested value, and
would need to be standardized by PAPR before being merged.

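The way the new machine property is resolved at machine init can be
sketched in isolation. This is a minimal stand-alone model, not the
QEMU code itself: the enum mirrors sPAPRResizeHPT from the patch, while
resolve_resize_hpt() and its host_supports parameter are hypothetical
names standing in for the Error-based check done via
kvmppc_check_papr_resize_hpt().

```c
#include <stdbool.h>

/* Stand-in for the sPAPRResizeHPT enum added by the patch. */
typedef enum {
    RESIZE_HPT_DEFAULT = 0,
    RESIZE_HPT_DISABLED,
    RESIZE_HPT_ENABLED,
    RESIZE_HPT_REQUIRED,
} ResizeHPT;

/*
 * Hypothetical helper: work out the effective mode.
 *
 * requested      value of the "resize-hpt" machine property
 * class_default  per-machine-version default (must not be DEFAULT)
 * host_supports  false where the patch would have an Error set
 *
 * Returns false when the user explicitly asked for resizing but the
 * host cannot supply it (the patch reports the error and exits).
 */
bool resolve_resize_hpt(ResizeHPT requested, ResizeHPT class_default,
                        bool host_supports, ResizeHPT *effective)
{
    if (requested == RESIZE_HPT_DEFAULT) {
        /* Nothing explicit: silently fall back to something that works */
        *effective = host_supports ? class_default : RESIZE_HPT_DISABLED;
        return true;
    }

    *effective = requested;
    if (requested != RESIZE_HPT_DISABLED && !host_supports) {
        return false;
    }
    return true;
}
```

With this model, an unset property never fails startup, whereas
"enabled" or "required" on a host without support does.
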
Signed-off-by: David Gibson
Reviewed-by: Suraj Jitindar Singh
---
 hw/ppc/spapr.c         | 75 ++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/ppc/spapr_hcall.c   | 36 ++++++++++++++++++++++++
 hw/ppc/trace-events    |  2 ++
 include/hw/ppc/spapr.h | 11 ++++++++
 target/ppc/kvm.c       | 12 ++++++++
 target/ppc/kvm_ppc.h   |  5 ++++
 6 files changed, 141 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 92402bf..558109c 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -808,6 +808,11 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, void *fdt)
     if (!kvm_enabled() || kvmppc_spapr_use_multitce()) {
         add_str(hypertas, "hcall-multi-tce");
     }
+
+    if (spapr->resize_hpt != SPAPR_RESIZE_HPT_DISABLED) {
+        add_str(hypertas, "hcall-hpt-resize");
+    }
+
     _FDT(fdt_setprop(fdt, rtas, "ibm,hypertas-functions",
                      hypertas->str, hypertas->len));
     g_string_free(hypertas, TRUE);
@@ -1991,11 +1996,40 @@ static void ppc_spapr_init(MachineState *machine)
     long load_limit, fw_size;
     char *filename;
     int smt = kvmppc_smt_threads();
+    Error *resize_hpt_err = NULL;
 
     msi_nonbroken = true;
 
     QLIST_INIT(&spapr->phbs);
 
+    /* Check HPT resizing availability */
+    kvmppc_check_papr_resize_hpt(&resize_hpt_err);
+    if (spapr->resize_hpt == SPAPR_RESIZE_HPT_DEFAULT) {
+        /*
+         * If the user explicitly requested a mode we should either
+         * supply it, or fail completely (which we do below). But if
+         * it's not set explicitly, we reset our mode to something
+         * that works
+         */
+        if (resize_hpt_err) {
+            spapr->resize_hpt = SPAPR_RESIZE_HPT_DISABLED;
+            error_free(resize_hpt_err);
+            resize_hpt_err = NULL;
+        } else {
+            spapr->resize_hpt = smc->resize_hpt_default;
+        }
+    }
+
+    assert(spapr->resize_hpt != SPAPR_RESIZE_HPT_DEFAULT);
+
+    if ((spapr->resize_hpt != SPAPR_RESIZE_HPT_DISABLED) && resize_hpt_err) {
+        /*
+         * User requested HPT resize, but this host can't supply it. Bail out
+         */
+        error_report_err(resize_hpt_err);
+        exit(1);
+    }
+
     /* Allocate RMA if necessary */
     rma_alloc_size = kvmppc_alloc_rma(&rma);
 
@@ -2406,6 +2440,40 @@ static void spapr_set_modern_hotplug_events(Object *obj, bool value,
     spapr->use_hotplug_event_source = value;
 }
 
+static char *spapr_get_resize_hpt(Object *obj, Error **errp)
+{
+    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
+
+    switch (spapr->resize_hpt) {
+    case SPAPR_RESIZE_HPT_DEFAULT:
+        return g_strdup("default");
+    case SPAPR_RESIZE_HPT_DISABLED:
+        return g_strdup("disabled");
+    case SPAPR_RESIZE_HPT_ENABLED:
+        return g_strdup("enabled");
+    case SPAPR_RESIZE_HPT_REQUIRED:
+        return g_strdup("required");
+    }
+    assert(0);
+}
+
+static void spapr_set_resize_hpt(Object *obj, const char *value, Error **errp)
+{
+    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
+
+    if (strcmp(value, "default") == 0) {
+        spapr->resize_hpt = SPAPR_RESIZE_HPT_DEFAULT;
+    } else if (strcmp(value, "disabled") == 0) {
+        spapr->resize_hpt = SPAPR_RESIZE_HPT_DISABLED;
+    } else if (strcmp(value, "enabled") == 0) {
+        spapr->resize_hpt = SPAPR_RESIZE_HPT_ENABLED;
+    } else if (strcmp(value, "required") == 0) {
+        spapr->resize_hpt = SPAPR_RESIZE_HPT_REQUIRED;
+    } else {
+        error_setg(errp, "Bad value for \"resize-hpt\" property");
+    }
+}
+
 static void spapr_machine_initfn(Object *obj)
 {
     sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
@@ -2426,6 +2494,12 @@ static void spapr_machine_initfn(Object *obj)
                                     " place of standard EPOW events when possible"
                                     " (required for memory hot-unplug support)",
                                     NULL);
+
+    object_property_add_str(obj, "resize-hpt",
+                            spapr_get_resize_hpt, spapr_set_resize_hpt, NULL);
+    object_property_set_description(obj, "resize-hpt",
+                                    "Resizing of the Hash Page Table (enabled, disabled, required)",
+                                    NULL);
 }
 
 static void spapr_machine_finalizefn(Object *obj)
@@ -3083,6 +3157,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
     smc->dr_lmb_enabled = true;
     smc->tcg_default_cpu = "POWER8";
     mc->has_hotpluggable_cpus = true;
+    smc->resize_hpt_default = SPAPR_RESIZE_HPT_DISABLED;
     fwc->get_dev_path = spapr_get_fw_dev_path;
     nc->nmi_monitor_handler = spapr_nmi;
     smc->phb_placement = spapr_phb_placement;
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index f05a90e..9f88960 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -352,6 +352,38 @@ static target_ulong h_read(PowerPCCPU *cpu, sPAPRMachineState *spapr,
     return H_SUCCESS;
 }
 
+static target_ulong h_resize_hpt_prepare(PowerPCCPU *cpu,
+                                         sPAPRMachineState *spapr,
+                                         target_ulong opcode,
+                                         target_ulong *args)
+{
+    target_ulong flags = args[0];
+    target_ulong shift = args[1];
+
+    if (spapr->resize_hpt == SPAPR_RESIZE_HPT_DISABLED) {
+        return H_AUTHORITY;
+    }
+
+    trace_spapr_h_resize_hpt_prepare(flags, shift);
+    return H_HARDWARE;
+}
+
+static target_ulong h_resize_hpt_commit(PowerPCCPU *cpu,
+                                        sPAPRMachineState *spapr,
+                                        target_ulong opcode,
+                                        target_ulong *args)
+{
+    target_ulong flags = args[0];
+    target_ulong shift = args[1];
+
+    if (spapr->resize_hpt == SPAPR_RESIZE_HPT_DISABLED) {
+        return H_AUTHORITY;
+    }
+
+    trace_spapr_h_resize_hpt_commit(flags, shift);
+    return H_HARDWARE;
+}
+
 static target_ulong h_set_sprg0(PowerPCCPU *cpu, sPAPRMachineState *spapr,
                                 target_ulong opcode, target_ulong *args)
 {
@@ -1072,6 +1104,10 @@ static void hypercall_register_types(void)
     /* hcall-bulk */
     spapr_register_hypercall(H_BULK_REMOVE, h_bulk_remove);
 
+    /* hcall-hpt-resize */
+    spapr_register_hypercall(H_RESIZE_HPT_PREPARE, h_resize_hpt_prepare);
+    spapr_register_hypercall(H_RESIZE_HPT_COMMIT, h_resize_hpt_commit);
+
     /* hcall-splpar */
     spapr_register_hypercall(H_REGISTER_VPA, h_register_vpa);
     spapr_register_hypercall(H_CEDE, h_cede);
diff --git a/hw/ppc/trace-events b/hw/ppc/trace-events
index 43d265f..7c77cc6 100644
--- a/hw/ppc/trace-events
+++ b/hw/ppc/trace-events
@@ -16,6 +16,8 @@ spapr_cas_continue(unsigned long n) "Copy changes to the guest: %ld bytes"
 # hw/ppc/spapr_hcall.c
 spapr_cas_pvr_try(uint32_t pvr) "%x"
 spapr_cas_pvr(uint32_t cur_pvr, bool explicit_match, uint32_t new_pvr) "current=%x, explicit_match=%u, new=%x"
+spapr_h_resize_hpt_prepare(uint64_t flags, uint64_t shift) "flags=0x%"PRIx64", shift=%"PRIu64
+spapr_h_resize_hpt_commit(uint64_t flags, uint64_t shift) "flags=0x%"PRIx64", shift=%"PRIu64
 
 # hw/ppc/spapr_iommu.c
 spapr_iommu_put(uint64_t liobn, uint64_t ioba, uint64_t tce, uint64_t ret) "liobn=%"PRIx64" ioba=0x%"PRIx64" tce=0x%"PRIx64" ret=%"PRId64
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index ba9e689..4d2c89c 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -42,6 +42,13 @@ typedef struct sPAPRMachineClass sPAPRMachineClass;
 #define SPAPR_MACHINE_CLASS(klass) \
     OBJECT_CLASS_CHECK(sPAPRMachineClass, klass, TYPE_SPAPR_MACHINE)
 
+typedef enum {
+    SPAPR_RESIZE_HPT_DEFAULT = 0,
+    SPAPR_RESIZE_HPT_DISABLED,
+    SPAPR_RESIZE_HPT_ENABLED,
+    SPAPR_RESIZE_HPT_REQUIRED,
+} sPAPRResizeHPT;
+
 /**
  * sPAPRMachineClass:
  */
@@ -57,6 +64,7 @@ struct sPAPRMachineClass {
                           uint64_t *buid, hwaddr *pio,
                           hwaddr *mmio32, hwaddr *mmio64,
                           unsigned n_dma, uint32_t *liobns, Error **errp);
+    sPAPRResizeHPT resize_hpt_default;
 };
 
 /**
@@ -72,6 +80,7 @@ struct sPAPRMachineState {
     ICSState *ics;
     sPAPRRTCState rtc;
 
+    sPAPRResizeHPT resize_hpt;
     void *htab;
     uint32_t htab_shift;
     uint64_t patb_entry; /* Process tbl registed in H_REGISTER_PROCESS_TABLE */
@@ -361,6 +370,8 @@ struct sPAPRMachineState {
 #define H_XIRR_X                0x2FC
 #define H_RANDOM                0x300
 #define H_SET_MODE              0x31C
+#define H_RESIZE_HPT_PREPARE    0x36C
+#define H_RESIZE_HPT_COMMIT     0x370
 #define H_SIGNAL_SYS_RESET      0x380
 #define MAX_HCALL_OPCODE        H_SIGNAL_SYS_RESET
 
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 031d31f..9d69bb4 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -22,6 +22,7 @@
 #include
 
 #include "qemu-common.h"
+#include "qapi/error.h"
 #include "qemu/error-report.h"
 #include "cpu.h"
 #include "cpu-models.h"
@@ -2624,3 +2625,14 @@ int kvmppc_enable_hwrng(void)
 
     return kvmppc_enable_hcall(kvm_state, H_RANDOM);
 }
+
+void kvmppc_check_papr_resize_hpt(Error **errp)
+{
+    if (!kvm_enabled()) {
+        return;
+    }
+
+    /* TODO: Check for resize-capable KVM implementations */
+
+    error_setg(errp, "Hash page table resizing not available with this KVM version");
+}
diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
index 08ecf75..67761e7 100644
--- a/target/ppc/kvm_ppc.h
+++ b/target/ppc/kvm_ppc.h
@@ -57,6 +57,7 @@ bool kvmppc_has_cap_htm(void);
 int kvmppc_enable_hwrng(void);
 int kvmppc_put_books_sregs(PowerPCCPU *cpu);
 PowerPCCPUClass *kvm_ppc_get_host_cpu_class(void);
+void kvmppc_check_papr_resize_hpt(Error **errp);
 
 bool kvmppc_is_mem_backend_page_size_ok(char *obj_path);
 
@@ -269,6 +270,10 @@ static inline PowerPCCPUClass *kvm_ppc_get_host_cpu_class(void)
     return NULL;
 }
 
+static inline void kvmppc_check_papr_resize_hpt(Error **errp)
+{
+    return;
+}
 #endif
 
 #ifndef CONFIG_KVM
-- 
2.9.3

From: David Gibson
To: qemu-ppc@nongnu.org
Cc: lvivier@redhat.com, agraf@suse.de, thuth@redhat.com, aik@ozlabs.ru, mdroth@linux.vnet.ibm.com, qemu-devel@nongnu.org, David Gibson
Date: Tue, 14 Mar 2017 16:34:19 +1100
Message-Id: <20170314053422.10922-3-david@gibson.dropbear.id.au>
In-Reply-To: <20170314053422.10922-1-david@gibson.dropbear.id.au>
References: <20170314053422.10922-1-david@gibson.dropbear.id.au>
Subject: [Qemu-devel] [PATCHv2 for-2.10 2/5] pseries: Implement HPT resizing

This patch implements hypercalls allowing a PAPR guest to resize its
own hash page table. This will eventually allow for more flexible
memory hotplug.

The implementation is partially asynchronous, handled in a special
thread running the hpt_prepare_thread() function. The state of a
pending resize is stored in SPAPR_MACHINE->pending_hpt.

The H_RESIZE_HPT_PREPARE hypercall will kick off creation of a new
HPT, or, if one is already in progress, monitor it for completion. If
there is an existing HPT resize in progress that doesn't match the
size specified in the call, it will cancel it, replacing it with a new
one matching the given size.

The H_RESIZE_HPT_COMMIT hypercall completes the transition to a
resized HPT, and can only be called successfully once
H_RESIZE_HPT_PREPARE has successfully completed initialization of a
new HPT. The guest must ensure that there are no concurrent accesses
to the existing HPT while this is called (this effectively means
stop_machine() for Linux guests).

For now H_RESIZE_HPT_COMMIT goes through the whole old HPT, rehashing
each HPTE into the new HPT. This can have quite high latency, but it
seems to be of the order of typical migration downtime latencies for
HPTs of size up to ~2GiB (which would be used in a 256GiB guest).

In future we probably want to move more of the rehashing to the
"prepare" phase, by having H_ENTER and other hcalls update both
current and pending HPTs. That's a project for another day, but
should be possible without any changes to the guest interface.

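The size figures above can be checked with a small stand-alone sketch.
It assumes the HPT is sized at 1/128 of RAM rounded up to a power of
two, which is what the 2GiB-for-256GiB figure implies but is not
spelled out in this patch; the 18..46 shift bounds, the 128-byte PTEG
(size >> 7) and the "one order above the default" limit do come from
the diff. All function names here are hypothetical.

```c
#include <stdint.h>

#define HPT_MIN_SHIFT 18   /* lower bound checked by H_RESIZE_HPT_PREPARE */
#define HPT_MAX_SHIFT 46   /* upper bound checked by H_RESIZE_HPT_PREPARE */

/* Assumed sizing rule: HPT is 1/128 of RAM, rounded up to a power of two. */
int hpt_shift_for_ramsize(uint64_t ramsize)
{
    int shift = 0;

    /* ceil(log2(ramsize)) */
    while (shift < 63 && (1ULL << shift) < ramsize) {
        shift++;
    }
    shift -= 7;

    if (shift < HPT_MIN_SHIFT) {
        shift = HPT_MIN_SHIFT;
    }
    if (shift > HPT_MAX_SHIFT) {
        shift = HPT_MAX_SHIFT;
    }
    return shift;
}

/* Number of PTEGs the commit phase has to walk: one per 128 bytes. */
uint64_t hpt_pteg_count(int shift)
{
    return (1ULL << shift) >> 7;
}

/*
 * Would a non-zero prepare request be accepted size-wise?  A shift of
 * 0 (which only cancels a pending resize in the patch) is not modelled.
 */
int hpt_shift_allowed(int shift, uint64_t ramsize)
{
    if (shift < HPT_MIN_SHIFT || shift > HPT_MAX_SHIFT) {
        return 0;   /* H_PARAMETER in the patch */
    }
    /* at most one order above the default, else H_RESOURCE */
    return shift <= hpt_shift_for_ramsize(ramsize) + 1;
}
```

Under that assumption a 256GiB guest gets shift 31, i.e. a 2GiB HPT
with 2^24 PTEGs to rehash at commit time, and may request at most
shift 32.
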
Signed-off-by: David Gibson
---
 hw/ppc/spapr.c          |   4 +-
 hw/ppc/spapr_hcall.c    | 306 ++++++++++++++++++++++++++++++++++++++++++++++++-
 include/hw/ppc/spapr.h  |   6 +
 target/ppc/mmu-hash64.h |   4 +
 4 files changed, 314 insertions(+), 6 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 558109c..83db110 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -94,8 +94,6 @@
 
 #define PHANDLE_XICP            0x00001111
 
-#define HTAB_SIZE(spapr)        (1ULL << ((spapr)->htab_shift))
-
 static int try_create_xics(sPAPRMachineState *spapr, const char *type_ics,
                            const char *type_icp, int nr_servers,
                            int nr_irqs, Error **errp)
@@ -1169,7 +1167,7 @@ static void spapr_store_hpte(PPCVirtualHypervisor *vhyp, hwaddr ptex,
     }
 }
 
-static int spapr_hpt_shift_for_ramsize(uint64_t ramsize)
+int spapr_hpt_shift_for_ramsize(uint64_t ramsize)
 {
     int shift;
 
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 9f88960..cdafc3f 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -3,6 +3,7 @@
 #include "sysemu/hw_accel.h"
 #include "sysemu/sysemu.h"
 #include "qemu/log.h"
+#include "qemu/error-report.h"
 #include "cpu.h"
 #include "exec/exec-all.h"
 #include "helper_regs.h"
@@ -352,20 +353,286 @@ static target_ulong h_read(PowerPCCPU *cpu, sPAPRMachineState *spapr,
     return H_SUCCESS;
 }
 
+struct sPAPRPendingHPT {
+    /* These fields are read-only after initialization */
+    int shift;
+    QemuThread thread;
+
+    /* These fields are protected by the BQL */
+    bool complete;
+
+    /* These fields are private to the preparation thread if
+     * !complete, otherwise protected by the BQL */
+    int ret;
+    void *hpt;
+};
+
+static void free_pending_hpt(sPAPRPendingHPT *pending)
+{
+    if (pending->hpt) {
+        qemu_vfree(pending->hpt);
+    }
+
+    g_free(pending);
+}
+
+static void *hpt_prepare_thread(void *opaque)
+{
+    sPAPRPendingHPT *pending = opaque;
+    size_t size = 1ULL << pending->shift;
+
+    pending->hpt = qemu_memalign(size, size);
+    if (pending->hpt) {
+        memset(pending->hpt, 0, size);
+        pending->ret = H_SUCCESS;
+    } else {
+        pending->ret = H_NO_MEM;
+    }
+
+    qemu_mutex_lock_iothread();
+
+    if (SPAPR_MACHINE(qdev_get_machine())->pending_hpt == pending) {
+        /* Ready to go */
+        pending->complete = true;
+    } else {
+        /* We've been cancelled, clean ourselves up */
+        free_pending_hpt(pending);
+    }
+
+    qemu_mutex_unlock_iothread();
+    return NULL;
+}
+
+/* Must be called with BQL held */
+static void cancel_hpt_prepare(sPAPRMachineState *spapr)
+{
+    sPAPRPendingHPT *pending = spapr->pending_hpt;
+
+    /* Let the thread know it's cancelled */
+    spapr->pending_hpt = NULL;
+
+    if (!pending) {
+        /* Nothing to do */
+        return;
+    }
+
+    if (!pending->complete) {
+        /* thread will clean itself up */
+        return;
+    }
+
+    free_pending_hpt(pending);
+}
+
 static target_ulong h_resize_hpt_prepare(PowerPCCPU *cpu,
                                          sPAPRMachineState *spapr,
                                          target_ulong opcode,
                                          target_ulong *args)
 {
     target_ulong flags = args[0];
-    target_ulong shift = args[1];
+    int shift = args[1];
+    sPAPRPendingHPT *pending = spapr->pending_hpt;
+    uint64_t current_ram_size;
 
     if (spapr->resize_hpt == SPAPR_RESIZE_HPT_DISABLED) {
         return H_AUTHORITY;
     }
 
     trace_spapr_h_resize_hpt_prepare(flags, shift);
-    return H_HARDWARE;
+
+    if (flags != 0) {
+        return H_PARAMETER;
+    }
+
+    if (shift && ((shift < 18) || (shift > 46))) {
+        return H_PARAMETER;
+    }
+
+    current_ram_size = pc_existing_dimms_capacity(&error_fatal);
+
+    /* We only allow the guest to allocate an HPT one order above what
+     * we'd normally give them (to stop a small guest claiming a huge
+     * chunk of resources in the HPT */
+    if (shift > (spapr_hpt_shift_for_ramsize(current_ram_size) + 1)) {
+        return H_RESOURCE;
+    }
+
+    if (pending) {
+        /* something already in progress */
+        if (pending->shift == shift) {
+            /* and it's suitable */
+            if (pending->complete) {
+                return pending->ret;
+            } else {
+                return H_LONG_BUSY_ORDER_100_MSEC;
+            }
+        }
+
+        /* not suitable, cancel and replace */
+        cancel_hpt_prepare(spapr);
+    }
+
+    if (!shift) {
+        /* nothing to do */
+        return H_SUCCESS;
+    }
+
+    /* start new prepare */
+
+    pending = g_new0(sPAPRPendingHPT, 1);
+    pending->shift = shift;
+    pending->ret = H_HARDWARE;
+
+    qemu_thread_create(&pending->thread, "sPAPR HPT prepare",
+                       hpt_prepare_thread, pending, QEMU_THREAD_DETACHED);
+
+    spapr->pending_hpt = pending;
+
+    /* In theory we could estimate the time more accurately based on
+     * the new size, but there's not much point */
+    return H_LONG_BUSY_ORDER_100_MSEC;
+}
+
+static uint64_t new_hpte_load0(void *htab, uint64_t pteg, int slot)
+{
+    uint8_t *addr = htab;
+
+    addr += pteg * HASH_PTEG_SIZE_64;
+    addr += slot * HASH_PTE_SIZE_64;
+    return ldq_p(addr);
+}
+
+static void new_hpte_store(void *htab, uint64_t pteg, int slot,
+                           uint64_t pte0, uint64_t pte1)
+{
+    uint8_t *addr = htab;
+
+    addr += pteg * HASH_PTEG_SIZE_64;
+    addr += slot * HASH_PTE_SIZE_64;
+
+    stq_p(addr, pte0);
+    stq_p(addr + HASH_PTE_SIZE_64/2, pte1);
+}
+
+static int rehash_hpte(PowerPCCPU *cpu,
+                       const ppc_hash_pte64_t *hptes,
+                       void *old_hpt, uint64_t oldsize,
+                       void *new_hpt, uint64_t newsize,
+                       uint64_t pteg, int slot)
+{
+    uint64_t old_hash_mask = (oldsize >> 7) - 1;
+    uint64_t new_hash_mask = (newsize >> 7) - 1;
+    target_ulong pte0 = ppc_hash64_hpte0(cpu, hptes, slot);
+    target_ulong pte1;
+    uint64_t avpn;
+    unsigned base_pg_shift;
+    uint64_t hash, new_pteg, replace_pte0;
+
+    if (!(pte0 & HPTE64_V_VALID) || !(pte0 & HPTE64_V_BOLTED)) {
+        return H_SUCCESS;
+    }
+
+    pte1 = ppc_hash64_hpte1(cpu, hptes, slot);
+
+    base_pg_shift = ppc_hash64_hpte_page_shift_noslb(cpu, pte0, pte1);
+    assert(base_pg_shift); /* H_ENTER shouldn't allow a bad encoding */
+    avpn = HPTE64_V_AVPN_VAL(pte0) & ~(((1ULL << base_pg_shift) - 1) >> 23);
+
+    if (pte0 & HPTE64_V_SECONDARY) {
+        pteg = ~pteg;
+    }
+
+    if ((pte0 & HPTE64_V_SSIZE) == HPTE64_V_SSIZE_256M) {
+        uint64_t offset, vsid;
+
+        /* We only have 28 - 23 bits of offset in avpn */
+        offset = (avpn & 0x1f) << 23;
+        vsid = avpn >> 5;
+        /* We can find more bits from the pteg value */
+        if (base_pg_shift < 23) {
+            offset |= ((vsid ^ pteg) & old_hash_mask) << base_pg_shift;
+        }
+
+        hash = vsid ^ (offset >> base_pg_shift);
+    } else if ((pte0 & HPTE64_V_SSIZE) == HPTE64_V_SSIZE_1T) {
+        uint64_t offset, vsid;
+
+        /* We only have 40 - 23 bits of seg_off in avpn */
+        offset = (avpn & 0x1ffff) << 23;
+        vsid = avpn >> 17;
+        if (base_pg_shift < 23) {
+            offset |= ((vsid ^ (vsid << 25) ^ pteg) & old_hash_mask)
+                << base_pg_shift;
+        }
+
+        hash = vsid ^ (vsid << 25) ^ (offset >> base_pg_shift);
+    } else {
+        error_report("rehash_pte: Bad segment size in HPTE");
+        return H_HARDWARE;
+    }
+
+    new_pteg = hash & new_hash_mask;
+    if (pte0 & HPTE64_V_SECONDARY) {
+        assert(~pteg == (hash & old_hash_mask));
+        new_pteg = ~new_pteg;
+    } else {
+        assert(pteg == (hash & old_hash_mask));
+    }
+    assert((oldsize != newsize) || (pteg == new_pteg));
+    replace_pte0 = new_hpte_load0(new_hpt, new_pteg, slot);
+    /*
+     * Strictly speaking, we don't need all these tests, since we only
+     * ever rehash bolted HPTEs. We might in future handle non-bolted
+     * HPTEs, though so make the logic correct for those cases as
+     * well.
+     */
+    if (replace_pte0 & HPTE64_V_VALID) {
+        assert(newsize < oldsize);
+        if (replace_pte0 & HPTE64_V_BOLTED) {
+            if (pte0 & HPTE64_V_BOLTED) {
+                /* Bolted collision, nothing we can do */
+                return H_PTEG_FULL;
+            } else {
+                /* Discard this hpte */
+                return H_SUCCESS;
+            }
+        }
+    }
+
+    new_hpte_store(new_hpt, new_pteg, slot, pte0, pte1);
+    return H_SUCCESS;
+}
+
+static int rehash_hpt(PowerPCCPU *cpu,
+                      void *old_hpt, uint64_t oldsize,
+                      void *new_hpt, uint64_t newsize)
+{
+    uint64_t n_ptegs = oldsize >> 7;
+    uint64_t pteg;
+    int slot;
+    int rc;
+
+    for (pteg = 0; pteg < n_ptegs; pteg++) {
+        hwaddr ptex = pteg *HPTES_PER_GROUP;
+        const ppc_hash_pte64_t *hptes
+            = ppc_hash64_map_hptes(cpu, ptex, HPTES_PER_GROUP);
+
+        if (!hptes) {
+            return H_HARDWARE;
+        }
+
+        for (slot = 0; slot < HPTES_PER_GROUP; slot++) {
+            rc = rehash_hpte(cpu, hptes, old_hpt, oldsize, new_hpt, newsize,
+                             pteg, slot);
+            if (rc != H_SUCCESS) {
+                ppc_hash64_unmap_hptes(cpu, hptes, ptex, HPTES_PER_GROUP);
+                return rc;
+            }
+        }
+        ppc_hash64_unmap_hptes(cpu, hptes, ptex, HPTES_PER_GROUP);
+    }
+
+    return H_SUCCESS;
 }
 
 static target_ulong h_resize_hpt_commit(PowerPCCPU *cpu,
@@ -375,13 +642,46 @@ static target_ulong h_resize_hpt_commit(PowerPCCPU *cpu,
 {
     target_ulong flags = args[0];
     target_ulong shift = args[1];
+    sPAPRPendingHPT *pending = spapr->pending_hpt;
+    int rc;
+    size_t newsize;
 
     if (spapr->resize_hpt == SPAPR_RESIZE_HPT_DISABLED) {
         return H_AUTHORITY;
     }
 
     trace_spapr_h_resize_hpt_commit(flags, shift);
-    return H_HARDWARE;
+
+    if (flags != 0) {
+        return H_PARAMETER;
+    }
+
+    if (!pending || (pending->shift != shift)) {
+        /* no matching prepare */
+        return H_CLOSED;
+    }
+
+    if (!pending->complete) {
+        /* prepare has not completed */
+        return H_BUSY;
+    }
+
+    newsize = 1ULL << pending->shift;
+    rc = rehash_hpt(cpu, spapr->htab, HTAB_SIZE(spapr),
+                    pending->hpt, newsize);
+    if (rc == H_SUCCESS) {
+        qemu_vfree(spapr->htab);
+        spapr->htab = pending->hpt;
+        spapr->htab_shift = pending->shift;
+
+        pending->hpt = NULL; /* so it's not free()d */
+    }
+
+    /* Clean up */
+    spapr->pending_hpt = NULL;
+    free_pending_hpt(pending);
+
+    return rc;
 }
 
 static target_ulong h_set_sprg0(PowerPCCPU *cpu, sPAPRMachineState *spapr,
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 4d2c89c..ba5c7d5 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -14,6 +14,7 @@ struct sPAPRNVRAM;
 typedef struct sPAPRConfigureConnectorState sPAPRConfigureConnectorState;
 typedef struct sPAPREventLogEntry sPAPREventLogEntry;
 typedef struct sPAPREventSource sPAPREventSource;
+typedef struct sPAPRPendingHPT sPAPRPendingHPT;
 
 #define HPTE64_V_HPTE_DIRTY     0x0000000000000040ULL
 #define SPAPR_ENTRY_POINT       0x100
@@ -84,6 +85,8 @@ struct sPAPRMachineState {
     void *htab;
     uint32_t htab_shift;
     uint64_t patb_entry; /* Process tbl registed in H_REGISTER_PROCESS_TABLE */
+    sPAPRPendingHPT *pending_hpt; /* in-progress resize */
+
     hwaddr rma_size;
     int vrma_adjust;
     ssize_t rtas_size;
@@ -641,6 +644,7 @@ void spapr_hotplug_req_remove_by_count_indexed(sPAPRDRConnectorType drc_type,
                                                uint32_t count, uint32_t index);
 void *spapr_populate_hotplug_cpu_dt(CPUState *cs, int *fdt_offset,
                                     sPAPRMachineState *spapr);
+int spapr_hpt_shift_for_ramsize(uint64_t ramsize);
 
 /* rtas-configure-connector state */
 struct sPAPRConfigureConnectorState {
@@ -687,4 +691,6 @@ int spapr_rng_populate_dt(void *fdt);
 
 void spapr_do_system_reset_on_cpu(CPUState *cs, run_on_cpu_data arg);
 
+#define HTAB_SIZE(spapr)        (1ULL << ((spapr)->htab_shift))
+
 #endif /* HW_SPAPR_H */
diff --git a/target/ppc/mmu-hash64.h b/target/ppc/mmu-hash64.h
index 54f1e37..d297b97 100644
--- a/target/ppc/mmu-hash64.h
+++ b/target/ppc/mmu-hash64.h
@@ -63,11 +63,15 @@ void ppc_hash64_update_rmls(CPUPPCState *env);
 #define HASH_PTE_SIZE_64        16
 #define HASH_PTEG_SIZE_64       (HASH_PTE_SIZE_64 * HPTES_PER_GROUP)
 
+#define HPTE64_V_SSIZE          SLB_VSID_B
+#define HPTE64_V_SSIZE_256M     SLB_VSID_B_256M
+#define HPTE64_V_SSIZE_1T       SLB_VSID_B_1T
 #define HPTE64_V_SSIZE_SHIFT    62
 #define HPTE64_V_AVPN_SHIFT     7
 #define HPTE64_V_AVPN           0x3fffffffffffff80ULL
 #define HPTE64_V_AVPN_VAL(x)    (((x) & HPTE64_V_AVPN) >> HPTE64_V_AVPN_SHIFT)
 #define HPTE64_V_COMPARE(x, y)  (!(((x) ^ (y)) & 0xffffffffffffff83ULL))
+#define HPTE64_V_BOLTED         0x0000000000000010ULL
 #define HPTE64_V_LARGE          0x0000000000000004ULL
 #define HPTE64_V_SECONDARY      0x0000000000000002ULL
 #define HPTE64_V_VALID          0x0000000000000001ULL
-- 
2.9.3

From: David Gibson
To: qemu-ppc@nongnu.org
Cc: lvivier@redhat.com, agraf@suse.de, thuth@redhat.com, aik@ozlabs.ru, mdroth@linux.vnet.ibm.com, qemu-devel@nongnu.org, David Gibson
Date: Tue, 14 Mar 2017 16:34:20 +1100
Message-Id: <20170314053422.10922-4-david@gibson.dropbear.id.au>
In-Reply-To: <20170314053422.10922-1-david@gibson.dropbear.id.au>
References: <20170314053422.10922-1-david@gibson.dropbear.id.au>
Subject: [Qemu-devel] [PATCHv2 for-2.10 3/5] pseries: Enable HPT resizing for 2.10

We've now implemented a PAPR extension which allows PAPR guests
(i.e. "pseries" machine type) to resize their hash page table during
runtime.

However, that extension is only enabled if explicitly chosen on the
command line. This patch enables it by default for spapr-2.10, but
leaves it disabled (by default) for older machine types.

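The versioned-default pattern used here can be illustrated with a
stand-alone sketch. The struct and function names below are
hypothetical simplifications of the QEMU machine class hooks; only the
ordering (the older version first applies the newer version's options,
then overrides the default) is taken from the patch.

```c
/* Stand-in for the sPAPRResizeHPT enum from patch 1/5. */
typedef enum {
    RESIZE_HPT_DEFAULT = 0,
    RESIZE_HPT_DISABLED,
    RESIZE_HPT_ENABLED,
    RESIZE_HPT_REQUIRED,
} ResizeHPT;

/* Hypothetical, minimal machine class holding only the new default. */
typedef struct {
    ResizeHPT resize_hpt_default;
} MachineClassSketch;

/* Common class init: the newest machine type gets resizing by default. */
void class_init_common(MachineClassSketch *mc)
{
    mc->resize_hpt_default = RESIZE_HPT_ENABLED;
}

/* 2.10 is the newest version, so it changes nothing. */
void class_options_2_10(MachineClassSketch *mc)
{
    (void)mc;
}

/* 2.9 chains to 2.10, then restores the old behaviour. */
void class_options_2_9(MachineClassSketch *mc)
{
    class_options_2_10(mc);
    mc->resize_hpt_default = RESIZE_HPT_DISABLED;
}
```

An explicit "resize-hpt" setting on the command line still takes
precedence over either default.
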
Signed-off-by: David Gibson --- hw/ppc/spapr.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 83db110..a6a1c93 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -3155,7 +3155,7 @@ static void spapr_machine_class_init(ObjectClass *oc,= void *data) smc->dr_lmb_enabled =3D true; smc->tcg_default_cpu =3D "POWER8"; mc->has_hotpluggable_cpus =3D true; - smc->resize_hpt_default =3D SPAPR_RESIZE_HPT_DISABLED; + smc->resize_hpt_default =3D SPAPR_RESIZE_HPT_ENABLED; fwc->get_dev_path =3D spapr_get_fw_dev_path; nc->nmi_monitor_handler =3D spapr_nmi; smc->phb_placement =3D spapr_phb_placement; @@ -3246,8 +3246,11 @@ static void spapr_machine_2_9_instance_options(Machi= neState *machine) =20 static void spapr_machine_2_9_class_options(MachineClass *mc) { + sPAPRMachineClass *smc =3D SPAPR_MACHINE_CLASS(mc); + spapr_machine_2_10_class_options(mc); SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_2_9); + smc->resize_hpt_default =3D SPAPR_RESIZE_HPT_DISABLED; } =20 DEFINE_SPAPR_MACHINE(2_9, "2.9", false); --=20 2.9.3 From nobody Thu May 2 05:08:43 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zoho.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1489469817614963.0171623076527; Mon, 13 Mar 2017 22:36:57 -0700 (PDT) Received: from localhost ([::1]:56459 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cnf8i-0002un-3s for importer@patchew.org; Tue, 14 Mar 2017 01:36:56 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:60883) by lists.gnu.org with esmtp (Exim 
4.71) (envelope-from ) id 1cnf6R-0001Pl-Sl for qemu-devel@nongnu.org; Tue, 14 Mar 2017 01:34:37 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1cnf6Q-0002iR-KI for qemu-devel@nongnu.org; Tue, 14 Mar 2017 01:34:35 -0400 Received: from ozlabs.org ([2401:3900:2:1::2]:42121) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1cnf6P-0002e4-Q3; Tue, 14 Mar 2017 01:34:34 -0400 Received: by ozlabs.org (Postfix, from userid 1007) id 3vj3MT5m5fz9s78; Tue, 14 Mar 2017 16:34:29 +1100 (AEDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=gibson.dropbear.id.au; s=201602; t=1489469669; bh=d1zp9fn1/GJDQt8/kHvvhb74H0y2mFt+4OQh5ZNzX0U=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=f8ord4pf+Iiq8LCBuHj8NZMTLHU4vsAcDjsdylx4mcfjXFjqfDCB2lUCmmjDpusWy DAsmj8RST7D+mAYdpmUKPrlVLWOvrRb1/uVXfKKJAWtyj0YXA+YYQ1gnJk7UjAjTXE IDaRd8B3tYFGxLraOS99/4j8Ii7teN+r88IvtSgs= From: David Gibson To: qemu-ppc@nongnu.org Date: Tue, 14 Mar 2017 16:34:21 +1100 Message-Id: <20170314053422.10922-5-david@gibson.dropbear.id.au> X-Mailer: git-send-email 2.9.3 In-Reply-To: <20170314053422.10922-1-david@gibson.dropbear.id.au> References: <20170314053422.10922-1-david@gibson.dropbear.id.au> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. 
X-Received-From: 2401:3900:2:1::2 Subject: [Qemu-devel] [PATCHv2 for-2.10 4/5] pseries: Use smaller default hash page tables when guest can resize X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: lvivier@redhat.com, agraf@suse.de, thuth@redhat.com, aik@ozlabs.ru, mdroth@linux.vnet.ibm.com, qemu-devel@nongnu.org, David Gibson Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" We've now implemented a PAPR extension allowing PAPR guests to resize their hash page table (HPT) during runtime. This patch makes use of that facility to allocate smaller HPTs by default. Specifically, when a guest is aware of the HPT resize facility, qemu sizes the HPT to the initial memory size rather than the maximum memory size, on the assumption that the guest will resize its HPT if necessary for hot plugged memory. When the initial memory size is much smaller than the maximum memory size (a common configuration with e.g. oVirt / RHEV) then this can save significant memory on the HPT. If the guest does *not* advertise HPT resize awareness when it makes the ibm,client-architecture-support call, qemu resizes the HPT for the maximum memory size (unless it's been configured not to allow such guests at all). For now we make that reallocation assuming the guest has not yet used the HPT at all. That's true in practice, but not, strictly, an architectural or PAPR requirement. If we need to in future we can fix this by having the client-architecture-support call reboot the guest with the revised HPT size (the client-architecture-support call is explicitly permitted to trigger a reboot in this way). 
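The sizing policy described above can be sketched in isolation as follows. This is an illustrative, standalone approximation and not the code in the patch: the enum, hpt_shift_for_ramsize() and initial_hpt_shift() are hypothetical stand-ins, and the sizing rule (an HPT of roughly 1/128 of RAM, clamped to between 2^18 and 2^46 bytes) is an assumption about what spapr_hpt_shift_for_ramsize() computes, since its body is not shown in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the machine's resize-hpt modes. */
enum resize_hpt_mode {
    RESIZE_HPT_DISABLED,
    RESIZE_HPT_ENABLED,
    RESIZE_HPT_REQUIRED,
};

/*
 * ASSUMED sizing rule: HPT size is about 1/128 of RAM (rounded up to a
 * power of two), clamped to the range 2^18..2^46 bytes.
 */
static int hpt_shift_for_ramsize(uint64_t ramsize)
{
    int shift = 0;

    /* Compute ceil(log2(ramsize)) without shifting by 64 bits. */
    while (shift < 64 && ((uint64_t)1 << shift) < ramsize) {
        shift++;
    }

    shift -= 7;          /* divide by 128 */
    if (shift < 18) {
        shift = 18;      /* assumed minimum HPT size */
    }
    if (shift > 46) {
        shift = 46;      /* assumed maximum HPT size */
    }
    return shift;
}

/*
 * Choose the HPT shift at reset.  guest_declined_resize models "we are
 * rebooting after a client-architecture-support call in which the guest
 * did not advertise HPT resize awareness".
 */
static int initial_hpt_shift(enum resize_hpt_mode mode,
                             bool guest_declined_resize,
                             uint64_t ram_size, uint64_t maxram_size)
{
    assert(ram_size <= maxram_size);

    if (mode == RESIZE_HPT_DISABLED || guest_declined_resize) {
        /* Guest cannot grow the HPT later, so size for the worst case. */
        return hpt_shift_for_ramsize(maxram_size);
    }
    /* Guest is expected to resize on memory hotplug. */
    return hpt_shift_for_ramsize(ram_size);
}
```

Under the assumed rule, a guest started with 4 GiB of RAM and a 64 GiB maximum would get a 2^25 byte (32 MiB) HPT when resizing is available, instead of a 2^29 byte (512 MiB) one.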
Signed-off-by: David Gibson Reviewed-by: Suraj Jitindar Singh --- hw/ppc/spapr.c | 21 ++++++++++++++++----- hw/ppc/spapr_hcall.c | 28 ++++++++++++++++++++++++++++ include/hw/ppc/spapr.h | 2 ++ include/hw/ppc/spapr_ovec.h | 1 + 4 files changed, 47 insertions(+), 5 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index a6a1c93..295d654 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1180,8 +1180,8 @@ int spapr_hpt_shift_for_ramsize(uint64_t ramsize) return shift; } =20 -static void spapr_reallocate_hpt(sPAPRMachineState *spapr, int shift, - Error **errp) +void spapr_reallocate_hpt(sPAPRMachineState *spapr, int shift, + Error **errp) { long rc; =20 @@ -1254,6 +1254,7 @@ static void ppc_spapr_reset(void) hwaddr rtas_addr, fdt_addr; void *fdt; int rc; + int hpt_shift; =20 /* Check for unknown sysbus devices */ foreach_dynamic_sysbus_device(find_unknown_sysbus_device, NULL); @@ -1261,9 +1262,14 @@ static void ppc_spapr_reset(void) spapr->patb_entry =3D 0; =20 /* Allocate and/or reset the hash page table */ - spapr_reallocate_hpt(spapr, - spapr_hpt_shift_for_ramsize(machine->maxram_size), - &error_fatal); + if ((spapr->resize_hpt =3D=3D SPAPR_RESIZE_HPT_DISABLED) + || (spapr->cas_reboot + && !spapr_ovec_test(spapr->ov5_cas, OV5_HPT_RESIZE))) { + hpt_shift =3D spapr_hpt_shift_for_ramsize(machine->maxram_size); + } else { + hpt_shift =3D spapr_hpt_shift_for_ramsize(machine->ram_size); + } + spapr_reallocate_hpt(spapr, hpt_shift, &error_fatal); =20 /* Update the RMA size if necessary */ if (spapr->vrma_adjust) { @@ -2092,6 +2098,11 @@ static void ppc_spapr_init(MachineState *machine) spapr_ovec_set(spapr->ov5, OV5_HP_EVT); } =20 + /* advertise support for HPT resizing */ + if (spapr->resize_hpt !=3D SPAPR_RESIZE_HPT_DISABLED) { + spapr_ovec_set(spapr->ov5, OV5_HPT_RESIZE); + } + /* init CPUs */ if (machine->cpu_model =3D=3D NULL) { machine->cpu_model =3D kvm_enabled() ? 
"host" : smc->tcg_default_c= pu; diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c index cdafc3f..5893647 100644 --- a/hw/ppc/spapr_hcall.c +++ b/hw/ppc/spapr_hcall.c @@ -1314,6 +1314,34 @@ static target_ulong h_client_architecture_support(Po= werPCCPU *cpu, =20 ov5_guest =3D spapr_ovec_parse_vector(ov_table, 5); =20 + /* + * HPT resizing is a bit of a special case, because when enabled + * we assume the guest will support it until it says it doesn't, + * instead of assuming it won't support it until it says it does. + * Strictly speaking that approach could break for guests which + * don't make a CAS call, but those are so old we don't care about + * them. Without that assumption we'd have to make at least a + * temporary allocation of an HPT sized for max memory, which + * could be impossibly difficult under KVM HV if maxram is large. + */ + if (!spapr_ovec_test(ov5_guest, OV5_HPT_RESIZE)) { + int maxshift =3D spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxra= m_size); + + if (spapr->resize_hpt =3D=3D SPAPR_RESIZE_HPT_REQUIRED) { + error_report( + "h_client_architecture_support: Guest doesn't support HPT = resizing, but resize-hpt=3Drequired"); + exit(1); + } + + if (spapr->htab_shift < maxshift) { + /* Guest doesn't know about HPT resizing, so we + * pre-emptively resize for the maximum permitted RAM. At + * the point this is called, nothing should have been + * entered into the existing HPT */ + spapr_reallocate_hpt(spapr, maxshift, &error_fatal); + } + } + /* NOTE: there are actually a number of ov5 bits where input from the * guest is always zero, and the platform/QEMU enables them independen= tly * of guest input. 
To model these properly we'd want some sort of mask, diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h index ba5c7d5..d4a9ed7 100644 --- a/include/hw/ppc/spapr.h +++ b/include/hw/ppc/spapr.h @@ -645,6 +645,8 @@ void spapr_hotplug_req_remove_by_count_indexed(sPAPRDRC= onnectorType drc_type, void *spapr_populate_hotplug_cpu_dt(CPUState *cs, int *fdt_offset, sPAPRMachineState *spapr); int spapr_hpt_shift_for_ramsize(uint64_t ramsize); +void spapr_reallocate_hpt(sPAPRMachineState *spapr, int shift, + Error **errp); =20 /* rtas-configure-connector state */ struct sPAPRConfigureConnectorState { diff --git a/include/hw/ppc/spapr_ovec.h b/include/hw/ppc/spapr_ovec.h index 355a344..f5fed87 100644 --- a/include/hw/ppc/spapr_ovec.h +++ b/include/hw/ppc/spapr_ovec.h @@ -47,6 +47,7 @@ typedef struct sPAPROptionVector sPAPROptionVector; #define OV5_DRCONF_MEMORY OV_BIT(2, 2) #define OV5_FORM1_AFFINITY OV_BIT(5, 0) #define OV5_HP_EVT OV_BIT(6, 5) +#define OV5_HPT_RESIZE OV_BIT(6, 7) =20 /* interfaces */ sPAPROptionVector *spapr_ovec_new(void); --=20 2.9.3 From nobody Thu May 2 05:08:43 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zoho.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 148946992443571.24149370648831; Mon, 13 Mar 2017 22:38:44 -0700 (PDT) Received: from localhost ([::1]:56464 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cnfAR-0004Kx-6o for importer@patchew.org; Tue, 14 Mar 2017 01:38:43 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:60939) by lists.gnu.org with esmtp (Exim 
4.71) (envelope-from ) id 1cnf6V-0001Re-Ou for qemu-devel@nongnu.org; Tue, 14 Mar 2017 01:34:41 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1cnf6U-0002my-Bl for qemu-devel@nongnu.org; Tue, 14 Mar 2017 01:34:39 -0400 Received: from ozlabs.org ([2401:3900:2:1::2]:55285) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1cnf6T-0002it-Iy; Tue, 14 Mar 2017 01:34:38 -0400 Received: by ozlabs.org (Postfix, from userid 1007) id 3vj3MT6V3Tz9s7C; Tue, 14 Mar 2017 16:34:29 +1100 (AEDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=gibson.dropbear.id.au; s=201602; t=1489469669; bh=1yCn00kMfHwLXCLOl+o+TPCYfnM5LhB6K7RqLufEUuU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=baIokPYdZ981ZYLzoIK4mQHzKM3lPILt/xLNi48sclE3VaJlU6qjg9j5E+/GcI7Yt k3NEMbfKkiGFVyPQrZ132m6wpGIx2INjvR5TV0/urtJkLU/INgNBqsRymeDeuVBLtd uR6N/uWvfHk9zaNpXG4hN5k21gdH9pzXTgVcXuWE= From: David Gibson To: qemu-ppc@nongnu.org Date: Tue, 14 Mar 2017 16:34:22 +1100 Message-Id: <20170314053422.10922-6-david@gibson.dropbear.id.au> X-Mailer: git-send-email 2.9.3 In-Reply-To: <20170314053422.10922-1-david@gibson.dropbear.id.au> References: <20170314053422.10922-1-david@gibson.dropbear.id.au> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. 
X-Received-From: 2401:3900:2:1::2 Subject: [Qemu-devel] [PATCHv2 for-2.10 5/5] pseries: Allow HPT resizing with KVM X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: lvivier@redhat.com, agraf@suse.de, thuth@redhat.com, aik@ozlabs.ru, mdroth@linux.vnet.ibm.com, qemu-devel@nongnu.org, David Gibson Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" So far, qemu implements the PAPR Hash Page Table (HPT) resizing extension with TCG. The same implementation will work with KVM PR, but we don't currently allow that. For KVM HV we can only implement resizing with the assistance of the host kernel, which needs a new capability and ioctl()s. This patch adds support for testing the new KVM capability and implementing the resize in terms of KVM facilities when necessary. If we're running on a kernel which doesn't have the new capability flag at all, we fall back to testing for PR vs. HV KVM using the same hack that we already use in a number of places for older kernels. 
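As a rough illustration of the decision logic described above, the following standalone sketch models the availability check and the "try the kernel first, fall back on -ENOSYS" dispatch. The struct and all function names here are hypothetical and exist only for this example. The real code queries KVM capabilities and issues ioctl()s, which are replaced by a placeholder.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical description of the host, standing in for real KVM probes. */
struct host {
    bool kvm_enabled;     /* false means we are running under TCG */
    bool cap_resize_hpt;  /* kernel advertises the resize capability */
    bool kvm_is_pr;       /* PR KVM rather than HV KVM */
};

/* Is HPT resizing usable at all on this host? */
static bool resize_hpt_available(const struct host *h)
{
    if (!h->kvm_enabled) {
        return true;              /* TCG: handled entirely in qemu */
    }
    if (h->cap_resize_hpt) {
        return true;              /* kernel has explicit support */
    }
    return h->kvm_is_pr;          /* old kernel: only PR KVM can work */
}

/*
 * Kernel path.  -ENOSYS means "not handled by the kernel, use the
 * userspace implementation instead".
 */
static int kernel_resize_prepare(const struct host *h, int shift)
{
    (void)shift;
    if (!h->kvm_enabled || !h->cap_resize_hpt) {
        return -ENOSYS;
    }
    return 0;                     /* placeholder for the ioctl() result */
}

enum handled_by { HANDLED_BY_KERNEL, HANDLED_BY_QEMU };

/* Report which implementation ends up servicing a prepare request. */
static enum handled_by resize_prepare_path(const struct host *h, int shift)
{
    if (kernel_resize_prepare(h, shift) != -ENOSYS) {
        return HANDLED_BY_KERNEL;
    }
    return HANDLED_BY_QEMU;
}
```

In this model, TCG and PR KVM on an older kernel both end up on the qemu path, HV KVM with the capability goes through the kernel, and HV KVM without it is rejected by the availability check before any hypercall is serviced.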
Signed-off-by: David Gibson --- hw/ppc/spapr_hcall.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++= +++ target/ppc/kvm.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++= ++-- target/ppc/kvm_ppc.h | 21 ++++++++++++++++ 3 files changed, 152 insertions(+), 2 deletions(-) diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c index 5893647..874dfc2 100644 --- a/hw/ppc/spapr_hcall.c +++ b/hw/ppc/spapr_hcall.c @@ -424,6 +424,44 @@ static void cancel_hpt_prepare(sPAPRMachineState *spap= r) free_pending_hpt(pending); } =20 +/* Convert a return code from the KVM ioctl()s implementing resize HPT + * into a PAPR hypercall return code */ +static target_ulong resize_hpt_convert_rc(int ret) +{ + if (ret >=3D 100000) { + return H_LONG_BUSY_ORDER_100_SEC; + } else if (ret >=3D 10000) { + return H_LONG_BUSY_ORDER_10_SEC; + } else if (ret >=3D 1000) { + return H_LONG_BUSY_ORDER_1_SEC; + } else if (ret >=3D 100) { + return H_LONG_BUSY_ORDER_100_MSEC; + } else if (ret >=3D 10) { + return H_LONG_BUSY_ORDER_10_MSEC; + } else if (ret > 0) { + return H_LONG_BUSY_ORDER_1_MSEC; + } + + switch (ret) { + case 0: + return H_SUCCESS; + case -EPERM: + return H_AUTHORITY; + case -EINVAL: + return H_PARAMETER; + case -ENXIO: + return H_CLOSED; + case -ENOSPC: + return H_PTEG_FULL; + case -EBUSY: + return H_BUSY; + case -ENOMEM: + return H_NO_MEM; + default: + return H_HARDWARE; + } +} + static target_ulong h_resize_hpt_prepare(PowerPCCPU *cpu, sPAPRMachineState *spapr, target_ulong opcode, @@ -433,6 +471,7 @@ static target_ulong h_resize_hpt_prepare(PowerPCCPU *cp= u, int shift =3D args[1]; sPAPRPendingHPT *pending =3D spapr->pending_hpt; uint64_t current_ram_size; + int rc; =20 if (spapr->resize_hpt =3D=3D SPAPR_RESIZE_HPT_DISABLED) { return H_AUTHORITY; @@ -457,6 +496,11 @@ static target_ulong h_resize_hpt_prepare(PowerPCCPU *c= pu, return H_RESOURCE; } =20 + rc =3D kvmppc_resize_hpt_prepare(cpu, flags, shift); + if (rc !=3D -ENOSYS) { + return resize_hpt_convert_rc(rc); + } + if 
(pending) { /* something already in progress */ if (pending->shift =3D=3D shift) { @@ -652,6 +696,11 @@ static target_ulong h_resize_hpt_commit(PowerPCCPU *cp= u, =20 trace_spapr_h_resize_hpt_commit(flags, shift); =20 + rc =3D kvmppc_resize_hpt_commit(cpu, flags, shift); + if (rc !=3D -ENOSYS) { + return resize_hpt_convert_rc(rc); + } + if (flags !=3D 0) { return H_PARAMETER; } @@ -674,6 +723,13 @@ static target_ulong h_resize_hpt_commit(PowerPCCPU *cp= u, spapr->htab =3D pending->hpt; spapr->htab_shift =3D pending->shift; =20 + if (kvm_enabled()) { + /* For KVM PR, update the HPT pointer */ + target_ulong sdr1 =3D (target_ulong)(uintptr_t)spapr->htab + | (spapr->htab_shift - 18); + kvmppc_update_sdr1(sdr1); + } + pending->hpt =3D NULL; /* so it's not free()d */ } =20 @@ -1334,11 +1390,21 @@ static target_ulong h_client_architecture_support(P= owerPCCPU *cpu, } =20 if (spapr->htab_shift < maxshift) { + CPUState *cs; + /* Guest doesn't know about HPT resizing, so we * pre-emptively resize for the maximum permitted RAM. 
At * the point this is called, nothing should have been * entered into the existing HPT */ spapr_reallocate_hpt(spapr, maxshift, &error_fatal); + CPU_FOREACH(cs) { + if (kvm_enabled()) { + /* For KVM PR, update the HPT pointer */ + target_ulong sdr1 =3D (target_ulong)(uintptr_t)spapr->= htab + | (spapr->htab_shift - 18); + kvmppc_update_sdr1(sdr1); + } + } } } =20 diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c index 9d69bb4..4a911c9 100644 --- a/target/ppc/kvm.c +++ b/target/ppc/kvm.c @@ -85,6 +85,7 @@ static int cap_papr; static int cap_htab_fd; static int cap_fixup_hcalls; static int cap_htm; /* Hardware transactional memory support */ +static int cap_resize_hpt; =20 static uint32_t debug_inst_opcode; =20 @@ -139,6 +140,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s) cap_htab_fd =3D kvm_check_extension(s, KVM_CAP_PPC_HTAB_FD); cap_fixup_hcalls =3D kvm_check_extension(s, KVM_CAP_PPC_FIXUP_HCALL); cap_htm =3D kvm_vm_check_extension(s, KVM_CAP_PPC_HTM); + cap_resize_hpt =3D kvm_vm_check_extension(s, KVM_CAP_SPAPR_RESIZE_HPT); =20 if (!cap_interrupt_level) { fprintf(stderr, "KVM: Couldn't find level irq capability. 
Expect t= he " @@ -2629,10 +2631,71 @@ int kvmppc_enable_hwrng(void) void kvmppc_check_papr_resize_hpt(Error **errp) { if (!kvm_enabled()) { - return; + return; /* No KVM, we're good */ + } + + if (cap_resize_hpt) { + return; /* Kernel has explicit support, we're good */ } =20 - /* TODO: Check for resize-capable KVM implementations */ + /* Otherwise fallback on looking for PR KVM */ + if (kvmppc_is_pr(kvm_state)) { + return; + } =20 error_setg(errp, "Hash page table resizing not available with this KVM= version"); } + +int kvmppc_resize_hpt_prepare(PowerPCCPU *cpu, target_ulong flags, int shi= ft) +{ + CPUState *cs =3D CPU(cpu); + struct kvm_ppc_resize_hpt rhpt =3D { + .flags =3D flags, + .shift =3D shift, + }; + + if (!cap_resize_hpt) { + return -ENOSYS; + } + + return kvm_vm_ioctl(cs->kvm_state, KVM_PPC_RESIZE_HPT_PREPARE, &rhpt); +} + +int kvmppc_resize_hpt_commit(PowerPCCPU *cpu, target_ulong flags, int shif= t) +{ + CPUState *cs =3D CPU(cpu); + struct kvm_ppc_resize_hpt rhpt =3D { + .flags =3D flags, + .shift =3D shift, + }; + + if (!cap_resize_hpt) { + return -ENOSYS; + } + + return kvm_vm_ioctl(cs->kvm_state, KVM_PPC_RESIZE_HPT_COMMIT, &rhpt); +} + +static void kvmppc_pivot_hpt_cpu(CPUState *cs, run_on_cpu_data arg) +{ + target_ulong sdr1 =3D arg.target_ptr; + PowerPCCPU *cpu =3D POWERPC_CPU(cs); + CPUPPCState *env =3D &cpu->env; + + /* This is just for the benefit of PR KVM */ + cpu_synchronize_state(cs); + env->spr[SPR_SDR1] =3D sdr1; + if (kvmppc_put_books_sregs(cpu) < 0) { + error_report("Unable to update SDR1 in KVM"); + exit(1); + } +} + +void kvmppc_update_sdr1(target_ulong sdr1) +{ + CPUState *cs; + + CPU_FOREACH(cs) { + run_on_cpu(cs, kvmppc_pivot_hpt_cpu, RUN_ON_CPU_TARGET_PTR(sdr1)); + } +} diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h index 67761e7..b0842b4 100644 --- a/target/ppc/kvm_ppc.h +++ b/target/ppc/kvm_ppc.h @@ -58,6 +58,9 @@ int kvmppc_enable_hwrng(void); int kvmppc_put_books_sregs(PowerPCCPU *cpu); PowerPCCPUClass 
*kvm_ppc_get_host_cpu_class(void); void kvmppc_check_papr_resize_hpt(Error **errp); +int kvmppc_resize_hpt_prepare(PowerPCCPU *cpu, target_ulong flags, int shi= ft); +int kvmppc_resize_hpt_commit(PowerPCCPU *cpu, target_ulong flags, int shif= t); +void kvmppc_update_sdr1(target_ulong sdr1); =20 bool kvmppc_is_mem_backend_page_size_ok(char *obj_path); =20 @@ -274,6 +277,24 @@ static inline void kvmppc_check_papr_resize_hpt(Error = **errp) { return; } + +static inline int kvmppc_resize_hpt_prepare(PowerPCCPU *cpu, + target_ulong flags, int shift) +{ + return -ENOSYS; +} + +static inline int kvmppc_resize_hpt_commit(PowerPCCPU *cpu, + target_ulong flags, int shift) +{ + return -ENOSYS; +} + +static inline void kvmppc_update_sdr1(target_ulong sdr1) +{ + abort(); +} + #endif =20 #ifndef CONFIG_KVM --=20 2.9.3