From: "Emilio G. Cota"
To: qemu-devel@nongnu.org
Cc: Alex Bennée, Richard Henderson
Date: Fri, 14 Dec 2018 12:42:31 -0500
Message-Id: <20181214174233.18027-2-cota@braap.org>
In-Reply-To: <20181214174233.18027-1-cota@braap.org>
Subject: [Qemu-devel] [PATCH v5 1/3] cputlb: do not evict empty entries to the vtlb

Currently we evict an entry to the victim TLB when it doesn't match
the current address. But it could be that there's no match because
the current entry is empty (i.e. all -1's, for instance via tlb_flush).
Do not evict the entry to the vtlb in that case.

This change will help us keep track of the TLB's use rate, which
we'll use to implement a policy for dynamic TLB sizing.

Reviewed-by: Alex Bennée
Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
 include/exec/cpu-all.h | 9 +++++++++
 accel/tcg/cputlb.c     | 2 +-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 117d2fbbca..e21140049b 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -362,6 +362,15 @@ static inline bool tlb_hit(target_ulong tlb_addr, target_ulong addr)
     return tlb_hit_page(tlb_addr, addr & TARGET_PAGE_MASK);
 }
 
+/**
+ * tlb_entry_is_empty - return true if the entry is not in use
+ * @te: pointer to CPUTLBEntry
+ */
+static inline bool tlb_entry_is_empty(const CPUTLBEntry *te)
+{
+    return te->addr_read == -1 && te->addr_write == -1 && te->addr_code == -1;
+}
+
 void dump_exec_info(FILE *f, fprintf_function cpu_fprintf);
 void dump_opcount_info(FILE *f, fprintf_function cpu_fprintf);
 #endif /* !CONFIG_USER_ONLY */
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index af6bd8ccf9..5dc97212a9 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -591,7 +591,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
      * Only evict the old entry to the victim tlb if it's for a
      * different page; otherwise just overwrite the stale data.
      */
-    if (!tlb_hit_page_anyprot(te, vaddr_page)) {
+    if (!tlb_hit_page_anyprot(te, vaddr_page) && !tlb_entry_is_empty(te)) {
         unsigned vidx = env->tlb_d[mmu_idx].vindex++ % CPU_VTLB_SIZE;
         CPUTLBEntry *tv = &env->tlb_v_table[mmu_idx][vidx];
 
-- 
2.17.1
From: "Emilio G. Cota"
To: qemu-devel@nongnu.org
Cc: Alex Bennée, Richard Henderson
Date: Fri, 14 Dec 2018 12:42:32 -0500
Message-Id: <20181214174233.18027-3-cota@braap.org>
In-Reply-To: <20181214174233.18027-1-cota@braap.org>
Subject: [Qemu-devel] [PATCH v5 2/3] tcg: introduce dynamic TLB sizing

Disabled in all TCG backends for now (every backend defines
TCG_TARGET_IMPLEMENTS_DYN_TLB as 0).

Signed-off-by: Emilio G. Cota
---
 include/exec/cpu-defs.h  |  48 +++++++++-
 include/exec/cpu_ldst.h  |  21 +++++
 tcg/aarch64/tcg-target.h |   1 +
 tcg/arm/tcg-target.h     |   1 +
 tcg/i386/tcg-target.h    |   1 +
 tcg/mips/tcg-target.h    |   1 +
 tcg/ppc/tcg-target.h     |   1 +
 tcg/s390/tcg-target.h    |   1 +
 tcg/sparc/tcg-target.h   |   1 +
 tcg/tci/tcg-target.h     |   1 +
 accel/tcg/cputlb.c       | 184 +++++++++++++++++++++++++++++++++++++--
 11 files changed, 254 insertions(+), 7 deletions(-)

diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
index 6a60f94a41..b14a00b027 100644
--- a/include/exec/cpu-defs.h
+++ b/include/exec/cpu-defs.h
@@ -67,6 +67,19 @@ typedef uint64_t target_ulong;
 #define CPU_TLB_ENTRY_BITS 5
 #endif
 
+#if TCG_TARGET_IMPLEMENTS_DYN_TLB
+#define CPU_TLB_DYN_MIN_BITS 6
+#define CPU_TLB_DYN_DEFAULT_BITS 8
+/*
+ * Assuming TARGET_PAGE_BITS==12, with 2**22 entries we can cover 2**(22+12) ==
+ * 2**34 == 16G of address space. This is roughly what one would expect a
+ * TLB to cover in a modern (as of 2018) x86_64 CPU. For instance, Intel
+ * Skylake's Level-2 STLB has 16 1G entries.
+ */
+#define CPU_TLB_DYN_MAX_BITS 22
+
+#else /* !TCG_TARGET_IMPLEMENTS_DYN_TLB */
+
 /* TCG_TARGET_TLB_DISPLACEMENT_BITS is used in CPU_TLB_BITS to ensure that
  * the TLB is not unnecessarily small, but still small enough for the
  * TLB lookup instruction sequence used by the TCG target.
@@ -98,6 +111,7 @@ typedef uint64_t target_ulong;
                              NB_MMU_MODES <= 8 ? 3 : 4))
 
 #define CPU_TLB_SIZE (1 << CPU_TLB_BITS)
+#endif /* TCG_TARGET_IMPLEMENTS_DYN_TLB */
 
 typedef struct CPUTLBEntry {
     /* bit TARGET_LONG_BITS to TARGET_PAGE_BITS : virtual address
@@ -141,6 +155,18 @@ typedef struct CPUIOTLBEntry {
     MemTxAttrs attrs;
 } CPUIOTLBEntry;
 
+/**
+ * struct CPUTLBWindow
+ * @begin_ns: host time (in ns) at the beginning of the time window
+ * @max_entries: maximum number of entries observed in the window
+ *
+ * See also: tlb_mmu_resize_locked()
+ */
+typedef struct CPUTLBWindow {
+    int64_t begin_ns;
+    size_t max_entries;
+} CPUTLBWindow;
+
 typedef struct CPUTLBDesc {
     /*
      * Describe a region covering all of the large pages allocated
@@ -152,6 +178,10 @@ typedef struct CPUTLBDesc {
     target_ulong large_page_mask;
     /* The next index to use in the tlb victim table.  */
     size_t vindex;
+#if TCG_TARGET_IMPLEMENTS_DYN_TLB
+    CPUTLBWindow window;
+    size_t n_used_entries;
+#endif
 } CPUTLBDesc;
 
 /*
@@ -176,6 +206,20 @@ typedef struct CPUTLBCommon {
     size_t elide_flush_count;
 } CPUTLBCommon;
 
+#if TCG_TARGET_IMPLEMENTS_DYN_TLB
+# define CPU_TLB                                                        \
+    /* tlb_mask[i] contains (n_entries - 1) << CPU_TLB_ENTRY_BITS */   \
+    uintptr_t tlb_mask[NB_MMU_MODES];                                   \
+    CPUTLBEntry *tlb_table[NB_MMU_MODES];
+# define CPU_IOTLB                                                      \
+    CPUIOTLBEntry *iotlb[NB_MMU_MODES];
+#else
+# define CPU_TLB                                                        \
+    CPUTLBEntry tlb_table[NB_MMU_MODES][CPU_TLB_SIZE];
+# define CPU_IOTLB                                                      \
+    CPUIOTLBEntry iotlb[NB_MMU_MODES][CPU_TLB_SIZE];
+#endif
+
 /*
  * The meaning of each of the MMU modes is defined in the target code.
  * Note that NB_MMU_MODES is not yet defined; we can only reference it
@@ -184,9 +228,9 @@ typedef struct CPUTLBCommon {
 #define CPU_COMMON_TLB \
     CPUTLBCommon tlb_c;                                                 \
     CPUTLBDesc tlb_d[NB_MMU_MODES];                                     \
-    CPUTLBEntry tlb_table[NB_MMU_MODES][CPU_TLB_SIZE];                  \
+    CPU_TLB                                                             \
     CPUTLBEntry tlb_v_table[NB_MMU_MODES][CPU_VTLB_SIZE];               \
-    CPUIOTLBEntry iotlb[NB_MMU_MODES][CPU_TLB_SIZE];                    \
+    CPU_IOTLB                                                           \
     CPUIOTLBEntry iotlb_v[NB_MMU_MODES][CPU_VTLB_SIZE];
 
 #else
diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h
index 959068495a..83b2907d86 100644
--- a/include/exec/cpu_ldst.h
+++ b/include/exec/cpu_ldst.h
@@ -135,6 +135,21 @@ static inline target_ulong tlb_addr_write(const CPUTLBEntry *entry)
 #endif
 }
 
+#if TCG_TARGET_IMPLEMENTS_DYN_TLB
+/* Find the TLB index corresponding to the mmu_idx + address pair.  */
+static inline uintptr_t tlb_index(CPUArchState *env, uintptr_t mmu_idx,
+                                  target_ulong addr)
+{
+    uintptr_t size_mask = env->tlb_mask[mmu_idx] >> CPU_TLB_ENTRY_BITS;
+
+    return (addr >> TARGET_PAGE_BITS) & size_mask;
+}
+
+static inline size_t tlb_n_entries(CPUArchState *env, uintptr_t mmu_idx)
+{
+    return (env->tlb_mask[mmu_idx] >> CPU_TLB_ENTRY_BITS) + 1;
+}
+#else
 /* Find the TLB index corresponding to the mmu_idx + address pair.  */
 static inline uintptr_t tlb_index(CPUArchState *env, uintptr_t mmu_idx,
                                   target_ulong addr)
@@ -142,6 +157,12 @@ static inline uintptr_t tlb_index(CPUArchState *env, uintptr_t mmu_idx,
     return (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
 }
 
+static inline size_t tlb_n_entries(CPUArchState *env, uintptr_t mmu_idx)
+{
+    return CPU_TLB_SIZE;
+}
+#endif /* TCG_TARGET_IMPLEMENTS_DYN_TLB */
+
 /* Find the TLB entry corresponding to the mmu_idx + address pair.  */
 static inline CPUTLBEntry *tlb_entry(CPUArchState *env, uintptr_t mmu_idx,
                                      target_ulong addr)
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 9aea1d1771..3060d83d14 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -15,6 +15,7 @@
 
 #define TCG_TARGET_INSN_UNIT_SIZE 4
 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 24
+#define TCG_TARGET_IMPLEMENTS_DYN_TLB 0
 #undef TCG_TARGET_STACK_GROWSUP
 
 typedef enum {
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 94b3578c55..0e8b79d20f 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -60,6 +60,7 @@ extern int arm_arch;
 #undef TCG_TARGET_STACK_GROWSUP
 #define TCG_TARGET_INSN_UNIT_SIZE 4
 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 16
+#define TCG_TARGET_IMPLEMENTS_DYN_TLB 0
 
 typedef enum {
     TCG_REG_R0 = 0,
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 9fdf37f23c..9e4bfa90d1 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -27,6 +27,7 @@
 
 #define TCG_TARGET_INSN_UNIT_SIZE 1
 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 31
+#define TCG_TARGET_IMPLEMENTS_DYN_TLB 0
 
 #ifdef __x86_64__
 # define TCG_TARGET_REG_BITS 64
diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index a8222476f0..a97f31113e 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -37,6 +37,7 @@
 
 #define TCG_TARGET_INSN_UNIT_SIZE 4
 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 16
+#define TCG_TARGET_IMPLEMENTS_DYN_TLB 0
 #define TCG_TARGET_NB_REGS 32
 
 typedef enum {
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index be52ad1d2e..8f03328af4 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -34,6 +34,7 @@
 #define TCG_TARGET_NB_REGS 32
 #define TCG_TARGET_INSN_UNIT_SIZE 4
 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 16
+#define TCG_TARGET_IMPLEMENTS_DYN_TLB 0
 
 typedef enum {
     TCG_REG_R0,  TCG_REG_R1,  TCG_REG_R2,  TCG_REG_R3,
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 6f2b06a7d1..df92f3065a 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -27,6 +27,7 @@
 
 #define TCG_TARGET_INSN_UNIT_SIZE 2
 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 19
+#define TCG_TARGET_IMPLEMENTS_DYN_TLB 0
 
 typedef enum TCGReg {
     TCG_REG_R0 = 0,
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index d8339bf010..975ddc7b0d 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -29,6 +29,7 @@
 
 #define TCG_TARGET_INSN_UNIT_SIZE 4
 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 32
+#define TCG_TARGET_IMPLEMENTS_DYN_TLB 0
 #define TCG_TARGET_NB_REGS 32
 
 typedef enum {
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 26140d78cb..bcfd8d69e6 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -43,6 +43,7 @@
 #define TCG_TARGET_INTERPRETER 1
 #define TCG_TARGET_INSN_UNIT_SIZE 1
 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 32
+#define TCG_TARGET_IMPLEMENTS_DYN_TLB 0
 
 #if UINTPTR_MAX == UINT32_MAX
 # define TCG_TARGET_REG_BITS 32
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 5dc97212a9..b5af36ad1a 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -74,6 +74,169 @@ QEMU_BUILD_BUG_ON(sizeof(target_ulong) > sizeof(run_on_cpu_data));
 QEMU_BUILD_BUG_ON(NB_MMU_MODES > 16);
 #define ALL_MMUIDX_BITS ((1 << NB_MMU_MODES) - 1)
 
+#if TCG_TARGET_IMPLEMENTS_DYN_TLB
+static inline size_t sizeof_tlb(CPUArchState *env, uintptr_t mmu_idx)
+{
+    return env->tlb_mask[mmu_idx] + (1 << CPU_TLB_ENTRY_BITS);
+}
+
+static void tlb_window_reset(CPUTLBWindow *window, int64_t ns,
+                             size_t max_entries)
+{
+    window->begin_ns = ns;
+    window->max_entries = max_entries;
+}
+
+static void tlb_dyn_init(CPUArchState *env)
+{
+    int i;
+
+    for (i = 0; i < NB_MMU_MODES; i++) {
+        CPUTLBDesc *desc = &env->tlb_d[i];
+        size_t n_entries = 1 << CPU_TLB_DYN_DEFAULT_BITS;
+
+        tlb_window_reset(&desc->window, get_clock_realtime(), 0);
+        desc->n_used_entries = 0;
+        env->tlb_mask[i] = (n_entries - 1) << CPU_TLB_ENTRY_BITS;
+        env->tlb_table[i] = g_new(CPUTLBEntry, n_entries);
+        env->iotlb[i] = g_new(CPUIOTLBEntry, n_entries);
+    }
+}
+
+/**
+ * tlb_mmu_resize_locked() - perform TLB resize bookkeeping; resize if necessary
+ * @env: CPU that owns the TLB
+ * @mmu_idx: MMU index of the TLB
+ *
+ * Called with tlb_lock_held.
+ *
+ * We have two main constraints when resizing a TLB: (1) we only resize it
+ * on a TLB flush (otherwise we'd have to take a perf hit by either rehashing
+ * the array or unnecessarily flushing it), which means we do not control how
+ * frequently the resizing can occur; (2) we don't have access to the guest's
+ * future scheduling decisions, and therefore have to decide the magnitude of
+ * the resize based on past observations.
+ *
+ * In general, a memory-hungry process can benefit greatly from an appropriately
+ * sized TLB, since a guest TLB miss is very expensive. This doesn't mean that
+ * we just have to make the TLB as large as possible; while an oversized TLB
+ * results in minimal TLB miss rates, it also takes longer to be flushed
+ * (flushes can be _very_ frequent), and the reduced locality can also hurt
+ * performance.
+ *
+ * To achieve near-optimal performance for all kinds of workloads, we:
+ *
+ * 1. Aggressively increase the size of the TLB when the use rate of the
+ * TLB being flushed is high, since it is likely that in the near future this
+ * memory-hungry process will execute again, and its memory hungriness will
+ * probably be similar.
+ *
+ * 2. Slowly reduce the size of the TLB as the use rate declines over a
+ * reasonably large time window. The rationale is that if in such a time window
+ * we have not observed a high TLB use rate, it is likely that we won't observe
+ * it in the near future. In that case, once a time window expires we downsize
+ * the TLB to match the maximum use rate observed in the window.
+ *
+ * 3. Try to keep the maximum use rate in a time window in the 30-70% range,
+ * since in that range performance is likely near-optimal. Recall that the TLB
+ * is direct mapped, so we want the use rate to be low (or at least not too
+ * high), since otherwise we are likely to have a significant amount of
+ * conflict misses.
+ */
+static void tlb_mmu_resize_locked(CPUArchState *env, int mmu_idx)
+{
+    CPUTLBDesc *desc = &env->tlb_d[mmu_idx];
+    size_t old_size = tlb_n_entries(env, mmu_idx);
+    size_t rate;
+    size_t new_size = old_size;
+    int64_t now = get_clock_realtime();
+    int64_t window_len_ms = 100;
+    int64_t window_len_ns = window_len_ms * 1000 * 1000;
+    bool window_expired = now > desc->window.begin_ns + window_len_ns;
+
+    if (desc->n_used_entries > desc->window.max_entries) {
+        desc->window.max_entries = desc->n_used_entries;
+    }
+    rate = desc->window.max_entries * 100 / old_size;
+
+    if (rate == 100) {
+        new_size = MIN(old_size << 2, 1 << CPU_TLB_DYN_MAX_BITS);
+    } else if (rate > 70) {
+        new_size = MIN(old_size << 1, 1 << CPU_TLB_DYN_MAX_BITS);
+    } else if (rate < 30 && window_expired) {
+        size_t ceil = pow2ceil(desc->window.max_entries);
+        size_t expected_rate = desc->window.max_entries * 100 / ceil;
+
+        /*
+         * Avoid undersizing when the max number of entries seen is just below
+         * a pow2. For instance, if max_entries == 1025, the expected use rate
+         * would be 1025/2048==50%. However, if max_entries == 1023, we'd get
+         * 1023/1024==99.9% use rate, so we'd likely end up doubling the size
+         * later. Thus, make sure that the expected use rate remains below 70%.
+         * (and since we double the size, that means the lowest rate we'd
+         * expect to get is 35%, which is still in the 30-70% range where
+         * we consider that the size is appropriate.)
+         */
+        if (expected_rate > 70) {
+            ceil *= 2;
+        }
+        new_size = MAX(ceil, 1 << CPU_TLB_DYN_MIN_BITS);
+    }
+
+    if (new_size == old_size) {
+        if (window_expired) {
+            tlb_window_reset(&desc->window, now, desc->n_used_entries);
+        }
+        return;
+    }
+
+    g_free(env->tlb_table[mmu_idx]);
+    g_free(env->iotlb[mmu_idx]);
+
+    tlb_window_reset(&desc->window, now, 0);
+    /* desc->n_used_entries is cleared by the caller */
+    env->tlb_mask[mmu_idx] = (new_size - 1) << CPU_TLB_ENTRY_BITS;
+    env->tlb_table[mmu_idx] = g_new(CPUTLBEntry, new_size);
+    env->iotlb[mmu_idx] = g_new(CPUIOTLBEntry, new_size);
+}
+
+static inline void tlb_table_flush_by_mmuidx(CPUArchState *env, int mmu_idx)
+{
+    tlb_mmu_resize_locked(env, mmu_idx);
+    memset(env->tlb_table[mmu_idx], -1, sizeof_tlb(env, mmu_idx));
+    env->tlb_d[mmu_idx].n_used_entries = 0;
+}
+
+static inline void tlb_n_used_entries_inc(CPUArchState *env, uintptr_t mmu_idx)
+{
+    env->tlb_d[mmu_idx].n_used_entries++;
+}
+
+static inline void tlb_n_used_entries_dec(CPUArchState *env, uintptr_t mmu_idx)
+{
+    env->tlb_d[mmu_idx].n_used_entries--;
+}
+
+#else /* !TCG_TARGET_IMPLEMENTS_DYN_TLB */
+
+static inline void tlb_dyn_init(CPUArchState *env)
+{
+}
+
+static inline void tlb_table_flush_by_mmuidx(CPUArchState *env, int mmu_idx)
+{
+    memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[0]));
+}
+
+static inline void tlb_n_used_entries_inc(CPUArchState *env, uintptr_t mmu_idx)
+{
+}
+
+static inline void tlb_n_used_entries_dec(CPUArchState *env, uintptr_t mmu_idx)
+{
+}
+#endif /* TCG_TARGET_IMPLEMENTS_DYN_TLB */
+
 void tlb_init(CPUState *cpu)
 {
     CPUArchState *env = cpu->env_ptr;
@@ -82,6 +245,8 @@ void tlb_init(CPUState *cpu)
 
     /* Ensure that cpu_reset performs a full flush.  */
     env->tlb_c.dirty = ALL_MMUIDX_BITS;
+
+    tlb_dyn_init(env);
 }
 
 /* flush_all_helper: run fn across all cpus
@@ -122,7 +287,7 @@ void tlb_flush_counts(size_t *pfull, size_t *ppart, size_t *pelide)
 
 static void tlb_flush_one_mmuidx_locked(CPUArchState *env, int mmu_idx)
 {
-    memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[0]));
+    tlb_table_flush_by_mmuidx(env, mmu_idx);
     memset(env->tlb_v_table[mmu_idx], -1, sizeof(env->tlb_v_table[0]));
     env->tlb_d[mmu_idx].large_page_addr = -1;
     env->tlb_d[mmu_idx].large_page_mask = -1;
@@ -225,12 +390,14 @@ static inline bool tlb_hit_page_anyprot(CPUTLBEntry *tlb_entry,
 }
 
 /* Called with tlb_c.lock held */
-static inline void tlb_flush_entry_locked(CPUTLBEntry *tlb_entry,
+static inline bool tlb_flush_entry_locked(CPUTLBEntry *tlb_entry,
                                           target_ulong page)
 {
     if (tlb_hit_page_anyprot(tlb_entry, page)) {
         memset(tlb_entry, -1, sizeof(*tlb_entry));
+        return true;
     }
+    return false;
 }
 
 /* Called with tlb_c.lock held */
@@ -241,7 +408,9 @@ static inline void tlb_flush_vtlb_page_locked(CPUArchState *env, int mmu_idx,
 
     assert_cpu_is_self(ENV_GET_CPU(env));
     for (k = 0; k < CPU_VTLB_SIZE; k++) {
-        tlb_flush_entry_locked(&env->tlb_v_table[mmu_idx][k], page);
+        if (tlb_flush_entry_locked(&env->tlb_v_table[mmu_idx][k], page)) {
+            tlb_n_used_entries_dec(env, mmu_idx);
+        }
     }
 }
 
@@ -258,7 +427,9 @@ static void tlb_flush_page_locked(CPUArchState *env, int midx,
                   midx, lp_addr, lp_mask);
         tlb_flush_one_mmuidx_locked(env, midx);
     } else {
-        tlb_flush_entry_locked(tlb_entry(env, midx, page), page);
+        if (tlb_flush_entry_locked(tlb_entry(env, midx, page), page)) {
+            tlb_n_used_entries_dec(env, midx);
+        }
         tlb_flush_vtlb_page_locked(env, midx, page);
     }
 }
@@ -435,8 +606,9 @@ void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, ram_addr_t length)
     qemu_spin_lock(&env->tlb_c.lock);
     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
         unsigned int i;
+        unsigned int n = tlb_n_entries(env, mmu_idx);
 
-        for (i = 0; i < CPU_TLB_SIZE; i++) {
+        for (i = 0; i < n; i++) {
             tlb_reset_dirty_range_locked(&env->tlb_table[mmu_idx][i], start1,
                                          length);
         }
@@ -598,6 +770,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
         /* Evict the old entry into the victim tlb.  */
         copy_tlb_helper_locked(tv, te);
         env->iotlb_v[mmu_idx][vidx] = env->iotlb[mmu_idx][index];
+        tlb_n_used_entries_dec(env, mmu_idx);
     }
 
     /* refill the tlb */
@@ -649,6 +822,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
     }
 
     copy_tlb_helper_locked(te, &tn);
+    tlb_n_used_entries_inc(env, mmu_idx);
     qemu_spin_unlock(&env->tlb_c.lock);
 }
 
-- 
2.17.1
udeinecurfgrrhgrmhepmhgrihhlfhhrohhmpegtohhtrgessghrrggrphdrohhrghenuc evlhhushhtvghrufhiiigvpedt X-ME-Proxy: From: "Emilio G. Cota" To: qemu-devel@nongnu.org Date: Fri, 14 Dec 2018 12:42:33 -0500 Message-Id: <20181214174233.18027-4-cota@braap.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20181214174233.18027-1-cota@braap.org> References: <20181214174233.18027-1-cota@braap.org> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 64.147.123.25 Subject: [Qemu-devel] [PATCH v5 3/3] tcg/i386: enable dynamic TLB sizing X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: =?UTF-8?q?Alex=20Benn=C3=A9e?= , Richard Henderson Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" As the following experiments show, this series is a net perf gain, particularly for memory-heavy workloads. Experiments are run on an Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz. 1. 
System boot + shudown, debian aarch64: - Before (v3.1.0): Performance counter stats for './die.sh v3.1.0' (10 runs): 9019.797015 task-clock (msec) # 0.993 CPUs utilized = ( +- 0.23% ) 29,910,312,379 cycles # 3.316 GHz = ( +- 0.14% ) 54,699,252,014 instructions # 1.83 insn per cycle= ( +- 0.08% ) 10,061,951,686 branches # 1115.541 M/sec = ( +- 0.08% ) 172,966,530 branch-misses # 1.72% of all branche= s ( +- 0.07% ) 9.084039051 seconds time elapsed = ( +- 0.23% ) - After: Performance counter stats for './die.sh tlb-dyn-v5' (10 runs): 8624.084842 task-clock (msec) # 0.993 CPUs utilized = ( +- 0.23% ) 28,556,123,404 cycles # 3.311 GHz = ( +- 0.13% ) 51,755,089,512 instructions # 1.81 insn per cycle= ( +- 0.05% ) 9,526,513,946 branches # 1104.641 M/sec = ( +- 0.05% ) 166,578,509 branch-misses # 1.75% of all branche= s ( +- 0.19% ) 8.680540350 seconds time elapsed = ( +- 0.24% ) That is, a 4.4% perf increase. 2. System boot + shutdown, ubuntu 18.04 x86_64: - Before (v3.1.0): 56100.574751 task-clock (msec) # 1.016 CPUs utilized = ( +- 4.81% ) 200,745,466,128 cycles # 3.578 GHz = ( +- 5.24% ) 431,949,100,608 instructions # 2.15 insn per cycle= ( +- 5.65% ) 77,502,383,330 branches # 1381.490 M/sec = ( +- 6.18% ) 844,681,191 branch-misses # 1.09% of all branche= s ( +- 3.82% ) 55.221556378 seconds time elapsed = ( +- 5.01% ) - After: 56603.419540 task-clock (msec) # 1.019 CPUs utilized = ( +- 10.19% ) 202,217,930,479 cycles # 3.573 GHz = ( +- 10.69% ) 439,336,291,626 instructions # 2.17 insn per cycle= ( +- 14.14% ) 80,538,357,447 branches # 1422.853 M/sec = ( +- 16.09% ) 776,321,622 branch-misses # 0.96% of all branche= s ( +- 3.77% ) 55.549661409 seconds time elapsed = ( +- 10.44% ) No improvement (within noise range). Note that for this workload, increasing the time window too much can lead to perf degradation, since it flushes the TLB *very* frequently. 3. x86_64 SPEC06int: x86_64-softmmu speedup vs. 
   v3.1.0 for SPEC06int (test set)
   Host: Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz (Skylake)
   [ASCII bar chart: speedup of tlb-dyn-v5 over v3.1.0 (y-axis, 0.5 to 5.5)
    for each benchmark from 400.perlbench to 483.xalancbmk, plus geomean]
   png: https://imgur.com/YRF90f7

That is, a 1.51x average speedup over the baseline, with a max speedup
of 5.17x.

Here's a different look at the SPEC06int results, using KVM as the
baseline:

   x86_64-softmmu slowdown vs.
   KVM for SPEC06int (test set)
   Host: Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz (Skylake)
   [ASCII bar chart: slowdown vs. KVM (y-axis, 0 to 25) of v3.1.0 and
    tlb-dyn-v5 for each benchmark from 400.perlbench to 483.xalancbmk,
    plus geomean]
   png: https://imgur.com/YzAMNEV

This series brings the average SPEC06int slowdown vs. KVM down from
11.47x to 7.58x.

Signed-off-by: Emilio G. Cota
---
 tcg/i386/tcg-target.h     |  2 +-
 tcg/i386/tcg-target.inc.c | 28 ++++++++++++++--------------
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 9e4bfa90d1..8b6475d786 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -27,7 +27,7 @@
 
 #define TCG_TARGET_INSN_UNIT_SIZE  1
 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 31
-#define TCG_TARGET_IMPLEMENTS_DYN_TLB 0
+#define TCG_TARGET_IMPLEMENTS_DYN_TLB 1
 
 #ifdef __x86_64__
 # define TCG_TARGET_REG_BITS  64
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 436195894b..5cbb07deab 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -330,6 +330,7 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define OPC_ARITH_GvEv  (0x03)          /* ... plus (ARITH_FOO << 3) */
 #define OPC_ANDN        (0xf2 | P_EXT38)
 #define OPC_ADD_GvEv    (OPC_ARITH_GvEv | (ARITH_ADD << 3))
+#define OPC_AND_GvEv    (OPC_ARITH_GvEv | (ARITH_AND << 3))
 #define OPC_BLENDPS     (0x0c | P_EXT3A | P_DATA16)
 #define OPC_BSF         (0xbc | P_EXT)
 #define OPC_BSR         (0xbd | P_EXT)
@@ -1625,7 +1626,7 @@ static inline void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
     }
     if (TCG_TYPE_PTR == TCG_TYPE_I64) {
         hrexw = P_REXW;
-        if (TARGET_PAGE_BITS + CPU_TLB_BITS > 32) {
+        if (TARGET_PAGE_BITS + CPU_TLB_DYN_MAX_BITS > 32) {
             tlbtype = TCG_TYPE_I64;
             tlbrexw = P_REXW;
         }
@@ -1633,6 +1634,15 @@ static inline void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
     }
 
     tcg_out_mov(s, tlbtype, r0, addrlo);
+    tcg_out_shifti(s, SHIFT_SHR + tlbrexw, r0,
+                   TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
+
+    tcg_out_modrm_offset(s, OPC_AND_GvEv + trexw, r0, TCG_AREG0,
+                         offsetof(CPUArchState, tlb_mask[mem_index]));
+
+    tcg_out_modrm_offset(s, OPC_ADD_GvEv + hrexw, r0, TCG_AREG0,
+                         offsetof(CPUArchState, tlb_table[mem_index]));
+
     /* If the required alignment is at least as large as the access, simply
        copy the address and mask.  For lesser alignments, check that we don't
        cross pages for the complete access.  */
@@ -1642,20 +1652,10 @@ static inline void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
         tcg_out_modrm_offset(s, OPC_LEA + trexw, r1, addrlo, s_mask - a_mask);
     }
     tlb_mask = (target_ulong)TARGET_PAGE_MASK | a_mask;
-
-    tcg_out_shifti(s, SHIFT_SHR + tlbrexw, r0,
-                   TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
-
     tgen_arithi(s, ARITH_AND + trexw, r1, tlb_mask, 0);
-    tgen_arithi(s, ARITH_AND + tlbrexw, r0,
-                (CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS, 0);
-
-    tcg_out_modrm_sib_offset(s, OPC_LEA + hrexw, r0, TCG_AREG0, r0, 0,
-                             offsetof(CPUArchState, tlb_table[mem_index][0])
-                             + which);
 
     /* cmp 0(r0), r1 */
-    tcg_out_modrm_offset(s, OPC_CMP_GvEv + trexw, r1, r0, 0);
+    tcg_out_modrm_offset(s, OPC_CMP_GvEv + trexw, r1, r0, which);
 
     /* Prepare for both the fast path add of the tlb addend, and the slow
        path function argument setup.  There are two cases worth note:
@@ -1672,7 +1672,7 @@ static inline void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
 
     if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
         /* cmp 4(r0), addrhi */
-        tcg_out_modrm_offset(s, OPC_CMP_GvEv, addrhi, r0, 4);
+        tcg_out_modrm_offset(s, OPC_CMP_GvEv, addrhi, r0, which + 4);
 
         /* jne slow_path */
         tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
@@ -1684,7 +1684,7 @@ static inline void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
 
     /* add addend(r0), r1 */
     tcg_out_modrm_offset(s, OPC_ADD_GvEv + hrexw, r1, r0,
-                         offsetof(CPUTLBEntry, addend) - which);
+                         offsetof(CPUTLBEntry, addend));
 }
 
 /*
-- 
2.17.1