From nobody Thu Nov 6 03:26:10 2025 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1539371189476343.9219160942456; Fri, 12 Oct 2018 12:06:29 -0700 (PDT) Received: from localhost ([::1]:42309 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gB2lY-0003gx-EO for importer@patchew.org; Fri, 12 Oct 2018 15:06:28 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:35838) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gB2ju-0002lX-H5 for qemu-devel@nongnu.org; Fri, 12 Oct 2018 15:04:49 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1gB2jq-0003dP-Jo for qemu-devel@nongnu.org; Fri, 12 Oct 2018 15:04:46 -0400 Received: from out3-smtp.messagingengine.com ([66.111.4.27]:48237) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1gB2jo-0003b9-CK for qemu-devel@nongnu.org; Fri, 12 Oct 2018 15:04:41 -0400 Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.nyi.internal (Postfix) with ESMTP id EC2B622165; Fri, 12 Oct 2018 15:04:36 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute4.internal (MEProxy); Fri, 12 Oct 2018 15:04:36 -0400 Received: from localhost (flamenco.cs.columbia.edu [128.59.20.216]) by mail.messagingengine.com (Postfix) with ESMTPA id 71191102EB; Fri, 12 Oct 2018 15:04:36 -0400 (EDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=braap.org; h= 
from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-type:content-transfer-encoding; s=mesmtp; bh=KOJz8Nl7AiY8nGmM9ifnexwKMaMgcGf1/yNAxFCiJ+4=; b=FVrjevk2hFNS CpvYdUXxXc6dEriTHoLbV+KvoQTPMgLsqCJdqVnFFMyVqbmZP2wKmHvZPJgaHWzo pUZDh5F9JGObi1JAy0gnrB/5MiOHUKZkODJKNcT30hiqEh5eRC+wajAcTk3uOH7D czZkpVsF6oizMMjIWHK5J9GqRrdqOQQ= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:from:in-reply-to:message-id:mime-version:references :subject:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm1; bh=KOJz8Nl7AiY8nGmM9ifnexwKMaMgcGf1/yNAxFCiJ +4=; b=IHU9+uhomzW3L0Qb8z79DWlSHcLX8t35T/LR37U7DzOleLjDH5xh6/YMR V6amYiWQTkJzWVVakThW4r5sXnRUu9s9s5ocdvrrFdo4Ivs79oedh6jhkvUi5PLx GdKW9iCEBfZwzrmALtGVZ9Mo251g2C6H62XRL1VeM0/hVWZW1lLe7wj1XNyjtpkC wWGAZOu6A/sjixosi9Uo4n/tJ5X2Df4wm6W3FF4yztHMffbwiOqCVHyjoTg8DZTf 0ScljY+14GeG1rCD8oiBI3wWhZFLnTXl66AW25EnzqY+7zmV5dM3l5w2l0LVbKi3 eK++c4SMaJ+jrbEwFYqBSG8E3PeWA== X-ME-Sender: X-ME-Proxy: From: "Emilio G. 
Cota" To: qemu-devel@nongnu.org Date: Fri, 12 Oct 2018 15:04:32 -0400 Message-Id: <20181012190434.19477-2-cota@braap.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20181012190434.19477-1-cota@braap.org> References: <20181012190434.19477-1-cota@braap.org> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 66.111.4.27 Subject: [Qemu-devel] [PATCH v4 1/3] cputlb: do not evict empty entries to the vtlb X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: =?UTF-8?q?Alex=20Benn=C3=A9e?= , Richard Henderson Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Currently we evict an entry to the victim TLB when it doesn't match the current address. But it could be that there's no match because the current entry is empty (i.e. all -1's, for instance via tlb_flush). Do not evict the entry to the vtlb in that case. This change will help us keep track of the TLB's use rate, which we'll use to implement a policy for dynamic TLB sizing. Reviewed-by: Alex Benn=C3=A9e Reviewed-by: Richard Henderson Signed-off-by: Emilio G. 
Cota --- include/exec/cpu-all.h | 9 +++++++++ accel/tcg/cputlb.c | 2 +- 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h index 117d2fbbca..e21140049b 100644 --- a/include/exec/cpu-all.h +++ b/include/exec/cpu-all.h @@ -362,6 +362,15 @@ static inline bool tlb_hit(target_ulong tlb_addr, targ= et_ulong addr) return tlb_hit_page(tlb_addr, addr & TARGET_PAGE_MASK); } =20 +/** + * tlb_entry_is_empty - return true if the entry is not in use + * @te: pointer to CPUTLBEntry + */ +static inline bool tlb_entry_is_empty(const CPUTLBEntry *te) +{ + return te->addr_read =3D=3D -1 && te->addr_write =3D=3D -1 && te->addr= _code =3D=3D -1; +} + void dump_exec_info(FILE *f, fprintf_function cpu_fprintf); void dump_opcount_info(FILE *f, fprintf_function cpu_fprintf); #endif /* !CONFIG_USER_ONLY */ diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c index a6b716eb79..6ee18308d5 100644 --- a/accel/tcg/cputlb.c +++ b/accel/tcg/cputlb.c @@ -678,7 +678,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulon= g vaddr, * Only evict the old entry to the victim tlb if it's for a * different page; otherwise just overwrite the stale data. 
*/ - if (!tlb_hit_page_anyprot(te, vaddr_page)) { + if (!tlb_hit_page_anyprot(te, vaddr_page) && !tlb_entry_is_empty(te)) { unsigned vidx =3D env->vtlb_index++ % CPU_VTLB_SIZE; CPUTLBEntry *tv =3D &env->tlb_v_table[mmu_idx][vidx]; =20 --=20 2.17.1 From nobody Thu Nov 6 03:26:10 2025 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1539371191796937.2303501814587; Fri, 12 Oct 2018 12:06:31 -0700 (PDT) Received: from localhost ([::1]:42310 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gB2la-0003ho-KI for importer@patchew.org; Fri, 12 Oct 2018 15:06:30 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:35835) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gB2ju-0002lV-GM for qemu-devel@nongnu.org; Fri, 12 Oct 2018 15:04:49 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1gB2jq-0003dF-Jn for qemu-devel@nongnu.org; Fri, 12 Oct 2018 15:04:46 -0400 Received: from out3-smtp.messagingengine.com ([66.111.4.27]:43829) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1gB2jo-0003bC-CZ for qemu-devel@nongnu.org; Fri, 12 Oct 2018 15:04:41 -0400 Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.nyi.internal (Postfix) with ESMTP id 20B7B21F92; Fri, 12 Oct 2018 15:04:37 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute4.internal (MEProxy); Fri, 12 Oct 2018 15:04:37 -0400 Received: from localhost 
(flamenco.cs.columbia.edu [128.59.20.216]) by mail.messagingengine.com (Postfix) with ESMTPA id A786F102EE; Fri, 12 Oct 2018 15:04:36 -0400 (EDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=braap.org; h= from:to:cc:subject:date:message-id:in-reply-to:references; s= mesmtp; bh=PjivJAXJHWuMd8oYxass1oagIKwpNtFKeiBJz1aUBxM=; b=Ls34d TQWwLTDN/QbG+6Fc8Ahel2bvGrJOZOaS9PS/kzxLVIfEldEBmKDGWcZW+Qt/giU8 HnVywEhwrT65xYNuG0t+bnfgksuFdRoyYPb1HJMz8J7DWLrOoV+Fp3Aj19Zy2ttt tbnjl4P0vi/PNhYyBIuRDAZMSIMgsSGg/rSQKY= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:date:from:in-reply-to:message-id :references:subject:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm1; bh=PjivJAXJHWuMd8oYxass1oagIKwpN tFKeiBJz1aUBxM=; b=aps2b1UJoBrzwImw6zKAoQf9ToBxuaU5KxTuoMJ5BlzFz uMOOItVG/srGKaJ4czKXmmX4iU/PvQKz9M8YniJYbCVfi7CMXlkMjOb9IeLquCsg vgiVFhb+uuNMqY3k6QZFrNtEeHrozsMhfuD9IOaBzdWI8+CdEj6Pn8lw8BSU2rbi VkCw7sj9c78sIQy/W5xkv+G7ZopiN3RyK3yORYLMxdfDQ+qhJAYeyQDWRgQ+Idl/ SRICFsdpWOlcWme25CS3xNukgNNezEzJDjpYx/+vRv+6yAQCsT7anWsl1M7xozsD T0638/FpyFWBgO2PUny1f5zqbfrNB0yz5VUrbC1lA== X-ME-Sender: X-ME-Proxy: From: "Emilio G. 
Cota" To: qemu-devel@nongnu.org Date: Fri, 12 Oct 2018 15:04:33 -0400 Message-Id: <20181012190434.19477-3-cota@braap.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20181012190434.19477-1-cota@braap.org> References: <20181012190434.19477-1-cota@braap.org> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 66.111.4.27 Subject: [Qemu-devel] [PATCH v4 2/3] tcg: introduce dynamic TLB sizing X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: =?UTF-8?q?Alex=20Benn=C3=A9e?= , Richard Henderson Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Disable for all TCG backends for now. Signed-off-by: Emilio G. Cota --- include/exec/cpu-defs.h | 43 +++++++++++- include/exec/cpu_ldst.h | 21 ++++++ tcg/aarch64/tcg-target.h | 1 + tcg/arm/tcg-target.h | 1 + tcg/i386/tcg-target.h | 1 + tcg/mips/tcg-target.h | 1 + tcg/ppc/tcg-target.h | 1 + tcg/s390/tcg-target.h | 1 + tcg/sparc/tcg-target.h | 1 + tcg/tci/tcg-target.h | 1 + accel/tcg/cputlb.c | 138 +++++++++++++++++++++++++++++++++++++-- 11 files changed, 201 insertions(+), 9 deletions(-) diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h index 4ff62f32bf..40cd5d4774 100644 --- a/include/exec/cpu-defs.h +++ b/include/exec/cpu-defs.h @@ -67,6 +67,19 @@ typedef uint64_t target_ulong; #define CPU_TLB_ENTRY_BITS 5 #endif =20 +#if TCG_TARGET_IMPLEMENTS_DYN_TLB +#define CPU_TLB_DYN_MIN_BITS 6 +#define CPU_TLB_DYN_DEFAULT_BITS 8 +/* + * Assuming TARGET_PAGE_BITS=3D=3D12, with 2**22 entries we can cover 2**(= 22+12) =3D=3D + * 2**34 =3D=3D 16G of address space. This is roughly what one would expec= t a + * TLB to cover in a modern (as of 2018) x86_64 CPU. 
For instance, Intel + * Skylake's Level-2 STLB has 16 1G entries. + */ +#define CPU_TLB_DYN_MAX_BITS 22 + +#else /* !TCG_TARGET_IMPLEMENTS_DYN_TLB */ + /* TCG_TARGET_TLB_DISPLACEMENT_BITS is used in CPU_TLB_BITS to ensure that * the TLB is not unnecessarily small, but still small enough for the * TLB lookup instruction sequence used by the TCG target. @@ -98,6 +111,7 @@ typedef uint64_t target_ulong; NB_MMU_MODES <=3D 8 ? 3 : 4)) =20 #define CPU_TLB_SIZE (1 << CPU_TLB_BITS) +#endif /* TCG_TARGET_IMPLEMENTS_DYN_TLB */ =20 typedef struct CPUTLBEntry { /* bit TARGET_LONG_BITS to TARGET_PAGE_BITS : virtual address @@ -141,13 +155,36 @@ typedef struct CPUIOTLBEntry { MemTxAttrs attrs; } CPUIOTLBEntry; =20 -#define CPU_COMMON_TLB \ +#if TCG_TARGET_IMPLEMENTS_DYN_TLB + +typedef struct CPUTLBDesc { + size_t n_used_entries; + size_t n_flushes_low_rate; +} CPUTLBDesc; + +#define CPU_TLB \ + CPUTLBDesc tlb_desc[NB_MMU_MODES]; \ + /* tlb_mask[i] contains (n_entries - 1) << CPU_TLB_ENTRY_BITS */ \ + uintptr_t tlb_mask[NB_MMU_MODES]; \ + CPUTLBEntry *tlb_table[NB_MMU_MODES]; + +#define CPU_IOTLB \ + CPUIOTLBEntry *iotlb[NB_MMU_MODES]; +#else +#define CPU_TLB \ + CPUTLBEntry tlb_table[NB_MMU_MODES][CPU_TLB_SIZE]; + +#define CPU_IOTLB \ + CPUIOTLBEntry iotlb[NB_MMU_MODES][CPU_TLB_SIZE]; +#endif /* TCG_TARGET_IMPLEMENTS_DYN_TLB */ + +#define CPU_COMMON_TLB \ /* The meaning of the MMU modes is defined in the target code. 
*/ \ /* tlb_lock serializes updates to tlb_table and tlb_v_table */ \ QemuSpin tlb_lock; \ - CPUTLBEntry tlb_table[NB_MMU_MODES][CPU_TLB_SIZE]; \ + CPU_TLB \ CPUTLBEntry tlb_v_table[NB_MMU_MODES][CPU_VTLB_SIZE]; \ - CPUIOTLBEntry iotlb[NB_MMU_MODES][CPU_TLB_SIZE]; \ + CPU_IOTLB \ CPUIOTLBEntry iotlb_v[NB_MMU_MODES][CPU_VTLB_SIZE]; \ size_t tlb_flush_count; \ target_ulong tlb_flush_addr; \ diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h index e3d8d738aa..91f29c1188 100644 --- a/include/exec/cpu_ldst.h +++ b/include/exec/cpu_ldst.h @@ -126,6 +126,21 @@ extern __thread uintptr_t helper_retaddr; /* The memory helpers for tcg-generated code need tcg_target_long etc. */ #include "tcg.h" =20 +#if TCG_TARGET_IMPLEMENTS_DYN_TLB +/* Find the TLB index corresponding to the mmu_idx + address pair. */ +static inline uintptr_t tlb_index(CPUArchState *env, uintptr_t mmu_idx, + target_ulong addr) +{ + uintptr_t size_mask =3D env->tlb_mask[mmu_idx] >> CPU_TLB_ENTRY_BITS; + + return (addr >> TARGET_PAGE_BITS) & size_mask; +} + +static inline size_t tlb_n_entries(CPUArchState *env, uintptr_t mmu_idx) +{ + return (env->tlb_mask[mmu_idx] >> CPU_TLB_ENTRY_BITS) + 1; +} +#else /* Find the TLB index corresponding to the mmu_idx + address pair. */ static inline uintptr_t tlb_index(CPUArchState *env, uintptr_t mmu_idx, target_ulong addr) @@ -133,6 +148,12 @@ static inline uintptr_t tlb_index(CPUArchState *env, u= intptr_t mmu_idx, return (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1); } =20 +static inline size_t tlb_n_entries(CPUArchState *env, uintptr_t mmu_idx) +{ + return CPU_TLB_SIZE; +} +#endif /* TCG_TARGET_IMPLEMENTS_DYN_TLB */ + /* Find the TLB entry corresponding to the mmu_idx + address pair. 
*/ static inline CPUTLBEntry *tlb_entry(CPUArchState *env, uintptr_t mmu_idx, target_ulong addr) diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h index 9aea1d1771..3060d83d14 100644 --- a/tcg/aarch64/tcg-target.h +++ b/tcg/aarch64/tcg-target.h @@ -15,6 +15,7 @@ =20 #define TCG_TARGET_INSN_UNIT_SIZE 4 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 24 +#define TCG_TARGET_IMPLEMENTS_DYN_TLB 0 #undef TCG_TARGET_STACK_GROWSUP =20 typedef enum { diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h index 94b3578c55..0e8b79d20f 100644 --- a/tcg/arm/tcg-target.h +++ b/tcg/arm/tcg-target.h @@ -60,6 +60,7 @@ extern int arm_arch; #undef TCG_TARGET_STACK_GROWSUP #define TCG_TARGET_INSN_UNIT_SIZE 4 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 16 +#define TCG_TARGET_IMPLEMENTS_DYN_TLB 0 =20 typedef enum { TCG_REG_R0 =3D 0, diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h index 9fdf37f23c..9e4bfa90d1 100644 --- a/tcg/i386/tcg-target.h +++ b/tcg/i386/tcg-target.h @@ -27,6 +27,7 @@ =20 #define TCG_TARGET_INSN_UNIT_SIZE 1 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 31 +#define TCG_TARGET_IMPLEMENTS_DYN_TLB 0 =20 #ifdef __x86_64__ # define TCG_TARGET_REG_BITS 64 diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h index a8222476f0..a97f31113e 100644 --- a/tcg/mips/tcg-target.h +++ b/tcg/mips/tcg-target.h @@ -37,6 +37,7 @@ =20 #define TCG_TARGET_INSN_UNIT_SIZE 4 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 16 +#define TCG_TARGET_IMPLEMENTS_DYN_TLB 0 #define TCG_TARGET_NB_REGS 32 =20 typedef enum { diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h index be52ad1d2e..8f03328af4 100644 --- a/tcg/ppc/tcg-target.h +++ b/tcg/ppc/tcg-target.h @@ -34,6 +34,7 @@ #define TCG_TARGET_NB_REGS 32 #define TCG_TARGET_INSN_UNIT_SIZE 4 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 16 +#define TCG_TARGET_IMPLEMENTS_DYN_TLB 0 =20 typedef enum { TCG_REG_R0, TCG_REG_R1, TCG_REG_R2, TCG_REG_R3, diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h index 6f2b06a7d1..df92f3065a 100644 
--- a/tcg/s390/tcg-target.h +++ b/tcg/s390/tcg-target.h @@ -27,6 +27,7 @@ =20 #define TCG_TARGET_INSN_UNIT_SIZE 2 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 19 +#define TCG_TARGET_IMPLEMENTS_DYN_TLB 0 =20 typedef enum TCGReg { TCG_REG_R0 =3D 0, diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h index d8339bf010..975ddc7b0d 100644 --- a/tcg/sparc/tcg-target.h +++ b/tcg/sparc/tcg-target.h @@ -29,6 +29,7 @@ =20 #define TCG_TARGET_INSN_UNIT_SIZE 4 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 32 +#define TCG_TARGET_IMPLEMENTS_DYN_TLB 0 #define TCG_TARGET_NB_REGS 32 =20 typedef enum { diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h index 26140d78cb..bcfd8d69e6 100644 --- a/tcg/tci/tcg-target.h +++ b/tcg/tci/tcg-target.h @@ -43,6 +43,7 @@ #define TCG_TARGET_INTERPRETER 1 #define TCG_TARGET_INSN_UNIT_SIZE 1 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 32 +#define TCG_TARGET_IMPLEMENTS_DYN_TLB 0 =20 #if UINTPTR_MAX =3D=3D UINT32_MAX # define TCG_TARGET_REG_BITS 32 diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c index 6ee18308d5..b7bc4bb32f 100644 --- a/accel/tcg/cputlb.c +++ b/accel/tcg/cputlb.c @@ -74,11 +74,128 @@ QEMU_BUILD_BUG_ON(sizeof(target_ulong) > sizeof(run_on= _cpu_data)); QEMU_BUILD_BUG_ON(NB_MMU_MODES > 16); #define ALL_MMUIDX_BITS ((1 << NB_MMU_MODES) - 1) =20 +#if TCG_TARGET_IMPLEMENTS_DYN_TLB +static inline size_t sizeof_tlb(CPUArchState *env, uintptr_t mmu_idx) +{ + return env->tlb_mask[mmu_idx] + (1 << CPU_TLB_ENTRY_BITS); +} + +static void tlb_dyn_init(CPUArchState *env) +{ + int i; + + for (i =3D 0; i < NB_MMU_MODES; i++) { + size_t n_entries =3D 1 << CPU_TLB_DYN_DEFAULT_BITS; + + env->tlb_desc[i].n_used_entries =3D 0; + env->tlb_desc[i].n_flushes_low_rate =3D 0; + env->tlb_mask[i] =3D (n_entries - 1) << CPU_TLB_ENTRY_BITS; + env->tlb_table[i] =3D g_new(CPUTLBEntry, n_entries); + env->iotlb[i] =3D g_new(CPUIOTLBEntry, n_entries); + } +} + +/* + * Perform the resizing only on flushes, otherwise we'd have to take a perf + * hit by either 
rehashing the array or unnecessarily flushing it. + * + * We grow the array aggressively, and reduce the size more slowly. This + * accommodates mixed workloads, where some processes might be memory-heavy + * while others might not. + * + * Called with tlb_lock held. + */ +static void tlb_mmu_resize_locked(CPUArchState *env, int mmu_idx) +{ + CPUTLBDesc *desc =3D &env->tlb_desc[mmu_idx]; + size_t old_size =3D tlb_n_entries(env, mmu_idx); + size_t rate =3D desc->n_used_entries * 100 / old_size; + size_t new_size =3D old_size; + + if (rate =3D=3D 100) { + new_size =3D MIN(old_size << 2, 1 << CPU_TLB_DYN_MAX_BITS); + } else if (rate > 70) { + new_size =3D MIN(old_size << 1, 1 << CPU_TLB_DYN_MAX_BITS); + } else if (rate < 30) { + desc->n_flushes_low_rate++; + if (desc->n_flushes_low_rate =3D=3D 100) { + new_size =3D MAX(old_size >> 1, 1 << CPU_TLB_DYN_MIN_BITS); + desc->n_flushes_low_rate =3D 0; + } + } + + if (new_size =3D=3D old_size) { + return; + } + g_free(env->tlb_table[mmu_idx]); + g_free(env->iotlb[mmu_idx]); + + /* desc->n_used_entries is cleared by the caller */ + desc->n_flushes_low_rate =3D 0; + env->tlb_mask[mmu_idx] =3D (new_size - 1) << CPU_TLB_ENTRY_BITS; + env->tlb_table[mmu_idx] =3D g_new(CPUTLBEntry, new_size); + env->iotlb[mmu_idx] =3D g_new(CPUIOTLBEntry, new_size); +} + +static inline void tlb_table_flush(CPUArchState *env) +{ + int i; + + for (i =3D 0; i < NB_MMU_MODES; i++) { + tlb_mmu_resize_locked(env, i); + memset(env->tlb_table[i], -1, sizeof_tlb(env, i)); + env->tlb_desc[i].n_used_entries =3D 0; + } +} + +static inline void tlb_table_flush_by_mmuidx(CPUArchState *env, int mmu_id= x) +{ + tlb_mmu_resize_locked(env, mmu_idx); + memset(env->tlb_table[mmu_idx], -1, sizeof_tlb(env, mmu_idx)); + env->tlb_desc[mmu_idx].n_used_entries =3D 0; +} + +static inline void tlb_n_used_entries_inc(CPUArchState *env, uintptr_t mmu= _idx) +{ + env->tlb_desc[mmu_idx].n_used_entries++; +} + +static inline void tlb_n_used_entries_dec(CPUArchState *env, uintptr_t 
mmu= _idx) +{ + env->tlb_desc[mmu_idx].n_used_entries--; +} + +#else /* !TCG_TARGET_IMPLEMENTS_DYN_TLB */ + +static inline void tlb_dyn_init(CPUArchState *env) +{ +} + +static inline void tlb_table_flush(CPUArchState *env) +{ + memset(env->tlb_table, -1, sizeof(env->tlb_table)); +} + +static inline void tlb_table_flush_by_mmuidx(CPUArchState *env, int mmu_id= x) +{ + memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[0])); +} + +static inline void tlb_n_used_entries_inc(CPUArchState *env, uintptr_t mmu= _idx) +{ +} + +static inline void tlb_n_used_entries_dec(CPUArchState *env, uintptr_t mmu= _idx) +{ +} +#endif /* TCG_TARGET_IMPLEMENTS_DYN_TLB */ + void tlb_init(CPUState *cpu) { CPUArchState *env =3D cpu->env_ptr; =20 qemu_spin_init(&env->tlb_lock); + tlb_dyn_init(env); } =20 /* flush_all_helper: run fn across all cpus @@ -140,7 +257,7 @@ static void tlb_flush_nocheck(CPUState *cpu) * that do not hold the lock are performed by the same owner thread. */ qemu_spin_lock(&env->tlb_lock); - memset(env->tlb_table, -1, sizeof(env->tlb_table)); + tlb_table_flush(env); memset(env->tlb_v_table, -1, sizeof(env->tlb_v_table)); qemu_spin_unlock(&env->tlb_lock); =20 @@ -201,7 +318,7 @@ static void tlb_flush_by_mmuidx_async_work(CPUState *cp= u, run_on_cpu_data data) if (test_bit(mmu_idx, &mmu_idx_bitmask)) { tlb_debug("%d\n", mmu_idx); =20 - memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[0])); + tlb_table_flush_by_mmuidx(env, mmu_idx); memset(env->tlb_v_table[mmu_idx], -1, sizeof(env->tlb_v_table[= 0])); } } @@ -263,12 +380,14 @@ static inline bool tlb_hit_page_anyprot(CPUTLBEntry *= tlb_entry, } =20 /* Called with tlb_lock held */ -static inline void tlb_flush_entry_locked(CPUTLBEntry *tlb_entry, +static inline bool tlb_flush_entry_locked(CPUTLBEntry *tlb_entry, target_ulong page) { if (tlb_hit_page_anyprot(tlb_entry, page)) { memset(tlb_entry, -1, sizeof(*tlb_entry)); + return true; } + return false; } =20 /* Called with tlb_lock held */ @@ -279,7 +398,9 @@ 
static inline void tlb_flush_vtlb_page_locked(CPUArchSt= ate *env, int mmu_idx, =20 assert_cpu_is_self(ENV_GET_CPU(env)); for (k =3D 0; k < CPU_VTLB_SIZE; k++) { - tlb_flush_entry_locked(&env->tlb_v_table[mmu_idx][k], page); + if (tlb_flush_entry_locked(&env->tlb_v_table[mmu_idx][k], page)) { + tlb_n_used_entries_dec(env, mmu_idx); + } } } =20 @@ -306,7 +427,9 @@ static void tlb_flush_page_async_work(CPUState *cpu, ru= n_on_cpu_data data) addr &=3D TARGET_PAGE_MASK; qemu_spin_lock(&env->tlb_lock); for (mmu_idx =3D 0; mmu_idx < NB_MMU_MODES; mmu_idx++) { - tlb_flush_entry_locked(tlb_entry(env, mmu_idx, addr), addr); + if (tlb_flush_entry_locked(tlb_entry(env, mmu_idx, addr), addr)) { + tlb_n_used_entries_dec(env, mmu_idx); + } tlb_flush_vtlb_page_locked(env, mmu_idx, addr); } qemu_spin_unlock(&env->tlb_lock); @@ -524,8 +647,9 @@ void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, = ram_addr_t length) qemu_spin_lock(&env->tlb_lock); for (mmu_idx =3D 0; mmu_idx < NB_MMU_MODES; mmu_idx++) { unsigned int i; + unsigned int n =3D tlb_n_entries(env, mmu_idx); =20 - for (i =3D 0; i < CPU_TLB_SIZE; i++) { + for (i =3D 0; i < n; i++) { tlb_reset_dirty_range_locked(&env->tlb_table[mmu_idx][i], star= t1, length); } @@ -685,6 +809,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulon= g vaddr, /* Evict the old entry into the victim tlb. 
*/ copy_tlb_helper_locked(tv, te); env->iotlb_v[mmu_idx][vidx] =3D env->iotlb[mmu_idx][index]; + tlb_n_used_entries_dec(env, mmu_idx); } =20 /* refill the tlb */ @@ -736,6 +861,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulon= g vaddr, } =20 copy_tlb_helper_locked(te, &tn); + tlb_n_used_entries_inc(env, mmu_idx); qemu_spin_unlock(&env->tlb_lock); } =20 --=20 2.17.1 From nobody Thu Nov 6 03:26:10 2025 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 153937124940056.2830850812702; Fri, 12 Oct 2018 12:07:29 -0700 (PDT) Received: from localhost ([::1]:42311 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gB2mW-0004OZ-3c for importer@patchew.org; Fri, 12 Oct 2018 15:07:28 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:35836) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gB2ju-0002lW-GQ for qemu-devel@nongnu.org; Fri, 12 Oct 2018 15:04:49 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1gB2jq-0003dG-Jb for qemu-devel@nongnu.org; Fri, 12 Oct 2018 15:04:46 -0400 Received: from out3-smtp.messagingengine.com ([66.111.4.27]:55585) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1gB2jo-0003bA-CR for qemu-devel@nongnu.org; Fri, 12 Oct 2018 15:04:41 -0400 Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.nyi.internal (Postfix) with ESMTP id 4211722183; Fri, 12 Oct 2018 15:04:37 -0400 (EDT) 
Received: from mailfrontend2 ([10.202.2.163]) by compute4.internal (MEProxy); Fri, 12 Oct 2018 15:04:37 -0400 Received: from localhost (flamenco.cs.columbia.edu [128.59.20.216]) by mail.messagingengine.com (Postfix) with ESMTPA id DA3F1102F0; Fri, 12 Oct 2018 15:04:36 -0400 (EDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=braap.org; h= from:to:cc:subject:date:message-id:in-reply-to:references; s= mesmtp; bh=HN61Rfu9B8ULts5r2BqSzsKKdKrk81fzwko3vfx3mLk=; b=xFD4X 2Kjs5ZhPdyV+7koU2ysJNnffTBSZxZMKb8/Un/fio0OT9sYi20uqNYZ/gBu9rvS4 Znd4ajda4vNoKmwSYq9s0ony9irvGyfiIhzii6TH9GswtYLm5b0Ca3bGuuOH/ZHF XRP1NCnylSzSjGPaY7E4lCVDQZNCQuJsvK5RJA= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:date:from:in-reply-to:message-id :references:subject:to:x-me-proxy:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm1; bh=HN61Rfu9B8ULts5r2BqSzsKKdKrk8 1fzwko3vfx3mLk=; b=RdG8sMjiCYuj7YCEmIVXqLIWYe+JHneqOvbZ/ABEF704s x1dTeX10TwI+waiD0ITgLNS98g7rXG4aByGsKkzHpc21zC0frhLXHCXL8EZtIMeE /haWk5/Fg6noOymlBK4BzGd3cbHwwaGv21pEKLzt68/RdaXOGs5xSZePN1xfS1MJ 4oHKZkArsvfrUQ5MWqCwVQLnmu/kXSjsD74xKeHR58l24O8mpQ8rIvvDn6p13S8e yIUjWYSsVNKe/tjmb1gHPZ5FHmu3LoBPCgLohHl5Qam5i1AScPCanGV3GeMEPzYf aLXqhYlfjic6U6ORxEGGJ2I3aZMAg5/hMmu3vA1ww== X-ME-Sender: X-ME-Proxy: From: "Emilio G. 
Cota" To: qemu-devel@nongnu.org Date: Fri, 12 Oct 2018 15:04:34 -0400 Message-Id: <20181012190434.19477-4-cota@braap.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20181012190434.19477-1-cota@braap.org> References: <20181012190434.19477-1-cota@braap.org> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 66.111.4.27 Subject: [Qemu-devel] [PATCH v4 3/3] tcg/i386: enable dynamic TLB sizing X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: =?UTF-8?q?Alex=20Benn=C3=A9e?= , Richard Henderson Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8"

As the following experiments show, this is a net perf gain,
particularly for memory-heavy workloads. Experiments
are run on an Intel i7-6700K CPU @ 4.00GHz.

1. System boot + shutdown, debian aarch64:

- Before (tb-lock-v3):
 Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs):

       7469.363393      task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.07% )
    31,507,707,190      cycles                    #    4.218 GHz                      ( +-  0.07% )
    57,101,577,452      instructions              #    1.81  insns per cycle          ( +-  0.08% )
    10,265,531,804      branches                  # 1374.352 M/sec                    ( +-  0.07% )
       173,020,681      branch-misses             #    1.69% of all branches          ( +-  0.10% )

       7.483359063 seconds time elapsed                                          ( +-  0.08% )

- After:
 Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs):

       7185.036730      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.11% )
    30,303,501,143      cycles                    #    4.218 GHz                      ( +-  0.11% )
    54,198,386,487      instructions              #    1.79  insns per cycle          ( +-  0.08% )
     9,726,518,945      branches                  # 1353.719 M/sec                    ( +-  0.08% )
       167,082,307      branch-misses             #    1.72% of all branches          ( +-  0.08% )

       7.195597842 seconds time elapsed                                          ( +-  0.11% )

That is, a 3.8% improvement.

2. System boot + shutdown, ubuntu 18.04 x86_64:

- Before (tb-lock-v3):
 Performance counter stats for 'taskset -c 0 ../img/x86_64/ubuntu-die.sh -nographic' (2 runs):

      49971.036482      task-clock (msec)         #    0.999 CPUs utilized            ( +-  1.62% )
   210,766,077,140      cycles                    #    4.218 GHz                      ( +-  1.63% )
   428,829,830,790      instructions              #    2.03  insns per cycle          ( +-  0.75% )
    77,313,384,038      branches                  # 1547.164 M/sec                    ( +-  0.54% )
       835,610,706      branch-misses             #    1.08% of all branches          ( +-  2.97% )

      50.003855102 seconds time elapsed                                          ( +-  1.61% )

- After:
 Performance counter stats for 'taskset -c 0 ../img/x86_64/ubuntu-die.sh -nographic' (2 runs):

      50118.124477      task-clock (msec)         #    0.999 CPUs utilized            ( +-  4.30% )
           132,396      context-switches          #    0.003 M/sec                    ( +-  1.20% )
                 0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
           167,754      page-faults               #    0.003 M/sec                    ( +-  0.06% )
   211,414,701,601      cycles                    #    4.218 GHz                      ( +-  4.30% )
                        stalled-cycles-frontend
                        stalled-cycles-backend
   431,618,818,597      instructions              #    2.04  insns per cycle          ( +-  6.40% )
    80,197,256,524      branches                  # 1600.165 M/sec                    ( +-  8.59% )
       794,830,352      branch-misses             #    0.99% of all branches          ( +-  2.05% )

      50.177077175 seconds time elapsed                                          ( +-  4.23% )

No improvement (within noise range).

3.
x86_64 SPEC06int:

              SPEC06int (test set)
         [ Y axis: speedup over master ]
  [bar chart; series: tlb-lock-v3, +indirection, +resizing;
   X axis: the SPEC06int benchmarks and their geomean]
  png: https://imgur.com/a/b1wn3wc

That is, a 1.53x average speedup over master, with a max speedup of 7.13x.

Note that "indirection" (i.e. the "cputlb: introduce indirection for TLB size"
patch in this series) incurs no overhead, on average.

To conclude, here is a different look at the SPEC06int results, using
linux-user as the baseline and comparing master and this series ("tlb-dyn"):

        Softmmu slowdown vs. linux-user for SPEC06int (test set)
               [ Y axis: slowdown over linux-user ]
  [bar chart; series: master, tlb-dyn;
   X axis: the SPEC06int benchmarks and their geomean]
  png: https://imgur.com/a/eXkjMCE

After this series, we bring down the average softmmu overhead
from 2.77x to 1.80x, with a maximum slowdown of 2.48x (omnetpp).

Signed-off-by: Emilio G.
Cota --- tcg/i386/tcg-target.h | 2 +- tcg/i386/tcg-target.inc.c | 28 ++++++++++++++-------------- 2 files changed, 15 insertions(+), 15 deletions(-) diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h index 9e4bfa90d1..8b6475d786 100644 --- a/tcg/i386/tcg-target.h +++ b/tcg/i386/tcg-target.h @@ -27,7 +27,7 @@ =20 #define TCG_TARGET_INSN_UNIT_SIZE 1 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 31 -#define TCG_TARGET_IMPLEMENTS_DYN_TLB 0 +#define TCG_TARGET_IMPLEMENTS_DYN_TLB 1 =20 #ifdef __x86_64__ # define TCG_TARGET_REG_BITS 64 diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c index 436195894b..5cbb07deab 100644 --- a/tcg/i386/tcg-target.inc.c +++ b/tcg/i386/tcg-target.inc.c @@ -330,6 +330,7 @@ static inline int tcg_target_const_match(tcg_target_lon= g val, TCGType type, #define OPC_ARITH_GvEv (0x03) /* ... plus (ARITH_FOO << 3) */ #define OPC_ANDN (0xf2 | P_EXT38) #define OPC_ADD_GvEv (OPC_ARITH_GvEv | (ARITH_ADD << 3)) +#define OPC_AND_GvEv (OPC_ARITH_GvEv | (ARITH_AND << 3)) #define OPC_BLENDPS (0x0c | P_EXT3A | P_DATA16) #define OPC_BSF (0xbc | P_EXT) #define OPC_BSR (0xbd | P_EXT) @@ -1625,7 +1626,7 @@ static inline void tcg_out_tlb_load(TCGContext *s, TC= GReg addrlo, TCGReg addrhi, } if (TCG_TYPE_PTR =3D=3D TCG_TYPE_I64) { hrexw =3D P_REXW; - if (TARGET_PAGE_BITS + CPU_TLB_BITS > 32) { + if (TARGET_PAGE_BITS + CPU_TLB_DYN_MAX_BITS > 32) { tlbtype =3D TCG_TYPE_I64; tlbrexw =3D P_REXW; } @@ -1633,6 +1634,15 @@ static inline void tcg_out_tlb_load(TCGContext *s, T= CGReg addrlo, TCGReg addrhi, } =20 tcg_out_mov(s, tlbtype, r0, addrlo); + tcg_out_shifti(s, SHIFT_SHR + tlbrexw, r0, + TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS); + + tcg_out_modrm_offset(s, OPC_AND_GvEv + trexw, r0, TCG_AREG0, + offsetof(CPUArchState, tlb_mask[mem_index])); + + tcg_out_modrm_offset(s, OPC_ADD_GvEv + hrexw, r0, TCG_AREG0, + offsetof(CPUArchState, tlb_table[mem_index])); + /* If the required alignment is at least as large as the access, simply copy the address and 
mask. For lesser alignments, check that we do= n't cross pages for the complete access. */ @@ -1642,20 +1652,10 @@ static inline void tcg_out_tlb_load(TCGContext *s, = TCGReg addrlo, TCGReg addrhi, tcg_out_modrm_offset(s, OPC_LEA + trexw, r1, addrlo, s_mask - a_ma= sk); } tlb_mask =3D (target_ulong)TARGET_PAGE_MASK | a_mask; - - tcg_out_shifti(s, SHIFT_SHR + tlbrexw, r0, - TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS); - tgen_arithi(s, ARITH_AND + trexw, r1, tlb_mask, 0); - tgen_arithi(s, ARITH_AND + tlbrexw, r0, - (CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS, 0); - - tcg_out_modrm_sib_offset(s, OPC_LEA + hrexw, r0, TCG_AREG0, r0, 0, - offsetof(CPUArchState, tlb_table[mem_index][0= ]) - + which); =20 /* cmp 0(r0), r1 */ - tcg_out_modrm_offset(s, OPC_CMP_GvEv + trexw, r1, r0, 0); + tcg_out_modrm_offset(s, OPC_CMP_GvEv + trexw, r1, r0, which); =20 /* Prepare for both the fast path add of the tlb addend, and the slow path function argument setup. There are two cases worth note: @@ -1672,7 +1672,7 @@ static inline void tcg_out_tlb_load(TCGContext *s, TC= GReg addrlo, TCGReg addrhi, =20 if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) { /* cmp 4(r0), addrhi */ - tcg_out_modrm_offset(s, OPC_CMP_GvEv, addrhi, r0, 4); + tcg_out_modrm_offset(s, OPC_CMP_GvEv, addrhi, r0, which + 4); =20 /* jne slow_path */ tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0); @@ -1684,7 +1684,7 @@ static inline void tcg_out_tlb_load(TCGContext *s, TC= GReg addrlo, TCGReg addrhi, =20 /* add addend(r0), r1 */ tcg_out_modrm_offset(s, OPC_ADD_GvEv + hrexw, r1, r0, - offsetof(CPUTLBEntry, addend) - which); + offsetof(CPUTLBEntry, addend)); } =20 /* --=20 2.17.1
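
For readers who prefer C to emitted x86, here is a rough model of what the
fast path generated by tcg_out_tlb_load() computes after this patch: shift the
guest address, AND it with a per-mmu-index mask loaded from memory, add the
table base, then compare the tag. This is only a sketch under stated
assumptions, not QEMU code. DynTLB, TLBEntry, the dyn_tlb_* helpers, and the
PAGE_BITS/ENTRY_BITS values are all hypothetical stand-ins. The mask layout,
(n_entries - 1) << ENTRY_BITS, is inferred from the constant that the removed
tgen_arithi() call used, and the alignment bits (a_mask) are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative values only; QEMU derives the real ones per target. */
#define PAGE_BITS   12
#define ENTRY_BITS  5   /* log2(sizeof(TLBEntry)) */
#define PAGE_MASK   (~(((uint64_t)1 << PAGE_BITS) - 1))

/* Hypothetical 32-byte entry, standing in for CPUTLBEntry. */
typedef struct {
    uint64_t addr_read;
    uint64_t addr_write;
    uint64_t addr_code;
    uint64_t addend;
} TLBEntry;

/* Hypothetical stand-in for one slot of tlb_mask[] and tlb_table[]. */
typedef struct {
    uintptr_t mask;     /* assumed layout: (n_entries - 1) << ENTRY_BITS */
    TLBEntry *table;
} DynTLB;

/* Allocate a table of n_entries (a power of two) and mark it empty. */
static int dyn_tlb_init(DynTLB *tlb, size_t n_entries)
{
    assert(n_entries != 0 && (n_entries & (n_entries - 1)) == 0);
    assert(sizeof(TLBEntry) == (size_t)1 << ENTRY_BITS);

    tlb->table = calloc(n_entries, sizeof(TLBEntry));
    if (tlb->table == NULL) {
        return -1;
    }
    for (size_t i = 0; i < n_entries; i++) {
        /* All-ones is never page-aligned, so it cannot match a lookup. */
        tlb->table[i].addr_read = UINT64_MAX;
        tlb->table[i].addr_write = UINT64_MAX;
        tlb->table[i].addr_code = UINT64_MAX;
    }
    tlb->mask = (uintptr_t)(n_entries - 1) << ENTRY_BITS;
    return 0;
}

/* shr, then and with the in-memory mask, then add the table base. */
static TLBEntry *dyn_tlb_entry(const DynTLB *tlb, uint64_t addr)
{
    uintptr_t ofs = (uintptr_t)(addr >> (PAGE_BITS - ENTRY_BITS)) & tlb->mask;
    return (TLBEntry *)((char *)tlb->table + ofs);
}

/* Record a read mapping for the page containing addr. */
static void dyn_tlb_fill_read(DynTLB *tlb, uint64_t addr, uint64_t addend)
{
    TLBEntry *e = dyn_tlb_entry(tlb, addr);
    e->addr_read = addr & PAGE_MASK;
    e->addend = addend;
}

/* Return 1 and store addr + addend on a hit; return 0 for the slow path. */
static int dyn_tlb_read_hit(const DynTLB *tlb, uint64_t addr, uint64_t *host)
{
    const TLBEntry *e = dyn_tlb_entry(tlb, addr);
    if (e->addr_read != (addr & PAGE_MASK)) {
        return 0;
    }
    *host = addr + e->addend;
    return 1;
}
```

Because the mask is read from memory rather than baked into the instruction
stream as an immediate, the table can be resized by updating mask and table
together, which is the property the dynamic sizing in this series relies on.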