From nobody Tue Feb 10 02:00:51 2026 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; dkim=fail; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (208.118.235.17 [208.118.235.17]) by mx.zohomail.com with SMTPS id 1508925308773925.7930993150155; Wed, 25 Oct 2017 02:55:08 -0700 (PDT) Received: from localhost ([::1]:47384 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1e7IOf-0007hT-7X for importer@patchew.org; Wed, 25 Oct 2017 05:54:49 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:45672) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1e7I6c-0008NJ-C0 for qemu-devel@nongnu.org; Wed, 25 Oct 2017 05:36:12 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1e7I6a-0008JF-AY for qemu-devel@nongnu.org; Wed, 25 Oct 2017 05:36:10 -0400 Received: from mail-wm0-x241.google.com ([2a00:1450:400c:c09::241]:46388) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1e7I6Z-0008Hx-W0 for qemu-devel@nongnu.org; Wed, 25 Oct 2017 05:36:08 -0400 Received: by mail-wm0-x241.google.com with SMTP id m72so632386wmc.1 for ; Wed, 25 Oct 2017 02:36:07 -0700 (PDT) Received: from cloudburst.twiddle.net ([62.168.35.107]) by smtp.gmail.com with ESMTPSA id v23sm2751025wmh.8.2017.10.25.02.36.04 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Wed, 25 Oct 2017 02:36:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=vBzVhJ6nUvd1zB1p1teHetLEfZ0hFjT1C+3T8jp8Ggw=; b=Zq4tixZM5QebSKX+plAj/3IYx9orhWC2ylj4pdpvmpqzLCJHYw4lIfi/Zafp49N+rP TKHCTnYTj5ECc8Fw18mpFYnrbOkzHtRmyhrXd0EDsbOgt+B93JpJs1acR5ZNIsloU7gC Cj1s7a/4hCw/aEtKvUqWbmqzKH73dhhz691t4= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=vBzVhJ6nUvd1zB1p1teHetLEfZ0hFjT1C+3T8jp8Ggw=; b=BZxEw8kICw89tyHPx6RMvVDoi+xe2UtwBeUkTVc11uN0Cr/Y8VrBP9Tt8i0tcGoOvz 2MzGParSr9LTH4+hq1Pz34OjTGZcw2+n6gg6T3QcT/nupOqgDWDLkxwqI25ZHSg5Vld+ kixtBHArdovJbSlKCS0CgxlAnauBXEZCSRY8enBF8Cov8EWg3yRjXdD0BWcgm5y7P46w +4KUaS33rQ6y3BTuU4A/2y7vLPyHVn+pLw8Qy9t/RpuxCtvJsfbaIUaeBDXDytgqOLix ykMEyPfdvypKrMkP4gQlQ6C2kGWOpTWFdiaThlM59EQTNHU5yJADL8Qox2V2EEGO93It KA0w== X-Gm-Message-State: AMCzsaVnn/1StW0mTMguNpGr/WZp9SU30HACDzBtUM26GA2uJufWh/6a GAgY4/I6zNKli/dGRSy3pEnl7okgmwk= X-Google-Smtp-Source: ABhQp+RhRHY3JOf2IPoNyRk7BKjUuhvjtjVLD8M1ctnm0eGkRIQ7JNkzRR6nzdwMhQx+jK56sN3fRQ== X-Received: by 10.28.180.2 with SMTP id d2mr1070976wmf.118.1508924166157; Wed, 25 Oct 2017 02:36:06 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Date: Wed, 25 Oct 2017 11:35:06 +0200 Message-Id: <20171025093535.10175-23-richard.henderson@linaro.org> X-Mailer: git-send-email 2.13.6 In-Reply-To: <20171025093535.10175-1-richard.henderson@linaro.org> References: <20171025093535.10175-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2a00:1450:400c:c09::241 Subject: [Qemu-devel] [PULL 22/51] tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org, "Emilio G. Cota" Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail-DKIM: fail (Header signature does not verify) X-ZohoMail: RDKM_2 RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" From: "Emilio G. Cota" This will enable us to decouple code translation from the value of parallel_cpus at any given time. It will also help us minimize TB flushes when generating code via EXCP_ATOMIC. Note that the declaration of parallel_cpus is brought to exec-all.h to be able to define there the "curr_cflags" inline. Signed-off-by: Emilio G. Cota Signed-off-by: Richard Henderson --- include/exec/exec-all.h | 20 +++++++++++++++++++- include/exec/tb-hash-xx.h | 9 ++++++--- include/exec/tb-hash.h | 4 ++-- include/exec/tb-lookup.h | 6 +++--- tcg/tcg.h | 1 - accel/tcg/cpu-exec.c | 45 +++++++++++++++++++++++--------------------= -- accel/tcg/tcg-runtime.c | 2 +- accel/tcg/translate-all.c | 13 +++++++++---- exec.c | 2 +- tests/qht-bench.c | 2 +- 10 files changed, 65 insertions(+), 39 deletions(-) diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h index 53f1835c43..352abc7450 100644 --- a/include/exec/exec-all.h +++ b/include/exec/exec-all.h @@ -325,6 +325,9 @@ struct TranslationBlock { #define CF_USE_ICOUNT 0x20000 #define CF_IGNORE_ICOUNT 0x40000 /* Do not generate icount code */ #define CF_INVALID 0x80000 /* TB is stale. Setters must acquire tb_loc= k */ +#define CF_PARALLEL 0x100000 /* Generate code for a parallel context */ +/* cflags' mask for hashing/comparison */ +#define CF_HASH_MASK (CF_PARALLEL) =20 /* Per-vCPU dynamic tracing state used to generate this TB */ uint32_t trace_vcpu_dstate; @@ -365,11 +368,26 @@ struct TranslationBlock { uintptr_t jmp_list_first; }; =20 +extern bool parallel_cpus; + +/* Hide the atomic_read to make code a little easier on the eyes */ +static inline uint32_t tb_cflags(const TranslationBlock *tb) +{ + return atomic_read(&tb->cflags); +} + +/* current cflags for hashing/comparison */ +static inline uint32_t curr_cflags(void) +{ + return parallel_cpus ? CF_PARALLEL : 0; +} + void tb_free(TranslationBlock *tb); void tb_flush(CPUState *cpu); void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr); TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc, - target_ulong cs_base, uint32_t flags); + target_ulong cs_base, uint32_t flags, + uint32_t cf_mask); void tb_set_jmp_target(TranslationBlock *tb, int n, uintptr_t addr); =20 /* GETPC is the true target of the return instruction that we'll execute. = */ diff --git a/include/exec/tb-hash-xx.h b/include/exec/tb-hash-xx.h index 6cd3022c07..747a9a612c 100644 --- a/include/exec/tb-hash-xx.h +++ b/include/exec/tb-hash-xx.h @@ -48,8 +48,8 @@ * xxhash32, customized for input variables that are not guaranteed to be * contiguous in memory. */ -static inline -uint32_t tb_hash_func6(uint64_t a0, uint64_t b0, uint32_t e, uint32_t f) +static inline uint32_t +tb_hash_func7(uint64_t a0, uint64_t b0, uint32_t e, uint32_t f, uint32_t g) { uint32_t v1 =3D TB_HASH_XX_SEED + PRIME32_1 + PRIME32_2; uint32_t v2 =3D TB_HASH_XX_SEED + PRIME32_2; @@ -78,7 +78,7 @@ uint32_t tb_hash_func6(uint64_t a0, uint64_t b0, uint32_t= e, uint32_t f) v4 *=3D PRIME32_1; =20 h32 =3D rol32(v1, 1) + rol32(v2, 7) + rol32(v3, 12) + rol32(v4, 18); - h32 +=3D 24; + h32 +=3D 28; =20 h32 +=3D e * PRIME32_3; h32 =3D rol32(h32, 17) * PRIME32_4; @@ -86,6 +86,9 @@ uint32_t tb_hash_func6(uint64_t a0, uint64_t b0, uint32_t= e, uint32_t f) h32 +=3D f * PRIME32_3; h32 =3D rol32(h32, 17) * PRIME32_4; =20 + h32 +=3D g * PRIME32_3; + h32 =3D rol32(h32, 17) * PRIME32_4; + h32 ^=3D h32 >> 15; h32 *=3D PRIME32_2; h32 ^=3D h32 >> 13; diff --git a/include/exec/tb-hash.h b/include/exec/tb-hash.h index 17b5ee0edf..0526c4f678 100644 --- a/include/exec/tb-hash.h +++ b/include/exec/tb-hash.h @@ -59,9 +59,9 @@ static inline unsigned int tb_jmp_cache_hash_func(target_= ulong pc) =20 static inline uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, uint32_t fl= ags, - uint32_t trace_vcpu_dstate) + uint32_t cf_mask, uint32_t trace_vcpu_dstate) { - return tb_hash_func6(phys_pc, pc, flags, trace_vcpu_dstate); + return tb_hash_func7(phys_pc, pc, flags, cf_mask, trace_vcpu_dstate); } =20 #endif diff --git a/include/exec/tb-lookup.h b/include/exec/tb-lookup.h index 436b6d5ecf..296138591a 100644 --- a/include/exec/tb-lookup.h +++ b/include/exec/tb-lookup.h @@ -21,7 +21,7 @@ /* Might cause an exception, so have a longjmp destination ready */ static inline TranslationBlock * tb_lookup__cpu_state(CPUState *cpu, target_ulong *pc, target_ulong *cs_bas= e, - uint32_t *flags) + uint32_t *flags, uint32_t cf_mask) { CPUArchState *env =3D (CPUArchState *)cpu->env_ptr; TranslationBlock *tb; @@ -35,10 +35,10 @@ tb_lookup__cpu_state(CPUState *cpu, target_ulong *pc, t= arget_ulong *cs_base, tb->cs_base =3D=3D *cs_base && tb->flags =3D=3D *flags && tb->trace_vcpu_dstate =3D=3D *cpu->trace_dstate && - !(atomic_read(&tb->cflags) & CF_INVALID))) { + (tb_cflags(tb) & (CF_HASH_MASK | CF_INVALID)) =3D=3D cf_mas= k)) { return tb; } - tb =3D tb_htable_lookup(cpu, *pc, *cs_base, *flags); + tb =3D tb_htable_lookup(cpu, *pc, *cs_base, *flags, cf_mask); if (tb =3D=3D NULL) { return NULL; } diff --git a/tcg/tcg.h b/tcg/tcg.h index c85786fb47..92d7468cec 100644 --- a/tcg/tcg.h +++ b/tcg/tcg.h @@ -690,7 +690,6 @@ struct TCGContext { }; =20 extern TCGContext tcg_ctx; -extern bool parallel_cpus; =20 static inline size_t temp_idx(TCGTemp *ts) { diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c index 363dfa208a..39ec9508d1 100644 --- a/accel/tcg/cpu-exec.c +++ b/accel/tcg/cpu-exec.c @@ -207,7 +207,8 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cyc= les, tb_lock(); tb =3D tb_gen_code(cpu, orig_tb->pc, orig_tb->cs_base, orig_tb->flags, max_cycles | CF_NOCACHE - | (ignore_icount ? CF_IGNORE_ICOUNT : 0)); + | (ignore_icount ? CF_IGNORE_ICOUNT : 0) + | curr_cflags()); tb->orig_tb =3D orig_tb; tb_unlock(); =20 @@ -225,31 +226,27 @@ static void cpu_exec_nocache(CPUState *cpu, int max_c= ycles, static void cpu_exec_step(CPUState *cpu) { CPUClass *cc =3D CPU_GET_CLASS(cpu); - CPUArchState *env =3D (CPUArchState *)cpu->env_ptr; TranslationBlock *tb; target_ulong cs_base, pc; uint32_t flags; + uint32_t cflags =3D 1 | CF_IGNORE_ICOUNT; =20 - cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags); if (sigsetjmp(cpu->jmp_env, 0) =3D=3D 0) { - mmap_lock(); - tb_lock(); - tb =3D tb_gen_code(cpu, pc, cs_base, flags, - 1 | CF_NOCACHE | CF_IGNORE_ICOUNT); - tb->orig_tb =3D NULL; - tb_unlock(); - mmap_unlock(); + tb =3D tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags, + cflags & CF_HASH_MASK); + if (tb =3D=3D NULL) { + mmap_lock(); + tb_lock(); + tb =3D tb_gen_code(cpu, pc, cs_base, flags, cflags); + tb_unlock(); + mmap_unlock(); + } =20 cc->cpu_exec_enter(cpu); /* execute the generated code */ - trace_exec_tb_nocache(tb, pc); + trace_exec_tb(tb, pc); cpu_tb_exec(cpu, tb); cc->cpu_exec_exit(cpu); - - tb_lock(); - tb_phys_invalidate(tb, -1); - tb_free(tb); - tb_unlock(); } else { /* We may have exited due to another problem here, so we need * to reset any tb_locks we may have taken but didn't release. @@ -281,6 +278,7 @@ struct tb_desc { CPUArchState *env; tb_page_addr_t phys_page1; uint32_t flags; + uint32_t cf_mask; uint32_t trace_vcpu_dstate; }; =20 @@ -294,7 +292,7 @@ static bool tb_cmp(const void *p, const void *d) tb->cs_base =3D=3D desc->cs_base && tb->flags =3D=3D desc->flags && tb->trace_vcpu_dstate =3D=3D desc->trace_vcpu_dstate && - !(atomic_read(&tb->cflags) & CF_INVALID)) { + (tb_cflags(tb) & (CF_HASH_MASK | CF_INVALID)) =3D=3D desc->cf_mask= ) { /* check next page if needed */ if (tb->page_addr[1] =3D=3D -1) { return true; @@ -313,7 +311,8 @@ static bool tb_cmp(const void *p, const void *d) } =20 TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc, - target_ulong cs_base, uint32_t flags) + target_ulong cs_base, uint32_t flags, + uint32_t cf_mask) { tb_page_addr_t phys_pc; struct tb_desc desc; @@ -322,11 +321,12 @@ TranslationBlock *tb_htable_lookup(CPUState *cpu, tar= get_ulong pc, desc.env =3D (CPUArchState *)cpu->env_ptr; desc.cs_base =3D cs_base; desc.flags =3D flags; + desc.cf_mask =3D cf_mask; desc.trace_vcpu_dstate =3D *cpu->trace_dstate; desc.pc =3D pc; phys_pc =3D get_page_addr_code(desc.env, pc); desc.phys_page1 =3D phys_pc & TARGET_PAGE_MASK; - h =3D tb_hash_func(phys_pc, pc, flags, *cpu->trace_dstate); + h =3D tb_hash_func(phys_pc, pc, flags, cf_mask, *cpu->trace_dstate); return qht_lookup(&tcg_ctx.tb_ctx.htable, tb_cmp, &desc, h); } =20 @@ -373,8 +373,9 @@ static inline TranslationBlock *tb_find(CPUState *cpu, target_ulong cs_base, pc; uint32_t flags; bool acquired_tb_lock =3D false; + uint32_t cf_mask =3D curr_cflags(); =20 - tb =3D tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags); + tb =3D tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags, cf_mask); if (tb =3D=3D NULL) { /* mmap_lock is needed by tb_gen_code, and mmap_lock must be * taken outside tb_lock. As system emulation is currently @@ -387,10 +388,10 @@ static inline TranslationBlock *tb_find(CPUState *cpu, /* There's a chance that our desired tb has been translated while * taking the locks so we check again inside the lock. */ - tb =3D tb_htable_lookup(cpu, pc, cs_base, flags); + tb =3D tb_htable_lookup(cpu, pc, cs_base, flags, cf_mask); if (likely(tb =3D=3D NULL)) { /* if no translated code available, then translate it now */ - tb =3D tb_gen_code(cpu, pc, cs_base, flags, 0); + tb =3D tb_gen_code(cpu, pc, cs_base, flags, cf_mask); } =20 mmap_unlock(); diff --git a/accel/tcg/tcg-runtime.c b/accel/tcg/tcg-runtime.c index 54d89100d9..25f0cabfed 100644 --- a/accel/tcg/tcg-runtime.c +++ b/accel/tcg/tcg-runtime.c @@ -151,7 +151,7 @@ void *HELPER(lookup_tb_ptr)(CPUArchState *env) target_ulong cs_base, pc; uint32_t flags; =20 - tb =3D tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags); + tb =3D tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags, curr_cflags()); if (tb =3D=3D NULL) { return tcg_ctx.code_gen_epilogue; } diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c index 1b43deb0cd..7ad65bc705 100644 --- a/accel/tcg/translate-all.c +++ b/accel/tcg/translate-all.c @@ -1101,7 +1101,8 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page= _addr_t page_addr) =20 /* remove the TB from the hash list */ phys_pc =3D tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK); - h =3D tb_hash_func(phys_pc, tb->pc, tb->flags, tb->trace_vcpu_dstate); + h =3D tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags & CF_HASH_MA= SK, + tb->trace_vcpu_dstate); qht_remove(&tcg_ctx.tb_ctx.htable, tb, h); =20 /* remove the TB from the page list */ @@ -1245,7 +1246,8 @@ static void tb_link_page(TranslationBlock *tb, tb_pag= e_addr_t phys_pc, } =20 /* add in the hash table */ - h =3D tb_hash_func(phys_pc, tb->pc, tb->flags, tb->trace_vcpu_dstate); + h =3D tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags & CF_HASH_MA= SK, + tb->trace_vcpu_dstate); qht_insert(&tcg_ctx.tb_ctx.htable, tb, h); =20 #ifdef CONFIG_USER_ONLY @@ -1548,7 +1550,8 @@ void tb_invalidate_phys_page_range(tb_page_addr_t sta= rt, tb_page_addr_t end, /* we generate a block containing just the instruction modifying the memory. It will ensure that it cannot modify itself */ - tb_gen_code(cpu, current_pc, current_cs_base, current_flags, 1); + tb_gen_code(cpu, current_pc, current_cs_base, current_flags, + 1 | curr_cflags()); cpu_loop_exit_noexc(cpu); } #endif @@ -1666,7 +1669,8 @@ static bool tb_invalidate_phys_page(tb_page_addr_t ad= dr, uintptr_t pc) /* we generate a block containing just the instruction modifying the memory. It will ensure that it cannot modify itself */ - tb_gen_code(cpu, current_pc, current_cs_base, current_flags, 1); + tb_gen_code(cpu, current_pc, current_cs_base, current_flags, + 1 | curr_cflags()); /* tb_lock will be reset after cpu_loop_exit_noexc longjmps * back into the cpu_exec loop. */ return true; @@ -1810,6 +1814,7 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retadd= r) } =20 cflags =3D n | CF_LAST_IO; + cflags |=3D curr_cflags(); pc =3D tb->pc; cs_base =3D tb->cs_base; flags =3D tb->flags; diff --git a/exec.c b/exec.c index de03053d32..3e0a3dae46 100644 --- a/exec.c +++ b/exec.c @@ -2476,7 +2476,7 @@ static void check_watchpoint(int offset, int len, Mem= TxAttrs attrs, int flags) cpu_loop_exit(cpu); } else { cpu_get_tb_cpu_state(env, &pc, &cs_base, &cpu_flags); - tb_gen_code(cpu, pc, cs_base, cpu_flags, 1); + tb_gen_code(cpu, pc, cs_base, cpu_flags, 1 | curr_cfla= gs()); cpu_loop_exit_noexc(cpu); } } diff --git a/tests/qht-bench.c b/tests/qht-bench.c index 11c1cec766..4cabdfd62a 100644 --- a/tests/qht-bench.c +++ b/tests/qht-bench.c @@ -103,7 +103,7 @@ static bool is_equal(const void *obj, const void *userp) =20 static inline uint32_t h(unsigned long v) { - return tb_hash_func6(v, 0, 0, 0); + return tb_hash_func7(v, 0, 0, 0, 0); } =20 /* --=20 2.13.6