From: Max Filippov
To: qemu-devel@nongnu.org
Date: Fri, 11 Jan 2019 14:52:44 -0800
Message-Id: <20190111225244.27619-1-jcmvbkbc@gmail.com>
Subject: [Qemu-devel] [PATCH v3] target/xtensa: rework zero overhead loops implementation
Cc: Max Filippov, Richard Henderson

Don't invalidate the TB containing the end of a zero-overhead loop when
LBEG or LEND changes. Instead, encode in the TB's cs_base the distance
from the start of the page where the TB starts to LEND, and generate
loopback code when the next PC matches the encoded LEND. A distance to a
destination within the same page, or up to one maximum instruction
length into the next page, is encoded literally; otherwise it is encoded
as zero. The distance from LEND back to LBEG is also encoded in cs_base:
literally when it is less than 256, and as 0 otherwise. This allows TB
chaining for the loopback branch at the end of a loop for the most
common loop sizes.

With this change emulation speed is about 10% higher in softmmu mode on
uClibc-ng and LTP tests. Emulation speed in Linux user mode is a few
percent lower, because there is no direct TB chaining between different
memory pages. Testing with a lower limit on the direct TB chaining range
shows a gradual slowdown, reaching ~15% for a block size of 64 bytes and
~50% for a block size of 32 bytes.
Signed-off-by: Max Filippov
Reviewed-by: Richard Henderson
---
Changes v2->v3:
- improve comment wording

Changes v1->v2:
- drop LINKABLE_*;
- add comment about LEND offset encoding;

 target/xtensa/cpu.h          | 32 ++++++++++++++++++++++++++
 target/xtensa/helper.h       |  2 --
 target/xtensa/op_helper.c    | 24 --------------------
 target/xtensa/overlay_tool.h |  1 +
 target/xtensa/translate.c    | 53 +++++++++++++++++-------------------------------------
 5 files changed, 49 insertions(+), 63 deletions(-)

diff --git a/target/xtensa/cpu.h b/target/xtensa/cpu.h
index 34e5ccd9f1d6..bf6f9a09b62c 100644
--- a/target/xtensa/cpu.h
+++ b/target/xtensa/cpu.h
@@ -400,6 +400,7 @@ struct XtensaConfig {
     int excm_level;
     int ndepc;
     unsigned inst_fetch_width;
+    unsigned max_insn_size;
     uint32_t vecbase;
     uint32_t exception_vector[EXC_MAX];
     unsigned ninterrupt;
@@ -695,6 +696,11 @@ static inline int cpu_mmu_index(CPUXtensaState *env, bool ifetch)
 #define XTENSA_TBFLAG_CALLINC_MASK 0x180000
 #define XTENSA_TBFLAG_CALLINC_SHIFT 19
 
+#define XTENSA_CSBASE_LEND_MASK 0x0000ffff
+#define XTENSA_CSBASE_LEND_SHIFT 0
+#define XTENSA_CSBASE_LBEG_OFF_MASK 0x00ff0000
+#define XTENSA_CSBASE_LBEG_OFF_SHIFT 16
+
 static inline void cpu_get_tb_cpu_state(CPUXtensaState *env, target_ulong *pc,
         target_ulong *cs_base, uint32_t *flags)
 {
@@ -706,6 +712,32 @@ static inline void cpu_get_tb_cpu_state(CPUXtensaState *env, target_ulong *pc,
     *flags |= xtensa_get_ring(env);
     if (env->sregs[PS] & PS_EXCM) {
         *flags |= XTENSA_TBFLAG_EXCM;
+    } else if (xtensa_option_enabled(env->config, XTENSA_OPTION_LOOP)) {
+        target_ulong lend_dist =
+            env->sregs[LEND] - (env->pc & -(1u << TARGET_PAGE_BITS));
+
+        /*
+         * 0 in the csbase_lend field means that there may not be a loopback
+         * for any instruction that starts inside this page. Any other value
+         * means that an instruction that ends at this offset from the page
+         * start may loop back and will need loopback code to be generated.
+         *
+         * lend_dist is 0 when LEND points to the start of the page, but
+         * no instruction that starts inside this page may end at offset 0,
+         * so it's still correct.
+         *
+         * When an instruction ends at a page boundary it may only start in
+         * the previous page. lend_dist will be encoded as TARGET_PAGE_SIZE
+         * for the TB that contains this instruction.
+         */
+        if (lend_dist < (1u << TARGET_PAGE_BITS) + env->config->max_insn_size) {
+            target_ulong lbeg_off = env->sregs[LEND] - env->sregs[LBEG];
+
+            *cs_base = lend_dist;
+            if (lbeg_off < 256) {
+                *cs_base |= lbeg_off << XTENSA_CSBASE_LBEG_OFF_SHIFT;
+            }
+        }
     }
     if (xtensa_option_enabled(env->config, XTENSA_OPTION_EXTENDED_L32R) &&
         (env->sregs[LITBASE] & 1)) {
diff --git a/target/xtensa/helper.h b/target/xtensa/helper.h
index 10153c245360..2ebba0b2c2bf 100644
--- a/target/xtensa/helper.h
+++ b/target/xtensa/helper.h
@@ -12,8 +12,6 @@ DEF_HELPER_2(rotw, void, env, i32)
 DEF_HELPER_3(window_check, noreturn, env, i32, i32)
 DEF_HELPER_1(restore_owb, void, env)
 DEF_HELPER_2(movsp, void, env, i32)
-DEF_HELPER_2(wsr_lbeg, void, env, i32)
-DEF_HELPER_2(wsr_lend, void, env, i32)
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_1(simcall, void, env)
 #endif
diff --git a/target/xtensa/op_helper.c b/target/xtensa/op_helper.c
index e4b42ab3e56c..078aeb6c2c94 100644
--- a/target/xtensa/op_helper.c
+++ b/target/xtensa/op_helper.c
@@ -107,13 +107,6 @@ static void tb_invalidate_virtual_addr(CPUXtensaState *env, uint32_t vaddr)
     }
 }
 
-#else
-
-static void tb_invalidate_virtual_addr(CPUXtensaState *env, uint32_t vaddr)
-{
-    tb_invalidate_phys_addr(vaddr);
-}
-
 #endif
 
 void HELPER(exception)(CPUXtensaState *env, uint32_t excp)
@@ -370,23 +363,6 @@ void HELPER(movsp)(CPUXtensaState *env, uint32_t pc)
     }
 }
 
-void HELPER(wsr_lbeg)(CPUXtensaState *env, uint32_t v)
-{
-    if (env->sregs[LBEG] != v) {
-        tb_invalidate_virtual_addr(env, env->sregs[LEND] - 1);
-        env->sregs[LBEG] = v;
-    }
-}
-
-void HELPER(wsr_lend)(CPUXtensaState *env, uint32_t v)
-{
-    if (env->sregs[LEND] != v) {
-        tb_invalidate_virtual_addr(env, env->sregs[LEND] - 1);
-        env->sregs[LEND] = v;
-        tb_invalidate_virtual_addr(env, env->sregs[LEND] - 1);
-    }
-}
-
 void HELPER(dump_state)(CPUXtensaState *env)
 {
     XtensaCPU *cpu = xtensa_env_get_cpu(env);
diff --git a/target/xtensa/overlay_tool.h b/target/xtensa/overlay_tool.h
index ee37a04a176c..12609a0d0c1e 100644
--- a/target/xtensa/overlay_tool.h
+++ b/target/xtensa/overlay_tool.h
@@ -457,6 +457,7 @@
     .nareg = XCHAL_NUM_AREGS, \
     .ndepc = (XCHAL_XEA_VERSION >= 2), \
     .inst_fetch_width = XCHAL_INST_FETCH_WIDTH, \
+    .max_insn_size = XCHAL_MAX_INSTRUCTION_SIZE, \
     EXCEPTIONS_SECTION, \
     INTERRUPTS_SECTION, \
     TLB_SECTION, \
diff --git a/target/xtensa/translate.c b/target/xtensa/translate.c
index 46e13384488e..a435d9c36cf1 100644
--- a/target/xtensa/translate.c
+++ b/target/xtensa/translate.c
@@ -53,7 +53,7 @@ struct DisasContext {
     uint32_t pc;
     int cring;
     int ring;
-    uint32_t lbeg;
+    uint32_t lbeg_off;
     uint32_t lend;
 
     bool sar_5bit;
@@ -390,11 +390,9 @@ static void gen_jump(DisasContext *dc, TCGv dest)
 static void gen_jumpi(DisasContext *dc, uint32_t dest, int slot)
 {
     TCGv_i32 tmp = tcg_const_i32(dest);
-#ifndef CONFIG_USER_ONLY
     if (((dc->base.pc_first ^ dest) & TARGET_PAGE_MASK) != 0) {
         slot = -1;
     }
-#endif
     gen_jump_slot(dc, tmp, slot);
     tcg_temp_free(tmp);
 }
@@ -420,25 +418,25 @@ static void gen_callw(DisasContext *dc, int callinc, TCGv_i32 dest)
 static void gen_callwi(DisasContext *dc, int callinc, uint32_t dest, int slot)
 {
     TCGv_i32 tmp = tcg_const_i32(dest);
-#ifndef CONFIG_USER_ONLY
     if (((dc->base.pc_first ^ dest) & TARGET_PAGE_MASK) != 0) {
         slot = -1;
     }
-#endif
     gen_callw_slot(dc, callinc, tmp, slot);
     tcg_temp_free(tmp);
 }
 
 static bool gen_check_loop_end(DisasContext *dc, int slot)
 {
-    if (option_enabled(dc, XTENSA_OPTION_LOOP) &&
-            !(dc->base.tb->flags & XTENSA_TBFLAG_EXCM) &&
-            dc->base.pc_next == dc->lend) {
+    if (dc->base.pc_next == dc->lend) {
         TCGLabel *label = gen_new_label();
 
         tcg_gen_brcondi_i32(TCG_COND_EQ, cpu_SR[LCOUNT], 0, label);
         tcg_gen_subi_i32(cpu_SR[LCOUNT], cpu_SR[LCOUNT], 1);
-        gen_jumpi(dc, dc->lbeg, slot);
+        if (dc->lbeg_off) {
+            gen_jumpi(dc, dc->base.pc_next - dc->lbeg_off, slot);
+        } else {
+            gen_jump(dc, cpu_SR[LBEG]);
+        }
         gen_set_label(label);
         gen_jumpi(dc, dc->base.pc_next, -1);
         return true;
@@ -534,16 +532,6 @@ static void gen_rsr(DisasContext *dc, TCGv_i32 d, uint32_t sr)
     }
 }
 
-static void gen_wsr_lbeg(DisasContext *dc, uint32_t sr, TCGv_i32 s)
-{
-    gen_helper_wsr_lbeg(cpu_env, s);
-}
-
-static void gen_wsr_lend(DisasContext *dc, uint32_t sr, TCGv_i32 s)
-{
-    gen_helper_wsr_lend(cpu_env, s);
-}
-
 static void gen_wsr_sar(DisasContext *dc, uint32_t sr, TCGv_i32 s)
 {
     tcg_gen_andi_i32(cpu_SR[sr], s, 0x3f);
@@ -743,8 +731,6 @@ static void gen_wsr(DisasContext *dc, uint32_t sr, TCGv_i32 s)
 {
     static void (* const wsr_handler[256])(DisasContext *dc,
                                            uint32_t sr, TCGv_i32 v) = {
-        [LBEG] = gen_wsr_lbeg,
-        [LEND] = gen_wsr_lend,
         [SAR] = gen_wsr_sar,
         [BR] = gen_wsr_br,
         [LITBASE] = gen_wsr_litbase,
@@ -906,13 +892,6 @@ static void disas_xtensa_insn(CPUXtensaState *env, DisasContext *dc)
     }
 
     dc->base.pc_next = dc->pc + len;
-    if (xtensa_option_enabled(dc->config, XTENSA_OPTION_LOOP) &&
-        dc->lbeg == dc->pc &&
-        ((dc->pc ^ (dc->base.pc_next - 1)) & -dc->config->inst_fetch_width)) {
-        qemu_log_mask(LOG_GUEST_ERROR,
-                      "unaligned first instruction of a loop (pc = %08x)\n",
-                      dc->pc);
-    }
     for (i = 1; i < len; ++i) {
         b[i] = cpu_ldub_code(env, dc->pc + i);
     }
@@ -1097,8 +1076,10 @@ static void xtensa_tr_init_disas_context(DisasContextBase *dcbase,
     dc->pc = dc->base.pc_first;
     dc->ring = tb_flags & XTENSA_TBFLAG_RING_MASK;
     dc->cring = (tb_flags & XTENSA_TBFLAG_EXCM) ? 0 : dc->ring;
-    dc->lbeg = env->sregs[LBEG];
-    dc->lend = env->sregs[LEND];
+    dc->lbeg_off = (dc->base.tb->cs_base & XTENSA_CSBASE_LBEG_OFF_MASK) >>
+        XTENSA_CSBASE_LBEG_OFF_SHIFT;
+    dc->lend = (dc->base.tb->cs_base & XTENSA_CSBASE_LEND_MASK) +
+        (dc->base.pc_first & TARGET_PAGE_MASK);
     dc->debug = tb_flags & XTENSA_TBFLAG_DEBUG;
     dc->icount = tb_flags & XTENSA_TBFLAG_ICOUNT;
     dc->cpenable = (tb_flags & XTENSA_TBFLAG_CPENABLE_MASK) >>
@@ -1712,12 +1693,10 @@ static void translate_loop(DisasContext *dc, const uint32_t arg[],
                            const uint32_t par[])
 {
     uint32_t lend = arg[1];
-    TCGv_i32 tmp = tcg_const_i32(lend);
 
     tcg_gen_subi_i32(cpu_SR[LCOUNT], cpu_R[arg[0]], 1);
     tcg_gen_movi_i32(cpu_SR[LBEG], dc->base.pc_next);
-    gen_helper_wsr_lend(cpu_env, tmp);
-    tcg_temp_free(tmp);
+    tcg_gen_movi_i32(cpu_SR[LEND], lend);
 
     if (par[0] != TCG_COND_NEVER) {
         TCGLabel *label = gen_new_label();
@@ -4609,7 +4588,7 @@ static const XtensaOpcodeOps core_ops[] = {
         .translate = translate_wsr,
         .test_ill = test_ill_wsr,
         .par = (const uint32_t[]){LBEG},
-        .op_flags = XTENSA_OP_EXIT_TB_0,
+        .op_flags = XTENSA_OP_EXIT_TB_M1,
         .windowed_register_op = 0x1,
     }, {
         .name = "wsr.lcount",
@@ -4622,7 +4601,7 @@ static const XtensaOpcodeOps core_ops[] = {
         .translate = translate_wsr,
         .test_ill = test_ill_wsr,
         .par = (const uint32_t[]){LEND},
-        .op_flags = XTENSA_OP_EXIT_TB_0,
+        .op_flags = XTENSA_OP_EXIT_TB_M1,
         .windowed_register_op = 0x1,
     }, {
         .name = "wsr.litbase",
@@ -5183,7 +5162,7 @@ static const XtensaOpcodeOps core_ops[] = {
         .translate = translate_xsr,
         .test_ill = test_ill_xsr,
         .par = (const uint32_t[]){LBEG},
-        .op_flags = XTENSA_OP_EXIT_TB_0,
+        .op_flags = XTENSA_OP_EXIT_TB_M1,
         .windowed_register_op = 0x1,
     }, {
         .name = "xsr.lcount",
@@ -5196,7 +5175,7 @@ static const XtensaOpcodeOps core_ops[] = {
         .translate = translate_xsr,
         .test_ill = test_ill_xsr,
         .par = (const uint32_t[]){LEND},
-        .op_flags = XTENSA_OP_EXIT_TB_0,
+        .op_flags = XTENSA_OP_EXIT_TB_M1,
         .windowed_register_op = 0x1,
     }, {
         .name = "xsr.litbase",
-- 
2.11.0