From: Max Filippov
To: qemu-devel@nongnu.org
Cc: Max Filippov, Richard Henderson
Date: Wed, 3 Oct 2018 18:05:05 -0700
Message-Id: <20181004010505.26184-1-jcmvbkbc@gmail.com>
Subject: [Qemu-devel] [RFC] target/xtensa: rework zero overhead loops implementation

Don't invalidate the TB containing the end of a zero overhead loop when
LBEG or LEND changes. Instead, encode the distance from the TB start to
LEND in the TB flags and generate loopback code when the offset of the
next PC from the TB start equals that distance. A distance not greater
than the maximal instruction length is encoded literally; greater
distances are capped at the target page size and encoded as the maximal
instruction length plus the base-2 logarithm of the greatest power of 2
that is not bigger than the distance.

Although this change adds a dynamic TB search at the end of each zero
overhead loop, the resulting emulation speed is about 10% higher in
uClibc-ng and LTP tests.

Signed-off-by: Max Filippov
---
 dtc                       |  2 +-
 target/xtensa/cpu.h       | 11 ++++++++++
 target/xtensa/helper.h    |  2 --
 target/xtensa/op_helper.c | 24 ---------------------
 target/xtensa/translate.c | 53 ++++++++++++++++++++---------------------------
 5 files changed, 34 insertions(+), 58 deletions(-)

diff --git a/dtc b/dtc
index 88f18909db73..e54388015af1 160000
--- a/dtc
+++ b/dtc
@@ -1 +1 @@
-Subproject commit 88f18909db731a627456f26d779445f84e449536
+Subproject commit e54388015af1fb4bf04d0bca99caba1074d9cc42
diff --git a/target/xtensa/cpu.h b/target/xtensa/cpu.h
index 34e5ccd9f1d6..ad76d75aadde 100644
--- a/target/xtensa/cpu.h
+++ b/target/xtensa/cpu.h
@@ -694,6 +694,8 @@ static inline int cpu_mmu_index(CPUXtensaState *env, bool ifetch)
 #define XTENSA_TBFLAG_CWOE 0x40000
 #define XTENSA_TBFLAG_CALLINC_MASK 0x180000
 #define XTENSA_TBFLAG_CALLINC_SHIFT 19
+#define XTENSA_TBFLAG_LEND_MASK 0xfe00000
+#define XTENSA_TBFLAG_LEND_SHIFT 21
 
 static inline void cpu_get_tb_cpu_state(CPUXtensaState *env, target_ulong *pc,
         target_ulong *cs_base, uint32_t *flags)
@@ -706,6 +708,15 @@ static inline void cpu_get_tb_cpu_state(CPUXtensaState *env, target_ulong *pc,
     *flags |= xtensa_get_ring(env);
     if (env->sregs[PS] & PS_EXCM) {
         *flags |= XTENSA_TBFLAG_EXCM;
+    } else if (xtensa_option_enabled(env->config, XTENSA_OPTION_LOOP)) {
+        target_ulong lend_dist = env->sregs[LEND] - env->pc;
+
+        if (lend_dist > (1u << TARGET_PAGE_BITS)) {
+            lend_dist = MAX_INSN_LENGTH + 31 - TARGET_PAGE_BITS;
+        } else if (lend_dist > MAX_INSN_LENGTH) {
+            lend_dist = MAX_INSN_LENGTH + 31 - clz32(lend_dist);
+        }
+        *flags |= lend_dist << XTENSA_TBFLAG_LEND_SHIFT;
     }
     if (xtensa_option_enabled(env->config, XTENSA_OPTION_EXTENDED_L32R) &&
             (env->sregs[LITBASE] & 1)) {
diff --git a/target/xtensa/helper.h b/target/xtensa/helper.h
index 10153c245360..2ebba0b2c2bf 100644
--- a/target/xtensa/helper.h
+++ b/target/xtensa/helper.h
@@ -12,8 +12,6 @@ DEF_HELPER_2(rotw, void, env, i32)
 DEF_HELPER_3(window_check, noreturn, env, i32, i32)
 DEF_HELPER_1(restore_owb, void, env)
 DEF_HELPER_2(movsp, void, env, i32)
-DEF_HELPER_2(wsr_lbeg, void, env, i32)
-DEF_HELPER_2(wsr_lend, void, env, i32)
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_1(simcall, void, env)
 #endif
diff --git a/target/xtensa/op_helper.c b/target/xtensa/op_helper.c
index e4b42ab3e56c..078aeb6c2c94 100644
--- a/target/xtensa/op_helper.c
+++ b/target/xtensa/op_helper.c
@@ -107,13 +107,6 @@ static void tb_invalidate_virtual_addr(CPUXtensaState *env, uint32_t vaddr)
     }
 }
 
-#else
-
-static void tb_invalidate_virtual_addr(CPUXtensaState *env, uint32_t vaddr)
-{
-    tb_invalidate_phys_addr(vaddr);
-}
-
 #endif
 
 void HELPER(exception)(CPUXtensaState *env, uint32_t excp)
@@ -370,23 +363,6 @@ void HELPER(movsp)(CPUXtensaState *env, uint32_t pc)
     }
 }
 
-void HELPER(wsr_lbeg)(CPUXtensaState *env, uint32_t v)
-{
-    if (env->sregs[LBEG] != v) {
-        tb_invalidate_virtual_addr(env, env->sregs[LEND] - 1);
-        env->sregs[LBEG] = v;
-    }
-}
-
-void HELPER(wsr_lend)(CPUXtensaState *env, uint32_t v)
-{
-    if (env->sregs[LEND] != v) {
-        tb_invalidate_virtual_addr(env, env->sregs[LEND] - 1);
-        env->sregs[LEND] = v;
-        tb_invalidate_virtual_addr(env, env->sregs[LEND] - 1);
-    }
-}
-
 void HELPER(dump_state)(CPUXtensaState *env)
 {
     XtensaCPU *cpu = xtensa_env_get_cpu(env);
diff --git a/target/xtensa/translate.c b/target/xtensa/translate.c
index 46e13384488e..c48285ce207e 100644
--- a/target/xtensa/translate.c
+++ b/target/xtensa/translate.c
@@ -54,7 +54,7 @@ struct DisasContext {
     int cring;
     int ring;
     uint32_t lbeg;
-    uint32_t lend;
+    uint32_t lend_dist;
 
     bool sar_5bit;
     bool sar_m32_5bit;
@@ -431,14 +431,13 @@ static void gen_callwi(DisasContext *dc, int callinc, uint32_t dest, int slot)
 
 static bool gen_check_loop_end(DisasContext *dc, int slot)
 {
-    if (option_enabled(dc, XTENSA_OPTION_LOOP) &&
-            !(dc->base.tb->flags & XTENSA_TBFLAG_EXCM) &&
-            dc->base.pc_next == dc->lend) {
+    if (dc->lend_dist && dc->lend_dist <= MAX_INSN_LENGTH &&
+        dc->base.pc_next - dc->base.pc_first == dc->lend_dist) {
         TCGLabel *label = gen_new_label();
 
         tcg_gen_brcondi_i32(TCG_COND_EQ, cpu_SR[LCOUNT], 0, label);
         tcg_gen_subi_i32(cpu_SR[LCOUNT], cpu_SR[LCOUNT], 1);
-        gen_jumpi(dc, dc->lbeg, slot);
+        gen_jump(dc, cpu_SR[LBEG]);
         gen_set_label(label);
         gen_jumpi(dc, dc->base.pc_next, -1);
         return true;
@@ -534,16 +533,6 @@ static void gen_rsr(DisasContext *dc, TCGv_i32 d, uint32_t sr)
     }
 }
 
-static void gen_wsr_lbeg(DisasContext *dc, uint32_t sr, TCGv_i32 s)
-{
-    gen_helper_wsr_lbeg(cpu_env, s);
-}
-
-static void gen_wsr_lend(DisasContext *dc, uint32_t sr, TCGv_i32 s)
-{
-    gen_helper_wsr_lend(cpu_env, s);
-}
-
 static void gen_wsr_sar(DisasContext *dc, uint32_t sr, TCGv_i32 s)
 {
     tcg_gen_andi_i32(cpu_SR[sr], s, 0x3f);
@@ -743,8 +732,6 @@ static void gen_wsr(DisasContext *dc, uint32_t sr, TCGv_i32 s)
 {
     static void (* const wsr_handler[256])(DisasContext *dc, uint32_t sr,
             TCGv_i32 v) = {
-        [LBEG] = gen_wsr_lbeg,
-        [LEND] = gen_wsr_lend,
         [SAR] = gen_wsr_sar,
         [BR] = gen_wsr_br,
         [LITBASE] = gen_wsr_litbase,
@@ -1098,7 +1085,11 @@ static void xtensa_tr_init_disas_context(DisasContextBase *dcbase,
     dc->ring = tb_flags & XTENSA_TBFLAG_RING_MASK;
     dc->cring = (tb_flags & XTENSA_TBFLAG_EXCM) ? 0 : dc->ring;
     dc->lbeg = env->sregs[LBEG];
-    dc->lend = env->sregs[LEND];
+    dc->lend_dist = (tb_flags & XTENSA_TBFLAG_LEND_MASK) >>
+        XTENSA_TBFLAG_LEND_SHIFT;
+    if (dc->lend_dist > MAX_INSN_LENGTH) {
+        dc->lend_dist = (1 << (dc->lend_dist - MAX_INSN_LENGTH)) + 1;
+    }
     dc->debug = tb_flags & XTENSA_TBFLAG_DEBUG;
     dc->icount = tb_flags & XTENSA_TBFLAG_ICOUNT;
     dc->cpenable = (tb_flags & XTENSA_TBFLAG_CPENABLE_MASK) >>
@@ -1150,7 +1141,6 @@ static void xtensa_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu)
 {
     DisasContext *dc = container_of(dcbase, DisasContext, base);
     CPUXtensaState *env = cpu->env_ptr;
-    target_ulong page_start;
 
     /* These two conditions only apply to the first insn in the TB,
        but this is the first TranslateOps hook that allows exiting.  */
@@ -1189,11 +1179,16 @@ static void xtensa_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu)
     }
 
     /* End the TB if the next insn will cross into the next page.  */
-    page_start = dc->base.pc_first & TARGET_PAGE_MASK;
-    if (dc->base.is_jmp == DISAS_NEXT &&
-        (dc->pc - page_start >= TARGET_PAGE_SIZE ||
-         dc->pc - page_start + xtensa_insn_len(env, dc) > TARGET_PAGE_SIZE)) {
-        dc->base.is_jmp = DISAS_TOO_MANY;
+    if (dc->base.is_jmp == DISAS_NEXT) {
+        target_ulong page_start = dc->base.pc_first & TARGET_PAGE_MASK;
+        unsigned next_insn_len = xtensa_insn_len(env, dc);
+
+        if (dc->pc - page_start >= TARGET_PAGE_SIZE ||
+            dc->pc - page_start + next_insn_len > TARGET_PAGE_SIZE ||
+            (dc->lend_dist &&
+             dc->pc - dc->base.pc_first + next_insn_len > dc->lend_dist)) {
+            dc->base.is_jmp = DISAS_TOO_MANY;
+        }
     }
 }
 
@@ -1712,12 +1707,10 @@ static void translate_loop(DisasContext *dc, const uint32_t arg[],
                            const uint32_t par[])
 {
     uint32_t lend = arg[1];
-    TCGv_i32 tmp = tcg_const_i32(lend);
 
     tcg_gen_subi_i32(cpu_SR[LCOUNT], cpu_R[arg[0]], 1);
     tcg_gen_movi_i32(cpu_SR[LBEG], dc->base.pc_next);
-    gen_helper_wsr_lend(cpu_env, tmp);
-    tcg_temp_free(tmp);
+    tcg_gen_movi_i32(cpu_SR[LEND], lend);
 
     if (par[0] != TCG_COND_NEVER) {
         TCGLabel *label = gen_new_label();
@@ -4609,7 +4602,6 @@ static const XtensaOpcodeOps core_ops[] = {
         .translate = translate_wsr,
         .test_ill = test_ill_wsr,
         .par = (const uint32_t[]){LBEG},
-        .op_flags = XTENSA_OP_EXIT_TB_0,
         .windowed_register_op = 0x1,
     }, {
         .name = "wsr.lcount",
@@ -4622,7 +4614,7 @@ static const XtensaOpcodeOps core_ops[] = {
         .translate = translate_wsr,
         .test_ill = test_ill_wsr,
         .par = (const uint32_t[]){LEND},
-        .op_flags = XTENSA_OP_EXIT_TB_0,
+        .op_flags = XTENSA_OP_EXIT_TB_M1,
         .windowed_register_op = 0x1,
     }, {
         .name = "wsr.litbase",
@@ -5183,7 +5175,6 @@ static const XtensaOpcodeOps core_ops[] = {
         .translate = translate_xsr,
         .test_ill = test_ill_xsr,
         .par = (const uint32_t[]){LBEG},
-        .op_flags = XTENSA_OP_EXIT_TB_0,
         .windowed_register_op = 0x1,
     }, {
         .name = "xsr.lcount",
@@ -5196,7 +5187,7 @@ static const XtensaOpcodeOps core_ops[] = {
         .translate = translate_xsr,
         .test_ill = test_ill_xsr,
         .par = (const uint32_t[]){LEND},
-        .op_flags = XTENSA_OP_EXIT_TB_0,
+        .op_flags = XTENSA_OP_EXIT_TB_M1,
         .windowed_register_op = 0x1,
     }, {
         .name = "xsr.litbase",
-- 
2.11.0
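
Note on the distance encoding: the scheme in the commit message (literal for short distances, page-capped logarithm for long ones) can be tried out standalone. The following is only a minimal sketch under stated assumptions, not the code from the patch: all `sketch_*` names are hypothetical, the values 64 and 12 merely stand in for MAX_INSN_LENGTH and TARGET_PAGE_BITS, and the decoder returns the bare power of two, whereas the patch's decoder adds 1 to it.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins (assumptions); real values come from QEMU headers. */
#define SKETCH_MAX_INSN_LENGTH 64u
#define SKETCH_PAGE_BITS 12u

/* Index of the highest set bit, i.e. floor(log2(v)); v must be non-zero. */
static unsigned sketch_floor_log2(uint32_t v)
{
    unsigned n = 0;

    while (v >>= 1) {
        n++;
    }
    return n;
}

/* Pack the distance from the TB start to LEND into a small field. */
static uint32_t sketch_encode_lend_dist(uint32_t dist)
{
    if (dist <= SKETCH_MAX_INSN_LENGTH) {
        return dist;                    /* near: stored literally */
    }
    if (dist > (1u << SKETCH_PAGE_BITS)) {
        dist = 1u << SKETCH_PAGE_BITS;  /* far: capped at the page size */
    }
    /* far: maximal instruction length plus the power-of-two exponent */
    return SKETCH_MAX_INSN_LENGTH + sketch_floor_log2(dist);
}

/* Recover the exact distance (near) or a lower bound on it (far). */
static uint32_t sketch_decode_lend_dist(uint32_t enc)
{
    assert(enc <= SKETCH_MAX_INSN_LENGTH + SKETCH_PAGE_BITS);
    if (enc <= SKETCH_MAX_INSN_LENGTH) {
        return enc;
    }
    return 1u << (enc - SKETCH_MAX_INSN_LENGTH);
}
```

With these stand-in values the largest encoded value is 64 + 12 = 76, which fits the 7-bit field selected by XTENSA_TBFLAG_LEND_MASK. Because only the exponent is kept for far distances, many different LEND values share one TB flags value, which is presumably what lets a TB survive LEND changes without invalidation.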