From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Thu, 7 Sep 2017 15:40:49 -0700
Message-Id: <20170907224051.21518-22-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.13.5
In-Reply-To: <20170907224051.21518-1-richard.henderson@linaro.org>
References: <20170907224051.21518-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PULL 21/23] tcg/ppc: Change TCG_REG_RA to TCG_REG_TB
Cc: peter.maydell@linaro.org, Richard Henderson

From: Richard Henderson

At this point the conversion is a wash. Loading of TB+ofs is
smaller, but the actual return address from exit_tb is larger.
There are a few more insns required to transition between TBs.

But the expectation is that accesses to the constant pool will
on the whole be smaller.

Signed-off-by: Richard Henderson
---
 tcg/ppc/tcg-target.inc.c | 273 +++++++++++++++++++++--------------------------
 1 file changed, 122 insertions(+), 151 deletions(-)

diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index d772faf7be..bc14d2c9c6 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -39,29 +39,8 @@
 # define TCG_REG_TMP1 TCG_REG_R12
 #endif
 
-/* For the 64-bit target, we don't like the 5 insn sequence needed to build
-   full 64-bit addresses. Better to have a base register to which we can
-   apply a 32-bit displacement.
-
-   There are generally three items of interest:
-   (1) helper functions in the main executable,
-   (2) TranslationBlock data structures,
-   (3) the return address in the epilogue.
-
-   For user-only, we USE_STATIC_CODE_GEN_BUFFER, so the code_gen_buffer
-   will be inside the main executable, and thus near enough to make a
-   pointer to the epilogue be within 2GB of all helper functions.
-
-   For softmmu, we'll let the kernel choose the address of code_gen_buffer,
-   and odds are it'll be somewhere close to the main malloc arena, and so
-   a pointer to the epilogue will be within 2GB of the TranslationBlocks.
-
-   For --enable-pie, everything will be kinda near everything else,
-   somewhere in high memory.
-
-   Thus we choose to keep the return address in a call-saved register. */
-#define TCG_REG_RA TCG_REG_R31
-#define USE_REG_RA (TCG_TARGET_REG_BITS == 64)
+#define TCG_REG_TB TCG_REG_R31
+#define USE_REG_TB (TCG_TARGET_REG_BITS == 64)
 
 /* Shorthand for size of a pointer. Avoid promotion to unsigned. */
 #define SZP ((int)sizeof(void *))
@@ -614,50 +593,68 @@ static inline void tcg_out_shri64(TCGContext *s, TCGReg dst, TCGReg src, int c)
     tcg_out_rld(s, RLDICL, dst, src, 64 - c, c);
 }
 
-static void tcg_out_movi32(TCGContext *s, TCGReg ret, int32_t arg)
+static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret,
+                             tcg_target_long arg, bool in_prologue)
 {
-    if (arg == (int16_t) arg) {
+    intptr_t tb_diff;
+    int32_t high;
+
+    tcg_debug_assert(TCG_TARGET_REG_BITS == 64 || type == TCG_TYPE_I32);
+
+    if (TCG_TARGET_REG_BITS == 64 && type == TCG_TYPE_I32) {
+        arg = (int32_t)arg;
+    }
+
+    /* Load 16-bit immediates with one insn. */
+    if (arg == (int16_t)arg) {
         tcg_out32(s, ADDI | TAI(ret, 0, arg));
-    } else {
+        return;
+    }
+
+    /* Load addresses within the TB with one insn. */
+    tb_diff = arg - (intptr_t)s->code_gen_ptr;
+    if (!in_prologue && USE_REG_TB && tb_diff == (int16_t)tb_diff) {
+        tcg_out32(s, ADDI | TAI(ret, TCG_REG_TB, tb_diff));
+        return;
+    }
+
+    /* Load 32-bit immediates with two insns. */
+    if (TCG_TARGET_REG_BITS == 32 || arg == (int32_t)arg) {
         tcg_out32(s, ADDIS | TAI(ret, 0, arg >> 16));
         if (arg & 0xffff) {
             tcg_out32(s, ORI | SAI(ret, ret, arg));
         }
+        return;
     }
-}
-
-static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg ret,
-                         tcg_target_long arg)
-{
-    tcg_debug_assert(TCG_TARGET_REG_BITS == 64 || type == TCG_TYPE_I32);
-    if (type == TCG_TYPE_I32 || arg == (int32_t)arg) {
-        tcg_out_movi32(s, ret, arg);
-    } else if (arg == (uint32_t)arg && !(arg & 0x8000)) {
+    if (arg == (uint32_t)arg && !(arg & 0x8000)) {
         tcg_out32(s, ADDI | TAI(ret, 0, arg));
         tcg_out32(s, ORIS | SAI(ret, ret, arg >> 16));
-    } else {
-        int32_t high;
+        return;
+    }
 
-        if (USE_REG_RA) {
-            intptr_t diff = arg - (intptr_t)tb_ret_addr;
-            if (diff == (int32_t)diff) {
-                tcg_out_mem_long(s, ADDI, ADD, ret, TCG_REG_RA, diff);
-                return;
-            }
-        }
+    /* Load addresses within 2GB of TB with 2 (or rarely 3) insns. */
+    if (!in_prologue && USE_REG_TB && tb_diff == (int32_t)tb_diff) {
+        tcg_out_mem_long(s, ADDI, ADD, ret, TCG_REG_TB, tb_diff);
+        return;
+    }
 
-        high = arg >> 31 >> 1;
-        tcg_out_movi32(s, ret, high);
-        if (high) {
-            tcg_out_shli64(s, ret, ret, 32);
-        }
-        if (arg & 0xffff0000) {
-            tcg_out32(s, ORIS | SAI(ret, ret, arg >> 16));
-        }
-        if (arg & 0xffff) {
-            tcg_out32(s, ORI | SAI(ret, ret, arg));
-        }
+    high = arg >> 31 >> 1;
+    tcg_out_movi(s, TCG_TYPE_I32, ret, high);
+    if (high) {
+        tcg_out_shli64(s, ret, ret, 32);
     }
+    if (arg & 0xffff0000) {
+        tcg_out32(s, ORIS | SAI(ret, ret, arg >> 16));
+    }
+    if (arg & 0xffff) {
+        tcg_out32(s, ORI | SAI(ret, ret, arg));
+    }
+}
+
+static inline void tcg_out_movi(TCGContext *s, TCGType type, TCGReg ret,
+                                tcg_target_long arg)
+{
+    tcg_out_movi_int(s, type, ret, arg, false);
 }
 
 static bool mask_operand(uint32_t c, int *mb, int *me)
@@ -1293,49 +1290,43 @@ static void tcg_out_mb(TCGContext *s, TCGArg a0)
     tcg_out32(s, insn);
 }
 
-#ifdef __powerpc64__
 void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_addr,
                               uintptr_t addr)
 {
-    tcg_insn_unit i1, i2;
-    uint64_t pair;
-    intptr_t diff = addr - jmp_addr;
-
-    if (in_range_b(diff)) {
-        i1 = B | (diff & 0x3fffffc);
-        i2 = NOP;
-    } else if (USE_REG_RA) {
-        intptr_t lo, hi;
-        diff = addr - (uintptr_t)tb_ret_addr;
-        lo = (int16_t)diff;
-        hi = (int32_t)(diff - lo);
-        tcg_debug_assert(diff == hi + lo);
-        i1 = ADDIS | TAI(TCG_REG_TMP1, TCG_REG_RA, hi >> 16);
-        i2 = ADDI | TAI(TCG_REG_TMP1, TCG_REG_TMP1, lo);
-    } else {
-        tcg_debug_assert(TCG_TARGET_REG_BITS == 32 || addr == (int32_t)addr);
-        i1 = ADDIS | TAI(TCG_REG_TMP1, 0, addr >> 16);
-        i2 = ORI | SAI(TCG_REG_TMP1, TCG_REG_TMP1, addr);
-    }
+    if (TCG_TARGET_REG_BITS == 64) {
+        tcg_insn_unit i1, i2;
+        intptr_t tb_diff = addr - tc_ptr;
+        intptr_t br_diff = addr - (jmp_addr + 4);
+        uint64_t pair;
+
+        /* This does not exercise the range of the branch, but we do
+           still need to be able to load the new value of TCG_REG_TB.
+           But this does still happen quite often. */
+        if (tb_diff == (int16_t)tb_diff) {
+            i1 = ADDI | TAI(TCG_REG_TB, TCG_REG_TB, tb_diff);
+            i2 = B | (br_diff & 0x3fffffc);
+        } else {
+            intptr_t lo = (int16_t)tb_diff;
+            intptr_t hi = (int32_t)(tb_diff - lo);
+            assert(tb_diff == hi + lo);
+            i1 = ADDIS | TAI(TCG_REG_TB, TCG_REG_TB, hi >> 16);
+            i2 = ADDI | TAI(TCG_REG_TB, TCG_REG_TB, lo);
+        }
 #ifdef HOST_WORDS_BIGENDIAN
-    pair = (uint64_t)i1 << 32 | i2;
+        pair = (uint64_t)i1 << 32 | i2;
 #else
-    pair = (uint64_t)i2 << 32 | i1;
+        pair = (uint64_t)i2 << 32 | i1;
 #endif
 
-    atomic_set((uint64_t *)jmp_addr, pair);
-    flush_icache_range(jmp_addr, jmp_addr + 8);
-}
-#else
-void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_addr,
-                              uintptr_t addr)
-{
-    intptr_t diff = addr - jmp_addr;
-    tcg_debug_assert(in_range_b(diff));
-    atomic_set((uint32_t *)jmp_addr, B | (diff & 0x3fffffc));
-    flush_icache_range(jmp_addr, jmp_addr + 4);
+        atomic_set((uint64_t *)jmp_addr, pair);
+        flush_icache_range(jmp_addr, jmp_addr + 8);
+    } else {
+        intptr_t diff = addr - jmp_addr;
+        tcg_debug_assert(in_range_b(diff));
+        atomic_set((uint32_t *)jmp_addr, B | (diff & 0x3fffffc));
+        flush_icache_range(jmp_addr, jmp_addr + 4);
+    }
 }
-#endif
 
 static void tcg_out_call(TCGContext *s, tcg_insn_unit *target)
 {
@@ -1897,44 +1888,20 @@ static void tcg_target_qemu_prologue(TCGContext *s)
 
 #ifndef CONFIG_SOFTMMU
     if (guest_base) {
-        tcg_out_movi(s, TCG_TYPE_PTR, TCG_GUEST_BASE_REG, guest_base);
+        tcg_out_movi_int(s, TCG_TYPE_PTR, TCG_GUEST_BASE_REG, guest_base, true);
         tcg_regset_set_reg(s->reserved_regs, TCG_GUEST_BASE_REG);
     }
 #endif
 
     tcg_out_mov(s, TCG_TYPE_PTR, TCG_AREG0, tcg_target_call_iarg_regs[0]);
     tcg_out32(s, MTSPR | RS(tcg_target_call_iarg_regs[1]) | CTR);
-
-    if (USE_REG_RA) {
-#ifdef _CALL_AIX
-        /* Make the caller load the value as the TOC into R2. */
-        tb_ret_addr = s->code_ptr + 2;
-        desc[1] = tb_ret_addr;
-        tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_RA, TCG_REG_R2);
-        tcg_out32(s, BCCTR | BO_ALWAYS);
-#elif defined(_CALL_ELF) && _CALL_ELF == 2
-        /* Compute from the incoming R12 value. */
-        tb_ret_addr = s->code_ptr + 2;
-        tcg_out32(s, ADDI | TAI(TCG_REG_RA, TCG_REG_R12,
-                                tcg_ptr_byte_diff(tb_ret_addr, s->code_buf)));
-        tcg_out32(s, BCCTR | BO_ALWAYS);
-#else
-        /* Reserve max 5 insns for the constant load. */
-        tb_ret_addr = s->code_ptr + 6;
-        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_RA, (intptr_t)tb_ret_addr);
-        tcg_out32(s, BCCTR | BO_ALWAYS);
-        while (s->code_ptr < tb_ret_addr) {
-            tcg_out32(s, NOP);
-        }
-#endif
-    } else {
-        tcg_out32(s, BCCTR | BO_ALWAYS);
-        tb_ret_addr = s->code_ptr;
+    if (USE_REG_TB) {
+        tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_TB, tcg_target_call_iarg_regs[1]);
     }
+    tcg_out32(s, BCCTR | BO_ALWAYS);
 
     /* Epilogue */
-    tcg_debug_assert(tb_ret_addr == s->code_ptr);
-    s->code_gen_epilogue = tb_ret_addr;
+    s->code_gen_epilogue = tb_ret_addr = s->code_ptr;
 
     tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_R0, TCG_REG_R1, FRAME_SIZE+LR_OFFSET);
     for (i = 0; i < ARRAY_SIZE(tcg_target_callee_save_regs); ++i) {
@@ -1954,44 +1921,48 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 
     switch (opc) {
     case INDEX_op_exit_tb:
-        if (USE_REG_RA) {
-            ptrdiff_t disp = tcg_pcrel_diff(s, tb_ret_addr);
-
-            /* Use a direct branch if we can, otherwise use the value in RA.
-               Note that the direct branch is always backward, thus we need
-               to account for the possibility of 5 insns from the movi. */
-            if (!in_range_b(disp - 20)) {
-                tcg_out32(s, MTSPR | RS(TCG_REG_RA) | CTR);
-                tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R3, args[0]);
-                tcg_out32(s, BCCTR | BO_ALWAYS);
-                break;
-            }
-        }
         tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R3, args[0]);
         tcg_out_b(s, 0, tb_ret_addr);
         break;
     case INDEX_op_goto_tb:
-        tcg_debug_assert(s->tb_jmp_insn_offset);
-        /* Direct jump. */
-#ifdef __powerpc64__
-        /* Ensure the next insns are 8-byte aligned. */
-        if ((uintptr_t)s->code_ptr & 7) {
-            tcg_out32(s, NOP);
-        }
-        s->tb_jmp_insn_offset[args[0]] = tcg_current_code_size(s);
-        /* To be replaced by either a branch+nop or a load into TMP1. */
-        s->code_ptr += 2;
-        tcg_out32(s, MTSPR | RS(TCG_REG_TMP1) | CTR);
+        if (s->tb_jmp_insn_offset) {
+            /* Direct jump. */
+            if (TCG_TARGET_REG_BITS == 64) {
+                /* Ensure the next insns are 8-byte aligned. */
+                if ((uintptr_t)s->code_ptr & 7) {
+                    tcg_out32(s, NOP);
+                }
+                s->tb_jmp_insn_offset[args[0]] = tcg_current_code_size(s);
+                tcg_out32(s, ADDIS | TAI(TCG_REG_TB, TCG_REG_TB, 0));
+                tcg_out32(s, ADDI | TAI(TCG_REG_TB, TCG_REG_TB, 0));
+            } else {
+                s->tb_jmp_insn_offset[args[0]] = tcg_current_code_size(s);
+                tcg_out32(s, B);
+                s->tb_jmp_reset_offset[args[0]] = tcg_current_code_size(s);
+                break;
+            }
+        } else {
+            /* Indirect jump. */
+            tcg_debug_assert(s->tb_jmp_insn_offset == NULL);
+            tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_TB, 0,
+                       (intptr_t)(s->tb_jmp_insn_offset + args[0]));
+        }
+        tcg_out32(s, MTSPR | RS(TCG_REG_TB) | CTR);
         tcg_out32(s, BCCTR | BO_ALWAYS);
-#else
-        /* To be replaced by a branch. */
-        s->code_ptr++;
-#endif
-        s->tb_jmp_reset_offset[args[0]] = tcg_current_code_size(s);
+        s->tb_jmp_reset_offset[args[0]] = c = tcg_current_code_size(s);
+        if (USE_REG_TB) {
+            /* For the unlinked case, need to reset TCG_REG_TB. */
+            c = -c;
+            assert(c == (int16_t)c);
+            tcg_out32(s, ADDI | TAI(TCG_REG_TB, TCG_REG_TB, c));
+        }
         break;
     case INDEX_op_goto_ptr:
         tcg_out32(s, MTSPR | RS(args[0]) | CTR);
-        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R3, 0);
+        if (USE_REG_TB) {
+            tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_TB, args[0]);
+        }
+        tcg_out32(s, ADDI | TAI(TCG_REG_R3, 0, 0));
         tcg_out32(s, BCCTR | BO_ALWAYS);
         break;
     case INDEX_op_br:
@@ -2761,8 +2732,8 @@ static void tcg_target_init(TCGContext *s)
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_R13); /* thread pointer */
 #endif
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP1); /* mem temp */
-    if (USE_REG_RA) {
-        tcg_regset_set_reg(s->reserved_regs, TCG_REG_RA); /* return addr */
+    if (USE_REG_TB) {
+        tcg_regset_set_reg(s->reserved_regs, TCG_REG_TB); /* tb->tc_ptr */
     }
 }
 
-- 
2.13.5