From: "Emilio G. Cota"
To: qemu-devel@nongnu.org
Cc: Richard Henderson
Date: Wed, 19 Jul 2017 23:09:21 -0400
Message-Id: <1500520169-23367-36-git-send-email-cota@braap.org>
In-Reply-To: <1500520169-23367-1-git-send-email-cota@braap.org>
References: <1500520169-23367-1-git-send-email-cota@braap.org>
Subject: [Qemu-devel] [PATCH v3 35/43] tcg: dynamically allocate optimizer temps

Groundwork for supporting multiple TCG contexts.
While at it, also allocate temps_used directly as a bitmap of the
required size, instead of having a bitmap of TCG_MAX_TEMPS via
TCGTempSet.

Performance-wise we lose about 2% in a translation-heavy workload
such as booting+shutting down debian-arm:

Performance counter stats for 'taskset -c 0 arm-softmmu/qemu-system-arm \
    -machine type=virt -nographic -smp 1 -m 4096 \
    -netdev user,id=unet,hostfwd=tcp::2222-:22 \
    -device virtio-net-device,netdev=unet \
    -drive file=die-on-boot.qcow2,id=myblock,index=0,if=none \
    -device virtio-blk-device,drive=myblock \
    -kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \
    -name arm,debug-threads=on -smp 1' (10 runs):

Before:
   19489.126318 task-clock                #    0.960 CPUs utilized            ( +-  0.96% )
         23,697 context-switches          #    0.001 M/sec                    ( +-  0.51% )
              1 CPU-migrations             #    0.000 M/sec
         19,953 page-faults               #    0.001 M/sec                    ( +-  0.40% )
 56,214,402,410 cycles                    #    2.884 GHz                      ( +-  0.95% ) [83.34%]
 25,516,669,513 stalled-cycles-frontend   #   45.39% frontend cycles idle     ( +-  0.69% ) [83.33%]
 17,266,165,747 stalled-cycles-backend    #   30.71% backend cycles idle      ( +-  0.59% ) [66.66%]
 79,007,843,327 instructions              #    1.41  insns per cycle
                                          #    0.32  stalled cycles per insn  ( +-  1.19% ) [83.34%]
 13,136,600,416 branches                  #  674.048 M/sec                    ( +-  1.29% ) [83.34%]
    274,715,270 branch-misses             #    2.09% of all branches          ( +-  0.79% ) [83.33%]

   20.300335944 seconds time elapsed                                          ( +-  0.55% )

After:
   19917.737030 task-clock                #    0.955 CPUs utilized            ( +-  0.74% )
         23,973 context-switches          #    0.001 M/sec                    ( +-  0.37% )
              1 CPU-migrations             #    0.000 M/sec
         19,824 page-faults               #    0.001 M/sec                    ( +-  0.38% )
 57,380,269,537 cycles                    #    2.881 GHz                      ( +-  0.70% ) [83.34%]
 26,462,452,508 stalled-cycles-frontend   #   46.12% frontend cycles idle     ( +-  0.65% ) [83.34%]
 17,970,546,047 stalled-cycles-backend    #   31.32% backend cycles idle      ( +-  0.64% ) [66.67%]
 79,527,238,334 instructions              #    1.39  insns per cycle
                                          #    0.33  stalled cycles per insn  ( +-  0.79% ) [83.33%]
 13,272,362,192 branches                  #  666.359 M/sec                    ( +-  0.83% ) [83.34%]
    278,357,773 branch-misses             #    2.10% of all branches          ( +-  0.65% ) [83.33%]

   20.850558455 seconds time elapsed                                          ( +-  0.55% )

That is, 2.70% slowdown.

The perf difference shrinks a bit when using a high-performance allocator
such as tcmalloc:

Before:
   19372.008814 task-clock                #    0.957 CPUs utilized            ( +-  1.00% )
         23,621 context-switches          #    0.001 M/sec                    ( +-  0.50% )
              1 CPU-migrations             #    0.000 M/sec
         13,289 page-faults               #    0.001 M/sec                    ( +-  1.46% )
 55,824,272,818 cycles                    #    2.882 GHz                      ( +-  1.00% ) [83.33%]
 25,284,946,453 stalled-cycles-frontend   #   45.29% frontend cycles idle     ( +-  1.12% ) [83.32%]
 17,100,517,753 stalled-cycles-backend    #   30.63% backend cycles idle      ( +-  0.86% ) [66.69%]
 78,193,046,990 instructions              #    1.40  insns per cycle
                                          #    0.32  stalled cycles per insn  ( +-  1.14% ) [83.35%]
 12,986,014,194 branches                  #  670.349 M/sec                    ( +-  1.22% ) [83.34%]
    272,581,789 branch-misses             #    2.10% of all branches          ( +-  0.62% ) [83.33%]

   20.249726404 seconds time elapsed                                          ( +-  0.61% )

After:
   19809.295886 task-clock                #    0.962 CPUs utilized            ( +-  0.99% )
         23,894 context-switches          #    0.001 M/sec                    ( +-  0.50% )
              1 CPU-migrations             #    0.000 M/sec
         12,927 page-faults               #    0.001 M/sec                    ( +-  0.78% )
 57,131,686,004 cycles                    #    2.884 GHz                      ( +-  0.97% ) [83.34%]
 25,965,120,001 stalled-cycles-frontend   #   45.45% frontend cycles idle     ( +-  0.71% ) [83.35%]
 17,534,942,176 stalled-cycles-backend    #   30.69% backend cycles idle      ( +-  0.54% ) [66.68%]
 80,000,003,715 instructions              #    1.40  insns per cycle
                                          #    0.32  stalled cycles per insn  ( +-  1.24% ) [83.34%]
 13,327,272,806 branches                  #  672.779 M/sec                    ( +-  1.31% ) [83.34%]
    273,622,661 branch-misses             #    2.05% of all branches          ( +-  0.95% ) [83.31%]

   20.601366430 seconds time elapsed                                          ( +-  0.60% )

That is, 1.77% slowdown.

Suggested-by: Richard Henderson
Signed-off-by: Emilio G. Cota
Reviewed-by: Richard Henderson
---
 tcg/optimize.c | 307 ++++++++++++++++++++++++++++++----------------------------
 1 file changed, 162 insertions(+), 145 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index adfc56c..b727a4a 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -40,21 +40,18 @@ struct tcg_temp_info {
     tcg_target_ulong mask;
 };
 
-static struct tcg_temp_info temps[TCG_MAX_TEMPS];
-static TCGTempSet temps_used;
-
-static inline bool temp_is_const(TCGArg arg)
+static inline bool temp_is_const(const struct tcg_temp_info *temps, TCGArg arg)
 {
     return temps[arg].is_const;
 }
 
-static inline bool temp_is_copy(TCGArg arg)
+static inline bool temp_is_copy(const struct tcg_temp_info *temps, TCGArg arg)
 {
     return temps[arg].next_copy != arg;
 }
 
 /* Reset TEMP's state, possibly removing the temp for the list of copies.  */
-static void reset_temp(TCGArg temp)
+static void reset_temp(struct tcg_temp_info *temps, TCGArg temp)
 {
     temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
     temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
@@ -64,21 +61,16 @@ static void reset_temp(TCGArg temp)
     temps[temp].mask = -1;
 }
 
-/* Reset all temporaries, given that there are NB_TEMPS of them. */
-static void reset_all_temps(int nb_temps)
-{
-    bitmap_zero(temps_used.l, nb_temps);
-}
-
 /* Initialize and activate a temporary.  */
-static void init_temp_info(TCGArg temp)
+static void init_temp_info(struct tcg_temp_info *temps,
+                           unsigned long *temps_used, TCGArg temp)
 {
-    if (!test_bit(temp, temps_used.l)) {
+    if (!test_bit(temp, temps_used)) {
         temps[temp].next_copy = temp;
         temps[temp].prev_copy = temp;
         temps[temp].is_const = false;
         temps[temp].mask = -1;
-        set_bit(temp, temps_used.l);
+        set_bit(temp, temps_used);
     }
 }
 
@@ -116,7 +108,8 @@ static TCGOpcode op_to_movi(TCGOpcode op)
     }
 }
 
-static TCGArg find_better_copy(TCGContext *s, TCGArg temp)
+static TCGArg find_better_copy(TCGContext *s, const struct tcg_temp_info *temps,
+                               TCGArg temp)
 {
     TCGArg i;
 
@@ -145,7 +138,8 @@ static TCGArg find_better_copy(TCGContext *s, TCGArg temp)
     return temp;
 }
 
-static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
+static bool temps_are_copies(const struct tcg_temp_info *temps, TCGArg arg1,
+                             TCGArg arg2)
 {
     TCGArg i;
 
@@ -153,7 +147,7 @@ static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
         return true;
     }
 
-    if (!temp_is_copy(arg1) || !temp_is_copy(arg2)) {
+    if (!temp_is_copy(temps, arg1) || !temp_is_copy(temps, arg2)) {
         return false;
     }
 
@@ -166,15 +160,15 @@ static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
     return false;
 }
 
-static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg *args,
-                             TCGArg dst, TCGArg val)
+static void tcg_opt_gen_movi(TCGContext *s, struct tcg_temp_info *temps,
+                             TCGOp *op, TCGArg *args, TCGArg dst, TCGArg val)
 {
     TCGOpcode new_op = op_to_movi(op->opc);
     tcg_target_ulong mask;
 
     op->opc = new_op;
 
-    reset_temp(dst);
+    reset_temp(temps, dst);
     temps[dst].is_const = true;
     temps[dst].val = val;
     mask = val;
@@ -188,10 +182,10 @@ static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg *args,
     args[1] = val;
 }
 
-static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
-                            TCGArg dst, TCGArg src)
+static void tcg_opt_gen_mov(TCGContext *s, struct tcg_temp_info *temps,
+                            TCGOp *op, TCGArg *args, TCGArg dst, TCGArg src)
 {
-    if (temps_are_copies(dst, src)) {
+    if (temps_are_copies(temps, dst, src)) {
         tcg_op_remove(s, op);
         return;
     }
@@ -201,7 +195,7 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
 
     op->opc = new_op;
 
-    reset_temp(dst);
+    reset_temp(temps, dst);
     mask = temps[src].mask;
     if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_mov_i32) {
         /* High bits of the destination are now garbage.  */
@@ -463,10 +457,11 @@ static bool do_constant_folding_cond_eq(TCGCond c)
 
 /* Return 2 if the condition can't be simplified, and the result
    of the condition (0 or 1) if it can */
-static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
-                                       TCGArg y, TCGCond c)
+static TCGArg
+do_constant_folding_cond(const struct tcg_temp_info *temps, TCGOpcode op,
+                         TCGArg x, TCGArg y, TCGCond c)
 {
-    if (temp_is_const(x) && temp_is_const(y)) {
+    if (temp_is_const(temps, x) && temp_is_const(temps, y)) {
         switch (op_bits(op)) {
         case 32:
             return do_constant_folding_cond_32(temps[x].val, temps[y].val, c);
@@ -475,9 +470,9 @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
         default:
             tcg_abort();
         }
-    } else if (temps_are_copies(x, y)) {
+    } else if (temps_are_copies(temps, x, y)) {
         return do_constant_folding_cond_eq(c);
-    } else if (temp_is_const(y) && temps[y].val == 0) {
+    } else if (temp_is_const(temps, y) && temps[y].val == 0) {
         switch (c) {
         case TCG_COND_LTU:
             return 0;
@@ -492,15 +487,17 @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
 
 /* Return 2 if the condition can't be simplified, and the result
    of the condition (0 or 1) if it can */
-static TCGArg do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
+static TCGArg
+do_constant_folding_cond2(const struct tcg_temp_info *temps, TCGArg *p1,
+                          TCGArg *p2, TCGCond c)
 {
     TCGArg al = p1[0], ah = p1[1];
     TCGArg bl = p2[0], bh = p2[1];
 
-    if (temp_is_const(bl) && temp_is_const(bh)) {
+    if (temp_is_const(temps, bl) && temp_is_const(temps, bh)) {
         uint64_t b = ((uint64_t)temps[bh].val << 32) | (uint32_t)temps[bl].val;
 
-        if (temp_is_const(al) && temp_is_const(ah)) {
+        if (temp_is_const(temps, al) && temp_is_const(temps, ah)) {
             uint64_t a;
             a = ((uint64_t)temps[ah].val << 32) | (uint32_t)temps[al].val;
             return do_constant_folding_cond_64(a, b, c);
@@ -516,18 +513,19 @@ static TCGArg do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
             }
         }
     }
-    if (temps_are_copies(al, bl) && temps_are_copies(ah, bh)) {
+    if (temps_are_copies(temps, al, bl) && temps_are_copies(temps, ah, bh)) {
        return do_constant_folding_cond_eq(c);
    }
    return 2;
 }
 
-static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
+static bool swap_commutative(const struct tcg_temp_info *temps, TCGArg dest,
+                             TCGArg *p1, TCGArg *p2)
 {
     TCGArg a1 = *p1, a2 = *p2;
     int sum = 0;
-    sum += temp_is_const(a1);
-    sum -= temp_is_const(a2);
+    sum += temp_is_const(temps, a1);
+    sum -= temp_is_const(temps, a2);
 
     /* Prefer the constant in second argument, and then the form
        op a, a, b, which is better handled on non-RISC hosts. */
@@ -539,13 +537,14 @@ static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
     return false;
 }
 
-static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
+static bool swap_commutative2(const struct tcg_temp_info *temps, TCGArg *p1,
+                              TCGArg *p2)
 {
     int sum = 0;
-    sum += temp_is_const(p1[0]);
-    sum += temp_is_const(p1[1]);
-    sum -= temp_is_const(p2[0]);
-    sum -= temp_is_const(p2[1]);
+    sum += temp_is_const(temps, p1[0]);
+    sum += temp_is_const(temps, p1[1]);
+    sum -= temp_is_const(temps, p2[0]);
+    sum -= temp_is_const(temps, p2[1]);
     if (sum > 0) {
         TCGArg t;
         t = p1[0], p1[0] = p2[0], p2[0] = t;
@@ -558,6 +557,8 @@ static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
 /* Propagate constants and copies, fold constant expressions. */
 void tcg_optimize(TCGContext *s)
 {
+    struct tcg_temp_info *temps;
+    unsigned long *temps_used;
     int oi, oi_next, nb_temps, nb_globals;
     TCGArg *prev_mb_args = NULL;
 
@@ -568,7 +569,8 @@ void tcg_optimize(TCGContext *s)
 
     nb_temps = s->nb_temps;
     nb_globals = s->nb_globals;
-    reset_all_temps(nb_temps);
+    temps = g_new(struct tcg_temp_info, nb_temps);
+    temps_used = bitmap_new(nb_temps);
 
     for (oi = s->gen_op_buf[0].next; oi != 0; oi = oi_next) {
         tcg_target_ulong mask, partmask, affected;
@@ -590,21 +592,21 @@ void tcg_optimize(TCGContext *s)
             for (i = 0; i < nb_oargs + nb_iargs; i++) {
                 tmp = args[i];
                 if (tmp != TCG_CALL_DUMMY_ARG) {
-                    init_temp_info(tmp);
+                    init_temp_info(temps, temps_used, tmp);
                 }
             }
         } else {
             nb_oargs = def->nb_oargs;
             nb_iargs = def->nb_iargs;
             for (i = 0; i < nb_oargs + nb_iargs; i++) {
-                init_temp_info(args[i]);
+                init_temp_info(temps, temps_used, args[i]);
             }
         }
 
         /* Do copy propagation */
         for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
-            if (temp_is_copy(args[i])) {
-                args[i] = find_better_copy(s, args[i]);
+            if (temp_is_copy(temps, args[i])) {
+                args[i] = find_better_copy(s, temps, args[i]);
             }
         }
 
@@ -620,44 +622,44 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(nor):
         CASE_OP_32_64(muluh):
         CASE_OP_32_64(mulsh):
-            swap_commutative(args[0], &args[1], &args[2]);
+            swap_commutative(temps, args[0], &args[1], &args[2]);
             break;
         CASE_OP_32_64(brcond):
-            if (swap_commutative(-1, &args[0], &args[1])) {
+            if (swap_commutative(temps, -1, &args[0], &args[1])) {
                 args[2] = tcg_swap_cond(args[2]);
             }
             break;
         CASE_OP_32_64(setcond):
-            if (swap_commutative(args[0], &args[1], &args[2])) {
+            if (swap_commutative(temps, args[0], &args[1], &args[2])) {
                 args[3] = tcg_swap_cond(args[3]);
             }
             break;
         CASE_OP_32_64(movcond):
-            if (swap_commutative(-1, &args[1], &args[2])) {
+            if (swap_commutative(temps, -1, &args[1], &args[2])) {
                 args[5] = tcg_swap_cond(args[5]);
             }
             /* For movcond, we canonicalize the "false" input reg to match
                the destination reg so that the tcg backend can implement
               a "move if true" operation.  */
-            if (swap_commutative(args[0], &args[4], &args[3])) {
+            if (swap_commutative(temps, args[0], &args[4], &args[3])) {
                 args[5] = tcg_invert_cond(args[5]);
             }
             break;
         CASE_OP_32_64(add2):
-            swap_commutative(args[0], &args[2], &args[4]);
-            swap_commutative(args[1], &args[3], &args[5]);
+            swap_commutative(temps, args[0], &args[2], &args[4]);
+            swap_commutative(temps, args[1], &args[3], &args[5]);
             break;
         CASE_OP_32_64(mulu2):
         CASE_OP_32_64(muls2):
-            swap_commutative(args[0], &args[2], &args[3]);
+            swap_commutative(temps, args[0], &args[2], &args[3]);
             break;
         case INDEX_op_brcond2_i32:
-            if (swap_commutative2(&args[0], &args[2])) {
+            if (swap_commutative2(temps, &args[0], &args[2])) {
                 args[4] = tcg_swap_cond(args[4]);
             }
             break;
         case INDEX_op_setcond2_i32:
-            if (swap_commutative2(&args[1], &args[3])) {
+            if (swap_commutative2(temps, &args[1], &args[3])) {
                 args[5] = tcg_swap_cond(args[5]);
             }
             break;
@@ -673,8 +675,8 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(sar):
         CASE_OP_32_64(rotl):
         CASE_OP_32_64(rotr):
-            if (temp_is_const(args[1]) && temps[args[1]].val == 0) {
-                tcg_opt_gen_movi(s, op, args, args[0], 0);
+            if (temp_is_const(temps, args[1]) && temps[args[1]].val == 0) {
+                tcg_opt_gen_movi(s, temps, op, args, args[0], 0);
                 continue;
             }
             break;
@@ -683,7 +685,7 @@ void tcg_optimize(TCGContext *s)
                 TCGOpcode neg_op;
                 bool have_neg;
 
-                if (temp_is_const(args[2])) {
+                if (temp_is_const(temps, args[2])) {
                     /* Proceed with possible constant folding. */
                     break;
                 }
@@ -697,9 +699,9 @@ void tcg_optimize(TCGContext *s)
                 if (!have_neg) {
                     break;
                 }
-                if (temp_is_const(args[1]) && temps[args[1]].val == 0) {
+                if (temp_is_const(temps, args[1]) && temps[args[1]].val == 0) {
                     op->opc = neg_op;
-                    reset_temp(args[0]);
+                    reset_temp(temps, args[0]);
                     args[1] = args[2];
                     continue;
                 }
@@ -707,30 +709,30 @@ void tcg_optimize(TCGContext *s)
             break;
         CASE_OP_32_64(xor):
         CASE_OP_32_64(nand):
-            if (!temp_is_const(args[1])
-                && temp_is_const(args[2]) && temps[args[2]].val == -1) {
+            if (!temp_is_const(temps, args[1])
+                && temp_is_const(temps, args[2]) && temps[args[2]].val == -1) {
                 i = 1;
                 goto try_not;
             }
             break;
         CASE_OP_32_64(nor):
-            if (!temp_is_const(args[1])
-                && temp_is_const(args[2]) && temps[args[2]].val == 0) {
+            if (!temp_is_const(temps, args[1])
+                && temp_is_const(temps, args[2]) && temps[args[2]].val == 0) {
                 i = 1;
                 goto try_not;
             }
             break;
         CASE_OP_32_64(andc):
-            if (!temp_is_const(args[2])
-                && temp_is_const(args[1]) && temps[args[1]].val == -1) {
+            if (!temp_is_const(temps, args[2])
+                && temp_is_const(temps, args[1]) && temps[args[1]].val == -1) {
                 i = 2;
                 goto try_not;
             }
             break;
         CASE_OP_32_64(orc):
         CASE_OP_32_64(eqv):
-            if (!temp_is_const(args[2])
-                && temp_is_const(args[1]) && temps[args[1]].val == 0) {
+            if (!temp_is_const(temps, args[2])
+                && temp_is_const(temps, args[1]) && temps[args[1]].val == 0) {
                 i = 2;
                 goto try_not;
             }
@@ -751,7 +753,7 @@ void tcg_optimize(TCGContext *s)
                 break;
             }
             op->opc = not_op;
-            reset_temp(args[0]);
+            reset_temp(temps, args[0]);
             args[1] = args[i];
             continue;
         }
@@ -771,18 +773,18 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(or):
         CASE_OP_32_64(xor):
         CASE_OP_32_64(andc):
-            if (!temp_is_const(args[1])
-                && temp_is_const(args[2]) && temps[args[2]].val == 0) {
-                tcg_opt_gen_mov(s, op, args, args[0], args[1]);
+            if (!temp_is_const(temps, args[1])
+                && temp_is_const(temps, args[2]) && temps[args[2]].val == 0) {
+                tcg_opt_gen_mov(s, temps, op, args, args[0], args[1]);
                 continue;
             }
             break;
         CASE_OP_32_64(and):
         CASE_OP_32_64(orc):
         CASE_OP_32_64(eqv):
-            if (!temp_is_const(args[1])
-                && temp_is_const(args[2]) && temps[args[2]].val == -1) {
-                tcg_opt_gen_mov(s, op, args, args[0], args[1]);
+            if (!temp_is_const(temps, args[1])
+                && temp_is_const(temps, args[2]) && temps[args[2]].val == -1) {
+                tcg_opt_gen_mov(s, temps, op, args, args[0], args[1]);
                 continue;
             }
             break;
@@ -819,7 +821,7 @@ void tcg_optimize(TCGContext *s)
 
         CASE_OP_32_64(and):
             mask = temps[args[2]].mask;
-            if (temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[2])) {
         and_const:
                 affected = temps[args[1]].mask & ~mask;
             }
@@ -838,7 +840,7 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(andc):
             /* Known-zeros does not imply known-ones.  Therefore unless
                args[2] is constant, we can't infer anything from it. */
-            if (temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[2])) {
                 mask = ~temps[args[2]].mask;
                 goto and_const;
             }
@@ -847,26 +849,26 @@ void tcg_optimize(TCGContext *s)
             break;
 
         case INDEX_op_sar_i32:
-            if (temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[2])) {
                 tmp = temps[args[2]].val & 31;
                 mask = (int32_t)temps[args[1]].mask >> tmp;
             }
             break;
         case INDEX_op_sar_i64:
-            if (temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[2])) {
                 tmp = temps[args[2]].val & 63;
                 mask = (int64_t)temps[args[1]].mask >> tmp;
             }
             break;
 
         case INDEX_op_shr_i32:
-            if (temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[2])) {
                 tmp = temps[args[2]].val & 31;
                 mask = (uint32_t)temps[args[1]].mask >> tmp;
             }
             break;
         case INDEX_op_shr_i64:
-            if (temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[2])) {
                 tmp = temps[args[2]].val & 63;
                 mask = (uint64_t)temps[args[1]].mask >> tmp;
             }
@@ -880,7 +882,7 @@ void tcg_optimize(TCGContext *s)
             break;
 
         CASE_OP_32_64(shl):
-            if (temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[2])) {
                 tmp = temps[args[2]].val & (TCG_TARGET_REG_BITS - 1);
                 mask = temps[args[1]].mask << tmp;
             }
@@ -976,12 +978,12 @@ void tcg_optimize(TCGContext *s)
 
         if (partmask == 0) {
             tcg_debug_assert(nb_oargs == 1);
-            tcg_opt_gen_movi(s, op, args, args[0], 0);
+            tcg_opt_gen_movi(s, temps, op, args, args[0], 0);
             continue;
         }
         if (affected == 0) {
             tcg_debug_assert(nb_oargs == 1);
-            tcg_opt_gen_mov(s, op, args, args[0], args[1]);
+            tcg_opt_gen_mov(s, temps, op, args, args[0], args[1]);
             continue;
         }
 
@@ -991,8 +993,8 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(mul):
         CASE_OP_32_64(muluh):
         CASE_OP_32_64(mulsh):
-            if ((temp_is_const(args[2]) && temps[args[2]].val == 0)) {
-                tcg_opt_gen_movi(s, op, args, args[0], 0);
+            if ((temp_is_const(temps, args[2]) && temps[args[2]].val == 0)) {
+                tcg_opt_gen_movi(s, temps, op, args, args[0], 0);
                 continue;
             }
             break;
@@ -1004,8 +1006,8 @@ void tcg_optimize(TCGContext *s)
         switch (opc) {
         CASE_OP_32_64(or):
         CASE_OP_32_64(and):
-            if (temps_are_copies(args[1], args[2])) {
-                tcg_opt_gen_mov(s, op, args, args[0], args[1]);
+            if (temps_are_copies(temps, args[1], args[2])) {
+                tcg_opt_gen_mov(s, temps, op, args, args[0], args[1]);
                 continue;
             }
             break;
@@ -1018,8 +1020,8 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(andc):
         CASE_OP_32_64(sub):
         CASE_OP_32_64(xor):
-            if (temps_are_copies(args[1], args[2])) {
-                tcg_opt_gen_movi(s, op, args, args[0], 0);
+            if (temps_are_copies(temps, args[1], args[2])) {
+                tcg_opt_gen_movi(s, temps, op, args, args[0], 0);
                 continue;
             }
             break;
@@ -1032,10 +1034,10 @@ void tcg_optimize(TCGContext *s)
            allocator where needed and possible.  Also detect copies. */
         switch (opc) {
         CASE_OP_32_64(mov):
-            tcg_opt_gen_mov(s, op, args, args[0], args[1]);
+            tcg_opt_gen_mov(s, temps, op, args, args[0], args[1]);
             break;
         CASE_OP_32_64(movi):
-            tcg_opt_gen_movi(s, op, args, args[0], args[1]);
+            tcg_opt_gen_movi(s, temps, op, args, args[0], args[1]);
             break;
 
         CASE_OP_32_64(not):
@@ -1051,9 +1053,9 @@ void tcg_optimize(TCGContext *s)
         case INDEX_op_extu_i32_i64:
         case INDEX_op_extrl_i64_i32:
         case INDEX_op_extrh_i64_i32:
-            if (temp_is_const(args[1])) {
+            if (temp_is_const(temps, args[1])) {
                 tmp = do_constant_folding(opc, temps[args[1]].val, 0);
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                tcg_opt_gen_movi(s, temps, op, args, args[0], tmp);
                 break;
             }
             goto do_default;
@@ -1080,66 +1082,70 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(divu):
         CASE_OP_32_64(rem):
         CASE_OP_32_64(remu):
-            if (temp_is_const(args[1]) && temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[1]) &&
+                temp_is_const(temps, args[2])) {
                 tmp = do_constant_folding(opc, temps[args[1]].val,
                                           temps[args[2]].val);
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                tcg_opt_gen_movi(s, temps, op, args, args[0], tmp);
                 break;
             }
             goto do_default;
 
         CASE_OP_32_64(clz):
         CASE_OP_32_64(ctz):
-            if (temp_is_const(args[1])) {
+            if (temp_is_const(temps, args[1])) {
                 TCGArg v = temps[args[1]].val;
                 if (v != 0) {
                     tmp = do_constant_folding(opc, v, 0);
-                    tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                    tcg_opt_gen_movi(s, temps, op, args, args[0], tmp);
                 } else {
-                    tcg_opt_gen_mov(s, op, args, args[0], args[2]);
+                    tcg_opt_gen_mov(s, temps, op, args, args[0], args[2]);
                 }
                 break;
             }
             goto do_default;
 
         CASE_OP_32_64(deposit):
-            if (temp_is_const(args[1]) && temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[1]) &&
+                temp_is_const(temps, args[2])) {
                 tmp = deposit64(temps[args[1]].val, args[3], args[4],
                                 temps[args[2]].val);
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                tcg_opt_gen_movi(s, temps, op, args, args[0], tmp);
                 break;
             }
             goto do_default;
 
         CASE_OP_32_64(extract):
-            if (temp_is_const(args[1])) {
+            if (temp_is_const(temps, args[1])) {
                 tmp = extract64(temps[args[1]].val, args[2], args[3]);
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                tcg_opt_gen_movi(s, temps, op, args, args[0], tmp);
                 break;
             }
             goto do_default;
 
         CASE_OP_32_64(sextract):
-            if (temp_is_const(args[1])) {
+            if (temp_is_const(temps, args[1])) {
                 tmp = sextract64(temps[args[1]].val, args[2], args[3]);
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                tcg_opt_gen_movi(s, temps, op, args, args[0], tmp);
                 break;
             }
             goto do_default;
 
         CASE_OP_32_64(setcond):
-            tmp = do_constant_folding_cond(opc, args[1], args[2], args[3]);
+            tmp = do_constant_folding_cond(temps, opc, args[1], args[2],
+                                           args[3]);
             if (tmp != 2) {
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                tcg_opt_gen_movi(s, temps, op, args, args[0], tmp);
                 break;
             }
             goto do_default;
 
         CASE_OP_32_64(brcond):
-            tmp = do_constant_folding_cond(opc, args[0], args[1], args[2]);
+            tmp = do_constant_folding_cond(temps, opc, args[0], args[1],
+                                           args[2]);
             if (tmp != 2) {
                 if (tmp) {
-                    reset_all_temps(nb_temps);
+                    bitmap_zero(temps_used, nb_temps);
                     op->opc = INDEX_op_br;
                     args[0] = args[3];
                 } else {
@@ -1150,12 +1156,14 @@ void tcg_optimize(TCGContext *s)
             goto do_default;
 
         CASE_OP_32_64(movcond):
-            tmp = do_constant_folding_cond(opc, args[1], args[2], args[5]);
+            tmp = do_constant_folding_cond(temps, opc, args[1], args[2],
+                                           args[5]);
             if (tmp != 2) {
-                tcg_opt_gen_mov(s, op, args, args[0], args[4-tmp]);
+                tcg_opt_gen_mov(s, temps, op, args, args[0], args[4 - tmp]);
                 break;
             }
-            if (temp_is_const(args[3]) && temp_is_const(args[4])) {
+            if (temp_is_const(temps, args[3]) &&
+                temp_is_const(temps, args[4])) {
                 tcg_target_ulong tv = temps[args[3]].val;
                 tcg_target_ulong fv = temps[args[4]].val;
                 TCGCond cond = args[5];
@@ -1174,8 +1182,10 @@ void tcg_optimize(TCGContext *s)
 
         case INDEX_op_add2_i32:
         case INDEX_op_sub2_i32:
-            if (temp_is_const(args[2]) && temp_is_const(args[3])
-                && temp_is_const(args[4]) && temp_is_const(args[5])) {
+            if (temp_is_const(temps, args[2]) &&
+                temp_is_const(temps, args[3]) &&
+                temp_is_const(temps, args[4]) &&
+                temp_is_const(temps, args[5])) {
                 uint32_t al = temps[args[2]].val;
                 uint32_t ah = temps[args[3]].val;
                 uint32_t bl = temps[args[4]].val;
@@ -1194,8 +1204,8 @@ void tcg_optimize(TCGContext *s)
 
                 rl = args[0];
                 rh = args[1];
-                tcg_opt_gen_movi(s, op, args, rl, (int32_t)a);
-                tcg_opt_gen_movi(s, op2, args2, rh, (int32_t)(a >> 32));
+                tcg_opt_gen_movi(s, temps, op, args, rl, (int32_t)a);
+                tcg_opt_gen_movi(s, temps, op2, args2, rh, (int32_t)(a >> 32));
 
                 /* We've done all we need to do with the movi.  Skip it.  */
                 oi_next = op2->next;
@@ -1204,7 +1214,8 @@ void tcg_optimize(TCGContext *s)
             goto do_default;
 
         case INDEX_op_mulu2_i32:
-            if (temp_is_const(args[2]) && temp_is_const(args[3])) {
+            if (temp_is_const(temps, args[2]) &&
+                temp_is_const(temps, args[3])) {
                 uint32_t a = temps[args[2]].val;
                 uint32_t b = temps[args[3]].val;
                 uint64_t r = (uint64_t)a * b;
@@ -1214,8 +1225,8 @@ void tcg_optimize(TCGContext *s)
 
                 rl = args[0];
                 rh = args[1];
-                tcg_opt_gen_movi(s, op, args, rl, (int32_t)r);
-                tcg_opt_gen_movi(s, op2, args2, rh, (int32_t)(r >> 32));
+                tcg_opt_gen_movi(s, temps, op, args, rl, (int32_t)r);
+                tcg_opt_gen_movi(s, temps, op2, args2, rh, (int32_t)(r >> 32));
 
                 /* We've done all we need to do with the movi.  Skip it.  */
                 oi_next = op2->next;
@@ -1224,11 +1235,11 @@ void tcg_optimize(TCGContext *s)
             goto do_default;
 
         case INDEX_op_brcond2_i32:
-            tmp = do_constant_folding_cond2(&args[0], &args[2], args[4]);
+            tmp = do_constant_folding_cond2(temps, &args[0], &args[2], args[4]);
             if (tmp != 2) {
                 if (tmp) {
         do_brcond_true:
-                    reset_all_temps(nb_temps);
+                    bitmap_zero(temps_used, nb_temps);
                     op->opc = INDEX_op_br;
                     args[0] = args[5];
                 } else {
@@ -1236,12 +1247,14 @@ void tcg_optimize(TCGContext *s)
                     tcg_op_remove(s, op);
                 }
             } else if ((args[4] == TCG_COND_LT || args[4] == TCG_COND_GE)
-                       && temp_is_const(args[2]) && temps[args[2]].val == 0
-                       && temp_is_const(args[3]) && temps[args[3]].val == 0) {
+                       && temp_is_const(temps, args[2])
+                       && temps[args[2]].val == 0
+                       && temp_is_const(temps, args[3])
+                       && temps[args[3]].val == 0) {
                 /* Simplify LT/GE comparisons vs zero to a single compare
                    vs the high word of the input.  */
         do_brcond_high:
-                reset_all_temps(nb_temps);
+                bitmap_zero(temps_used, nb_temps);
                 op->opc = INDEX_op_brcond_i32;
                 args[0] = args[1];
                 args[1] = args[3];
@@ -1250,14 +1263,14 @@ void tcg_optimize(TCGContext *s)
             } else if (args[4] == TCG_COND_EQ) {
                 /* Simplify EQ comparisons where one of the pairs
                    can be simplified.  */
-                tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
+                tmp = do_constant_folding_cond(temps, INDEX_op_brcond_i32,
                                                args[0], args[2], TCG_COND_EQ);
                 if (tmp == 0) {
                     goto do_brcond_false;
                 } else if (tmp == 1) {
                     goto do_brcond_high;
                 }
-                tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
+                tmp = do_constant_folding_cond(temps, INDEX_op_brcond_i32,
                                                args[1], args[3], TCG_COND_EQ);
                 if (tmp == 0) {
                     goto do_brcond_false;
@@ -1265,7 +1278,7 @@ void tcg_optimize(TCGContext *s)
                     goto do_default;
                 }
         do_brcond_low:
-                reset_all_temps(nb_temps);
+                bitmap_zero(temps_used, nb_temps);
                 op->opc = INDEX_op_brcond_i32;
                 args[1] = args[2];
                 args[2] = args[4];
@@ -1273,14 +1286,14 @@ void tcg_optimize(TCGContext *s)
             } else if (args[4] == TCG_COND_NE) {
                 /* Simplify NE comparisons where one of the pairs
                    can be simplified.  */
-                tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
+                tmp = do_constant_folding_cond(temps, INDEX_op_brcond_i32,
                                                args[0], args[2], TCG_COND_NE);
                 if (tmp == 0) {
                     goto do_brcond_high;
                 } else if (tmp == 1) {
                     goto do_brcond_true;
                 }
-                tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
+                tmp = do_constant_folding_cond(temps, INDEX_op_brcond_i32,
                                                args[1], args[3], TCG_COND_NE);
                 if (tmp == 0) {
                     goto do_brcond_low;
@@ -1294,17 +1307,19 @@ void tcg_optimize(TCGContext *s)
             break;
 
         case INDEX_op_setcond2_i32:
-            tmp = do_constant_folding_cond2(&args[1], &args[3], args[5]);
+            tmp = do_constant_folding_cond2(temps, &args[1], &args[3], args[5]);
             if (tmp != 2) {
         do_setcond_const:
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                tcg_opt_gen_movi(s, temps, op, args, args[0], tmp);
             } else if ((args[5] == TCG_COND_LT || args[5] == TCG_COND_GE)
-                       && temp_is_const(args[3]) && temps[args[3]].val == 0
-                       && temp_is_const(args[4]) && temps[args[4]].val == 0) {
+                       && temp_is_const(temps, args[3])
+                       && temps[args[3]].val == 0
+                       && temp_is_const(temps, args[4])
+                       && temps[args[4]].val == 0) {
                 /* Simplify LT/GE comparisons vs zero to a single compare
                    vs the high word of the input.  */
         do_setcond_high:
-                reset_temp(args[0]);
+                reset_temp(temps, args[0]);
                 temps[args[0]].mask = 1;
                 op->opc = INDEX_op_setcond_i32;
                 args[1] = args[2];
@@ -1313,14 +1328,14 @@ void tcg_optimize(TCGContext *s)
             } else if (args[5] == TCG_COND_EQ) {
                 /* Simplify EQ comparisons where one of the pairs
                    can be simplified.  */
-                tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
+                tmp = do_constant_folding_cond(temps, INDEX_op_setcond_i32,
                                                args[1], args[3], TCG_COND_EQ);
                 if (tmp == 0) {
                     goto do_setcond_const;
                 } else if (tmp == 1) {
                     goto do_setcond_high;
                 }
-                tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
+                tmp = do_constant_folding_cond(temps, INDEX_op_setcond_i32,
                                                args[2], args[4], TCG_COND_EQ);
                 if (tmp == 0) {
                     goto do_setcond_high;
@@ -1328,7 +1343,7 @@ void tcg_optimize(TCGContext *s)
                     goto do_default;
                 }
         do_setcond_low:
-                reset_temp(args[0]);
+                reset_temp(temps, args[0]);
                 temps[args[0]].mask = 1;
                 op->opc = INDEX_op_setcond_i32;
                 args[2] = args[3];
@@ -1336,14 +1351,14 @@ void tcg_optimize(TCGContext *s)
             } else if (args[5] == TCG_COND_NE) {
                 /* Simplify NE comparisons where one of the pairs
                    can be simplified.  */
-                tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
+                tmp = do_constant_folding_cond(temps, INDEX_op_setcond_i32,
                                                args[1], args[3], TCG_COND_NE);
                 if (tmp == 0) {
                     goto do_setcond_high;
                 } else if (tmp == 1) {
                     goto do_setcond_const;
                 }
-                tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
+                tmp = do_constant_folding_cond(temps, INDEX_op_setcond_i32,
                                                args[2], args[4], TCG_COND_NE);
                 if (tmp == 0) {
                     goto do_setcond_low;
@@ -1360,8 +1375,8 @@ void tcg_optimize(TCGContext *s)
             if (!(args[nb_oargs + nb_iargs + 1]
                   & (TCG_CALL_NO_READ_GLOBALS | TCG_CALL_NO_WRITE_GLOBALS))) {
                 for (i = 0; i < nb_globals; i++) {
-                    if (test_bit(i, temps_used.l)) {
-                        reset_temp(i);
+                    if (test_bit(i, temps_used)) {
+                        reset_temp(temps, i);
                     }
                 }
             }
@@ -1375,11 +1390,11 @@ void tcg_optimize(TCGContext *s)
                block, otherwise we only trash the output args.  "mask" is
               the non-zero bits mask for the first output arg.  */
             if (def->flags & TCG_OPF_BB_END) {
-                reset_all_temps(nb_temps);
+                bitmap_zero(temps_used, nb_temps);
             } else {
         do_reset_output:
                 for (i = 0; i < nb_oargs; i++) {
-                    reset_temp(args[i]);
+                    reset_temp(temps, args[i]);
                     /* Save the corresponding known-zero bits mask for the
                        first output argument (only one supported so far). */
                     if (i == 0) {
@@ -1428,4 +1443,6 @@ void tcg_optimize(TCGContext *s)
             prev_mb_args = args;
         }
     }
+    g_free(temps);
+    g_free(temps_used);
 }
-- 
2.7.4