From: "Emilio G. Cota"
To: qemu-devel@nongnu.org
Cc: Richard Henderson
Date: Wed, 19 Jul 2017 23:09:21 -0400
Message-Id: <1500520169-23367-36-git-send-email-cota@braap.org>
In-Reply-To: <1500520169-23367-1-git-send-email-cota@braap.org>
References: <1500520169-23367-1-git-send-email-cota@braap.org>
Subject: [Qemu-devel] [PATCH v3 35/43] tcg: dynamically allocate optimizer temps

Groundwork for supporting multiple TCG contexts.
While at it, also allocate temps_used directly as a bitmap of the
required size, instead of having a bitmap of TCG_MAX_TEMPS via
TCGTempSet.

Performance-wise we lose about 2% in a translation-heavy workload
such as booting+shutting down debian-arm:

Performance counter stats for 'taskset -c 0 arm-softmmu/qemu-system-arm \
    -machine type=virt -nographic -smp 1 -m 4096 \
    -netdev user,id=unet,hostfwd=tcp::2222-:22 \
    -device virtio-net-device,netdev=unet \
    -drive file=die-on-boot.qcow2,id=myblock,index=0,if=none \
    -device virtio-blk-device,drive=myblock \
    -kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \
    -name arm,debug-threads=on -smp 1' (10 runs):

Before:
   19489.126318 task-clock                #    0.960 CPUs utilized            ( +-  0.96% )
         23,697 context-switches          #    0.001 M/sec                    ( +-  0.51% )
              1 CPU-migrations             #    0.000 M/sec
         19,953 page-faults               #    0.001 M/sec                    ( +-  0.40% )
 56,214,402,410 cycles                    #    2.884 GHz                      ( +-  0.95% ) [83.34%]
 25,516,669,513 stalled-cycles-frontend   #   45.39% frontend cycles idle     ( +-  0.69% ) [83.33%]
 17,266,165,747 stalled-cycles-backend    #   30.71% backend cycles idle      ( +-  0.59% ) [66.66%]
 79,007,843,327 instructions              #    1.41  insns per cycle
                                          #    0.32  stalled cycles per insn  ( +-  1.19% ) [83.34%]
 13,136,600,416 branches                  #  674.048 M/sec                    ( +-  1.29% ) [83.34%]
    274,715,270 branch-misses             #    2.09% of all branches          ( +-  0.79% ) [83.33%]

   20.300335944 seconds time elapsed                                          ( +-  0.55% )

After:
   19917.737030 task-clock                #    0.955 CPUs utilized            ( +-  0.74% )
         23,973 context-switches          #    0.001 M/sec                    ( +-  0.37% )
              1 CPU-migrations             #    0.000 M/sec
         19,824 page-faults               #    0.001 M/sec                    ( +-  0.38% )
 57,380,269,537 cycles                    #    2.881 GHz                      ( +-  0.70% ) [83.34%]
 26,462,452,508 stalled-cycles-frontend   #   46.12% frontend cycles idle     ( +-  0.65% ) [83.34%]
 17,970,546,047 stalled-cycles-backend    #   31.32% backend cycles idle      ( +-  0.64% ) [66.67%]
 79,527,238,334 instructions              #    1.39  insns per cycle
                                          #    0.33  stalled cycles per insn  ( +-  0.79% ) [83.33%]
 13,272,362,192 branches                  #  666.359 M/sec                    ( +-  0.83% ) [83.34%]
    278,357,773 branch-misses             #    2.10% of all branches          ( +-  0.65% ) [83.33%]

   20.850558455 seconds time elapsed                                          ( +-  0.55% )

That is, 2.70% slowdown.

The perf difference shrinks a bit when using a high-performance allocator
such as tcmalloc:

Before:
   19372.008814 task-clock                #    0.957 CPUs utilized            ( +-  1.00% )
         23,621 context-switches          #    0.001 M/sec                    ( +-  0.50% )
              1 CPU-migrations             #    0.000 M/sec
         13,289 page-faults               #    0.001 M/sec                    ( +-  1.46% )
 55,824,272,818 cycles                    #    2.882 GHz                      ( +-  1.00% ) [83.33%]
 25,284,946,453 stalled-cycles-frontend   #   45.29% frontend cycles idle     ( +-  1.12% ) [83.32%]
 17,100,517,753 stalled-cycles-backend    #   30.63% backend cycles idle      ( +-  0.86% ) [66.69%]
 78,193,046,990 instructions              #    1.40  insns per cycle
                                          #    0.32  stalled cycles per insn  ( +-  1.14% ) [83.35%]
 12,986,014,194 branches                  #  670.349 M/sec                    ( +-  1.22% ) [83.34%]
    272,581,789 branch-misses             #    2.10% of all branches          ( +-  0.62% ) [83.33%]

   20.249726404 seconds time elapsed                                          ( +-  0.61% )

After:
   19809.295886 task-clock                #    0.962 CPUs utilized            ( +-  0.99% )
         23,894 context-switches          #    0.001 M/sec                    ( +-  0.50% )
              1 CPU-migrations             #    0.000 M/sec
         12,927 page-faults               #    0.001 M/sec                    ( +-  0.78% )
 57,131,686,004 cycles                    #    2.884 GHz                      ( +-  0.97% ) [83.34%]
 25,965,120,001 stalled-cycles-frontend   #   45.45% frontend cycles idle     ( +-  0.71% ) [83.35%]
 17,534,942,176 stalled-cycles-backend    #   30.69% backend cycles idle      ( +-  0.54% ) [66.68%]
 80,000,003,715 instructions              #    1.40  insns per cycle
                                          #    0.32  stalled cycles per insn  ( +-  1.24% ) [83.34%]
 13,327,272,806 branches                  #  672.779 M/sec                    ( +-  1.31% ) [83.34%]
    273,622,661 branch-misses             #    2.05% of all branches          ( +-  0.95% ) [83.31%]

   20.601366430 seconds time elapsed                                          ( +-  0.60% )

That is, 1.77% slowdown.

Suggested-by: Richard Henderson
Signed-off-by: Emilio G. Cota
Reviewed-by: Richard Henderson
---
 tcg/optimize.c | 307 ++++++++++++++++++++++++++++++----------------------------
 1 file changed, 162 insertions(+), 145 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index adfc56c..b727a4a 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -40,21 +40,18 @@ struct tcg_temp_info {
     tcg_target_ulong mask;
 };
 
-static struct tcg_temp_info temps[TCG_MAX_TEMPS];
-static TCGTempSet temps_used;
-
-static inline bool temp_is_const(TCGArg arg)
+static inline bool temp_is_const(const struct tcg_temp_info *temps, TCGArg arg)
 {
     return temps[arg].is_const;
 }
 
-static inline bool temp_is_copy(TCGArg arg)
+static inline bool temp_is_copy(const struct tcg_temp_info *temps, TCGArg arg)
 {
     return temps[arg].next_copy != arg;
 }
 
 /* Reset TEMP's state, possibly removing the temp for the list of copies.  */
-static void reset_temp(TCGArg temp)
+static void reset_temp(struct tcg_temp_info *temps, TCGArg temp)
 {
     temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
     temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
@@ -64,21 +61,16 @@ static void reset_temp(TCGArg temp)
     temps[temp].mask = -1;
 }
 
-/* Reset all temporaries, given that there are NB_TEMPS of them. */
-static void reset_all_temps(int nb_temps)
-{
-    bitmap_zero(temps_used.l, nb_temps);
-}
-
 /* Initialize and activate a temporary.  */
-static void init_temp_info(TCGArg temp)
+static void init_temp_info(struct tcg_temp_info *temps,
+                           unsigned long *temps_used, TCGArg temp)
 {
-    if (!test_bit(temp, temps_used.l)) {
+    if (!test_bit(temp, temps_used)) {
         temps[temp].next_copy = temp;
         temps[temp].prev_copy = temp;
         temps[temp].is_const = false;
         temps[temp].mask = -1;
-        set_bit(temp, temps_used.l);
+        set_bit(temp, temps_used);
     }
 }
 
@@ -116,7 +108,8 @@ static TCGOpcode op_to_movi(TCGOpcode op)
     }
 }
 
-static TCGArg find_better_copy(TCGContext *s, TCGArg temp)
+static TCGArg find_better_copy(TCGContext *s, const struct tcg_temp_info *temps,
+                               TCGArg temp)
 {
     TCGArg i;
 
@@ -145,7 +138,8 @@ static TCGArg find_better_copy(TCGContext *s, TCGArg temp)
     return temp;
 }
 
-static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
+static bool temps_are_copies(const struct tcg_temp_info *temps, TCGArg arg1,
+                             TCGArg arg2)
 {
     TCGArg i;
 
@@ -153,7 +147,7 @@ static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
         return true;
     }
 
-    if (!temp_is_copy(arg1) || !temp_is_copy(arg2)) {
+    if (!temp_is_copy(temps, arg1) || !temp_is_copy(temps, arg2)) {
         return false;
     }
 
@@ -166,15 +160,15 @@ static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
     return false;
 }
 
-static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg *args,
-                             TCGArg dst, TCGArg val)
+static void tcg_opt_gen_movi(TCGContext *s, struct tcg_temp_info *temps,
+                             TCGOp *op, TCGArg *args, TCGArg dst, TCGArg val)
 {
     TCGOpcode new_op = op_to_movi(op->opc);
     tcg_target_ulong mask;
 
     op->opc = new_op;
 
-    reset_temp(dst);
+    reset_temp(temps, dst);
     temps[dst].is_const = true;
     temps[dst].val = val;
     mask = val;
@@ -188,10 +182,10 @@ static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg *args,
     args[1] = val;
 }
 
-static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
-                            TCGArg dst, TCGArg src)
+static void tcg_opt_gen_mov(TCGContext *s, struct tcg_temp_info *temps,
+                            TCGOp *op, TCGArg *args, TCGArg dst, TCGArg src)
 {
-    if (temps_are_copies(dst, src)) {
+    if (temps_are_copies(temps, dst, src)) {
         tcg_op_remove(s, op);
         return;
     }
@@ -201,7 +195,7 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
 
     op->opc = new_op;
 
-    reset_temp(dst);
+    reset_temp(temps, dst);
     mask = temps[src].mask;
     if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_mov_i32) {
         /* High bits of the destination are now garbage.  */
@@ -463,10 +457,11 @@ static bool do_constant_folding_cond_eq(TCGCond c)
 
 /* Return 2 if the condition can't be simplified, and the result
    of the condition (0 or 1) if it can */
-static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
-                                       TCGArg y, TCGCond c)
+static TCGArg
+do_constant_folding_cond(const struct tcg_temp_info *temps, TCGOpcode op,
+                         TCGArg x, TCGArg y, TCGCond c)
 {
-    if (temp_is_const(x) && temp_is_const(y)) {
+    if (temp_is_const(temps, x) && temp_is_const(temps, y)) {
         switch (op_bits(op)) {
         case 32:
             return do_constant_folding_cond_32(temps[x].val, temps[y].val, c);
@@ -475,9 +470,9 @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
         default:
             tcg_abort();
         }
-    } else if (temps_are_copies(x, y)) {
+    } else if (temps_are_copies(temps, x, y)) {
         return do_constant_folding_cond_eq(c);
-    } else if (temp_is_const(y) && temps[y].val == 0) {
+    } else if (temp_is_const(temps, y) && temps[y].val == 0) {
         switch (c) {
         case TCG_COND_LTU:
             return 0;
@@ -492,15 +487,17 @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
 
 /* Return 2 if the condition can't be simplified, and the result
    of the condition (0 or 1) if it can */
-static TCGArg do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
+static TCGArg
+do_constant_folding_cond2(const struct tcg_temp_info *temps, TCGArg *p1,
+                          TCGArg *p2, TCGCond c)
 {
     TCGArg al = p1[0], ah = p1[1];
     TCGArg bl = p2[0], bh = p2[1];
 
-    if (temp_is_const(bl) && temp_is_const(bh)) {
+    if (temp_is_const(temps, bl) && temp_is_const(temps, bh)) {
         uint64_t b = ((uint64_t)temps[bh].val << 32) | (uint32_t)temps[bl].val;
 
-        if (temp_is_const(al) && temp_is_const(ah)) {
+        if (temp_is_const(temps, al) && temp_is_const(temps, ah)) {
             uint64_t a;
             a = ((uint64_t)temps[ah].val << 32) | (uint32_t)temps[al].val;
             return do_constant_folding_cond_64(a, b, c);
@@ -516,18 +513,19 @@ static TCGArg do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
             }
         }
     }
-    if (temps_are_copies(al, bl) && temps_are_copies(ah, bh)) {
+    if (temps_are_copies(temps, al, bl) && temps_are_copies(temps, ah, bh)) {
        return do_constant_folding_cond_eq(c);
    }
    return 2;
 }
 
-static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
+static bool swap_commutative(const struct tcg_temp_info *temps, TCGArg dest,
+                             TCGArg *p1, TCGArg *p2)
 {
     TCGArg a1 = *p1, a2 = *p2;
     int sum = 0;
-    sum += temp_is_const(a1);
-    sum -= temp_is_const(a2);
+    sum += temp_is_const(temps, a1);
+    sum -= temp_is_const(temps, a2);
 
     /* Prefer the constant in second argument, and then the form
        op a, a, b, which is better handled on non-RISC hosts. */
@@ -539,13 +537,14 @@ static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
     return false;
 }
 
-static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
+static bool swap_commutative2(const struct tcg_temp_info *temps, TCGArg *p1,
+                              TCGArg *p2)
 {
     int sum = 0;
-    sum += temp_is_const(p1[0]);
-    sum += temp_is_const(p1[1]);
-    sum -= temp_is_const(p2[0]);
-    sum -= temp_is_const(p2[1]);
+    sum += temp_is_const(temps, p1[0]);
+    sum += temp_is_const(temps, p1[1]);
+    sum -= temp_is_const(temps, p2[0]);
+    sum -= temp_is_const(temps, p2[1]);
     if (sum > 0) {
         TCGArg t;
         t = p1[0], p1[0] = p2[0], p2[0] = t;
@@ -558,6 +557,8 @@ static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
 /* Propagate constants and copies, fold constant expressions. */
 void tcg_optimize(TCGContext *s)
 {
+    struct tcg_temp_info *temps;
+    unsigned long *temps_used;
     int oi, oi_next, nb_temps, nb_globals;
     TCGArg *prev_mb_args = NULL;
 
@@ -568,7 +569,8 @@ void tcg_optimize(TCGContext *s)
 
     nb_temps = s->nb_temps;
     nb_globals = s->nb_globals;
-    reset_all_temps(nb_temps);
+    temps = g_new(struct tcg_temp_info, nb_temps);
+    temps_used = bitmap_new(nb_temps);
 
     for (oi = s->gen_op_buf[0].next; oi != 0; oi = oi_next) {
         tcg_target_ulong mask, partmask, affected;
@@ -590,21 +592,21 @@ void tcg_optimize(TCGContext *s)
             for (i = 0; i < nb_oargs + nb_iargs; i++) {
                 tmp = args[i];
                 if (tmp != TCG_CALL_DUMMY_ARG) {
-                    init_temp_info(tmp);
+                    init_temp_info(temps, temps_used, tmp);
                 }
             }
         } else {
             nb_oargs = def->nb_oargs;
             nb_iargs = def->nb_iargs;
             for (i = 0; i < nb_oargs + nb_iargs; i++) {
-                init_temp_info(args[i]);
+                init_temp_info(temps, temps_used, args[i]);
             }
         }
 
         /* Do copy propagation */
         for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
-            if (temp_is_copy(args[i])) {
-                args[i] = find_better_copy(s, args[i]);
+            if (temp_is_copy(temps, args[i])) {
+                args[i] = find_better_copy(s, temps, args[i]);
             }
         }
 
@@ -620,44 +622,44 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(nor):
         CASE_OP_32_64(muluh):
         CASE_OP_32_64(mulsh):
-            swap_commutative(args[0], &args[1], &args[2]);
+            swap_commutative(temps, args[0], &args[1], &args[2]);
             break;
         CASE_OP_32_64(brcond):
-            if (swap_commutative(-1, &args[0], &args[1])) {
+            if (swap_commutative(temps, -1, &args[0], &args[1])) {
                 args[2] = tcg_swap_cond(args[2]);
             }
             break;
         CASE_OP_32_64(setcond):
-            if (swap_commutative(args[0], &args[1], &args[2])) {
+            if (swap_commutative(temps, args[0], &args[1], &args[2])) {
                 args[3] = tcg_swap_cond(args[3]);
             }
             break;
         CASE_OP_32_64(movcond):
-            if (swap_commutative(-1, &args[1], &args[2])) {
+            if (swap_commutative(temps, -1, &args[1], &args[2])) {
                 args[5] = tcg_swap_cond(args[5]);
             }
             /* For movcond, we canonicalize the "false" input reg to match
                the destination reg so that the tcg backend can implement
               a "move if true" operation.  */
-            if (swap_commutative(args[0], &args[4], &args[3])) {
+            if (swap_commutative(temps, args[0], &args[4], &args[3])) {
                 args[5] = tcg_invert_cond(args[5]);
             }
             break;
         CASE_OP_32_64(add2):
-            swap_commutative(args[0], &args[2], &args[4]);
-            swap_commutative(args[1], &args[3], &args[5]);
+            swap_commutative(temps, args[0], &args[2], &args[4]);
+            swap_commutative(temps, args[1], &args[3], &args[5]);
             break;
         CASE_OP_32_64(mulu2):
         CASE_OP_32_64(muls2):
-            swap_commutative(args[0], &args[2], &args[3]);
+            swap_commutative(temps, args[0], &args[2], &args[3]);
             break;
         case INDEX_op_brcond2_i32:
-            if (swap_commutative2(&args[0], &args[2])) {
+            if (swap_commutative2(temps, &args[0], &args[2])) {
                 args[4] = tcg_swap_cond(args[4]);
             }
             break;
         case INDEX_op_setcond2_i32:
-            if (swap_commutative2(&args[1], &args[3])) {
+            if (swap_commutative2(temps, &args[1], &args[3])) {
                 args[5] = tcg_swap_cond(args[5]);
             }
             break;
@@ -673,8 +675,8 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(sar):
         CASE_OP_32_64(rotl):
         CASE_OP_32_64(rotr):
-            if (temp_is_const(args[1]) && temps[args[1]].val == 0) {
-                tcg_opt_gen_movi(s, op, args, args[0], 0);
+            if (temp_is_const(temps, args[1]) && temps[args[1]].val == 0) {
+                tcg_opt_gen_movi(s, temps, op, args, args[0], 0);
                 continue;
             }
             break;
@@ -683,7 +685,7 @@ void tcg_optimize(TCGContext *s)
                 TCGOpcode neg_op;
                 bool have_neg;
 
-                if (temp_is_const(args[2])) {
+                if (temp_is_const(temps, args[2])) {
                     /* Proceed with possible constant folding. */
                     break;
                 }
@@ -697,9 +699,9 @@ void tcg_optimize(TCGContext *s)
                 if (!have_neg) {
                     break;
                 }
-                if (temp_is_const(args[1]) && temps[args[1]].val == 0) {
+                if (temp_is_const(temps, args[1]) && temps[args[1]].val == 0) {
                     op->opc = neg_op;
-                    reset_temp(args[0]);
+                    reset_temp(temps, args[0]);
                     args[1] = args[2];
                     continue;
                 }
@@ -707,30 +709,30 @@ void tcg_optimize(TCGContext *s)
             break;
         CASE_OP_32_64(xor):
         CASE_OP_32_64(nand):
-            if (!temp_is_const(args[1])
-                && temp_is_const(args[2]) && temps[args[2]].val == -1) {
+            if (!temp_is_const(temps, args[1])
+                && temp_is_const(temps, args[2]) && temps[args[2]].val == -1) {
                 i = 1;
                 goto try_not;
             }
             break;
         CASE_OP_32_64(nor):
-            if (!temp_is_const(args[1])
-                && temp_is_const(args[2]) && temps[args[2]].val == 0) {
+            if (!temp_is_const(temps, args[1])
+                && temp_is_const(temps, args[2]) && temps[args[2]].val == 0) {
                 i = 1;
                 goto try_not;
             }
             break;
         CASE_OP_32_64(andc):
-            if (!temp_is_const(args[2])
-                && temp_is_const(args[1]) && temps[args[1]].val == -1) {
+            if (!temp_is_const(temps, args[2])
+                && temp_is_const(temps, args[1]) && temps[args[1]].val == -1) {
                 i = 2;
                 goto try_not;
             }
             break;
         CASE_OP_32_64(orc):
         CASE_OP_32_64(eqv):
-            if (!temp_is_const(args[2])
-                && temp_is_const(args[1]) && temps[args[1]].val == 0) {
+            if (!temp_is_const(temps, args[2])
+                && temp_is_const(temps, args[1]) && temps[args[1]].val == 0) {
                 i = 2;
                 goto try_not;
             }
@@ -751,7 +753,7 @@ void tcg_optimize(TCGContext *s)
                 break;
             }
             op->opc = not_op;
-            reset_temp(args[0]);
+            reset_temp(temps, args[0]);
             args[1] = args[i];
             continue;
         }
@@ -771,18 +773,18 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(or):
         CASE_OP_32_64(xor):
         CASE_OP_32_64(andc):
-            if (!temp_is_const(args[1])
-                && temp_is_const(args[2]) && temps[args[2]].val == 0) {
-                tcg_opt_gen_mov(s, op, args, args[0], args[1]);
+            if (!temp_is_const(temps, args[1])
+                && temp_is_const(temps, args[2]) && temps[args[2]].val == 0) {
+                tcg_opt_gen_mov(s, temps, op, args, args[0], args[1]);
                 continue;
             }
             break;
         CASE_OP_32_64(and):
         CASE_OP_32_64(orc):
         CASE_OP_32_64(eqv):
-            if (!temp_is_const(args[1])
-                && temp_is_const(args[2]) && temps[args[2]].val == -1) {
-                tcg_opt_gen_mov(s, op, args, args[0], args[1]);
+            if (!temp_is_const(temps, args[1])
+                && temp_is_const(temps, args[2]) && temps[args[2]].val == -1) {
+                tcg_opt_gen_mov(s, temps, op, args, args[0], args[1]);
                 continue;
             }
             break;
@@ -819,7 +821,7 @@ void tcg_optimize(TCGContext *s)
 
         CASE_OP_32_64(and):
             mask = temps[args[2]].mask;
-            if (temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[2])) {
         and_const:
                 affected = temps[args[1]].mask & ~mask;
             }
@@ -838,7 +840,7 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(andc):
             /* Known-zeros does not imply known-ones.  Therefore unless
                args[2] is constant, we can't infer anything from it. */
-            if (temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[2])) {
                 mask = ~temps[args[2]].mask;
                 goto and_const;
             }
@@ -847,26 +849,26 @@ void tcg_optimize(TCGContext *s)
             break;
 
         case INDEX_op_sar_i32:
-            if (temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[2])) {
                 tmp = temps[args[2]].val & 31;
                 mask = (int32_t)temps[args[1]].mask >> tmp;
             }
             break;
         case INDEX_op_sar_i64:
-            if (temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[2])) {
                 tmp = temps[args[2]].val & 63;
                 mask = (int64_t)temps[args[1]].mask >> tmp;
             }
             break;
 
         case INDEX_op_shr_i32:
-            if (temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[2])) {
                 tmp = temps[args[2]].val & 31;
                 mask = (uint32_t)temps[args[1]].mask >> tmp;
             }
             break;
         case INDEX_op_shr_i64:
-            if (temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[2])) {
                 tmp = temps[args[2]].val & 63;
                 mask = (uint64_t)temps[args[1]].mask >> tmp;
             }
@@ -880,7 +882,7 @@ void tcg_optimize(TCGContext *s)
             break;
 
         CASE_OP_32_64(shl):
-            if (temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[2])) {
                 tmp = temps[args[2]].val & (TCG_TARGET_REG_BITS - 1);
                 mask = temps[args[1]].mask << tmp;
             }
@@ -976,12 +978,12 @@ void tcg_optimize(TCGContext *s)
 
         if (partmask == 0) {
             tcg_debug_assert(nb_oargs == 1);
-            tcg_opt_gen_movi(s, op, args, args[0], 0);
+            tcg_opt_gen_movi(s, temps, op, args, args[0], 0);
             continue;
         }
         if (affected == 0) {
             tcg_debug_assert(nb_oargs == 1);
-            tcg_opt_gen_mov(s, op, args, args[0], args[1]);
+            tcg_opt_gen_mov(s, temps, op, args, args[0], args[1]);
             continue;
         }
 
@@ -991,8 +993,8 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(mul):
         CASE_OP_32_64(muluh):
         CASE_OP_32_64(mulsh):
-            if ((temp_is_const(args[2]) && temps[args[2]].val == 0)) {
-                tcg_opt_gen_movi(s, op, args, args[0], 0);
+            if ((temp_is_const(temps, args[2]) && temps[args[2]].val == 0)) {
+                tcg_opt_gen_movi(s, temps, op, args, args[0], 0);
                 continue;
             }
             break;
@@ -1004,8 +1006,8 @@ void tcg_optimize(TCGContext *s)
         switch (opc) {
         CASE_OP_32_64(or):
         CASE_OP_32_64(and):
-            if (temps_are_copies(args[1], args[2])) {
-                tcg_opt_gen_mov(s, op, args, args[0], args[1]);
+            if (temps_are_copies(temps, args[1], args[2])) {
+                tcg_opt_gen_mov(s, temps, op, args, args[0], args[1]);
                 continue;
             }
             break;
@@ -1018,8 +1020,8 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(andc):
         CASE_OP_32_64(sub):
         CASE_OP_32_64(xor):
-            if (temps_are_copies(args[1], args[2])) {
-                tcg_opt_gen_movi(s, op, args, args[0], 0);
+            if (temps_are_copies(temps, args[1], args[2])) {
+                tcg_opt_gen_movi(s, temps, op, args, args[0], 0);
                 continue;
             }
             break;
@@ -1032,10 +1034,10 @@ void tcg_optimize(TCGContext *s)
            allocator where needed and possible.  Also detect copies. */
         switch (opc) {
         CASE_OP_32_64(mov):
-            tcg_opt_gen_mov(s, op, args, args[0], args[1]);
+            tcg_opt_gen_mov(s, temps, op, args, args[0], args[1]);
             break;
         CASE_OP_32_64(movi):
-            tcg_opt_gen_movi(s, op, args, args[0], args[1]);
+            tcg_opt_gen_movi(s, temps, op, args, args[0], args[1]);
             break;
 
         CASE_OP_32_64(not):
@@ -1051,9 +1053,9 @@ void tcg_optimize(TCGContext *s)
         case INDEX_op_extu_i32_i64:
         case INDEX_op_extrl_i64_i32:
         case INDEX_op_extrh_i64_i32:
-            if (temp_is_const(args[1])) {
+            if (temp_is_const(temps, args[1])) {
                 tmp = do_constant_folding(opc, temps[args[1]].val, 0);
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                tcg_opt_gen_movi(s, temps, op, args, args[0], tmp);
                 break;
             }
             goto do_default;
@@ -1080,66 +1082,70 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(divu):
         CASE_OP_32_64(rem):
         CASE_OP_32_64(remu):
-            if (temp_is_const(args[1]) && temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[1]) &&
+                temp_is_const(temps, args[2])) {
                 tmp = do_constant_folding(opc, temps[args[1]].val,
                                           temps[args[2]].val);
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                tcg_opt_gen_movi(s, temps, op, args, args[0], tmp);
                 break;
             }
             goto do_default;
 
         CASE_OP_32_64(clz):
         CASE_OP_32_64(ctz):
-            if (temp_is_const(args[1])) {
+            if (temp_is_const(temps, args[1])) {
                 TCGArg v = temps[args[1]].val;
                 if (v != 0) {
                     tmp = do_constant_folding(opc, v, 0);
-                    tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                    tcg_opt_gen_movi(s, temps, op, args, args[0], tmp);
                 } else {
-                    tcg_opt_gen_mov(s, op, args, args[0], args[2]);
+                    tcg_opt_gen_mov(s, temps, op, args, args[0], args[2]);
                 }
                 break;
             }
             goto do_default;
 
         CASE_OP_32_64(deposit):
-            if (temp_is_const(args[1]) && temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[1]) &&
+                temp_is_const(temps, args[2])) {
                 tmp = deposit64(temps[args[1]].val, args[3], args[4],
                                 temps[args[2]].val);
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                tcg_opt_gen_movi(s, temps, op, args, args[0], tmp);
                 break;
             }
             goto do_default;
 
         CASE_OP_32_64(extract):
-            if (temp_is_const(args[1])) {
+            if (temp_is_const(temps, args[1])) {
                 tmp = extract64(temps[args[1]].val, args[2], args[3]);
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                tcg_opt_gen_movi(s, temps, op, args, args[0], tmp);
                 break;
             }
             goto do_default;
 
         CASE_OP_32_64(sextract):
-            if (temp_is_const(args[1])) {
+            if (temp_is_const(temps, args[1])) {
                 tmp = sextract64(temps[args[1]].val, args[2], args[3]);
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                tcg_opt_gen_movi(s, temps, op, args, args[0], tmp);
                 break;
             }
             goto do_default;
 
         CASE_OP_32_64(setcond):
-            tmp = do_constant_folding_cond(opc, args[1], args[2], args[3]);
+            tmp = do_constant_folding_cond(temps, opc, args[1], args[2],
+                                           args[3]);
             if (tmp != 2) {
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                tcg_opt_gen_movi(s, temps, op, args, args[0], tmp);
                 break;
             }
             goto do_default;
 
         CASE_OP_32_64(brcond):
-            tmp = do_constant_folding_cond(opc, args[0], args[1], args[2]);
+            tmp = do_constant_folding_cond(temps, opc, args[0], args[1],
+                                           args[2]);
             if (tmp != 2) {
                 if (tmp) {
-                    reset_all_temps(nb_temps);
+                    bitmap_zero(temps_used, nb_temps);
                     op->opc = INDEX_op_br;
                     args[0] = args[3];
                 } else {
@@ -1150,12 +1156,14 @@ void tcg_optimize(TCGContext *s)
             goto do_default;
 
         CASE_OP_32_64(movcond):
-            tmp = do_constant_folding_cond(opc, args[1], args[2], args[5]);
+            tmp = do_constant_folding_cond(temps, opc, args[1], args[2],
+                                           args[5]);
             if (tmp != 2) {
-                tcg_opt_gen_mov(s, op, args, args[0], args[4-tmp]);
+                tcg_opt_gen_mov(s, temps, op, args, args[0], args[4 - tmp]);
                 break;
             }
-            if (temp_is_const(args[3]) && temp_is_const(args[4])) {
+            if (temp_is_const(temps, args[3]) &&
+                temp_is_const(temps, args[4])) {
                 tcg_target_ulong tv = temps[args[3]].val;
                 tcg_target_ulong fv = temps[args[4]].val;
                 TCGCond cond = args[5];
@@ -1174,8 +1182,10 @@ void tcg_optimize(TCGContext *s)
 
         case INDEX_op_add2_i32:
         case INDEX_op_sub2_i32:
-            if (temp_is_const(args[2]) && temp_is_const(args[3])
-                && temp_is_const(args[4]) && temp_is_const(args[5])) {
+            if (temp_is_const(temps, args[2]) &&
+                temp_is_const(temps, args[3]) &&
+                temp_is_const(temps, args[4]) &&
+                temp_is_const(temps, args[5])) {
                 uint32_t al = temps[args[2]].val;
                 uint32_t ah = temps[args[3]].val;
                 uint32_t bl = temps[args[4]].val;
@@ -1194,8 +1204,8 @@ void tcg_optimize(TCGContext *s)
 
                 rl = args[0];
                 rh = args[1];
-                tcg_opt_gen_movi(s, op, args, rl, (int32_t)a);
-                tcg_opt_gen_movi(s, op2, args2, rh, (int32_t)(a >> 32));
+                tcg_opt_gen_movi(s, temps, op, args, rl, (int32_t)a);
+                tcg_opt_gen_movi(s, temps, op2, args2, rh, (int32_t)(a >> 32));
 
                 /* We've done all we need to do with the movi.  Skip it.  */
                 oi_next = op2->next;
@@ -1204,7 +1214,8 @@ void tcg_optimize(TCGContext *s)
             goto do_default;
 
         case INDEX_op_mulu2_i32:
-            if (temp_is_const(args[2]) && temp_is_const(args[3])) {
+            if (temp_is_const(temps, args[2]) &&
+                temp_is_const(temps, args[3])) {
                 uint32_t a = temps[args[2]].val;
                 uint32_t b = temps[args[3]].val;
                 uint64_t r = (uint64_t)a * b;
@@ -1214,8 +1225,8 @@ void tcg_optimize(TCGContext *s)
 
                 rl = args[0];
                 rh = args[1];
-                tcg_opt_gen_movi(s, op, args, rl, (int32_t)r);
-                tcg_opt_gen_movi(s, op2, args2, rh, (int32_t)(r >> 32));
+                tcg_opt_gen_movi(s, temps, op, args, rl, (int32_t)r);
+                tcg_opt_gen_movi(s, temps, op2, args2, rh, (int32_t)(r >> 32));
 
                 /* We've done all we need to do with the movi.  Skip it.  */
                 oi_next = op2->next;
@@ -1224,11 +1235,11 @@ void tcg_optimize(TCGContext *s)
             goto do_default;
 
         case INDEX_op_brcond2_i32:
-            tmp = do_constant_folding_cond2(&args[0], &args[2], args[4]);
+            tmp = do_constant_folding_cond2(temps, &args[0], &args[2], args[4]);
             if (tmp != 2) {
                 if (tmp) {
         do_brcond_true:
-                    reset_all_temps(nb_temps);
+                    bitmap_zero(temps_used, nb_temps);
                     op->opc = INDEX_op_br;
                     args[0] = args[5];
                 } else {
@@ -1236,12 +1247,14 @@ void tcg_optimize(TCGContext *s)
                     tcg_op_remove(s, op);
                 }
             } else if ((args[4] == TCG_COND_LT || args[4] == TCG_COND_GE)
-                       && temp_is_const(args[2]) && temps[args[2]].val == 0
-                       && temp_is_const(args[3]) && temps[args[3]].val == 0) {
+                       && temp_is_const(temps, args[2])
+                       && temps[args[2]].val == 0
+                       && temp_is_const(temps, args[3])
+                       && temps[args[3]].val == 0) {
                 /* Simplify LT/GE comparisons vs zero to a single compare
                    vs the high word of the input.  */
         do_brcond_high:
-                reset_all_temps(nb_temps);
+                bitmap_zero(temps_used, nb_temps);
                 op->opc = INDEX_op_brcond_i32;
                 args[0] = args[1];
                 args[1] = args[3];
@@ -1250,14 +1263,14 @@ void tcg_optimize(TCGContext *s)
             } else if (args[4] == TCG_COND_EQ) {
                 /* Simplify EQ comparisons where one of the pairs
                    can be simplified.  */
-                tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
+                tmp = do_constant_folding_cond(temps, INDEX_op_brcond_i32,
                                                args[0], args[2], TCG_COND_EQ);
                 if (tmp == 0) {
                     goto do_brcond_false;
                 } else if (tmp == 1) {
                     goto do_brcond_high;
                 }
-                tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
+                tmp = do_constant_folding_cond(temps, INDEX_op_brcond_i32,
                                                args[1], args[3], TCG_COND_EQ);
                 if (tmp == 0) {
                     goto do_brcond_false;
@@ -1265,7 +1278,7 @@ void tcg_optimize(TCGContext *s)
                     goto do_default;
                 }
         do_brcond_low:
-                reset_all_temps(nb_temps);
+                bitmap_zero(temps_used, nb_temps);
                 op->opc = INDEX_op_brcond_i32;
                 args[1] = args[2];
                 args[2] = args[4];
@@ -1273,14 +1286,14 @@ void tcg_optimize(TCGContext *s)
             } else if (args[4] == TCG_COND_NE) {
                 /* Simplify NE comparisons where one of the pairs
                    can be simplified.  */
-                tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
+                tmp = do_constant_folding_cond(temps, INDEX_op_brcond_i32,
                                                args[0], args[2], TCG_COND_NE);
                 if (tmp == 0) {
                     goto do_brcond_high;
                 } else if (tmp == 1) {
                     goto do_brcond_true;
                 }
-                tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
+                tmp = do_constant_folding_cond(temps, INDEX_op_brcond_i32,
                                                args[1], args[3], TCG_COND_NE);
                 if (tmp == 0) {
                     goto do_brcond_low;
@@ -1294,17 +1307,19 @@ void tcg_optimize(TCGContext *s)
             break;
 
         case INDEX_op_setcond2_i32:
-            tmp = do_constant_folding_cond2(&args[1], &args[3], args[5]);
+            tmp = do_constant_folding_cond2(temps, &args[1], &args[3], args[5]);
             if (tmp != 2) {
         do_setcond_const:
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                tcg_opt_gen_movi(s, temps, op, args, args[0], tmp);
             } else if ((args[5] == TCG_COND_LT || args[5] == TCG_COND_GE)
-                       && temp_is_const(args[3]) && temps[args[3]].val == 0
-                       && temp_is_const(args[4]) && temps[args[4]].val == 0) {
+                       && temp_is_const(temps, args[3])
+                       && temps[args[3]].val == 0
+                       && temp_is_const(temps, args[4])
+                       && temps[args[4]].val == 0) {
                 /* Simplify LT/GE comparisons vs zero to a single compare
                    vs the high word of the input.  */
         do_setcond_high:
-                reset_temp(args[0]);
+                reset_temp(temps, args[0]);
                 temps[args[0]].mask = 1;
                 op->opc = INDEX_op_setcond_i32;
                 args[1] = args[2];
@@ -1313,14 +1328,14 @@ void tcg_optimize(TCGContext *s)
             } else if (args[5] == TCG_COND_EQ) {
                 /* Simplify EQ comparisons where one of the pairs
                    can be simplified.  */
-                tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
+                tmp = do_constant_folding_cond(temps, INDEX_op_setcond_i32,
                                                args[1], args[3], TCG_COND_EQ);
                 if (tmp == 0) {
                     goto do_setcond_const;
                 } else if (tmp == 1) {
                     goto do_setcond_high;
                 }
-                tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
+                tmp = do_constant_folding_cond(temps, INDEX_op_setcond_i32,
                                                args[2], args[4], TCG_COND_EQ);
                 if (tmp == 0) {
                     goto do_setcond_high;
@@ -1328,7 +1343,7 @@ void tcg_optimize(TCGContext *s)
                     goto do_default;
                 }
         do_setcond_low:
-                reset_temp(args[0]);
+                reset_temp(temps, args[0]);
                 temps[args[0]].mask = 1;
                 op->opc = INDEX_op_setcond_i32;
                 args[2] = args[3];
@@ -1336,14 +1351,14 @@ void tcg_optimize(TCGContext *s)
             } else if (args[5] == TCG_COND_NE) {
                 /* Simplify NE comparisons where one of the pairs
                    can be simplified.  */
-                tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
+                tmp = do_constant_folding_cond(temps, INDEX_op_setcond_i32,
                                                args[1], args[3], TCG_COND_NE);
                 if (tmp == 0) {
                     goto do_setcond_high;
                 } else if (tmp == 1) {
                     goto do_setcond_const;
                 }
-                tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
+                tmp = do_constant_folding_cond(temps, INDEX_op_setcond_i32,
                                                args[2], args[4], TCG_COND_NE);
                 if (tmp == 0) {
                     goto do_setcond_low;
@@ -1360,8 +1375,8 @@ void tcg_optimize(TCGContext *s)
             if (!(args[nb_oargs + nb_iargs + 1]
                   & (TCG_CALL_NO_READ_GLOBALS | TCG_CALL_NO_WRITE_GLOBALS))) {
                 for (i = 0; i < nb_globals; i++) {
-                    if (test_bit(i, temps_used.l)) {
-                        reset_temp(i);
+                    if (test_bit(i, temps_used)) {
+                        reset_temp(temps, i);
                     }
                 }
             }
@@ -1375,11 +1390,11 @@ void tcg_optimize(TCGContext *s)
                block, otherwise we only trash the output args.  "mask" is
               the non-zero bits mask for the first output arg.  */
             if (def->flags & TCG_OPF_BB_END) {
-                reset_all_temps(nb_temps);
+                bitmap_zero(temps_used, nb_temps);
             } else {
         do_reset_output:
                 for (i = 0; i < nb_oargs; i++) {
-                    reset_temp(args[i]);
+                    reset_temp(temps, args[i]);
                     /* Save the corresponding known-zero bits mask for the
                        first output argument (only one supported so far). */
                     if (i == 0) {
@@ -1428,4 +1443,6 @@ void tcg_optimize(TCGContext *s)
             prev_mb_args = args;
         }
     }
+    g_free(temps);
+    g_free(temps_used);
 }
-- 
2.7.4