[Qemu-devel] [PATCH v3 35/43] tcg: dynamically allocate optimizer temps

Emilio G. Cota posted 43 patches 8 years, 6 months ago
Groundwork for supporting multiple TCG contexts.

While at it, also allocate temps_used directly as a bitmap of the
required size, instead of having a bitmap of TCG_MAX_TEMPS via
TCGTempSet.

Performance-wise we lose about 2-3% in a translation-heavy workload
such as booting+shutting down debian-arm:

Performance counter stats for 'taskset -c 0 arm-softmmu/qemu-system-arm \
	-machine type=virt -nographic -smp 1 -m 4096 \
	-netdev user,id=unet,hostfwd=tcp::2222-:22 \
	-device virtio-net-device,netdev=unet \
	-drive file=die-on-boot.qcow2,id=myblock,index=0,if=none \
	-device virtio-blk-device,drive=myblock \
	-kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \
	-name arm,debug-threads=on -smp 1' (10 runs):

Before:
      19489.126318 task-clock                #    0.960 CPUs utilized            ( +-  0.96% )
            23,697 context-switches          #    0.001 M/sec                    ( +-  0.51% )
                 1 CPU-migrations            #    0.000 M/sec
            19,953 page-faults               #    0.001 M/sec                    ( +-  0.40% )
    56,214,402,410 cycles                    #    2.884 GHz                      ( +-  0.95% ) [83.34%]
    25,516,669,513 stalled-cycles-frontend   #   45.39% frontend cycles idle     ( +-  0.69% ) [83.33%]
    17,266,165,747 stalled-cycles-backend    #   30.71% backend  cycles idle     ( +-  0.59% ) [66.66%]
    79,007,843,327 instructions              #    1.41  insns per cycle
                                             #    0.32  stalled cycles per insn  ( +-  1.19% ) [83.34%]
    13,136,600,416 branches                  #  674.048 M/sec                    ( +-  1.29% ) [83.34%]
       274,715,270 branch-misses             #    2.09% of all branches          ( +-  0.79% ) [83.33%]

      20.300335944 seconds time elapsed                                          ( +-  0.55% )

After:
      19917.737030 task-clock                #    0.955 CPUs utilized            ( +-  0.74% )
            23,973 context-switches          #    0.001 M/sec                    ( +-  0.37% )
                 1 CPU-migrations            #    0.000 M/sec
            19,824 page-faults               #    0.001 M/sec                    ( +-  0.38% )
    57,380,269,537 cycles                    #    2.881 GHz                      ( +-  0.70% ) [83.34%]
    26,462,452,508 stalled-cycles-frontend   #   46.12% frontend cycles idle     ( +-  0.65% ) [83.34%]
    17,970,546,047 stalled-cycles-backend    #   31.32% backend  cycles idle     ( +-  0.64% ) [66.67%]
    79,527,238,334 instructions              #    1.39  insns per cycle
                                             #    0.33  stalled cycles per insn  ( +-  0.79% ) [83.33%]
    13,272,362,192 branches                  #  666.359 M/sec                    ( +-  0.83% ) [83.34%]
       278,357,773 branch-misses             #    2.10% of all branches          ( +-  0.65% ) [83.33%]

      20.850558455 seconds time elapsed                                          ( +-  0.55% )

That is, a 2.70% slowdown.

The perf difference shrinks a bit when using a high-performance allocator
such as tcmalloc:

Before:
      19372.008814 task-clock                #    0.957 CPUs utilized            ( +-  1.00% )
            23,621 context-switches          #    0.001 M/sec                    ( +-  0.50% )
                 1 CPU-migrations            #    0.000 M/sec
            13,289 page-faults               #    0.001 M/sec                    ( +-  1.46% )
    55,824,272,818 cycles                    #    2.882 GHz                      ( +-  1.00% ) [83.33%]
    25,284,946,453 stalled-cycles-frontend   #   45.29% frontend cycles idle     ( +-  1.12% ) [83.32%]
    17,100,517,753 stalled-cycles-backend    #   30.63% backend  cycles idle     ( +-  0.86% ) [66.69%]
    78,193,046,990 instructions              #    1.40  insns per cycle
                                             #    0.32  stalled cycles per insn  ( +-  1.14% ) [83.35%]
    12,986,014,194 branches                  #  670.349 M/sec                    ( +-  1.22% ) [83.34%]
       272,581,789 branch-misses             #    2.10% of all branches          ( +-  0.62% ) [83.33%]

      20.249726404 seconds time elapsed                                          ( +-  0.61% )

After:
      19809.295886 task-clock                #    0.962 CPUs utilized            ( +-  0.99% )
            23,894 context-switches          #    0.001 M/sec                    ( +-  0.50% )
                 1 CPU-migrations            #    0.000 M/sec
            12,927 page-faults               #    0.001 M/sec                    ( +-  0.78% )
    57,131,686,004 cycles                    #    2.884 GHz                      ( +-  0.97% ) [83.34%]
    25,965,120,001 stalled-cycles-frontend   #   45.45% frontend cycles idle     ( +-  0.71% ) [83.35%]
    17,534,942,176 stalled-cycles-backend    #   30.69% backend  cycles idle     ( +-  0.54% ) [66.68%]
    80,000,003,715 instructions              #    1.40  insns per cycle
                                             #    0.32  stalled cycles per insn  ( +-  1.24% ) [83.34%]
    13,327,272,806 branches                  #  672.779 M/sec                    ( +-  1.31% ) [83.34%]
       273,622,661 branch-misses             #    2.05% of all branches          ( +-  0.95% ) [83.31%]

      20.601366430 seconds time elapsed                                          ( +-  0.60% )

That is, a 1.77% slowdown.

Suggested-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 tcg/optimize.c | 307 ++++++++++++++++++++++++++++++---------------------------
 1 file changed, 162 insertions(+), 145 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index adfc56c..b727a4a 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -40,21 +40,18 @@ struct tcg_temp_info {
     tcg_target_ulong mask;
 };
 
-static struct tcg_temp_info temps[TCG_MAX_TEMPS];
-static TCGTempSet temps_used;
-
-static inline bool temp_is_const(TCGArg arg)
+static inline bool temp_is_const(const struct tcg_temp_info *temps, TCGArg arg)
 {
     return temps[arg].is_const;
 }
 
-static inline bool temp_is_copy(TCGArg arg)
+static inline bool temp_is_copy(const struct tcg_temp_info *temps, TCGArg arg)
 {
     return temps[arg].next_copy != arg;
 }
 
 /* Reset TEMP's state, possibly removing the temp for the list of copies.  */
-static void reset_temp(TCGArg temp)
+static void reset_temp(struct tcg_temp_info *temps, TCGArg temp)
 {
     temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
     temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
@@ -64,21 +61,16 @@ static void reset_temp(TCGArg temp)
     temps[temp].mask = -1;
 }
 
-/* Reset all temporaries, given that there are NB_TEMPS of them.  */
-static void reset_all_temps(int nb_temps)
-{
-    bitmap_zero(temps_used.l, nb_temps);
-}
-
 /* Initialize and activate a temporary.  */
-static void init_temp_info(TCGArg temp)
+static void init_temp_info(struct tcg_temp_info *temps,
+                           unsigned long *temps_used, TCGArg temp)
 {
-    if (!test_bit(temp, temps_used.l)) {
+    if (!test_bit(temp, temps_used)) {
         temps[temp].next_copy = temp;
         temps[temp].prev_copy = temp;
         temps[temp].is_const = false;
         temps[temp].mask = -1;
-        set_bit(temp, temps_used.l);
+        set_bit(temp, temps_used);
     }
 }
 
@@ -116,7 +108,8 @@ static TCGOpcode op_to_movi(TCGOpcode op)
     }
 }
 
-static TCGArg find_better_copy(TCGContext *s, TCGArg temp)
+static TCGArg find_better_copy(TCGContext *s, const struct tcg_temp_info *temps,
+                               TCGArg temp)
 {
     TCGArg i;
 
@@ -145,7 +138,8 @@ static TCGArg find_better_copy(TCGContext *s, TCGArg temp)
     return temp;
 }
 
-static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
+static bool temps_are_copies(const struct tcg_temp_info *temps, TCGArg arg1,
+                             TCGArg arg2)
 {
     TCGArg i;
 
@@ -153,7 +147,7 @@ static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
         return true;
     }
 
-    if (!temp_is_copy(arg1) || !temp_is_copy(arg2)) {
+    if (!temp_is_copy(temps, arg1) || !temp_is_copy(temps, arg2)) {
         return false;
     }
 
@@ -166,15 +160,15 @@ static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
     return false;
 }
 
-static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg *args,
-                             TCGArg dst, TCGArg val)
+static void tcg_opt_gen_movi(TCGContext *s, struct tcg_temp_info *temps,
+                             TCGOp *op, TCGArg *args, TCGArg dst, TCGArg val)
 {
     TCGOpcode new_op = op_to_movi(op->opc);
     tcg_target_ulong mask;
 
     op->opc = new_op;
 
-    reset_temp(dst);
+    reset_temp(temps, dst);
     temps[dst].is_const = true;
     temps[dst].val = val;
     mask = val;
@@ -188,10 +182,10 @@ static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg *args,
     args[1] = val;
 }
 
-static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
-                            TCGArg dst, TCGArg src)
+static void tcg_opt_gen_mov(TCGContext *s, struct tcg_temp_info *temps,
+                            TCGOp *op, TCGArg *args, TCGArg dst, TCGArg src)
 {
-    if (temps_are_copies(dst, src)) {
+    if (temps_are_copies(temps, dst, src)) {
         tcg_op_remove(s, op);
         return;
     }
@@ -201,7 +195,7 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
 
     op->opc = new_op;
 
-    reset_temp(dst);
+    reset_temp(temps, dst);
     mask = temps[src].mask;
     if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_mov_i32) {
         /* High bits of the destination are now garbage.  */
@@ -463,10 +457,11 @@ static bool do_constant_folding_cond_eq(TCGCond c)
 
 /* Return 2 if the condition can't be simplified, and the result
    of the condition (0 or 1) if it can */
-static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
-                                       TCGArg y, TCGCond c)
+static TCGArg
+do_constant_folding_cond(const struct tcg_temp_info *temps, TCGOpcode op,
+                         TCGArg x, TCGArg y, TCGCond c)
 {
-    if (temp_is_const(x) && temp_is_const(y)) {
+    if (temp_is_const(temps, x) && temp_is_const(temps, y)) {
         switch (op_bits(op)) {
         case 32:
             return do_constant_folding_cond_32(temps[x].val, temps[y].val, c);
@@ -475,9 +470,9 @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
         default:
             tcg_abort();
         }
-    } else if (temps_are_copies(x, y)) {
+    } else if (temps_are_copies(temps, x, y)) {
         return do_constant_folding_cond_eq(c);
-    } else if (temp_is_const(y) && temps[y].val == 0) {
+    } else if (temp_is_const(temps, y) && temps[y].val == 0) {
         switch (c) {
         case TCG_COND_LTU:
             return 0;
@@ -492,15 +487,17 @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
 
 /* Return 2 if the condition can't be simplified, and the result
    of the condition (0 or 1) if it can */
-static TCGArg do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
+static TCGArg
+do_constant_folding_cond2(const struct tcg_temp_info *temps, TCGArg *p1,
+                          TCGArg *p2, TCGCond c)
 {
     TCGArg al = p1[0], ah = p1[1];
     TCGArg bl = p2[0], bh = p2[1];
 
-    if (temp_is_const(bl) && temp_is_const(bh)) {
+    if (temp_is_const(temps, bl) && temp_is_const(temps, bh)) {
         uint64_t b = ((uint64_t)temps[bh].val << 32) | (uint32_t)temps[bl].val;
 
-        if (temp_is_const(al) && temp_is_const(ah)) {
+        if (temp_is_const(temps, al) && temp_is_const(temps, ah)) {
             uint64_t a;
             a = ((uint64_t)temps[ah].val << 32) | (uint32_t)temps[al].val;
             return do_constant_folding_cond_64(a, b, c);
@@ -516,18 +513,19 @@ static TCGArg do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
             }
         }
     }
-    if (temps_are_copies(al, bl) && temps_are_copies(ah, bh)) {
+    if (temps_are_copies(temps, al, bl) && temps_are_copies(temps, ah, bh)) {
         return do_constant_folding_cond_eq(c);
     }
     return 2;
 }
 
-static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
+static bool swap_commutative(const struct tcg_temp_info *temps, TCGArg dest,
+                             TCGArg *p1, TCGArg *p2)
 {
     TCGArg a1 = *p1, a2 = *p2;
     int sum = 0;
-    sum += temp_is_const(a1);
-    sum -= temp_is_const(a2);
+    sum += temp_is_const(temps, a1);
+    sum -= temp_is_const(temps, a2);
 
     /* Prefer the constant in second argument, and then the form
        op a, a, b, which is better handled on non-RISC hosts. */
@@ -539,13 +537,14 @@ static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
     return false;
 }
 
-static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
+static bool swap_commutative2(const struct tcg_temp_info *temps, TCGArg *p1,
+                              TCGArg *p2)
 {
     int sum = 0;
-    sum += temp_is_const(p1[0]);
-    sum += temp_is_const(p1[1]);
-    sum -= temp_is_const(p2[0]);
-    sum -= temp_is_const(p2[1]);
+    sum += temp_is_const(temps, p1[0]);
+    sum += temp_is_const(temps, p1[1]);
+    sum -= temp_is_const(temps, p2[0]);
+    sum -= temp_is_const(temps, p2[1]);
     if (sum > 0) {
         TCGArg t;
         t = p1[0], p1[0] = p2[0], p2[0] = t;
@@ -558,6 +557,8 @@ static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
 /* Propagate constants and copies, fold constant expressions. */
 void tcg_optimize(TCGContext *s)
 {
+    struct tcg_temp_info *temps;
+    unsigned long *temps_used;
     int oi, oi_next, nb_temps, nb_globals;
     TCGArg *prev_mb_args = NULL;
 
@@ -568,7 +569,8 @@ void tcg_optimize(TCGContext *s)
 
     nb_temps = s->nb_temps;
     nb_globals = s->nb_globals;
-    reset_all_temps(nb_temps);
+    temps = g_new(struct tcg_temp_info, nb_temps);
+    temps_used = bitmap_new(nb_temps);
 
     for (oi = s->gen_op_buf[0].next; oi != 0; oi = oi_next) {
         tcg_target_ulong mask, partmask, affected;
@@ -590,21 +592,21 @@ void tcg_optimize(TCGContext *s)
             for (i = 0; i < nb_oargs + nb_iargs; i++) {
                 tmp = args[i];
                 if (tmp != TCG_CALL_DUMMY_ARG) {
-                    init_temp_info(tmp);
+                    init_temp_info(temps, temps_used, tmp);
                 }
             }
         } else {
             nb_oargs = def->nb_oargs;
             nb_iargs = def->nb_iargs;
             for (i = 0; i < nb_oargs + nb_iargs; i++) {
-                init_temp_info(args[i]);
+                init_temp_info(temps, temps_used, args[i]);
             }
         }
 
         /* Do copy propagation */
         for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
-            if (temp_is_copy(args[i])) {
-                args[i] = find_better_copy(s, args[i]);
+            if (temp_is_copy(temps, args[i])) {
+                args[i] = find_better_copy(s, temps, args[i]);
             }
         }
 
@@ -620,44 +622,44 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(nor):
         CASE_OP_32_64(muluh):
         CASE_OP_32_64(mulsh):
-            swap_commutative(args[0], &args[1], &args[2]);
+            swap_commutative(temps, args[0], &args[1], &args[2]);
             break;
         CASE_OP_32_64(brcond):
-            if (swap_commutative(-1, &args[0], &args[1])) {
+            if (swap_commutative(temps, -1, &args[0], &args[1])) {
                 args[2] = tcg_swap_cond(args[2]);
             }
             break;
         CASE_OP_32_64(setcond):
-            if (swap_commutative(args[0], &args[1], &args[2])) {
+            if (swap_commutative(temps, args[0], &args[1], &args[2])) {
                 args[3] = tcg_swap_cond(args[3]);
             }
             break;
         CASE_OP_32_64(movcond):
-            if (swap_commutative(-1, &args[1], &args[2])) {
+            if (swap_commutative(temps, -1, &args[1], &args[2])) {
                 args[5] = tcg_swap_cond(args[5]);
             }
             /* For movcond, we canonicalize the "false" input reg to match
                the destination reg so that the tcg backend can implement
                a "move if true" operation.  */
-            if (swap_commutative(args[0], &args[4], &args[3])) {
+            if (swap_commutative(temps, args[0], &args[4], &args[3])) {
                 args[5] = tcg_invert_cond(args[5]);
             }
             break;
         CASE_OP_32_64(add2):
-            swap_commutative(args[0], &args[2], &args[4]);
-            swap_commutative(args[1], &args[3], &args[5]);
+            swap_commutative(temps, args[0], &args[2], &args[4]);
+            swap_commutative(temps, args[1], &args[3], &args[5]);
             break;
         CASE_OP_32_64(mulu2):
         CASE_OP_32_64(muls2):
-            swap_commutative(args[0], &args[2], &args[3]);
+            swap_commutative(temps, args[0], &args[2], &args[3]);
             break;
         case INDEX_op_brcond2_i32:
-            if (swap_commutative2(&args[0], &args[2])) {
+            if (swap_commutative2(temps, &args[0], &args[2])) {
                 args[4] = tcg_swap_cond(args[4]);
             }
             break;
         case INDEX_op_setcond2_i32:
-            if (swap_commutative2(&args[1], &args[3])) {
+            if (swap_commutative2(temps, &args[1], &args[3])) {
                 args[5] = tcg_swap_cond(args[5]);
             }
             break;
@@ -673,8 +675,8 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(sar):
         CASE_OP_32_64(rotl):
         CASE_OP_32_64(rotr):
-            if (temp_is_const(args[1]) && temps[args[1]].val == 0) {
-                tcg_opt_gen_movi(s, op, args, args[0], 0);
+            if (temp_is_const(temps, args[1]) && temps[args[1]].val == 0) {
+                tcg_opt_gen_movi(s, temps, op, args, args[0], 0);
                 continue;
             }
             break;
@@ -683,7 +685,7 @@ void tcg_optimize(TCGContext *s)
                 TCGOpcode neg_op;
                 bool have_neg;
 
-                if (temp_is_const(args[2])) {
+                if (temp_is_const(temps, args[2])) {
                     /* Proceed with possible constant folding. */
                     break;
                 }
@@ -697,9 +699,9 @@ void tcg_optimize(TCGContext *s)
                 if (!have_neg) {
                     break;
                 }
-                if (temp_is_const(args[1]) && temps[args[1]].val == 0) {
+                if (temp_is_const(temps, args[1]) && temps[args[1]].val == 0) {
                     op->opc = neg_op;
-                    reset_temp(args[0]);
+                    reset_temp(temps, args[0]);
                     args[1] = args[2];
                     continue;
                 }
@@ -707,30 +709,30 @@ void tcg_optimize(TCGContext *s)
             break;
         CASE_OP_32_64(xor):
         CASE_OP_32_64(nand):
-            if (!temp_is_const(args[1])
-                && temp_is_const(args[2]) && temps[args[2]].val == -1) {
+            if (!temp_is_const(temps, args[1])
+                && temp_is_const(temps, args[2]) && temps[args[2]].val == -1) {
                 i = 1;
                 goto try_not;
             }
             break;
         CASE_OP_32_64(nor):
-            if (!temp_is_const(args[1])
-                && temp_is_const(args[2]) && temps[args[2]].val == 0) {
+            if (!temp_is_const(temps, args[1])
+                && temp_is_const(temps, args[2]) && temps[args[2]].val == 0) {
                 i = 1;
                 goto try_not;
             }
             break;
         CASE_OP_32_64(andc):
-            if (!temp_is_const(args[2])
-                && temp_is_const(args[1]) && temps[args[1]].val == -1) {
+            if (!temp_is_const(temps, args[2])
+                && temp_is_const(temps, args[1]) && temps[args[1]].val == -1) {
                 i = 2;
                 goto try_not;
             }
             break;
         CASE_OP_32_64(orc):
         CASE_OP_32_64(eqv):
-            if (!temp_is_const(args[2])
-                && temp_is_const(args[1]) && temps[args[1]].val == 0) {
+            if (!temp_is_const(temps, args[2])
+                && temp_is_const(temps, args[1]) && temps[args[1]].val == 0) {
                 i = 2;
                 goto try_not;
             }
@@ -751,7 +753,7 @@ void tcg_optimize(TCGContext *s)
                     break;
                 }
                 op->opc = not_op;
-                reset_temp(args[0]);
+                reset_temp(temps, args[0]);
                 args[1] = args[i];
                 continue;
             }
@@ -771,18 +773,18 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(or):
         CASE_OP_32_64(xor):
         CASE_OP_32_64(andc):
-            if (!temp_is_const(args[1])
-                && temp_is_const(args[2]) && temps[args[2]].val == 0) {
-                tcg_opt_gen_mov(s, op, args, args[0], args[1]);
+            if (!temp_is_const(temps, args[1])
+                && temp_is_const(temps, args[2]) && temps[args[2]].val == 0) {
+                tcg_opt_gen_mov(s, temps, op, args, args[0], args[1]);
                 continue;
             }
             break;
         CASE_OP_32_64(and):
         CASE_OP_32_64(orc):
         CASE_OP_32_64(eqv):
-            if (!temp_is_const(args[1])
-                && temp_is_const(args[2]) && temps[args[2]].val == -1) {
-                tcg_opt_gen_mov(s, op, args, args[0], args[1]);
+            if (!temp_is_const(temps, args[1])
+                && temp_is_const(temps, args[2]) && temps[args[2]].val == -1) {
+                tcg_opt_gen_mov(s, temps, op, args, args[0], args[1]);
                 continue;
             }
             break;
@@ -819,7 +821,7 @@ void tcg_optimize(TCGContext *s)
 
         CASE_OP_32_64(and):
             mask = temps[args[2]].mask;
-            if (temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[2])) {
         and_const:
                 affected = temps[args[1]].mask & ~mask;
             }
@@ -838,7 +840,7 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(andc):
             /* Known-zeros does not imply known-ones.  Therefore unless
                args[2] is constant, we can't infer anything from it.  */
-            if (temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[2])) {
                 mask = ~temps[args[2]].mask;
                 goto and_const;
             }
@@ -847,26 +849,26 @@ void tcg_optimize(TCGContext *s)
             break;
 
         case INDEX_op_sar_i32:
-            if (temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[2])) {
                 tmp = temps[args[2]].val & 31;
                 mask = (int32_t)temps[args[1]].mask >> tmp;
             }
             break;
         case INDEX_op_sar_i64:
-            if (temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[2])) {
                 tmp = temps[args[2]].val & 63;
                 mask = (int64_t)temps[args[1]].mask >> tmp;
             }
             break;
 
         case INDEX_op_shr_i32:
-            if (temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[2])) {
                 tmp = temps[args[2]].val & 31;
                 mask = (uint32_t)temps[args[1]].mask >> tmp;
             }
             break;
         case INDEX_op_shr_i64:
-            if (temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[2])) {
                 tmp = temps[args[2]].val & 63;
                 mask = (uint64_t)temps[args[1]].mask >> tmp;
             }
@@ -880,7 +882,7 @@ void tcg_optimize(TCGContext *s)
             break;
 
         CASE_OP_32_64(shl):
-            if (temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[2])) {
                 tmp = temps[args[2]].val & (TCG_TARGET_REG_BITS - 1);
                 mask = temps[args[1]].mask << tmp;
             }
@@ -976,12 +978,12 @@ void tcg_optimize(TCGContext *s)
 
         if (partmask == 0) {
             tcg_debug_assert(nb_oargs == 1);
-            tcg_opt_gen_movi(s, op, args, args[0], 0);
+            tcg_opt_gen_movi(s, temps, op, args, args[0], 0);
             continue;
         }
         if (affected == 0) {
             tcg_debug_assert(nb_oargs == 1);
-            tcg_opt_gen_mov(s, op, args, args[0], args[1]);
+            tcg_opt_gen_mov(s, temps, op, args, args[0], args[1]);
             continue;
         }
 
@@ -991,8 +993,8 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(mul):
         CASE_OP_32_64(muluh):
         CASE_OP_32_64(mulsh):
-            if ((temp_is_const(args[2]) && temps[args[2]].val == 0)) {
-                tcg_opt_gen_movi(s, op, args, args[0], 0);
+            if ((temp_is_const(temps, args[2]) && temps[args[2]].val == 0)) {
+                tcg_opt_gen_movi(s, temps, op, args, args[0], 0);
                 continue;
             }
             break;
@@ -1004,8 +1006,8 @@ void tcg_optimize(TCGContext *s)
         switch (opc) {
         CASE_OP_32_64(or):
         CASE_OP_32_64(and):
-            if (temps_are_copies(args[1], args[2])) {
-                tcg_opt_gen_mov(s, op, args, args[0], args[1]);
+            if (temps_are_copies(temps, args[1], args[2])) {
+                tcg_opt_gen_mov(s, temps, op, args, args[0], args[1]);
                 continue;
             }
             break;
@@ -1018,8 +1020,8 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(andc):
         CASE_OP_32_64(sub):
         CASE_OP_32_64(xor):
-            if (temps_are_copies(args[1], args[2])) {
-                tcg_opt_gen_movi(s, op, args, args[0], 0);
+            if (temps_are_copies(temps, args[1], args[2])) {
+                tcg_opt_gen_movi(s, temps, op, args, args[0], 0);
                 continue;
             }
             break;
@@ -1032,10 +1034,10 @@ void tcg_optimize(TCGContext *s)
            allocator where needed and possible.  Also detect copies. */
         switch (opc) {
         CASE_OP_32_64(mov):
-            tcg_opt_gen_mov(s, op, args, args[0], args[1]);
+            tcg_opt_gen_mov(s, temps, op, args, args[0], args[1]);
             break;
         CASE_OP_32_64(movi):
-            tcg_opt_gen_movi(s, op, args, args[0], args[1]);
+            tcg_opt_gen_movi(s, temps, op, args, args[0], args[1]);
             break;
 
         CASE_OP_32_64(not):
@@ -1051,9 +1053,9 @@ void tcg_optimize(TCGContext *s)
         case INDEX_op_extu_i32_i64:
         case INDEX_op_extrl_i64_i32:
         case INDEX_op_extrh_i64_i32:
-            if (temp_is_const(args[1])) {
+            if (temp_is_const(temps, args[1])) {
                 tmp = do_constant_folding(opc, temps[args[1]].val, 0);
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                tcg_opt_gen_movi(s, temps, op, args, args[0], tmp);
                 break;
             }
             goto do_default;
@@ -1080,66 +1082,70 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(divu):
         CASE_OP_32_64(rem):
         CASE_OP_32_64(remu):
-            if (temp_is_const(args[1]) && temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[1]) &&
+                temp_is_const(temps, args[2])) {
                 tmp = do_constant_folding(opc, temps[args[1]].val,
                                           temps[args[2]].val);
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                tcg_opt_gen_movi(s, temps, op, args, args[0], tmp);
                 break;
             }
             goto do_default;
 
         CASE_OP_32_64(clz):
         CASE_OP_32_64(ctz):
-            if (temp_is_const(args[1])) {
+            if (temp_is_const(temps, args[1])) {
                 TCGArg v = temps[args[1]].val;
                 if (v != 0) {
                     tmp = do_constant_folding(opc, v, 0);
-                    tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                    tcg_opt_gen_movi(s, temps, op, args, args[0], tmp);
                 } else {
-                    tcg_opt_gen_mov(s, op, args, args[0], args[2]);
+                    tcg_opt_gen_mov(s, temps, op, args, args[0], args[2]);
                 }
                 break;
             }
             goto do_default;
 
         CASE_OP_32_64(deposit):
-            if (temp_is_const(args[1]) && temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[1]) &&
+                temp_is_const(temps, args[2])) {
                 tmp = deposit64(temps[args[1]].val, args[3], args[4],
                                 temps[args[2]].val);
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                tcg_opt_gen_movi(s, temps, op, args, args[0], tmp);
                 break;
             }
             goto do_default;
 
         CASE_OP_32_64(extract):
-            if (temp_is_const(args[1])) {
+            if (temp_is_const(temps, args[1])) {
                 tmp = extract64(temps[args[1]].val, args[2], args[3]);
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                tcg_opt_gen_movi(s, temps, op, args, args[0], tmp);
                 break;
             }
             goto do_default;
 
         CASE_OP_32_64(sextract):
-            if (temp_is_const(args[1])) {
+            if (temp_is_const(temps, args[1])) {
                 tmp = sextract64(temps[args[1]].val, args[2], args[3]);
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                tcg_opt_gen_movi(s, temps, op, args, args[0], tmp);
                 break;
             }
             goto do_default;
 
         CASE_OP_32_64(setcond):
-            tmp = do_constant_folding_cond(opc, args[1], args[2], args[3]);
+            tmp = do_constant_folding_cond(temps, opc, args[1], args[2],
+                                           args[3]);
             if (tmp != 2) {
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                tcg_opt_gen_movi(s, temps, op, args, args[0], tmp);
                 break;
             }
             goto do_default;
 
         CASE_OP_32_64(brcond):
-            tmp = do_constant_folding_cond(opc, args[0], args[1], args[2]);
+            tmp = do_constant_folding_cond(temps, opc, args[0], args[1],
+                                           args[2]);
             if (tmp != 2) {
                 if (tmp) {
-                    reset_all_temps(nb_temps);
+                    bitmap_zero(temps_used, nb_temps);
                     op->opc = INDEX_op_br;
                     args[0] = args[3];
                 } else {
@@ -1150,12 +1156,14 @@ void tcg_optimize(TCGContext *s)
             goto do_default;
 
         CASE_OP_32_64(movcond):
-            tmp = do_constant_folding_cond(opc, args[1], args[2], args[5]);
+            tmp = do_constant_folding_cond(temps, opc, args[1], args[2],
+                                           args[5]);
             if (tmp != 2) {
-                tcg_opt_gen_mov(s, op, args, args[0], args[4-tmp]);
+                tcg_opt_gen_mov(s, temps, op, args, args[0], args[4 - tmp]);
                 break;
             }
-            if (temp_is_const(args[3]) && temp_is_const(args[4])) {
+            if (temp_is_const(temps, args[3]) &&
+                temp_is_const(temps, args[4])) {
                 tcg_target_ulong tv = temps[args[3]].val;
                 tcg_target_ulong fv = temps[args[4]].val;
                 TCGCond cond = args[5];
@@ -1174,8 +1182,10 @@ void tcg_optimize(TCGContext *s)
 
         case INDEX_op_add2_i32:
         case INDEX_op_sub2_i32:
-            if (temp_is_const(args[2]) && temp_is_const(args[3])
-                && temp_is_const(args[4]) && temp_is_const(args[5])) {
+            if (temp_is_const(temps, args[2]) &&
+                temp_is_const(temps, args[3]) &&
+                temp_is_const(temps, args[4]) &&
+                temp_is_const(temps, args[5])) {
                 uint32_t al = temps[args[2]].val;
                 uint32_t ah = temps[args[3]].val;
                 uint32_t bl = temps[args[4]].val;
@@ -1194,8 +1204,8 @@ void tcg_optimize(TCGContext *s)
 
                 rl = args[0];
                 rh = args[1];
-                tcg_opt_gen_movi(s, op, args, rl, (int32_t)a);
-                tcg_opt_gen_movi(s, op2, args2, rh, (int32_t)(a >> 32));
+                tcg_opt_gen_movi(s, temps, op, args, rl, (int32_t)a);
+                tcg_opt_gen_movi(s, temps, op2, args2, rh, (int32_t)(a >> 32));
 
                 /* We've done all we need to do with the movi.  Skip it.  */
                 oi_next = op2->next;
@@ -1204,7 +1214,8 @@ void tcg_optimize(TCGContext *s)
             goto do_default;
 
         case INDEX_op_mulu2_i32:
-            if (temp_is_const(args[2]) && temp_is_const(args[3])) {
+            if (temp_is_const(temps, args[2]) &&
+                temp_is_const(temps, args[3])) {
                 uint32_t a = temps[args[2]].val;
                 uint32_t b = temps[args[3]].val;
                 uint64_t r = (uint64_t)a * b;
@@ -1214,8 +1225,8 @@ void tcg_optimize(TCGContext *s)
 
                 rl = args[0];
                 rh = args[1];
-                tcg_opt_gen_movi(s, op, args, rl, (int32_t)r);
-                tcg_opt_gen_movi(s, op2, args2, rh, (int32_t)(r >> 32));
+                tcg_opt_gen_movi(s, temps, op, args, rl, (int32_t)r);
+                tcg_opt_gen_movi(s, temps, op2, args2, rh, (int32_t)(r >> 32));
 
                 /* We've done all we need to do with the movi.  Skip it.  */
                 oi_next = op2->next;
@@ -1224,11 +1235,11 @@ void tcg_optimize(TCGContext *s)
             goto do_default;
 
         case INDEX_op_brcond2_i32:
-            tmp = do_constant_folding_cond2(&args[0], &args[2], args[4]);
+            tmp = do_constant_folding_cond2(temps, &args[0], &args[2], args[4]);
             if (tmp != 2) {
                 if (tmp) {
             do_brcond_true:
-                    reset_all_temps(nb_temps);
+                    bitmap_zero(temps_used, nb_temps);
                     op->opc = INDEX_op_br;
                     args[0] = args[5];
                 } else {
@@ -1236,12 +1247,14 @@ void tcg_optimize(TCGContext *s)
                     tcg_op_remove(s, op);
                 }
             } else if ((args[4] == TCG_COND_LT || args[4] == TCG_COND_GE)
-                       && temp_is_const(args[2]) && temps[args[2]].val == 0
-                       && temp_is_const(args[3]) && temps[args[3]].val == 0) {
+                       && temp_is_const(temps, args[2])
+                       && temps[args[2]].val == 0
+                       && temp_is_const(temps, args[3])
+                       && temps[args[3]].val == 0) {
                 /* Simplify LT/GE comparisons vs zero to a single compare
                    vs the high word of the input.  */
             do_brcond_high:
-                reset_all_temps(nb_temps);
+                bitmap_zero(temps_used, nb_temps);
                 op->opc = INDEX_op_brcond_i32;
                 args[0] = args[1];
                 args[1] = args[3];
@@ -1250,14 +1263,14 @@ void tcg_optimize(TCGContext *s)
             } else if (args[4] == TCG_COND_EQ) {
                 /* Simplify EQ comparisons where one of the pairs
                    can be simplified.  */
-                tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
+                tmp = do_constant_folding_cond(temps, INDEX_op_brcond_i32,
                                                args[0], args[2], TCG_COND_EQ);
                 if (tmp == 0) {
                     goto do_brcond_false;
                 } else if (tmp == 1) {
                     goto do_brcond_high;
                 }
-                tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
+                tmp = do_constant_folding_cond(temps, INDEX_op_brcond_i32,
                                                args[1], args[3], TCG_COND_EQ);
                 if (tmp == 0) {
                     goto do_brcond_false;
@@ -1265,7 +1278,7 @@ void tcg_optimize(TCGContext *s)
                     goto do_default;
                 }
             do_brcond_low:
-                reset_all_temps(nb_temps);
+                bitmap_zero(temps_used, nb_temps);
                 op->opc = INDEX_op_brcond_i32;
                 args[1] = args[2];
                 args[2] = args[4];
@@ -1273,14 +1286,14 @@ void tcg_optimize(TCGContext *s)
             } else if (args[4] == TCG_COND_NE) {
                 /* Simplify NE comparisons where one of the pairs
                    can be simplified.  */
-                tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
+                tmp = do_constant_folding_cond(temps, INDEX_op_brcond_i32,
                                                args[0], args[2], TCG_COND_NE);
                 if (tmp == 0) {
                     goto do_brcond_high;
                 } else if (tmp == 1) {
                     goto do_brcond_true;
                 }
-                tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
+                tmp = do_constant_folding_cond(temps, INDEX_op_brcond_i32,
                                                args[1], args[3], TCG_COND_NE);
                 if (tmp == 0) {
                     goto do_brcond_low;
@@ -1294,17 +1307,19 @@ void tcg_optimize(TCGContext *s)
             break;
 
         case INDEX_op_setcond2_i32:
-            tmp = do_constant_folding_cond2(&args[1], &args[3], args[5]);
+            tmp = do_constant_folding_cond2(temps, &args[1], &args[3], args[5]);
             if (tmp != 2) {
             do_setcond_const:
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                tcg_opt_gen_movi(s, temps, op, args, args[0], tmp);
             } else if ((args[5] == TCG_COND_LT || args[5] == TCG_COND_GE)
-                       && temp_is_const(args[3]) && temps[args[3]].val == 0
-                       && temp_is_const(args[4]) && temps[args[4]].val == 0) {
+                       && temp_is_const(temps, args[3])
+                       && temps[args[3]].val == 0
+                       && temp_is_const(temps, args[4])
+                       && temps[args[4]].val == 0) {
                 /* Simplify LT/GE comparisons vs zero to a single compare
                    vs the high word of the input.  */
             do_setcond_high:
-                reset_temp(args[0]);
+                reset_temp(temps, args[0]);
                 temps[args[0]].mask = 1;
                 op->opc = INDEX_op_setcond_i32;
                 args[1] = args[2];
@@ -1313,14 +1328,14 @@ void tcg_optimize(TCGContext *s)
             } else if (args[5] == TCG_COND_EQ) {
                 /* Simplify EQ comparisons where one of the pairs
                    can be simplified.  */
-                tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
+                tmp = do_constant_folding_cond(temps, INDEX_op_setcond_i32,
                                                args[1], args[3], TCG_COND_EQ);
                 if (tmp == 0) {
                     goto do_setcond_const;
                 } else if (tmp == 1) {
                     goto do_setcond_high;
                 }
-                tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
+                tmp = do_constant_folding_cond(temps, INDEX_op_setcond_i32,
                                                args[2], args[4], TCG_COND_EQ);
                 if (tmp == 0) {
                     goto do_setcond_high;
@@ -1328,7 +1343,7 @@ void tcg_optimize(TCGContext *s)
                     goto do_default;
                 }
             do_setcond_low:
-                reset_temp(args[0]);
+                reset_temp(temps, args[0]);
                 temps[args[0]].mask = 1;
                 op->opc = INDEX_op_setcond_i32;
                 args[2] = args[3];
@@ -1336,14 +1351,14 @@ void tcg_optimize(TCGContext *s)
             } else if (args[5] == TCG_COND_NE) {
                 /* Simplify NE comparisons where one of the pairs
                    can be simplified.  */
-                tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
+                tmp = do_constant_folding_cond(temps, INDEX_op_setcond_i32,
                                                args[1], args[3], TCG_COND_NE);
                 if (tmp == 0) {
                     goto do_setcond_high;
                 } else if (tmp == 1) {
                     goto do_setcond_const;
                 }
-                tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
+                tmp = do_constant_folding_cond(temps, INDEX_op_setcond_i32,
                                                args[2], args[4], TCG_COND_NE);
                 if (tmp == 0) {
                     goto do_setcond_low;
@@ -1360,8 +1375,8 @@ void tcg_optimize(TCGContext *s)
             if (!(args[nb_oargs + nb_iargs + 1]
                   & (TCG_CALL_NO_READ_GLOBALS | TCG_CALL_NO_WRITE_GLOBALS))) {
                 for (i = 0; i < nb_globals; i++) {
-                    if (test_bit(i, temps_used.l)) {
-                        reset_temp(i);
+                    if (test_bit(i, temps_used)) {
+                        reset_temp(temps, i);
                     }
                 }
             }
@@ -1375,11 +1390,11 @@ void tcg_optimize(TCGContext *s)
                block, otherwise we only trash the output args.  "mask" is
                the non-zero bits mask for the first output arg.  */
             if (def->flags & TCG_OPF_BB_END) {
-                reset_all_temps(nb_temps);
+                bitmap_zero(temps_used, nb_temps);
             } else {
         do_reset_output:
                 for (i = 0; i < nb_oargs; i++) {
-                    reset_temp(args[i]);
+                    reset_temp(temps, args[i]);
                     /* Save the corresponding known-zero bits mask for the
                        first output argument (only one supported so far). */
                     if (i == 0) {
@@ -1428,4 +1443,6 @@ void tcg_optimize(TCGContext *s)
             prev_mb_args = args;
         }
     }
+    g_free(temps);
+    g_free(temps_used);
 }
-- 
2.7.4


Re: [Qemu-devel] [PATCH v3 35/43] tcg: dynamically allocate optimizer temps
Posted by Richard Henderson 8 years, 6 months ago
On 07/19/2017 05:09 PM, Emilio G. Cota wrote:
> Groundwork for supporting multiple TCG contexts.
> 
> While at it, also allocate temps_used directly as a bitmap of the
> required size, instead of having a bitmap of TCG_MAX_TEMPS via
> TCGTempSet.
> 
> Performance-wise we lose about 2% in a translation-heavy workload
> such as booting+shutting down debian-arm:
> 
> Performance counter stats for 'taskset -c 0 arm-softmmu/qemu-system-arm \
> 	-machine type=virt -nographic -smp 1 -m 4096 \
> 	-netdev user,id=unet,hostfwd=tcp::2222-:22 \
> 	-device virtio-net-device,netdev=unet \
> 	-drive file=die-on-boot.qcow2,id=myblock,index=0,if=none \
> 	-device virtio-blk-device,drive=myblock \
> 	-kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \
> 	-name arm,debug-threads=on -smp 1' (10 runs):
> 
> Before:
>        19489.126318 task-clock                #    0.960 CPUs utilized            ( +-  0.96% )
>              23,697 context-switches          #    0.001 M/sec                    ( +-  0.51% )
>                   1 CPU-migrations            #    0.000 M/sec
>              19,953 page-faults               #    0.001 M/sec                    ( +-  0.40% )
>      56,214,402,410 cycles                    #    2.884 GHz                      ( +-  0.95% ) [83.34%]
>      25,516,669,513 stalled-cycles-frontend   #   45.39% frontend cycles idle     ( +-  0.69% ) [83.33%]
>      17,266,165,747 stalled-cycles-backend    #   30.71% backend  cycles idle     ( +-  0.59% ) [66.66%]
>      79,007,843,327 instructions              #    1.41  insns per cycle
>                                               #    0.32  stalled cycles per insn  ( +-  1.19% ) [83.34%]
>      13,136,600,416 branches                  #  674.048 M/sec                    ( +-  1.29% ) [83.34%]
>         274,715,270 branch-misses             #    2.09% of all branches          ( +-  0.79% ) [83.33%]
> 
>        20.300335944 seconds time elapsed                                          ( +-  0.55% )
> 
> After:
>        19917.737030 task-clock                #    0.955 CPUs utilized            ( +-  0.74% )
>              23,973 context-switches          #    0.001 M/sec                    ( +-  0.37% )
>                   1 CPU-migrations            #    0.000 M/sec
>              19,824 page-faults               #    0.001 M/sec                    ( +-  0.38% )
>      57,380,269,537 cycles                    #    2.881 GHz                      ( +-  0.70% ) [83.34%]
>      26,462,452,508 stalled-cycles-frontend   #   46.12% frontend cycles idle     ( +-  0.65% ) [83.34%]
>      17,970,546,047 stalled-cycles-backend    #   31.32% backend  cycles idle     ( +-  0.64% ) [66.67%]
>      79,527,238,334 instructions              #    1.39  insns per cycle
>                                               #    0.33  stalled cycles per insn  ( +-  0.79% ) [83.33%]
>      13,272,362,192 branches                  #  666.359 M/sec                    ( +-  0.83% ) [83.34%]
>         278,357,773 branch-misses             #    2.10% of all branches          ( +-  0.65% ) [83.33%]
> 
>        20.850558455 seconds time elapsed                                          ( +-  0.55% )
> 
> That is, 2.70% slowdown.

That's disappointing.  How about using tcg_malloc?

Maximum allocation is sizeof(tcg_temp_info) * TCG_MAX_TEMPS = 12288, which is 
less than TCG_POOL_CHUNK_SIZE, so we'll retain the allocation in the pool 
across translations.
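
To make the pool-retention point concrete, here is a stand-alone toy sketch of the idea behind tcg_malloc (the names pool_malloc/pool_reset and the chunk size are illustrative, not QEMU's actual implementation): the chunk is allocated once and merely rewound between translations, so a 12288-byte allocation per translation costs only a pointer bump after the first pass.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy pool allocator in the spirit of QEMU's tcg_malloc: one chunk is
 * kept alive across "translations"; pool_reset() just rewinds the
 * cursor, so repeated allocations of the same size reuse memory. */
#define POOL_CHUNK_SIZE 32768

static char *pool_buf;    /* the retained chunk */
static size_t pool_off;   /* current allocation cursor */

static void *pool_malloc(size_t size)
{
    assert(size <= POOL_CHUNK_SIZE);
    if (!pool_buf) {
        pool_buf = malloc(POOL_CHUNK_SIZE);  /* allocated exactly once */
    }
    if (pool_off + size > POOL_CHUNK_SIZE) {
        return NULL;  /* the real allocator would chain a new chunk */
    }
    void *p = pool_buf + pool_off;
    pool_off += size;
    return p;
}

static void pool_reset(void)
{
    pool_off = 0;  /* chunk survives; the next translation reuses it */
}
```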

Otherwise,

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

Re: [Qemu-devel] [PATCH v3 35/43] tcg: dynamically allocate optimizer temps
Posted by Emilio G. Cota 8 years, 6 months ago
On Wed, Jul 19, 2017 at 21:39:35 -1000, Richard Henderson wrote:
> On 07/19/2017 05:09 PM, Emilio G. Cota wrote:
> >Groundwork for supporting multiple TCG contexts.
> >That is, 2.70% slowdown.
> 
> That's disappointing.  How about using tcg_malloc?
> 
> Maximum allocation is sizeof(tcg_temp_info) * TCG_MAX_TEMPS = 12288, which
> is less than TCG_POOL_CHUNK_SIZE, so we'll retain the allocation in the pool
> across translations.

             exec time (s)  Relative slowdown wrt original (%)
---------------------------------------------------------------
 original     20.213321616                                  0.
 tcg_malloc   20.441130078                           1.1270214
 TCGContext   20.477846517                           1.3086662
 g_malloc     20.780527895                           2.8061013

So I will go with tcg_malloc.

BTW, is there any chance that the pool will be initialized before we copy
tcg_init_ctx? That'd mean the main thread has performed translation, which
seems unlikely to me. But then, should we bother clearing the TCGProfile
counters after we copy tcg_init_ctx? I don't see how the counters could be
non-zero without translation having occurred.

		E.

Re: [Qemu-devel] [PATCH v3 35/43] tcg: dynamically allocate optimizer temps
Posted by Richard Henderson 8 years, 6 months ago
On 07/20/2017 01:53 PM, Emilio G. Cota wrote:
> BTW, is there any chance that the pool will be initialized before we copy
> tcg_init_ctx? That'd mean the main thread has performed translation, which
> seems unlikely to me. But then, should we bother clearing the TCGProfile
> counters after we copy tcg_init_ctx? I don't see how the counters could be
> non-zero without translation having occurred.

I wouldn't think so.  This cpu setup should be happening very early.

We could perhaps look at arranging fields such that all the fields that are 
"shared" between the contexts are up front, and use the qemu standard

   memcpy(new, old, offsetof(TCGContext, end_common_fields));

trick, and zero the rest.
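
The offsetof trick described above can be illustrated with a stand-alone sketch (the Ctx struct and its field names are made up for illustration; QEMU's actual TCGContext would place its truly shared fields first): copy everything up to the first per-context field, then zero the remainder.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Shared fields come first; everything from per_ctx_counter onward is
 * per-context state that a fresh context should start with zeroed. */
typedef struct Ctx {
    int shared_a;
    long shared_b;
    /* per-context fields below; the first one marks the boundary */
    int per_ctx_counter;
    void *per_ctx_ptr;
} Ctx;

/* Copy the common prefix, zero the per-context tail. */
static void ctx_clone(Ctx *dst, const Ctx *src)
{
    size_t common = offsetof(Ctx, per_ctx_counter);
    memcpy(dst, src, common);
    memset((char *)dst + common, 0, sizeof(Ctx) - common);
}
```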


r~

Re: [Qemu-devel] [PATCH v3 35/43] tcg: dynamically allocate optimizer temps
Posted by Emilio G. Cota 8 years, 6 months ago
On Thu, Jul 20, 2017 at 14:02:53 -1000, Richard Henderson wrote:
> On 07/20/2017 01:53 PM, Emilio G. Cota wrote:
> >BTW, is there any chance that the pool will be initialized before we copy
> >tcg_init_ctx? That'd mean the main thread has performed translation, which
> >seems unlikely to me. But then, should we bother clearing the TCGProfile
> >counters after we copy tcg_init_ctx? I don't see how the counters could be
> >non-zero without translation having occurred.
> 
> I wouldn't think so.  This cpu setup should be happening very early.

OK. I've removed the clearing of prof in v4.

> We could perhaps look at arranging fields such that all the fields that are
> "shared" between the contexts are up front, and use the qemu standard
> 
>   memcpy(new, old, offsetof(TCGContext, end_common_fields));
> 
> trick, and zero the rest.

It'll be much faster if you do this because you're familiar with all
the fields in there (I'm not); I've added this to the "to do later"
list in v4's cover letter so that we do not forget.

v4 coming up.

		E.