[PATCH 4/4] tcg/optimize: optimize TSTNE using smask and zmask

Paolo Bonzini posted 4 patches 9 months ago
Maintainers: Paolo Bonzini <pbonzini@redhat.com>, Richard Henderson <richard.henderson@linaro.org>, Eduardo Habkost <eduardo@habkost.net>
[PATCH 4/4] tcg/optimize: optimize TSTNE using smask and zmask
Posted by Paolo Bonzini 9 months ago
Generalize the existing optimization of "TSTNE x,sign" and "TSTNE x,-1".

This can be useful in some cases when the i386 frontend creates opcodes
that test against 0xff or 0x80.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 tcg/optimize.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 752cc5c56b6..ab976a5bbe7 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -793,6 +793,7 @@ static int do_constant_folding_cond1(OptContext *ctx, TCGOp *op, TCGArg dest,
                                      TCGArg *p1, TCGArg *p2, TCGArg *pcond)
 {
     TCGCond cond;
+    TempOptInfo *i1;
     bool swap;
     int r;
 
@@ -810,19 +811,21 @@ static int do_constant_folding_cond1(OptContext *ctx, TCGOp *op, TCGArg dest,
         return -1;
     }
 
+    i1 = arg_info(*p1);
+
     /*
      * TSTNE x,x -> NE x,0
-     * TSTNE x,-1 -> NE x,0
+     * TSTNE x,i -> NE x,0 if i includes all nonzero bits of x
      */
-    if (args_are_copies(*p1, *p2) || arg_is_const_val(*p2, -1)) {
+    if (args_are_copies(*p1, *p2) ||
+        (arg_is_const(*p2) && (i1->z_mask & ~arg_info(*p2)->val) == 0)) {
         *p2 = arg_new_constant(ctx, 0);
         *pcond = tcg_tst_eqne_cond(cond);
         return -1;
     }
 
-    /* TSTNE x,sign -> LT x,0 */
-    if (arg_is_const_val(*p2, (ctx->type == TCG_TYPE_I32
-                               ? INT32_MIN : INT64_MIN))) {
+    /* TSTNE x,i -> LT x,0 if i only includes sign bit copies */
+    if (arg_is_const(*p2) && (arg_info(*p2)->val & ~i1->s_mask) == 0) {
         *p2 = arg_new_constant(ctx, 0);
         *pcond = tcg_tst_ltge_cond(cond);
         return -1;
-- 
2.43.2
Re: [PATCH 4/4] tcg/optimize: optimize TSTNE using smask and zmask
Posted by Richard Henderson 9 months ago
On 2/28/24 01:11, Paolo Bonzini wrote:
> -    /* TSTNE x,sign -> LT x,0 */
> -    if (arg_is_const_val(*p2, (ctx->type == TCG_TYPE_I32
> -                               ? INT32_MIN : INT64_MIN))) {
> +    /* TSTNE x,i -> LT x,0 if i only includes sign bit copies */
> +    if (arg_is_const(*p2) && (arg_info(*p2)->val & ~i1->s_mask) == 0) {

This is a good idea, but s_mask isn't defined like you think -- it is *repetitions* of the 
sign bit, but not including the sign bit itself.  For INT64_MIN, s_mask == 0.

So for TSTNE min,min, (min & ~0) != 0, so the test won't pass.

r~
Re: [PATCH 4/4] tcg/optimize: optimize TSTNE using smask and zmask
Posted by Paolo Bonzini 9 months ago
On 2/29/24 00:10, Richard Henderson wrote:
> On 2/28/24 01:11, Paolo Bonzini wrote:
>> -    /* TSTNE x,sign -> LT x,0 */
>> -    if (arg_is_const_val(*p2, (ctx->type == TCG_TYPE_I32
>> -                               ? INT32_MIN : INT64_MIN))) {
>> +    /* TSTNE x,i -> LT x,0 if i only includes sign bit copies */
>> +    if (arg_is_const(*p2) && (arg_info(*p2)->val & ~i1->s_mask) == 0) {
> 
> This is a good idea, but s_mask isn't defined like you think -- it is 
> *repetitions* of the sign bit, but not including the sign bit itself.  
> For INT64_MIN, s_mask == 0.
> 
> So for TSTNE min,min, (min & ~0) != 0, so the test won't pass.

Oh! So I have to squash:

diff --git a/tcg/optimize.c b/tcg/optimize.c
index ab976a5bbe7..44d1b1a6d8a 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -140,6 +140,12 @@ static inline bool arg_is_const_val(TCGArg arg, uint64_t val)
      return ts_is_const_val(arg_temp(arg), val);
  }
  
+/* Calculate all the copies of the sign bit, both redundant and not. */
+static inline uint64_t all_sign_bit_copies(TempOptInfo *info)
+{
+    return (info->s_mask >> 1) | INT64_MIN;
+}
+
  static inline bool ts_is_copy(TCGTemp *ts)
  {
      return ts_info(ts)->next_copy != ts;
@@ -825,7 +831,7 @@ static int do_constant_folding_cond1(OptContext *ctx, TCGOp *op, TCGArg dest,
      }
  
      /* TSTNE x,i -> LT x,0 if i only includes sign bit copies */
-    if (arg_is_const(*p2) && (arg_info(*p2)->val & ~i1->s_mask) == 0) {
+    if (arg_is_const(*p2) && (arg_info(*p2)->val & ~all_sign_bit_copies(i1)) == 0) {
          *p2 = arg_new_constant(ctx, 0);
          *pcond = tcg_tst_ltge_cond(cond);
          return -1;


I tested with

    movq $0xffffffff80000000, %rbx
    test %ebx, %ebx
    js y

and I get

  brcond_i64 cc_dst,$0x80000000,tstne,$L1

which works and matches your explanation:

  i1.s_mask == 0xffffffff00000000
  i2.val == 0x80000000
  all_sign_bit_copies(i1) == 0xffffffff80000000
  u2.val & ~all_sign_bit_copies(i1) == 0

Thanks!

Paolo


Re: [PATCH 4/4] tcg/optimize: optimize TSTNE using smask and zmask
Posted by Richard Henderson 9 months ago
On 2/28/24 23:35, Paolo Bonzini wrote:
> On 2/29/24 00:10, Richard Henderson wrote:
>> On 2/28/24 01:11, Paolo Bonzini wrote:
>>> -    /* TSTNE x,sign -> LT x,0 */
>>> -    if (arg_is_const_val(*p2, (ctx->type == TCG_TYPE_I32
>>> -                               ? INT32_MIN : INT64_MIN))) {
>>> +    /* TSTNE x,i -> LT x,0 if i only includes sign bit copies */
>>> +    if (arg_is_const(*p2) && (arg_info(*p2)->val & ~i1->s_mask) == 0) {
>>
>> This is a good idea, but s_mask isn't defined like you think -- it is *repetitions* of 
>> the sign bit, but not including the sign bit itself. For INT64_MIN, s_mask == 0.
>>
>> So for TSTNE min,min, (min & ~0) != 0, so the test won't pass.
> 
> Oh! So I have to squash:
> 
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index ab976a5bbe7..44d1b1a6d8a 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -140,6 +140,12 @@ static inline bool arg_is_const_val(TCGArg arg, uint64_t val)
>       return ts_is_const_val(arg_temp(arg), val);
>   }
> 
> +/* Calculate all the copies of the sign bit, both redundant and not. */
> +static inline uint64_t all_sign_bit_copies(TempOptInfo *info)
> +{
> +    return (info->s_mask >> 1) | INT64_MIN;
> +}

You need to care about type too -- for TCG_TYPE_I32, you'll want to OR in INT32_MIN.  The 
high bits of s_mask will be unknown (might be 1's from fold_masks, might be 0 from reset_ts).

But otherwise that's a good solution.


r~