[PATCH for 10.0] tcg/optimize: fold recursively after optimizing deposit

Paolo Bonzini posted 1 patch 2 days, 4 hours ago
When generating code for x86 targets, this is able to simplify XOR+SETcc
sequences.  SETcc generates a setcond+deposit pair of TCG opcodes which
used to become setcond+ext32u after optimization; now TCG recognizes
that the output of setcond is itself already zero extended and turns
the deposit into just a mov.

There are similar cases in fold_movcond and fold_setcond_zmask, but I couldn't
trigger them, and handling them would require moving functions around to avoid
forward references[1], so I am leaving them aside for now.

[1] I assume the lack of forward references is intentional, in order to
avoid possible mutual recursion.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 tcg/optimize.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index e9ef16b3c6b..e0fdaeb5500 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -1620,7 +1620,7 @@ static bool fold_deposit(OptContext *ctx, TCGOp *op)
         op->args[1] = op->args[2];
         op->args[2] = arg_new_constant(ctx, mask);
         ctx->z_mask = mask & arg_info(op->args[1])->z_mask;
-        return false;
+        return fold_and(ctx, op);
     }
 
     /* Inserting zero into a value. */
@@ -1630,7 +1630,7 @@ static bool fold_deposit(OptContext *ctx, TCGOp *op)
         op->opc = and_opc;
         op->args[2] = arg_new_constant(ctx, mask);
         ctx->z_mask = mask & arg_info(op->args[1])->z_mask;
-        return false;
+        return fold_and(ctx, op);
     }
 
     ctx->z_mask = deposit64(arg_info(op->args[1])->z_mask,
-- 
2.47.0
Re: [PATCH for 10.0] tcg/optimize: fold recursively after optimizing deposit
Posted by Richard Henderson 1 day, 9 hours ago
On 11/21/24 02:19, Paolo Bonzini wrote:
> When generating code for x86 targets, this is able to simplify XOR+SETcc
> sequences.  SETcc generates a setcond+deposit pair of TCG opcodes which
> used to become setcond+ext32u after optimization; now TCG recognizes
> that the output of setcond is itself already zero extended and turns
> the deposit into just a mov.
> 
> There are similar cases in fold_movcond and fold_setcond_zmask, but I couldn't
> trigger them and they require moving around functions to avoid forward
> references[1], so I am leaving them aside for now.
> 
> [1] I assume the lack of forward references is intentional in order to
> avoid possible mutual recursion
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   tcg/optimize.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)

As far as this goes, it's certainly correct.  See also

https://lore.kernel.org/qemu-devel/20240312143839.136408-1-richard.henderson@linaro.org/

which I failed to pick up after the 9.0 release.  :-/


r~

> 
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index e9ef16b3c6b..e0fdaeb5500 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -1620,7 +1620,7 @@ static bool fold_deposit(OptContext *ctx, TCGOp *op)
>           op->args[1] = op->args[2];
>           op->args[2] = arg_new_constant(ctx, mask);
>           ctx->z_mask = mask & arg_info(op->args[1])->z_mask;
> -        return false;
> +        return fold_and(ctx, op);
>       }
>   
>       /* Inserting zero into a value. */
> @@ -1630,7 +1630,7 @@ static bool fold_deposit(OptContext *ctx, TCGOp *op)
>           op->opc = and_opc;
>           op->args[2] = arg_new_constant(ctx, mask);
>           ctx->z_mask = mask & arg_info(op->args[1])->z_mask;
> -        return false;
> +        return fold_and(ctx, op);
>       }
>   
>       ctx->z_mask = deposit64(arg_info(op->args[1])->z_mask,