It looks like a typo. When the false value (C) is the constant -1, the
correct fold should be: R = B | ~A
Reproducer (LoongArch64 assembly):
.text
.globl _start
_start:
vldi $vr1, 3073
vldi $vr2, 1023
vbitsel.v $vr0, $vr2, $vr1, $vr2
vpickve2gr.d $a1, $vr0, 1
xori $a0, $a1, 1
li.w $a7, 93
syscall 0
Fixes: e58b977238e3 ("tcg/optimize: Optimize bitsel_vec")
Link: https://github.com/llvm/llvm-project/issues/159610
Signed-off-by: WANG Rui <wangrui@loongson.cn>
---
tcg/optimize.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 3638ab9fea..f69702b26e 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -1568,9 +1568,10 @@ static bool fold_bitsel_vec(OptContext *ctx, TCGOp *op)
return fold_and(ctx, op);
}
if (fv == -1 && TCG_TARGET_HAS_orc_vec) {
+ TCGArg ta = op->args[2];
op->opc = INDEX_op_orc_vec;
op->args[2] = op->args[1];
- op->args[1] = op->args[3];
+ op->args[1] = ta;
return fold_orc(ctx, op);
}
}
--
2.51.0
On 19.09.2025 15:49, WANG Rui wrote: > It looks like a typo. When the false value (C) is the constant -1, the > correct fold should be: R = B | ~A > > Reproducer (LoongArch64 assembly): > > .text > .globl _start > _start: > vldi $vr1, 3073 > vldi $vr2, 1023 > vbitsel.v $vr0, $vr2, $vr1, $vr2 > vpickve2gr.d $a1, $vr0, 1 > xori $a0, $a1, 1 > li.w $a7, 93 > syscall 0 > > Fixes: e58b977238e3 ("tcg/optimize: Optimize bitsel_vec") > Link: https://github.com/llvm/llvm-project/issues/159610 > Signed-off-by: WANG Rui <wangrui@loongson.cn> It also looks like qemu-stable@ material. Please let me know if it isn't. Thanks, /mjt > tcg/optimize.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/tcg/optimize.c b/tcg/optimize.c > index 3638ab9fea..f69702b26e 100644 > --- a/tcg/optimize.c > +++ b/tcg/optimize.c > @@ -1568,9 +1568,10 @@ static bool fold_bitsel_vec(OptContext *ctx, TCGOp *op) > return fold_and(ctx, op); > } > if (fv == -1 && TCG_TARGET_HAS_orc_vec) { > + TCGArg ta = op->args[2]; > op->opc = INDEX_op_orc_vec; > op->args[2] = op->args[1]; > - op->args[1] = op->args[3]; > + op->args[1] = ta; > return fold_orc(ctx, op); > } > }
On 9/24/25 12:31, Michael Tokarev wrote: > On 19.09.2025 15:49, WANG Rui wrote: >> It looks like a typo. When the false value (C) is the constant -1, the >> correct fold should be: R = B | ~A >> >> Reproducer (LoongArch64 assembly): >> >> .text >> .globl _start >> _start: >> vldi $vr1, 3073 >> vldi $vr2, 1023 >> vbitsel.v $vr0, $vr2, $vr1, $vr2 >> vpickve2gr.d $a1, $vr0, 1 >> xori $a0, $a1, 1 >> li.w $a7, 93 >> syscall 0 >> >> Fixes: e58b977238e3 ("tcg/optimize: Optimize bitsel_vec") >> Link: https://github.com/llvm/llvm-project/issues/159610 >> Signed-off-by: WANG Rui <wangrui@loongson.cn> > > It also looks like qemu-stable@ material. > > Please let me know if it isn't. It is, yes. r~
On 19/9/25 14:49, WANG Rui wrote: > It looks like a typo. Likely from the TCG_TARGET_HAS_andc_vec case. > When the false value (C) is the constant -1, the > correct fold should be: R = B | ~A > > Reproducer (LoongArch64 assembly): > > .text > .globl _start > _start: > vldi $vr1, 3073 > vldi $vr2, 1023 > vbitsel.v $vr0, $vr2, $vr1, $vr2 > vpickve2gr.d $a1, $vr0, 1 > xori $a0, $a1, 1 > li.w $a7, 93 > syscall 0 > > Fixes: e58b977238e3 ("tcg/optimize: Optimize bitsel_vec") > Link: https://github.com/llvm/llvm-project/issues/159610 > Signed-off-by: WANG Rui <wangrui@loongson.cn> > --- > tcg/optimize.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/tcg/optimize.c b/tcg/optimize.c > index 3638ab9fea..f69702b26e 100644 > --- a/tcg/optimize.c > +++ b/tcg/optimize.c > @@ -1568,9 +1568,10 @@ static bool fold_bitsel_vec(OptContext *ctx, TCGOp *op) > return fold_and(ctx, op); > } > if (fv == -1 && TCG_TARGET_HAS_orc_vec) { > + TCGArg ta = op->args[2]; > op->opc = INDEX_op_orc_vec; > op->args[2] = op->args[1]; > - op->args[1] = op->args[3]; > + op->args[1] = ta; > return fold_orc(ctx, op); > } > } Looks correct, but I don't understand the swap. Can't we justkeep the same argument order for an ORC opcode? I'd have done: -- >8 -- @@ -1569,8 +1569,6 @@ static bool fold_bitsel_vec(OptContext *ctx, TCGOp *op) } if (fv == -1 && TCG_TARGET_HAS_orc_vec) { op->opc = INDEX_op_orc_vec; - op->args[2] = op->args[1]; - op->args[1] = op->args[3]; return fold_orc(ctx, op); } } ---
On Wed, Sep 24, 2025 at 12:03 PM Philippe Mathieu-Daudé <philmd@linaro.org> wrote: > > On 19/9/25 14:49, WANG Rui wrote: > > It looks like a typo. > > Likely from the TCG_TARGET_HAS_andc_vec case. > > > When the false value (C) is the constant -1, the > > correct fold should be: R = B | ~A > > > > Reproducer (LoongArch64 assembly): > > > > .text > > .globl _start > > _start: > > vldi $vr1, 3073 > > vldi $vr2, 1023 > > vbitsel.v $vr0, $vr2, $vr1, $vr2 > > vpickve2gr.d $a1, $vr0, 1 > > xori $a0, $a1, 1 > > li.w $a7, 93 > > syscall 0 > > > > Fixes: e58b977238e3 ("tcg/optimize: Optimize bitsel_vec") > > Link: https://github.com/llvm/llvm-project/issues/159610 > > Signed-off-by: WANG Rui <wangrui@loongson.cn> > > --- > > tcg/optimize.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/tcg/optimize.c b/tcg/optimize.c > > index 3638ab9fea..f69702b26e 100644 > > --- a/tcg/optimize.c > > +++ b/tcg/optimize.c > > @@ -1568,9 +1568,10 @@ static bool fold_bitsel_vec(OptContext *ctx, TCGOp *op) > > return fold_and(ctx, op); > > } > > if (fv == -1 && TCG_TARGET_HAS_orc_vec) { > > + TCGArg ta = op->args[2]; > > op->opc = INDEX_op_orc_vec; > > op->args[2] = op->args[1]; > > - op->args[1] = op->args[3]; > > + op->args[1] = ta; > > return fold_orc(ctx, op); > > } > > } > Looks correct, but I don't understand the swap. Can't we justkeep the > same argument order for an ORC opcode? I'd have done: > > -- >8 -- > @@ -1569,8 +1569,6 @@ static bool fold_bitsel_vec(OptContext *ctx, TCGOp > *op) > } > if (fv == -1 && TCG_TARGET_HAS_orc_vec) { > op->opc = INDEX_op_orc_vec; > - op->args[2] = op->args[1]; > - op->args[1] = op->args[3]; > return fold_orc(ctx, op); > } Bitwise logic can be tricky and easy to get wrong. In general, (a | ~b) != (b | ~a). For example, when a = 0 and b = 1, the results differ. > } > ---
On 24/9/25 06:29, WANG Rui wrote: > On Wed, Sep 24, 2025 at 12:03 PM Philippe Mathieu-Daudé > <philmd@linaro.org> wrote: >> >> On 19/9/25 14:49, WANG Rui wrote: >>> It looks like a typo. >> >> Likely from the TCG_TARGET_HAS_andc_vec case. >> >>> When the false value (C) is the constant -1, the >>> correct fold should be: R = B | ~A >>> >>> Reproducer (LoongArch64 assembly): >>> >>> .text >>> .globl _start >>> _start: >>> vldi $vr1, 3073 >>> vldi $vr2, 1023 >>> vbitsel.v $vr0, $vr2, $vr1, $vr2 >>> vpickve2gr.d $a1, $vr0, 1 >>> xori $a0, $a1, 1 >>> li.w $a7, 93 >>> syscall 0 >>> >>> Fixes: e58b977238e3 ("tcg/optimize: Optimize bitsel_vec") >>> Link: https://github.com/llvm/llvm-project/issues/159610 >>> Signed-off-by: WANG Rui <wangrui@loongson.cn> >>> --- >>> tcg/optimize.c | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/tcg/optimize.c b/tcg/optimize.c >>> index 3638ab9fea..f69702b26e 100644 >>> --- a/tcg/optimize.c >>> +++ b/tcg/optimize.c >>> @@ -1568,9 +1568,10 @@ static bool fold_bitsel_vec(OptContext *ctx, TCGOp *op) >>> return fold_and(ctx, op); >>> } >>> if (fv == -1 && TCG_TARGET_HAS_orc_vec) { >>> + TCGArg ta = op->args[2]; >>> op->opc = INDEX_op_orc_vec; >>> op->args[2] = op->args[1]; >>> - op->args[1] = op->args[3]; >>> + op->args[1] = ta; >>> return fold_orc(ctx, op); >>> } >>> } >> Looks correct, but I don't understand the swap. Can't we justkeep the >> same argument order for an ORC opcode? I'd have done: >> >> -- >8 -- >> @@ -1569,8 +1569,6 @@ static bool fold_bitsel_vec(OptContext *ctx, TCGOp >> *op) >> } >> if (fv == -1 && TCG_TARGET_HAS_orc_vec) { >> op->opc = INDEX_op_orc_vec; >> - op->args[2] = op->args[1]; >> - op->args[1] = op->args[3]; >> return fold_orc(ctx, op); >> } > > Bitwise logic can be tricky and easy to get wrong. In general, (a | > ~b) != (b | ~a). For example, when a = 0 and b = 1, the results > differ. Oh right. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> > >> } >> --- >
On 9/19/25 05:49, WANG Rui wrote: > It looks like a typo. When the false value (C) is the constant -1, the > correct fold should be: R = B | ~A > > Reproducer (LoongArch64 assembly): > > .text > .globl _start > _start: > vldi $vr1, 3073 > vldi $vr2, 1023 > vbitsel.v $vr0, $vr2, $vr1, $vr2 > vpickve2gr.d $a1, $vr0, 1 > xori $a0, $a1, 1 > li.w $a7, 93 > syscall 0 > > Fixes: e58b977238e3 ("tcg/optimize: Optimize bitsel_vec") > Link: https://github.com/llvm/llvm-project/issues/159610 > Signed-off-by: WANG Rui <wangrui@loongson.cn> > --- > tcg/optimize.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) Reviewed-by: Richard Henderson <richard.henderson@linaro.org> r~ > > diff --git a/tcg/optimize.c b/tcg/optimize.c > index 3638ab9fea..f69702b26e 100644 > --- a/tcg/optimize.c > +++ b/tcg/optimize.c > @@ -1568,9 +1568,10 @@ static bool fold_bitsel_vec(OptContext *ctx, TCGOp *op) > return fold_and(ctx, op); > } > if (fv == -1 && TCG_TARGET_HAS_orc_vec) { > + TCGArg ta = op->args[2]; > op->opc = INDEX_op_orc_vec; > op->args[2] = op->args[1]; > - op->args[1] = op->args[3]; > + op->args[1] = ta; > return fold_orc(ctx, op); > } > }
© 2016 - 2025 Red Hat, Inc.