On 30/10/2021 18:17, Richard Henderson wrote:
> On 10/29/21 1:23 PM, matheus.ferst@eldorado.org.br wrote:
>> From: Luis Pires <luis.pires@eldorado.org.br>
>>
>> Implement the following PowerISA v3.1 instruction:
>> cntlzdm: Count Leading Zeros Doubleword Under Bit Mask
>>
>> Suggested-by: Richard Henderson <richard.henderson@linaro.org>
>> Signed-off-by: Luis Pires <luis.pires@eldorado.org.br>
>> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
>> ---
>> v2:
>> - Inline implementation of cntlzdm
>> ---
>> target/ppc/insn32.decode | 1 +
>> target/ppc/translate/fixedpoint-impl.c.inc | 36 ++++++++++++++++++++++
>> 2 files changed, 37 insertions(+)
>>
>> diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
>> index 9cb9fc00b8..221cb00dd6 100644
>> --- a/target/ppc/insn32.decode
>> +++ b/target/ppc/insn32.decode
>> @@ -203,6 +203,7 @@ ADDPCIS 010011 ..... ..... ..........
>> 00010 . @DX
>> ## Fixed-Point Logical Instructions
>>
>> CFUGED 011111 ..... ..... ..... 0011011100 - @X
>> +CNTLZDM 011111 ..... ..... ..... 0000111011 - @X
>>
>> ### Float-Point Load Instructions
>>
>> diff --git a/target/ppc/translate/fixedpoint-impl.c.inc
>> b/target/ppc/translate/fixedpoint-impl.c.inc
>> index 0d9c6e0996..c9e9ae35df 100644
>> --- a/target/ppc/translate/fixedpoint-impl.c.inc
>> +++ b/target/ppc/translate/fixedpoint-impl.c.inc
>> @@ -413,3 +413,39 @@ static bool trans_CFUGED(DisasContext *ctx, arg_X
>> *a)
>> #endif
>> return true;
>> }
>> +
>> +#if defined(TARGET_PPC64)
>> +static void do_cntlzdm(TCGv_i64 dst, TCGv_i64 src, TCGv_i64 mask)
>> +{
>> + TCGv_i64 tmp;
>> + TCGLabel *l1;
>> +
>> + tmp = tcg_temp_local_new_i64();
>> + l1 = gen_new_label();
>> +
>> + tcg_gen_and_i64(tmp, src, mask);
>> + tcg_gen_clzi_i64(tmp, tmp, 64);
>> +
>> + tcg_gen_brcondi_i64(TCG_COND_EQ, tmp, 0, l1);
>> +
>> + tcg_gen_subfi_i64(tmp, 64, tmp);
>> + tcg_gen_shr_i64(tmp, mask, tmp);
>> + tcg_gen_ctpop_i64(tmp, tmp);
>> +
>> + gen_set_label(l1);
>> +
>> + tcg_gen_mov_i64(dst, tmp);
>
> This works, but a form without brcond would be better (due to how poorly
> tcg handles basic
> blocks).
>
I should've tried a little harder to get rid of this branch...
> How about
>
> tcg_gen_clzi_i64(tmp, tmp, 0);
>
> tcg_gen_xori_i64(tmp, tmp, 63);
> tcg_gen_shr_i64(tmp, mask, tmp);
> tcg_gen_shri_i64(tmp, tmp, 1);
>
> tcg_gen_ctpop_i64(dst, tmp);
>
> The middle 3 operations perform a shift between [1-64], such that we are
> assured of 0 for 64.
When src & mask == 0 we shouldn't shift mask (or shift it zero bits), so
I guess we can't have this shri. Maybe something like
tcg_gen_and_i64(t0, src, mask);
tcg_gen_clzi_i64(t0, t0, -1);
tcg_gen_setcondi_i64(TCG_COND_NE, t1, t0, -1);
tcg_gen_andi_i64(t0, t0, 63);
tcg_gen_xori_i64(t0, t0, 63);
tcg_gen_shr_i64(t0, mask, t0);
tcg_gen_shr_i64(t0, t0, t1);
tcg_gen_ctpop_i64(dst, t0);
So we still shift 63+1 bits when there are no leading zeros and shift 0
bits when it's all zeros.
>
> Either way,
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>
Thanks,
Matheus K. Ferst
Instituto de Pesquisas ELDORADO <http://www.eldorado.org.br/>
Analista de Software
Aviso Legal - Disclaimer <https://www.eldorado.org.br/disclaimer.html>