From: Stefan Brankovic <stefan.brankovic@rt-rk.com>
To: qemu-devel@nongnu.org
Cc: david@gibson.dropbear.id.au
Date: Fri, 21 Jun 2019 13:07:11 +0200
Message-Id: <1561115232-17155-8-git-send-email-stefan.brankovic@rt-rk.com>
In-Reply-To: <1561115232-17155-1-git-send-email-stefan.brankovic@rt-rk.com>
Subject: [Qemu-devel] [PATCH v3 7/8] target/ppc: Optimize emulation of vclzh and vclzb instructions

Optimize Altivec instruction vclzh (Vector Count Leading Zeros Halfword). This instruction counts the number of leading zeros of each halfword element in the source register and places the result in the appropriate halfword element of the destination register.
In each iteration of the outer for loop, the count operation is performed on one doubleword element of source register vB. In the first iteration, the higher doubleword element of vB is placed in variable avr, and then counting for every halfword element is performed using tcg_gen_clzi_i64. Since this counts leading zeros over a 64-bit length, the i-th halfword element has to be moved to the highest 16 bits of tmp, or-ed with a mask (in order to get all ones in the lowest 48 bits); then tcg_gen_clzi_i64 is performed and its result is moved to the appropriate halfword element of result. This is done in the inner for loop. After the operation is finished, the result is saved in the appropriate doubleword element of destination register vD. The same sequence of operations is then applied to the lower doubleword element of vB.

Optimize Altivec instruction vclzb (Vector Count Leading Zeros Byte). This instruction counts the number of leading zeros of each byte element in the source register and places the result in the appropriate byte element of the destination register.

In each iteration of the outer for loop, the count operation is performed on one doubleword element of source register vB. In the first iteration, the higher doubleword element of vB is placed in variable avr, and then counting for every byte element is performed using tcg_gen_clzi_i64. Since this counts leading zeros over a 64-bit length, the i-th byte element has to be moved to the highest 8 bits of tmp, or-ed with a mask (in order to get all ones in the lowest 56 bits); then tcg_gen_clzi_i64 is performed and its result is moved to the appropriate byte element of result. This is done in the inner for loop. After the operation is finished, the result is saved in the appropriate doubleword element of destination register vD. The same sequence of operations is then applied to the lower doubleword element of vB.
Signed-off-by: Stefan Brankovic <stefan.brankovic@rt-rk.com>
---
 target/ppc/helper.h                 |   2 -
 target/ppc/int_helper.c             |   9 ---
 target/ppc/translate/vmx-impl.inc.c | 122 +++++++++++++++++++++++++++++++++-
 3 files changed, 120 insertions(+), 13 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 595241c..17b4b06 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -303,8 +303,6 @@ DEF_HELPER_4(vcfsx, void, env, avr, avr, i32)
 DEF_HELPER_4(vctuxs, void, env, avr, avr, i32)
 DEF_HELPER_4(vctsxs, void, env, avr, avr, i32)
 
-DEF_HELPER_2(vclzb, void, avr, avr)
-DEF_HELPER_2(vclzh, void, avr, avr)
 DEF_HELPER_2(vctzb, void, avr, avr)
 DEF_HELPER_2(vctzh, void, avr, avr)
 DEF_HELPER_2(vctzw, void, avr, avr)
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 82cb12e..264b5e7 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1800,15 +1800,6 @@ VUPK(lsw, s64, s32, UPKLO)
     } \
 }
 
-#define clzb(v) ((v) ? clz32((uint32_t)(v) << 24) : 8)
-#define clzh(v) ((v) ? clz32((uint32_t)(v) << 16) : 16)
-
-VGENERIC_DO(clzb, u8)
-VGENERIC_DO(clzh, u16)
-
-#undef clzb
-#undef clzh
-
 #define ctzb(v) ((v) ? ctz32(v) : 8)
 #define ctzh(v) ((v) ? ctz32(v) : 16)
 #define ctzw(v) ctz32((v))
diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index 9ed2fae..518d9de 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -840,6 +840,124 @@ static void trans_vgbbd(DisasContext *ctx)
 }
 
 /*
+ * vclzb VRT,VRB - Vector Count Leading Zeros Byte
+ *
+ * Counting the number of leading zero bits of each byte element in source
+ * register and placing result in appropriate byte element of destination
+ * register.
+ */
+static void trans_vclzb(DisasContext *ctx)
+{
+    int VT = rD(ctx->opcode);
+    int VB = rB(ctx->opcode);
+    TCGv_i64 avr = tcg_temp_new_i64();
+    TCGv_i64 result = tcg_temp_new_i64();
+    TCGv_i64 tmp = tcg_temp_new_i64();
+    TCGv_i64 mask = tcg_const_i64(0xffffffffffffffULL);
+    int i, j;
+
+    for (i = 0; i < 2; i++) {
+        if (i == 0) {
+            /* Get high doubleword of vB in avr. */
+            get_avr64(avr, VB, true);
+        } else {
+            /* Get low doubleword of vB in avr. */
+            get_avr64(avr, VB, false);
+        }
+        /*
+         * Perform count for every byte element using tcg_gen_clzi_i64.
+         * Since it counts leading zeros on 64-bit length, we have to move
+         * the ith byte element to the highest 8 bits of tmp, or it with
+         * mask (so we get all ones in the lowest 56 bits), then perform
+         * tcg_gen_clzi_i64 and move its result to the appropriate byte
+         * element of result.
+         */
+        tcg_gen_shli_i64(tmp, avr, 56);
+        tcg_gen_or_i64(tmp, tmp, mask);
+        tcg_gen_clzi_i64(result, tmp, 64);
+        for (j = 1; j < 7; j++) {
+            tcg_gen_shli_i64(tmp, avr, (7 - j) * 8);
+            tcg_gen_or_i64(tmp, tmp, mask);
+            tcg_gen_clzi_i64(tmp, tmp, 64);
+            tcg_gen_deposit_i64(result, result, tmp, j * 8, 8);
+        }
+        tcg_gen_or_i64(tmp, avr, mask);
+        tcg_gen_clzi_i64(tmp, tmp, 64);
+        tcg_gen_deposit_i64(result, result, tmp, 56, 8);
+        if (i == 0) {
+            /* Place result in high doubleword element of vD. */
+            set_avr64(VT, result, true);
+        } else {
+            /* Place result in low doubleword element of vD. */
+            set_avr64(VT, result, false);
+        }
+    }
+
+    tcg_temp_free_i64(avr);
+    tcg_temp_free_i64(result);
+    tcg_temp_free_i64(tmp);
+    tcg_temp_free_i64(mask);
+}
+
+/*
+ * vclzh VRT,VRB - Vector Count Leading Zeros Halfword
+ *
+ * Counting the number of leading zero bits of each halfword element in source
+ * register and placing result in appropriate halfword element of destination
+ * register.
+ */
+static void trans_vclzh(DisasContext *ctx)
+{
+    int VT = rD(ctx->opcode);
+    int VB = rB(ctx->opcode);
+    TCGv_i64 avr = tcg_temp_new_i64();
+    TCGv_i64 result = tcg_temp_new_i64();
+    TCGv_i64 tmp = tcg_temp_new_i64();
+    TCGv_i64 mask = tcg_const_i64(0xffffffffffffULL);
+    int i, j;
+
+    for (i = 0; i < 2; i++) {
+        if (i == 0) {
+            /* Get high doubleword element of vB in avr. */
+            get_avr64(avr, VB, true);
+        } else {
+            /* Get low doubleword element of vB in avr. */
+            get_avr64(avr, VB, false);
+        }
+        /*
+         * Perform count for every halfword element using tcg_gen_clzi_i64.
+         * Since it counts leading zeros on 64-bit length, we have to move
+         * the ith halfword element to the highest 16 bits of tmp, or it
+         * with mask (so we get all ones in the lowest 48 bits), then
+         * perform tcg_gen_clzi_i64 and move its result to the appropriate
+         * halfword element of result.
+         */
+        tcg_gen_shli_i64(tmp, avr, 48);
+        tcg_gen_or_i64(tmp, tmp, mask);
+        tcg_gen_clzi_i64(result, tmp, 64);
+        for (j = 1; j < 3; j++) {
+            tcg_gen_shli_i64(tmp, avr, (3 - j) * 16);
+            tcg_gen_or_i64(tmp, tmp, mask);
+            tcg_gen_clzi_i64(tmp, tmp, 64);
+            tcg_gen_deposit_i64(result, result, tmp, j * 16, 16);
+        }
+        tcg_gen_or_i64(tmp, avr, mask);
+        tcg_gen_clzi_i64(tmp, tmp, 64);
+        tcg_gen_deposit_i64(result, result, tmp, 48, 16);
+        if (i == 0) {
+            /* Place result in high doubleword element of vD. */
+            set_avr64(VT, result, true);
+        } else {
+            /* Place result in low doubleword element of vD. */
+            set_avr64(VT, result, false);
+        }
+    }
+
+    tcg_temp_free_i64(avr);
+    tcg_temp_free_i64(result);
+    tcg_temp_free_i64(tmp);
+    tcg_temp_free_i64(mask);
+}
+
+/*
  * vclzw VRT,VRB - Vector Count Leading Zeros Word
  *
  * Counting the number of leading zero bits of each word element in source
@@ -1404,8 +1522,8 @@ GEN_VAFORM_PAIRED(vmsumshm, vmsumshs, 20)
 GEN_VAFORM_PAIRED(vsel, vperm, 21)
 GEN_VAFORM_PAIRED(vmaddfp, vnmsubfp, 23)
 
-GEN_VXFORM_NOA(vclzb, 1, 28)
-GEN_VXFORM_NOA(vclzh, 1, 29)
+GEN_VXFORM_TRANS(vclzb, 1, 28)
+GEN_VXFORM_TRANS(vclzh, 1, 29)
 GEN_VXFORM_TRANS(vclzw, 1, 30)
 GEN_VXFORM_TRANS(vclzd, 1, 31)
 GEN_VXFORM_NOA_2(vnegw, 1, 24, 6)
-- 
2.7.4