From nobody Thu May 16 14:37:59 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org ARC-Seal: i=1; a=rsa-sha256; t=1571311812; cv=none; d=zoho.com; s=zohoarc; b=lrtIfqf188sLVdYZSSupsNd1T+hUUqNYFPC5D+JzQjoJTSli8Rhc8pUSiW86qRscd00rR2rnXT+semUzZOmxN4YXUsGw9nwHZysjqJzZP0zvBl+0GgOK3lnhaUT9PQo1rtkKgp8cMF8856Ob4u8udTgLKbHTOqyByHx5PdC2dhs= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zoho.com; s=zohoarc; t=1571311812; h=Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:Message-ID:References:Sender:Subject:To; bh=WaLMYxvokNkqf7mVRpffLoGco8MTE8mFrX7WCaddrmI=; b=ie1GVgFN74wceIO6sJusSaC93BdvWKe7Xnbvcpk0eK8UbLogTU/u6eobATsLKz+UI3CMypKGNox5MQgGZ0LgsAtSY+aHbSt0Ok+FerpBjdUDOQMnjvDkcfQ7ksqFteFrcvOQ/MA5TL2vA1wY+weyOtpArjtkXvo1SEIM5nqnn5g= ARC-Authentication-Results: i=1; mx.zoho.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1571311812065832.4677494585744; Thu, 17 Oct 2019 04:30:12 -0700 (PDT) Received: from localhost ([::1]:44273 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1iL3ys-0002Qu-Tk for importer@patchew.org; Thu, 17 Oct 2019 07:30:10 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:46650) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1iL3wc-0008FB-QO for qemu-devel@nongnu.org; Thu, 17 Oct 2019 07:27:52 -0400 Received: from Debian-exim by eggs.gnu.org with 
spam-scanned (Exim 4.71) (envelope-from ) id 1iL3wb-0007K4-1b for qemu-devel@nongnu.org; Thu, 17 Oct 2019 07:27:50 -0400 Received: from mx2.rt-rk.com ([89.216.37.149]:33235 helo=mail.rt-rk.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1iL3wa-0007Ij-NJ for qemu-devel@nongnu.org; Thu, 17 Oct 2019 07:27:48 -0400 Received: from localhost (localhost [127.0.0.1]) by mail.rt-rk.com (Postfix) with ESMTP id 505241A21C6; Thu, 17 Oct 2019 13:27:43 +0200 (CEST) Received: from rtrkw870-lin.domain.local (rtrkw870-lin.domain.local [10.10.14.77]) by mail.rt-rk.com (Postfix) with ESMTPSA id E6BC71A21BA; Thu, 17 Oct 2019 13:27:42 +0200 (CEST) X-Virus-Scanned: amavisd-new at rt-rk.com From: Stefan Brankovic To: qemu-devel@nongnu.org Subject: [PATCH v7 1/3] target/ppc: Optimize emulation of vclzh and vclzb instructions Date: Thu, 17 Oct 2019 13:27:37 +0200 Message-Id: <1571311659-15556-2-git-send-email-stefan.brankovic@rt-rk.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1571311659-15556-1-git-send-email-stefan.brankovic@rt-rk.com> References: <1571311659-15556-1-git-send-email-stefan.brankovic@rt-rk.com> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [fuzzy] X-Received-From: 89.216.37.149 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: stefan.brankovic@rt-rk.com, richard.henderson@linaro.org, david@gibson.dropbear.id.au Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Optimize Altivec instruction vclzh (Vector Count Leading Zeros Halfword). This instruction counts the number of leading zeros of each halfword element in source register and places result in the appropriate halfword element of destination register. 
Each iteration of the outer for loop performs the count on one doubleword element of the source register vB. In the first iteration, the higher doubleword element of vB is placed in the variable avr, and the count for every halfword element is then performed using tcg_gen_clzi_i64. Since tcg_gen_clzi_i64 counts leading zeros over a 64-bit length, the i-th halfword element has to be moved to the highest 16 bits of tmp and or-ed with mask (in order to get all ones in the lowest 48 bits); tcg_gen_clzi_i64 is then performed and its result is moved into the appropriate halfword element of result. This is done in the inner for loop, with the first and last halfword elements handled just outside it. After the operation is finished, the result is saved in the appropriate doubleword element of the destination register vD. The same sequence of operations is then applied to the lower doubleword element of vB.

Optimize Altivec instruction vclzb (Vector Count Leading Zeros Byte). This instruction counts the number of leading zeros of each byte element in the source register and places the result in the appropriate byte element of the destination register.

Each iteration of the outer for loop performs the count on one doubleword element of the source register vB. In the first iteration, the higher doubleword element of vB is placed in the variable avr, and the count for every byte element is then performed using tcg_gen_clzi_i64. Since tcg_gen_clzi_i64 counts leading zeros over a 64-bit length, the i-th byte element has to be moved to the highest 8 bits of the variable tmp and or-ed with mask (in order to get all ones in the lowest 56 bits); tcg_gen_clzi_i64 is then performed and its result is moved into the appropriate byte element of result. This is done in the inner for loop, with the first and last byte elements handled just outside it. After the operation is finished, the result is saved in the appropriate doubleword element of the destination register vD. The same sequence of operations is then applied to the lower doubleword element of vB.
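For readers who want to check the shift-and-mask idea outside of TCG, here is a minimal scalar sketch in plain C. It is not the code from this patch: the names clz64_nonzero, clz_bytes and clz_halfwords are hypothetical, and the portable loop in clz64_nonzero only stands in for what tcg_gen_clzi_i64 computes on a nonzero input. The masks and shift amounts follow the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: leading-zero count of a nonzero 64-bit value. */
static int clz64_nonzero(uint64_t x)
{
    int n = 0;

    assert(x != 0);
    while (!(x & (1ULL << 63))) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Scalar model of vclzb on one doubleword: per-byte leading-zero counts. */
static uint64_t clz_bytes(uint64_t dw)
{
    const uint64_t mask = 0x00ffffffffffffffULL; /* ones in the low 56 bits */
    uint64_t result = 0;
    int j;

    for (j = 0; j < 8; j++) {
        /* Move byte j to the top 8 bits; the mask caps the count at 8. */
        uint64_t tmp = (dw << ((7 - j) * 8)) | mask;

        result |= (uint64_t)clz64_nonzero(tmp) << (j * 8);
    }
    return result;
}

/* Scalar model of vclzh on one doubleword: per-halfword counts. */
static uint64_t clz_halfwords(uint64_t dw)
{
    const uint64_t mask = 0x0000ffffffffffffULL; /* ones in the low 48 bits */
    uint64_t result = 0;
    int j;

    for (j = 0; j < 4; j++) {
        /* Move halfword j to the top 16 bits; the mask caps the count at 16. */
        uint64_t tmp = (dw << ((3 - j) * 16)) | mask;

        result |= (uint64_t)clz64_nonzero(tmp) << (j * 16);
    }
    return result;
}
```

Because the mask guarantees a nonzero input, the zero-input fallback argument of the count operation never comes into play, and an all-zero element yields exactly the element width (8 or 16).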
Signed-off-by: Stefan Brankovic --- target/ppc/helper.h | 2 - target/ppc/int_helper.c | 9 --- target/ppc/translate/vmx-impl.inc.c | 136 ++++++++++++++++++++++++++++++++= +++- 3 files changed, 134 insertions(+), 13 deletions(-) diff --git a/target/ppc/helper.h b/target/ppc/helper.h index f843814..281e54f 100644 --- a/target/ppc/helper.h +++ b/target/ppc/helper.h @@ -308,8 +308,6 @@ DEF_HELPER_4(vcfsx, void, env, avr, avr, i32) DEF_HELPER_4(vctuxs, void, env, avr, avr, i32) DEF_HELPER_4(vctsxs, void, env, avr, avr, i32) =20 -DEF_HELPER_2(vclzb, void, avr, avr) -DEF_HELPER_2(vclzh, void, avr, avr) DEF_HELPER_2(vctzb, void, avr, avr) DEF_HELPER_2(vctzh, void, avr, avr) DEF_HELPER_2(vctzw, void, avr, avr) diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c index 6d238b9..cd00f5e 100644 --- a/target/ppc/int_helper.c +++ b/target/ppc/int_helper.c @@ -1817,15 +1817,6 @@ VUPK(lsw, s64, s32, UPKLO) } \ } =20 -#define clzb(v) ((v) ? clz32((uint32_t)(v) << 24) : 8) -#define clzh(v) ((v) ? clz32((uint32_t)(v) << 16) : 16) - -VGENERIC_DO(clzb, u8) -VGENERIC_DO(clzh, u16) - -#undef clzb -#undef clzh - #define ctzb(v) ((v) ? ctz32(v) : 8) #define ctzh(v) ((v) ? ctz32(v) : 16) #define ctzw(v) ctz32((v)) diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx= -impl.inc.c index 2472a52..a428ef3 100644 --- a/target/ppc/translate/vmx-impl.inc.c +++ b/target/ppc/translate/vmx-impl.inc.c @@ -751,6 +751,138 @@ static void trans_vgbbd(DisasContext *ctx) } =20 /* + * vclzb VRT,VRB - Vector Count Leading Zeros Byte + * + * Counting the number of leading zero bits of each byte element in source + * register and placing result in appropriate byte element of destination + * register. 
+ */ +static void trans_vclzb(DisasContext *ctx) +{ + int VT =3D rD(ctx->opcode); + int VB =3D rB(ctx->opcode); + TCGv_i64 avr =3D tcg_temp_new_i64(); + TCGv_i64 result =3D tcg_temp_new_i64(); + TCGv_i64 result1 =3D tcg_temp_new_i64(); + TCGv_i64 result2 =3D tcg_temp_new_i64(); + TCGv_i64 tmp =3D tcg_temp_new_i64(); + TCGv_i64 mask =3D tcg_const_i64(0xffffffffffffffULL); + int i, j; + + for (i =3D 0; i < 2; i++) { + if (i =3D=3D 0) { + /* Get high doubleword of vB in 'avr'. */ + get_avr64(avr, VB, true); + } else { + /* Get low doubleword of vB in 'avr'. */ + get_avr64(avr, VB, false); + } + /* + * Perform count for every byte element using 'tcg_gen_clzi_i64'. + * Since it counts leading zeros on 64 bit lenght, we have to move + * ith byte element to highest 8 bits of 'tmp', or it with mask(so= we + * get all ones in lowest 56 bits), then perform 'tcg_gen_clzi_i64= ' and + * move it's result in appropriate byte element of result. + */ + tcg_gen_shli_i64(tmp, avr, 56); + tcg_gen_or_i64(tmp, tmp, mask); + tcg_gen_clzi_i64(result, tmp, 64); + for (j =3D 1; j < 7; j++) { + tcg_gen_shli_i64(tmp, avr, (7 - j) * 8); + tcg_gen_or_i64(tmp, tmp, mask); + tcg_gen_clzi_i64(tmp, tmp, 64); + tcg_gen_deposit_i64(result, result, tmp, j * 8, 8); + } + tcg_gen_or_i64(tmp, avr, mask); + tcg_gen_clzi_i64(tmp, tmp, 64); + tcg_gen_deposit_i64(result, result, tmp, 56, 8); + if (i =3D=3D 0) { + /* Place result in high doubleword element of vD. */ + tcg_gen_mov_i64(result1, result); + } else { + /* Place result in low doubleword element of vD. 
*/ + tcg_gen_mov_i64(result2, result); + } + } + + set_avr64(VT, result1, true); + set_avr64(VT, result2, false); + + tcg_temp_free_i64(avr); + tcg_temp_free_i64(result); + tcg_temp_free_i64(result1); + tcg_temp_free_i64(result2); + tcg_temp_free_i64(tmp); + tcg_temp_free_i64(mask); +} + +/* + * vclzh VRT,VRB - Vector Count Leading Zeros Halfword + * + * Counting the number of leading zero bits of each halfword element in so= urce + * register and placing result in appropriate halfword element of destinat= ion + * register. + */ +static void trans_vclzh(DisasContext *ctx) +{ + int VT =3D rD(ctx->opcode); + int VB =3D rB(ctx->opcode); + TCGv_i64 avr =3D tcg_temp_new_i64(); + TCGv_i64 result =3D tcg_temp_new_i64(); + TCGv_i64 result1 =3D tcg_temp_new_i64(); + TCGv_i64 result2 =3D tcg_temp_new_i64(); + TCGv_i64 tmp =3D tcg_temp_new_i64(); + TCGv_i64 mask =3D tcg_const_i64(0xffffffffffffULL); + int i, j; + + for (i =3D 0; i < 2; i++) { + if (i =3D=3D 0) { + /* Get high doubleword element of vB in 'avr'. */ + get_avr64(avr, VB, true); + } else { + /* Get low doubleword element of vB in 'avr'. */ + get_avr64(avr, VB, false); + } + /* + * Perform count for every halfword element using 'tcg_gen_clzi_i6= 4'. + * Since it counts leading zeros on 64 bit lenght, we have to move + * ith byte element to highest 16 bits of 'tmp', or it with mask(s= o we + * get all ones in lowest 48 bits), then perform 'tcg_gen_clzi_i64= ' and + * move it's result in appropriate halfword element of result. 
+ */ + tcg_gen_shli_i64(tmp, avr, 48); + tcg_gen_or_i64(tmp, tmp, mask); + tcg_gen_clzi_i64(result, tmp, 64); + for (j =3D 1; j < 3; j++) { + tcg_gen_shli_i64(tmp, avr, (3 - j) * 16); + tcg_gen_or_i64(tmp, tmp, mask); + tcg_gen_clzi_i64(tmp, tmp, 64); + tcg_gen_deposit_i64(result, result, tmp, j * 16, 16); + } + tcg_gen_or_i64(tmp, avr, mask); + tcg_gen_clzi_i64(tmp, tmp, 64); + tcg_gen_deposit_i64(result, result, tmp, 48, 16); + if (i =3D=3D 0) { + /* Place result in high doubleword element of vD. */ + tcg_gen_mov_i64(result1, result); + } else { + /* Place result in low doubleword element of vD. */ + tcg_gen_mov_i64(result2, result); + } + } + + set_avr64(VT, result1, true); + set_avr64(VT, result2, false); + + tcg_temp_free_i64(avr); + tcg_temp_free_i64(result); + tcg_temp_free_i64(result1); + tcg_temp_free_i64(result2); + tcg_temp_free_i64(tmp); + tcg_temp_free_i64(mask); +} + +/* * vclzw VRT,VRB - Vector Count Leading Zeros Word * * Counting the number of leading zero bits of each word element in source @@ -1315,8 +1447,8 @@ GEN_VAFORM_PAIRED(vmsumshm, vmsumshs, 20) GEN_VAFORM_PAIRED(vsel, vperm, 21) GEN_VAFORM_PAIRED(vmaddfp, vnmsubfp, 23) =20 -GEN_VXFORM_NOA(vclzb, 1, 28) -GEN_VXFORM_NOA(vclzh, 1, 29) +GEN_VXFORM_TRANS(vclzb, 1, 28) +GEN_VXFORM_TRANS(vclzh, 1, 29) GEN_VXFORM_TRANS(vclzw, 1, 30) GEN_VXFORM_TRANS(vclzd, 1, 31) GEN_VXFORM_NOA_2(vnegw, 1, 24, 6) --=20 2.7.4 From nobody Thu May 16 14:37:59 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org ARC-Seal: i=1; a=rsa-sha256; t=1571311741; cv=none; d=zoho.com; s=zohoarc; 
b=F5fm4+EBBb47kyRSNth4+YMeB71RnGkzY7frVTbOZnd044cO98k1EVF2UZ1R+T1yLhB6S/c86sUxh7kZgHIvZZPi3AVjRK8AeRpwmXrMW/Hz1I9Gg5RWnJ//QpcDLqLXoVnMylp9t16bcMbCEoEhQWr51IBwhdEDBLrHcgYrxKk= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zoho.com; s=zohoarc; t=1571311741; h=Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:Message-ID:References:Sender:Subject:To; bh=/XlmJJhjgX2/FeljfjxFtdI8oCv2El7C1HEQr2HJAfg=; b=YrZQvWz9eH9oWXmt31z/XAmqJQtcrY/OAqQdf0yCE2pug/yrH1S+eM0Amug1soTG7LlI4LAMvrFCeHfPoFWkm/kDUDszALd/xsxWv1pRmD5C88kFN7bhUtMsPKYljhCxQXmpPCCYgV/WXEWK6HxwFlYuBFqQjhEsvB5FqzKj0jg= ARC-Authentication-Results: i=1; mx.zoho.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1571311741381539.1907520740305; Thu, 17 Oct 2019 04:29:01 -0700 (PDT) Received: from localhost ([::1]:44256 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1iL3xd-0000gx-6s for importer@patchew.org; Thu, 17 Oct 2019 07:28:53 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:46642) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1iL3wc-0008F9-LT for qemu-devel@nongnu.org; Thu, 17 Oct 2019 07:27:52 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1iL3wa-0007Jx-UQ for qemu-devel@nongnu.org; Thu, 17 Oct 2019 07:27:50 -0400 Received: from mx2.rt-rk.com ([89.216.37.149]:33396 helo=mail.rt-rk.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1iL3wa-0007Ir-Jo for qemu-devel@nongnu.org; Thu, 17 Oct 2019 07:27:48 -0400 Received: from localhost (localhost [127.0.0.1]) by mail.rt-rk.com (Postfix) with ESMTP id 7B2FB1A21DB; Thu, 17 Oct 2019 13:27:43 +0200 (CEST) Received: from 
rtrkw870-lin.domain.local (rtrkw870-lin.domain.local [10.10.14.77]) by mail.rt-rk.com (Postfix) with ESMTPSA id F167F1A21C3; Thu, 17 Oct 2019 13:27:42 +0200 (CEST) X-Virus-Scanned: amavisd-new at rt-rk.com From: Stefan Brankovic To: qemu-devel@nongnu.org Subject: [PATCH v7 2/3] target/ppc: Optimize emulation of vpkpx instruction Date: Thu, 17 Oct 2019 13:27:38 +0200 Message-Id: <1571311659-15556-3-git-send-email-stefan.brankovic@rt-rk.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1571311659-15556-1-git-send-email-stefan.brankovic@rt-rk.com> References: <1571311659-15556-1-git-send-email-stefan.brankovic@rt-rk.com> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [fuzzy] X-Received-From: 89.216.37.149 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: stefan.brankovic@rt-rk.com, richard.henderson@linaro.org, david@gibson.dropbear.id.au Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8"
Optimize Altivec instruction vpkpx (Vector Pack Pixel). It rearranges 8 pixels (4 from each source register) into a contiguous array of bits in the destination register, coding each pixel in the 6-5-5 pattern. Each iteration of the outer loop performs the 6-5-5 pack for the 2 pixels of one doubleword element of one source register. The outer loop first selects which doubleword element of which register is used in the current iteration and places it in the variable avr. The inner for loop (2 iterations, one per pixel) then performs the 6-5-5 pack on avr and saves the result in the variable tmp.
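As a rough illustration of the inner-loop pack described so far, here is a scalar sketch in plain C. It is not the TCG code from this patch. The function names pack_pixel and pack_two_pixels are hypothetical; pack_pixel repeats the expression used by the C helper that this patch removes, and pack_two_pixels applies the same masks and shifts to a whole doubleword at once.

```c
#include <stdint.h>

/* Reference: pack one 32-bit pixel into 16 bits using the 6-5-5 pattern. */
static uint16_t pack_pixel(uint32_t e)
{
    return (uint16_t)(((e >> 9) & 0xfc00) |
                      ((e >> 6) & 0x3e0) |
                      ((e >> 3) & 0x1f));
}

/*
 * Doubleword form (hypothetical name): pack both 32-bit pixels held in
 * 'dw' into the low 32 bits of the return value, one halfword per pixel.
 */
static uint64_t pack_two_pixels(uint64_t dw)
{
    const uint64_t mask1 = 0x1fULL;
    const uint64_t mask2 = 0x1fULL << 5;
    const uint64_t mask3 = 0x3fULL << 10;
    uint64_t tmp = 0;
    int j;

    for (j = 0; j < 2; j++) {
        /* Pixel j sits 32 bits up in 'dw' but lands only 16 bits up. */
        tmp |= (dw >> (j * 16 + 3)) & (mask1 << (j * 16));
        tmp |= (dw >> (j * 16 + 6)) & (mask2 << (j * 16));
        tmp |= (dw >> (j * 16 + 9)) & (mask3 << (j * 16));
    }
    return tmp;
}
```

The doubleword form should agree with packing each 32-bit pixel separately and concatenating the two halfwords, which is an easy property to spot-check for a few inputs.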
In the end of outer for loop, the result is merged in variable called result and saved in appropriate doubleword element of vD if the whole doubleword is finished(every second iteration). The outer loop has 4 iterations. Signed-off-by: Stefan Brankovic --- target/ppc/helper.h | 1 - target/ppc/int_helper.c | 21 -------- target/ppc/translate/vmx-impl.inc.c | 99 +++++++++++++++++++++++++++++++++= +++- 3 files changed, 98 insertions(+), 23 deletions(-) diff --git a/target/ppc/helper.h b/target/ppc/helper.h index 281e54f..b489b38 100644 --- a/target/ppc/helper.h +++ b/target/ppc/helper.h @@ -258,7 +258,6 @@ DEF_HELPER_4(vpkudus, void, env, avr, avr, avr) DEF_HELPER_4(vpkuhum, void, env, avr, avr, avr) DEF_HELPER_4(vpkuwum, void, env, avr, avr, avr) DEF_HELPER_4(vpkudum, void, env, avr, avr, avr) -DEF_HELPER_3(vpkpx, void, avr, avr, avr) DEF_HELPER_5(vmhaddshs, void, env, avr, avr, avr, avr) DEF_HELPER_5(vmhraddshs, void, env, avr, avr, avr, avr) DEF_HELPER_5(vmsumuhm, void, env, avr, avr, avr, avr) diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c index cd00f5e..f910c11 100644 --- a/target/ppc/int_helper.c +++ b/target/ppc/int_helper.c @@ -1262,27 +1262,6 @@ void helper_vpmsumd(ppc_avr_t *r, ppc_avr_t *a, ppc_= avr_t *b) #else #define PKBIG 0 #endif -void helper_vpkpx(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b) -{ - int i, j; - ppc_avr_t result; -#if defined(HOST_WORDS_BIGENDIAN) - const ppc_avr_t *x[2] =3D { a, b }; -#else - const ppc_avr_t *x[2] =3D { b, a }; -#endif - - VECTOR_FOR_INORDER_I(i, u64) { - VECTOR_FOR_INORDER_I(j, u32) { - uint32_t e =3D x[i]->u32[j]; - - result.u16[4 * i + j] =3D (((e >> 9) & 0xfc00) | - ((e >> 6) & 0x3e0) | - ((e >> 3) & 0x1f)); - } - } - *r =3D result; -} =20 #define VPK(suffix, from, to, cvt, dosat) \ void helper_vpk##suffix(CPUPPCState *env, ppc_avr_t *r, \ diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx= -impl.inc.c index a428ef3..3550ffa 100644 --- a/target/ppc/translate/vmx-impl.inc.c +++ 
b/target/ppc/translate/vmx-impl.inc.c @@ -579,6 +579,103 @@ static void trans_lvsr(DisasContext *ctx) } =20 /* + * vpkpx VRT,VRA,VRB - Vector Pack Pixel + * + * Rearranges 8 pixels coded in 6-5-5 pattern (4 from each source register) + * into contigous array of bits in the destination register. + */ +static void trans_vpkpx(DisasContext *ctx) +{ + int VT =3D rD(ctx->opcode); + int VA =3D rA(ctx->opcode); + int VB =3D rB(ctx->opcode); + TCGv_i64 tmp =3D tcg_temp_new_i64(); + TCGv_i64 shifted =3D tcg_temp_new_i64(); + TCGv_i64 avr =3D tcg_temp_new_i64(); + TCGv_i64 result =3D tcg_temp_new_i64(); + TCGv_i64 result1 =3D tcg_temp_new_i64(); + TCGv_i64 result2 =3D tcg_temp_new_i64(); + int64_t mask1 =3D 0x1fULL; + int64_t mask2 =3D 0x1fULL << 5; + int64_t mask3 =3D 0x3fULL << 10; + int i, j; + /* + * In each iteration do the 6-5-5 pack for 2 pixels of each doubleword + * element of each source register. + */ + for (i =3D 0; i < 4; i++) { + switch (i) { + case 0: + /* + * Get high doubleword of vA to perform 6-5-5 pack of pixels + * 1 and 2. + */ + get_avr64(avr, VA, true); + tcg_gen_movi_i64(result, 0x0ULL); + break; + case 1: + /* + * Get low doubleword of vA to perform 6-5-5 pack of pixels + * 3 and 4. + */ + get_avr64(avr, VA, false); + break; + case 2: + /* + * Get high doubleword of vB to perform 6-5-5 pack of pixels + * 5 and 6. + */ + get_avr64(avr, VB, true); + tcg_gen_movi_i64(result, 0x0ULL); + break; + case 3: + /* + * Get low doubleword of vB to perform 6-5-5 pack of pixels + * 7 and 8. + */ + get_avr64(avr, VB, false); + break; + } + /* Perform the packing for 2 pixels(each iteration for 1). 
*/ + tcg_gen_movi_i64(tmp, 0x0ULL); + for (j =3D 0; j < 2; j++) { + tcg_gen_shri_i64(shifted, avr, (j * 16 + 3)); + tcg_gen_andi_i64(shifted, shifted, mask1 << (j * 16)); + tcg_gen_or_i64(tmp, tmp, shifted); + + tcg_gen_shri_i64(shifted, avr, (j * 16 + 6)); + tcg_gen_andi_i64(shifted, shifted, mask2 << (j * 16)); + tcg_gen_or_i64(tmp, tmp, shifted); + + tcg_gen_shri_i64(shifted, avr, (j * 16 + 9)); + tcg_gen_andi_i64(shifted, shifted, mask3 << (j * 16)); + tcg_gen_or_i64(tmp, tmp, shifted); + } + if ((i =3D=3D 0) || (i =3D=3D 2)) { + tcg_gen_shli_i64(tmp, tmp, 32); + } + tcg_gen_or_i64(result, result, tmp); + if (i =3D=3D 1) { + /* Place packed pixels 1:4 to high doubleword of vD. */ + tcg_gen_mov_i64(result1, result); + } + if (i =3D=3D 3) { + /* Place packed pixels 5:8 to low doubleword of vD. */ + tcg_gen_mov_i64(result2, result); + } + } + set_avr64(VT, result1, true); + set_avr64(VT, result2, false); + + tcg_temp_free_i64(tmp); + tcg_temp_free_i64(shifted); + tcg_temp_free_i64(avr); + tcg_temp_free_i64(result); + tcg_temp_free_i64(result1); + tcg_temp_free_i64(result2); +} + +/* * vsl VRT,VRA,VRB - Vector Shift Left * * Shifting left 128 bit value of vA by value specified in bits 125-127 of= vB. 
@@ -1063,7 +1160,7 @@ GEN_VXFORM_ENV(vpksdus, 7, 21); GEN_VXFORM_ENV(vpkshss, 7, 6); GEN_VXFORM_ENV(vpkswss, 7, 7); GEN_VXFORM_ENV(vpksdss, 7, 23); -GEN_VXFORM(vpkpx, 7, 12); +GEN_VXFORM_TRANS(vpkpx, 7, 12); GEN_VXFORM_ENV(vsum4ubs, 4, 24); GEN_VXFORM_ENV(vsum4sbs, 4, 28); GEN_VXFORM_ENV(vsum4shs, 4, 25); --=20 2.7.4 From nobody Thu May 16 14:37:59 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org ARC-Seal: i=1; a=rsa-sha256; t=1571311924; cv=none; d=zoho.com; s=zohoarc; b=NN69lFyUipsPbEvHKEsTc2bOX8siwWTAMwu1Qr2JH4IOyEhUPAfx7aBe6iRIWw9GTVRYScfW6lElBdZPysWrysF8Hj7C5MFiYF6qXWkACyNB11FKUSG6cdiplvw/Q7tMZfFGgixsF8vazcs0FWyfjXJhF82szIdNfbCAbajDkKo= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zoho.com; s=zohoarc; t=1571311924; h=Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:Message-ID:References:Sender:Subject:To; bh=Ot9xxjD6iLhZqWk+ulBrGMolJc/eepuYYipL6cOE3dM=; b=Ey+mezIAJT9Jib7K8wanRSovuFjrhZU5c/+mMwmzjwfTMd0jjxJ8tgRm/bIWisTQpn64b65YH7MEulVUPQa4xkv0AuxKEsHL6wXI3D6mPFMRgOBsPXK/B5rDCdZdaYfWVXliV5A7ZHko0CMszYkRLeLMArg3AiJ1wXFMVDDPQ4M= ARC-Authentication-Results: i=1; mx.zoho.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1571311924895526.7868219746706; Thu, 17 Oct 2019 04:32:04 -0700 (PDT) Received: from localhost ([::1]:44310 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 
1iL40h-00044l-Rw for importer@patchew.org; Thu, 17 Oct 2019 07:32:03 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:46647) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1iL3wc-0008FA-MM for qemu-devel@nongnu.org; Thu, 17 Oct 2019 07:27:52 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1iL3wb-0007KC-54 for qemu-devel@nongnu.org; Thu, 17 Oct 2019 07:27:50 -0400 Received: from mx2.rt-rk.com ([89.216.37.149]:33347 helo=mail.rt-rk.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1iL3wa-0007Iq-RF for qemu-devel@nongnu.org; Thu, 17 Oct 2019 07:27:49 -0400 Received: from localhost (localhost [127.0.0.1]) by mail.rt-rk.com (Postfix) with ESMTP id 829CD1A21BA; Thu, 17 Oct 2019 13:27:43 +0200 (CEST) Received: from rtrkw870-lin.domain.local (rtrkw870-lin.domain.local [10.10.14.77]) by mail.rt-rk.com (Postfix) with ESMTPSA id 1312C1A21DA; Thu, 17 Oct 2019 13:27:43 +0200 (CEST) X-Virus-Scanned: amavisd-new at rt-rk.com From: Stefan Brankovic To: qemu-devel@nongnu.org Subject: [PATCH v7 3/3] target/ppc: Optimize emulation of vupkhpx and vupklpx instructions Date: Thu, 17 Oct 2019 13:27:39 +0200 Message-Id: <1571311659-15556-4-git-send-email-stefan.brankovic@rt-rk.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1571311659-15556-1-git-send-email-stefan.brankovic@rt-rk.com> References: <1571311659-15556-1-git-send-email-stefan.brankovic@rt-rk.com> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [fuzzy] X-Received-From: 89.216.37.149 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: stefan.brankovic@rt-rk.com, richard.henderson@linaro.org, david@gibson.dropbear.id.au Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: 
text/plain; charset="utf-8"
The 'trans_vupkpx' function implements both the vupkhpx and vupklpx instructions; its argument 'high' determines which of the two is processed. The instructions are implemented with two nested 'for' loops. The outer 'for' loop repeats the unpacking twice, since both doubleword elements of the destination register are formed the same way. It also stores the result of each iteration in a temporary register, which is later transferred to the destination register. The inner 'for' loop unpacks the pixels and forms the resulting doubleword 32 bits at a time.
Signed-off-by: Stefan Brankovic --- target/ppc/helper.h | 2 - target/ppc/int_helper.c | 20 -------- target/ppc/translate/vmx-impl.inc.c | 91 +++++++++++++++++++++++++++++++++= +++- 3 files changed, 89 insertions(+), 24 deletions(-) diff --git a/target/ppc/helper.h b/target/ppc/helper.h index b489b38..fd06b56 100644 --- a/target/ppc/helper.h +++ b/target/ppc/helper.h @@ -233,8 +233,6 @@ DEF_HELPER_2(vextsh2d, void, avr, avr) DEF_HELPER_2(vextsw2d, void, avr, avr) DEF_HELPER_2(vnegw, void, avr, avr) DEF_HELPER_2(vnegd, void, avr, avr) -DEF_HELPER_2(vupkhpx, void, avr, avr) -DEF_HELPER_2(vupklpx, void, avr, avr) DEF_HELPER_2(vupkhsb, void, avr, avr) DEF_HELPER_2(vupkhsh, void, avr, avr) DEF_HELPER_2(vupkhsw, void, avr, avr) diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c index f910c11..9ee667d 100644 --- a/target/ppc/int_helper.c +++ b/target/ppc/int_helper.c @@ -1737,26 +1737,6 @@ void helper_vsum4ubs(CPUPPCState *env, ppc_avr_t *r,= ppc_avr_t *a, ppc_avr_t *b) #define UPKHI 0 #define UPKLO 1 #endif -#define VUPKPX(suffix, hi) \ - void helper_vupk##suffix(ppc_avr_t *r, ppc_avr_t *b) \ - { \ - int i; \ - ppc_avr_t result; \ - \ - for (i =3D 0; i < ARRAY_SIZE(r->u32); i++) { \ - uint16_t e =3D b->u16[hi ? i : i + 4]; \ - uint8_t a =3D (e >> 15) ? 
0xff : 0; \ - uint8_t r =3D (e >> 10) & 0x1f; \ - uint8_t g =3D (e >> 5) & 0x1f; \ - uint8_t b =3D e & 0x1f; \ - \ - result.u32[i] =3D (a << 24) | (r << 16) | (g << 8) | b; \ - } \ - *r =3D result; \ - } -VUPKPX(lpx, UPKLO) -VUPKPX(hpx, UPKHI) -#undef VUPKPX =20 #define VUPK(suffix, unpacked, packee, hi) \ void helper_vupk##suffix(ppc_avr_t *r, ppc_avr_t *b) \ diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx= -impl.inc.c index 3550ffa..09d80d6 100644 --- a/target/ppc/translate/vmx-impl.inc.c +++ b/target/ppc/translate/vmx-impl.inc.c @@ -1031,6 +1031,95 @@ static void trans_vclzd(DisasContext *ctx) tcg_temp_free_i64(avr); } =20 +/* + * vupkhpx VRT,VRB - Vector Unpack High Pixel + * vupklpx VRT,VRB - Vector Unpack Low Pixel + * + * Unpacks 4 pixels coded in 1-5-5-5 pattern from high/low doubleword elem= ent + * of source register into contigous array of bits in the destination regi= ster. + * Argument 'high' determines if high or low doubleword element of source + * register is processed. 
+ */ +static void trans_vupkpx(DisasContext *ctx, int high) +{ + int VT =3D rD(ctx->opcode); + int VB =3D rB(ctx->opcode); + TCGv_i64 tmp =3D tcg_temp_new_i64(); + TCGv_i64 avr =3D tcg_temp_new_i64(); + TCGv_i64 result =3D tcg_temp_new_i64(); + TCGv_i64 result1 =3D tcg_temp_new_i64(); + TCGv_i64 result2 =3D tcg_temp_new_i64(); + int64_t mask1 =3D 0x1fULL; + int64_t mask2 =3D 0x1fULL << 8; + int64_t mask3 =3D 0x1fULL << 16; + int64_t mask4 =3D 0xffULL << 56; + int i, j; + + if (high =3D=3D 1) { + get_avr64(avr, VB, true); + } else { + get_avr64(avr, VB, false); + } + + tcg_gen_movi_i64(result, 0x0ULL); + for (i =3D 0; i < 2; i++) { + for (j =3D 0; j < 2; j++) { + tcg_gen_shli_i64(tmp, avr, (j * 16)); + tcg_gen_andi_i64(tmp, tmp, mask1 << (j * 32)); + tcg_gen_or_i64(result, result, tmp); + + tcg_gen_shli_i64(tmp, avr, 3 + (j * 16)); + tcg_gen_andi_i64(tmp, tmp, mask2 << (j * 32)); + tcg_gen_or_i64(result, result, tmp); + + tcg_gen_shli_i64(tmp, avr, 6 + (j * 16)); + tcg_gen_andi_i64(tmp, tmp, mask3 << (j * 32)); + tcg_gen_or_i64(result, result, tmp); + + tcg_gen_shri_i64(tmp, avr, (j * 16)); + tcg_gen_ext16s_i64(tmp, tmp); + tcg_gen_andi_i64(tmp, tmp, mask4); + tcg_gen_shri_i64(tmp, tmp, (32 * (1 - j))); + tcg_gen_or_i64(result, result, tmp); + } + if (i =3D=3D 0) { + tcg_gen_mov_i64(result1, result); + tcg_gen_movi_i64(result, 0x0ULL); + tcg_gen_shri_i64(avr, avr, 32); + } + if (i =3D=3D 1) { + tcg_gen_mov_i64(result2, result); + } + } + + set_avr64(VT, result1, false); + set_avr64(VT, result2, true); + + tcg_temp_free_i64(tmp); + tcg_temp_free_i64(avr); + tcg_temp_free_i64(result); + tcg_temp_free_i64(result1); + tcg_temp_free_i64(result2); +} + +static void gen_vupkhpx(DisasContext *ctx) +{ + if (unlikely(!ctx->altivec_enabled)) { + gen_exception(ctx, POWERPC_EXCP_VPU); + return; + } + trans_vupkpx(ctx, 1); +} + +static void gen_vupklpx(DisasContext *ctx) +{ + if (unlikely(!ctx->altivec_enabled)) { + gen_exception(ctx, POWERPC_EXCP_VPU); + return; + } + 
trans_vupkpx(ctx, 0); +} + GEN_VXFORM(vmuloub, 4, 0); GEN_VXFORM(vmulouh, 4, 1); GEN_VXFORM(vmulouw, 4, 2); @@ -1348,8 +1437,6 @@ GEN_VXFORM_NOA(vupkhsw, 7, 25); GEN_VXFORM_NOA(vupklsb, 7, 10); GEN_VXFORM_NOA(vupklsh, 7, 11); GEN_VXFORM_NOA(vupklsw, 7, 27); -GEN_VXFORM_NOA(vupkhpx, 7, 13); -GEN_VXFORM_NOA(vupklpx, 7, 15); GEN_VXFORM_NOA_ENV(vrefp, 5, 4); GEN_VXFORM_NOA_ENV(vrsqrtefp, 5, 5); GEN_VXFORM_NOA_ENV(vexptefp, 5, 6); --=20 2.7.4
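For reviewers of patch 3/3, the inner loop of trans_vupkpx can be modelled in plain C as sketched below. This is an illustration under stated assumptions, not the TCG code itself: unpack_pixel and unpack_two_pixels are hypothetical names, unpack_pixel follows the arithmetic of the removed VUPKPX helper macro, and the sign extension that the patch performs with tcg_gen_ext16s_i64 is written out by hand to keep the sketch portable.

```c
#include <stdint.h>

/* Reference: unpack one 1-5-5-5 pixel into four 8-bit channels. */
static uint32_t unpack_pixel(uint16_t e)
{
    uint32_t a = (e >> 15) ? 0xff : 0;
    uint32_t r = (e >> 10) & 0x1f;
    uint32_t g = (e >> 5) & 0x1f;
    uint32_t b = e & 0x1f;

    return (a << 24) | (r << 16) | (g << 8) | b;
}

/*
 * Doubleword form (hypothetical name): unpack the two low halfwords of
 * 'src' into one doubleword, 32 bits per pixel.
 */
static uint64_t unpack_two_pixels(uint64_t src)
{
    const uint64_t mask1 = 0x1fULL;
    const uint64_t mask2 = 0x1fULL << 8;
    const uint64_t mask3 = 0x1fULL << 16;
    const uint64_t mask4 = 0xffULL << 56;
    uint64_t result = 0;
    int j;

    for (j = 0; j < 2; j++) {
        uint64_t h = (src >> (j * 16)) & 0xffff;
        /* Sign-extend the halfword so its top bit fills bits 16..63. */
        uint64_t ext = (h & 0x8000) ? (h | ~0xffffULL) : h;

        result |= (src << (j * 16)) & (mask1 << (j * 32));
        result |= (src << (3 + j * 16)) & (mask2 << (j * 32));
        result |= (src << (6 + j * 16)) & (mask3 << (j * 32));
        /* Alpha: keep the top byte of the extension, then position it. */
        result |= (ext & mask4) >> (32 * (1 - j));
    }
    return result;
}
```

Comparing unpack_two_pixels against two calls to unpack_pixel is a cheap way to sanity-check the mask and shift constants before looking at the generated TCG ops.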