From: Stefan Brankovic
To: qemu-devel@nongnu.org
Cc: aleksandar.markovic@rt-rk.com, stefan.brankovic@rt-rk.com,
 richard.henderson@linaro.org, david@gibson.dropbear.id.au
Subject: [PATCH v8 1/3] target/ppc: Optimize emulation of vclzh and vclzb instructions
Date: Wed, 23 Oct 2019 14:30:02 +0200
Message-Id: <1571833804-31334-2-git-send-email-stefan.brankovic@rt-rk.com>
In-Reply-To: <1571833804-31334-1-git-send-email-stefan.brankovic@rt-rk.com>
References: <1571833804-31334-1-git-send-email-stefan.brankovic@rt-rk.com>

Optimize emulation of Altivec instructions vclzh (Vector Count Leading Zeros
Halfword) and vclzb (Vector Count Leading Zeros Byte). These instructions
count the number of leading zeros of each halfword/byte element in source
register and place result in the appropriate
halfword/byte element of the destination register.

Emulation of the vclzh instruction is implemented in two 'for' loops. Each
iteration of the outer 'for' loop performs the count operation on one
doubleword element of the source register vB. In the first iteration, the
higher doubleword element of vB is placed in the variable 'avr', and then the
count for every halfword element is performed using 'tcg_gen_clzi_i64'.
Since that operation counts leading zeros over a 64-bit length, the i-th
halfword element has to be moved to the highest 16 bits of the variable 'tmp'
and or-ed with 'mask' (in order to get all ones in the lowest 48 bits);
'tcg_gen_clzi_i64' is then performed and its result is moved to the
appropriate halfword element of the variable 'result'. This is done in the
inner 'for' loop. After the operation is finished, 'result' is saved in the
appropriate doubleword element of the destination register vD. The same
sequence of operations is then applied to the lower doubleword element of vB.

Emulation of the vclzb instruction is implemented in two 'for' loops. Each
iteration of the outer 'for' loop performs the count operation on one
doubleword element of the source register vB. In the first iteration, the
higher doubleword element of vB is placed in the variable 'avr', and then the
count for every byte element is performed using 'tcg_gen_clzi_i64'. Since
that operation counts leading zeros over a 64-bit length, the i-th byte
element has to be moved to the highest 8 bits of the variable 'tmp' and
or-ed with 'mask' (in order to get all ones in the lowest 56 bits);
'tcg_gen_clzi_i64' is then performed and its result is moved to the
appropriate byte element of the variable 'result'. This is done in the inner
'for' loop. After the operation is finished, 'result' is saved in the
appropriate doubleword element of the destination register vD. The same
sequence of operations is then applied to the lower doubleword element of vB.
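The shift / or / count / deposit sequence described above can be sketched in
plain C for a single 64-bit doubleword. This is only an illustrative scalar
model, not the TCG code from this patch: 'clz64_model' and 'clz_elems_model'
are hypothetical names, ordinary C operators stand in for the 'tcg_gen_*'
ops, and one generic loop replaces the unrolled first and last elements.

```c
#include <assert.h>
#include <stdint.h>

/* Portable 64-bit count-leading-zeros; returns 64 for x == 0. */
static unsigned clz64_model(uint64_t x)
{
    unsigned n = 0;

    if (x == 0) {
        return 64;
    }
    while ((x >> 63) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/*
 * Hypothetical scalar model of the inner loop for one doubleword.
 * 'width' is the element size in bits: 8 for vclzb, 16 for vclzh.
 */
static uint64_t clz_elems_model(uint64_t avr, unsigned width)
{
    const unsigned n = 64 / width;          /* elements per doubleword */
    const uint64_t mask = ~0ULL >> width;   /* ones below the top element */
    uint64_t result = 0;

    for (unsigned j = 0; j < n; j++) {
        /* Move element j to the highest 'width' bits. */
        uint64_t tmp = avr << ((n - 1 - j) * width);
        /* Force ones below it so the count never exceeds 'width'. */
        tmp |= mask;
        /* Count on 64 bits and deposit into element j of the result. */
        result |= (uint64_t)clz64_model(tmp) << (j * width);
    }
    return result;
}
```

For example, clz_elems_model(0x8040201008040201ULL, 8) yields
0x0001020304050607: each byte has one bit set, one position lower than in
the byte above it, so the per-byte counts run from 0 to 7. The mask is what
keeps an all-zero element from reporting more than its own width.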
Signed-off-by: Stefan Brankovic
---
 target/ppc/helper.h                 |   2 -
 target/ppc/int_helper.c             |   9 ---
 target/ppc/translate/vmx-impl.inc.c | 132 +++++++++++++++++++++++++++++++++++-
 3 files changed, 130 insertions(+), 13 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index f843814..281e54f 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -308,8 +308,6 @@ DEF_HELPER_4(vcfsx, void, env, avr, avr, i32)
 DEF_HELPER_4(vctuxs, void, env, avr, avr, i32)
 DEF_HELPER_4(vctsxs, void, env, avr, avr, i32)
 
-DEF_HELPER_2(vclzb, void, avr, avr)
-DEF_HELPER_2(vclzh, void, avr, avr)
 DEF_HELPER_2(vctzb, void, avr, avr)
 DEF_HELPER_2(vctzh, void, avr, avr)
 DEF_HELPER_2(vctzw, void, avr, avr)
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 6d238b9..cd00f5e 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1817,15 +1817,6 @@ VUPK(lsw, s64, s32, UPKLO)
         }                                                               \
     }
 
-#define clzb(v) ((v) ? clz32((uint32_t)(v) << 24) : 8)
-#define clzh(v) ((v) ? clz32((uint32_t)(v) << 16) : 16)
-
-VGENERIC_DO(clzb, u8)
-VGENERIC_DO(clzh, u16)
-
-#undef clzb
-#undef clzh
-
 #define ctzb(v) ((v) ? ctz32(v) : 8)
 #define ctzh(v) ((v) ? ctz32(v) : 16)
 #define ctzw(v) ctz32((v))
diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index 2472a52..3ad425a 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -751,6 +751,134 @@ static void trans_vgbbd(DisasContext *ctx)
 }
 
 /*
+ * vclzb VRT,VRB - Vector Count Leading Zeros Byte
+ *
+ * Counting the number of leading zero bits of each byte element in source
+ * register and placing result in appropriate byte element of destination
+ * register.
+ */
+static void trans_vclzb(DisasContext *ctx)
+{
+    int VT = rD(ctx->opcode);
+    int VB = rB(ctx->opcode);
+    TCGv_i64 avr = tcg_temp_new_i64();
+    TCGv_i64 result = tcg_temp_new_i64();
+    TCGv_i64 result1 = tcg_temp_new_i64();
+    TCGv_i64 tmp = tcg_temp_new_i64();
+    TCGv_i64 mask = tcg_const_i64(0xffffffffffffffULL);
+    int i, j;
+
+    for (i = 0; i < 2; i++) {
+        if (i == 0) {
+            /* Get high doubleword of vB in 'avr'. */
+            get_avr64(avr, VB, true);
+        } else {
+            /* Get low doubleword of vB in 'avr'. */
+            get_avr64(avr, VB, false);
+        }
+        /*
+         * Perform count for every byte element using 'tcg_gen_clzi_i64'.
+         * Since it counts leading zeros on 64 bit length, we have to move
+         * ith byte element to highest 8 bits of 'tmp', or it with mask (so we
+         * get all ones in lowest 56 bits), then perform 'tcg_gen_clzi_i64' and
+         * move its result in appropriate byte element of result.
+         */
+        /* count leading zeroes for bits 0..7 */
+        tcg_gen_shli_i64(tmp, avr, 56);
+        tcg_gen_or_i64(tmp, tmp, mask);
+        tcg_gen_clzi_i64(result, tmp, 64);
+        for (j = 1; j < 7; j++) {
+            /* count leading zeroes for bits 8*j..8*j+7 */
+            tcg_gen_shli_i64(tmp, avr, (7 - j) * 8);
+            tcg_gen_or_i64(tmp, tmp, mask);
+            tcg_gen_clzi_i64(tmp, tmp, 64);
+            tcg_gen_deposit_i64(result, result, tmp, j * 8, 8);
+        }
+        /* count leading zeroes for bits 56..63 */
+        tcg_gen_or_i64(tmp, avr, mask);
+        tcg_gen_clzi_i64(tmp, tmp, 64);
+        tcg_gen_deposit_i64(result, result, tmp, 56, 8);
+        if (i == 0) {
+            /* Place result in high doubleword element of vD. */
+            tcg_gen_mov_i64(result1, result);
+        }
+    }
+
+    set_avr64(VT, result1, true);
+    set_avr64(VT, result, false);
+
+    tcg_temp_free_i64(avr);
+    tcg_temp_free_i64(result);
+    tcg_temp_free_i64(result1);
+    tcg_temp_free_i64(tmp);
+    tcg_temp_free_i64(mask);
+}
+
+/*
+ * vclzh VRT,VRB - Vector Count Leading Zeros Halfword
+ *
+ * Counting the number of leading zero bits of each halfword element in source
+ * register and placing result in appropriate halfword element of destination
+ * register.
+ */
+static void trans_vclzh(DisasContext *ctx)
+{
+    int VT = rD(ctx->opcode);
+    int VB = rB(ctx->opcode);
+    TCGv_i64 avr = tcg_temp_new_i64();
+    TCGv_i64 result = tcg_temp_new_i64();
+    TCGv_i64 result1 = tcg_temp_new_i64();
+    TCGv_i64 tmp = tcg_temp_new_i64();
+    TCGv_i64 mask = tcg_const_i64(0xffffffffffffULL);
+    int i, j;
+
+    for (i = 0; i < 2; i++) {
+        if (i == 0) {
+            /* Get high doubleword element of vB in 'avr'. */
+            get_avr64(avr, VB, true);
+        } else {
+            /* Get low doubleword element of vB in 'avr'. */
+            get_avr64(avr, VB, false);
+        }
+        /*
+         * Perform count for every halfword element using 'tcg_gen_clzi_i64'.
+         * Since it counts leading zeros on 64 bit length, we have to move
+         * ith halfword element to highest 16 bits of 'tmp', or it with mask
+         * (so we get all ones in lowest 48 bits), then perform
+         * 'tcg_gen_clzi_i64' and move its result in appropriate halfword element of result.
+         */
+        /* count leading zeroes for bits 0..15 */
+        tcg_gen_shli_i64(tmp, avr, 48);
+        tcg_gen_or_i64(tmp, tmp, mask);
+        tcg_gen_clzi_i64(result, tmp, 64);
+        for (j = 1; j < 3; j++) {
+            /* count leading zeroes for bits 16*j..16*j+15 */
+            tcg_gen_shli_i64(tmp, avr, (3 - j) * 16);
+            tcg_gen_or_i64(tmp, tmp, mask);
+            tcg_gen_clzi_i64(tmp, tmp, 64);
+            tcg_gen_deposit_i64(result, result, tmp, j * 16, 16);
+        }
+        /* count leading zeroes for bits 48..63 */
+        tcg_gen_or_i64(tmp, avr, mask);
+        tcg_gen_clzi_i64(tmp, tmp, 64);
+        tcg_gen_deposit_i64(result, result, tmp, 48, 16);
+        if (i == 0) {
+            /* Place result in high doubleword element of vD. */
+            tcg_gen_mov_i64(result1, result);
+        }
+    }
+
+    set_avr64(VT, result1, true);
+    set_avr64(VT, result, false);
+
+    tcg_temp_free_i64(avr);
+    tcg_temp_free_i64(result);
+    tcg_temp_free_i64(result1);
+    tcg_temp_free_i64(tmp);
+    tcg_temp_free_i64(mask);
+}
+
+/*
  * vclzw VRT,VRB - Vector Count Leading Zeros Word
  *
  * Counting the number of leading zero bits of each word element in source
@@ -1315,8 +1443,8 @@ GEN_VAFORM_PAIRED(vmsumshm, vmsumshs, 20)
 GEN_VAFORM_PAIRED(vsel, vperm, 21)
 GEN_VAFORM_PAIRED(vmaddfp, vnmsubfp, 23)
 
-GEN_VXFORM_NOA(vclzb, 1, 28)
-GEN_VXFORM_NOA(vclzh, 1, 29)
+GEN_VXFORM_TRANS(vclzb, 1, 28)
+GEN_VXFORM_TRANS(vclzh, 1, 29)
 GEN_VXFORM_TRANS(vclzw, 1, 30)
 GEN_VXFORM_TRANS(vclzd, 1, 31)
 GEN_VXFORM_NOA_2(vnegw, 1, 24, 6)
-- 
2.7.4

From: Stefan Brankovic
To: qemu-devel@nongnu.org
Cc: aleksandar.markovic@rt-rk.com, stefan.brankovic@rt-rk.com,
 richard.henderson@linaro.org, david@gibson.dropbear.id.au
Subject: [PATCH v8 2/3] target/ppc: Optimize emulation of vpkpx instruction
Date: Wed, 23 Oct 2019 14:30:03 +0200
Message-Id: <1571833804-31334-3-git-send-email-stefan.brankovic@rt-rk.com>
In-Reply-To: <1571833804-31334-1-git-send-email-stefan.brankovic@rt-rk.com>
References: <1571833804-31334-1-git-send-email-stefan.brankovic@rt-rk.com>

Optimize emulation of the Altivec instruction vpkpx (Vector Pack Pixel). It
rearranges 8 pixels coded in 6-5-5 pattern (4 from each source register)
into a contiguous array of bits in the destination register.

Each iteration of the outer loop performs the 6-5-5 pack for 2 pixels of
one doubleword element of one source register. The first step in the outer
loop is choosing which doubleword element of which register is used in the
current iteration and placing it in the 'avr' variable. The next step is to
perform the 6-5-5 pack of pixels on 'avr' in the inner 'for' loop (2
iterations, 1 for each pixel) and save the result in the 'tmp' variable.
At the end of the outer 'for' loop, the result is merged into the variable
'result' and saved in the appropriate doubleword element of vD once the
whole doubleword is finished (every second iteration). The outer loop has 4
iterations.

Signed-off-by: Stefan Brankovic
---
 target/ppc/helper.h                 |  1 -
 target/ppc/int_helper.c             | 21 ---------
 target/ppc/translate/vmx-impl.inc.c | 93 ++++++++++++++++++++++++++++++++++++-
 3 files changed, 92 insertions(+), 23 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 281e54f..b489b38 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -258,7 +258,6 @@ DEF_HELPER_4(vpkudus, void, env, avr, avr, avr)
 DEF_HELPER_4(vpkuhum, void, env, avr, avr, avr)
 DEF_HELPER_4(vpkuwum, void, env, avr, avr, avr)
 DEF_HELPER_4(vpkudum, void, env, avr, avr, avr)
-DEF_HELPER_3(vpkpx, void, avr, avr, avr)
 DEF_HELPER_5(vmhaddshs, void, env, avr, avr, avr, avr)
 DEF_HELPER_5(vmhraddshs, void, env, avr, avr, avr, avr)
 DEF_HELPER_5(vmsumuhm, void, env, avr, avr, avr, avr)
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index cd00f5e..f910c11 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1262,27 +1262,6 @@ void helper_vpmsumd(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 #else
 #define PKBIG 0
 #endif
-void helper_vpkpx(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
-{
-    int i, j;
-    ppc_avr_t result;
-#if defined(HOST_WORDS_BIGENDIAN)
-    const ppc_avr_t *x[2] = { a, b };
-#else
-    const ppc_avr_t *x[2] = { b, a };
-#endif
-
-    VECTOR_FOR_INORDER_I(i, u64) {
-        VECTOR_FOR_INORDER_I(j, u32) {
-            uint32_t e = x[i]->u32[j];
-
-            result.u16[4 * i + j] = (((e >> 9) & 0xfc00) |
-                                     ((e >> 6) & 0x3e0) |
-                                     ((e >> 3) & 0x1f));
-        }
-    }
-    *r = result;
-}
 
 #define VPK(suffix, from, to, cvt, dosat)                               \
     void helper_vpk##suffix(CPUPPCState *env, ppc_avr_t *r,             \
diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index 3ad425a..dcb6fd9 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -579,6 +579,97 @@ static void trans_lvsr(DisasContext *ctx)
 }
 
 /*
+ * vpkpx VRT,VRA,VRB - Vector Pack Pixel
+ *
+ * Rearranges 8 pixels coded in 6-5-5 pattern (4 from each source register)
+ * into contiguous array of bits in the destination register.
+ */
+static void trans_vpkpx(DisasContext *ctx)
+{
+    int VT = rD(ctx->opcode);
+    int VA = rA(ctx->opcode);
+    int VB = rB(ctx->opcode);
+    TCGv_i64 tmp = tcg_temp_new_i64();
+    TCGv_i64 shifted = tcg_temp_new_i64();
+    TCGv_i64 avr = tcg_temp_new_i64();
+    TCGv_i64 result = tcg_temp_new_i64();
+    TCGv_i64 result1 = tcg_temp_new_i64();
+    int64_t mask1 = 0x1fULL;
+    int64_t mask2 = 0x1fULL << 5;
+    int64_t mask3 = 0x3fULL << 10;
+    int i, j;
+    /*
+     * In each iteration do the 6-5-5 pack for 2 pixels of each doubleword
+     * element of each source register.
+     */
+    for (i = 0; i < 4; i++) {
+        switch (i) {
+        case 0:
+            /*
+             * Get high doubleword of vA to perform 6-5-5 pack of pixels
+             * 1 and 2.
+             */
+            get_avr64(avr, VA, true);
+            tcg_gen_movi_i64(result, 0x0ULL);
+            break;
+        case 1:
+            /*
+             * Get low doubleword of vA to perform 6-5-5 pack of pixels
+             * 3 and 4.
+             */
+            get_avr64(avr, VA, false);
+            break;
+        case 2:
+            /*
+             * Get high doubleword of vB to perform 6-5-5 pack of pixels
+             * 5 and 6.
+             */
+            get_avr64(avr, VB, true);
+            tcg_gen_movi_i64(result, 0x0ULL);
+            break;
+        case 3:
+            /*
+             * Get low doubleword of vB to perform 6-5-5 pack of pixels
+             * 7 and 8.
+             */
+            get_avr64(avr, VB, false);
+            break;
+        }
+        /* Perform the packing for 2 pixels (each iteration for 1). */
+        tcg_gen_movi_i64(tmp, 0x0ULL);
+        for (j = 0; j < 2; j++) {
+            tcg_gen_shri_i64(shifted, avr, (j * 16 + 3));
+            tcg_gen_andi_i64(shifted, shifted, mask1 << (j * 16));
+            tcg_gen_or_i64(tmp, tmp, shifted);
+
+            tcg_gen_shri_i64(shifted, avr, (j * 16 + 6));
+            tcg_gen_andi_i64(shifted, shifted, mask2 << (j * 16));
+            tcg_gen_or_i64(tmp, tmp, shifted);
+
+            tcg_gen_shri_i64(shifted, avr, (j * 16 + 9));
+            tcg_gen_andi_i64(shifted, shifted, mask3 << (j * 16));
+            tcg_gen_or_i64(tmp, tmp, shifted);
+        }
+        if ((i == 0) || (i == 2)) {
+            tcg_gen_shli_i64(tmp, tmp, 32);
+        }
+        tcg_gen_or_i64(result, result, tmp);
+        if (i == 1) {
+            /* Place packed pixels 1:4 to high doubleword of vD. */
+            tcg_gen_mov_i64(result1, result);
+        }
+    }
+    set_avr64(VT, result1, true);
+    set_avr64(VT, result, false);
+
+    tcg_temp_free_i64(tmp);
+    tcg_temp_free_i64(shifted);
+    tcg_temp_free_i64(avr);
+    tcg_temp_free_i64(result);
+    tcg_temp_free_i64(result1);
+}
+
+/*
  * vsl VRT,VRA,VRB - Vector Shift Left
  *
  * Shifting left 128 bit value of vA by value specified in bits 125-127 of vB.
@@ -1059,7 +1150,7 @@ GEN_VXFORM_ENV(vpksdus, 7, 21);
 GEN_VXFORM_ENV(vpkshss, 7, 6);
 GEN_VXFORM_ENV(vpkswss, 7, 7);
 GEN_VXFORM_ENV(vpksdss, 7, 23);
-GEN_VXFORM(vpkpx, 7, 12);
+GEN_VXFORM_TRANS(vpkpx, 7, 12);
 GEN_VXFORM_ENV(vsum4ubs, 4, 24);
 GEN_VXFORM_ENV(vsum4sbs, 4, 28);
 GEN_VXFORM_ENV(vsum4shs, 4, 25);
-- 
2.7.4

From: Stefan Brankovic
To: qemu-devel@nongnu.org
Cc: aleksandar.markovic@rt-rk.com, stefan.brankovic@rt-rk.com,
 richard.henderson@linaro.org, david@gibson.dropbear.id.au
Subject: [PATCH v8 3/3] target/ppc: Optimize emulation of vupkhpx and vupklpx instructions
Date: Wed, 23 Oct 2019 14:30:04 +0200
Message-Id: <1571833804-31334-4-git-send-email-stefan.brankovic@rt-rk.com>
In-Reply-To: <1571833804-31334-1-git-send-email-stefan.brankovic@rt-rk.com>
References: <1571833804-31334-1-git-send-email-stefan.brankovic@rt-rk.com>

The 'trans_vupkpx' function implements emulation of both the vupkhpx and
vupklpx instructions; its argument 'high' determines which instruction is
processed. The instructions are implemented in two 'for' loops. The outer
'for' loop repeats the unpacking two times, since both doubleword elements
of the destination register are formed the same way. It also stores the
result of every iteration in the temporary variable 'result', which is
later transferred to the destination register. The inner 'for' loop unpacks
pixels in two iterations. Each iteration takes 16 bits from the source
register and unpacks them into 32 bits of the destination register.

Signed-off-by: Stefan Brankovic
---
 target/ppc/helper.h                 |  2 -
 target/ppc/int_helper.c             | 20 ---------
 target/ppc/translate/vmx-impl.inc.c | 82 ++++++++++++++++++++++++++++++++++++-
 3 files changed, 80 insertions(+), 24 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index b489b38..fd06b56 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -233,8 +233,6 @@ DEF_HELPER_2(vextsh2d, void, avr, avr)
 DEF_HELPER_2(vextsw2d, void, avr, avr)
 DEF_HELPER_2(vnegw, void, avr, avr)
 DEF_HELPER_2(vnegd, void, avr, avr)
-DEF_HELPER_2(vupkhpx, void, avr, avr)
-DEF_HELPER_2(vupklpx, void, avr, avr)
 DEF_HELPER_2(vupkhsb, void, avr, avr)
 DEF_HELPER_2(vupkhsh, void, avr, avr)
 DEF_HELPER_2(vupkhsw, void, avr, avr)
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index f910c11..9ee667d 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1737,26 +1737,6 @@ void helper_vsum4ubs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 #define UPKHI 0
 #define UPKLO 1
 #endif
-#define VUPKPX(suffix, hi)                                              \
-    void helper_vupk##suffix(ppc_avr_t *r, ppc_avr_t *b)                \
-    {                                                                   \
-        int i;                                                          \
-        ppc_avr_t result;                                               \
-                                                                        \
-        for (i = 0; i < ARRAY_SIZE(r->u32); i++) {                      \
-            uint16_t e = b->u16[hi ? i : i + 4];                        \
-            uint8_t a = (e >> 15) ? 0xff : 0;                           \
-            uint8_t r = (e >> 10) & 0x1f;                               \
-            uint8_t g = (e >> 5) & 0x1f;                                \
-            uint8_t b = e & 0x1f;                                       \
-                                                                        \
-            result.u32[i] = (a << 24) | (r << 16) | (g << 8) | b;       \
-        }                                                               \
-        *r = result;                                                    \
-    }
-VUPKPX(lpx, UPKLO)
-VUPKPX(hpx, UPKHI)
-#undef VUPKPX
 
 #define VUPK(suffix, unpacked, packee, hi)                              \
     void helper_vupk##suffix(ppc_avr_t *r, ppc_avr_t *b)                \
diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index dcb6fd9..9d27d2d 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -670,6 +670,84 @@ static void trans_vpkpx(DisasContext *ctx)
 }
 
 /*
+ * vupkhpx VRT,VRB - Vector Unpack High Pixel
+ * vupklpx VRT,VRB - Vector Unpack Low Pixel
+ *
+ * Unpacks 4 pixels coded in 1-5-5-5 pattern from high/low doubleword element
+ * of source register into contiguous array of bits in the destination register.
+ * Argument 'high' determines if high or low doubleword element of source
+ * register is processed.
+ */
+static void trans_vupkpx(DisasContext *ctx, bool high)
+{
+    int VT = rD(ctx->opcode);
+    int VB = rB(ctx->opcode);
+    TCGv_i64 tmp = tcg_temp_new_i64();
+    TCGv_i64 avr = tcg_temp_new_i64();
+    TCGv_i64 result = tcg_temp_new_i64();
+    TCGv_i64 result1 = tcg_temp_new_i64();
+    int64_t mask1 = 0x1fULL;
+    int64_t mask2 = 0x1fULL << 8;
+    int64_t mask3 = 0x1fULL << 16;
+    int64_t mask4 = 0xffULL << 56;
+    int i, j;
+
+    if (high == true) {
+        /* vupkhpx */
+        get_avr64(avr, VB, true);
+    } else {
+        /* vupklpx */
+        get_avr64(avr, VB, false);
+    }
+
+    tcg_gen_movi_i64(result, 0x0ULL);
+    for (i = 0; i < 2; i++) {
+        for (j = 0; j < 2; j++) {
+            tcg_gen_shli_i64(tmp, avr, (j * 16));
+            tcg_gen_andi_i64(tmp, tmp, mask1 << (j * 32));
+            tcg_gen_or_i64(result, result, tmp);
+
+            tcg_gen_shli_i64(tmp, avr, 3 + (j * 16));
+            tcg_gen_andi_i64(tmp, tmp, mask2 << (j * 32));
+            tcg_gen_or_i64(result, result, tmp);
+
+            tcg_gen_shli_i64(tmp, avr, 6 + (j * 16));
+            tcg_gen_andi_i64(tmp, tmp, mask3 << (j * 32));
+            tcg_gen_or_i64(result, result, tmp);
+
+            tcg_gen_shri_i64(tmp, avr, (j * 16));
+            tcg_gen_ext16s_i64(tmp, tmp);
+            tcg_gen_andi_i64(tmp, tmp, mask4);
+            tcg_gen_shri_i64(tmp, tmp, (32 * (1 - j)));
+            tcg_gen_or_i64(result, result, tmp);
+        }
+        if (i == 0) {
+            tcg_gen_mov_i64(result1, result);
+            tcg_gen_movi_i64(result, 0x0ULL);
+            tcg_gen_shri_i64(avr, avr, 32);
+        }
+    }
+
+    set_avr64(VT, result1, false);
+    set_avr64(VT, result, true);
+
+    tcg_temp_free_i64(tmp);
+    tcg_temp_free_i64(avr);
+    tcg_temp_free_i64(result);
+    tcg_temp_free_i64(result1);
+}
+
+static void trans_vupkhpx(DisasContext *ctx)
+{
+    trans_vupkpx(ctx, true);
+}
+
+static void trans_vupklpx(DisasContext *ctx)
+{
+    trans_vupkpx(ctx, false);
+}
+
+/*
  * vsl VRT,VRA,VRB - Vector Shift Left
  *
  * Shifting left 128 bit value of vA by value specified in bits 125-127 of vB.
@@ -1338,8 +1416,8 @@ GEN_VXFORM_NOA(vupkhsw, 7, 25);
 GEN_VXFORM_NOA(vupklsb, 7, 10);
 GEN_VXFORM_NOA(vupklsh, 7, 11);
 GEN_VXFORM_NOA(vupklsw, 7, 27);
-GEN_VXFORM_NOA(vupkhpx, 7, 13);
-GEN_VXFORM_NOA(vupklpx, 7, 15);
+GEN_VXFORM_TRANS(vupkhpx, 7, 13);
+GEN_VXFORM_TRANS(vupklpx, 7, 15);
 GEN_VXFORM_NOA_ENV(vrefp, 5, 4);
 GEN_VXFORM_NOA_ENV(vrsqrtefp, 5, 5);
 GEN_VXFORM_NOA_ENV(vexptefp, 5, 6);
-- 
2.7.4
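
As a reading aid for patch 3/3, the 1-5-5-5 unpack can be modelled in plain
C. The per-pixel function follows the arithmetic of the removed 'VUPKPX'
helper macro; the two-pixel function is a hypothetical scalar stand-in for
the shift-and-mask sequence in the inner loop of 'trans_vupkpx', with
ordinary C operators in place of the 'tcg_gen_*' ops. The names
'unpack_pixel_1555' and 'unpack_two_pixels_model' are assumptions made for
this sketch and do not exist in QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* One 16-bit 1-5-5-5 pixel to one 32-bit pixel (removed helper's maths). */
static uint32_t unpack_pixel_1555(uint16_t e)
{
    uint32_t a = (e >> 15) ? 0xff : 0;  /* 1-bit alpha fills a whole byte */
    uint32_t r = (e >> 10) & 0x1f;
    uint32_t g = (e >> 5) & 0x1f;
    uint32_t b = e & 0x1f;

    return (a << 24) | (r << 16) | (g << 8) | b;
}

/*
 * Hypothetical model of the inner loop: unpack the two halfwords held in
 * the low 32 bits of 'avr' into one 64-bit doubleword.
 */
static uint64_t unpack_two_pixels_model(uint64_t avr)
{
    uint64_t result = 0;

    for (int j = 0; j < 2; j++) {
        /* Three 5-bit channels: shift into place, then mask. */
        result |= (avr << (j * 16)) & (0x1fULL << (j * 32));
        result |= (avr << (3 + j * 16)) & ((0x1fULL << 8) << (j * 32));
        result |= (avr << (6 + j * 16)) & ((0x1fULL << 16) << (j * 32));

        /* Alpha: sign-extend halfword j, keep the top byte, move it down. */
        uint64_t hw = (avr >> (j * 16)) & 0xffff;
        uint64_t tmp = (hw & 0x8000) ? (hw | ~0xffffULL) : hw;
        tmp &= 0xffULL << 56;
        result |= tmp >> (32 * (1 - j));
    }
    return result;
}
```

The low 32 bits of unpack_two_pixels_model(x) should equal
unpack_pixel_1555 of the low halfword of x, and the high 32 bits should
equal the unpack of the next halfword; checking that for a handful of
inputs is a quick way to convince oneself that the shifted masks line up.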