From nobody Tue Feb 10 02:43:25 2026 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org ARC-Seal: i=1; a=rsa-sha256; t=1571311924; cv=none; d=zoho.com; s=zohoarc; b=NN69lFyUipsPbEvHKEsTc2bOX8siwWTAMwu1Qr2JH4IOyEhUPAfx7aBe6iRIWw9GTVRYScfW6lElBdZPysWrysF8Hj7C5MFiYF6qXWkACyNB11FKUSG6cdiplvw/Q7tMZfFGgixsF8vazcs0FWyfjXJhF82szIdNfbCAbajDkKo= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zoho.com; s=zohoarc; t=1571311924; h=Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:Message-ID:References:Sender:Subject:To; bh=Ot9xxjD6iLhZqWk+ulBrGMolJc/eepuYYipL6cOE3dM=; b=Ey+mezIAJT9Jib7K8wanRSovuFjrhZU5c/+mMwmzjwfTMd0jjxJ8tgRm/bIWisTQpn64b65YH7MEulVUPQa4xkv0AuxKEsHL6wXI3D6mPFMRgOBsPXK/B5rDCdZdaYfWVXliV5A7ZHko0CMszYkRLeLMArg3AiJ1wXFMVDDPQ4M= ARC-Authentication-Results: i=1; mx.zoho.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1571311924895526.7868219746706; Thu, 17 Oct 2019 04:32:04 -0700 (PDT) Received: from localhost ([::1]:44310 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1iL40h-00044l-Rw for importer@patchew.org; Thu, 17 Oct 2019 07:32:03 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:46647) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1iL3wc-0008FA-MM for qemu-devel@nongnu.org; Thu, 17 Oct 2019 07:27:52 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1iL3wb-0007KC-54 for qemu-devel@nongnu.org; Thu, 17 Oct 2019 07:27:50 -0400 Received: from mx2.rt-rk.com ([89.216.37.149]:33347 helo=mail.rt-rk.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1iL3wa-0007Iq-RF for qemu-devel@nongnu.org; Thu, 17 Oct 2019 07:27:49 -0400 Received: from localhost (localhost [127.0.0.1]) by mail.rt-rk.com (Postfix) with ESMTP id 829CD1A21BA; Thu, 17 Oct 2019 13:27:43 +0200 (CEST) Received: from rtrkw870-lin.domain.local (rtrkw870-lin.domain.local [10.10.14.77]) by mail.rt-rk.com (Postfix) with ESMTPSA id 1312C1A21DA; Thu, 17 Oct 2019 13:27:43 +0200 (CEST) X-Virus-Scanned: amavisd-new at rt-rk.com From: Stefan Brankovic To: qemu-devel@nongnu.org Subject: [PATCH v7 3/3] target/ppc: Optimize emulation of vupkhpx and vupklpx instructions Date: Thu, 17 Oct 2019 13:27:39 +0200 Message-Id: <1571311659-15556-4-git-send-email-stefan.brankovic@rt-rk.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1571311659-15556-1-git-send-email-stefan.brankovic@rt-rk.com> References: <1571311659-15556-1-git-send-email-stefan.brankovic@rt-rk.com> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [fuzzy] X-Received-From: 89.216.37.149 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: stefan.brankovic@rt-rk.com, richard.henderson@linaro.org, david@gibson.dropbear.id.au Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" 'trans_vupkpx' function implements both vupkhpx and vupklpx instructions wi= th argument 'high' determine which instruction is processed. Instructions are implemented in two 'for' loops. Outer 'for' loop repeats unpacking two time= s, since both doubleword elements of destination register are formed the same = way. It also stores result of every iteration in temporary register, that is lat= er transferred to destination register. Inner 'for' loop does unpacking of pix= els and forms resulting doubleword 32 by 32 bits. Signed-off-by: Stefan Brankovic --- target/ppc/helper.h | 2 - target/ppc/int_helper.c | 20 -------- target/ppc/translate/vmx-impl.inc.c | 91 +++++++++++++++++++++++++++++++++= +++- 3 files changed, 89 insertions(+), 24 deletions(-) diff --git a/target/ppc/helper.h b/target/ppc/helper.h index b489b38..fd06b56 100644 --- a/target/ppc/helper.h +++ b/target/ppc/helper.h @@ -233,8 +233,6 @@ DEF_HELPER_2(vextsh2d, void, avr, avr) DEF_HELPER_2(vextsw2d, void, avr, avr) DEF_HELPER_2(vnegw, void, avr, avr) DEF_HELPER_2(vnegd, void, avr, avr) -DEF_HELPER_2(vupkhpx, void, avr, avr) -DEF_HELPER_2(vupklpx, void, avr, avr) DEF_HELPER_2(vupkhsb, void, avr, avr) DEF_HELPER_2(vupkhsh, void, avr, avr) DEF_HELPER_2(vupkhsw, void, avr, avr) diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c index f910c11..9ee667d 100644 --- a/target/ppc/int_helper.c +++ b/target/ppc/int_helper.c @@ -1737,26 +1737,6 @@ void helper_vsum4ubs(CPUPPCState *env, ppc_avr_t *r,= ppc_avr_t *a, ppc_avr_t *b) #define UPKHI 0 #define UPKLO 1 #endif -#define VUPKPX(suffix, hi) \ - void helper_vupk##suffix(ppc_avr_t *r, ppc_avr_t *b) \ - { \ - int i; \ - ppc_avr_t result; \ - \ - for (i =3D 0; i < ARRAY_SIZE(r->u32); i++) { \ - uint16_t e =3D b->u16[hi ? i : i + 4]; \ - uint8_t a =3D (e >> 15) ? 0xff : 0; \ - uint8_t r =3D (e >> 10) & 0x1f; \ - uint8_t g =3D (e >> 5) & 0x1f; \ - uint8_t b =3D e & 0x1f; \ - \ - result.u32[i] =3D (a << 24) | (r << 16) | (g << 8) | b; \ - } \ - *r =3D result; \ - } -VUPKPX(lpx, UPKLO) -VUPKPX(hpx, UPKHI) -#undef VUPKPX =20 #define VUPK(suffix, unpacked, packee, hi) \ void helper_vupk##suffix(ppc_avr_t *r, ppc_avr_t *b) \ diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx= -impl.inc.c index 3550ffa..09d80d6 100644 --- a/target/ppc/translate/vmx-impl.inc.c +++ b/target/ppc/translate/vmx-impl.inc.c @@ -1031,6 +1031,95 @@ static void trans_vclzd(DisasContext *ctx) tcg_temp_free_i64(avr); } =20 +/* + * vupkhpx VRT,VRB - Vector Unpack High Pixel + * vupklpx VRT,VRB - Vector Unpack Low Pixel + * + * Unpacks 4 pixels coded in 1-5-5-5 pattern from high/low doubleword elem= ent + * of source register into contigous array of bits in the destination regi= ster. + * Argument 'high' determines if high or low doubleword element of source + * register is processed. + */ +static void trans_vupkpx(DisasContext *ctx, int high) +{ + int VT =3D rD(ctx->opcode); + int VB =3D rB(ctx->opcode); + TCGv_i64 tmp =3D tcg_temp_new_i64(); + TCGv_i64 avr =3D tcg_temp_new_i64(); + TCGv_i64 result =3D tcg_temp_new_i64(); + TCGv_i64 result1 =3D tcg_temp_new_i64(); + TCGv_i64 result2 =3D tcg_temp_new_i64(); + int64_t mask1 =3D 0x1fULL; + int64_t mask2 =3D 0x1fULL << 8; + int64_t mask3 =3D 0x1fULL << 16; + int64_t mask4 =3D 0xffULL << 56; + int i, j; + + if (high =3D=3D 1) { + get_avr64(avr, VB, true); + } else { + get_avr64(avr, VB, false); + } + + tcg_gen_movi_i64(result, 0x0ULL); + for (i =3D 0; i < 2; i++) { + for (j =3D 0; j < 2; j++) { + tcg_gen_shli_i64(tmp, avr, (j * 16)); + tcg_gen_andi_i64(tmp, tmp, mask1 << (j * 32)); + tcg_gen_or_i64(result, result, tmp); + + tcg_gen_shli_i64(tmp, avr, 3 + (j * 16)); + tcg_gen_andi_i64(tmp, tmp, mask2 << (j * 32)); + tcg_gen_or_i64(result, result, tmp); + + tcg_gen_shli_i64(tmp, avr, 6 + (j * 16)); + tcg_gen_andi_i64(tmp, tmp, mask3 << (j * 32)); + tcg_gen_or_i64(result, result, tmp); + + tcg_gen_shri_i64(tmp, avr, (j * 16)); + tcg_gen_ext16s_i64(tmp, tmp); + tcg_gen_andi_i64(tmp, tmp, mask4); + tcg_gen_shri_i64(tmp, tmp, (32 * (1 - j))); + tcg_gen_or_i64(result, result, tmp); + } + if (i =3D=3D 0) { + tcg_gen_mov_i64(result1, result); + tcg_gen_movi_i64(result, 0x0ULL); + tcg_gen_shri_i64(avr, avr, 32); + } + if (i =3D=3D 1) { + tcg_gen_mov_i64(result2, result); + } + } + + set_avr64(VT, result1, false); + set_avr64(VT, result2, true); + + tcg_temp_free_i64(tmp); + tcg_temp_free_i64(avr); + tcg_temp_free_i64(result); + tcg_temp_free_i64(result1); + tcg_temp_free_i64(result2); +} + +static void gen_vupkhpx(DisasContext *ctx) +{ + if (unlikely(!ctx->altivec_enabled)) { + gen_exception(ctx, POWERPC_EXCP_VPU); + return; + } + trans_vupkpx(ctx, 1); +} + +static void gen_vupklpx(DisasContext *ctx) +{ + if (unlikely(!ctx->altivec_enabled)) { + gen_exception(ctx, POWERPC_EXCP_VPU); + return; + } + trans_vupkpx(ctx, 0); +} + GEN_VXFORM(vmuloub, 4, 0); GEN_VXFORM(vmulouh, 4, 1); GEN_VXFORM(vmulouw, 4, 2); @@ -1348,8 +1437,6 @@ GEN_VXFORM_NOA(vupkhsw, 7, 25); GEN_VXFORM_NOA(vupklsb, 7, 10); GEN_VXFORM_NOA(vupklsh, 7, 11); GEN_VXFORM_NOA(vupklsw, 7, 27); -GEN_VXFORM_NOA(vupkhpx, 7, 13); -GEN_VXFORM_NOA(vupklpx, 7, 15); GEN_VXFORM_NOA_ENV(vrefp, 5, 4); GEN_VXFORM_NOA_ENV(vrsqrtefp, 5, 5); GEN_VXFORM_NOA_ENV(vexptefp, 5, 6); --=20 2.7.4