From: Stefan Brankovic
To: qemu-devel@nongnu.org
Cc: aleksandar.markovic@rt-rk.com, stefan.brankovic@rt-rk.com,
    richard.henderson@linaro.org, david@gibson.dropbear.id.au
Subject: [PATCH v9 3/3] target/ppc: Optimize emulation of vupkhpx and vupklpx instructions
Date: Wed, 23 Oct 2019 15:59:56 +0200
Message-Id: <1571839196-1739-4-git-send-email-stefan.brankovic@rt-rk.com>
In-Reply-To: <1571839196-1739-1-git-send-email-stefan.brankovic@rt-rk.com>
References: <1571839196-1739-1-git-send-email-stefan.brankovic@rt-rk.com>
X-Mailer: git-send-email 2.7.4

Optimize the Altivec instructions vupkhpx and vupklpx (Vector Unpack
High/Low Pixel). Each unpacks 4 pixels coded in the 1-5-5-5 pattern from
the source register into a contiguous array of bits in the destination
register.

The 'trans_vupkpx' function implements emulation of both vupkhpx and
vupklpx; its argument 'high' selects which instruction is processed.
The instructions are implemented with two nested 'for' loops.

The outer 'for' loop repeats the unpacking twice, since both doubleword
elements of the destination register are formed the same way. Each
iteration accumulates its output in the temporary 'result' (saved to
'result1' after the first pass), and both values are later transferred
to the destination register.

The inner 'for' loop unpacks pixels in two iterations. Each iteration
takes 16 bits from the source register and unpacks them into 32 bits of
the destination register.

Signed-off-by: Stefan Brankovic
---
 target/ppc/helper.h                 |  2 -
 target/ppc/int_helper.c             | 20 ---------
 target/ppc/translate/vmx-impl.inc.c | 82 ++++++++++++++++++++++++++++++++++++-
 3 files changed, 80 insertions(+), 24 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index b489b38..fd06b56 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -233,8 +233,6 @@ DEF_HELPER_2(vextsh2d, void, avr, avr)
 DEF_HELPER_2(vextsw2d, void, avr, avr)
 DEF_HELPER_2(vnegw, void, avr, avr)
 DEF_HELPER_2(vnegd, void, avr, avr)
-DEF_HELPER_2(vupkhpx, void, avr, avr)
-DEF_HELPER_2(vupklpx, void, avr, avr)
 DEF_HELPER_2(vupkhsb, void, avr, avr)
 DEF_HELPER_2(vupkhsh, void, avr, avr)
 DEF_HELPER_2(vupkhsw, void, avr, avr)
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index f910c11..9ee667d 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1737,26 +1737,6 @@ void helper_vsum4ubs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 #define UPKHI 0
 #define UPKLO 1
 #endif
-#define VUPKPX(suffix, hi)                                              \
-    void helper_vupk##suffix(ppc_avr_t *r, ppc_avr_t *b)                \
-    {                                                                   \
-        int i;                                                          \
-        ppc_avr_t result;                                               \
-                                                                        \
-        for (i = 0; i < ARRAY_SIZE(r->u32); i++) {                      \
-            uint16_t e = b->u16[hi ? i : i + 4];                        \
-            uint8_t a = (e >> 15) ? 0xff : 0;                           \
-            uint8_t r = (e >> 10) & 0x1f;                               \
-            uint8_t g = (e >> 5) & 0x1f;                                \
-            uint8_t b = e & 0x1f;                                       \
-                                                                        \
-            result.u32[i] = (a << 24) | (r << 16) | (g << 8) | b;       \
-        }                                                               \
-        *r = result;                                                    \
-    }
-VUPKPX(lpx, UPKLO)
-VUPKPX(hpx, UPKHI)
-#undef VUPKPX
 
 #define VUPK(suffix, unpacked, packee, hi)                              \
     void helper_vupk##suffix(ppc_avr_t *r, ppc_avr_t *b)                \
diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index 787008d..c246880 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -670,6 +670,84 @@ static void trans_vpkpx(DisasContext *ctx)
 }
 
 /*
+ * vupkhpx VRT,VRB - Vector Unpack High Pixel
+ * vupklpx VRT,VRB - Vector Unpack Low Pixel
+ *
+ * Unpacks 4 pixels coded in 1-5-5-5 pattern from high/low doubleword element
+ * of source register into contiguous array of bits in the destination register.
+ * Argument 'high' determines if high or low doubleword element of source
+ * register is processed.
+ */
+static void trans_vupkpx(DisasContext *ctx, bool high)
+{
+    int VT = rD(ctx->opcode);
+    int VB = rB(ctx->opcode);
+    TCGv_i64 tmp = tcg_temp_new_i64();
+    TCGv_i64 avr = tcg_temp_new_i64();
+    TCGv_i64 result = tcg_temp_new_i64();
+    TCGv_i64 result1 = tcg_temp_new_i64();
+    int64_t mask1 = 0x1fULL;
+    int64_t mask2 = 0x1fULL << 8;
+    int64_t mask3 = 0x1fULL << 16;
+    int64_t mask4 = 0xffULL << 56;
+    int i, j;
+
+    if (high == true) {
+        /* vupkhpx */
+        get_avr64(avr, VB, true);
+    } else {
+        /* vupklpx */
+        get_avr64(avr, VB, false);
+    }
+
+    tcg_gen_movi_i64(result, 0x0ULL);
+    for (i = 0; i < 2; i++) {
+        for (j = 0; j < 2; j++) {
+            tcg_gen_shli_i64(tmp, avr, (j * 16));
+            tcg_gen_andi_i64(tmp, tmp, mask1 << (j * 32));
+            tcg_gen_or_i64(result, result, tmp);
+
+            tcg_gen_shli_i64(tmp, avr, 3 + (j * 16));
+            tcg_gen_andi_i64(tmp, tmp, mask2 << (j * 32));
+            tcg_gen_or_i64(result, result, tmp);
+
+            tcg_gen_shli_i64(tmp, avr, 6 + (j * 16));
+            tcg_gen_andi_i64(tmp, tmp, mask3 << (j * 32));
+            tcg_gen_or_i64(result, result, tmp);
+
+            tcg_gen_shri_i64(tmp, avr, (j * 16));
+            tcg_gen_ext16s_i64(tmp, tmp);
+            tcg_gen_andi_i64(tmp, tmp, mask4);
+            tcg_gen_shri_i64(tmp, tmp, (32 * (1 - j)));
+            tcg_gen_or_i64(result, result, tmp);
+        }
+        if (i == 0) {
+            tcg_gen_mov_i64(result1, result);
+            tcg_gen_movi_i64(result, 0x0ULL);
+            tcg_gen_shri_i64(avr, avr, 32);
+        }
+    }
+
+    set_avr64(VT, result1, false);
+    set_avr64(VT, result, true);
+
+    tcg_temp_free_i64(tmp);
+    tcg_temp_free_i64(avr);
+    tcg_temp_free_i64(result);
+    tcg_temp_free_i64(result1);
+}
+
+static void trans_vupkhpx(DisasContext *ctx)
+{
+    trans_vupkpx(ctx, true);
+}
+
+static void trans_vupklpx(DisasContext *ctx)
+{
+    trans_vupkpx(ctx, false);
+}
+
+/*
  * vsl VRT,VRA,VRB - Vector Shift Left
  *
  * Shifting left 128 bit value of vA by value specified in bits 125-127 of vB.
@@ -1338,8 +1416,8 @@ GEN_VXFORM_NOA(vupkhsw, 7, 25);
 GEN_VXFORM_NOA(vupklsb, 7, 10);
 GEN_VXFORM_NOA(vupklsh, 7, 11);
 GEN_VXFORM_NOA(vupklsw, 7, 27);
-GEN_VXFORM_NOA(vupkhpx, 7, 13);
-GEN_VXFORM_NOA(vupklpx, 7, 15);
+GEN_VXFORM_TRANS(vupkhpx, 7, 13);
+GEN_VXFORM_TRANS(vupklpx, 7, 15);
 GEN_VXFORM_NOA_ENV(vrefp, 5, 4);
 GEN_VXFORM_NOA_ENV(vrsqrtefp, 5, 5);
 GEN_VXFORM_NOA_ENV(vexptefp, 5, 6);
-- 
2.7.4
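
Editor's note (not part of the patch): the shift-and-mask sequence in
trans_vupkpx can be checked against the removed helper in plain C. The
sketch below is only a host-side model under stated assumptions. The
names unpack_pixel, unpack_pair and check_unpack are hypothetical and
do not exist in QEMU. unpack_pixel mirrors the arithmetic of the deleted
VUPKPX macro for one 16-bit element, and unpack_pair mirrors one pass of
the inner 'for' loop on ordinary uint64_t values instead of TCG
temporaries.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model: unpack one 1-5-5-5 pixel into 32 bits,
 * following the arithmetic of the removed VUPKPX helper macro. */
uint32_t unpack_pixel(uint16_t e)
{
    uint32_t a = (e >> 15) ? 0xff : 0;   /* 1-bit alpha widened to 8 bits */
    uint32_t r = (e >> 10) & 0x1f;
    uint32_t g = (e >> 5) & 0x1f;
    uint32_t b = e & 0x1f;

    return (a << 24) | (r << 16) | (g << 8) | b;
}

/* Hypothetical model of one outer-loop pass of trans_vupkpx: unpack the
 * two pixels held in the low 32 bits of 'avr' into one 64-bit value. */
uint64_t unpack_pair(uint64_t avr)
{
    const uint64_t mask1 = 0x1fULL;
    const uint64_t mask2 = 0x1fULL << 8;
    const uint64_t mask3 = 0x1fULL << 16;
    const uint64_t mask4 = 0xffULL << 56;
    uint64_t result = 0;
    int j;

    for (j = 0; j < 2; j++) {
        uint64_t pix, ext;

        /* Three 5-bit channels, each moved into its own byte. */
        result |= (avr << (j * 16)) & (mask1 << (j * 32));
        result |= (avr << (3 + j * 16)) & (mask2 << (j * 32));
        result |= (avr << (6 + j * 16)) & (mask3 << (j * 32));

        /* Sign-extend the 16-bit pixel so its top bit fills bits 56..63,
         * then move that byte into the alpha position for pixel j. */
        pix = (avr >> (j * 16)) & 0xffff;
        ext = (pix & 0x8000) ? (pix | ~0xffffULL) : pix;
        result |= (ext & mask4) >> (32 * (1 - j));
    }
    return result;
}

/* Cross-check the two models on one input with garbage in the upper half. */
void check_unpack(void)
{
    uint64_t avr = 0xDEADBEEF7C00FFFFULL;
    uint64_t expect = ((uint64_t)unpack_pixel(0x7C00) << 32)
                      | unpack_pixel(0xFFFF);

    assert(unpack_pair(avr) == expect);
}
```

Calling check_unpack() from a small test program exercises both models.
The comparison covers only the integer arithmetic; it says nothing about
how get_avr64 and set_avr64 map doublewords on a given host.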