From nobody Mon May 6 08:55:56 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org ARC-Seal: i=1; a=rsa-sha256; t=1559816348; cv=none; d=zoho.com; s=zohoarc; b=PbeLZTWDWPuVKelQExaHrlZBfnXi5Z0BOPAHwASgk0odJ8xZJeqOzw0+7JU4lrHruiJi1Vn4hQH8MLQfmSe0mKzzUNEEmtPpUGB0nGHq8/Hn5F5Yg61KKZBF+ri2gVSe7RJV0qbyzoHV3JUzANxlgFILWpGJoxjXKpBmUU7CrWk= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zoho.com; s=zohoarc; t=1559816348; h=Content-Type:Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To:ARC-Authentication-Results; bh=N+t3B1yOIT9LNtQPpqY9/0pMy6eD+/pEp2e7fqS+tdU=; b=mg1909TKhcgehuSsuiMhlLcJDOuqGKP5YukeeO9hAe2SLowEb7sGNMsI/U7wQiz7wiugHoa3KepLmYo9bcgc+G69wvBr3dMuQFCmt0V8OJPMjWXbk3yf41XJXdsamqdAAkgx9sKRS7yYb4pkmGQC8J17HxyB8j+bRElGdbBY4Vs= ARC-Authentication-Results: i=1; mx.zoho.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1559816348089324.50608754713653; Thu, 6 Jun 2019 03:19:08 -0700 (PDT) Received: from localhost ([127.0.0.1]:57777 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1hYpUA-0003lP-HR for importer@patchew.org; Thu, 06 Jun 2019 06:19:06 -0400 Received: from eggs.gnu.org ([209.51.188.92]:50007) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1hYpRw-0002Zt-AV for qemu-devel@nongnu.org; Thu, 06 Jun 
2019 06:16:49 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1hYpRu-0001rb-SN for qemu-devel@nongnu.org; Thu, 06 Jun 2019 06:16:48 -0400 Received: from mx2.rt-rk.com ([89.216.37.149]:33631 helo=mail.rt-rk.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1hYpRu-0006zl-Hz for qemu-devel@nongnu.org; Thu, 06 Jun 2019 06:16:46 -0400 Received: from localhost (localhost [127.0.0.1]) by mail.rt-rk.com (Postfix) with ESMTP id 829061A1FC1; Thu, 6 Jun 2019 12:15:36 +0200 (CEST) Received: from rtrkw870-lin.domain.local (rtrkw870-lin.domain.local [10.10.13.132]) by mail.rt-rk.com (Postfix) with ESMTPSA id 552FF1A1E51; Thu, 6 Jun 2019 12:15:36 +0200 (CEST) X-Virus-Scanned: amavisd-new at rt-rk.com From: Stefan Brankovic To: qemu-devel@nongnu.org Date: Thu, 6 Jun 2019 12:15:23 +0200 Message-Id: <1559816130-17113-2-git-send-email-stefan.brankovic@rt-rk.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1559816130-17113-1-git-send-email-stefan.brankovic@rt-rk.com> References: <1559816130-17113-1-git-send-email-stefan.brankovic@rt-rk.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x X-Received-From: 89.216.37.149 Subject: [Qemu-devel] [PATCH 1/8] target/ppc: Optimize emulation of lvsl and lvsr instructions X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: david@gibson.dropbear.id.au Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" Content-Type: text/plain; charset="utf-8" Adding simple macro that is calling tcg implementation of appropriate instruction if altivec support is active. Optimization of altivec instruction lvsl (Load Vector for Shift Left). Place bytes sh:sh+15 of value 0x00 || 0x01 || 0x02 || ... || 0x1E || 0x1F in destination register. 
Sh is calculated by adding the two source registers and taking bits 60-63 of the result, that is, the low four bits of the effective address (EA). First we place these bits (EA[28-31] in the instruction description, obtained in the code by ANDing EA with 0xf) in the variable sh. After that we create bytes sh:(sh+7) of X (from the description) in a for loop, by incrementing sh in each iteration and placing it in the appropriate byte of the variable result, and save them in the higher doubleword element of vD. We repeat this once more for the lower doubleword element of vD by creating bytes (sh+8):(sh+15) in a for loop and saving the result.

Optimization of altivec instruction lvsr (Load Vector for Shift Right). Place bytes (16-sh):(31-sh) of value 0x00 || 0x01 || 0x02 || ... || 0x1E || 0x1F in the destination register. Sh is calculated by adding the two source registers and taking bits 60-63 of the result. First we place bits [28-31] of EA in the variable sh and replace it with 16-sh. After that we create bytes (16-sh):(23-sh) of X (from the description) in a for loop, by incrementing sh in each iteration and placing it in the appropriate byte of the variable result, and save them in the higher doubleword element of vD. We repeat this once more for the lower doubleword element of vD by creating bytes (24-sh):(31-sh) in a for loop and saving the result.
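As a cross-check of the description above, here is a minimal host-side sketch in plain C (not QEMU code, and not part of this patch) of the byte patterns that lvsl and lvsr are expected to produce. The names pack_run, lvsl_model and lvsr_model are hypothetical and exist only for this illustration. The sketch assumes that the high doubleword holds bytes 0-7 of vD, most significant byte first, as in the description.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: pack the eight byte values start, start+1, ...,
 * start+7 into one 64-bit value, most significant byte first.
 */
static uint64_t pack_run(uint64_t start)
{
    uint64_t result = 0;
    int i;

    /* Every packed value must fit in a single byte. */
    assert(start + 7 <= 0xff);
    for (i = 7; i >= 0; i--) {
        result |= start << (i * 8);
        start++;
    }
    return result;
}

/* lvsl model: vD = bytes sh:(sh+15) of X, with sh = low 4 bits of EA. */
static void lvsl_model(uint64_t ea, uint64_t *hi, uint64_t *lo)
{
    uint64_t sh = ea & 0xf;

    assert(hi && lo);
    *hi = pack_run(sh);      /* bytes sh:(sh+7) */
    *lo = pack_run(sh + 8);  /* bytes (sh+8):(sh+15) */
}

/* lvsr model: vD = bytes (16-sh):(31-sh) of X, with sh = low 4 bits of EA. */
static void lvsr_model(uint64_t ea, uint64_t *hi, uint64_t *lo)
{
    uint64_t start = 16 - (ea & 0xf);

    assert(hi && lo);
    *hi = pack_run(start);      /* bytes (16-sh):(23-sh) */
    *lo = pack_run(start + 8);  /* bytes (24-sh):(31-sh) */
}
```

For EA = 0x1003 (sh = 3), lvsl_model gives 0x030405060708090a in the high doubleword and 0x0b0c0d0e0f101112 in the low one, while lvsr_model gives 0x0d0e0f1011121314 and 0x15161718191a1b1c.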
Signed-off-by: Stefan Brankovic --- target/ppc/translate/vmx-impl.inc.c | 143 ++++++++++++++++++++++++++++----= ---- 1 file changed, 111 insertions(+), 32 deletions(-) diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx= -impl.inc.c index bd3ff40..140bb05 100644 --- a/target/ppc/translate/vmx-impl.inc.c +++ b/target/ppc/translate/vmx-impl.inc.c @@ -142,38 +142,6 @@ GEN_VR_STVE(bx, 0x07, 0x04, 1); GEN_VR_STVE(hx, 0x07, 0x05, 2); GEN_VR_STVE(wx, 0x07, 0x06, 4); =20 -static void gen_lvsl(DisasContext *ctx) -{ - TCGv_ptr rd; - TCGv EA; - if (unlikely(!ctx->altivec_enabled)) { - gen_exception(ctx, POWERPC_EXCP_VPU); - return; - } - EA =3D tcg_temp_new(); - gen_addr_reg_index(ctx, EA); - rd =3D gen_avr_ptr(rD(ctx->opcode)); - gen_helper_lvsl(rd, EA); - tcg_temp_free(EA); - tcg_temp_free_ptr(rd); -} - -static void gen_lvsr(DisasContext *ctx) -{ - TCGv_ptr rd; - TCGv EA; - if (unlikely(!ctx->altivec_enabled)) { - gen_exception(ctx, POWERPC_EXCP_VPU); - return; - } - EA =3D tcg_temp_new(); - gen_addr_reg_index(ctx, EA); - rd =3D gen_avr_ptr(rD(ctx->opcode)); - gen_helper_lvsr(rd, EA); - tcg_temp_free(EA); - tcg_temp_free_ptr(rd); -} - static void gen_mfvscr(DisasContext *ctx) { TCGv_i32 t; @@ -316,6 +284,16 @@ static void glue(gen_, name)(DisasContext *ctx) = \ tcg_temp_free_ptr(rd); \ } =20 +#define GEN_VXFORM_TRANS(name, opc2, opc3) \ +static void glue(gen_, name)(DisasContext *ctx) \ +{ \ + if (unlikely(!ctx->altivec_enabled)) { \ + gen_exception(ctx, POWERPC_EXCP_VPU); \ + return; \ + } \ + trans_##name(ctx); \ +} + #define GEN_VXFORM_ENV(name, opc2, opc3) \ static void glue(gen_, name)(DisasContext *ctx) \ { \ @@ -515,6 +493,105 @@ static void gen_vmrgow(DisasContext *ctx) tcg_temp_free_i64(avr); } =20 +/* + * lvsl VRT,RA,RB - Load Vector for Shift Left + * + * Let the EA be the sum (rA|0)+(rB). Let sh=3DEA[28=E2=80=9331]. + * Let X be the 32-byte value 0x00 || 0x01 || 0x02 || ... || 0x1E || 0x1F. + * Bytes sh:sh+15 of X are placed into vD. 
+ */ +static void trans_lvsl(DisasContext *ctx) +{ + int VT =3D rD(ctx->opcode); + TCGv_i64 result =3D tcg_temp_new_i64(); + TCGv_i64 tmp =3D tcg_temp_new_i64(); + TCGv_i64 sh =3D tcg_temp_new_i64(); + TCGv_i64 EA =3D tcg_temp_new(); + int i; + + /* Get sh(from description) by anding EA with 0xf. */ + gen_addr_reg_index(ctx, EA); + tcg_gen_andi_i64(sh, EA, 0xfULL); + /* + * Create bytes sh:sh+7 of X(from description) and place them in + * higher doubleword of vD. + */ + tcg_gen_addi_i64(result, sh, 7); + for (i =3D 7; i >=3D 1; i--) { + tcg_gen_shli_i64(tmp, sh, i * 8); + tcg_gen_or_i64(result, result, tmp); + tcg_gen_addi_i64(sh, sh, 1); + } + set_avr64(VT, result, true); + /* + * Create bytes sh+8:sh+15 of X(from description) and place them in + * lower doubleword of vD. + */ + tcg_gen_addi_i64(result, sh, 8); + for (i =3D 7; i >=3D 1; i--) { + tcg_gen_addi_i64(sh, sh, 1); + tcg_gen_shli_i64(tmp, sh, i * 8); + tcg_gen_or_i64(result, result, tmp); + } + set_avr64(VT, result, false); + + tcg_temp_free_i64(result); + tcg_temp_free_i64(tmp); + tcg_temp_free_i64(sh); + tcg_temp_free(EA); +} + +/* + * lvsr VRT,RA,RB - Load Vector for Shift Right + * + * Let the EA be the sum (rA|0)+(rB). Let sh=3DEA[28=E2=80=9331]. + * Let X be the 32-byte value 0x00 || 0x01 || 0x02 || ... || 0x1E || 0x1F. + * Bytes (16-sh):(31-sh) of X are placed into vD. + */ +static void trans_lvsr(DisasContext *ctx) +{ + int VT =3D rD(ctx->opcode); + TCGv_i64 result =3D tcg_temp_new_i64(); + TCGv_i64 tmp =3D tcg_temp_new_i64(); + TCGv_i64 sh =3D tcg_temp_new_i64(); + TCGv_i64 EA =3D tcg_temp_new(); + int i; + + /* Get sh(from description) by anding EA with 0xf. */ + gen_addr_reg_index(ctx, EA); + tcg_gen_andi_i64(sh, EA, 0xfULL); + /* Make (16-sh) and save it in sh. */ + tcg_gen_subi_i64(sh, sh, 0x10ULL); + tcg_gen_neg_i64(sh, sh); + /* + * Create bytes (16-sh):(23-sh) of X(from description) and place them = in + * higher doubleword of vD. 
+ */ + tcg_gen_addi_i64(result, sh, 7); + for (i =3D 7; i >=3D 1; i--) { + tcg_gen_shli_i64(tmp, sh, i * 8); + tcg_gen_or_i64(result, result, tmp); + tcg_gen_addi_i64(sh, sh, 1); + } + set_avr64(VT, result, true); + /* + * Create bytes (24-sh):(32-sh) of X(from description) and place them = in + * lower doubleword of vD. + */ + tcg_gen_addi_i64(result, sh, 8); + for (i =3D 7; i >=3D 1; i--) { + tcg_gen_addi_i64(sh, sh, 1); + tcg_gen_shli_i64(tmp, sh, i * 8); + tcg_gen_or_i64(result, result, tmp); + } + set_avr64(VT, result, false); + + tcg_temp_free_i64(result); + tcg_temp_free_i64(tmp); + tcg_temp_free_i64(sh); + tcg_temp_free(EA); +} + GEN_VXFORM(vmuloub, 4, 0); GEN_VXFORM(vmulouh, 4, 1); GEN_VXFORM(vmulouw, 4, 2); @@ -657,6 +734,8 @@ GEN_VXFORM_DUAL(vmrgow, PPC_NONE, PPC2_ALTIVEC_207, GEN_VXFORM_HETRO(vextubrx, 6, 28) GEN_VXFORM_HETRO(vextuhrx, 6, 29) GEN_VXFORM_HETRO(vextuwrx, 6, 30) +GEN_VXFORM_TRANS(lvsl, 6, 31) +GEN_VXFORM_TRANS(lvsr, 6, 32) GEN_VXFORM_DUAL(vmrgew, PPC_NONE, PPC2_ALTIVEC_207, \ vextuwrx, PPC_NONE, PPC2_ISA300) =20 --=20 2.7.4 From nobody Mon May 6 08:55:56 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org ARC-Seal: i=1; a=rsa-sha256; t=1559816350; cv=none; d=zoho.com; s=zohoarc; b=axVcIL2jh5VhJM6f0Nvh4jeHUNL2v6LmDx/SsmhVw2whQUJfR/icxzOsOLIlxz161+AhadWWbGdwaXijBlueIrp+IFqLNrcU8loQ6OZyc7wgwRNOX8gec6Wl3gn0F9dpI/C9zEnBYF8s3MGhW3cErhVIBU0s5F2Z3WuNDrsoIF8= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zoho.com; s=zohoarc; t=1559816350; 
h=Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:Message-ID:References:Sender:Subject:To:ARC-Authentication-Results; bh=N+BdAt/oWfn/m6FLWNdKLt1vANTiRBHMklcqw39UuJY=; b=SED8saQ4LKJwFgbvONZ52RvdC23Gq0jZNyxMKoyqptPMBl2j4UgRKZNqS6u3EEW4uCGIsaeTuOL8PMTtfs3ecEbJeUSOhUkvFOf+wHI9TE43u9axGhLHCW2YbbrQ6Xc7GSyM5UzOEOD7D4ppDo9xI1e87cfUnmcxs2KBx3e4qpo= ARC-Authentication-Results: i=1; mx.zoho.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (209.51.188.17 [209.51.188.17]) by mx.zohomail.com with SMTPS id 1559816350412198.80772569257374; Thu, 6 Jun 2019 03:19:10 -0700 (PDT) Received: from localhost ([127.0.0.1]:57775 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1hYpU8-0003jy-D8 for importer@patchew.org; Thu, 06 Jun 2019 06:19:04 -0400 Received: from eggs.gnu.org ([209.51.188.92]:50002) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1hYpRw-0002Zn-7L for qemu-devel@nongnu.org; Thu, 06 Jun 2019 06:16:49 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1hYpRu-0001ro-TA for qemu-devel@nongnu.org; Thu, 06 Jun 2019 06:16:48 -0400 Received: from mx2.rt-rk.com ([89.216.37.149]:33644 helo=mail.rt-rk.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1hYpRu-00071C-Im for qemu-devel@nongnu.org; Thu, 06 Jun 2019 06:16:46 -0400 Received: from localhost (localhost [127.0.0.1]) by mail.rt-rk.com (Postfix) with ESMTP id 975251A1E51; Thu, 6 Jun 2019 12:15:36 +0200 (CEST) Received: from rtrkw870-lin.domain.local (rtrkw870-lin.domain.local [10.10.13.132]) by mail.rt-rk.com (Postfix) with ESMTPSA id 6659F1A1FC6; Thu, 6 Jun 2019 12:15:36 +0200 (CEST) X-Virus-Scanned: amavisd-new at rt-rk.com From: Stefan Brankovic To: qemu-devel@nongnu.org Date: Thu, 6 Jun 2019 
12:15:24 +0200
Message-Id: <1559816130-17113-3-git-send-email-stefan.brankovic@rt-rk.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1559816130-17113-1-git-send-email-stefan.brankovic@rt-rk.com>
References: <1559816130-17113-1-git-send-email-stefan.brankovic@rt-rk.com>
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x
X-Received-From: 89.216.37.149
Subject: [Qemu-devel] [PATCH 2/8] target/ppc: Optimize emulation of vsl and vsr instructions
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: david@gibson.dropbear.id.au
Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org
Sender: "Qemu-devel"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Optimization of altivec instructions vsl and vsr (Vector Shift Left/Right). Perform a shift operation (left and right respectively) on the 128-bit value of register vA by the value specified in bits 125-127 of register vB. The lowest 3 bits in each byte element of register vB must be identical or the result is undefined.

For the vsl instruction, we first save bits 125-127 of register vB in the variable sh. Then we save the highest sh bits of the lower doubleword element of register vA in the variable shifted, so that we don't lose those bits when we shift the lower doubleword element of register vA, which is our next step. After shifting the lower doubleword element, we shift the higher doubleword element of vA and replace its lowest sh bits (which are now 0) with the bits saved in shifted.

For the vsr instruction, we first save bits 125-127 of register vB in the variable sh. Then we save the lowest sh bits of the higher doubleword element of register vA in the variable shifted, so that we don't lose those bits when we shift the higher doubleword element of register vA, which is our next step.
After shifting higher doubleword element we perform shift operation on lower doubleword element of vA and replace highest sh bits(that are now 0) with bits saved in shifted. Signed-off-by: Stefan Brankovic --- target/ppc/translate/vmx-impl.inc.c | 101 ++++++++++++++++++++++++++++++++= +++- 1 file changed, 99 insertions(+), 2 deletions(-) diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx= -impl.inc.c index 140bb05..6bd072a 100644 --- a/target/ppc/translate/vmx-impl.inc.c +++ b/target/ppc/translate/vmx-impl.inc.c @@ -592,6 +592,103 @@ static void trans_lvsr(DisasContext *ctx) tcg_temp_free(EA); } =20 +/* + * vsl VRT,VRA,VRB - Vector Shift Left + * + * Shifting left 128 bit value of vA by value specified in bits 125-127 of= vB. + * Lowest 3 bits in each byte element of register vB must be identical or + * result is undefined. + */ +static void trans_vsl(DisasContext *ctx) +{ + int VT =3D rD(ctx->opcode); + int VA =3D rA(ctx->opcode); + int VB =3D rB(ctx->opcode); + TCGv_i64 avrA =3D tcg_temp_new_i64(); + TCGv_i64 avrB =3D tcg_temp_new_i64(); + TCGv_i64 sh =3D tcg_temp_new_i64(); + TCGv_i64 shifted =3D tcg_temp_new_i64(); + TCGv_i64 tmp =3D tcg_temp_new_i64(); + + /* Place bits 125-127 of vB in sh. */ + get_avr64(avrB, VB, false); + tcg_gen_andi_i64(sh, avrB, 0x07ULL); + + /* + * Save highest sh bits of lower doubleword element of vA in variable + * shifted and perform shift on lower doubleword. + */ + get_avr64(avrA, VA, false); + tcg_gen_subi_i64(tmp, sh, 64); + tcg_gen_neg_i64(tmp, tmp); + tcg_gen_shr_i64(shifted, avrA, tmp); + tcg_gen_shl_i64(avrA, avrA, sh); + set_avr64(VT, avrA, false); + + /* + * Perform shift on higher doubleword element of vA and replace lowest + * sh bits with shifted. 
+ */ + get_avr64(avrA, VA, true); + tcg_gen_shl_i64(avrA, avrA, sh); + tcg_gen_or_i64(avrA, avrA, shifted); + set_avr64(VT, avrA, true); + + tcg_temp_free_i64(avrA); + tcg_temp_free_i64(avrB); + tcg_temp_free_i64(sh); + tcg_temp_free_i64(shifted); + tcg_temp_free_i64(tmp); +} + +/* + * vsr VRT,VRA,VRB - Vector Shift Right + * + * Shifting right 128 bit value of vA by value specified in bits 125-127 o= f vB. + * Lowest 3 bits in each byte element of register vB must be identical or + * result is undefined. + */ +static void trans_vsr(DisasContext *ctx) +{ + int VT =3D rD(ctx->opcode); + int VA =3D rA(ctx->opcode); + int VB =3D rB(ctx->opcode); + TCGv_i64 avrA =3D tcg_temp_new_i64(); + TCGv_i64 avrB =3D tcg_temp_new_i64(); + TCGv_i64 sh =3D tcg_temp_new_i64(); + TCGv_i64 shifted =3D tcg_temp_new_i64(); + TCGv_i64 tmp =3D tcg_temp_new_i64(); + + /* Place bits 125-127 of vB in sh. */ + get_avr64(avrB, VB, false); + tcg_gen_andi_i64(sh, avrB, 0x07ULL); + + /* + * Save lowest sh bits of higher doubleword element of vA in variable + * shifted and perform shift on higher doubleword. + */ + get_avr64(avrA, VA, true); + tcg_gen_subi_i64(tmp, sh, 64); + tcg_gen_neg_i64(tmp, tmp); + tcg_gen_shl_i64(shifted, avrA, tmp); + tcg_gen_shr_i64(avrA, avrA, sh); + set_avr64(VT, avrA, true); + /* + * Perform shift on lower doubleword element of vA and replace highest + * sh bits with shifted. 
+ */ + get_avr64(avrA, VA, false); + tcg_gen_shr_i64(avrA, avrA, sh); + tcg_gen_or_i64(avrA, avrA, shifted); + set_avr64(VT, avrA, false); + + tcg_temp_free_i64(avrA); + tcg_temp_free_i64(avrB); + tcg_temp_free_i64(sh); + tcg_temp_free_i64(shifted); + tcg_temp_free_i64(tmp); +} + GEN_VXFORM(vmuloub, 4, 0); GEN_VXFORM(vmulouh, 4, 1); GEN_VXFORM(vmulouw, 4, 2); @@ -699,11 +796,11 @@ GEN_VXFORM(vrld, 2, 3); GEN_VXFORM(vrldmi, 2, 3); GEN_VXFORM_DUAL(vrld, PPC_NONE, PPC2_ALTIVEC_207, \ vrldmi, PPC_NONE, PPC2_ISA300) -GEN_VXFORM(vsl, 2, 7); +GEN_VXFORM_TRANS(vsl, 2, 7); GEN_VXFORM(vrldnm, 2, 7); GEN_VXFORM_DUAL(vsl, PPC_ALTIVEC, PPC_NONE, \ vrldnm, PPC_NONE, PPC2_ISA300) -GEN_VXFORM(vsr, 2, 11); +GEN_VXFORM_TRANS(vsr, 2, 11); GEN_VXFORM_ENV(vpkuhum, 7, 0); GEN_VXFORM_ENV(vpkuwum, 7, 1); GEN_VXFORM_ENV(vpkudum, 7, 17); --=20 2.7.4 From nobody Mon May 6 08:55:56 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org ARC-Seal: i=1; a=rsa-sha256; t=1559816708; cv=none; d=zoho.com; s=zohoarc; b=K4RBnUto76CGuu4UbcmYFu5xWx2kXBuP7/qlYoUDsVsYAcjaRj8z86f9h+RWHa8IlYK4Wk/L6alJO++J/JcacIOc2MwnsnA3YozVo1arYIlCwO1KtEKtSHAcjJCuA8rytB9hweU3MJ/EyB8RiJiRL3ITgtl3Zt2YiViN+nJeQuc= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zoho.com; s=zohoarc; t=1559816708; h=Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:Message-ID:References:Sender:Subject:To:ARC-Authentication-Results; bh=wRh21/u+3mLgxdvImnYcgp5ByXiV8k+D5jNwBPpXAa0=; 
b=VJIOdcFkMNL05ZZ+zCBkODREUMMia4XZnt+8xOENLdwudzMvVr7caENwgFbalCLToxFMKtsvyGWMxMYnUlmxumvIMbmXUysmk75epOAEdLM1qzZV3QAJbeo/hqZxwIRM2DYNqlQUVQSx/oPqgshbNG/eLoHhONQKVjARr5CRs8w= ARC-Authentication-Results: i=1; mx.zoho.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (209.51.188.17 [209.51.188.17]) by mx.zohomail.com with SMTPS id 1559816708471756.3185456588793; Thu, 6 Jun 2019 03:25:08 -0700 (PDT) Received: from localhost ([127.0.0.1]:57867 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1hYpZp-00009p-8c for importer@patchew.org; Thu, 06 Jun 2019 06:24:57 -0400 Received: from eggs.gnu.org ([209.51.188.92]:50081) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1hYpRz-0002bd-15 for qemu-devel@nongnu.org; Thu, 06 Jun 2019 06:16:52 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1hYpRx-000275-Mp for qemu-devel@nongnu.org; Thu, 06 Jun 2019 06:16:51 -0400 Received: from mx2.rt-rk.com ([89.216.37.149]:33643 helo=mail.rt-rk.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1hYpRx-000719-Ap for qemu-devel@nongnu.org; Thu, 06 Jun 2019 06:16:49 -0400 Received: from localhost (localhost [127.0.0.1]) by mail.rt-rk.com (Postfix) with ESMTP id A947D1A2106; Thu, 6 Jun 2019 12:15:36 +0200 (CEST) Received: from rtrkw870-lin.domain.local (rtrkw870-lin.domain.local [10.10.13.132]) by mail.rt-rk.com (Postfix) with ESMTPSA id 731B01A1DC6; Thu, 6 Jun 2019 12:15:36 +0200 (CEST) X-Virus-Scanned: amavisd-new at rt-rk.com From: Stefan Brankovic To: qemu-devel@nongnu.org Date: Thu, 6 Jun 2019 12:15:25 +0200 Message-Id: <1559816130-17113-4-git-send-email-stefan.brankovic@rt-rk.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1559816130-17113-1-git-send-email-stefan.brankovic@rt-rk.com> References: 
<1559816130-17113-1-git-send-email-stefan.brankovic@rt-rk.com>
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x
X-Received-From: 89.216.37.149
Subject: [Qemu-devel] [PATCH 3/8] target/ppc: Optimize emulation of vpkpx instruction
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: david@gibson.dropbear.id.au
Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org
Sender: "Qemu-devel"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Optimize altivec instruction vpkpx (Vector Pack Pixel). Rearranges 8 pixels coded in 6-5-5 pattern (4 from each source register) into a contiguous array of bits in the destination register.

In each iteration of the outer loop we do the 6-5-5 pack for the 2 pixels of one doubleword element of a source register. The first thing we do in the outer loop is choose which doubleword element of which register is used in the current iteration and place it in the avr variable. Then we perform the 6-5-5 pack of pixels on avr in the inner for loop (2 iterations, 1 for each pixel) and save the result in the tmp variable. At the end of the outer for loop we merge tmp into the variable called result and, once a whole doubleword is finished (every second iteration), save it in the appropriate doubleword element of vD. The outer loop has 4 iterations.
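The per-pixel packing described above can be sketched on the host in plain C. This is an illustrative model only, not QEMU code and not part of this patch. The names pack_pixel, pack_doubleword and vpkpx_model are hypothetical. The shifts and masks follow mask1, mask2 and mask3 from the patch, and index 0 of each array is assumed to be the high doubleword.

```c
#include <assert.h>
#include <stdint.h>

/* Pack one 32-bit pixel into 16 bits: a 6-bit, a 5-bit and a 5-bit field. */
static uint16_t pack_pixel(uint32_t px)
{
    return (uint16_t)(((px >> 9) & 0xfc00) |
                      ((px >> 6) & 0x03e0) |
                      ((px >> 3) & 0x001f));
}

/*
 * Pack the two pixels held in one doubleword into 32 bits, using the same
 * shift-and-mask steps as the inner loop of the patch (j selects the pixel).
 */
static uint32_t pack_doubleword(uint64_t avr)
{
    const uint64_t mask1 = 0x1fULL;
    const uint64_t mask2 = 0x1fULL << 5;
    const uint64_t mask3 = 0x3fULL << 10;
    uint64_t tmp = 0;
    int j;

    for (j = 0; j < 2; j++) {
        tmp |= (avr >> (j * 16 + 3)) & (mask1 << (j * 16));
        tmp |= (avr >> (j * 16 + 6)) & (mask2 << (j * 16));
        tmp |= (avr >> (j * 16 + 9)) & (mask3 << (j * 16));
    }
    return (uint32_t)tmp;
}

/* vD high = packed pixels of vA, vD low = packed pixels of vB. */
static void vpkpx_model(const uint64_t va[2], const uint64_t vb[2],
                        uint64_t vd[2])
{
    assert(va && vb && vd);
    vd[0] = ((uint64_t)pack_doubleword(va[0]) << 32) | pack_doubleword(va[1]);
    vd[1] = ((uint64_t)pack_doubleword(vb[0]) << 32) | pack_doubleword(vb[1]);
}
```

pack_pixel is the single-pixel form of the same computation, so pack_doubleword(x) should equal (pack_pixel(x >> 32) << 16) | pack_pixel(x & 0xffffffff) for any x, which gives a simple way to compare the loop against the direct formula.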
Signed-off-by: Stefan Brankovic
---
 target/ppc/translate/vmx-impl.inc.c | 93 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 92 insertions(+), 1 deletion(-)

diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index 6bd072a..87f69dc 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -593,6 +593,97 @@ static void trans_lvsr(DisasContext *ctx)
 }

 /*
+ * vpkpx VRT,VRA,VRB - Vector Pack Pixel
+ *
+ * Rearranges 8 pixels coded in 6-5-5 pattern (4 from each source register)
+ * into a contiguous array of bits in the destination register.
+ */
+static void trans_vpkpx(DisasContext *ctx)
+{
+    int VT = rD(ctx->opcode);
+    int VA = rA(ctx->opcode);
+    int VB = rB(ctx->opcode);
+    TCGv_i64 tmp = tcg_temp_new_i64();
+    TCGv_i64 shifted = tcg_temp_new_i64();
+    TCGv_i64 avr = tcg_temp_new_i64();
+    TCGv_i64 result = tcg_temp_new_i64();
+    int64_t mask1 = 0x1fULL;
+    int64_t mask2 = 0x1fULL << 5;
+    int64_t mask3 = 0x3fULL << 10;
+    int i, j;
+    /*
+     * In each iteration do the 6-5-5 pack for 2 pixels of each doubleword
+     * element of each source register.
+     */
+    for (i = 0; i < 4; i++) {
+        switch (i) {
+        case 0:
+            /*
+             * Get high doubleword of vA to perform 6-5-5 pack of pixels
+             * 1 and 2.
+             */
+            get_avr64(avr, VA, true);
+            tcg_gen_movi_i64(result, 0x0ULL);
+            break;
+        case 1:
+            /*
+             * Get low doubleword of vA to perform 6-5-5 pack of pixels
+             * 3 and 4.
+             */
+            get_avr64(avr, VA, false);
+            break;
+        case 2:
+            /*
+             * Get high doubleword of vB to perform 6-5-5 pack of pixels
+             * 5 and 6.
+             */
+            get_avr64(avr, VB, true);
+            tcg_gen_movi_i64(result, 0x0ULL);
+            break;
+        case 3:
+            /*
+             * Get low doubleword of vB to perform 6-5-5 pack of pixels
+             * 7 and 8.
+             */
+            get_avr64(avr, VB, false);
+            break;
+        }
+        /* Perform the packing for 2 pixels (each iteration for 1).
*/ + tcg_gen_movi_i64(tmp, 0x0ULL); + for (j =3D 0; j < 2; j++) { + tcg_gen_shri_i64(shifted, avr, (j * 16 + 3)); + tcg_gen_andi_i64(shifted, shifted, mask1 << (j * 16)); + tcg_gen_or_i64(tmp, tmp, shifted); + + tcg_gen_shri_i64(shifted, avr, (j * 16 + 6)); + tcg_gen_andi_i64(shifted, shifted, mask2 << (j * 16)); + tcg_gen_or_i64(tmp, tmp, shifted); + + tcg_gen_shri_i64(shifted, avr, (j * 16 + 9)); + tcg_gen_andi_i64(shifted, shifted, mask3 << (j * 16)); + tcg_gen_or_i64(tmp, tmp, shifted); + } + if ((i =3D=3D 0) || (i =3D=3D 2)) { + tcg_gen_shli_i64(tmp, tmp, 32); + } + tcg_gen_or_i64(result, result, tmp); + if (i =3D=3D 1) { + /* Place packed pixels 1:4 to high doubleword of vD. */ + set_avr64(VT, result, true); + } + if (i =3D=3D 3) { + /* Place packed pixels 5:8 to low doubleword of vD. */ + set_avr64(VT, result, false); + } + } + + tcg_temp_free_i64(tmp); + tcg_temp_free_i64(shifted); + tcg_temp_free_i64(avr); + tcg_temp_free_i64(result); +} + +/* * vsl VRT,VRA,VRB - Vector Shift Left * * Shifting left 128 bit value of vA by value specified in bits 125-127 of= vB. 
@@ -813,7 +904,7 @@ GEN_VXFORM_ENV(vpksdus, 7, 21); GEN_VXFORM_ENV(vpkshss, 7, 6); GEN_VXFORM_ENV(vpkswss, 7, 7); GEN_VXFORM_ENV(vpksdss, 7, 23); -GEN_VXFORM(vpkpx, 7, 12); +GEN_VXFORM_TRANS(vpkpx, 7, 12); GEN_VXFORM_ENV(vsum4ubs, 4, 24); GEN_VXFORM_ENV(vsum4sbs, 4, 28); GEN_VXFORM_ENV(vsum4shs, 4, 25); --=20 2.7.4 From nobody Mon May 6 08:55:56 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org ARC-Seal: i=1; a=rsa-sha256; t=1559816524; cv=none; d=zoho.com; s=zohoarc; b=IOGS1JvuNDZ1rj78YOlexw0XaCygJ30vTBhx1GP2eD5NwjBahmTmdqurjhjeTjkGbDO4no1XI5JeM3QAqXWPmgClqm5NjIRYKmDy3ckzq/go7SG4t555vE7fPR8Yr/gycLRoQ/Xc9kZgDr0gRJXZOcFCu2L56/HfFB8Swa9oip0= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zoho.com; s=zohoarc; t=1559816524; h=Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:Message-ID:References:Sender:Subject:To:ARC-Authentication-Results; bh=wKYb3EmfXG2IyLn4xFBxe5022Qan2Nwcf/zHAt5DCXg=; b=CREZ0GMacbRyjSECv/8V4Ox/+nuJa3KOzSWoUL/vKYx6cNQMfecZDoEnKUShByWHMJ5Z+WjL4DExJ5nlB4x0AiZ2ZHZSuG72YhyLcU3dGJDMBIwfn9em091N4NUYvdd1gD9cFGE7MO9/RhKu2kkdolkNsCaqCtoTV5JfS9wx1MQ= ARC-Authentication-Results: i=1; mx.zoho.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1559816524647447.3896021575598; Thu, 6 Jun 2019 03:22:04 -0700 (PDT) Received: from localhost ([127.0.0.1]:57834 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) 
(envelope-from ) id 1hYpWy-0006Uy-9M for importer@patchew.org; Thu, 06 Jun 2019 06:22:00 -0400 Received: from eggs.gnu.org ([209.51.188.92]:50049) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1hYpRx-0002ad-MS for qemu-devel@nongnu.org; Thu, 06 Jun 2019 06:16:51 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1hYpRw-0001zI-96 for qemu-devel@nongnu.org; Thu, 06 Jun 2019 06:16:49 -0400 Received: from mx2.rt-rk.com ([89.216.37.149]:33649 helo=mail.rt-rk.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1hYpRv-00071L-TR for qemu-devel@nongnu.org; Thu, 06 Jun 2019 06:16:48 -0400 Received: from localhost (localhost [127.0.0.1]) by mail.rt-rk.com (Postfix) with ESMTP id A8B731A2105; Thu, 6 Jun 2019 12:15:36 +0200 (CEST) Received: from rtrkw870-lin.domain.local (rtrkw870-lin.domain.local [10.10.13.132]) by mail.rt-rk.com (Postfix) with ESMTPSA id 802741A1DE5; Thu, 6 Jun 2019 12:15:36 +0200 (CEST) X-Virus-Scanned: amavisd-new at rt-rk.com From: Stefan Brankovic To: qemu-devel@nongnu.org Date: Thu, 6 Jun 2019 12:15:26 +0200 Message-Id: <1559816130-17113-5-git-send-email-stefan.brankovic@rt-rk.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1559816130-17113-1-git-send-email-stefan.brankovic@rt-rk.com> References: <1559816130-17113-1-git-send-email-stefan.brankovic@rt-rk.com> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x X-Received-From: 89.216.37.149 Subject: [Qemu-devel] [PATCH 4/8] target/ppc: Optimize emulation of vgbbd instruction X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: david@gibson.dropbear.id.au Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Optimize altivec instruction vgbbd 
(Vector Gather Bits by Bytes by Doubleword). All ith bits (i in range 1 to 8) of each byte of a doubleword element in the source register are concatenated and placed into the ith byte of the corresponding doubleword element in the destination register.

The following is done for every doubleword element of the source register (placed in the shifted variable): We gather bits in 2x8 iterations. In the first iteration, bit 1 of byte 1, bit 2 of byte 2, ... bit 8 of byte 8 are already in their final spots, so we just AND avr with mask. For every following iteration, we shift right both shifted (by 7 places) and mask (by 8 places), which brings bit 1 of byte 2, bit 2 of byte 3, ... bit 7 of byte 8 into the right places, so we AND shifted with the new value of mask, and so on. After the first 8 iterations (the first for loop) all first bits are in their final places, all second bits except the second bit from the eighth byte are in their places, ... and only 1 eighth bit from the eighth byte is in its place, so we AND result1 with mask1 to keep only the bits that are in the right place and save them in result1. In the second loop we do all operations symmetrically, so the other half of the bits reach their final spots, and we save the result in result2. The OR of result1 and result2 is placed in the corresponding doubleword element of vD. We repeat this 2 times, once for each doubleword element.

Signed-off-by: Stefan Brankovic
---
 target/ppc/translate/vmx-impl.inc.c | 99 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 98 insertions(+), 1 deletion(-)

diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index 87f69dc..010f337 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -780,6 +780,103 @@ static void trans_vsr(DisasContext *ctx)
     tcg_temp_free_i64(tmp);
 }

+/*
+ * vgbbd VRT,VRB - Vector Gather Bits by Bytes by Doubleword
+ *
+ * All ith bits (i in range 1 to 8) of each byte of doubleword element in source
+ * register are concatenated and placed into ith byte of appropriate doubleword
+ * element in destination register.
+ *
+ * Following solution is done for every doubleword element of source register
+ * (placed in shifted variable):
+ * We gather bits in 2x8 iterations.
+ * In first iteration bit 1 of byte 1, bit 2 of byte 2,... bit 8 of byte 8 are
+ * in their final spots so we just AND avr with mask. For every next iteration,
+ * we have to shift right both shifted (7 places) and mask (8 places), so we get
+ * bit 1 of byte 2, bit 2 of byte 3.. bit 7 of byte 8 in right places so we AND
+ * shifted with new value of mask... After first 8 iterations (first for loop)
+ * we have all first bits in their final place, all second bits but second bit
+ * from eighth byte in their place,... (only 1 eighth bit from eighth byte is in
+ * its place), so we AND result1 with mask1 to save those bits that are at right
+ * place and save them in result1. In second loop we do all operations
+ * symmetrically, so we get other half of bits on their final spots, and save
+ * result in result2. OR of result1 and result2 is placed in appropriate
+ * doubleword element of vD. We repeat this 2 times.
+ */
+static void trans_vgbbd(DisasContext *ctx)
+{
+    int VT = rD(ctx->opcode);
+    int VB = rB(ctx->opcode);
+    TCGv_i64 tmp = tcg_temp_new_i64();
+    TCGv_i64 avr = tcg_temp_new_i64();
+    TCGv_i64 shifted = tcg_temp_new_i64();
+    TCGv_i64 result1 = tcg_temp_new_i64();
+    TCGv_i64 result2 = tcg_temp_new_i64();
+    uint64_t mask = 0x8040201008040201ULL;
+    uint64_t mask1 = 0x80c0e0f0f8fcfeffULL;
+    uint64_t mask2 = 0x7f3f1f0f07030100ULL;
+    int i;
+
+    get_avr64(avr, VB, true);
+    tcg_gen_movi_i64(result1, 0x0ULL);
+    tcg_gen_mov_i64(shifted, avr);
+    for (i = 0; i < 8; i++) {
+        tcg_gen_andi_i64(tmp, shifted, mask);
+        tcg_gen_or_i64(result1, result1, tmp);
+
+        tcg_gen_shri_i64(shifted, shifted, 7);
+        mask = mask >> 8;
+    }
+    tcg_gen_andi_i64(result1, result1, mask1);
+
+    mask = 0x8040201008040201ULL;
+    tcg_gen_movi_i64(result2, 0x0ULL);
+    for (i = 0; i < 8; i++) {
+        tcg_gen_andi_i64(tmp, avr, mask);
+        tcg_gen_or_i64(result2, result2, tmp);
+
+        tcg_gen_shli_i64(avr, avr, 7);
+        mask = mask << 8;
+    }
+    tcg_gen_andi_i64(result2, result2, mask2);
+
+    tcg_gen_or_i64(result2, result2, result1);
+    set_avr64(VT, result2, true);
+
+    mask = 0x8040201008040201ULL;
+    get_avr64(avr, VB, false);
+    tcg_gen_movi_i64(result1, 0x0ULL);
+    tcg_gen_mov_i64(shifted, avr);
+    for (i = 0; i < 8; i++) {
+        tcg_gen_andi_i64(tmp, shifted, mask);
+        tcg_gen_or_i64(result1, result1, tmp);
+
+        tcg_gen_shri_i64(shifted, shifted, 7);
+        mask = mask >> 8;
+    }
+    tcg_gen_andi_i64(result1, result1, mask1);
+
+    mask = 0x8040201008040201ULL;
+    tcg_gen_movi_i64(result2, 0x0ULL);
+    for (i = 0; i < 8; i++) {
+        tcg_gen_andi_i64(tmp, avr, mask);
+        tcg_gen_or_i64(result2, result2, tmp);
+
+        tcg_gen_shli_i64(avr, avr, 7);
+        mask = mask << 8;
+    }
+    tcg_gen_andi_i64(result2, result2, mask2);
+
+    tcg_gen_or_i64(result2, result2, result1);
+    set_avr64(VT, result2, false);
+
+    tcg_temp_free_i64(tmp);
+    tcg_temp_free_i64(avr);
+    tcg_temp_free_i64(shifted);
+    tcg_temp_free_i64(result1);
+    tcg_temp_free_i64(result2);
+}
+
 GEN_VXFORM(vmuloub, 4, 0);
 GEN_VXFORM(vmulouh, 4, 1);
 GEN_VXFORM(vmulouw, 4, 2);
@@ -1319,7 +1416,7 @@ GEN_VXFORM_DUAL(vclzd, PPC_NONE, PPC2_ALTIVEC_207, \
                 vpopcntd, PPC_NONE, PPC2_ALTIVEC_207)
 GEN_VXFORM(vbpermd, 6, 23);
 GEN_VXFORM(vbpermq, 6, 21);
-GEN_VXFORM_NOA(vgbbd, 6, 20);
+GEN_VXFORM_TRANS(vgbbd, 6, 20);
 GEN_VXFORM(vpmsumb, 4, 16)
 GEN_VXFORM(vpmsumh, 4, 17)
 GEN_VXFORM(vpmsumw, 4, 18)
-- 
2.7.4

From nobody Mon May 6 08:55:56 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org ARC-Seal: i=1; a=rsa-sha256; t=1559816524; cv=none; d=zoho.com; s=zohoarc; b=PYPFb9dF99/jQdFwtlF2d7vg8cTx6LJFf7VntMOsow+mgTAO7VYIYQeTKTmw+rRLL82O/OOS2Hx+8u3jKJ/SlBHISxI76+amdV/6+z3ARPiqTTyfsXwz2JYgNjtMPjSRB4KB7XZ+dBfOsWkkPGx1o6bTyjACzZxeuvC4h1588Fc= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zoho.com; s=zohoarc; t=1559816524; h=Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:Message-ID:References:Sender:Subject:To:ARC-Authentication-Results; bh=8xwMOSqma8XxOtAqmtUIyIW/Pl0fomZ91dbkbMKzzTw=; b=YiMpGNJ++zRDUyRvCw/dWQbR3HWPFkZR/qsp04oA5KaGlXg+Tj83qbpGgn87O9qfZdGDA1V5hrvsjavVR0F92VAFnw3O1hkhkHyZo6ae+AI6nug8OMkV/Ebj6tRDU/jRhUg/xG28tizC/kCj3tAWCcquoRZwca9gdIl95fvtmrM= ARC-Authentication-Results: i=1; mx.zoho.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1559816524573124.32476546261512; Thu, 6 Jun 2019
03:22:04 -0700 (PDT) Received: from localhost ([127.0.0.1]:57836 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1hYpWy-0006VO-AK for importer@patchew.org; Thu, 06 Jun 2019 06:22:00 -0400 Received: from eggs.gnu.org ([209.51.188.92]:50028) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1hYpRx-0002aM-7I for qemu-devel@nongnu.org; Thu, 06 Jun 2019 06:16:50 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1hYpRw-0001yz-6v for qemu-devel@nongnu.org; Thu, 06 Jun 2019 06:16:49 -0400 Received: from mx2.rt-rk.com ([89.216.37.149]:41534 helo=mail.rt-rk.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1hYpRv-0001tf-V5 for qemu-devel@nongnu.org; Thu, 06 Jun 2019 06:16:48 -0400 Received: from localhost (localhost [127.0.0.1]) by mail.rt-rk.com (Postfix) with ESMTP id BA0D31A1DCB; Thu, 6 Jun 2019 12:15:36 +0200 (CEST) Received: from rtrkw870-lin.domain.local (rtrkw870-lin.domain.local [10.10.13.132]) by mail.rt-rk.com (Postfix) with ESMTPSA id 8D0791A2006; Thu, 6 Jun 2019 12:15:36 +0200 (CEST) X-Virus-Scanned: amavisd-new at rt-rk.com From: Stefan Brankovic To: qemu-devel@nongnu.org Date: Thu, 6 Jun 2019 12:15:27 +0200 Message-Id: <1559816130-17113-6-git-send-email-stefan.brankovic@rt-rk.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1559816130-17113-1-git-send-email-stefan.brankovic@rt-rk.com> References: <1559816130-17113-1-git-send-email-stefan.brankovic@rt-rk.com> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x X-Received-From: 89.216.37.149 Subject: [Qemu-devel] [PATCH 5/8] target/ppc: Optimize emulation of vclzd instruction X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: david@gibson.dropbear.id.au Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" 
Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8"

Optimize Altivec instruction vclzd (Vector Count Leading Zeros Doubleword).
This instruction counts the number of leading zeros of each doubleword
element in the source register and places the result in the corresponding
doubleword element of the destination register.

The implementation uses TCG's count-leading-zeros operation twice (once for
each doubleword element of source register vB) and places each result in
the corresponding doubleword element of destination register vD.

Signed-off-by: Stefan Brankovic
Reviewed-by: Richard Henderson
---
 target/ppc/translate/vmx-impl.inc.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index 010f337..1c34908 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -877,6 +877,32 @@ static void trans_vgbbd(DisasContext *ctx)
     tcg_temp_free_i64(result2);
 }
 
+/*
+ * vclzd VRT,VRB - Vector Count Leading Zeros Doubleword
+ *
+ * Counting the number of leading zero bits of each doubleword element in source
+ * register and placing result in appropriate doubleword element of destination
+ * register.
+ */
+static void trans_vclzd(DisasContext *ctx)
+{
+    int VT = rD(ctx->opcode);
+    int VB = rB(ctx->opcode);
+    TCGv_i64 avr = tcg_temp_new_i64();
+
+    /* high doubleword */
+    get_avr64(avr, VB, true);
+    tcg_gen_clzi_i64(avr, avr, 64);
+    set_avr64(VT, avr, true);
+
+    /* low doubleword */
+    get_avr64(avr, VB, false);
+    tcg_gen_clzi_i64(avr, avr, 64);
+    set_avr64(VT, avr, false);
+
+    tcg_temp_free_i64(avr);
+}
+
 GEN_VXFORM(vmuloub, 4, 0);
 GEN_VXFORM(vmulouh, 4, 1);
 GEN_VXFORM(vmulouw, 4, 2);
@@ -1388,7 +1414,7 @@ GEN_VAFORM_PAIRED(vmaddfp, vnmsubfp, 23)
 GEN_VXFORM_NOA(vclzb, 1, 28)
 GEN_VXFORM_NOA(vclzh, 1, 29)
 GEN_VXFORM_NOA(vclzw, 1, 30)
-GEN_VXFORM_NOA(vclzd, 1, 31)
+GEN_VXFORM_TRANS(vclzd, 1, 31)
 GEN_VXFORM_NOA_2(vnegw, 1, 24, 6)
 GEN_VXFORM_NOA_2(vnegd, 1, 24, 7)
 GEN_VXFORM_NOA_2(vextsb2w, 1, 24, 16)
-- 
2.7.4

From nobody Mon May 6 08:55:56 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org ARC-Seal: i=1; a=rsa-sha256; t=1559816521; cv=none; d=zoho.com; s=zohoarc; b=kk3cEOAHXFoNO3o04XrVSwa8dqAbZg4hTCKDH7EkxOaK1fF7o5dkZOQ4jo3Uz+HVupTOcOOhiRnaNDphbngxHcHE+zv+V/wWGXNTQ7DF8vUDJJLcz2EXVzv564B8hHIBhfhPj+Hay4j9BQwk/Ti0VzuOuBlLg8ld1hLCzvU6w5M= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zoho.com; s=zohoarc; t=1559816521; h=Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:Message-ID:References:Sender:Subject:To:ARC-Authentication-Results; bh=Z0kRQvmTwfxY6SstkQyOEbmICBDbc7x/TaOKEFo7VCQ=;
b=SIbLnSnTcEXvz/39A0FpBn8Lyk40lnEybmHpbKYvampvA2ghLcohStKV7KdXoDjnKAL4sWHHZ7dJlh6tJ5dcVbT5tIs5p2OLmzkDuHhK3uF+yvMxwjdjUrfq+JzT+l4PEcRuZGnrrre1VVD0ErJUC5Hum+FtypYi63SBlUxjyeI= ARC-Authentication-Results: i=1; mx.zoho.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1559816521941583.6754215506726; Thu, 6 Jun 2019 03:22:01 -0700 (PDT) Received: from localhost ([127.0.0.1]:57838 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1hYpWy-0006Vu-T7 for importer@patchew.org; Thu, 06 Jun 2019 06:22:00 -0400 Received: from eggs.gnu.org ([209.51.188.92]:50043) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1hYpRx-0002aZ-Jx for qemu-devel@nongnu.org; Thu, 06 Jun 2019 06:16:50 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1hYpRw-0001zY-BO for qemu-devel@nongnu.org; Thu, 06 Jun 2019 06:16:49 -0400 Received: from mx2.rt-rk.com ([89.216.37.149]:41533 helo=mail.rt-rk.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1hYpRv-0001tc-VI for qemu-devel@nongnu.org; Thu, 06 Jun 2019 06:16:48 -0400 Received: from localhost (localhost [127.0.0.1]) by mail.rt-rk.com (Postfix) with ESMTP id B91911A1DC6; Thu, 6 Jun 2019 12:15:36 +0200 (CEST) Received: from rtrkw870-lin.domain.local (rtrkw870-lin.domain.local [10.10.13.132]) by mail.rt-rk.com (Postfix) with ESMTPSA id 997641A20CD; Thu, 6 Jun 2019 12:15:36 +0200 (CEST) X-Virus-Scanned: amavisd-new at rt-rk.com From: Stefan Brankovic To: qemu-devel@nongnu.org Date: Thu, 6 Jun 2019 12:15:28 +0200 Message-Id: <1559816130-17113-7-git-send-email-stefan.brankovic@rt-rk.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1559816130-17113-1-git-send-email-stefan.brankovic@rt-rk.com> References: 
<1559816130-17113-1-git-send-email-stefan.brankovic@rt-rk.com> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x X-Received-From: 89.216.37.149 Subject: [Qemu-devel] [PATCH 6/8] target/ppc: Optimize emulation of vclzw instruction X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: david@gibson.dropbear.id.au Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8"

Optimize Altivec instruction vclzw (Vector Count Leading Zeros Word).
This instruction counts the number of leading zeros of each word element
in the source register and places the result in the corresponding word
element of the destination register.

Counting is performed in two iterations of a for loop (one for each
doubleword element of source register vB). Each iteration first places the
corresponding doubleword element of vB in the variable avr. Counting is
then done with TCG's count-leading-zeros operation. Since it counts leading
zeros over a 64-bit length, we have to move the ith word element to the
highest 32 bits of the variable tmp, OR it with mask (so that the lowest 32
bits are all ones), then perform tcg_gen_clzi_i64 and move its result into
the corresponding word element of the variable result. At the end of each
loop iteration we save the variable result to the corresponding doubleword
element of destination register vD.
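
The word-counting trick described above can be checked outside QEMU with
a plain C model. This is only a sketch under stated assumptions: clz64()
and vclzw_dword() are hypothetical names that stand in for
tcg_gen_clzi_i64(..., 64) and for one loop iteration of the translation
function; they are not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of a 64-bit count-leading-zeros that yields 64 for a zero
 * input, which is what passing 64 as the last argument of
 * tcg_gen_clzi_i64 asks for.
 */
static uint64_t clz64(uint64_t x)
{
    uint64_t n = 0;

    if (x == 0) {
        return 64;
    }
    while (!(x >> 63)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Count leading zeros of both 32-bit words held in one doubleword. */
static uint64_t vclzw_dword(uint64_t avr)
{
    const uint64_t mask = 0xffffffffULL;
    uint64_t result, tmp;

    /*
     * Low word: move it to the upper 32 bits and fill the lower 32
     * bits with ones, so the count can never exceed 32.
     */
    tmp = (avr << 32) | mask;
    result = clz64(tmp);

    /* High word: already in the upper half, only the fill is needed. */
    tmp = clz64(avr | mask);

    /* Same effect as a deposit of tmp at bit position 32, length 32. */
    result = (result & mask) | (tmp << 32);
    return result;
}

/* Small self-check that doubles as a usage example. */
static void vclzw_model_selfcheck(void)
{
    assert(vclzw_dword(0) == 0x0000002000000020ULL);
    assert(vclzw_dword(0x0000000100000001ULL) == 0x0000001F0000001FULL);
    assert(vclzw_dword(0xFFFFFFFFFFFFFFFFULL) == 0);
}
```

The fill with ones is what lets a 64-bit count serve a 32-bit element:
an all-zero word stops at the first filler bit and reports 32 rather
than 64.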
Signed-off-by: Stefan Brankovic
---
 target/ppc/translate/vmx-impl.inc.c | 57 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index 1c34908..7689739 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -878,6 +878,61 @@ static void trans_vgbbd(DisasContext *ctx)
 }
 
 /*
+ * vclzw VRT,VRB - Vector Count Leading Zeros Word
+ *
+ * Counting the number of leading zero bits of each word element in source
+ * register and placing result in appropriate word element of destination
+ * register.
+ */
+static void trans_vclzw(DisasContext *ctx)
+{
+    int VT = rD(ctx->opcode);
+    int VB = rB(ctx->opcode);
+    TCGv_i64 avr = tcg_temp_new_i64();
+    TCGv_i64 result = tcg_temp_new_i64();
+    TCGv_i64 tmp = tcg_temp_new_i64();
+    TCGv_i64 mask = tcg_const_i64(0xffffffffULL);
+    int i;
+
+    for (i = 0; i < 2; i++) {
+        if (i == 0) {
+            /* Get high doubleword element of vB in avr. */
+            get_avr64(avr, VB, true);
+        } else {
+            /* Get low doubleword element of vB in avr. */
+            get_avr64(avr, VB, false);
+        }
+        /*
+         * Perform count for every word element using tcg_gen_clzi_i64.
+         * Since it counts leading zeros on 64 bit length, we have to move
+         * ith word element to highest 32 bits of tmp, or it with mask(so we get
+         * all ones in lowest 32 bits), then perform tcg_gen_clzi_i64 and move
+         * its result in appropriate word element of result.
+         */
+        tcg_gen_shli_i64(tmp, avr, 32);
+        tcg_gen_or_i64(tmp, tmp, mask);
+        tcg_gen_clzi_i64(result, tmp, 64);
+
+        tcg_gen_or_i64(tmp, avr, mask);
+        tcg_gen_clzi_i64(tmp, tmp, 64);
+        tcg_gen_deposit_i64(result, result, tmp, 32, 32);
+
+        if (i == 0) {
+            /* Place result in high doubleword element of vD. */
+            set_avr64(VT, result, true);
+        } else {
+            /* Place result in low doubleword element of vD. */
+            set_avr64(VT, result, false);
+        }
+    }
+
+    tcg_temp_free_i64(avr);
+    tcg_temp_free_i64(result);
+    tcg_temp_free_i64(tmp);
+    tcg_temp_free_i64(mask);
+}
+
+/*
  * vclzd VRT,VRB - Vector Count Leading Zeros Doubleword
  *
  * Counting the number of leading zero bits of each doubleword element in source
@@ -1413,7 +1468,7 @@ GEN_VAFORM_PAIRED(vmaddfp, vnmsubfp, 23)
 
 GEN_VXFORM_NOA(vclzb, 1, 28)
 GEN_VXFORM_NOA(vclzh, 1, 29)
-GEN_VXFORM_NOA(vclzw, 1, 30)
+GEN_VXFORM_TRANS(vclzw, 1, 30)
 GEN_VXFORM_TRANS(vclzd, 1, 31)
 GEN_VXFORM_NOA_2(vnegw, 1, 24, 6)
 GEN_VXFORM_NOA_2(vnegd, 1, 24, 7)
-- 
2.7.4

From nobody Mon May 6 08:55:56 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org ARC-Seal: i=1; a=rsa-sha256; t=1559816352; cv=none; d=zoho.com; s=zohoarc; b=I38zSf89obmjyzqtPhiqMyz6x8p0HaaGZew7rVjUZEX2mG1Mx3Kb4CkWSSHpAARw/+VytkWvYgXtNslWLg4Mp3ImzSOFwL7CQRntJE5mD5xYvMtBejitFIiUeR6ZjKC9o5q2sCayLNhpuJebrxrWuiJUWzpZoIf8hfb/SJXYVMg= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zoho.com; s=zohoarc; t=1559816352; h=Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:Message-ID:References:Sender:Subject:To:ARC-Authentication-Results; bh=SR1suxBq9gtLC8KsuOS9kT4+/L9qZcBv6R2DqbmKjsA=; b=RHIL8NDsnx7qqshIttIIPr3vWcpFFBI5LwJxsLx5yyoGzjaFgEU+FLr8kLHwDa1Piea/j4tY2oLCPgojLiyN7KJiwsu1mVDL04GF0xf4n2rjUnexo7+F2n4W29ugHZjD0OMW21Ajty/YoctFQ6aNs0zEs2kmLhRIlPg0Ifm4bro= ARC-Authentication-Results: i=1; mx.zoho.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender)
smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 155981635273298.1050206119462; Thu, 6 Jun 2019 03:19:12 -0700 (PDT) Received: from localhost ([127.0.0.1]:57779 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1hYpUA-0003lw-Cm for importer@patchew.org; Thu, 06 Jun 2019 06:19:06 -0400 Received: from eggs.gnu.org ([209.51.188.92]:50064) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1hYpRy-0002b4-6v for qemu-devel@nongnu.org; Thu, 06 Jun 2019 06:16:51 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1hYpRw-0001zf-Bx for qemu-devel@nongnu.org; Thu, 06 Jun 2019 06:16:50 -0400 Received: from mx2.rt-rk.com ([89.216.37.149]:41538 helo=mail.rt-rk.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1hYpRv-0001ts-W0 for qemu-devel@nongnu.org; Thu, 06 Jun 2019 06:16:48 -0400 Received: from localhost (localhost [127.0.0.1]) by mail.rt-rk.com (Postfix) with ESMTP id 093331A2006; Thu, 6 Jun 2019 12:15:37 +0200 (CEST) Received: from rtrkw870-lin.domain.local (rtrkw870-lin.domain.local [10.10.13.132]) by mail.rt-rk.com (Postfix) with ESMTPSA id DA8051A1DE5; Thu, 6 Jun 2019 12:15:36 +0200 (CEST) X-Virus-Scanned: amavisd-new at rt-rk.com From: Stefan Brankovic To: qemu-devel@nongnu.org Date: Thu, 6 Jun 2019 12:15:29 +0200 Message-Id: <1559816130-17113-8-git-send-email-stefan.brankovic@rt-rk.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1559816130-17113-1-git-send-email-stefan.brankovic@rt-rk.com> References: <1559816130-17113-1-git-send-email-stefan.brankovic@rt-rk.com> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x X-Received-From: 89.216.37.149 Subject: [Qemu-devel] [PATCH 7/8] target/ppc: Optimize emulation of vclzh and vclzb instructions X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 
Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: david@gibson.dropbear.id.au Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8"

Optimize Altivec instruction vclzh (Vector Count Leading Zeros Halfword).
This instruction counts the number of leading zeros of each halfword
element in the source register and places the result in the corresponding
halfword element of the destination register.

In each iteration of the outer for loop we perform the count operation on
one doubleword element of source register vB. In the first iteration we
place the higher doubleword element of vB in the variable avr, then we
perform the count for every halfword element using tcg_gen_clzi_i64. Since
it counts leading zeros over a 64-bit length, we have to move the ith
halfword element to the highest 16 bits of tmp, OR it with mask (so that
the lowest 48 bits are all ones), then perform tcg_gen_clzi_i64 and move
its result into the corresponding halfword element of result. We do this in
the inner for loop. After the operation is finished we save result in the
corresponding doubleword element of destination register vD. We repeat this
once more for the lower doubleword element of vB.

Optimize Altivec instruction vclzb (Vector Count Leading Zeros Byte).
This instruction counts the number of leading zeros of each byte element
in the source register and places the result in the corresponding byte
element of the destination register.

In each iteration of the outer for loop we perform the count operation on
one doubleword element of source register vB. In the first iteration we
place the higher doubleword element of vB in the variable avr, then we
perform the count for every byte element using tcg_gen_clzi_i64.
Since it counts leading zeros over a 64-bit length, we have to move the ith
byte element to the highest 8 bits of the variable tmp, OR it with mask (so
that the lowest 56 bits are all ones), then perform tcg_gen_clzi_i64 and
move its result into the corresponding byte element of result. We do this
in the inner for loop. After the operation is finished we save result in
the corresponding doubleword element of destination register vD. We repeat
this once more for the lower doubleword element of vB.

Signed-off-by: Stefan Brankovic
---
 target/ppc/translate/vmx-impl.inc.c | 122 +++++++++++++++++++++++++++++++++++-
 1 file changed, 120 insertions(+), 2 deletions(-)

diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index 7689739..8535a31 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -878,6 +878,124 @@ static void trans_vgbbd(DisasContext *ctx)
 }
 
 /*
+ * vclzb VRT,VRB - Vector Count Leading Zeros Byte
+ *
+ * Counting the number of leading zero bits of each byte element in source
+ * register and placing result in appropriate byte element of destination
+ * register.
+ */
+static void trans_vclzb(DisasContext *ctx)
+{
+    int VT = rD(ctx->opcode);
+    int VB = rB(ctx->opcode);
+    TCGv_i64 avr = tcg_temp_new_i64();
+    TCGv_i64 result = tcg_temp_new_i64();
+    TCGv_i64 tmp = tcg_temp_new_i64();
+    TCGv_i64 mask = tcg_const_i64(0xffffffffffffffULL);
+    int i, j;
+
+    for (i = 0; i < 2; i++) {
+        if (i == 0) {
+            /* Get high doubleword of vB in avr. */
+            get_avr64(avr, VB, true);
+        } else {
+            /* Get low doubleword of vB in avr. */
+            get_avr64(avr, VB, false);
+        }
+        /*
+         * Perform count for every byte element using tcg_gen_clzi_i64.
+         * Since it counts leading zeros on 64 bit length, we have to move
+         * ith byte element to highest 8 bits of tmp, or it with mask(so we get
+         * all ones in lowest 56 bits), then perform tcg_gen_clzi_i64 and move
+         * its result in appropriate byte element of result.
+         */
+        tcg_gen_shli_i64(tmp, avr, 56);
+        tcg_gen_or_i64(tmp, tmp, mask);
+        tcg_gen_clzi_i64(result, tmp, 64);
+        for (j = 1; j < 7; j++) {
+            tcg_gen_shli_i64(tmp, avr, (7 - j) * 8);
+            tcg_gen_or_i64(tmp, tmp, mask);
+            tcg_gen_clzi_i64(tmp, tmp, 64);
+            tcg_gen_deposit_i64(result, result, tmp, j * 8, 8);
+        }
+        tcg_gen_or_i64(tmp, avr, mask);
+        tcg_gen_clzi_i64(tmp, tmp, 64);
+        tcg_gen_deposit_i64(result, result, tmp, 56, 8);
+        if (i == 0) {
+            /* Place result in high doubleword element of vD. */
+            set_avr64(VT, result, true);
+        } else {
+            /* Place result in low doubleword element of vD. */
+            set_avr64(VT, result, false);
+        }
+    }
+
+    tcg_temp_free_i64(avr);
+    tcg_temp_free_i64(result);
+    tcg_temp_free_i64(tmp);
+    tcg_temp_free_i64(mask);
+}
+
+/*
+ * vclzh VRT,VRB - Vector Count Leading Zeros Halfword
+ *
+ * Counting the number of leading zero bits of each halfword element in source
+ * register and placing result in appropriate halfword element of destination
+ * register.
+ */
+static void trans_vclzh(DisasContext *ctx)
+{
+    int VT = rD(ctx->opcode);
+    int VB = rB(ctx->opcode);
+    TCGv_i64 avr = tcg_temp_new_i64();
+    TCGv_i64 result = tcg_temp_new_i64();
+    TCGv_i64 tmp = tcg_temp_new_i64();
+    TCGv_i64 mask = tcg_const_i64(0xffffffffffffULL);
+    int i, j;
+
+    for (i = 0; i < 2; i++) {
+        if (i == 0) {
+            /* Get high doubleword element of vB in avr. */
+            get_avr64(avr, VB, true);
+        } else {
+            /* Get low doubleword element of vB in avr. */
+            get_avr64(avr, VB, false);
+        }
+        /*
+         * Perform count for every halfword element using tcg_gen_clzi_i64.
+         * Since it counts leading zeros on 64 bit length, we have to move ith
+         * halfword element to highest 16 bits of tmp, or it with mask (so we
+         * get all ones in lowest 48 bits), then perform tcg_gen_clzi_i64 and
+         * move its result in appropriate halfword element of result.
+         */
+        tcg_gen_shli_i64(tmp, avr, 48);
+        tcg_gen_or_i64(tmp, tmp, mask);
+        tcg_gen_clzi_i64(result, tmp, 64);
+        for (j = 1; j < 3; j++) {
+            tcg_gen_shli_i64(tmp, avr, (3 - j) * 16);
+            tcg_gen_or_i64(tmp, tmp, mask);
+            tcg_gen_clzi_i64(tmp, tmp, 64);
+            tcg_gen_deposit_i64(result, result, tmp, j * 16, 16);
+        }
+        tcg_gen_or_i64(tmp, avr, mask);
+        tcg_gen_clzi_i64(tmp, tmp, 64);
+        tcg_gen_deposit_i64(result, result, tmp, 48, 16);
+        if (i == 0) {
+            /* Place result in high doubleword element of vD. */
+            set_avr64(VT, result, true);
+        } else {
+            /* Place result in low doubleword element of vD. */
+            set_avr64(VT, result, false);
+        }
+    }
+
+    tcg_temp_free_i64(avr);
+    tcg_temp_free_i64(result);
+    tcg_temp_free_i64(tmp);
+    tcg_temp_free_i64(mask);
+}
+
+/*
  * vclzw VRT,VRB - Vector Count Leading Zeros Word
  *
  * Counting the number of leading zero bits of each word element in source
@@ -1466,8 +1584,8 @@ GEN_VAFORM_PAIRED(vmsumshm, vmsumshs, 20)
 GEN_VAFORM_PAIRED(vsel, vperm, 21)
 GEN_VAFORM_PAIRED(vmaddfp, vnmsubfp, 23)
 
-GEN_VXFORM_NOA(vclzb, 1, 28)
-GEN_VXFORM_NOA(vclzh, 1, 29)
+GEN_VXFORM_TRANS(vclzb, 1, 28)
+GEN_VXFORM_TRANS(vclzh, 1, 29)
 GEN_VXFORM_TRANS(vclzw, 1, 30)
 GEN_VXFORM_TRANS(vclzd, 1, 31)
 GEN_VXFORM_NOA_2(vnegw, 1, 24, 6)
-- 
2.7.4

From nobody Mon May 6 08:55:56 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org ARC-Seal: i=1; a=rsa-sha256; t=1559816632; cv=none; d=zoho.com; s=zohoarc; b=m6seVvdNdgFQRy8hLGlweCM9gZKkiRC3NXh0InKpDlPWMhHOlAfFVtoNjKs1A6+6WdGPQFM6McPSSYx4G+rtQNZiD5nbJgP9/k4SjyjiwMS81keL7R8C6ewV0BDR71QsRepGFocLMAkyr6fwB3pxonso34K92YqVDFtuCpirHf0=
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zoho.com; s=zohoarc; t=1559816632; h=Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:Message-ID:References:Sender:Subject:To:ARC-Authentication-Results; bh=XyMlCu3bJAe/gQQy7WGczY07rQ78v09ybItddONvY7Q=; b=LmP43xg86pR2geohyitiFHjPegsMwUHxHSjD9bvy9hPeCB9hKkQbg+n9OMhphDrfh3AohsjK2u7DLiKYtt0PsgQVJtTzZoSlxRPQBAPu7f367rR7KMWdIwVgdMnPqozu/eERXJfBDwhBfcE5FS0DlmN9jq6CP/i7L0Tu9AV1K8w= ARC-Authentication-Results: i=1; mx.zoho.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1559816632235705.6761925381064; Thu, 6 Jun 2019 03:23:52 -0700 (PDT) Received: from localhost ([127.0.0.1]:57861 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1hYpYk-00080e-U8 for importer@patchew.org; Thu, 06 Jun 2019 06:23:50 -0400 Received: from eggs.gnu.org ([209.51.188.92]:50057) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1hYpRx-0002ar-Uk for qemu-devel@nongnu.org; Thu, 06 Jun 2019 06:16:51 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1hYpRw-0001zv-Cl for qemu-devel@nongnu.org; Thu, 06 Jun 2019 06:16:49 -0400 Received: from mx2.rt-rk.com ([89.216.37.149]:41537 helo=mail.rt-rk.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1hYpRw-0001tp-0v for qemu-devel@nongnu.org; Thu, 06 Jun 2019 06:16:48 -0400 Received: from localhost (localhost [127.0.0.1]) by mail.rt-rk.com (Postfix) with ESMTP id 1417A1A1DE5; Thu, 6 Jun 2019 12:15:37 +0200 (CEST) Received: from rtrkw870-lin.domain.local (rtrkw870-lin.domain.local [10.10.13.132]) by mail.rt-rk.com (Postfix) with ESMTPSA id EB2251A1E45; Thu, 6 Jun 2019 12:15:36 +0200 (CEST) 
X-Virus-Scanned: amavisd-new at rt-rk.com From: Stefan Brankovic To: qemu-devel@nongnu.org Date: Thu, 6 Jun 2019 12:15:30 +0200 Message-Id: <1559816130-17113-9-git-send-email-stefan.brankovic@rt-rk.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1559816130-17113-1-git-send-email-stefan.brankovic@rt-rk.com> References: <1559816130-17113-1-git-send-email-stefan.brankovic@rt-rk.com> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x X-Received-From: 89.216.37.149 Subject: [Qemu-devel] [PATCH 8/8] target/ppc: Refactor emulation of vmrgew and vmrgow instructions X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: david@gibson.dropbear.id.au Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8"

Since these two instructions were already implemented with TCG, I
refactored them so they are consistent with the other similar
implementations introduced in this series. This also required adding a new
dual macro, GEN_VXFORM_TRANS_DUAL. We use this macro if one instruction is
realized with direct translation and the second one with a helper.

Signed-off-by: Stefan Brankovic
---
 target/ppc/translate/vmx-impl.inc.c | 62 ++++++++++++++++++++-----------------
 1 file changed, 33 insertions(+), 29 deletions(-)

diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index 8535a31..46c6f34 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -350,6 +350,24 @@ static void glue(gen_, name0##_##name1)(DisasContext *ctx) \
     }                                                                  \
 }
 
+/*
+ * We use this macro if one instruction is realized with direct
+ * translation, and second one with helper.
+ */
+#define GEN_VXFORM_TRANS_DUAL(name0, flg0, flg2_0, name1, flg1, flg2_1)\
+static void glue(gen_, name0##_##name1)(DisasContext *ctx)             \
+{                                                                      \
+    if ((Rc(ctx->opcode) == 0) &&                                      \
+        ((ctx->insns_flags & flg0) || (ctx->insns_flags2 & flg2_0))) { \
+        trans_##name0(ctx);                                            \
+    } else if ((Rc(ctx->opcode) == 1) &&                               \
+        ((ctx->insns_flags & flg1) || (ctx->insns_flags2 & flg2_1))) { \
+        gen_##name1(ctx);                                              \
+    } else {                                                           \
+        gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);            \
+    }                                                                  \
+}
+
 /* Adds support to provide invalid mask */
 #define GEN_VXFORM_DUAL_EXT(name0, flg0, flg2_0, inval0,               \
                             name1, flg1, flg2_1, inval1)               \
@@ -431,20 +449,13 @@ GEN_VXFORM(vmrglb, 6, 4);
 GEN_VXFORM(vmrglh, 6, 5);
 GEN_VXFORM(vmrglw, 6, 6);
 
-static void gen_vmrgew(DisasContext *ctx)
+static void trans_vmrgew(DisasContext *ctx)
 {
-    TCGv_i64 tmp;
-    TCGv_i64 avr;
-    int VT, VA, VB;
-    if (unlikely(!ctx->altivec_enabled)) {
-        gen_exception(ctx, POWERPC_EXCP_VPU);
-        return;
-    }
-    VT = rD(ctx->opcode);
-    VA = rA(ctx->opcode);
-    VB = rB(ctx->opcode);
-    tmp = tcg_temp_new_i64();
-    avr = tcg_temp_new_i64();
+    int VT = rD(ctx->opcode);
+    int VA = rA(ctx->opcode);
+    int VB = rB(ctx->opcode);
+    TCGv_i64 tmp = tcg_temp_new_i64();
+    TCGv_i64 avr = tcg_temp_new_i64();
 
     get_avr64(avr, VB, true);
     tcg_gen_shri_i64(tmp, avr, 32);
@@ -462,21 +473,14 @@ static void gen_vmrgew(DisasContext *ctx)
     tcg_temp_free_i64(avr);
 }
 
-static void gen_vmrgow(DisasContext *ctx)
+static void trans_vmrgow(DisasContext *ctx)
 {
-    TCGv_i64 t0, t1;
-    TCGv_i64 avr;
-    int VT, VA, VB;
-    if (unlikely(!ctx->altivec_enabled)) {
-        gen_exception(ctx, POWERPC_EXCP_VPU);
-        return;
-    }
-    VT = rD(ctx->opcode);
-    VA = rA(ctx->opcode);
-    VB = rB(ctx->opcode);
-    t0 = tcg_temp_new_i64();
-    t1 = tcg_temp_new_i64();
-    avr = tcg_temp_new_i64();
+    int VT = rD(ctx->opcode);
+    int VA = rA(ctx->opcode);
+    int VB = rB(ctx->opcode);
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 avr = tcg_temp_new_i64();
 
     get_avr64(t0, VB, true);
     get_avr64(t1, VA, true);
@@ -1213,14 +1217,14 @@ GEN_VXFORM_ENV(vminfp, 5, 17);
 GEN_VXFORM_HETRO(vextublx, 6, 24)
 GEN_VXFORM_HETRO(vextuhlx, 6, 25)
 GEN_VXFORM_HETRO(vextuwlx, 6, 26)
-GEN_VXFORM_DUAL(vmrgow, PPC_NONE, PPC2_ALTIVEC_207,
+GEN_VXFORM_TRANS_DUAL(vmrgow, PPC_NONE, PPC2_ALTIVEC_207,
                 vextuwlx, PPC_NONE, PPC2_ISA300)
 GEN_VXFORM_HETRO(vextubrx, 6, 28)
 GEN_VXFORM_HETRO(vextuhrx, 6, 29)
 GEN_VXFORM_HETRO(vextuwrx, 6, 30)
 GEN_VXFORM_TRANS(lvsl, 6, 31)
 GEN_VXFORM_TRANS(lvsr, 6, 32)
-GEN_VXFORM_DUAL(vmrgew, PPC_NONE, PPC2_ALTIVEC_207, \
+GEN_VXFORM_TRANS_DUAL(vmrgew, PPC_NONE, PPC2_ALTIVEC_207,
                 vextuwrx, PPC_NONE, PPC2_ISA300)
 
 #define GEN_VXRFORM1(opname, name, str, opc2, opc3)                    \
-- 
2.7.4
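
The selection rule that GEN_VXFORM_TRANS_DUAL encodes (Rc bit plus
instruction-set flags decide between direct translation, the helper-based
generator, and an invalid-instruction exception) can be modelled as a
plain function. This is only a sketch: enum dual_action, dual_select()
and the flag bit values in the self-check are made up for illustration
and are not QEMU definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the three branches of the macro body. */
enum dual_action {
    DUAL_TRANS_NAME0,   /* macro would call trans_<name0>(ctx) */
    DUAL_HELPER_NAME1,  /* macro would call gen_<name1>(ctx) */
    DUAL_INVALID        /* macro would raise an invalid exception */
};

/* Mirrors the if / else if / else chain of the macro, nothing more. */
static enum dual_action dual_select(unsigned rc,
                                    uint64_t insns_flags,
                                    uint64_t insns_flags2,
                                    uint64_t flg0, uint64_t flg2_0,
                                    uint64_t flg1, uint64_t flg2_1)
{
    if (rc == 0 &&
        ((insns_flags & flg0) || (insns_flags2 & flg2_0))) {
        return DUAL_TRANS_NAME0;
    } else if (rc == 1 &&
               ((insns_flags & flg1) || (insns_flags2 & flg2_1))) {
        return DUAL_HELPER_NAME1;
    }
    return DUAL_INVALID;
}

/* Self-check with made-up flag bits standing in for two ISA levels. */
static void dual_select_selfcheck(void)
{
    const uint64_t F_A = 1u << 0;   /* assumed bit for the first ISA */
    const uint64_t F_B = 1u << 1;   /* assumed bit for the second ISA */

    assert(dual_select(0, 0, F_A, 0, F_A, 0, F_B) == DUAL_TRANS_NAME0);
    assert(dual_select(1, 0, F_A | F_B, 0, F_A, 0, F_B)
           == DUAL_HELPER_NAME1);
    assert(dual_select(1, 0, F_A, 0, F_A, 0, F_B) == DUAL_INVALID);
}
```

Both instructions of a pair share one opcode slot, so a CPU model that
lacks the flag for the variant selected by Rc falls through to the
invalid case rather than to the other instruction.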