From: "Lucas Mateus Castro(alqotel)"
To: qemu-devel@nongnu.org, qemu-ppc@nongnu.org
Cc: richard.henderson@linaro.org, Daniel Henrique Barboza, "Lucas Mateus Castro (alqotel)", Cédric Le Goater, David Gibson, Greg Kurz
Subject: [PATCH 05/12] target/ppc: Move VPRTYB[WDQ] to decodetree and use gvec
Date: Fri, 23 Sep 2022 18:47:47 -0300
Message-Id: <20220923214754.217819-6-lucas.araujo@eldorado.org.br>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220923214754.217819-1-lucas.araujo@eldorado.org.br>
References: <20220923214754.217819-1-lucas.araujo@eldorado.org.br>

From: "Lucas Mateus Castro (alqotel)"

Move VPRTYBW and VPRTYBD to gvec, and move both of them, along with
VPRTYBQ, to decodetree.

vprtybw:
rept    loop    master        patch
8       12500   0,01215900    0,00705600 (-42.0%)
25      4000    0,01198700    0,00574400 (-52.1%)
100     1000    0,01307800    0,00692200 (-47.1%)
500     200     0,01794800    0,01558800 (-13.1%)
2500    40      0,04028200    0,05400800 (+34.1%)
8000    12      0,10127300    0,16744700 (+65.3%)

vprtybd:
rept    loop    master        patch
8       12500   0,00757400    0,00791600 (+4.5%)
25      4000    0,00651300    0,00673700 (+3.4%)
100     1000    0,00713400    0,00837700 (+17.4%)
500     200     0,01195400    0,01937400 (+62.1%)
2500    40      0,03478600    0,07005500 (+101.4%)
8000    12      0,09539600    0,21013500 (+120.3%)

vprtybq:
rept    loop    master        patch
8       12500   0,00065540    0,00066440 (+1.4%)
25      4000    0,00057720    0,00059850 (+3.7%)
100     1000    0,00066400    0,00069360 (+4.5%)
500     200     0,00115170    0,00127360 (+10.6%)
2500    40      0,00341890    0,00391550 (+14.5%)
8000    12      0,00951220    0,01111480 (+16.8%)

I wasn't expecting such a performance loss in both VPRTYBD and VPRTYBQ,
so I'm not sure it's worth moving those instructions. Comparing the
assembly of the helper with that of the TCG ops, they are pretty similar,
so I'm not sure why vprtybd took so much more time.

Signed-off-by: Lucas Mateus Castro (alqotel)
Reviewed-by: Richard Henderson
---
 target/ppc/helper.h                 |  6 ++--
 target/ppc/insn32.decode            |  4 +++
 target/ppc/int_helper.c             |  6 ++--
 target/ppc/translate/vmx-impl.c.inc | 55 +++++++++++++++++++++++++++--
 target/ppc/translate/vmx-ops.c.inc  |  3 --
 5 files changed, 62 insertions(+), 12 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index feccf30bcb..6a43e32ad3 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -194,9 +194,9 @@ DEF_HELPER_FLAGS_3(vsro, TCG_CALL_NO_RWG, void, avr, avr, avr)
 DEF_HELPER_FLAGS_3(vsrv, TCG_CALL_NO_RWG, void, avr, avr, avr)
 DEF_HELPER_FLAGS_3(vslv, TCG_CALL_NO_RWG, void, avr, avr, avr)
 DEF_HELPER_FLAGS_4(VADDCUW, TCG_CALL_NO_RWG, void, avr, avr, avr, i32)
-DEF_HELPER_FLAGS_2(vprtybw, TCG_CALL_NO_RWG, void, avr, avr)
-DEF_HELPER_FLAGS_2(vprtybd, TCG_CALL_NO_RWG, void, avr, avr)
-DEF_HELPER_FLAGS_2(vprtybq, TCG_CALL_NO_RWG, void, avr, avr)
+DEF_HELPER_FLAGS_3(VPRTYBW, TCG_CALL_NO_RWG, void, avr, avr, i32)
+DEF_HELPER_FLAGS_3(VPRTYBD, TCG_CALL_NO_RWG, void, avr, avr, i32)
+DEF_HELPER_FLAGS_3(VPRTYBQ, TCG_CALL_NO_RWG, void, avr, avr, i32)
 DEF_HELPER_FLAGS_4(VSUBCUW, TCG_CALL_NO_RWG, void, avr, avr, avr, i32)
 DEF_HELPER_FLAGS_5(vaddsbs, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
 DEF_HELPER_FLAGS_5(vaddshs, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 2658dd3395..aa4968e6b9 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -529,6 +529,10 @@ VCTZDM          000100 ..... ..... ..... 11111000100    @VX
 VPDEPD          000100 ..... ..... ..... 10111001101    @VX
 VPEXTD          000100 ..... ..... ..... 10110001101    @VX
 
+VPRTYBD         000100 ..... 01001 ..... 11000000010    @VX_tb
+VPRTYBQ         000100 ..... 01010 ..... 11000000010    @VX_tb
+VPRTYBW         000100 ..... 01000 ..... 11000000010    @VX_tb
+
 ## Vector Permute and Formatting Instruction
 
 VEXTDUBVLX      000100 ..... ..... ..... ..... 011000   @VA
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 338ebced22..64b2d44a66 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -502,7 +502,7 @@ void helper_VADDCUW(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, uint32_t v)
 }
 
 /* vprtybw */
-void helper_vprtybw(ppc_avr_t *r, ppc_avr_t *b)
+void helper_VPRTYBW(ppc_avr_t *r, ppc_avr_t *b, uint32_t v)
 {
     int i;
     for (i = 0; i < ARRAY_SIZE(r->u32); i++) {
@@ -513,7 +513,7 @@ void helper_vprtybw(ppc_avr_t *r, ppc_avr_t *b)
 }
 
 /* vprtybd */
-void helper_vprtybd(ppc_avr_t *r, ppc_avr_t *b)
+void helper_VPRTYBD(ppc_avr_t *r, ppc_avr_t *b, uint32_t v)
 {
     int i;
     for (i = 0; i < ARRAY_SIZE(r->u64); i++) {
@@ -525,7 +525,7 @@ void helper_vprtybd(ppc_avr_t *r, ppc_avr_t *b)
 }
 
 /* vprtybq */
-void helper_vprtybq(ppc_avr_t *r, ppc_avr_t *b)
+void helper_VPRTYBQ(ppc_avr_t *r, ppc_avr_t *b, uint32_t v)
 {
     uint64_t res = b->u64[0] ^ b->u64[1];
     res ^= res >> 32;
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index 3f614097ac..06d91d1304 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -1659,9 +1659,58 @@ GEN_VXFORM_NOA_ENV(vrfim, 5, 11);
 GEN_VXFORM_NOA_ENV(vrfin, 5, 8);
 GEN_VXFORM_NOA_ENV(vrfip, 5, 10);
 GEN_VXFORM_NOA_ENV(vrfiz, 5, 9);
-GEN_VXFORM_NOA(vprtybw, 1, 24);
-GEN_VXFORM_NOA(vprtybd, 1, 24);
-GEN_VXFORM_NOA(vprtybq, 1, 24);
+
+static void gen_vprtyb(unsigned vece, TCGv_vec t, TCGv_vec b)
+{
+    int i;
+    TCGv_vec tmp = tcg_temp_new_vec_matching(b);
+    /* MO_32 is 2, so 2 iterations for MO_32 and 3 for MO_64 */
+    for (i = 0; i < vece; i++) {
+        tcg_gen_shri_vec(vece, tmp, b, (4 << (vece - i)));
+        tcg_gen_xor_vec(vece, b, tmp, b);
+    }
+    tcg_gen_dupi_vec(vece, tmp, 1);
+    tcg_gen_and_vec(vece, t, b, tmp);
+    tcg_temp_free_vec(tmp);
+}
+
+static bool do_vx_vprtyb(DisasContext *ctx, arg_VX_tb *a, unsigned vece)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_shri_vec, 0
+    };
+
+    static const GVecGen2 op[] = {
+        {
+            .fniv = gen_vprtyb,
+            .fno = gen_helper_VPRTYBW,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fniv = gen_vprtyb,
+            .fno = gen_helper_VPRTYBD,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+        {
+            .fno = gen_helper_VPRTYBQ,
+            .vece = MO_128
+        },
+    };
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA300);
+    REQUIRE_VECTOR(ctx);
+
+    tcg_gen_gvec_2(avr_full_offset(a->vrt), avr_full_offset(a->vrb),
+                   16, 16, &op[vece - MO_32]);
+
+    return true;
+}
+
+TRANS(VPRTYBW, do_vx_vprtyb, MO_32)
+TRANS(VPRTYBD, do_vx_vprtyb, MO_64)
+TRANS(VPRTYBQ, do_vx_vprtyb, MO_128)
 
 static void gen_vsplt(DisasContext *ctx, int vece)
 {
diff --git a/target/ppc/translate/vmx-ops.c.inc b/target/ppc/translate/vmx-ops.c.inc
index 27908533dd..46a620a232 100644
--- a/target/ppc/translate/vmx-ops.c.inc
+++ b/target/ppc/translate/vmx-ops.c.inc
@@ -106,9 +106,6 @@ GEN_VXFORM_300(vsrv, 2, 28),
 GEN_VXFORM_300(vslv, 2, 29),
 GEN_VXFORM(vslo, 6, 16),
 GEN_VXFORM(vsro, 6, 17),
-GEN_HANDLER_E_2(vprtybw, 0x4, 0x1, 0x18, 8, 0, PPC_NONE, PPC2_ISA300),
-GEN_HANDLER_E_2(vprtybd, 0x4, 0x1, 0x18, 9, 0, PPC_NONE, PPC2_ISA300),
-GEN_HANDLER_E_2(vprtybq, 0x4, 0x1, 0x18, 10, 0, PPC_NONE, PPC2_ISA300),
 
 GEN_VXFORM(xpnd04_1, 0, 22),
 GEN_VXFORM_300(bcdsr, 0, 23),
-- 
2.31.1
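
For readers following the shift amounts in gen_vprtyb, here is a minimal scalar sketch of the same shift-and-XOR fold, checked against a plain per-byte loop. It is not part of the patch and does not use any TCG API. The names prtyb_reference, prtyb_fold and prtyb_selfcheck are hypothetical and exist only for this illustration, and the sketch assumes that the result of the instruction is the XOR of bit 0 of every byte in the element.

```c
#include <stdint.h>

/* Hypothetical reference: XOR together bit 0 of each byte of the element. */
static uint64_t prtyb_reference(uint64_t x, unsigned bytes)
{
    uint64_t parity = 0;
    for (unsigned i = 0; i < bytes; i++) {
        parity ^= (x >> (8 * i)) & 1;
    }
    return parity;
}

/*
 * Hypothetical scalar mirror of the loop in gen_vprtyb.
 * vece is log2 of the element size in bytes: 2 for 32-bit, 3 for 64-bit.
 * The shift is 4 << (vece - i): 16, 8 for vece == 2 and 32, 16, 8 for
 * vece == 3, so bit 0 ends up holding the XOR of bit 0 of every byte.
 * For vece == 2 the upper 32 bits of x are expected to be zero.
 */
static uint64_t prtyb_fold(uint64_t x, unsigned vece)
{
    for (unsigned i = 0; i < vece; i++) {
        x ^= x >> (4u << (vece - i));
    }
    return x & 1;
}

/* Returns 1 if both versions agree on a handful of sample values. */
static int prtyb_selfcheck(void)
{
    static const uint64_t samples[] = {
        0x0ull, 0x1ull, 0x01010101ull, 0x01000000ull, 0xFEFEFEFEull,
        0x0100000000000000ull, 0x0101010101010101ull, 0x123456789ABCDEF0ull,
    };
    for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        uint64_t x = samples[i];
        uint32_t w = (uint32_t)x;
        if (prtyb_fold(w, 2) != prtyb_reference(w, 4)) {
            return 0;
        }
        if (prtyb_fold(x, 3) != prtyb_reference(x, 8)) {
            return 0;
        }
    }
    return 1;
}
```

The fold needs vece shift/XOR pairs plus one AND per element, which matches the op count of the vector expansion in the patch. The sketch says nothing about why the generated host code is slower in the benchmarks above.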