From: Richard Henderson
To: qemu-devel@nongnu.org
Subject: [PATCH 6/6] target/i386: Use tcg gvec ops for pmovmskb
Date: Mon, 22 Aug 2022 15:37:22 -0700
Message-Id: <20220822223722.1697758-7-richard.henderson@linaro.org>
In-Reply-To: <20220822223722.1697758-1-richard.henderson@linaro.org>
References: <20220822223722.1697758-1-richard.henderson@linaro.org>

As pmovmskb is used by strlen et al., this is the SSE operation with
the third-highest overhead, at 0.8%.

Signed-off-by: Richard Henderson
---
 target/i386/ops_sse.h        | 26 -----------
 target/i386/ops_sse_header.h |  1 -
 target/i386/tcg/translate.c  | 86 +++++++++++++++++++++++++++++++-----
 3 files changed, 74 insertions(+), 39 deletions(-)

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index b21f315f37..9f9801be63 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -1098,32 +1098,6 @@ uint32_t helper_movmskpd(CPUX86State *env, Reg *s)
 
 #endif
 
-uint32_t glue(helper_pmovmskb, SUFFIX)(CPUX86State *env, Reg *s)
-{
-    uint32_t val;
-
-    val = 0;
-    val |= (s->B(0) >> 7);
-    val |= (s->B(1) >> 6) & 0x02;
-    val |= (s->B(2) >> 5) & 0x04;
-    val |= (s->B(3) >> 4) & 0x08;
-    val |= (s->B(4) >> 3) & 0x10;
-    val |= (s->B(5) >> 2) & 0x20;
-    val |= (s->B(6) >> 1) & 0x40;
-    val |= (s->B(7)) & 0x80;
-#if SHIFT == 1
-    val |= (s->B(8) << 1) & 0x0100;
-    val |= (s->B(9) << 2) & 0x0200;
-    val |= (s->B(10) << 3) & 0x0400;
-    val |= (s->B(11) << 4) & 0x0800;
-    val |= (s->B(12) << 5) & 0x1000;
-    val |= (s->B(13) << 6) & 0x2000;
-    val |= (s->B(14) << 7) & 0x4000;
-    val |= (s->B(15) << 8) & 0x8000;
-#endif
-    return val;
-}
-
 void glue(helper_packsswb, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
 {
     Reg r;
diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h
index 542701720e..d6bb10342c 100644
--- a/target/i386/ops_sse_header.h
+++ b/target/i386/ops_sse_header.h
@@ -201,7 +201,6 @@ DEF_HELPER_2(movmskps, i32, env, Reg)
 DEF_HELPER_2(movmskpd, i32, env, Reg)
 #endif
 
-DEF_HELPER_2(glue(pmovmskb, SUFFIX), i32, env, Reg)
 DEF_HELPER_3(glue(packsswb, SUFFIX), void, env, Reg, Reg)
 DEF_HELPER_3(glue(packuswb, SUFFIX), void, env, Reg, Reg)
 DEF_HELPER_3(glue(packssdw, SUFFIX), void, env, Reg, Reg)
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index d25d914d63..5829c702d6 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -2776,6 +2776,77 @@ static inline void gen_op_movq_env_0(DisasContext *s, int d_offset)
     tcg_gen_st_i64(s->tmp1_i64, cpu_env, d_offset);
 }
 
+static void gen_pmovmskb_i64(TCGv_i64 d, TCGv_i64 s)
+{
+    TCGv_i64 t = tcg_temp_new_i64();
+
+    tcg_gen_andi_i64(d, s, 0x8080808080808080ull);
+
+    /*
+     * After each shift+or pair:
+     *  0: a.......b.......c.......d.......e.......f.......g.......h.......
+     *  7: ab......bc......cd......de......ef......fg......gh......h.......
+     * 14: abcd....bcde....cdef....defg....efgh....fgh.....gh......h.......
+     * 28: abcdefghbcdefgh.cdefgh..defgh...efgh....fgh.....gh......h.......
+     * The result is left in the high bits of the word.
+     */
+    tcg_gen_shli_i64(t, d, 7);
+    tcg_gen_or_i64(d, d, t);
+    tcg_gen_shli_i64(t, d, 14);
+    tcg_gen_or_i64(d, d, t);
+    tcg_gen_shli_i64(t, d, 28);
+    tcg_gen_or_i64(d, d, t);
+}
+
+static void gen_pmovmskb_vec(unsigned vece, TCGv_vec d, TCGv_vec s)
+{
+    TCGv_vec t = tcg_temp_new_vec_matching(d);
+    TCGv_vec m = tcg_constant_vec_matching(d, MO_8, 0x80);
+
+    /* See above */
+    tcg_gen_and_vec(vece, d, s, m);
+    tcg_gen_shli_vec(vece, t, d, 7);
+    tcg_gen_or_vec(vece, d, d, t);
+    tcg_gen_shli_vec(vece, t, d, 14);
+    tcg_gen_or_vec(vece, d, d, t);
+    if (vece == MO_64) {
+        tcg_gen_shli_vec(vece, t, d, 28);
+        tcg_gen_or_vec(vece, d, d, t);
+    }
+}
+
+static void gen_gvec_pmovmskb(TCGv out, int s_reg, bool is_xmm)
+{
+    static const TCGOpcode vecop_list[] = { INDEX_op_shli_vec, 0 };
+    static const GVecGen2 g = {
+        .fni8 = gen_pmovmskb_i64,
+        .fniv = gen_pmovmskb_vec,
+        .opt_opc = vecop_list,
+        .vece = MO_64,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64
+    };
+
+    int s_ofs = (is_xmm
+                 ? offsetof(CPUX86State, xmm_regs[s_reg].ZMM_X(0))
+                 : offsetof(CPUX86State, fpregs[s_reg].mmx));
+    int d_ofs = (is_xmm
+                 ? offsetof(CPUX86State, xmm_t0.ZMM_X(0))
+                 : offsetof(CPUX86State, mmx_t0));
+    int vec_len = is_xmm ? 16 : 8;
+
+    tcg_gen_gvec_2(d_ofs, s_ofs, vec_len, vec_len, &g);
+
+    if (is_xmm) {
+        TCGv t = tcg_temp_new();
+        tcg_gen_ld8u_tl(t, cpu_env, d_ofs + offsetof(XMMReg, XMM_B(15)));
+        tcg_gen_ld8u_tl(out, cpu_env, d_ofs + offsetof(XMMReg, XMM_B(7)));
+        tcg_gen_deposit_tl(out, out, t, 8, TARGET_LONG_BITS - 8);
+        tcg_temp_free(t);
+    } else {
+        tcg_gen_ld8u_tl(out, cpu_env, d_ofs + offsetof(MMXReg, MMX_B(7)));
+    }
+}
+
 typedef void (*SSEFunc_i_ep)(TCGv_i32 val, TCGv_ptr env, TCGv_ptr reg);
 typedef void (*SSEFunc_l_ep)(TCGv_i64 val, TCGv_ptr env, TCGv_ptr reg);
 typedef void (*SSEFunc_0_epi)(TCGv_ptr env, TCGv_ptr reg, TCGv_i32 val);
@@ -3742,21 +3813,12 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
             break;
         case 0xd7: /* pmovmskb */
         case 0x1d7:
-            if (mod != 3)
+            if (mod != 3) {
                 goto illegal_op;
-            if (b1) {
-                rm = (modrm & 7) | REX_B(s);
-                tcg_gen_addi_ptr(s->ptr0, cpu_env,
-                                 offsetof(CPUX86State, xmm_regs[rm]));
-                gen_helper_pmovmskb_xmm(s->tmp2_i32, cpu_env, s->ptr0);
-            } else {
-                rm = (modrm & 7);
-                tcg_gen_addi_ptr(s->ptr0, cpu_env,
-                                 offsetof(CPUX86State, fpregs[rm].mmx));
-                gen_helper_pmovmskb_mmx(s->tmp2_i32, cpu_env, s->ptr0);
             }
+            rm = (modrm & 7) | (is_xmm ? REX_B(s) : 0);
             reg = ((modrm >> 3) & 7) | REX_R(s);
-            tcg_gen_extu_i32_tl(cpu_regs[reg], s->tmp2_i32);
+            gen_gvec_pmovmskb(cpu_regs[reg], rm, is_xmm);
             break;
 
         case 0x138:
-- 
2.34.1
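
Reader's note on the shift+or comment in gen_pmovmskb_i64: the same bit gathering can be tried outside TCG with plain 64-bit arithmetic. The following is only a sketch under that assumption. The names movmskb64_ref, movmskb64_shift_or and movmskb64_selftest are hypothetical and are not part of this patch or of QEMU. The reference version collects the top bit of each byte one at a time. The shift+or version mirrors the and/shift/or sequence of the patch and reads the result from the high byte of the word.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: bit i of the result is the most significant bit of byte i. */
static unsigned movmskb64_ref(uint64_t v)
{
    unsigned r = 0;

    for (int i = 0; i < 8; i++) {
        r |= (unsigned)((v >> (8 * i + 7)) & 1) << i;
    }
    return r;
}

/*
 * Shift+or variant (hypothetical helper, not from the patch): keep only
 * the top bit of each byte, then fold neighbouring bits together with
 * shifts of 7, 14 and 28 so that all eight land in the high byte.
 */
static unsigned movmskb64_shift_or(uint64_t v)
{
    uint64_t d = v & 0x8080808080808080ull;

    d |= d << 7;
    d |= d << 14;
    d |= d << 28;
    return (unsigned)(d >> 56);
}

/* Compare both variants on a stream of pseudo-random 64-bit values. */
static void movmskb64_selftest(void)
{
    uint64_t x = 0x0123456789abcdefull;

    for (int i = 0; i < 1000; i++) {
        assert(movmskb64_shift_or(x) == movmskb64_ref(x));
        /* Simple 64-bit linear congruential step; wraps modulo 2^64. */
        x = x * 6364136223846793005ull + 1442695040888963407ull;
    }
}
```

Calling movmskb64_selftest() from a test harness checks the two variants against each other. For a 16-byte XMM value the same function would be applied to each 64-bit half and the two 8-bit results combined, which is what the byte loads at XMM_B(7) and XMM_B(15) plus the deposit do in gen_gvec_pmovmskb.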