From: Richard Henderson
To: qemu-devel@nongnu.org
Cc: zhiwei_liu@linux.alibaba.com, tangtiancheng.ttc@alibaba-inc.com, philmd@linaro.org
Subject: [PATCH v2 03/18] tcg/i386: Do not expand cmp_vec early
Date: Wed, 11 Sep 2024 09:50:32 -0700
Message-ID: <20240911165047.1035764-4-richard.henderson@linaro.org>
In-Reply-To: <20240911165047.1035764-1-richard.henderson@linaro.org>
References: <20240911165047.1035764-1-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.43.0

Move
most of expansion to opcode generation, leaving the conversion
of unsigned to signed to be done in the early phase.
Small inefficiencies, but not incorrect results, are
introduced until cmpsel_vec is converted in the next patch.

Signed-off-by: Richard Henderson
---
 tcg/i386/tcg-target.c.inc | 223 +++++++++++++++++---------------------
 1 file changed, 100 insertions(+), 123 deletions(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index af71a397b1..278e567b56 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -3029,6 +3029,92 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 #undef OP_32_64
 }
 
+static int const umin_insn[4] = {
+    OPC_PMINUB, OPC_PMINUW, OPC_PMINUD, OPC_VPMINUQ
+};
+
+static int const umax_insn[4] = {
+    OPC_PMAXUB, OPC_PMAXUW, OPC_PMAXUD, OPC_VPMAXUQ
+};
+
+static bool tcg_out_cmp_vec_noinv(TCGContext *s, TCGType type, unsigned vece,
+                                  TCGReg v0, TCGReg v1, TCGReg v2, TCGCond cond)
+{
+    static int const cmpeq_insn[4] = {
+        OPC_PCMPEQB, OPC_PCMPEQW, OPC_PCMPEQD, OPC_PCMPEQQ
+    };
+    static int const cmpgt_insn[4] = {
+        OPC_PCMPGTB, OPC_PCMPGTW, OPC_PCMPGTD, OPC_PCMPGTQ
+    };
+
+    enum {
+        NEED_INV = 1,
+        NEED_SWAP = 2,
+        NEED_UMIN = 4,
+        NEED_UMAX = 8,
+        INVALID = 16,
+    };
+    static const uint8_t cond_fixup[16] = {
+        [0 ... 15] = INVALID,
+        [TCG_COND_EQ] = 0,
+        [TCG_COND_GT] = 0,
+        [TCG_COND_NE] = NEED_INV,
+        [TCG_COND_LE] = NEED_INV,
+        [TCG_COND_LT] = NEED_SWAP,
+        [TCG_COND_GE] = NEED_SWAP | NEED_INV,
+        [TCG_COND_LEU] = NEED_UMIN,
+        [TCG_COND_GTU] = NEED_UMIN | NEED_INV,
+        [TCG_COND_GEU] = NEED_UMAX,
+        [TCG_COND_LTU] = NEED_UMAX | NEED_INV,
+    };
+    int fixup = cond_fixup[cond];
+
+    assert(!(fixup & INVALID));
+
+    if (fixup & NEED_INV) {
+        cond = tcg_invert_cond(cond);
+    }
+
+    if (fixup & NEED_SWAP) {
+        TCGReg swap = v1;
+        v1 = v2;
+        v2 = swap;
+        cond = tcg_swap_cond(cond);
+    }
+
+    if (fixup & (NEED_UMIN | NEED_UMAX)) {
+        int op = (fixup & NEED_UMIN ? umin_insn[vece] : umax_insn[vece]);
+
+        /* avx2 does not have 64-bit min/max; adjusted during expand. */
+        assert(vece <= MO_32);
+
+        tcg_out_vex_modrm_type(s, op, TCG_TMP_VEC, v1, v2, type);
+        v2 = TCG_TMP_VEC;
+        cond = TCG_COND_EQ;
+    }
+
+    switch (cond) {
+    case TCG_COND_EQ:
+        tcg_out_vex_modrm_type(s, cmpeq_insn[vece], v0, v1, v2, type);
+        break;
+    case TCG_COND_GT:
+        tcg_out_vex_modrm_type(s, cmpgt_insn[vece], v0, v1, v2, type);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    return fixup & NEED_INV;
+}
+
+static void tcg_out_cmp_vec(TCGContext *s, TCGType type, unsigned vece,
+                            TCGReg v0, TCGReg v1, TCGReg v2, TCGCond cond)
+{
+    if (tcg_out_cmp_vec_noinv(s, type, vece, v0, v1, v2, cond)) {
+        tcg_out_dupi_vec(s, type, vece, TCG_TMP_VEC, -1);
+        tcg_out_vex_modrm_type(s, OPC_PXOR, v0, v0, TCG_TMP_VEC, type);
+    }
+}
+
 static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
                            unsigned vecl, unsigned vece,
                            const TCGArg args[TCG_MAX_OP_ARGS],
@@ -3058,12 +3144,6 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     static int const shift_imm_insn[4] = {
         OPC_UD2, OPC_PSHIFTW_Ib, OPC_PSHIFTD_Ib, OPC_PSHIFTQ_Ib
     };
-    static int const cmpeq_insn[4] = {
-        OPC_PCMPEQB, OPC_PCMPEQW, OPC_PCMPEQD, OPC_PCMPEQQ
-    };
-    static int const cmpgt_insn[4] = {
-        OPC_PCMPGTB, OPC_PCMPGTW, OPC_PCMPGTD, OPC_PCMPGTQ
-    };
     static int const punpckl_insn[4] = {
         OPC_PUNPCKLBW, OPC_PUNPCKLWD, OPC_PUNPCKLDQ, OPC_PUNPCKLQDQ
     };
@@ -3082,12 +3162,6 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     static int const smax_insn[4] = {
         OPC_PMAXSB, OPC_PMAXSW, OPC_PMAXSD, OPC_VPMAXSQ
     };
-    static int const umin_insn[4] = {
-        OPC_PMINUB, OPC_PMINUW, OPC_PMINUD, OPC_VPMINUQ
-    };
-    static int const umax_insn[4] = {
-        OPC_PMAXUB, OPC_PMAXUW, OPC_PMAXUD, OPC_VPMAXUQ
-    };
     static int const rotlv_insn[4] = {
         OPC_UD2, OPC_UD2, OPC_VPROLVD, OPC_VPROLVQ
     };
@@ -3243,15 +3317,8 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_cmp_vec:
-        sub = args[3];
-        if (sub == TCG_COND_EQ) {
-            insn = cmpeq_insn[vece];
-        } else if (sub == TCG_COND_GT) {
-            insn = cmpgt_insn[vece];
-        } else {
-            g_assert_not_reached();
-        }
-        goto gen_simd;
+        tcg_out_cmp_vec(s, type, vece, a0, a1, a2, args[3]);
+        break;
 
     case INDEX_op_andc_vec:
         insn = OPC_PANDN;
@@ -3971,88 +4038,19 @@ static void expand_vec_mul(TCGType type, unsigned vece,
     }
 }
 
-static bool expand_vec_cmp_noinv(TCGType type, unsigned vece, TCGv_vec v0,
-                                 TCGv_vec v1, TCGv_vec v2, TCGCond cond)
+static void expand_vec_cmp(TCGType type, unsigned vece, TCGv_vec v0,
+                           TCGv_vec v1, TCGv_vec v2, TCGCond cond)
 {
-    enum {
-        NEED_INV = 1,
-        NEED_SWAP = 2,
-        NEED_BIAS = 4,
-        NEED_UMIN = 8,
-        NEED_UMAX = 16,
-    };
-    TCGv_vec t1, t2, t3;
-    uint8_t fixup;
+    /*
+     * Without AVX512, there are no 64-bit unsigned comparisons.
+     * We must bias the inputs so that they become signed.
+     * All other swapping and inversion are handled during code generation.
+     */
+    if (vece == MO_64 && is_unsigned_cond(cond)) {
+        TCGv_vec t1 = tcg_temp_new_vec(type);
+        TCGv_vec t2 = tcg_temp_new_vec(type);
+        TCGv_vec t3 = tcg_constant_vec(type, vece, 1ull << ((8 << vece) - 1));
 
-    switch (cond) {
-    case TCG_COND_EQ:
-    case TCG_COND_GT:
-        fixup = 0;
-        break;
-    case TCG_COND_NE:
-    case TCG_COND_LE:
-        fixup = NEED_INV;
-        break;
-    case TCG_COND_LT:
-        fixup = NEED_SWAP;
-        break;
-    case TCG_COND_GE:
-        fixup = NEED_SWAP | NEED_INV;
-        break;
-    case TCG_COND_LEU:
-        if (tcg_can_emit_vec_op(INDEX_op_umin_vec, type, vece)) {
-            fixup = NEED_UMIN;
-        } else {
-            fixup = NEED_BIAS | NEED_INV;
-        }
-        break;
-    case TCG_COND_GTU:
-        if (tcg_can_emit_vec_op(INDEX_op_umin_vec, type, vece)) {
-            fixup = NEED_UMIN | NEED_INV;
-        } else {
-            fixup = NEED_BIAS;
-        }
-        break;
-    case TCG_COND_GEU:
-        if (tcg_can_emit_vec_op(INDEX_op_umax_vec, type, vece)) {
-            fixup = NEED_UMAX;
-        } else {
-            fixup = NEED_BIAS | NEED_SWAP | NEED_INV;
-        }
-        break;
-    case TCG_COND_LTU:
-        if (tcg_can_emit_vec_op(INDEX_op_umax_vec, type, vece)) {
-            fixup = NEED_UMAX | NEED_INV;
-        } else {
-            fixup = NEED_BIAS | NEED_SWAP;
-        }
-        break;
-    default:
-        g_assert_not_reached();
-    }
-
-    if (fixup & NEED_INV) {
-        cond = tcg_invert_cond(cond);
-    }
-    if (fixup & NEED_SWAP) {
-        t1 = v1, v1 = v2, v2 = t1;
-        cond = tcg_swap_cond(cond);
-    }
-
-    t1 = t2 = NULL;
-    if (fixup & (NEED_UMIN | NEED_UMAX)) {
-        t1 = tcg_temp_new_vec(type);
-        if (fixup & NEED_UMIN) {
-            tcg_gen_umin_vec(vece, t1, v1, v2);
-        } else {
-            tcg_gen_umax_vec(vece, t1, v1, v2);
-        }
-        v2 = t1;
-        cond = TCG_COND_EQ;
-    } else if (fixup & NEED_BIAS) {
-        t1 = tcg_temp_new_vec(type);
-        t2 = tcg_temp_new_vec(type);
-        t3 = tcg_constant_vec(type, vece, 1ull << ((8 << vece) - 1));
         tcg_gen_sub_vec(vece, t1, v1, t3);
         tcg_gen_sub_vec(vece, t2, v2, t3);
         v1 = t1;
@@ -4060,26 +4058,9 @@ static bool expand_vec_cmp_noinv(TCGType type, unsigned vece, TCGv_vec v0,
         cond = tcg_signed_cond(cond);
     }
 
-    tcg_debug_assert(cond == TCG_COND_EQ || cond == TCG_COND_GT);
     /* Expand directly; do not recurse. */
     vec_gen_4(INDEX_op_cmp_vec, type, vece,
               tcgv_vec_arg(v0), tcgv_vec_arg(v1), tcgv_vec_arg(v2), cond);
-
-    if (t1) {
-        tcg_temp_free_vec(t1);
-        if (t2) {
-            tcg_temp_free_vec(t2);
-        }
-    }
-    return fixup & NEED_INV;
-}
-
-static void expand_vec_cmp(TCGType type, unsigned vece, TCGv_vec v0,
-                           TCGv_vec v1, TCGv_vec v2, TCGCond cond)
-{
-    if (expand_vec_cmp_noinv(type, vece, v0, v1, v2, cond)) {
-        tcg_gen_not_vec(vece, v0, v0);
-    }
 }
 
 static void expand_vec_cmpsel(TCGType type, unsigned vece, TCGv_vec v0,
@@ -4088,11 +4069,7 @@ static void expand_vec_cmpsel(TCGType type, unsigned vece, TCGv_vec v0,
 {
     TCGv_vec t = tcg_temp_new_vec(type);
 
-    if (expand_vec_cmp_noinv(type, vece, t, c1, c2, cond)) {
-        /* Invert the sense of the compare by swapping arguments. */
-        TCGv_vec x;
-        x = v3, v3 = v4, v4 = x;
-    }
+    expand_vec_cmp(type, vece, t, c1, c2, cond);
     vec_gen_4(INDEX_op_x86_vpblendvb_vec, type, vece,
               tcgv_vec_arg(v0), tcgv_vec_arg(v4),
               tcgv_vec_arg(v3), tcgv_vec_arg(t));
-- 
2.43.0
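
Note for readers of the patch: the two reductions it relies on (NEED_UMIN/NEED_UMAX in cond_fixup, and the sign-bit bias kept in expand_vec_cmp for MO_64) can be checked on scalars. The sketch below is only an illustration and is not part of the patch or of QEMU. The function names leu_via_umin, geu_via_umax, gtu_via_bias and check_examples are hypothetical, a single uint8_t stands in for one MO_8 vector lane, and a uint64_t for one MO_64 lane.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* NEED_UMIN model: a <=u b holds exactly when umin(a, b) == a. */
static bool leu_via_umin(uint8_t a, uint8_t b)
{
    uint8_t m = a < b ? a : b;  /* stands in for PMINUB */
    return m == a;              /* stands in for PCMPEQB */
}

/* NEED_UMAX model: a >=u b holds exactly when umax(a, b) == a. */
static bool geu_via_umax(uint8_t a, uint8_t b)
{
    uint8_t m = a > b ? a : b;  /* stands in for PMAXUB */
    return m == a;              /* stands in for PCMPEQB */
}

/*
 * Bias model: subtracting 2^63 from both operands maps unsigned order
 * onto signed order, so GTU can use a signed greater-than compare.
 * The casts assume two's complement conversion (implementation-defined
 * before C23, but what mainstream compilers do).
 */
static bool gtu_via_bias(uint64_t a, uint64_t b)
{
    const uint64_t bias = 1ull << 63;   /* 1ull << ((8 << MO_64) - 1) */
    int64_t sa = (int64_t)(a - bias);
    int64_t sb = (int64_t)(b - bias);
    return sa > sb;                     /* stands in for PCMPGTQ */
}

/* GTU and LTU are the NEED_INV variants: compute LEU/GEU, then invert. */
static void check_examples(void)
{
    assert(leu_via_umin(3, 200) && !leu_via_umin(200, 3));
    assert(geu_via_umax(200, 3) && !geu_via_umax(3, 200));
    assert(!leu_via_umin(200, 3) == (200u > 3u));       /* GTU */
    assert(gtu_via_bias(UINT64_MAX, 0));
    assert(!gtu_via_bias(0, UINT64_MAX));
}
```

In the patch itself the inversion is done with a PXOR against an all-ones vector in tcg_out_cmp_vec; the `!` above plays that role for a single lane.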