From: Paolo Bonzini
To: qemu-devel@nongnu.org
Cc: paul@nowt.org, richard.henderson@linaro.org
Subject: [PATCH 26/35] target/i386: reimplement 0x0f 0x38, add AVX
Date: Thu, 13 Oct 2022 23:46:42 +0200
Message-Id: <20221013214651.672114-27-pbonzini@redhat.com>
X-Mailer: git-send-email 2.37.3
In-Reply-To: <20221013214651.672114-1-pbonzini@redhat.com>
References: <20221013214651.672114-1-pbonzini@redhat.com>
MIME-Version: 1.0

There are several special cases here:

1) extending moves have different widths for the helpers vs. for the
memory loads, and the width for memory loads depends on VEX.L too.
This is represented by X86_SPECIAL_AVXExtMov.

2) some instructions, such as variable-width shifts, select the vector
element size via REX.W.

3) VSIB instructions (VGATHERxPy, VPGATHERxy) are also part of this group,
and they have (among other things) two output operands.

4) the macros for 4-operand blends (which are under 0x0f 0x3a) have to be
extended to support 2-operand blends.  The 2-operand variant actually
came a few years earlier, but it is clearer to implement them in the
opposite order.

X86_TYPE_WM, introduced earlier for unaligned loads, is reused for helpers
that accept a Reg* but have an M argument.

These three-byte opcodes also include new AVX instructions, for which
the helpers were originally implemented by Paul Brook.

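For reviewers, special case 1 can be read as a small size-selection rule.
The following is only a standalone sketch of that rule, not the QEMU code:
the enum op_size values and the function name are hypothetical stand-ins
for MemOp and for the X86_SPECIAL_AVXExtMov handling in the decoder.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for QEMU's MemOp size codes (log2 of the byte size). */
enum op_size { SZ_8, SZ_16, SZ_32, SZ_64, SZ_128, SZ_256 };

/*
 * Pick the source operand size for an extending move.
 * table_size is the size listed in the opcode table, for example
 * SZ_64 for the "W,q" operand of VPMOVSXBW.
 */
static enum op_size avx_ext_mov_source_size(enum op_size table_size,
                                            bool has_ea, bool vex_l)
{
    if (!has_ea) {
        /* Register source: the helper reads a full-width register. */
        return vex_l ? SZ_256 : SZ_128;
    }
    /* Memory source: the load doubles in size when VEX.L=1. */
    return vex_l ? (enum op_size)(table_size + 1) : table_size;
}

/* Spot checks of the rule; call from a test harness if desired. */
static void avx_ext_mov_self_check(void)
{
    assert(avx_ext_mov_source_size(SZ_64, true, false) == SZ_64);
    assert(avx_ext_mov_source_size(SZ_64, true, true) == SZ_128);
    assert(avx_ext_mov_source_size(SZ_16, false, true) == SZ_256);
}
```

With this rule a 64-bit memory operand in the table becomes a 128-bit load
under VEX.256, while a register operand always uses the full XMM or YMM
register.
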
Reviewed-by: Richard Henderson
Signed-off-by: Paolo Bonzini
---
 target/i386/ops_sse.h            | 188 ++++++++++++++++++++++++++-
 target/i386/ops_sse_header.h     |  19 +++
 target/i386/tcg/decode-new.c.inc | 119 +++++++++++++++++-
 target/i386/tcg/decode-new.h     |   6 +
 target/i386/tcg/emit.c.inc       | 210 ++++++++++++++++++++++++++++++-
 target/i386/tcg/translate.c      |   2 +-
 6 files changed, 536 insertions(+), 8 deletions(-)

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index cb8909adcf..104a53fda0 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -2382,6 +2382,36 @@ void glue(helper_aeskeygenassist, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
 #endif
 
 #if SHIFT >= 1
+void glue(helper_vpermilpd, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s)
+{
+    uint64_t r0, r1;
+    int i;
+
+    for (i = 0; i < 1 << SHIFT; i += 2) {
+        r0 = v->Q(i + ((s->Q(i) >> 1) & 1));
+        r1 = v->Q(i + ((s->Q(i+1) >> 1) & 1));
+        d->Q(i) = r0;
+        d->Q(i+1) = r1;
+    }
+}
+
+void glue(helper_vpermilps, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s)
+{
+    uint32_t r0, r1, r2, r3;
+    int i;
+
+    for (i = 0; i < 2 << SHIFT; i += 4) {
+        r0 = v->L(i + (s->L(i) & 3));
+        r1 = v->L(i + (s->L(i+1) & 3));
+        r2 = v->L(i + (s->L(i+2) & 3));
+        r3 = v->L(i + (s->L(i+3) & 3));
+        d->L(i) = r0;
+        d->L(i+1) = r1;
+        d->L(i+2) = r2;
+        d->L(i+3) = r3;
+    }
+}
+
 void glue(helper_vpermilpd_imm, SUFFIX)(Reg *d, Reg *s, uint32_t order)
 {
     uint64_t r0, r1;
@@ -2414,6 +2444,150 @@ void glue(helper_vpermilps_imm, SUFFIX)(Reg *d, Reg *s, uint32_t order)
     }
 }
 
+#if SHIFT == 1
+#define FPSRLVD(x, c) (c < 32 ? ((x) >> c) : 0)
+#define FPSRLVQ(x, c) (c < 64 ? ((x) >> c) : 0)
+#define FPSRAVD(x, c) ((int32_t)(x) >> (c < 32 ? c : 31))
+#define FPSRAVQ(x, c) ((int64_t)(x) >> (c < 64 ? c : 63))
+#define FPSLLVD(x, c) (c < 32 ? ((x) << c) : 0)
+#define FPSLLVQ(x, c) (c < 64 ? ((x) << c) : 0)
+#endif
+
+SSE_HELPER_L(helper_vpsrlvd, FPSRLVD)
+SSE_HELPER_L(helper_vpsravd, FPSRAVD)
+SSE_HELPER_L(helper_vpsllvd, FPSLLVD)
+
+SSE_HELPER_Q(helper_vpsrlvq, FPSRLVQ)
+SSE_HELPER_Q(helper_vpsravq, FPSRAVQ)
+SSE_HELPER_Q(helper_vpsllvq, FPSLLVQ)
+
+void glue(helper_vtestps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
+{
+    uint32_t zf = 0, cf = 0;
+    int i;
+
+    for (i = 0; i < 2 << SHIFT; i++) {
+        zf |= (s->L(i) & d->L(i));
+        cf |= (s->L(i) & ~d->L(i));
+    }
+    CC_SRC = ((zf >> 31) ? 0 : CC_Z) | ((cf >> 31) ? 0 : CC_C);
+}
+
+void glue(helper_vtestpd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
+{
+    uint64_t zf = 0, cf = 0;
+    int i;
+
+    for (i = 0; i < 1 << SHIFT; i++) {
+        zf |= (s->Q(i) & d->Q(i));
+        cf |= (s->Q(i) & ~d->Q(i));
+    }
+    CC_SRC = ((zf >> 63) ? 0 : CC_Z) | ((cf >> 63) ? 0 : CC_C);
+}
+
+void glue(helper_vpmaskmovd_st, SUFFIX)(CPUX86State *env,
+                                        Reg *v, Reg *s, target_ulong a0)
+{
+    int i;
+
+    for (i = 0; i < (2 << SHIFT); i++) {
+        if (v->L(i) >> 31) {
+            cpu_stl_data_ra(env, a0 + i * 4, s->L(i), GETPC());
+        }
+    }
+}
+
+void glue(helper_vpmaskmovq_st, SUFFIX)(CPUX86State *env,
+                                        Reg *v, Reg *s, target_ulong a0)
+{
+    int i;
+
+    for (i = 0; i < (1 << SHIFT); i++) {
+        if (v->Q(i) >> 63) {
+            cpu_stq_data_ra(env, a0 + i * 8, s->Q(i), GETPC());
+        }
+    }
+}
+
+void glue(helper_vpmaskmovd, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s)
+{
+    int i;
+
+    for (i = 0; i < (2 << SHIFT); i++) {
+        d->L(i) = (v->L(i) >> 31) ? s->L(i) : 0;
+    }
+}
+
+void glue(helper_vpmaskmovq, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s)
+{
+    int i;
+
+    for (i = 0; i < (1 << SHIFT); i++) {
+        d->Q(i) = (v->Q(i) >> 63) ? s->Q(i) : 0;
+    }
+}
+
+void glue(helper_vpgatherdd, SUFFIX)(CPUX86State *env,
+        Reg *d, Reg *v, Reg *s, target_ulong a0, unsigned scale)
+{
+    int i;
+    for (i = 0; i < (2 << SHIFT); i++) {
+        if (v->L(i) >> 31) {
+            target_ulong addr = a0
+                + ((target_ulong)(int32_t)s->L(i) << scale);
+            d->L(i) = cpu_ldl_data_ra(env, addr, GETPC());
+        }
+        v->L(i) = 0;
+    }
+}
+
+void glue(helper_vpgatherdq, SUFFIX)(CPUX86State *env,
+        Reg *d, Reg *v, Reg *s, target_ulong a0, unsigned scale)
+{
+    int i;
+    for (i = 0; i < (1 << SHIFT); i++) {
+        if (v->Q(i) >> 63) {
+            target_ulong addr = a0
+                + ((target_ulong)(int32_t)s->L(i) << scale);
+            d->Q(i) = cpu_ldq_data_ra(env, addr, GETPC());
+        }
+        v->Q(i) = 0;
+    }
+}
+
+void glue(helper_vpgatherqd, SUFFIX)(CPUX86State *env,
+        Reg *d, Reg *v, Reg *s, target_ulong a0, unsigned scale)
+{
+    int i;
+    for (i = 0; i < (1 << SHIFT); i++) {
+        if (v->L(i) >> 31) {
+            target_ulong addr = a0
+                + ((target_ulong)(int64_t)s->Q(i) << scale);
+            d->L(i) = cpu_ldl_data_ra(env, addr, GETPC());
+        }
+        v->L(i) = 0;
+    }
+    for (i /= 2; i < 1 << SHIFT; i++) {
+        d->Q(i) = 0;
+        v->Q(i) = 0;
+    }
+}
+
+void glue(helper_vpgatherqq, SUFFIX)(CPUX86State *env,
+        Reg *d, Reg *v, Reg *s, target_ulong a0, unsigned scale)
+{
+    int i;
+    for (i = 0; i < (1 << SHIFT); i++) {
+        if (v->Q(i) >> 63) {
+            target_ulong addr = a0
+                + ((target_ulong)(int64_t)s->Q(i) << scale);
+            d->Q(i) = cpu_ldq_data_ra(env, addr, GETPC());
+        }
+        v->Q(i) = 0;
+    }
+}
+#endif
+
+#if SHIFT >= 2
 void helper_vpermdq_ymm(Reg *d, Reg *v, Reg *s, uint32_t order)
 {
@@ -2473,7 +2647,19 @@ void helper_vpermq_ymm(Reg *d, Reg *s, uint32_t order)
     d->Q(2) = r2;
     d->Q(3) = r3;
 }
-#endif
+
+void helper_vpermd_ymm(Reg *d, Reg *v, Reg *s)
+{
+    uint32_t r[8];
+    int i;
+
+    for (i = 0; i < 8; i++) {
+        r[i] = s->L(v->L(i) & 7);
+    }
+    for (i = 0; i < 8; i++) {
+        d->L(i) = r[i];
+    }
+}
 #endif
 
 #undef SSE_HELPER_S
diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h
index 1afc4ff6a1..dd8dcebc23 100644
--- a/target/i386/ops_sse_header.h
+++ b/target/i386/ops_sse_header.h
@@ -413,9 +413,28 @@ DEF_HELPER_5(glue(pclmulqdq, SUFFIX), void, env, Reg, Reg, Reg, i32)
 
 /* AVX helpers */
 #if SHIFT >= 1
+DEF_HELPER_4(glue(vpermilpd, SUFFIX), void, env, Reg, Reg, Reg)
+DEF_HELPER_4(glue(vpermilps, SUFFIX), void, env, Reg, Reg, Reg)
 DEF_HELPER_3(glue(vpermilpd_imm, SUFFIX), void, Reg, Reg, i32)
 DEF_HELPER_3(glue(vpermilps_imm, SUFFIX), void, Reg, Reg, i32)
+DEF_HELPER_4(glue(vpsrlvd, SUFFIX), void, env, Reg, Reg, Reg)
+DEF_HELPER_4(glue(vpsravd, SUFFIX), void, env, Reg, Reg, Reg)
+DEF_HELPER_4(glue(vpsllvd, SUFFIX), void, env, Reg, Reg, Reg)
+DEF_HELPER_4(glue(vpsrlvq, SUFFIX), void, env, Reg, Reg, Reg)
+DEF_HELPER_4(glue(vpsravq, SUFFIX), void, env, Reg, Reg, Reg)
+DEF_HELPER_4(glue(vpsllvq, SUFFIX), void, env, Reg, Reg, Reg)
+DEF_HELPER_3(glue(vtestps, SUFFIX), void, env, Reg, Reg)
+DEF_HELPER_3(glue(vtestpd, SUFFIX), void, env, Reg, Reg)
+DEF_HELPER_4(glue(vpmaskmovd_st, SUFFIX), void, env, Reg, Reg, tl)
+DEF_HELPER_4(glue(vpmaskmovq_st, SUFFIX), void, env, Reg, Reg, tl)
+DEF_HELPER_4(glue(vpmaskmovd, SUFFIX), void, env, Reg, Reg, Reg)
+DEF_HELPER_4(glue(vpmaskmovq, SUFFIX), void, env, Reg, Reg, Reg)
+DEF_HELPER_6(glue(vpgatherdd, SUFFIX), void, env, Reg, Reg, Reg, tl, i32)
+DEF_HELPER_6(glue(vpgatherdq, SUFFIX), void, env, Reg, Reg, Reg, tl, i32)
+DEF_HELPER_6(glue(vpgatherqd, SUFFIX), void, env, Reg, Reg, Reg, tl, i32)
+DEF_HELPER_6(glue(vpgatherqq, SUFFIX), void, env, Reg, Reg, Reg, tl, i32)
 #if SHIFT == 2
+DEF_HELPER_3(vpermd_ymm, void, Reg, Reg, Reg)
 DEF_HELPER_4(vpermdq_ymm, void, Reg, Reg, Reg, i32)
 DEF_HELPER_3(vpermq_ymm, void, Reg, Reg, i32)
 #endif
diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index 1c2e8f559d..6732720302 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -92,6 +92,7 @@
 #define mmx .special = X86_SPECIAL_MMX,
 #define zext0 .special = X86_SPECIAL_ZExtOp0,
 #define zext2 .special = X86_SPECIAL_ZExtOp2,
+#define avx_movx .special = X86_SPECIAL_AVXExtMov,
 
 #define vex1 .vex_class = 1,
 #define vex1_rep3 .vex_class = 1, .vex_special = X86_VEX_REPScalar,
@@ -302,6 +303,105 @@ static void decode_0FD6(DisasContext *s, CPUX86State *env, X86OpEntry *entry, ui
 }
 
 static const X86OpEntry opcodes_0F38_00toEF[240] = {
+    [0x00] = X86_OP_ENTRY3(PSHUFB,    V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66),
+    [0x01] = X86_OP_ENTRY3(PHADDW,    V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66),
+    [0x02] = X86_OP_ENTRY3(PHADDD,    V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66),
+    [0x03] = X86_OP_ENTRY3(PHADDSW,   V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66),
+    [0x04] = X86_OP_ENTRY3(PMADDUBSW, V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66),
+    [0x05] = X86_OP_ENTRY3(PHSUBW,    V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66),
+    [0x06] = X86_OP_ENTRY3(PHSUBD,    V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66),
+    [0x07] = X86_OP_ENTRY3(PHSUBSW,   V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66),
+
+    [0x10] = X86_OP_ENTRY2(PBLENDVB, V,x, W,x, vex4 cpuid(SSE41) avx2_256 p_66),
+    [0x14] = X86_OP_ENTRY2(BLENDVPS, V,x, W,x, vex4 cpuid(SSE41) p_66),
+    [0x15] = X86_OP_ENTRY2(BLENDVPD, V,x, W,x, vex4 cpuid(SSE41) p_66),
+    /* Listed incorrectly as type 4 */
+    [0x16] = X86_OP_ENTRY3(VPERMD, V,qq, H,qq, W,qq, vex6 cpuid(AVX2) p_66),
+    [0x17] = X86_OP_ENTRY3(VPTEST, None,None, V,x, W,x, vex4 cpuid(SSE41) p_66),
+
+    /*
+     * Source operand listed as Mq/Ux and similar in the manual; incorrectly listed
+     * as 128-bit only in 2-17.
+     */
+    [0x20] = X86_OP_ENTRY3(VPMOVSXBW, V,x, None,None, W,q, vex5 cpuid(SSE41) avx_movx avx2_256 p_66),
+    [0x21] = X86_OP_ENTRY3(VPMOVSXBD, V,x, None,None, W,d, vex5 cpuid(SSE41) avx_movx avx2_256 p_66),
+    [0x22] = X86_OP_ENTRY3(VPMOVSXBQ, V,x, None,None, W,w, vex5 cpuid(SSE41) avx_movx avx2_256 p_66),
+    [0x23] = X86_OP_ENTRY3(VPMOVSXWD, V,x, None,None, W,q, vex5 cpuid(SSE41) avx_movx avx2_256 p_66),
+    [0x24] = X86_OP_ENTRY3(VPMOVSXWQ, V,x, None,None, W,d, vex5 cpuid(SSE41) avx_movx avx2_256 p_66),
+    [0x25] = X86_OP_ENTRY3(VPMOVSXDQ, V,x, None,None, W,q, vex5 cpuid(SSE41) avx_movx avx2_256 p_66),
+
+    /* Same as PMOVSX. */
+    [0x30] = X86_OP_ENTRY3(VPMOVZXBW, V,x, None,None, W,q, vex5 cpuid(SSE41) avx_movx avx2_256 p_66),
+    [0x31] = X86_OP_ENTRY3(VPMOVZXBD, V,x, None,None, W,d, vex5 cpuid(SSE41) avx_movx avx2_256 p_66),
+    [0x32] = X86_OP_ENTRY3(VPMOVZXBQ, V,x, None,None, W,w, vex5 cpuid(SSE41) avx_movx avx2_256 p_66),
+    [0x33] = X86_OP_ENTRY3(VPMOVZXWD, V,x, None,None, W,q, vex5 cpuid(SSE41) avx_movx avx2_256 p_66),
+    [0x34] = X86_OP_ENTRY3(VPMOVZXWQ, V,x, None,None, W,d, vex5 cpuid(SSE41) avx_movx avx2_256 p_66),
+    [0x35] = X86_OP_ENTRY3(VPMOVZXDQ, V,x, None,None, W,q, vex5 cpuid(SSE41) avx_movx avx2_256 p_66),
+    [0x36] = X86_OP_ENTRY3(VPERMD,    V,qq, H,qq, W,qq, vex6 cpuid(AVX2) p_66),
+    [0x37] = X86_OP_ENTRY3(PCMPGTQ,   V,x, H,x, W,x, vex4 cpuid(SSE42) avx2_256 p_66),
+
+    [0x40] = X86_OP_ENTRY3(PMULLD,      V,x, H,x, W,x, vex4 cpuid(SSE41) avx2_256 p_66),
+    [0x41] = X86_OP_ENTRY3(VPHMINPOSUW, V,dq, None,None, W,dq, vex4 cpuid(SSE41) p_66),
+    /* Listed incorrectly as type 4 */
+    [0x45] = X86_OP_ENTRY3(VPSRLV,      V,x, H,x, W,x, vex6 cpuid(AVX2) p_66),
+    [0x46] = X86_OP_ENTRY3(VPSRAV,      V,x, H,x, W,x, vex6 cpuid(AVX2) p_66),
+    [0x47] = X86_OP_ENTRY3(VPSLLV,      V,x, H,x, W,x, vex6 cpuid(AVX2) p_66),
+
+    [0x90] = X86_OP_ENTRY3(VPGATHERD, V,x, H,x, M,d, vex12 cpuid(AVX2) p_66), /* vpgatherdd/q */
+    [0x91] = X86_OP_ENTRY3(VPGATHERQ, V,x, H,x, M,q, vex12 cpuid(AVX2) p_66), /* vpgatherqd/q */
+    [0x92] = X86_OP_ENTRY3(VPGATHERD, V,x, H,x, M,d, vex12 cpuid(AVX2) p_66), /* vgatherdps/d */
+    [0x93] = X86_OP_ENTRY3(VPGATHERQ, V,x, H,x, M,q, vex12 cpuid(AVX2) p_66), /* vgatherqps/d */
+
+    [0x08] = X86_OP_ENTRY3(PSIGNB,    V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66),
+    [0x09] = X86_OP_ENTRY3(PSIGNW,    V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66),
+    [0x0a] = X86_OP_ENTRY3(PSIGND,    V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66),
+    [0x0b] = X86_OP_ENTRY3(PMULHRSW,  V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66),
+    [0x0c] = X86_OP_ENTRY3(VPERMILPS, V,x, H,x, W,x, vex4 cpuid(AVX) p_00_66),
+    [0x0d] = X86_OP_ENTRY3(VPERMILPD, V,x, H,x, W,x, vex4 cpuid(AVX) p_66),
+    [0x0e] = X86_OP_ENTRY3(VTESTPS,   None,None, V,x, W,x, vex4 cpuid(AVX) p_66),
+    [0x0f] = X86_OP_ENTRY3(VTESTPD,   None,None, V,x, W,x, vex4 cpuid(AVX) p_66),
+
+    [0x18] = X86_OP_ENTRY3(VPBROADCASTD,   V,x, None,None, W,d, vex6 cpuid(AVX) p_66), /* vbroadcastss */
+    [0x19] = X86_OP_ENTRY3(VPBROADCASTQ,   V,qq, None,None, W,q, vex6 cpuid(AVX) p_66), /* vbroadcastsd */
+    [0x1a] = X86_OP_ENTRY3(VBROADCASTx128, V,qq, None,None, WM,dq,vex6 cpuid(AVX) p_66),
+    [0x1c] = X86_OP_ENTRY3(PABSB,          V,x, None,None, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66),
+    [0x1d] = X86_OP_ENTRY3(PABSW,          V,x, None,None, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66),
+    [0x1e] = X86_OP_ENTRY3(PABSD,          V,x, None,None, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66),
+
+    [0x28] = X86_OP_ENTRY3(PMULDQ,        V,x, H,x, W,x, vex4 cpuid(SSE41) avx2_256 p_66),
+    [0x29] = X86_OP_ENTRY3(PCMPEQQ,       V,x, H,x, W,x, vex4 cpuid(SSE41) avx2_256 p_66),
+    [0x2a] = X86_OP_ENTRY3(MOVDQ,         V,x, None,None, WM,x, vex1 cpuid(SSE41) avx2_256 p_66), /* movntdqa */
+    [0x2b] = X86_OP_ENTRY3(VPACKUSDW,     V,x, H,x, W,x, vex4 cpuid(SSE41) avx2_256 p_66),
+    [0x2c] = X86_OP_ENTRY3(VMASKMOVPS,    V,x, H,x, WM,x, vex6 cpuid(AVX) p_66),
+    [0x2d] = X86_OP_ENTRY3(VMASKMOVPD,    V,x, H,x, WM,x, vex6 cpuid(AVX) p_66),
+    /* Incorrectly listed as Mx,Hx,Vx in the manual */
+    [0x2e] = X86_OP_ENTRY3(VMASKMOVPS_st, M,x, V,x, H,x, vex6 cpuid(AVX) p_66),
+    [0x2f] = X86_OP_ENTRY3(VMASKMOVPD_st, M,x, V,x, H,x, vex6 cpuid(AVX) p_66),
+
+    [0x38] = X86_OP_ENTRY3(PMINSB, V,x, H,x, W,x, vex4 cpuid(SSE41) avx2_256 p_66),
+    [0x39] = X86_OP_ENTRY3(PMINSD, V,x, H,x, W,x, vex4 cpuid(SSE41) avx2_256 p_66),
+    [0x3a] = X86_OP_ENTRY3(PMINUW, V,x, H,x, W,x, vex4 cpuid(SSE41) avx2_256 p_66),
+    [0x3b] = X86_OP_ENTRY3(PMINUD, V,x, H,x, W,x, vex4 cpuid(SSE41) avx2_256 p_66),
+    [0x3c] = X86_OP_ENTRY3(PMAXSB, V,x, H,x, W,x, vex4 cpuid(SSE41) avx2_256 p_66),
+    [0x3d] = X86_OP_ENTRY3(PMAXSD, V,x, H,x, W,x, vex4 cpuid(SSE41) avx2_256 p_66),
+    [0x3e] = X86_OP_ENTRY3(PMAXUW, V,x, H,x, W,x, vex4 cpuid(SSE41) avx2_256 p_66),
+    [0x3f] = X86_OP_ENTRY3(PMAXUD, V,x, H,x, W,x, vex4 cpuid(SSE41) avx2_256 p_66),
+
+    [0x58] = X86_OP_ENTRY3(VPBROADCASTD,   V,x, None,None, W,d, vex6 cpuid(AVX2) p_66),
+    [0x59] = X86_OP_ENTRY3(VPBROADCASTQ,   V,x, None,None, W,q, vex6 cpuid(AVX2) p_66),
+    [0x5a] = X86_OP_ENTRY3(VBROADCASTx128, V,qq, None,None, WM,dq,vex6 cpuid(AVX2) p_66),
+
+    [0x78] = X86_OP_ENTRY3(VPBROADCASTB, V,x, None,None, W,b, vex6 cpuid(AVX2) p_66),
+    [0x79] = X86_OP_ENTRY3(VPBROADCASTW, V,x, None,None, W,w, vex6 cpuid(AVX2) p_66),
+
+    [0x8c] = X86_OP_ENTRY3(VPMASKMOV,    V,x, H,x, WM,x, vex6 cpuid(AVX2) p_66),
+    [0x8e] = X86_OP_ENTRY3(VPMASKMOV_st, M,x, V,x, H,x, vex6 cpuid(AVX2) p_66),
+
+    [0xdb] = X86_OP_ENTRY3(VAESIMC,     V,dq, None,None, W,dq, vex4 cpuid(AES) p_66),
+    [0xdc] = X86_OP_ENTRY3(VAESENC,     V,x, H,x, W,x, vex4 cpuid(AES) p_66),
+    [0xdd] = X86_OP_ENTRY3(VAESENCLAST, V,x, H,x, W,x, vex4 cpuid(AES) p_66),
+    [0xde] = X86_OP_ENTRY3(VAESDEC,     V,x, H,x, W,x, vex4 cpuid(AES) p_66),
+    [0xdf] = X86_OP_ENTRY3(VAESDECLAST, V,x, H,x, W,x, vex4 cpuid(AES) p_66),
 };
 
 /* five rows for no prefix, 66, F3, F2, 66+F2 */
@@ -431,8 +538,8 @@ static const X86OpEntry opcodes_0F3A[256] = {
     [0x0b] = X86_OP_ENTRY4(VROUNDSD, V,x, H,x, W,sd, vex3 cpuid(SSE41) p_66),
     [0x0c] = X86_OP_ENTRY4(VBLENDPS, V,x, H,x, W,x, vex4 cpuid(SSE41) p_66),
     [0x0d] = X86_OP_ENTRY4(VBLENDPD, V,x, H,x, W,x, vex4 cpuid(SSE41) p_66),
-    [0x0e] = X86_OP_ENTRY4(VPBLENDW, V,x, H,x, W,x, vex4 cpuid(SSE41) p_66),
-    [0x0f] = X86_OP_ENTRY4(PALIGNR, V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx p_00_66),
+    [0x0e] = X86_OP_ENTRY4(VPBLENDW, V,x, H,x, W,x, vex4 cpuid(SSE41) avx2_256 p_66),
+    [0x0f] = X86_OP_ENTRY4(PALIGNR, V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66),
 
     [0x18] = X86_OP_ENTRY4(VINSERTx128, V,qq, H,qq, W,qq, vex6 cpuid(AVX) p_66),
     [0x19] = X86_OP_ENTRY3(VEXTRACTx128, W,dq, V,qq, I,b, vex6 cpuid(AVX) p_66),
@@ -1428,6 +1535,14 @@ static void disas_insn_new(DisasContext *s, CPUState *cpu, int b)
         }
         break;
 
+    case X86_SPECIAL_AVXExtMov:
+        if (!decode.op[2].has_ea) {
+            decode.op[2].ot = s->vex_l ? MO_256 : MO_128;
+        } else if (s->vex_l) {
+            decode.op[2].ot++;
+        }
+        break;
+
     case X86_SPECIAL_MMX:
         if (!(s->prefix & (PREFIX_REPZ | PREFIX_REPNZ | PREFIX_DATA))) {
             gen_helper_enter_mmx(cpu_env);
diff --git a/target/i386/tcg/decode-new.h b/target/i386/tcg/decode-new.h
index ef318a00ed..2f0b0e4cfd 100644
--- a/target/i386/tcg/decode-new.h
+++ b/target/i386/tcg/decode-new.h
@@ -142,6 +142,12 @@ typedef enum X86InsnSpecial {
     X86_SPECIAL_ZExtOp0,
     X86_SPECIAL_ZExtOp2,
 
+    /*
+     * Register operand 2 is extended to full width, while a memory operand
+     * is doubled in size if VEX.L=1.
+     */
+    X86_SPECIAL_AVXExtMov,
+
     /*
      * MMX instruction exists with no prefix; if there is no prefix, V/H/W/U operands
      * become P/P/Q/N, and size "x" becomes "q".
diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc
index ad93094ca8..0c5e9d956a 100644
--- a/target/i386/tcg/emit.c.inc
+++ b/target/i386/tcg/emit.c.inc
@@ -19,6 +19,9 @@
  * License along with this library; if not, see .
  */
 
+typedef void (*SSEFunc_0_epppti)(TCGv_ptr env, TCGv_ptr reg_a, TCGv_ptr reg_b,
+                                 TCGv_ptr reg_c, TCGv a0, TCGv_i32 scale);
+
 static inline TCGv_i32 tcg_constant8u_i32(uint8_t val)
 {
     return tcg_constant_i32(val);
@@ -484,15 +487,20 @@ static inline void gen_ternary_sse(DisasContext *s, CPUX86State *env, X86Decoded
     fn(cpu_env, OP_PTR0, OP_PTR1, OP_PTR2, ptr3);
     tcg_temp_free_ptr(ptr3);
 }
-#define TERNARY_SSE(uvname, lname) \
+#define TERNARY_SSE(uname, uvname, lname) \
 static void gen_##uvname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \
 { \
     gen_ternary_sse(s, env, decode, (uint8_t)decode->immediate >> 4, \
                     gen_helper_##lname##_xmm, gen_helper_##lname##_ymm); \
+} \
+static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \
+{ \
+    gen_ternary_sse(s, env, decode, 0, \
+                    gen_helper_##lname##_xmm, gen_helper_##lname##_ymm); \
 }
-TERNARY_SSE(VBLENDVPS, blendvps)
-TERNARY_SSE(VBLENDVPD, blendvpd)
-TERNARY_SSE(VPBLENDVB, pblendvb)
+TERNARY_SSE(BLENDVPS, VBLENDVPS, blendvps)
+TERNARY_SSE(BLENDVPD, VBLENDVPD, blendvpd)
+TERNARY_SSE(PBLENDVB, VPBLENDVB, pblendvb)
 
 static inline void gen_binary_imm_sse(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode,
                                       SSEFunc_0_epppi xmm, SSEFunc_0_epppi ymm)
@@ -522,6 +530,25 @@ BINARY_IMM_SSE(VDDPD, dppd)
 BINARY_IMM_SSE(VMPSADBW, mpsadbw)
 BINARY_IMM_SSE(PCLMULQDQ, pclmulqdq)
 
+
+#define UNARY_INT_GVEC(uname, func, ...) \
+static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \
+{ \
+    int vec_len = vector_len(s, decode); \
+ \
+    func(__VA_ARGS__, decode->op[0].offset, \
+         decode->op[2].offset, vec_len, vec_len); \
+}
+UNARY_INT_GVEC(PABSB, tcg_gen_gvec_abs, MO_8)
+UNARY_INT_GVEC(PABSW, tcg_gen_gvec_abs, MO_16)
+UNARY_INT_GVEC(PABSD, tcg_gen_gvec_abs, MO_32)
+UNARY_INT_GVEC(VBROADCASTx128, tcg_gen_gvec_dup_mem, MO_128)
+UNARY_INT_GVEC(VPBROADCASTB, tcg_gen_gvec_dup_mem, MO_8)
+UNARY_INT_GVEC(VPBROADCASTW, tcg_gen_gvec_dup_mem, MO_16)
+UNARY_INT_GVEC(VPBROADCASTD, tcg_gen_gvec_dup_mem, MO_32)
+UNARY_INT_GVEC(VPBROADCASTQ, tcg_gen_gvec_dup_mem, MO_64)
+
+
 #define BINARY_INT_GVEC(uname, func, ...) \
 static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \
 { \
@@ -544,14 +571,25 @@ BINARY_INT_GVEC(PAND, tcg_gen_gvec_and, MO_64)
 BINARY_INT_GVEC(PCMPEQB, tcg_gen_gvec_cmp, TCG_COND_EQ, MO_8)
 BINARY_INT_GVEC(PCMPEQD, tcg_gen_gvec_cmp, TCG_COND_EQ, MO_32)
 BINARY_INT_GVEC(PCMPEQW, tcg_gen_gvec_cmp, TCG_COND_EQ, MO_16)
+BINARY_INT_GVEC(PCMPEQQ, tcg_gen_gvec_cmp, TCG_COND_EQ, MO_64)
 BINARY_INT_GVEC(PCMPGTB, tcg_gen_gvec_cmp, TCG_COND_GT, MO_8)
 BINARY_INT_GVEC(PCMPGTW, tcg_gen_gvec_cmp, TCG_COND_GT, MO_16)
 BINARY_INT_GVEC(PCMPGTD, tcg_gen_gvec_cmp, TCG_COND_GT, MO_32)
+BINARY_INT_GVEC(PCMPGTQ, tcg_gen_gvec_cmp, TCG_COND_GT, MO_64)
+BINARY_INT_GVEC(PMAXSB, tcg_gen_gvec_smax, MO_8)
 BINARY_INT_GVEC(PMAXSW, tcg_gen_gvec_smax, MO_16)
+BINARY_INT_GVEC(PMAXSD, tcg_gen_gvec_smax, MO_32)
 BINARY_INT_GVEC(PMAXUB, tcg_gen_gvec_umax, MO_8)
+BINARY_INT_GVEC(PMAXUW, tcg_gen_gvec_umax, MO_16)
+BINARY_INT_GVEC(PMAXUD, tcg_gen_gvec_umax, MO_32)
+BINARY_INT_GVEC(PMINSB, tcg_gen_gvec_smin, MO_8)
 BINARY_INT_GVEC(PMINSW, tcg_gen_gvec_smin, MO_16)
+BINARY_INT_GVEC(PMINSD, tcg_gen_gvec_smin, MO_32)
 BINARY_INT_GVEC(PMINUB, tcg_gen_gvec_umin, MO_8)
+BINARY_INT_GVEC(PMINUW, tcg_gen_gvec_umin, MO_16)
+BINARY_INT_GVEC(PMINUD, tcg_gen_gvec_umin, MO_32)
 BINARY_INT_GVEC(PMULLW, tcg_gen_gvec_mul, MO_16)
+BINARY_INT_GVEC(PMULLD, tcg_gen_gvec_mul, MO_32)
 BINARY_INT_GVEC(POR, tcg_gen_gvec_or, MO_64)
 BINARY_INT_GVEC(PSUBB, tcg_gen_gvec_sub, MO_8)
 BINARY_INT_GVEC(PSUBW, tcg_gen_gvec_sub, MO_16)
@@ -626,6 +664,19 @@ BINARY_INT_MMX(PSRLQ_r, psrlq)
 BINARY_INT_MMX(PSRAW_r, psraw)
 BINARY_INT_MMX(PSRAD_r, psrad)
 
+BINARY_INT_MMX(PHADDW, phaddw)
+BINARY_INT_MMX(PHADDSW, phaddsw)
+BINARY_INT_MMX(PHADDD, phaddd)
+BINARY_INT_MMX(PHSUBW, phsubw)
+BINARY_INT_MMX(PHSUBSW, phsubsw)
+BINARY_INT_MMX(PHSUBD, phsubd)
+BINARY_INT_MMX(PMADDUBSW, pmaddubsw)
+BINARY_INT_MMX(PSHUFB, pshufb)
+BINARY_INT_MMX(PSIGNB, psignb)
+BINARY_INT_MMX(PSIGNW, psignw)
+BINARY_INT_MMX(PSIGND, psignd)
+BINARY_INT_MMX(PMULHRSW, pmulhrsw)
+
 /* Instructions with no MMX equivalent. */
 #define BINARY_INT_SSE(uname, lname) \
 static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \
@@ -636,8 +687,35 @@ static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decod
                       gen_helper_##lname##_ymm); \
 }
 
+/* Instructions with no MMX equivalent. */
 BINARY_INT_SSE(PUNPCKLQDQ, punpcklqdq)
 BINARY_INT_SSE(PUNPCKHQDQ, punpckhqdq)
+BINARY_INT_SSE(VPACKUSDW, packusdw)
+BINARY_INT_SSE(VPERMILPS, vpermilps)
+BINARY_INT_SSE(VPERMILPD, vpermilpd)
+BINARY_INT_SSE(VMASKMOVPS, vpmaskmovd)
+BINARY_INT_SSE(VMASKMOVPD, vpmaskmovq)
+
+BINARY_INT_SSE(PMULDQ, pmuldq)
+
+BINARY_INT_SSE(VAESDEC, aesdec)
+BINARY_INT_SSE(VAESDECLAST, aesdeclast)
+BINARY_INT_SSE(VAESENC, aesenc)
+BINARY_INT_SSE(VAESENCLAST, aesenclast)
+
+#define UNARY_CMP_SSE(uname, lname) \
+static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \
+{ \
+    if (!s->vex_l) { \
+        gen_helper_##lname##_xmm(cpu_env, OP_PTR1, OP_PTR2); \
+    } else { \
+        gen_helper_##lname##_ymm(cpu_env, OP_PTR1, OP_PTR2); \
+    } \
+    set_cc_op(s, CC_OP_EFLAGS); \
+}
+UNARY_CMP_SSE(VPTEST, ptest)
+UNARY_CMP_SSE(VTESTPS, vtestps)
+UNARY_CMP_SSE(VTESTPD, vtestpd)
 
 static inline void gen_unary_int_sse(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode,
                                      SSEFunc_0_epp xmm, SSEFunc_0_epp ymm)
@@ -657,6 +735,20 @@ static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decod
                       gen_helper_##lname##_ymm); \
 }
 
+UNARY_INT_SSE(VPMOVSXBW, pmovsxbw)
+UNARY_INT_SSE(VPMOVSXBD, pmovsxbd)
+UNARY_INT_SSE(VPMOVSXBQ, pmovsxbq)
+UNARY_INT_SSE(VPMOVSXWD, pmovsxwd)
+UNARY_INT_SSE(VPMOVSXWQ, pmovsxwq)
+UNARY_INT_SSE(VPMOVSXDQ, pmovsxdq)
+
+UNARY_INT_SSE(VPMOVZXBW, pmovzxbw)
+UNARY_INT_SSE(VPMOVZXBD, pmovzxbd)
+UNARY_INT_SSE(VPMOVZXBQ, pmovzxbq)
+UNARY_INT_SSE(VPMOVZXWD, pmovzxwd)
+UNARY_INT_SSE(VPMOVZXWQ, pmovzxwq)
+UNARY_INT_SSE(VPMOVZXDQ, pmovzxdq)
+
 UNARY_INT_SSE(VCVTDQ2PD, cvtdq2pd)
 UNARY_INT_SSE(VCVTPD2DQ, cvtpd2dq)
 UNARY_INT_SSE(VCVTTPD2DQ, cvttpd2dq)
@@ -714,6 +806,64 @@ static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decod
 UNARY_IMM_FP_SSE(VROUNDPS, roundps)
 UNARY_IMM_FP_SSE(VROUNDPD, roundpd)
 
+static inline void gen_rexw_avx(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode,
+                                SSEFunc_0_eppp d_xmm, SSEFunc_0_eppp q_xmm,
+                                SSEFunc_0_eppp d_ymm, SSEFunc_0_eppp q_ymm)
+{
+    SSEFunc_0_eppp d = s->vex_l ? d_ymm : d_xmm;
+    SSEFunc_0_eppp q = s->vex_l ? q_ymm : q_xmm;
+    SSEFunc_0_eppp fn = REX_W(s) ? q : d;
+    fn(cpu_env, OP_PTR0, OP_PTR1, OP_PTR2);
+}
+
+/* REX.W affects whether to operate on 32- or 64-bit elements. */
+#define REXW_AVX(uname, lname) \
+static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \
+{ \
+    gen_rexw_avx(s, env, decode, \
+                 gen_helper_##lname##d_xmm, gen_helper_##lname##q_xmm, \
+                 gen_helper_##lname##d_ymm, gen_helper_##lname##q_ymm); \
+}
+REXW_AVX(VPSLLV, vpsllv)
+REXW_AVX(VPSRLV, vpsrlv)
+REXW_AVX(VPSRAV, vpsrav)
+REXW_AVX(VPMASKMOV, vpmaskmov)
+
+/* Same as above, but with extra arguments to the helper. */
+static inline void gen_vsib_avx(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode,
+                                SSEFunc_0_epppti d_xmm, SSEFunc_0_epppti q_xmm,
+                                SSEFunc_0_epppti d_ymm, SSEFunc_0_epppti q_ymm)
+{
+    SSEFunc_0_epppti d = s->vex_l ? d_ymm : d_xmm;
+    SSEFunc_0_epppti q = s->vex_l ? q_ymm : q_xmm;
+    SSEFunc_0_epppti fn = REX_W(s) ? q : d;
+    TCGv_i32 scale = tcg_constant_i32(decode->mem.scale);
+    TCGv_ptr index = tcg_temp_new_ptr();
+
+    /* Pass third input as (index, base, scale) */
+    tcg_gen_addi_ptr(index, cpu_env, ZMM_OFFSET(decode->mem.index));
+    fn(cpu_env, OP_PTR0, OP_PTR1, index, s->A0, scale);
+
+    /*
+     * There are two output operands, so zero OP1's high 128 bits
+     * in the VEX.128 case.
+     */
+    if (!s->vex_l) {
+        int ymmh_ofs = vector_elem_offset(&decode->op[1], MO_128, 1);
+        tcg_gen_gvec_dup_imm(MO_64, ymmh_ofs, 16, 16, 0);
+    }
+    tcg_temp_free_ptr(index);
+}
+#define VSIB_AVX(uname, lname) \
+static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \
+{ \
+    gen_vsib_avx(s, env, decode, \
+                 gen_helper_##lname##d_xmm, gen_helper_##lname##q_xmm, \
+                 gen_helper_##lname##d_ymm, gen_helper_##lname##q_ymm); \
+}
+VSIB_AVX(VPGATHERD, vpgatherd)
+VSIB_AVX(VPGATHERQ, vpgatherq)
+
 static void gen_ADCOX(DisasContext *s, CPUX86State *env, MemOp ot, int cc_op)
 {
     TCGv carry_in = NULL;
@@ -1484,6 +1639,12 @@ static void gen_VAESKEYGEN(DisasContext *s, CPUX86State *env, X86DecodedInsn *de
     gen_helper_aeskeygenassist_xmm(cpu_env, OP_PTR0, OP_PTR1, imm);
 }
 
+static void gen_VAESIMC(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+    assert(!s->vex_l);
+    gen_helper_aesimc_xmm(cpu_env, OP_PTR0, OP_PTR2);
+}
+
 static void gen_VCVTfp2fp(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
 {
     gen_unary_fp_sse(s, env, decode,
@@ -1567,6 +1728,41 @@ static void gen_VINSERTx128(DisasContext *s, CPUX86State *env, X86DecodedInsn *d
                      decode->op[1].offset + offsetof(YMMReg, YMM_X(!mask)), 16, 16);
 }
 
+static inline void gen_maskmov(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode,
+                               SSEFunc_0_eppt xmm, SSEFunc_0_eppt ymm)
+{
+    if (!s->vex_l) {
+        xmm(cpu_env, OP_PTR2, OP_PTR1, s->A0);
+    } else {
+        ymm(cpu_env, OP_PTR2, OP_PTR1, s->A0);
+    }
+}
+
+static void gen_VMASKMOVPD_st(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+    gen_maskmov(s, env, decode, gen_helper_vpmaskmovq_st_xmm, gen_helper_vpmaskmovq_st_ymm);
+}
+
+static void gen_VMASKMOVPS_st(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+    gen_maskmov(s, env, decode, gen_helper_vpmaskmovd_st_xmm, gen_helper_vpmaskmovd_st_ymm);
+}
+
+static void gen_VPMASKMOV_st(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+    if (REX_W(s)) {
+        gen_VMASKMOVPD_st(s, env, decode);
+    } else {
+        gen_VMASKMOVPS_st(s, env, decode);
+    }
+}
+
+static void gen_VPERMD(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+    assert(s->vex_l);
+    gen_helper_vpermd_ymm(OP_PTR0, OP_PTR1, OP_PTR2);
+}
+
 static void gen_VPERM2x128(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
 {
     TCGv_i32 imm = tcg_constant8u_i32(decode->immediate);
@@ -1574,6 +1770,12 @@ static void gen_VPERM2x128(DisasContext *s, CPUX86State *env, X86DecodedInsn *de
     gen_helper_vpermdq_ymm(OP_PTR0, OP_PTR1, OP_PTR2, imm);
 }
 
+static void gen_VPHMINPOSUW(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+    assert(!s->vex_l);
+    gen_helper_phminposuw_xmm(cpu_env, OP_PTR0, OP_PTR2);
+}
+
 static void gen_VROUNDSD(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
 {
     TCGv_i32 imm = tcg_constant8u_i32(decode->immediate);
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index bffe778a69..9cb567f02f 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -4774,7 +4774,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
         use_new &= b <= limit;
 #endif
         if (use_new &&
-            (b == 0x13a ||
+            (b == 0x138 || b == 0x13a ||
              (b >= 0x150 && b <= 0x17f) ||
              (b >= 0x1d0 && b <= 0x1ff))) {
             disas_insn_new(s, cpu, b + 0x100);
-- 
2.37.3