From nobody Sat May 18 10:30:08 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1666440381; cv=none; d=zohomail.com; s=zohoarc; b=m7Hb6jUcjAMnXJPci+l/CegBL3aD4njdB5xT27umPJSN0xfKGCTco8O9slExqysfalbNiJGJOy3Qpp/enlRy+74i9p8Pe9CO83eZ3hgQmhVjnjRy9vACHL09NFHfM6Ybye354f2JAPcQZBsUcH8mykWaLO1Vx54wjDaQS/38mT4= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1666440381; h=Content-Type:Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=7120nVrCZQMXVeIhLGCLXbQpCT5M2DSZeitOQvwDDh0=; b=fKV99nXDVmWBXAoTuttQUE9l/iOXzIPW2cHZ1kFbNGbTghLbBaFbEr/O3bsnzom25Df8GVMoKrng0im3QxMJ2d2zvhyeSeM0id+fk6rk885kNNrcK6MTcC2JqYoWaPf6Y9L+zS43MlT6Y9THPIYt3CSULCTGUhmqzgI2qB066OU= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1666440381455826.0327047143894; Sat, 22 Oct 2022 05:06:21 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1omD7N-00036G-0g; Sat, 22 Oct 2022 07:56:45 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1omD6x-0002z6-EB for qemu-devel@nongnu.org; Sat, 22 Oct 2022 07:56:22 -0400 Received: from us-smtp-delivery-124.mimecast.com 
([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1omD6v-0006hQ-BO for qemu-devel@nongnu.org; Sat, 22 Oct 2022 07:56:18 -0400 Received: from mail-ed1-f70.google.com (mail-ed1-f70.google.com [209.85.208.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-343-arWC5HrrOEultrqdfNFPug-1; Sat, 22 Oct 2022 07:56:15 -0400 Received: by mail-ed1-f70.google.com with SMTP id z20-20020a05640240d400b0045cec07c3dcso5140727edb.3 for ; Sat, 22 Oct 2022 04:56:15 -0700 (PDT) Received: from avogadro.local ([2001:b07:6468:f312:2f4b:62da:3159:e077]) by smtp.gmail.com with ESMTPSA id n26-20020a056402061a00b0045c010d0584sm14879055edv.47.2022.10.22.04.56.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 22 Oct 2022 04:56:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1666439776; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=7120nVrCZQMXVeIhLGCLXbQpCT5M2DSZeitOQvwDDh0=; b=M+yFraUJSFBovW+Re5EPRKzssMdSeXRtaD3w/VV7UzNIAAnYm9XMY0u5jbg7BES+3Cd8xC iB0YRGc3cVy4ein57nrSYm/QJNd1QNli6Ed+8lE8CuEErlgZ2b+qapgK8iLnN37mfTisjX DZpnb3ePJrmOYI9ZdPtg4HjP04cwXrQ= X-MC-Unique: arWC5HrrOEultrqdfNFPug-1 X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=7120nVrCZQMXVeIhLGCLXbQpCT5M2DSZeitOQvwDDh0=; b=MQUMFejNBZ2EdBF0Rzw7GbUwMoG7ZGxfItSz/j7S5SZsRzFxUhQnCF9at9u+e+q/5O B+0ImRA8Etq0siSHJAtZrIc/4cdawm3mITe9W9wfkW+c476PwImMT2CzMz9f6gpbTNaC ZUcb4XTpYML0SsYCZeMbJ64AII7aiKR7jt03nCq86hk6qygKLCVEF1X4BJcmnDelEJ7X 
JuSFncjbdQvQDZUiKyNTz0jCQfsextPpsPNbBQJszHwoB/c4ya1mmk+1mmrUpOf8TTjE tGv3Z2bWgFvNaDCtfx2EKzbaoZpCRAydY2/c9aED/XWNGGCoEw9I8l5zgQZe7/+aFp/E 4TYQ== X-Gm-Message-State: ACrzQf0KPf2Qjyr4i9B/fcr3Pc8S6YkqWuuatHEDQe07VbHkWc7+AGEy wb+tc0UMQTvSLj/BkWWcTe6nlr7jbHQVRRUP77mdJGgQmsXrW92wVjl6pNvd560K2L0mrimwEpZ ypSN/lcl2RweW/KHvrPtsSX0nvM8FwD/ORYitVx38IjiT219fshlDwdO8Qr7LMG2UElY= X-Received: by 2002:a17:906:9fc1:b0:761:9192:504f with SMTP id hj1-20020a1709069fc100b007619192504fmr19951443ejc.116.1666439773710; Sat, 22 Oct 2022 04:56:13 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4gqGaUt1p6qSK2mSLoesU//fYbiqLA88ZafhSPNFpjSXiho/mNbV8aCVKSUJyAnAXHWqgvqw== X-Received: by 2002:a17:906:9fc1:b0:761:9192:504f with SMTP id hj1-20020a1709069fc100b007619192504fmr19951427ejc.116.1666439773417; Sat, 22 Oct 2022 04:56:13 -0700 (PDT) From: Paolo Bonzini To: qemu-devel@nongnu.org Cc: =?UTF-8?q?Philippe=20Mathieu-Daud=C3=A9?= , Richard Henderson Subject: [PULL 1/4] target/i386: decode-new: avoid out-of-bounds access to xmm_regs[-1] Date: Sat, 22 Oct 2022 13:56:05 +0200 Message-Id: <20221022115608.152853-2-pbonzini@redhat.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221022115608.152853-1-pbonzini@redhat.com> References: <20221022115608.152853-1-pbonzini@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Type: text/plain Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.129.124; envelope-from=pbonzini@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -23 X-Spam_score: -2.4 X-Spam_bar: -- X-Spam_report: (-2.4 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.251, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, PP_MIME_FAKE_ASCII_TEXT=0.001, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, 
SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "Qemu-devel" Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1666440382484100002

If the destination is a memory operand, op->n is -1. Going through the
tcg_gen_gvec_dup_imm path is both useless (the value has already been
stored by the gen_* function) and wrong because of the out-of-bounds access.

Reviewed-by: Philippe Mathieu-Daudé
Reviewed-by: Richard Henderson
Signed-off-by: Paolo Bonzini
---
 target/i386/tcg/emit.c.inc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc
index 27eca591a9..ebf299451d 100644
--- a/target/i386/tcg/emit.c.inc
+++ b/target/i386/tcg/emit.c.inc
@@ -296,7 +296,7 @@ static void gen_writeback(DisasContext *s, X86DecodedInsn *decode, int opn, TCGv
     case X86_OP_MMX:
         break;
     case X86_OP_SSE:
-        if ((s->prefix & PREFIX_VEX) && op->ot == MO_128) {
+        if (!op->has_ea && (s->prefix & PREFIX_VEX) && op->ot == MO_128) {
             tcg_gen_gvec_dup_imm(MO_64,
                                  offsetof(CPUX86State, xmm_regs[op->n].ZMM_X(1)),
                                  16, 16, 0);
-- 
2.37.3

From nobody Sat May 18 10:30:08 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1666441399; cv=none; d=zohomail.com; s=zohoarc; b=kwYdPv1LWoU0jjwL7KFJQtyWpIJyJVq48N4zKgkOf8ZhVCmHVzxmlQ3Sx/o/q0i5vWlHd+++D7V4vqJ0amgYARaf1aajciKM+GsA+5/u/4JwZNNnABE6Q2tcFnPoKyUaoR9Em5M/xYfeWdGC+ll1fBCPpDH+DYU0IOTiN4owalQ= ARC-Message-Signature: i=1; a=rsa-sha256; 
c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1666441399; h=Content-Type:Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=6Fps74obIni96mKMShOCBvBkks/5+B5uGMDkYtJzx6k=; b=V4KuwgBQC/fjHV5+YzcYTmREpHPDV5xr7Elf1k66xXKYINWpRvBXmEUnM6C3SFh2zmwuhiHKAjSTgrkpWBr7vcHpcKYtdW1Mmlok+PFKIkzbFVRxGS4YfNVcxc/EIWIgreA/VJLbBusNDTPU7n4gvYKfylrAbrkS6R7zpTFZNgo= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1666441399494414.13423577477465; Sat, 22 Oct 2022 05:23:19 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1omD7T-0003C2-3d; Sat, 22 Oct 2022 07:56:51 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1omD73-00032B-SG for qemu-devel@nongnu.org; Sat, 22 Oct 2022 07:56:29 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1omD71-0006i0-UW for qemu-devel@nongnu.org; Sat, 22 Oct 2022 07:56:25 -0400 Received: from mail-ed1-f70.google.com (mail-ed1-f70.google.com [209.85.208.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-275-7k8qM7TVNOCzTvm26jwwxg-1; Sat, 22 Oct 2022 07:56:18 -0400 Received: by mail-ed1-f70.google.com with SMTP id l18-20020a056402255200b0045d2674d1a0so5160424edb.0 for ; Sat, 22 Oct 2022 04:56:17 -0700 (PDT) Received: from avogadro.local 
([2001:b07:6468:f312:2f4b:62da:3159:e077]) by smtp.gmail.com with ESMTPSA id p7-20020a170906784700b0078d9e26db54sm12902093ejm.88.2022.10.22.04.56.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 22 Oct 2022 04:56:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1666439783; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=6Fps74obIni96mKMShOCBvBkks/5+B5uGMDkYtJzx6k=; b=fdhc7WeyLKwjsafPoqapby6RdDv9vYCHVDf8IfoBppzCyfL1b9xtffYH916BHdgbNrhcJs pa+dsjSi2O14BDLwAR6wULFS6g/gAJZ51rGMH+m9FAtc7DgVZVFgCY9hqgUhDiv0Qw/2PC ViCVueexhyp0pazN/GwTvN8OrNoL7Nk= X-MC-Unique: 7k8qM7TVNOCzTvm26jwwxg-1 X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=6Fps74obIni96mKMShOCBvBkks/5+B5uGMDkYtJzx6k=; b=DeeP2pjWVwdXa+OiwvFdpo7+K+AUFpJFVqX3UcCpZMNeXKS3CcD2kBxtOejKSwHpMj /0F6GxwFmZqgOr3KX+5DCKBxMo3K3JufEa4nLWAhSg2UVVE8Mgnqi7wE2A23rTWqlAIn oZz2wsvWBYfrTZlg4o054K3k1HtXq55DXnpKvxTAp/S1BVntX5OEevnqgRWcPIh1WozK S1TgeVvozegrX4cnBFrWD6A1r0bZSlMyu2qwO2GDUNTmRdOfKgeiyNdlGfqpIRZOXBXe Co+dFfCVtvHgx7LqCEorlj8cr8uTza9TbiAXzhga7uIvbKS2ecg6UOZ8c+TMSMb9Pn7J P+xA== X-Gm-Message-State: ACrzQf2gpmrMj4kq9z9qZVkDkVgBL6Kz+YH61evVH0khKQrkqwpjPeyN qlVhJJeNyJ8OCbhB7P9Jxf4BNsQTjuieiqmscjBY+OG7dWhNqFbiNiK+8GVxPNHJ03HMvfN3h/P wEVWy/0bkf/pMJiXtGjAANgsq5u6fpOyg6LigTh2gpVfZjFEu4F5jApFJH8jkUfHvHl8= X-Received: by 2002:a17:907:2cd9:b0:78d:9f4c:9cff with SMTP id hg25-20020a1709072cd900b0078d9f4c9cffmr19572873ejc.345.1666439776340; Sat, 22 Oct 2022 04:56:16 -0700 (PDT) X-Google-Smtp-Source: 
AMsMyM7YC2f+n/qTzeNofbUTaMwQOoCmwIh6qgXxAx0gQ5qpGkPaP04VsH/ks5JtlcvvTVz9zjWPLg== X-Received: by 2002:a17:907:2cd9:b0:78d:9f4c:9cff with SMTP id hg25-20020a1709072cd900b0078d9f4c9cffmr19572862ejc.345.1666439776101; Sat, 22 Oct 2022 04:56:16 -0700 (PDT) From: Paolo Bonzini To: qemu-devel@nongnu.org Cc: =?UTF-8?q?Philippe=20Mathieu-Daud=C3=A9?= , Richard Henderson Subject: [PULL 2/4] target/i386: introduce function to set rounding mode from FPCW or MXCSR bits Date: Sat, 22 Oct 2022 13:56:06 +0200 Message-Id: <20221022115608.152853-3-pbonzini@redhat.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221022115608.152853-1-pbonzini@redhat.com> References: <20221022115608.152853-1-pbonzini@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Type: text/plain Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.129.124; envelope-from=pbonzini@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -23 X-Spam_score: -2.4 X-Spam_bar: -- X-Spam_report: (-2.4 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.251, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, PP_MIME_FAKE_ASCII_TEXT=0.001, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "Qemu-devel" Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1666441400596100001

VROUND, FLDCW and LDMXCSR all have to perform the same conversion from x86 rounding modes to softfloat constants.
Since the ISA is consistent on the meaning of the two-bit rounding modes, extract the common code into a wrapper for set_float_rounding_mode.

Reviewed-by: Philippe Mathieu-Daudé
Reviewed-by: Richard Henderson
Signed-off-by: Paolo Bonzini
---
 target/i386/ops_sse.h        | 60 +++---------------------------------
 target/i386/tcg/fpu_helper.c | 60 +++++++++++++-----------------------
 2 files changed, 25 insertions(+), 95 deletions(-)

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index d35fc15c65..0799712f6e 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -1684,20 +1684,7 @@ void glue(helper_roundps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
 
     prev_rounding_mode = env->sse_status.float_rounding_mode;
     if (!(mode & (1 << 2))) {
-        switch (mode & 3) {
-        case 0:
-            set_float_rounding_mode(float_round_nearest_even, &env->sse_status);
-            break;
-        case 1:
-            set_float_rounding_mode(float_round_down, &env->sse_status);
-            break;
-        case 2:
-            set_float_rounding_mode(float_round_up, &env->sse_status);
-            break;
-        case 3:
-            set_float_rounding_mode(float_round_to_zero, &env->sse_status);
-            break;
-        }
+        set_x86_rounding_mode(mode & 3, &env->sse_status);
     }
 
     for (i = 0; i < 2 << SHIFT; i++) {
@@ -1721,20 +1708,7 @@ void glue(helper_roundpd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
 
     prev_rounding_mode = env->sse_status.float_rounding_mode;
     if (!(mode & (1 << 2))) {
-        switch (mode & 3) {
-        case 0:
-            set_float_rounding_mode(float_round_nearest_even, &env->sse_status);
-            break;
-        case 1:
-            set_float_rounding_mode(float_round_down, &env->sse_status);
-            break;
-        case 2:
-            set_float_rounding_mode(float_round_up, &env->sse_status);
-            break;
-        case 3:
-            set_float_rounding_mode(float_round_to_zero, &env->sse_status);
-            break;
-        }
+        set_x86_rounding_mode(mode & 3, &env->sse_status);
     }
 
     for (i = 0; i < 1 << SHIFT; i++) {
@@ -1759,20 +1733,7 @@ void glue(helper_roundss, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s,
 
     prev_rounding_mode = env->sse_status.float_rounding_mode;
     if (!(mode & (1 << 2))) {
-        switch (mode & 3) {
-        case 0:
-            set_float_rounding_mode(float_round_nearest_even, &env->sse_status);
-            break;
-        case 1:
-            set_float_rounding_mode(float_round_down, &env->sse_status);
-            break;
-        case 2:
-            set_float_rounding_mode(float_round_up, &env->sse_status);
-            break;
-        case 3:
-            set_float_rounding_mode(float_round_to_zero, &env->sse_status);
-            break;
-        }
+        set_x86_rounding_mode(mode & 3, &env->sse_status);
     }
 
     d->ZMM_S(0) = float32_round_to_int(s->ZMM_S(0), &env->sse_status);
@@ -1797,20 +1758,7 @@ void glue(helper_roundsd, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s,
 
     prev_rounding_mode = env->sse_status.float_rounding_mode;
     if (!(mode & (1 << 2))) {
-        switch (mode & 3) {
-        case 0:
-            set_float_rounding_mode(float_round_nearest_even, &env->sse_status);
-            break;
-        case 1:
-            set_float_rounding_mode(float_round_down, &env->sse_status);
-            break;
-        case 2:
-            set_float_rounding_mode(float_round_up, &env->sse_status);
-            break;
-        case 3:
-            set_float_rounding_mode(float_round_to_zero, &env->sse_status);
-            break;
-        }
+        set_x86_rounding_mode(mode & 3, &env->sse_status);
     }
 
     d->ZMM_D(0) = float64_round_to_int(s->ZMM_D(0), &env->sse_status);
diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c
index a6a90a1817..6f3741b635 100644
--- a/target/i386/tcg/fpu_helper.c
+++ b/target/i386/tcg/fpu_helper.c
@@ -32,7 +32,8 @@
 #define ST(n) (env->fpregs[(env->fpstt + (n)) & 7].d)
 #define ST1 ST(1)
 
-#define FPU_RC_MASK 0xc00
+#define FPU_RC_SHIFT 10
+#define FPU_RC_MASK (3 << FPU_RC_SHIFT)
 #define FPU_RC_NEAR 0x000
 #define FPU_RC_DOWN 0x400
 #define FPU_RC_UP 0x800
@@ -685,28 +686,26 @@ uint32_t helper_fnstcw(CPUX86State *env)
     return env->fpuc;
 }
 
+static void set_x86_rounding_mode(unsigned mode, float_status *status)
+{
+    static FloatRoundMode x86_round_mode[4] = {
+        float_round_nearest_even,
+        float_round_down,
+        float_round_up,
+        float_round_to_zero
+    };
+    assert(mode < ARRAY_SIZE(x86_round_mode));
+    set_float_rounding_mode(x86_round_mode[mode], status);
+}
+
 void update_fp_status(CPUX86State *env)
 {
-    FloatRoundMode rnd_mode;
+    int rnd_mode;
     FloatX80RoundPrec rnd_prec;
 
     /* set rounding mode */
-    switch (env->fpuc & FPU_RC_MASK) {
-    default:
-    case FPU_RC_NEAR:
-        rnd_mode = float_round_nearest_even;
-        break;
-    case FPU_RC_DOWN:
-        rnd_mode = float_round_down;
-        break;
-    case FPU_RC_UP:
-        rnd_mode = float_round_up;
-        break;
-    case FPU_RC_CHOP:
-        rnd_mode = float_round_to_zero;
-        break;
-    }
-    set_float_rounding_mode(rnd_mode, &env->fp_status);
+    rnd_mode = (env->fpuc & FPU_RC_MASK) >> FPU_RC_SHIFT;
+    set_x86_rounding_mode(rnd_mode, &env->fp_status);
 
     switch ((env->fpuc >> 8) & 3) {
     case 0:
@@ -3038,11 +3037,8 @@ void helper_xsetbv(CPUX86State *env, uint32_t ecx, uint64_t mask)
 /* XXX: optimize by storing fptt and fptags in the static cpu state */
 
 #define SSE_DAZ 0x0040
-#define SSE_RC_MASK 0x6000
-#define SSE_RC_NEAR 0x0000
-#define SSE_RC_DOWN 0x2000
-#define SSE_RC_UP 0x4000
-#define SSE_RC_CHOP 0x6000
+#define SSE_RC_SHIFT 13
+#define SSE_RC_MASK (3 << SSE_RC_SHIFT)
 #define SSE_FZ 0x8000
 
 void update_mxcsr_status(CPUX86State *env)
@@ -3051,22 +3047,8 @@ void update_mxcsr_status(CPUX86State *env)
     int rnd_type;
 
     /* set rounding mode */
-    switch (mxcsr & SSE_RC_MASK) {
-    default:
-    case SSE_RC_NEAR:
-        rnd_type = float_round_nearest_even;
-        break;
-    case SSE_RC_DOWN:
-        rnd_type = float_round_down;
-        break;
-    case SSE_RC_UP:
-        rnd_type = float_round_up;
-        break;
-    case SSE_RC_CHOP:
-        rnd_type = float_round_to_zero;
-        break;
-    }
-    set_float_rounding_mode(rnd_type, &env->sse_status);
+    rnd_type = (mxcsr & SSE_RC_MASK) >> SSE_RC_SHIFT;
+    set_x86_rounding_mode(rnd_type, &env->sse_status);
 
     /* Set exception flags. */
     set_float_exception_flags((mxcsr & FPUS_IE ? 
float_flag_invalid : 0) | --=20 2.37.3 From nobody Sat May 18 10:30:08 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1666440626; cv=none; d=zohomail.com; s=zohoarc; b=RbeG4Wras8+ZQM6TqDt6v1ouCaexPHdOqwMWcVS0fVbTyQpPB1Rz1eCF2xA8f6iYBLeh6lKmBuy371y/eosnCYb94U76qz+a+1uInThqlkX9JAet9Q6n7izqu/HJsboqvhoRtX7UWuCZQuNU2m0KSgNwCqNxWA3WxZzsFlZATfQ= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1666440626; h=Content-Type:Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=P0vAuPSb2O4zVXn10PSslk9Dlo76Z34l7Wd9h8BjsKY=; b=NOaayZ7ZEvxBHXmt5vivxjvEHCmlwP+i7FUZS/bFHknaMl88a44QP+VNwG1AM26nrMsCFRBOsnLqmHy2GWaXFqlIa09DjL9AVf1MnnqqOK6qe8vM/HDutfcBxHkwfXKo1+weB/1YG4S8HQKia9cRo39OZ2Db5voEDl0yThWIeto= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1666440626082232.5684834248707; Sat, 22 Oct 2022 05:10:26 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1omD7R-0003Bh-8Z; Sat, 22 Oct 2022 07:56:49 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1omD73-00032A-KX for qemu-devel@nongnu.org; Sat, 22 Oct 2022 07:56:28 -0400 Received: from 
us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1omD71-0006hr-EH for qemu-devel@nongnu.org; Sat, 22 Oct 2022 07:56:25 -0400 Received: from mail-ed1-f70.google.com (mail-ed1-f70.google.com [209.85.208.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-564-q91VdstgNsq5VA3-pe47NQ-1; Sat, 22 Oct 2022 07:56:20 -0400 Received: by mail-ed1-f70.google.com with SMTP id dz9-20020a0564021d4900b0045d9a3aded4so5195031edb.22 for ; Sat, 22 Oct 2022 04:56:20 -0700 (PDT) Received: from avogadro.local ([2001:b07:6468:f312:2f4b:62da:3159:e077]) by smtp.gmail.com with ESMTPSA id p7-20020a170906784700b0078d9e26db54sm12902129ejm.88.2022.10.22.04.56.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 22 Oct 2022 04:56:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1666439782; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=P0vAuPSb2O4zVXn10PSslk9Dlo76Z34l7Wd9h8BjsKY=; b=BMlpp/fY5dQll7fMnMdTvFZVnwKrl9fMbJUatxGNXpkq4ytH4LmJimGnqSKDFR8BEyMGv7 hDUZGySFqATaAYGhJpJNyHJge6Gkc7WDK5+f7UtDWF7OW1GyVqhifrS6hm0Df+s5oyClhJ 2Za0c9UTjvxSHbF1QZ18L5KwyWiXN10= X-MC-Unique: q91VdstgNsq5VA3-pe47NQ-1 X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=P0vAuPSb2O4zVXn10PSslk9Dlo76Z34l7Wd9h8BjsKY=; b=udT3WDiRVOdP2mCPdbQCMnaq/6+0PhnahUUjlO42Y91AZUUBOe1cuVJhBVO5CMdYxU rhZFqKgDgZfdAhLGZxJkxyrcEdQ0Z4uTvO0QaGWjaaQOp5PweKJutriTCbAnhR7nUl1d 
cQm0jX2nqznwxiDLmMtrYyzYdQ+473lOuko31SyJl06FJy0+G7Zqky0zDT/65emAg2oQ pbcpWptAPI7WYxKSJrQ2GQF21L55iiXV/RmBbGRd8dkkT/C9DPGEaYrNP7KMDhDy/npa f8aIAA9Nl0rqhJDtlpxil1gdfo9sQ3DcT23wOeURq1JmX7kBOEQcJeHdtRJpHSJLD2Xj ks5Q== X-Gm-Message-State: ACrzQf0ywi/HrgOFJOJV2VqosVS36bF1YUmUsj3sPulYSMTeyaKJxCEe aLL/ymY2CKIu1uKhvjCAZ2GnAz1f+wzDWDBkpSxjg82yUbKu1hIBGz89jB8Wch1BtUqnLGugazZ glDnA9JuwWOU/Z/bca91fKiHSoQbRjgtv5z/ut1He8ZlWv2JaeWUNItfVID02TEwWp34= X-Received: by 2002:a05:6402:40d0:b0:45c:d00a:82bd with SMTP id z16-20020a05640240d000b0045cd00a82bdmr21698439edb.288.1666439779105; Sat, 22 Oct 2022 04:56:19 -0700 (PDT) X-Google-Smtp-Source: AMsMyM7CQdWOh4Y3Hkph+OaMfftoiVQwaciru2wRzslaIKM9jN2qJQVIB/wjowEvBQvXd2HwKWqOBA== X-Received: by 2002:a05:6402:40d0:b0:45c:d00a:82bd with SMTP id z16-20020a05640240d000b0045cd00a82bdmr21698414edb.288.1666439778622; Sat, 22 Oct 2022 04:56:18 -0700 (PDT) From: Paolo Bonzini To: qemu-devel@nongnu.org Cc: Richard Henderson Subject: [PULL 3/4] target/i386: implement F16C instructions Date: Sat, 22 Oct 2022 13:56:07 +0200 Message-Id: <20221022115608.152853-4-pbonzini@redhat.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221022115608.152853-1-pbonzini@redhat.com> References: <20221022115608.152853-1-pbonzini@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.133.124; envelope-from=pbonzini@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -23 X-Spam_score: -2.4 X-Spam_bar: -- X-Spam_report: (-2.4 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.251, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action 
X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "Qemu-devel" Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1666440627377100003 Content-Type: text/plain; charset="utf-8"

F16C consists of only two instructions, which are nevertheless a bit peculiar.

First, they access only the low half of a YMM or XMM register for the packed-half operand; the exact size still depends on the VEX.L flag. This is similar to the existing avx_movx flag, but not identical, because avx_movx is hardcoded to affect operand 2. To this end I added a "ph" format name; it's possible to reuse this approach for the VPMOVSX and VPMOVZX instructions, though that would also require adding two more formats for the low-quarter and low-eighth of an operand.

Second, VCVTPS2PH is somewhat weird because it *stores* the result of the instruction into memory rather than loading it.

Reviewed-by: Richard Henderson Signed-off-by: Paolo Bonzini --- target/i386/cpu.c | 5 ++--- target/i386/cpu.h | 3 +++ target/i386/ops_sse.h | 29 +++++++++++++++++++++++++++++ target/i386/ops_sse_header.h | 6 ++++++ target/i386/tcg/decode-new.c.inc | 8 ++++++++ target/i386/tcg/decode-new.h | 2 ++ target/i386/tcg/emit.c.inc | 17 ++++++++++++++++- tests/tcg/i386/test-avx.c | 17 +++++++++++++++++ tests/tcg/i386/test-avx.py | 8 ++++++-- 9 files changed, 89 insertions(+), 6 deletions(-) diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 0ebd610faa..6292b7e12f 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -625,13 +625,12 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t ven= dor1, CPUID_EXT_SSE41 | CPUID_EXT_SSE42 | CPUID_EXT_POPCNT | \ CPUID_EXT_XSAVE | /* CPUID_EXT_OSXSAVE is dynamic */ \ CPUID_EXT_MOVBE | CPUID_EXT_AES | CPUID_EXT_HYPERVISOR | \ - CPUID_EXT_RDRAND | CPUID_EXT_AVX) + CPUID_EXT_RDRAND | CPUID_EXT_AVX | CPUID_EXT_F16C) /* missing: CPUID_EXT_DTES64, CPUID_EXT_DSCPL, CPUID_EXT_VMX, CPUID_EXT_SMX, CPUID_EXT_EST, CPUID_EXT_TM2, CPUID_EXT_CID, CPUID_EXT_FMA, CPUID_EXT_XTPR, CPUID_EXT_PDCM, CPUID_EXT_PCID, CPUID_EXT_DCA, - CPUID_EXT_X2APIC, CPUID_EXT_TSC_DEADLINE_TIMER, - CPUID_EXT_F16C */ + CPUID_EXT_X2APIC, CPUID_EXT_TSC_DEADLINE_TIMER */ =20 #ifdef TARGET_X86_64 #define TCG_EXT2_X86_64_FEATURES (CPUID_EXT2_SYSCALL | CPUID_EXT2_LM) diff --git a/target/i386/cpu.h b/target/i386/cpu.h index dad2b2db8d..d4bc19577a 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -1258,6 +1258,7 @@ typedef union ZMMReg { uint16_t _w_ZMMReg[512 / 16]; uint32_t _l_ZMMReg[512 / 32]; uint64_t _q_ZMMReg[512 / 64]; + float16 _h_ZMMReg[512 / 16]; float32 _s_ZMMReg[512 / 32]; float64 _d_ZMMReg[512 / 64]; XMMReg _x_ZMMReg[512 / 128]; @@ -1282,6 +1283,7 @@ typedef struct BNDCSReg { #define ZMM_B(n) _b_ZMMReg[63 - (n)] #define ZMM_W(n) _w_ZMMReg[31 - (n)] #define ZMM_L(n) _l_ZMMReg[15 - (n)] +#define ZMM_H(n) _h_ZMMReg[31 - (n)] #define ZMM_S(n) _s_ZMMReg[15 - 
(n)] #define ZMM_Q(n) _q_ZMMReg[7 - (n)] #define ZMM_D(n) _d_ZMMReg[7 - (n)] @@ -1301,6 +1303,7 @@ typedef struct BNDCSReg { #define ZMM_B(n) _b_ZMMReg[n] #define ZMM_W(n) _w_ZMMReg[n] #define ZMM_L(n) _l_ZMMReg[n] +#define ZMM_H(n) _h_ZMMReg[n] #define ZMM_S(n) _s_ZMMReg[n] #define ZMM_Q(n) _q_ZMMReg[n] #define ZMM_D(n) _d_ZMMReg[n] diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index 0799712f6e..33c61896ee 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -586,6 +586,35 @@ void glue(helper_cvtpd2ps, SUFFIX)(CPUX86State *env, R= eg *d, Reg *s) } } =20 +#if SHIFT >=3D 1 +void glue(helper_cvtph2ps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +{ + int i; + + for (i =3D 2 << SHIFT; --i >=3D 0; ) { + d->ZMM_S(i) =3D float16_to_float32(s->ZMM_H(i), true, &env->sse_s= tatus); + } +} + +void glue(helper_cvtps2ph, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, int m= ode) +{ + int i; + FloatRoundMode prev_rounding_mode =3D env->sse_status.float_rounding_m= ode; + if (!(mode & (1 << 2))) { + set_x86_rounding_mode(mode & 3, &env->sse_status); + } + + for (i =3D 0; i < 2 << SHIFT; i++) { + d->ZMM_H(i) =3D float32_to_float16(s->ZMM_S(i), true, &env->sse_st= atus); + } + for (i >>=3D 2; i < 1 << SHIFT; i++) { + d->Q(i) =3D 0; + } + + env->sse_status.float_rounding_mode =3D prev_rounding_mode; +} +#endif + #if SHIFT =3D=3D 1 void helper_cvtss2sd(CPUX86State *env, Reg *d, Reg *v, Reg *s) { diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h index 2f1f811f9f..c4c41976c0 100644 --- a/target/i386/ops_sse_header.h +++ b/target/i386/ops_sse_header.h @@ -353,6 +353,12 @@ DEF_HELPER_4(glue(aeskeygenassist, SUFFIX), void, env,= Reg, Reg, i32) DEF_HELPER_5(glue(pclmulqdq, SUFFIX), void, env, Reg, Reg, Reg, i32) #endif =20 +/* F16C helpers */ +#if SHIFT >=3D 1 +DEF_HELPER_3(glue(cvtph2ps, SUFFIX), void, env, Reg, Reg) +DEF_HELPER_4(glue(cvtps2ph, SUFFIX), void, env, Reg, Reg, int) +#endif + /* AVX helpers */ #if SHIFT >=3D 1 
DEF_HELPER_4(glue(vpermilpd, SUFFIX), void, env, Reg, Reg, Reg) diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.= c.inc index 8e1eb9db42..8baee9018a 100644 --- a/target/i386/tcg/decode-new.c.inc +++ b/target/i386/tcg/decode-new.c.inc @@ -336,6 +336,7 @@ static const X86OpEntry opcodes_0F38_00toEF[240] =3D { [0x07] =3D X86_OP_ENTRY3(PHSUBSW, V,x, H,x, W,x, vex4 cpuid(SSSE= 3) mmx avx2_256 p_00_66), =20 [0x10] =3D X86_OP_ENTRY2(PBLENDVB, V,x, W,x, vex4 cpuid(SSE4= 1) avx2_256 p_66), + [0x13] =3D X86_OP_ENTRY2(VCVTPH2PS, V,x, W,ph, vex11 cpuid(F16= C) p_66), [0x14] =3D X86_OP_ENTRY2(BLENDVPS, V,x, W,x, vex4 cpuid(SSE4= 1) p_66), [0x15] =3D X86_OP_ENTRY2(BLENDVPD, V,x, W,x, vex4 cpuid(SSE4= 1) p_66), /* Listed incorrectly as type 4 */ @@ -525,6 +526,7 @@ static const X86OpEntry opcodes_0F3A[256] =3D { [0x15] =3D X86_OP_ENTRY3(PEXTRW, E,w, V,dq, I,b, vex5 cpuid(SSE4= 1) zext0 p_66), [0x16] =3D X86_OP_ENTRY3(PEXTR, E,y, V,dq, I,b, vex5 cpuid(SSE4= 1) p_66), [0x17] =3D X86_OP_ENTRY3(VEXTRACTPS, E,d, V,dq, I,b, vex5 cpuid(SSE4= 1) p_66), + [0x1d] =3D X86_OP_ENTRY3(VCVTPS2PH, W,ph, V,x, I,b, vex11 cpuid(F16= C) p_66), =20 [0x20] =3D X86_OP_ENTRY4(PINSRB, V,dq, H,dq, E,b, vex5 cpuid(SSE4= 1) zext2 p_66), [0x21] =3D X86_OP_GROUP0(VINSERTPS), @@ -1051,6 +1053,10 @@ static bool decode_op_size(DisasContext *s, X86OpEnt= ry *e, X86OpSize size, MemOp *ot =3D s->vex_l ? MO_256 : MO_128; return true; =20 + case X86_SIZE_ph: /* SSE/AVX packed half precision */ + *ot =3D s->vex_l ? MO_128 : MO_64; + return true; + case X86_SIZE_d64: /* Default to 64-bit in 64-bit mode */ *ot =3D CODE64(s) && s->dflag =3D=3D MO_32 ? 
MO_64 : s->dflag; return true; @@ -1342,6 +1348,8 @@ static bool has_cpuid_feature(DisasContext *s, X86CPUIDFeature cpuid) switch (cpuid) { case X86_FEAT_None: return true; + case X86_FEAT_F16C: + return (s->cpuid_ext_features & CPUID_EXT_F16C); case X86_FEAT_MOVBE: return (s->cpuid_ext_features & CPUID_EXT_MOVBE); case X86_FEAT_PCLMULQDQ: diff --git a/target/i386/tcg/decode-new.h b/target/i386/tcg/decode-new.h index f159c26850..0ef54628ee 100644 --- a/target/i386/tcg/decode-new.h +++ b/target/i386/tcg/decode-new.h @@ -92,6 +92,7 @@ typedef enum X86OpSize { /* Custom */ X86_SIZE_d64, X86_SIZE_f64, + X86_SIZE_ph, /* SSE/AVX packed half precision */ } X86OpSize; typedef enum X86CPUIDFeature { @@ -103,6 +104,7 @@ typedef enum X86CPUIDFeature { X86_FEAT_AVX2, X86_FEAT_BMI1, X86_FEAT_BMI2, + X86_FEAT_F16C, X86_FEAT_MOVBE, X86_FEAT_PCLMULQDQ, X86_FEAT_SSE, diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc index ebf299451d..9334f0939d 100644 --- a/target/i386/tcg/emit.c.inc +++ b/target/i386/tcg/emit.c.inc @@ -296,7 +296,7 @@ static void gen_writeback(DisasContext *s, X86DecodedInsn *decode, int opn, TCGv case X86_OP_MMX: break; case X86_OP_SSE: - if (!op->has_ea && (s->prefix & PREFIX_VEX) && op->ot == MO_128) { + if (!op->has_ea && (s->prefix & PREFIX_VEX) && op->ot <= MO_128) { tcg_gen_gvec_dup_imm(MO_64, offsetof(CPUX86State, xmm_regs[op->n].ZMM_X(1)), 16, 16, 0); @@ -852,6 +852,7 @@ UNARY_INT_SSE(VCVTTPD2DQ, cvttpd2dq) UNARY_INT_SSE(VCVTDQ2PS, cvtdq2ps) UNARY_INT_SSE(VCVTPS2DQ, cvtps2dq) UNARY_INT_SSE(VCVTTPS2DQ, cvttps2dq) +UNARY_INT_SSE(VCVTPH2PS, cvtph2ps) static inline void gen_unary_imm_sse(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode, @@ -1868,6 +1869,20 @@ static void gen_VCVTfp2fp(DisasContext *s, CPUX86State *env, X86DecodedInsn *dec gen_helper_cvtsd2ss, gen_helper_cvtss2sd); } +static void gen_VCVTPS2PH(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) +{ + gen_unary_imm_fp_sse(s,
env, decode, + gen_helper_cvtps2ph_xmm, + gen_helper_cvtps2ph_ymm); + /* + * VCVTPS2PH is the only instruction that performs an operation on a + * register source and then *stores* into memory. + */ + if (decode->op[0].has_ea) { + gen_store_sse(s, decode, decode->op[0].offset); + } +} + static void gen_VCVTSI2Sx(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) { int vec_len = vector_len(s, decode); diff --git a/tests/tcg/i386/test-avx.c b/tests/tcg/i386/test-avx.c index 953e2906fe..c39c0e5bce 100644 --- a/tests/tcg/i386/test-avx.c +++ b/tests/tcg/i386/test-avx.c @@ -28,6 +28,7 @@ typedef struct { } TestDef; reg_state initI; +reg_state initF16; reg_state initF32; reg_state initF64; @@ -221,6 +222,7 @@ static void run_all(void) #define ARRAY_LEN(x) (sizeof(x) / sizeof(x[0])) +uint16_t val_f16[] = { 0x4000, 0xbc00, 0x44cd, 0x3a66, 0x4200, 0x7a1a, 0x4780, 0x4826 }; float val_f32[] = {2.0, -1.0, 4.8, 0.8, 3, -42.0, 5e6, 7.5, 8.3}; double val_f64[] = {2.0, -1.0, 4.8, 0.8, 3, -42.0, 5e6, 7.5}; v4di val_i64[] = { @@ -241,6 +243,12 @@ v4di indexd = {0x00000002000000efull, 0xfffffff500000010ull, v4di gather_mem[0x20]; +void init_f16reg(v4di *r) +{ + memset(r, 0, sizeof(*r)); + memcpy(r, val_f16, sizeof(val_f16)); +} + void init_f32reg(v4di *r) { static int n; @@ -315,6 +323,15 @@ int main(int argc, char *argv[]) printf("Int:\n"); dump_regs(&initI); + init_all(&initF16); + init_f16reg(&initF16.ymm[10]); + init_f16reg(&initF16.ymm[11]); + init_f16reg(&initF16.ymm[12]); + init_f16reg(&initF16.mem0[1]); + initF16.ff = 16; + printf("F16:\n"); + dump_regs(&initF16); + init_all(&initF32); init_f32reg(&initF32.ymm[10]); init_f32reg(&initF32.ymm[11]); diff --git a/tests/tcg/i386/test-avx.py b/tests/tcg/i386/test-avx.py index 02982329f1..ebb1d99c5e 100755 --- a/tests/tcg/i386/test-avx.py +++ b/tests/tcg/i386/test-avx.py @@ -9,6 +9,7 @@ archs = [ "SSE", "SSE2", "SSE3", "SSSE3", "SSE4_1", "SSE4_2", "AES", "AVX", "AVX2",
"AES+AVX", "VAES+AVX", + "F16C", ] =20 ignore =3D set(["FISTTP", @@ -19,6 +20,7 @@ 'vBLENDPS': 0x0f, 'CMP[PS][SD]': 0x07, 'VCMP[PS][SD]': 0x1f, + 'vCVTPS2PH': 0x7, 'vDPPD': 0x33, 'vDPPS': 0xff, 'vEXTRACTPS': 0x03, @@ -221,8 +223,10 @@ def ArgGenerator(arg, op): class InsnGenerator: def __init__(self, op, args): self.op =3D op - if op[-2:] in ["PS", "PD", "SS", "SD"]: - if op[-1] =3D=3D 'S': + if op[-2:] in ["PH", "PS", "PD", "SS", "SD"]: + if op[-1] =3D=3D 'H': + self.optype =3D 'F16' + elif op[-1] =3D=3D 'S': self.optype =3D 'F32' else: self.optype =3D 'F64' --=20 2.37.3 From nobody Sat May 18 10:30:08 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1666440740; cv=none; d=zohomail.com; s=zohoarc; b=WQSWoGMMrm0TOyEMAvurxnzRiqxgwhZYb+JQTxfPTZaQZp1+5VoTL0ch9ohO1xLD2jdXHAkV1V8zrA3PbqIneshzHi3z90+H44/Br6kAnrUvGDGNFUsOB1zneOpkpmX6q5T9d0x6rtlvW60kN+rw3VXREMIZUVimvXA1Kn+xWeg= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1666440740; h=Content-Type:Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=myzg5aKN1qvz+RXQu6Cn+Ajo7dzQv92bv+8d9HRZ2Gk=; b=Hr1nvO5Uf/3CuuXoEmoQxkKUfw2VIgdg2mYVYJtANK6HhF26oJFAlWmpuETE4DhqCmD9mGc4rLwgkyELy+HPzM+ZHlz/Gj/a9tEj6kZmcHAf8kHEDMU6VvXGBIKFwji6DVCQ69d4GWTGgJb3qovDzWNWaGCRpTY+mgKT9d837a8= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org 
[209.51.188.17]) by mx.zohomail.com with SMTPS id 166644074059671.14663304312592; Sat, 22 Oct 2022 05:12:20 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1omD7S-0003Bq-4i; Sat, 22 Oct 2022 07:56:50 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1omD76-00033E-5M for qemu-devel@nongnu.org; Sat, 22 Oct 2022 07:56:29 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1omD74-0006iB-3x for qemu-devel@nongnu.org; Sat, 22 Oct 2022 07:56:27 -0400 Received: from mail-ed1-f72.google.com (mail-ed1-f72.google.com [209.85.208.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-142-ay8YM4qpM9ezUlTO0wNBnQ-1; Sat, 22 Oct 2022 07:56:24 -0400 Received: by mail-ed1-f72.google.com with SMTP id v13-20020a056402348d00b0045d36615696so5128300edc.14 for ; Sat, 22 Oct 2022 04:56:23 -0700 (PDT) Received: from avogadro.local ([2001:b07:6468:f312:2f4b:62da:3159:e077]) by smtp.gmail.com with ESMTPSA id r6-20020a508d86000000b00458b41d9460sm14833666edh.92.2022.10.22.04.56.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 22 Oct 2022 04:56:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1666439785; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=myzg5aKN1qvz+RXQu6Cn+Ajo7dzQv92bv+8d9HRZ2Gk=; b=UlTVpEAru2lzQc/fM7VGrC4P/sXvorVIEf/vvFqQf5hy0HikiYiPuXizTOA7uAJAa7M46/ HJX+dyh9dMf6gU+gFySZyntjyEjp/rYpviGMs1ooRhD4DuPIOSxQfTAMuBmNfhQM4gAk06 
jB6Z+4Ad0WUPpcd63vrd1mJVXxwqrHM= X-MC-Unique: ay8YM4qpM9ezUlTO0wNBnQ-1 X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=myzg5aKN1qvz+RXQu6Cn+Ajo7dzQv92bv+8d9HRZ2Gk=; b=Eotuu/pqOqEoF1JB+3TGBbwKtCiX+PSjoYwYck6ji758CtopATm9juBcpzt3wrw4Wd xcCe0tGMr60qzQ6RXD0ZqhSlcx0/wqhbk4Jrfa3al9tg0kE7Q4lAcd1TJC/ZVUD1ohsC 0HCbivZnJ7Tr5Ja6EDgwsGQyL3oyFWn6gv7vVavgDZmwGXFrW4AlsX9DBlnjGzpT2gqi B391+xEmP6W2rsvOFurvwSPqDBNuyDhNUTC+bd685MZMoB2m1+3LYeyk2vKg0+7KtOS1 kUfkBihQIHaJ5wNRLNatCn/BTduGDtdo0A1r+5iWbGV9sap17S1ga/WdFeSPWcpdlfVf JcLw== X-Gm-Message-State: ACrzQf110e/M5EEyYZSRQdGBmqU33+/1vI4+QQBdRewNY9JG5LnewOze iFPM5qn4K/LWUEKMj2qjrLq9Eixcpc0MVsa1TFiw99ZUgtOVyT1rP4VUWDYeBIUM99jJ7fzrKNQ IAmelEpAYHIyuj/FKbzGos+refxpDyJn2fYA59N6DB67WXXpRlMnv0JJpURK2FOTdwJM= X-Received: by 2002:a17:907:3da2:b0:78d:3b45:11d9 with SMTP id he34-20020a1709073da200b0078d3b4511d9mr19785192ejc.87.1666439781585; Sat, 22 Oct 2022 04:56:21 -0700 (PDT) X-Google-Smtp-Source: AMsMyM68S6wg42OFHisw0KdttjF9TtBrlYy+TZ0PyMJoAYE3xbAZyhflB+fjpfTiWp/9L3vTcU4xPQ== X-Received: by 2002:a17:907:3da2:b0:78d:3b45:11d9 with SMTP id he34-20020a1709073da200b0078d3b4511d9mr19785177ejc.87.1666439781166; Sat, 22 Oct 2022 04:56:21 -0700 (PDT) From: Paolo Bonzini To: qemu-devel@nongnu.org Cc: Richard Henderson Subject: [PULL 4/4] target/i386: implement FMA instructions Date: Sat, 22 Oct 2022 13:56:08 +0200 Message-Id: <20221022115608.152853-5-pbonzini@redhat.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221022115608.152853-1-pbonzini@redhat.com> References: <20221022115608.152853-1-pbonzini@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; 
envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.133.124; envelope-from=pbonzini@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -23 X-Spam_score: -2.4 X-Spam_bar: -- X-Spam_report: (-2.4 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.251, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "Qemu-devel" Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1666440742033100003 Content-Type: text/plain; charset="utf-8"

The only issue with FMA instructions is that there are _a lot_ of them (30 opcodes, each of which comes in up to 4 versions depending on VEX.W and VEX.L; a total of 96 possibilities). However, they can be implemented with only 6 helpers, two for scalar operations and four for packed operations. (Scalar versions do not do any merging; they only affect the bottom 32 or 64 bits of the output operand. Therefore, there are no separate XMM and YMM versions of the scalar helpers.) First, we can reduce the number of helpers to one third by passing four operands (one output and three inputs); the reordering of which operands go to the multiply and which go to the add is done in emit.c. Second, the different instructions also dispatch to the same softfloat function, so the flags for float32_muladd and float64_muladd are passed to the helper as int arguments, with a little extra complication to handle FMADDSUB and FMSUBADD.
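
As a rough illustration of the flags handling described above (not part of the patch itself), the scheme can be sketched in plain C. The names NEGATE_C, NEGATE_PRODUCT, muladd and fma_packed are hypothetical stand-ins for softfloat's float_muladd_negate_c, float_muladd_negate_product, float32_muladd and the packed helper. The stand-in is an assumption in two ways: it computes an unfused a * b + c rather than a real fused multiply-add, and it omits the float_status argument.

```c
/* Hypothetical stand-ins for softfloat's float_muladd_negate_* flags. */
enum {
    NEGATE_C       = 1, /* a * b - c */
    NEGATE_PRODUCT = 2, /* -(a * b) + c */
};

/*
 * Stand-in for float32_muladd().  Assumption: unfused and without a
 * float_status argument; only the flag handling is of interest here.
 */
static float muladd(float a, float b, float c, int flags)
{
    float product = a * b;

    if (flags & NEGATE_PRODUCT) {
        product = -product;
    }
    if (flags & NEGATE_C) {
        c = -c;
    }
    return product + c;
}

/*
 * Packed operation: "flags" applies to element 0 and is XORed with "flip"
 * after every element, so even and odd elements can use different flags.
 */
static void fma_packed(float *d, const float *a, const float *b,
                       const float *c, int num, int flags, int flip)
{
    int i;

    for (i = 0; i < num; i++) {
        d[i] = muladd(a[i], b[i], c[i], flags);
        flags ^= flip;
    }
}
```

With a = {1, 2, 3, 4}, b = {10, 10, 10, 10} and c = {1, 1, 1, 1}, calling fma_packed(d, a, b, c, 4, NEGATE_C, NEGATE_C) behaves like FMADDSUB (subtract in even elements, add in odd ones) and leaves {9, 21, 29, 41} in d. Passing flags = 0 and flip = 0 gives a plain FMADD, {11, 21, 31, 41}. The flip argument corresponds to even ^ odd, which is 0 whenever both element parities use the same flags.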
Reviewed-by: Richard Henderson Signed-off-by: Paolo Bonzini --- target/i386/cpu.c | 5 ++-- target/i386/ops_sse.h | 27 +++++++++++++++++ target/i386/ops_sse_header.h | 11 +++++++ target/i386/tcg/decode-new.c.inc | 40 +++++++++++++++++++++++++ target/i386/tcg/decode-new.h | 1 + target/i386/tcg/emit.c.inc | 51 ++++++++++++++++++++++++++++++++ target/i386/tcg/translate.c | 1 + tests/tcg/i386/test-avx.py | 2 +- 8 files changed, 135 insertions(+), 3 deletions(-) diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 6292b7e12f..22b681ca37 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -625,10 +625,11 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1, CPUID_EXT_SSE41 | CPUID_EXT_SSE42 | CPUID_EXT_POPCNT | \ CPUID_EXT_XSAVE | /* CPUID_EXT_OSXSAVE is dynamic */ \ CPUID_EXT_MOVBE | CPUID_EXT_AES | CPUID_EXT_HYPERVISOR | \ - CPUID_EXT_RDRAND | CPUID_EXT_AVX | CPUID_EXT_F16C) + CPUID_EXT_RDRAND | CPUID_EXT_AVX | CPUID_EXT_F16C | \ + CPUID_EXT_FMA) /* missing: CPUID_EXT_DTES64, CPUID_EXT_DSCPL, CPUID_EXT_VMX, CPUID_EXT_SMX, - CPUID_EXT_EST, CPUID_EXT_TM2, CPUID_EXT_CID, CPUID_EXT_FMA, + CPUID_EXT_EST, CPUID_EXT_TM2, CPUID_EXT_CID, CPUID_EXT_XTPR, CPUID_EXT_PDCM, CPUID_EXT_PCID, CPUID_EXT_DCA, CPUID_EXT_X2APIC, CPUID_EXT_TSC_DEADLINE_TIMER */ diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index 33c61896ee..3cbc36a59d 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -2522,6 +2522,33 @@ void helper_vpermd_ymm(Reg *d, Reg *v, Reg *s) } #endif +/* FMA3 op helpers */ +#if SHIFT == 1 +#define SSE_HELPER_FMAS(name, elem, F) \ + void name(CPUX86State *env, Reg *d, Reg *a, Reg *b, Reg *c, int flags) \ + { \ + d->elem(0) = F(a->elem(0), b->elem(0), c->elem(0), flags, &env->sse_status); \ + } +#define SSE_HELPER_FMAP(name, elem, num, F) \ + void glue(name, SUFFIX)(CPUX86State *env, Reg *d, Reg *a, Reg *b, Reg *c, \ + int flags, int flip) \ + { \ + int i; \ + for (i = 0; i < num; i++) { \ + d->elem(i)
= F(a->elem(i), b->elem(i), c->elem(i), flags, &env->sse_status); \ + flags ^= flip; \ + } \ + } + +SSE_HELPER_FMAS(helper_fma4ss, ZMM_S, float32_muladd) +SSE_HELPER_FMAS(helper_fma4sd, ZMM_D, float64_muladd) +#endif + +#if SHIFT >= 1 +SSE_HELPER_FMAP(helper_fma4ps, ZMM_S, 2 << SHIFT, float32_muladd) +SSE_HELPER_FMAP(helper_fma4pd, ZMM_D, 1 << SHIFT, float64_muladd) +#endif + #undef SSE_HELPER_S #undef LANE_WIDTH diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h index c4c41976c0..8a7b2f4e2f 100644 --- a/target/i386/ops_sse_header.h +++ b/target/i386/ops_sse_header.h @@ -359,6 +359,17 @@ DEF_HELPER_3(glue(cvtph2ps, SUFFIX), void, env, Reg, Reg) DEF_HELPER_4(glue(cvtps2ph, SUFFIX), void, env, Reg, Reg, int) #endif +/* FMA3 helpers */ +#if SHIFT == 1 +DEF_HELPER_6(fma4ss, void, env, Reg, Reg, Reg, Reg, int) +DEF_HELPER_6(fma4sd, void, env, Reg, Reg, Reg, Reg, int) +#endif + +#if SHIFT >= 1 +DEF_HELPER_7(glue(fma4ps, SUFFIX), void, env, Reg, Reg, Reg, Reg, int, int) +DEF_HELPER_7(glue(fma4pd, SUFFIX), void, env, Reg, Reg, Reg, Reg, int, int) +#endif + /* AVX helpers */ #if SHIFT >= 1 DEF_HELPER_4(glue(vpermilpd, SUFFIX), void, env, Reg, Reg, Reg) diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc index 8baee9018a..e4878b967f 100644 --- a/target/i386/tcg/decode-new.c.inc +++ b/target/i386/tcg/decode-new.c.inc @@ -376,6 +376,16 @@ static const X86OpEntry opcodes_0F38_00toEF[240] = { [0x92] = X86_OP_ENTRY3(VPGATHERD, V,x, H,x, M,d, vex12 cpuid(AVX2) p_66), /* vgatherdps/d */ [0x93] = X86_OP_ENTRY3(VPGATHERQ, V,x, H,x, M,q, vex12 cpuid(AVX2) p_66), /* vgatherqps/d */ + /* Should be exception type 2 but they do not have legacy SSE equivalents?
*/ + [0x96] = X86_OP_ENTRY3(VFMADDSUB132Px, V,x, H,x, W,x, vex6 cpuid(FMA) p_66), + [0x97] = X86_OP_ENTRY3(VFMSUBADD132Px, V,x, H,x, W,x, vex6 cpuid(FMA) p_66), + + [0xa6] = X86_OP_ENTRY3(VFMADDSUB213Px, V,x, H,x, W,x, vex6 cpuid(FMA) p_66), + [0xa7] = X86_OP_ENTRY3(VFMSUBADD213Px, V,x, H,x, W,x, vex6 cpuid(FMA) p_66), + + [0xb6] = X86_OP_ENTRY3(VFMADDSUB231Px, V,x, H,x, W,x, vex6 cpuid(FMA) p_66), + [0xb7] = X86_OP_ENTRY3(VFMSUBADD231Px, V,x, H,x, W,x, vex6 cpuid(FMA) p_66), + [0x08] = X86_OP_ENTRY3(PSIGNB, V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66), [0x09] = X86_OP_ENTRY3(PSIGNW, V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66), [0x0a] = X86_OP_ENTRY3(PSIGND, V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66), @@ -421,6 +431,34 @@ static const X86OpEntry opcodes_0F38_00toEF[240] = { [0x8c] = X86_OP_ENTRY3(VPMASKMOV, V,x, H,x, WM,x, vex6 cpuid(AVX2) p_66), [0x8e] = X86_OP_ENTRY3(VPMASKMOV_st, M,x, V,x, H,x, vex6 cpuid(AVX2) p_66), + /* Should be exception type 2 or 3 but they do not have legacy SSE equivalents?
*/ + [0x98] = X86_OP_ENTRY3(VFMADD132Px, V,x, H,x, W,x, vex6 cpuid(FMA) p_66), + [0x99] = X86_OP_ENTRY3(VFMADD132Sx, V,x, H,x, W,x, vex6 cpuid(FMA) p_66), + [0x9a] = X86_OP_ENTRY3(VFMSUB132Px, V,x, H,x, W,x, vex6 cpuid(FMA) p_66), + [0x9b] = X86_OP_ENTRY3(VFMSUB132Sx, V,x, H,x, W,x, vex6 cpuid(FMA) p_66), + [0x9c] = X86_OP_ENTRY3(VFNMADD132Px, V,x, H,x, W,x, vex6 cpuid(FMA) p_66), + [0x9d] = X86_OP_ENTRY3(VFNMADD132Sx, V,x, H,x, W,x, vex6 cpuid(FMA) p_66), + [0x9e] = X86_OP_ENTRY3(VFNMSUB132Px, V,x, H,x, W,x, vex6 cpuid(FMA) p_66), + [0x9f] = X86_OP_ENTRY3(VFNMSUB132Sx, V,x, H,x, W,x, vex6 cpuid(FMA) p_66), + + [0xa8] = X86_OP_ENTRY3(VFMADD213Px, V,x, H,x, W,x, vex6 cpuid(FMA) p_66), + [0xa9] = X86_OP_ENTRY3(VFMADD213Sx, V,x, H,x, W,x, vex6 cpuid(FMA) p_66), + [0xaa] = X86_OP_ENTRY3(VFMSUB213Px, V,x, H,x, W,x, vex6 cpuid(FMA) p_66), + [0xab] = X86_OP_ENTRY3(VFMSUB213Sx, V,x, H,x, W,x, vex6 cpuid(FMA) p_66), + [0xac] = X86_OP_ENTRY3(VFNMADD213Px, V,x, H,x, W,x, vex6 cpuid(FMA) p_66), + [0xad] = X86_OP_ENTRY3(VFNMADD213Sx, V,x, H,x, W,x, vex6 cpuid(FMA) p_66), + [0xae] = X86_OP_ENTRY3(VFNMSUB213Px, V,x, H,x, W,x, vex6 cpuid(FMA) p_66), + [0xaf] = X86_OP_ENTRY3(VFNMSUB213Sx, V,x, H,x, W,x, vex6 cpuid(FMA) p_66), + + [0xb8] = X86_OP_ENTRY3(VFMADD231Px, V,x, H,x, W,x, vex6 cpuid(FMA) p_66), + [0xb9] = X86_OP_ENTRY3(VFMADD231Sx, V,x, H,x, W,x, vex6 cpuid(FMA) p_66), + [0xba] = X86_OP_ENTRY3(VFMSUB231Px, V,x, H,x, W,x, vex6 cpuid(FMA) p_66), + [0xbb] = X86_OP_ENTRY3(VFMSUB231Sx, V,x, H,x, W,x, vex6 cpuid(FMA) p_66), + [0xbc] = X86_OP_ENTRY3(VFNMADD231Px, V,x, H,x, W,x, vex6 cpuid(FMA) p_66), + [0xbd] = X86_OP_ENTRY3(VFNMADD231Sx, V,x, H,x, W,x, vex6 cpuid(FMA) p_66), + [0xbe] = X86_OP_ENTRY3(VFNMSUB231Px, V,x, H,x, W,x, vex6 cpuid(FMA) p_66), + [0xbf] = X86_OP_ENTRY3(VFNMSUB231Sx, V,x, H,x, W,x, vex6 cpuid(FMA) p_66), + [0xdb] = X86_OP_ENTRY3(VAESIMC, V,dq, None,None, W,dq,
vex4 cpuid(AES) p_66), [0xdc] = X86_OP_ENTRY3(VAESENC, V,x, H,x, W,x, vex4 cpuid(AES) p_66), [0xdd] = X86_OP_ENTRY3(VAESENCLAST, V,x, H,x, W,x, vex4 cpuid(AES) p_66), @@ -1350,6 +1388,8 @@ static bool has_cpuid_feature(DisasContext *s, X86CPUIDFeature cpuid) return true; case X86_FEAT_F16C: return (s->cpuid_ext_features & CPUID_EXT_F16C); + case X86_FEAT_FMA: + return (s->cpuid_ext_features & CPUID_EXT_FMA); case X86_FEAT_MOVBE: return (s->cpuid_ext_features & CPUID_EXT_MOVBE); case X86_FEAT_PCLMULQDQ: diff --git a/target/i386/tcg/decode-new.h b/target/i386/tcg/decode-new.h index 0ef54628ee..cb6b8bcf67 100644 --- a/target/i386/tcg/decode-new.h +++ b/target/i386/tcg/decode-new.h @@ -105,6 +105,7 @@ typedef enum X86CPUIDFeature { X86_FEAT_BMI1, X86_FEAT_BMI2, X86_FEAT_F16C, + X86_FEAT_FMA, X86_FEAT_MOVBE, X86_FEAT_PCLMULQDQ, X86_FEAT_SSE, diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc index 9334f0939d..7037ff91c6 100644 --- a/target/i386/tcg/emit.c.inc +++ b/target/i386/tcg/emit.c.inc @@ -39,6 +39,11 @@ typedef void (*SSEFunc_0_eppt)(TCGv_ptr env, TCGv_ptr reg_a, TCGv_ptr reg_b, TCGv val); typedef void (*SSEFunc_0_epppti)(TCGv_ptr env, TCGv_ptr reg_a, TCGv_ptr reg_b, TCGv_ptr reg_c, TCGv a0, TCGv_i32 scale); +typedef void (*SSEFunc_0_eppppi)(TCGv_ptr env, TCGv_ptr reg_a, TCGv_ptr reg_b, + TCGv_ptr reg_c, TCGv_ptr reg_d, TCGv_i32 flags); +typedef void (*SSEFunc_0_eppppii)(TCGv_ptr env, TCGv_ptr reg_a, TCGv_ptr reg_b, + TCGv_ptr reg_c, TCGv_ptr reg_d, TCGv_i32 even, + TCGv_i32 odd); static inline TCGv_i32 tcg_constant8u_i32(uint8_t val) { @@ -491,6 +496,52 @@ FP_SSE(VMIN, min) FP_SSE(VDIV, div) FP_SSE(VMAX, max) +#define FMA_SSE_PACKED(uname, ptr0, ptr1, ptr2, even, odd) \ +static void gen_##uname##Px(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \ +{ \ + SSEFunc_0_eppppii xmm = s->vex_w ? gen_helper_fma4pd_xmm : gen_helper_fma4ps_xmm; \ + SSEFunc_0_eppppii ymm = s->vex_w ?
gen_helper_fma4pd_ymm : gen_helper_fma4ps_ymm; \ + SSEFunc_0_eppppii fn = s->vex_l ? ymm : xmm; \ + \ + fn(cpu_env, OP_PTR0, ptr0, ptr1, ptr2, \ + tcg_constant_i32(even), \ + tcg_constant_i32((even) ^ (odd))); \ +} + +#define FMA_SSE(uname, ptr0, ptr1, ptr2, flags) \ +FMA_SSE_PACKED(uname, ptr0, ptr1, ptr2, flags, flags) \ +static void gen_##uname##Sx(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \ +{ \ + SSEFunc_0_eppppi fn = s->vex_w ? gen_helper_fma4sd : gen_helper_fma4ss; \ + \ + fn(cpu_env, OP_PTR0, ptr0, ptr1, ptr2, \ + tcg_constant_i32(flags)); \ +} \ + +FMA_SSE(VFMADD231, OP_PTR1, OP_PTR2, OP_PTR0, 0) +FMA_SSE(VFMADD213, OP_PTR1, OP_PTR0, OP_PTR2, 0) +FMA_SSE(VFMADD132, OP_PTR0, OP_PTR2, OP_PTR1, 0) + +FMA_SSE(VFNMADD231, OP_PTR1, OP_PTR2, OP_PTR0, float_muladd_negate_product) +FMA_SSE(VFNMADD213, OP_PTR1, OP_PTR0, OP_PTR2, float_muladd_negate_product) +FMA_SSE(VFNMADD132, OP_PTR0, OP_PTR2, OP_PTR1, float_muladd_negate_product) + +FMA_SSE(VFMSUB231, OP_PTR1, OP_PTR2, OP_PTR0, float_muladd_negate_c) +FMA_SSE(VFMSUB213, OP_PTR1, OP_PTR0, OP_PTR2, float_muladd_negate_c) +FMA_SSE(VFMSUB132, OP_PTR0, OP_PTR2, OP_PTR1, float_muladd_negate_c) + +FMA_SSE(VFNMSUB231, OP_PTR1, OP_PTR2, OP_PTR0, float_muladd_negate_c|float_muladd_negate_product) +FMA_SSE(VFNMSUB213, OP_PTR1, OP_PTR0, OP_PTR2, float_muladd_negate_c|float_muladd_negate_product) +FMA_SSE(VFNMSUB132, OP_PTR0, OP_PTR2, OP_PTR1, float_muladd_negate_c|float_muladd_negate_product) + +FMA_SSE_PACKED(VFMADDSUB231, OP_PTR1, OP_PTR2, OP_PTR0, float_muladd_negate_c, 0) +FMA_SSE_PACKED(VFMADDSUB213, OP_PTR1, OP_PTR0, OP_PTR2, float_muladd_negate_c, 0) +FMA_SSE_PACKED(VFMADDSUB132, OP_PTR0, OP_PTR2, OP_PTR1, float_muladd_negate_c, 0) + +FMA_SSE_PACKED(VFMSUBADD231, OP_PTR1, OP_PTR2, OP_PTR0, 0, float_muladd_negate_c) +FMA_SSE_PACKED(VFMSUBADD213, OP_PTR1, OP_PTR0, OP_PTR2, 0, float_muladd_negate_c) +FMA_SSE_PACKED(VFMSUBADD132, OP_PTR0, OP_PTR2, OP_PTR1,
0, float_muladd_negate_c) + #define FP_UNPACK_SSE(uname, lname) \ static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode) \ { \ diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index e19d5c1c64..85be2e58c2 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -26,6 +26,7 @@ #include "tcg/tcg-op-gvec.h" #include "exec/cpu_ldst.h" #include "exec/translator.h" +#include "fpu/softfloat.h" #include "exec/helper-proto.h" #include "exec/helper-gen.h" diff --git a/tests/tcg/i386/test-avx.py b/tests/tcg/i386/test-avx.py index ebb1d99c5e..d9ca00a49e 100755 --- a/tests/tcg/i386/test-avx.py +++ b/tests/tcg/i386/test-avx.py @@ -9,7 +9,7 @@ archs = [ "SSE", "SSE2", "SSE3", "SSSE3", "SSE4_1", "SSE4_2", "AES", "AVX", "AVX2", "AES+AVX", "VAES+AVX", - "F16C", + "F16C", "FMA", ] ignore = set(["FISTTP", -- 2.37.3