Date: Thu, 10 Aug 2017 00:24:23 +0000
From: Joseph Myers
Subject: [Qemu-devel] [PATCH] target/i386: fix packusdw in-place operation

The SSE4.1 packusdw instruction combines source and destination
vectors of signed 32-bit integers into a single vector of unsigned
16-bit integers, with unsigned saturation.  When the source and
destination are the same register, each 32-bit element of that
register is used twice as an input, to produce two of the 16-bit
output elements.  If the operation is carried out element-by-element
in place, then whatever the order in which it is applied to the
elements, the first element's operation will overwrite some future
input.  The helper for packssdw avoids this issue by computing the
result in a local temporary and copying it to the destination at the
end; this patch fixes the packusdw helper to do likewise.  This fixes
three gcc test failures in my GCC 6-based testing.

Signed-off-by: Joseph Myers
---

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index 16509d0..05b1701 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -1655,14 +1655,17 @@ SSE_HELPER_Q(helper_pcmpeqq, FCMPEQQ)
 
 void glue(helper_packusdw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
 {
-    d->W(0) = satuw((int32_t) d->L(0));
-    d->W(1) = satuw((int32_t) d->L(1));
-    d->W(2) = satuw((int32_t) d->L(2));
-    d->W(3) = satuw((int32_t) d->L(3));
-    d->W(4) = satuw((int32_t) s->L(0));
-    d->W(5) = satuw((int32_t) s->L(1));
-    d->W(6) = satuw((int32_t) s->L(2));
-    d->W(7) = satuw((int32_t) s->L(3));
+    Reg r;
+
+    r.W(0) = satuw((int32_t) d->L(0));
+    r.W(1) = satuw((int32_t) d->L(1));
+    r.W(2) = satuw((int32_t) d->L(2));
+    r.W(3) = satuw((int32_t) d->L(3));
+    r.W(4) = satuw((int32_t) s->L(0));
+    r.W(5) = satuw((int32_t) s->L(1));
+    r.W(6) = satuw((int32_t) s->L(2));
+    r.W(7) = satuw((int32_t) s->L(3));
+    *d = r;
 }
 
 #define FMINSB(d, s) MIN((int8_t)d, (int8_t)s)
-- 
Joseph S. Myers
joseph@codesourcery.com
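
The aliasing problem described in the commit message can be reproduced
outside QEMU with a small standalone sketch.  Everything named here
(Vec128, sat_u16, pack_in_place, pack_via_temp, aliasing_changes_result)
is hypothetical and only stands in for QEMU's Reg, satuw and the helper
itself; it is an illustration of the idea, not the QEMU code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Reg: the same 128 bits viewed
 * either as four 32-bit lanes or as eight 16-bit lanes. */
typedef union {
    uint32_t l[4];
    uint16_t w[8];
} Vec128;

/* Saturate a signed 32-bit value to the unsigned 16-bit range. */
static uint16_t sat_u16(int32_t x)
{
    if (x < 0) {
        return 0;
    }
    if (x > 65535) {
        return 65535;
    }
    return (uint16_t)x;
}

/* Pre-patch shape: results are stored into d while d (and s, which
 * may alias d) are still being read. */
static void pack_in_place(Vec128 *d, const Vec128 *s)
{
    assert(d != NULL && s != NULL);
    for (int i = 0; i < 4; i++) {
        d->w[i] = sat_u16((int32_t)d->l[i]);
    }
    for (int i = 0; i < 4; i++) {
        d->w[i + 4] = sat_u16((int32_t)s->l[i]);
    }
}

/* Post-patch shape: build the result in a temporary, then copy it to
 * the destination once every input has been read. */
static void pack_via_temp(Vec128 *d, const Vec128 *s)
{
    Vec128 r;

    assert(d != NULL && s != NULL);
    for (int i = 0; i < 4; i++) {
        r.w[i] = sat_u16((int32_t)d->l[i]);
    }
    for (int i = 0; i < 4; i++) {
        r.w[i + 4] = sat_u16((int32_t)s->l[i]);
    }
    *d = r;
}

/* Returns 1 if the two variants disagree when source and destination
 * are the same object, 0 otherwise. */
int aliasing_changes_result(void)
{
    Vec128 a = {{1u, 70000u, (uint32_t)-5, 65535u}};
    Vec128 b = a;

    pack_in_place(&a, &a);
    pack_via_temp(&b, &b);
    return memcmp(a.w, b.w, sizeof a.w) != 0;
}
```

With distinct source and destination objects the two variants should
agree; the difference only appears when both pointers refer to the same
register, which is the case the patch addresses.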