[Qemu-devel] [PATCH] target/i386: fix pmovsx/pmovzx in-place operations

Joseph Myers posted 1 patch 6 years, 8 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/alpine.DEB.2.20.1708082018390.23380@digraph.polyomino.org.uk
Test FreeBSD passed
Test checkpatch passed
Test docker passed
Test s390x passed
[Qemu-devel] [PATCH] target/i386: fix pmovsx/pmovzx in-place operations
Posted by Joseph Myers 6 years, 8 months ago
The SSE4.1 pmovsx* and pmovzx* instructions take packed 1-byte, 2-byte
or 4-byte inputs and sign-extend or zero-extend them to a wider vector
output.  The associated helpers for these instructions do the
extension on each element in turn, starting with the lowest.  If the
input and output are the same register, this means that all the input
elements after the first have been overwritten before they are read.
This patch makes the helpers extend starting with the highest element,
not the lowest, to avoid such overwriting.  This fixes many GCC test
failures (161 in the gcc testsuite in my GCC 6-based testing) when
testing with a default CPU setting enabling those instructions.

Signed-off-by: Joseph Myers <joseph@codesourcery.com>

---

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index 16509d0..d578216 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -1617,18 +1617,18 @@ void glue(helper_ptest, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
 #define SSE_HELPER_F(name, elem, num, F)        \
     void glue(name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)     \
     {                                           \
-        d->elem(0) = F(0);                      \
-        d->elem(1) = F(1);                      \
         if (num > 2) {                          \
-            d->elem(2) = F(2);                  \
-            d->elem(3) = F(3);                  \
             if (num > 4) {                      \
-                d->elem(4) = F(4);              \
-                d->elem(5) = F(5);              \
-                d->elem(6) = F(6);              \
                 d->elem(7) = F(7);              \
+                d->elem(6) = F(6);              \
+                d->elem(5) = F(5);              \
+                d->elem(4) = F(4);              \
             }                                   \
+            d->elem(3) = F(3);                  \
+            d->elem(2) = F(2);                  \
         }                                       \
+        d->elem(1) = F(1);                      \
+        d->elem(0) = F(0);                      \
     }
 
 SSE_HELPER_F(helper_pmovsxbw, W, 8, (int8_t) s->B)

-- 
Joseph S. Myers
joseph@codesourcery.com

Re: [Qemu-devel] [PATCH] target/i386: fix pmovsx/pmovzx in-place operations
Posted by Paolo Bonzini 6 years, 8 months ago
On 08/08/2017 22:21, Joseph Myers wrote:
> The SSE4.1 pmovsx* and pmovzx* instructions take packed 1-byte, 2-byte
> or 4-byte inputs and sign-extend or zero-extend them to a wider vector
> output.  The associated helpers for these instructions do the
> extension on each element in turn, starting with the lowest.  If the
> input and output are the same register, this means that all the input
> elements after the first have been overwritten before they are read.
> This patch makes the helpers extend starting with the highest element,
> not the lowest, to avoid such overwriting.  This fixes many GCC test
> failures (161 in the gcc testsuite in my GCC 6-based testing) when
> testing with a default CPU setting enabling those instructions.
> 
> Signed-off-by: Joseph Myers <joseph@codesourcery.com>
> 
> ---
> 
> diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
> index 16509d0..d578216 100644
> --- a/target/i386/ops_sse.h
> +++ b/target/i386/ops_sse.h
> @@ -1617,18 +1617,18 @@ void glue(helper_ptest, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
>  #define SSE_HELPER_F(name, elem, num, F)        \
>      void glue(name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)     \
>      {                                           \
> -        d->elem(0) = F(0);                      \
> -        d->elem(1) = F(1);                      \
>          if (num > 2) {                          \
> -            d->elem(2) = F(2);                  \
> -            d->elem(3) = F(3);                  \
>              if (num > 4) {                      \
> -                d->elem(4) = F(4);              \
> -                d->elem(5) = F(5);              \
> -                d->elem(6) = F(6);              \
>                  d->elem(7) = F(7);              \
> +                d->elem(6) = F(6);              \
> +                d->elem(5) = F(5);              \
> +                d->elem(4) = F(4);              \
>              }                                   \
> +            d->elem(3) = F(3);                  \
> +            d->elem(2) = F(2);                  \
>          }                                       \
> +        d->elem(1) = F(1);                      \
> +        d->elem(0) = F(0);                      \
>      }
>  
>  SSE_HELPER_F(helper_pmovsxbw, W, 8, (int8_t) s->B)
> 

Queued, thanks.

Paolo