Fix some Neon insns on big-endian hosts

[PATCH 2/2] target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts

Posted by Peter Maydell 5 years, 3 months ago

The helper functions for performing the udot/sdot operations against
a scalar were not using an address-swizzling macro when converting
the index of the scalar element into a pointer into the vm array.
This had no effect on little-endian hosts but meant we generated
incorrect results on big-endian hosts.

For these insns, the index selects one group of four 8-bit values,
so each indexed entity is 32 bits, and H4() is therefore what we want.
(For Neon the only possible input indexes are 0 and 1.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
I believe that gvec_udot_idx_h and gvec_sdot_idx_h are OK
because the index there is over groups of 4*16-bit values,
which are 64 bits each.
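
To illustrate the reasoning above, here is a minimal standalone sketch
(not QEMU code). It assumes the vector register is stored as an array of
host-endian uint64_t units, which is why 32-bit groups need their index
XORed with 1 on a big-endian host. The names host_is_big_endian(),
h4_sketch() and read_group() are hypothetical; h4_sketch() only mimics
what H4() is expected to do, using a runtime endianness probe instead of
a build-time macro.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Runtime probe: returns 1 if the first byte in memory is the most significant one. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 0;
}

/*
 * Hypothetical stand-in for H4(): swap the two 32-bit halves of each
 * 64-bit unit on big-endian hosts, identity on little-endian hosts.
 */
static intptr_t h4_sketch(intptr_t index)
{
    return host_is_big_endian() ? (index ^ 1) : index;
}

/*
 * Fetch the 4-byte group at logical index 'index' from a register that
 * is stored as host-endian uint64_t units. Logical group 0 is the low
 * 32 bits of the first unit, group 1 the high 32 bits, and so on.
 */
static uint32_t read_group(const void *vm, intptr_t index)
{
    uint32_t group;

    assert(index >= 0);
    memcpy(&group, (const uint8_t *)vm + h4_sketch(index) * 4, sizeof(group));
    return group;
}
```

With uint64_t vm[1] = { 0xBBBBBBBBAAAAAAAAULL }, read_group(vm, 0) returns
0xAAAAAAAA on either kind of host. Without the swizzle, a big-endian host
would read the bytes at offset 0, which hold the high half 0xBBBBBBBB.
When the indexed entity is a whole 64-bit unit, as in the _h variants
mentioned above, the byte offset index * 8 already lands on the right
unit, so no swizzle should be needed there.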
---
 target/arm/vec_helper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index 30d76d05beb..0f33127c4c4 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -293,7 +293,7 @@ void HELPER(gvec_sdot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
     intptr_t index = simd_data(desc);
     uint32_t *d = vd;
     int8_t *n = vn;
-    int8_t *m_indexed = (int8_t *)vm + index * 4;
+    int8_t *m_indexed = (int8_t *)vm + H4(index) * 4;
 
     /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
      * Otherwise opr_sz is a multiple of 16.
@@ -324,7 +324,7 @@ void HELPER(gvec_udot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
     intptr_t index = simd_data(desc);
     uint32_t *d = vd;
     uint8_t *n = vn;
-    uint8_t *m_indexed = (uint8_t *)vm + index * 4;
+    uint8_t *m_indexed = (uint8_t *)vm + H4(index) * 4;
 
     /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
      * Otherwise opr_sz is a multiple of 16.
-- 
2.20.1

Re: [PATCH 2/2] target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts

Posted by Philippe Mathieu-Daudé 5 years, 3 months ago

On 10/28/20 8:17 PM, Peter Maydell wrote:
> The helper functions for performing the udot/sdot operations against
> a scalar were not using an address-swizzling macro when converting
> the index of the scalar element into a pointer into the vm array.
> This had no effect on little-endian hosts but meant we generated
> incorrect results on big-endian hosts.
> 
> For these insns, the index selects one group of four 8-bit values,
> so each indexed entity is 32 bits, and H4() is therefore what we want.
> (For Neon the only possible input indexes are 0 and 1.)
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> I believe that gvec_udot_idx_h and gvec_sdot_idx_h are OK
> because the index there is over groups of 4*16-bit values,
> which are 64 bits each.
> ---
>  target/arm/vec_helper.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
