target/arm: Third slice of MVE implementation

[PATCH for-6.2 03/34] target/arm: Fix MVE VSLI by 0 and VSRI by <dt>

Posted by Peter Maydell 4 years, 7 months ago

In the MVE shift-and-insert insns, we special case VSLI by 0
and VSRI by <dt>, both of which mean "no shift". However we
incorrectly implemented these as "don't update the destination",
which works only if Qd == Qm. When Qd != Qm this kind of
shift must update Qd, honouring the predicate mask.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/mve_helper.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index db5d6220854..16a701933b8 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -1276,19 +1276,23 @@ DO_2SHIFT_S(vrshli_s, DO_VRSHLS)
                                 void *vm, uint32_t shift)               \
     {                                                                   \
         uint64_t *d = vd, *m = vm;                                      \
-        uint16_t mask;                                                  \
+        uint16_t mask = mve_element_mask(env);                          \
         uint64_t shiftmask;                                             \
         unsigned e;                                                     \
         if (shift == 0 || shift == ESIZE * 8) {                         \
             /*                                                          \
              * Only VSLI can shift by 0; only VSRI can shift by <dt>.   \
              * The generic logic would give the right answer for 0 but  \
-             * fails for <dt>.                                          \
+             * fails for <dt>. In both cases, we must not shift the     \
+             * input but just copy it to the destination, honouring     \
+             * the predicate mask.                                      \
              */                                                         \
+            for (e = 0; e < 16 / 8; e++, mask >>= 8) {                  \
+                mergemask(&d[H8(e)], m[H8(e)], mask);                   \
+            }                                                           \
             goto done;                                                  \
         }                                                               \
         assert(shift < ESIZE * 8);                                      \
-        mask = mve_element_mask(env);                                   \
         /* ESIZE / 2 gives the MO_* value if ESIZE is in [1,2,4] */     \
         shiftmask = dup_const(ESIZE / 2, MASKFN(ESIZE * 8, shift));     \
         for (e = 0; e < 16 / 8; e++, mask >>= 8) {                      \
-- 
2.20.1

Re: [PATCH for-6.2 03/34] target/arm: Fix MVE VSLI by 0 and VSRI by <dt>

Posted by Richard Henderson 4 years, 6 months ago

On 7/13/21 6:36 AM, Peter Maydell wrote:
>           if (shift == 0 || shift == ESIZE * 8) {                         \
>               /*                                                          \
>                * Only VSLI can shift by 0; only VSRI can shift by <dt>.   \
>                * The generic logic would give the right answer for 0 but  \
> -             * fails for <dt>.                                          \
> +             * fails for <dt>. In both cases, we must not shift the     \
> +             * input but just copy it to the destination, honouring     \
> +             * the predicate mask.                                      \
>                */                                                         \
> +            for (e = 0; e < 16 / 8; e++, mask >>= 8) {                  \
> +                mergemask(&d[H8(e)], m[H8(e)], mask);                   \
> +            }                                                           \
>               goto done;                                                  \
>           }                                                               \

VSLI is d = op1 << shift | (d & ~(-1 << shift))

for shift = 0 does result in d = op1.

However,

VSRI is d = op1 >> shift | (d & ~(-1 >> shift))

for shift = 32 results in d = d.


r~
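
For readers following the two formulas in the reply above, here is a minimal
single-element sketch in C. It is not taken from mve_helper.c: the function
names vsli32 and vsri32 are hypothetical, it models one 32-bit element only,
and it ignores predication. Shifting a 32-bit value by 32 is undefined in C,
so that case is written out explicitly, using the result the formula gives
when evaluated mathematically.

```c
#include <stdint.h>

/*
 * Hypothetical one-element model of the VSLI formula quoted above:
 *   d = op1 << shift | (d & ~(-1 << shift))
 * Valid for shift in [0, 31].
 */
static uint32_t vsli32(uint32_t d, uint32_t op1, unsigned shift)
{
    /* Low 'shift' bits of d are preserved; the rest come from op1. */
    uint32_t keep = ~(UINT32_MAX << shift);
    return (op1 << shift) | (d & keep);
}

/*
 * Hypothetical one-element model of the VSRI formula quoted above:
 *   d = op1 >> shift | (d & ~(-1 >> shift))
 * Valid for shift in [1, 32].
 */
static uint32_t vsri32(uint32_t d, uint32_t op1, unsigned shift)
{
    if (shift == 32) {
        /*
         * A C shift by the full width is undefined behaviour. Evaluated
         * mathematically, op1 >> 32 is 0 and -1 >> 32 is 0, so every bit
         * of d is kept.
         */
        return d;
    }
    /* High 'shift' bits of d are preserved; the rest come from op1. */
    uint32_t inserted = UINT32_MAX >> shift;
    return (op1 >> shift) | (d & ~inserted);
}
```

With these definitions, vsli32(d, op1, 0) returns op1 for any d, while
vsri32(d, op1, 32) returns d unchanged, which is the asymmetry the reply
points out.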