[v1] s390x/tcg: Vector Instruction Support Part 1

[Qemu-devel] [PATCH v1 19/33] s390x/tcg: Implement VECTOR MERGE (HIGH|LOW)

Posted by David Hildenbrand 6 years, 11 months ago

We cannot use gvec expansion as source and destination elements
have different element numbers. So we'll expand using a fancy loop.
Also, we have to take care of overlapping source and target registers and
use a temporary register in case they do.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  4 +++
 target/s390x/translate_vx.inc.c | 43 +++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 2a9ac9cebc..51003cf917 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1010,6 +1010,10 @@
     F(0xe762, VLVGP,   VRR_f, V,   r2, r3, 0, 0, vlvgp, 0, IF_VEC)
 /* VECTOR LOAD WITH LENGTH */
     F(0xe737, VLL,     VRS_b, V,   la2, r3_32u, 0, 0, vll, 0, IF_VEC)
+/* VECTOR MERGE HIGH */
+    F(0xe761, VMRH,    VRR_c, V,   0, 0, 0, 0, vmr, 0, IF_VEC)
+/* VECTOR MERGE LOW */
+    F(0xe760, VMRL,    VRR_c, V,   0, 0, 0, 0, vmr, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 37f312fbb4..64a5ee55ca 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -481,3 +481,46 @@ static DisasJumpType op_vll(DisasContext *s, DisasOps *o)
     tcg_temp_free_ptr(a0);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vmr(DisasContext *s, DisasOps *o)
+{
+    const uint8_t v1 = get_field(s->fields, v1);
+    const uint8_t v2 = get_field(s->fields, v2);
+    const uint8_t v3 = get_field(s->fields, v3);
+    const uint8_t es = get_field(s->fields, m4);
+    const bool high = s->fields->op2 == 0x61;
+    int dst_idx, src_idx;
+    uint8_t dst_v = v1;
+    TCGv_i64 tmp;
+
+    if (es > MO_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    /* Source and destination overlap -> use a temporary register */
+    if (v1 == v2 || v1 == v3) {
+        dst_v = TMP_VREG_0;
+    }
+
+    tmp = tcg_temp_new_i64();
+    for (dst_idx = 0; dst_idx < NUM_VEC_ELEMENTS(es); dst_idx++) {
+        src_idx = dst_idx / 2;
+        if (!high) {
+            src_idx += NUM_VEC_ELEMENTS(es) / 2;
+        }
+        if (dst_idx % 2 == 0) {
+            read_vec_element_i64(tmp, v2, src_idx, es);
+        } else {
+            read_vec_element_i64(tmp, v3, src_idx, es);
+        }
+        write_vec_element_i64(tmp, dst_v, dst_idx, es);
+    }
+    tcg_temp_free_i64(tmp);
+
+    /* move the temporary to the destination */
+    if (dst_v != v1) {
+        gen_gvec_mov(v1, dst_v);
+    }
+    return DISAS_NEXT;
+}
-- 
2.17.2
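
As a reading aid for the loop in op_vmr(), the index mapping can be modelled in plain C outside of TCG. This is only a sketch: merge_source() and MergeSource are hypothetical names, and num_elements() assumes that NUM_VEC_ELEMENTS(es) is the element count of a 16-byte vector for element size code es (0 = byte up to 3 = doubleword).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: element count of a 16-byte vector for size code es (0..3). */
static int num_elements(int es)
{
    return 16 >> es;
}

/* Hypothetical helper type: where a destination element comes from. */
typedef struct {
    int from_v3;   /* 0: element is read from v2, 1: element is read from v3 */
    int src_idx;   /* element index inside the chosen source register */
} MergeSource;

/* Same arithmetic as the patch's loop body, for a single dst_idx. */
static MergeSource merge_source(int dst_idx, int es, int high)
{
    MergeSource m;

    assert(es >= 0 && es <= 3);
    assert(dst_idx >= 0 && dst_idx < num_elements(es));

    /* Even destination elements come from v2, odd ones from v3. */
    m.from_v3 = dst_idx % 2;
    /* MERGE HIGH reads the first half of the sources, MERGE LOW the second. */
    m.src_idx = dst_idx / 2 + (high ? 0 : num_elements(es) / 2);
    return m;
}
```

For example, with 32-bit elements (es = 2) and MERGE LOW, destination element 3 would be taken from element 3 of v3.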

Re: [Qemu-devel] [PATCH v1 19/33] s390x/tcg: Implement VECTOR MERGE (HIGH|LOW)

Posted by Richard Henderson 6 years, 11 months ago

On 2/26/19 3:39 AM, David Hildenbrand wrote:
> We cannot use gvec expansion as source and destination elements
> have different element numbers. So we'll expand using a fancy loop.
> Also, we have to take care of overlapping source and target registers and
> use a temporary register in case they do.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      |  4 +++
>  target/s390x/translate_vx.inc.c | 43 +++++++++++++++++++++++++++++++++
>  2 files changed, 47 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

Re: [Qemu-devel] [PATCH v1 19/33] s390x/tcg: Implement VECTOR MERGE (HIGH|LOW)

Posted by Richard Henderson 6 years, 11 months ago

On 2/26/19 3:39 AM, David Hildenbrand wrote:
> +    for (dst_idx = 0; dst_idx < NUM_VEC_ELEMENTS(es); dst_idx++) {
> +        src_idx = dst_idx / 2;
> +        if (!high) {
> +            src_idx += NUM_VEC_ELEMENTS(es) / 2;
> +        }
> +        if (dst_idx % 2 == 0) {
> +            read_vec_element_i64(tmp, v2, src_idx, es);
> +        } else {
> +            read_vec_element_i64(tmp, v3, src_idx, es);
> +        }
> +        write_vec_element_i64(tmp, dst_v, dst_idx, es);
> +    }

TODO: Note that you do not need a vector temporary here, so long as you load
both source elements before writing, and you iterate in the proper direction.

For VMRL, iterate forward as you do now.  The element access order for MO_32:

 read  v2: 2   3
 read  v3:   2   3
 write v1: 0 1 2 3

For VMRH, iterate backward:

 read  v2: 1   0
 read  v3:   1   0
 write v1: 3 2 1 0
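
The access order above can be sketched as a host-side model in plain C. This is not TCG code and not the final patch: Vec and merge_in_place() are hypothetical names, the element size is fixed to MO_32 for brevity, and v1 is allowed to alias v2 and/or v3.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_ELEMS 4   /* MO_32: four 32-bit elements per 128-bit vector */

/* Hypothetical stand-in for a vector register. */
typedef struct {
    uint32_t e[NUM_ELEMS];
} Vec;

/*
 * Merge without a vector temporary. Each step handles one pair of
 * destination elements: both source elements are loaded first and only
 * then written. VMRH (high != 0) walks the pairs backward, VMRL walks
 * them forward, so no source element is overwritten before it is read.
 */
static void merge_in_place(Vec *v1, const Vec *v2, const Vec *v3, int high)
{
    const int pairs = NUM_ELEMS / 2;

    assert(v1 && v2 && v3);
    for (int k = 0; k < pairs; k++) {
        int i = high ? pairs - 1 - k : k;   /* pair index, direction by op */
        int src = high ? i : pairs + i;     /* source element index */
        uint32_t a = v2->e[src];            /* load both sources ... */
        uint32_t b = v3->e[src];

        v1->e[2 * i] = a;                   /* ... before writing v1 */
        v1->e[2 * i + 1] = b;
    }
}
```

Calling merge_in_place(&v, &v, &v, 1) on {10, 11, 12, 13} should leave {10, 10, 11, 11} in v, which is what an implementation using a separate temporary would produce.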


r~

Re: [Qemu-devel] [PATCH v1 19/33] s390x/tcg: Implement VECTOR MERGE (HIGH|LOW)

Posted by David Hildenbrand 6 years, 11 months ago

On 27.02.19 17:20, Richard Henderson wrote:
> On 2/26/19 3:39 AM, David Hildenbrand wrote:
>> +    for (dst_idx = 0; dst_idx < NUM_VEC_ELEMENTS(es); dst_idx++) {
>> +        src_idx = dst_idx / 2;
>> +        if (!high) {
>> +            src_idx += NUM_VEC_ELEMENTS(es) / 2;
>> +        }
>> +        if (dst_idx % 2 == 0) {
>> +            read_vec_element_i64(tmp, v2, src_idx, es);
>> +        } else {
>> +            read_vec_element_i64(tmp, v3, src_idx, es);
>> +        }
>> +        write_vec_element_i64(tmp, dst_v, dst_idx, es);
>> +    }
> 
> TODO: Note that you do not need a vector temporary here, so long as you load
> both source elements before writing, and you iterate in the proper direction.
> 
> For VMRL, iterate forward as you do now.  The element access order for MO_32:
> 
>  read  v2: 2   3
>  read  v3:   2   3
>  write v1: 0 1 2 3
> 
> For VMRH, iterate backward:
> 
>  read  v2: 1   0
>  read  v3:   1   0
>  write v1: 3 2 1 0
> 
> 
> r~
> 

Let's have a look at VMRH when iterating forward (my brain is a little
slow in the morning):

v1[0] = v2[0]
v1[1] = v3[0]
v1[2] = v2[1]
v1[3] = v3[1]

If all registers overlap:

v1[0] = v1[0]
v1[1] = v1[0] -> v1[0] already modified
v1[2] = v1[1] -> v1[1] already modified
v1[3] = v1[1] -> v1[1] already modified

When iterating backwards:

v1[3] = v3[1]
v1[2] = v2[1]
v1[1] = v3[0]
v1[0] = v2[0]

If all registers overlap:

v1[3] = v1[1]
v1[2] = v1[1]
v1[1] = v1[0]
v1[0] = v1[0]


VMRL when iterating forward:

v1[0] = v2[2]
v1[1] = v3[2]
v1[2] = v2[3]
v1[3] = v3[3]

If all registers overlap:

v1[0] = v1[2]
v1[1] = v1[2]
v1[2] = v1[3]
v1[3] = v1[3]

Perfect :) I'll split up the two cases! Thanks!
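
The traces above can be reproduced with a small model in plain C. This is a sketch only, not QEMU code: vmrh_elementwise() is a hypothetical helper that copies one element at a time straight into v1, exactly as in the traces, with v1 allowed to alias v2 and v3.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of VECTOR MERGE HIGH on n elements without a
 * temporary register. backward == 0 iterates dst_idx from 0 upward,
 * backward != 0 iterates from n - 1 downward.
 */
static void vmrh_elementwise(uint32_t *v1, const uint32_t *v2,
                             const uint32_t *v3, int n, int backward)
{
    assert(n > 0 && n % 2 == 0);
    for (int k = 0; k < n; k++) {
        int dst_idx = backward ? n - 1 - k : k;
        int src_idx = dst_idx / 2;

        /* Even destination elements come from v2, odd ones from v3. */
        v1[dst_idx] = (dst_idx % 2 == 0) ? v2[src_idx] : v3[src_idx];
    }
}
```

With all three registers aliased and initial contents {10, 11, 12, 13}, the forward variant ends up with {10, 10, 10, 10} because element 1 is overwritten before it is read, while the backward variant yields the expected {10, 10, 11, 11}.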

-- 

Thanks,

David / dhildenb