[v1] s390x/tcg: Vector Instruction Support Part 1

[Qemu-devel] [PATCH v1 14/33] s390x/tcg: Implement VECTOR LOAD MULTIPLE

Posted by David Hildenbrand 6 years, 11 months ago

Also fairly easy to implement. One issue we have is that exceptions will
result in some vectors already being modified. At least handle it
consistently per vector by using a temporary vector. Good enough for
now, add a FIXME.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 26 ++++++++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 46a0739703..65ff8bbd2e 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1000,6 +1000,8 @@
     F(0xe721, VLGV,    VRS_c, V,   la2, 0, r1, 0, vlgv, 0, IF_VEC)
 /* VECTOR LOAD LOGICAL ELEMENT AND ZERO */
     F(0xe704, VLLEZ,   VRX,   V,   la2, 0, 0, 0, vllez, 0, IF_VEC)
+/* VECTOR LOAD MULTIPLE */
+    F(0xe736, VLM,     VRS_a, V,   la2, 0, 0, 0, vlm, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 301408d1f2..c9f57afd4a 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -377,3 +377,29 @@ static DisasJumpType op_vllez(DisasContext *s, DisasOps *o)
     gen_gvec_mov(get_field(s->fields, v1), TMP_VREG_0);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vlm(DisasContext *s, DisasOps *o)
+{
+    const uint8_t v3 = get_field(s->fields, v3);
+    uint8_t v1 = get_field(s->fields, v1);
+
+    while (v3 < v1 || (v3 - v1 + 1) > 16) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    /*
+     * FIXME: On exceptions we must not modify any vector.
+     */
+    for (;; v1++) {
+        load_vec_element(s, TMP_VREG_0, 0, o->addr1, MO_64);
+        gen_addi_and_wrap_i64(s, o->addr1, o->addr1, 8);
+        load_vec_element(s, TMP_VREG_0, 1, o->addr1, MO_64);
+        gen_gvec_mov(v1, TMP_VREG_0);
+        if (v1 == v3) {
+            break;
+        }
+        gen_addi_and_wrap_i64(s, o->addr1, o->addr1, 8);
+    }
+    return DISAS_NEXT;
+}
-- 
2.17.2

Re: [Qemu-devel] [PATCH v1 14/33] s390x/tcg: Implement VECTOR LOAD MULTIPLE

Posted by Richard Henderson 6 years, 11 months ago

On 2/26/19 3:38 AM, David Hildenbrand wrote:
> Also fairly easy to implement. One issue we have is that exceptions will
> result in some vectors already being modified. At least handle it
> consistently per vector by using a temporary vector. Good enough for
> now, add a FIXME.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      |  2 ++
>  target/s390x/translate_vx.inc.c | 26 ++++++++++++++++++++++++++
>  2 files changed, 28 insertions(+)

I suppose the fixme is good enough.  For the record, I think you could do the
check with just two loads -- the first and last quadword.  After that, none of
the other loads can fault, and you can store everything else into the
destination vectors as you read them.

Also missing for the fixme: MO_ALIGN{,_16}.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~

Re: [Qemu-devel] [PATCH v1 14/33] s390x/tcg: Implement VECTOR LOAD MULTIPLE

Posted by David Hildenbrand 6 years, 11 months ago

On 27.02.19 17:02, Richard Henderson wrote:
> On 2/26/19 3:38 AM, David Hildenbrand wrote:
>> Also fairly easy to implement. One issue we have is that exceptions will
>> result in some vectors already being modified. At least handle it
>> consistently per vector by using a temporary vector. Good enough for
>> now, add a FIXME.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  target/s390x/insn-data.def      |  2 ++
>>  target/s390x/translate_vx.inc.c | 26 ++++++++++++++++++++++++++
>>  2 files changed, 28 insertions(+)
> 
> I suppose the fixme is good enough.  For the record, I think you could do the
> check with just two loads -- the first and last quadword.  After that, none of
> the other loads can fault, and you can store everything else into the
> destination vectors as you read them.

Aren't such approaches prone to races if other VCPUs invalidate page
tables/TLB entries?

(or am I messing up things and the MMU of this VCPU won't be touched
while in this block and once we touched all applicable pages, it cannot
fail anymore?)

> 
> Also missing for the fixme: MO_ALIGN{,_16}.

Just like the other occurrence, I think MO_ALIGN would be wrong.

"Setting the alignment hint to a non-zero value
that doesn’t correspond to the alignment of the
second operand may reduce performance on
some models."

> 
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> 

Thanks!

> 
> r~
> 


-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [PATCH v1 14/33] s390x/tcg: Implement VECTOR LOAD MULTIPLE

Posted by Richard Henderson 6 years, 11 months ago

On 2/28/19 12:36 AM, David Hildenbrand wrote:
> On 27.02.19 17:02, Richard Henderson wrote:
>> On 2/26/19 3:38 AM, David Hildenbrand wrote:
>>> Also fairly easy to implement. One issue we have is that exceptions will
>>> result in some vectors already being modified. At least handle it
>>> consistently per vector by using a temporary vector. Good enough for
>>> now, add a FIXME.
>>>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>>  target/s390x/insn-data.def      |  2 ++
>>>  target/s390x/translate_vx.inc.c | 26 ++++++++++++++++++++++++++
>>>  2 files changed, 28 insertions(+)
>>
>> I suppose the fixme is good enough.  For the record, I think you could do the
>> check with just two loads -- the first and last quadword.  After that, none of
>> the other loads can fault, and you can store everything else into the
>> destination vectors as you read them.
> 
> Aren't such approaches prone to races if other VCPUs invalidate page
> tables/TLB entries?

No, because...

> (or am I messing up things and the MMU of this VCPU won't be touched
> while in this block and once we touched all applicable pages, it cannot
> fail anymore?)

Correct.

If vcpu 1 does a global invalidate, the time at which vcpu 2 acknowledges that
invalidate is somewhat fluid.  VCPU 2 will see an interrupt, exit at a TB
boundary, and then acknowledge.

VCPU 1 has to wait for the ack before it knows the operation is complete.

Thus no race within any given instruction's execution.


r~

Re: [Qemu-devel] [PATCH v1 14/33] s390x/tcg: Implement VECTOR LOAD MULTIPLE

Posted by David Hildenbrand 6 years, 11 months ago

On 28.02.19 18:15, Richard Henderson wrote:
> On 2/28/19 12:36 AM, David Hildenbrand wrote:
>> On 27.02.19 17:02, Richard Henderson wrote:
>>> On 2/26/19 3:38 AM, David Hildenbrand wrote:
>>>> Also fairly easy to implement. One issue we have is that exceptions will
>>>> result in some vectors already being modified. At least handle it
>>>> consistently per vector by using a temporary vector. Good enough for
>>>> now, add a FIXME.
>>>>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>>  target/s390x/insn-data.def      |  2 ++
>>>>  target/s390x/translate_vx.inc.c | 26 ++++++++++++++++++++++++++
>>>>  2 files changed, 28 insertions(+)
>>>
>>> I suppose the fixme is good enough.  For the record, I think you could do the
>>> check with just two loads -- the first and last quadword.  After that, none of
>>> the other loads can fault, and you can store everything else into the
>>> destination vectors as you read them.
>>
>> Aren't such approaches prone to races if other VCPUs invalidate page
>> tables/TLB entries?
> 
> No, because...
> 
>> (or am I messing up things and the MMU of this VCPU won't be touched
>> while in this block and once we touched all applicable pages, it cannot
>> fail anymore?)
> 
> Correct.
> 
> If vcpu 1 does a global invalidate, the time at which vcpu 2 acknowledges that
> invalidate is somewhat fluid.  VCPU 2 will see an interrupt, exit at a TB
> boundary, and then acknowledge.
> 
> VCPU 1 has to wait for the ack before it knows the operation is complete.
> 
> Thus no race within any given instruction's execution.

Okay, rings a bell, thanks! :)

So for writing from helpers, I can use probe_write(). What about testing
write access from TCG code?

I could do a load, followed by a store of the loaded value. This should
work in most cases (but eventually could be observed by somebody really
wanting to observe it - which is highly unlikely).


-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [PATCH v1 14/33] s390x/tcg: Implement VECTOR LOAD MULTIPLE

Posted by Richard Henderson 6 years, 11 months ago

On 2/28/19 11:05 AM, David Hildenbrand wrote:
> So for writing from helpers, I can use probe_write(). What about testing
> write access from TCG code?
> 
> I could do a load, followed by a store of the loaded value. This should
> work in most cases (but eventually could be observed by somebody really
> wanting to observe it - which is highly unlikely).

I would call a helper for probe_write.


r~