[v1] target/arm vector improvements

[PATCH 0/4] target/arm vector improvements

Posted by Richard Henderson 4 years, 6 months ago

The first patch has been seen before.

  https://patchwork.ozlabs.org/patch/1115039/

It had a bug and I didn't fix it right away and then forgot.
Fixed now; I had mixed up the operand ordering for aarch32.

The next 3 are something that I noticed while doing other stuff.

In particular, pmull is used heavily during https transfers.
While cloning a repository, the old code peaks at 27% of the
total runtime, as measured by perf top.  The new code does
not quite reach 3% repeating the same clone.

In addition, the new helper functions are in the form that
will be required for the implementation of SVE2.

The comment in patch 2 about ARMv8.4-DIT is perhaps a stretch,
but re-reading the pmull instruction description in the current
ARM ARM brought it to mind.

Since TCG is officially not in the security domain, it's
probably not a bug to just claim to support DIT without
actually doing anything to ensure the algorithms used are in
fact timing independent of the data.

On the other hand, I expect the bit distribution of stuff
going through these sort of hashing algorithms to approach
50% 1's and 0's, so I also don't think we gain anything on
average to terminate the loop early.

Thoughts on DIT specifically?


r~


Richard Henderson (4):
  target/arm: Vectorize USHL and SSHL
  target/arm: Convert PMUL.8 to gvec
  target/arm: Convert PMULL.64 to gvec
  target/arm: Convert PMULL.8 to gvec

 target/arm/helper-sve.h    |   2 +
 target/arm/helper.h        |  21 ++-
 target/arm/translate.h     |   6 +
 target/arm/neon_helper.c   | 117 -------------
 target/arm/translate-a64.c |  83 ++++-----
 target/arm/translate.c     | 350 ++++++++++++++++++++++++++++++++-----
 target/arm/vec_helper.c    | 211 ++++++++++++++++++++++
 7 files changed, 562 insertions(+), 228 deletions(-)

-- 
2.17.1

Re: [PATCH 0/4] target/arm vector improvements

Posted by Alex Bennée 4 years, 6 months ago

Richard Henderson <richard.henderson@linaro.org> writes:

> The first patch has been seen before.
>
>   https://patchwork.ozlabs.org/patch/1115039/
>
> It had a bug and I didn't fix it right away and then forgot.
> Fixed now; I had mixed up the operand ordering for aarch32.
>
> The next 3 are something that I noticed while doing other stuff.
>
> In particular, pmull is used heavily during https transfers.
> While cloning a repository, the old code peaks at 27% of the
> total runtime, as measured by perf top.  The new code does
> not quite reach 3% repeating the same clone.
>
> In addition, the new helper functions are in the form that
> will be required for the implementation of SVE2.
>
> The comment in patch 2 about ARMv8.4-DIT is perhaps a stretch,
> but re-reading the pmull instruction description in the current
> ARM ARM brought it to mind.
>
> Since TCG is officially not in the security domain, it's
> probably not a bug to just claim to support DIT without
> actually doing anything to ensure the algorithms used are in
> fact timing independent of the data.
>
> On the other hand, I expect the bit distribution of stuff
> going through these sort of hashing algorithms to approach
> 50% 1's and 0's, so I also don't think we gain anything on
> average to terminate the loop early.
>
> Thoughts on DIT specifically?

It would be an interesting academic exercise to see if someone could
exploit the JIT from the guest but as you say not a security domain
so....

>
>
> r~
>
>
> Richard Henderson (4):
>   target/arm: Vectorize USHL and SSHL
>   target/arm: Convert PMUL.8 to gvec
>   target/arm: Convert PMULL.64 to gvec
>   target/arm: Convert PMULL.8 to gvec
>
>  target/arm/helper-sve.h    |   2 +
>  target/arm/helper.h        |  21 ++-
>  target/arm/translate.h     |   6 +
>  target/arm/neon_helper.c   | 117 -------------
>  target/arm/translate-a64.c |  83 ++++-----
>  target/arm/translate.c     | 350 ++++++++++++++++++++++++++++++++-----
>  target/arm/vec_helper.c    | 211 ++++++++++++++++++++++
>  7 files changed, 562 insertions(+), 228 deletions(-)


--
Alex Bennée

Re: [PATCH 0/4] target/arm vector improvements

Posted by no-reply@patchew.org 4 years, 6 months ago

Patchew URL: https://patchew.org/QEMU/20191017044232.27601-1-richard.henderson@linaro.org/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [PATCH 0/4] target/arm vector improvements
Type: series
Message-id: 20191017044232.27601-1-richard.henderson@linaro.org

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
3ea6653 target/arm: Convert PMULL.8 to gvec
b6cf8ea target/arm: Convert PMULL.64 to gvec
5eddaf2 target/arm: Convert PMUL.8 to gvec
bd8f967 target/arm: Vectorize USHL and SSHL

=== OUTPUT BEGIN ===
1/4 Checking commit bd8f967551b6 (target/arm: Vectorize USHL and SSHL)
ERROR: trailing statements should be on next line
#160: FILE: target/arm/translate.c:3583:
+            case 2: gen_ushl_i32(var, var, shift); break;

ERROR: trailing statements should be on next line
#167: FILE: target/arm/translate.c:3589:
+            case 2: gen_sshl_i32(var, var, shift); break;

total: 2 errors, 0 warnings, 571 lines checked

Patch 1/4 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

2/4 Checking commit 5eddaf2661e3 (target/arm: Convert PMUL.8 to gvec)
3/4 Checking commit b6cf8ea095db (target/arm: Convert PMULL.64 to gvec)
4/4 Checking commit 3ea665336e6c (target/arm: Convert PMULL.8 to gvec)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20191017044232.27601-1-richard.henderson@linaro.org/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

Re: [PATCH 0/4] target/arm vector improvements

Posted by Peter Maydell 4 years, 5 months ago

On Thu, 17 Oct 2019 at 05:42, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> The first patch has been seen before.
>
>   https://patchwork.ozlabs.org/patch/1115039/
>
> It had a bug and I didn't fix it right away and then forgot.
> Fixed now; I had mixed up the operand ordering for aarch32.

Since Alex had a nit on patch 1 I'm going to assume you'll
respin this series once we're reopen for 5.0 development.

> The comment in patch 2 about ARMv8.4-DIT is perhaps a stretch,
> but re-reading the pmull instruction description in the current
> ARM ARM brought it to mind.
>
> Since TCG is officially not in the security domain, it's
> probably not a bug to just claim to support DIT without
> actually doing anything to ensure the algorithms used are in
> fact timing independent of the data.
>
> On the other hand, I expect the bit distribution of stuff
> going through these sort of hashing algorithms to approach
> 50% 1's and 0's, so I also don't think we gain anything on
> average to terminate the loop early.
>
> Thoughts on DIT specifically?

I think we have two plausible choices for DIT:
 (1) don't implement it, since we can't make the timing
     guarantees it suggests
 (2) implement it but not actually change our behaviour:
     this is technically a lie to the guest, but it means
     that guest OS handling of the PSTATE.DIT bit etc can
     be tested

At the moment we by default do (1). I think we should
probably postpone doing (2) until somebody actually
asks us for it.

thanks
-- PMM

Re: [PATCH 0/4] target/arm vector improvements

Posted by Richard Henderson 4 years, 5 months ago

On 11/18/19 5:26 PM, Peter Maydell wrote:
> On Thu, 17 Oct 2019 at 05:42, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> The first patch has been seen before.
>>
>>   https://patchwork.ozlabs.org/patch/1115039/
>>
>> It had a bug and I didn't fix it right away and then forgot.
>> Fixed now; I had mixed up the operand ordering for aarch32.
> 
> Since Alex had a nit on patch 1 I'm going to assume you'll
> respin this series once we're reopen for 5.0 development.

Oops, I forgot about it.  Yes, we can postpone to 5.0.

> I think we have two plausible choices for DIT:
>  (1) don't implement it, since we can't make the timing
>      guarantees it suggests
>  (2) implement it but not actually change our behaviour:
>      this is technically a lie to the guest, but it means
>      that guest OS handling of the PSTATE.DIT bit etc can
>      be tested
> 
> At the moment we by default do (1). I think we should
> probably postpone doing (2) until somebody actually
> asks us for it.

Fair enough.


r~