[Qemu-devel] [PATCH v5 00/17] target/arm: Reduce overhead of cpu_get_tb_cpu_state

Richard Henderson posted 17 patches 4 years, 8 months ago
Test docker-clang@ubuntu passed
Test FreeBSD failed
Test asan failed
Test docker-mingw@fedora passed
Test checkpatch passed
Failed in applying to current master (apply log)
There is a newer version of this series
target/arm/cpu.h           |  84 +++++---
target/arm/helper.h        |   4 +
target/arm/internals.h     |   9 +
linux-user/syscall.c       |   1 +
target/arm/cpu.c           |   1 +
target/arm/helper-a64.c    |   3 +
target/arm/helper.c        | 383 ++++++++++++++++++++++++-------------
target/arm/machine.c       |   1 +
target/arm/op_helper.c     |   1 +
target/arm/translate-a64.c |   6 +-
target/arm/translate.c     |  18 +-
11 files changed, 341 insertions(+), 170 deletions(-)
[Qemu-devel] [PATCH v5 00/17] target/arm: Reduce overhead of cpu_get_tb_cpu_state
Posted by Richard Henderson 4 years, 8 months ago
Changes since v4:
  * Split patch 1 into 15 smaller patches.
  * Cache the new DEBUG_TARGET_EL field.
  * Split out m-profile hflags separately from a-profile 32-bit.
  * Move around non-cached tb flags as well, avoiding repetitive
    checks for m-profile or other mutually exclusive conditions.

  I haven't officially re-run the performance test quoted in the
  last patch, but I have eyeballed "perf top", and have dug into
  the compiled code a bit, which resulted in a few of the new
  cleanup patches (e.g. cs_base, arm_mmu_idx_el, and
  arm_cpu_data_is_big_endian).

Changes since v3:
  * Rebase.
  * Do not cache XSCALE_CPAR now that it overlaps VECSTRIDE.
  * Leave the new v7m bits as uncached.  I haven't figured
    out all of the ways fpccr is modified.

Changes since v2:
  * Do not cache VECLEN, VECSTRIDE, VFPEN.
    These variables come from VFP_FPSCR and VFP_FPEXC, not from
    system control registers.
  * Move HANDLER and STACKCHECK to rebuild_hflags_a32,
    instead of building them in rebuild_hflags_common.

Changes since v1:
  * Apparently I had started a last-minute API change, and failed to
    covert all of the users, and also failed to re-test afterward.
  * Retain assertions for --enable-debug-tcg.

Richard Henderson (17):
  target/arm: Split out rebuild_hflags_common
  target/arm: Split out rebuild_hflags_a64
  target/arm: Split out rebuild_hflags_common_32
  target/arm: Split arm_cpu_data_is_big_endian
  target/arm: Split out rebuild_hflags_m32
  target/arm: Reduce tests vs M-profile in cpu_get_tb_cpu_state
  target/arm: Split out rebuild_hflags_a32
  target/arm: Split out rebuild_hflags_aprofile
  target/arm: Hoist XSCALE_CPAR, VECLEN, VECSTRIDE in
    cpu_get_tb_cpu_state
  target/arm: Simplify set of PSTATE_SS in cpu_get_tb_cpu_state
  target/arm: Hoist computation of TBFLAG_A32.VFPEN
  target/arm: Add arm_rebuild_hflags
  target/arm: Split out arm_mmu_idx_el
  target/arm: Hoist store to cs_base in cpu_get_tb_cpu_state
  target/arm: Add HELPER(rebuild_hflags_{a32,a64,m32})
  target/arm: Rebuild hflags at EL changes and MSR writes
  target/arm: Rely on hflags correct in cpu_get_tb_cpu_state

 target/arm/cpu.h           |  84 +++++---
 target/arm/helper.h        |   4 +
 target/arm/internals.h     |   9 +
 linux-user/syscall.c       |   1 +
 target/arm/cpu.c           |   1 +
 target/arm/helper-a64.c    |   3 +
 target/arm/helper.c        | 383 ++++++++++++++++++++++++-------------
 target/arm/machine.c       |   1 +
 target/arm/op_helper.c     |   1 +
 target/arm/translate-a64.c |   6 +-
 target/arm/translate.c     |  18 +-
 11 files changed, 341 insertions(+), 170 deletions(-)

-- 
2.17.1


Re: [Qemu-devel] [PATCH v5 00/17] target/arm: Reduce overhead of cpu_get_tb_cpu_state
Posted by Richard Henderson 4 years, 8 months ago
On 8/20/19 2:07 PM, Richard Henderson wrote:
> Changes since v4:
>   * Split patch 1 into 15 smaller patches.
>   * Cache the new DEBUG_TARGET_EL field.
>   * Split out m-profile hflags separately from a-profile 32-bit.
>   * Move around non-cached tb flags as well, avoiding repetitive
>     checks for m-profile or other mutually exclusive conditions.

Just after I posted this, I started rebasing my VHE patch set on top, and I
found that the new DEBUG_TARGET_EL field has used The Last Bit, so that I could
not add the one bit that I need for VHE.

However, while working on this patch set, I noticed that we have a lot of
unnecessary overlap between A- and M- profile in the TBFLAGs.  Thus point 4
above and the completely separate rebuild_hflags_m32().

If we rearrange things like the appended, then we recover 4 bits.

Thoughts?


r~
Re: [Qemu-devel] [PATCH v5 00/17] target/arm: Reduce overhead of cpu_get_tb_cpu_state
Posted by Peter Maydell 4 years, 7 months ago
On Wed, 21 Aug 2019 at 00:54, Richard Henderson
<richard.henderson@linaro.org> wrote:
> However, while working on this patch set, I noticed that we have a lot of
> unnecessary overlap between A- and M- profile in the TBFLAGs.  Thus point 4
> above and the completely separate rebuild_hflags_m32().
>
> If we rearrange things like the appended, then we recover 4 bits.

You can't make the THUMB bit A-profile only: we need it in
M-profile too, so we can correctly generate code that takes
the InvalidState exception for attempts to execute with the
Thumb bit not set.

If you want to make VFPEN be A-profile only you need to
do something so we don't look at it for M-profile: currently
we set it always-1 for M-profile so we don't trip the code
that causes us to take an exception if it's 0.

Otherwise seems reasonable. My overall question is: how bad
is it if we just start using bits in the cs_base word?
If we try to get too tricky with using the same bits for
different purposes it opens the door for accidentally writing
code where we use a bit that isn't actually set correctly
for all the situations where we're using it.

thanks
-- PMM

Re: [Qemu-devel] [PATCH v5 00/17] target/arm: Reduce overhead of cpu_get_tb_cpu_state
Posted by Richard Henderson 4 years, 7 months ago
On 9/4/19 3:48 AM, Peter Maydell wrote:
> On Wed, 21 Aug 2019 at 00:54, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> However, while working on this patch set, I noticed that we have a lot of
>> unnecessary overlap between A- and M- profile in the TBFLAGs.  Thus point 4
>> above and the completely separate rebuild_hflags_m32().
>>
>> If we rearrange things like the appended, then we recover 4 bits.
> 
> You can't make the THUMB bit A-profile only: we need it in
> M-profile too, so we can correctly generate code that takes
> the InvalidState exception for attempts to execute with the
> Thumb bit not set.

Yes, IIRC I found that when I went to make this work, rather than merely a
sketch through the header file.

> If you want to make VFPEN be A-profile only you need to
> do something so we don't look at it for M-profile: currently
> we set it always-1 for M-profile so we don't trip the code
> that causes us to take an exception if it's 0.
> 
> Otherwise seems reasonable. My overall question is: how bad
> is it if we just start using bits in the cs_base word?

*shrug* Even then we don't want to waste bits; there are only 32 of them
remaining, and after that there's no room at all for expansion.

> If we try to get too tricky with using the same bits for
> different purposes it opens the door for accidentally writing
> code where we use a bit that isn't actually set correctly
> for all the situations where we're using it.

Thankfully, we don't touch these bits in many places.  We set them in the
generator function, and we read them at the beginning of the translator.


r~