[PATCH v8 00/22] target/arm: Reduce overhead of cpu_get_tb_cpu_state

Richard Henderson posted 22 patches 6 years ago
Test asan passed
Test checkpatch passed
Test FreeBSD passed
Test docker-mingw@fedora passed
Test docker-clang@ubuntu passed
Test docker-quick@centos7 passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20191018174431.1784-1-richard.henderson@linaro.org
Maintainers: Peter Maydell <peter.maydell@linaro.org>
There is a newer version of this series
target/arm/cpu.h           |  84 +++++---
target/arm/helper.h        |   4 +
target/arm/internals.h     |   9 +
hw/intc/armv7m_nvic.c      |  22 ++-
linux-user/syscall.c       |   1 +
target/arm/cpu.c           |   1 +
target/arm/helper-a64.c    |   3 +
target/arm/helper.c        | 393 ++++++++++++++++++++++++-------------
target/arm/m_helper.c      |   6 +
target/arm/machine.c       |   1 +
target/arm/op_helper.c     |   4 +
target/arm/translate-a64.c |  13 +-
target/arm/translate.c     |  33 +++-
13 files changed, 390 insertions(+), 184 deletions(-)
[PATCH v8 00/22] target/arm: Reduce overhead of cpu_get_tb_cpu_state
Posted by Richard Henderson 6 years ago
Changes since v7:
  * Rebuild hflags for all successful nvic writes (Peter).
  * Rebuild hflags for Xscale sctlr writes (Peter).

Changes since v6:
  * Regen hflags in two more places for m-profile (patch 19).

Changes since v5:
  * Fix the debug assertion ifdef in the final patch.
  * Add more calls to arm_rebuild_hflags: CPSR and M-profile
    These become two new patches, 18 & 19.
  * Update some comments per review. (Alex)

Changes since v4:
  * Split patch 1 into 15 smaller patches.
  * Cache the new DEBUG_TARGET_EL field.
  * Split out m-profile hflags separately from a-profile 32-bit.
  * Move around non-cached tb flags as well, avoiding repetitive
    checks for m-profile or other mutually exclusive conditions.

  I haven't officially re-run the performance test quoted in the
  last patch, but I have eyeballed "perf top", and have dug into
  the compiled code a bit, which resulted in a few of the new
  cleanup patches (e.g. cs_base, arm_mmu_idx_el, and
  arm_cpu_data_is_big_endian).
...


r~


Richard Henderson (22):
  target/arm: Split out rebuild_hflags_common
  target/arm: Split out rebuild_hflags_a64
  target/arm: Split out rebuild_hflags_common_32
  target/arm: Split arm_cpu_data_is_big_endian
  target/arm: Split out rebuild_hflags_m32
  target/arm: Reduce tests vs M-profile in cpu_get_tb_cpu_state
  target/arm: Split out rebuild_hflags_a32
  target/arm: Split out rebuild_hflags_aprofile
  target/arm: Hoist XSCALE_CPAR, VECLEN, VECSTRIDE in
    cpu_get_tb_cpu_state
  target/arm: Simplify set of PSTATE_SS in cpu_get_tb_cpu_state
  target/arm: Hoist computation of TBFLAG_A32.VFPEN
  target/arm: Add arm_rebuild_hflags
  target/arm: Split out arm_mmu_idx_el
  target/arm: Hoist store to cs_base in cpu_get_tb_cpu_state
  target/arm: Add HELPER(rebuild_hflags_{a32,a64,m32})
  target/arm: Rebuild hflags at EL changes
  target/arm: Rebuild hflags at MSR writes
  target/arm: Rebuild hflags at CPSR writes
  target/arm: Rebuild hflags at Xscale SCTLR writes
  target/arm: Rebuild hflags for M-profile
  target/arm: Rebuild hflags for M-profile NVIC
  target/arm: Rely on hflags correct in cpu_get_tb_cpu_state

 target/arm/cpu.h           |  84 +++++---
 target/arm/helper.h        |   4 +
 target/arm/internals.h     |   9 +
 hw/intc/armv7m_nvic.c      |  22 ++-
 linux-user/syscall.c       |   1 +
 target/arm/cpu.c           |   1 +
 target/arm/helper-a64.c    |   3 +
 target/arm/helper.c        | 393 ++++++++++++++++++++++++-------------
 target/arm/m_helper.c      |   6 +
 target/arm/machine.c       |   1 +
 target/arm/op_helper.c     |   4 +
 target/arm/translate-a64.c |  13 +-
 target/arm/translate.c     |  33 +++-
 13 files changed, 390 insertions(+), 184 deletions(-)

-- 
2.17.1


Re: [PATCH v8 00/22] target/arm: Reduce overhead of cpu_get_tb_cpu_state
Posted by Peter Maydell 6 years ago
On Fri, 18 Oct 2019 at 18:44, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Changes since v7:
>   * Rebuild hflags for all successful nvic writes (Peter).
>   * Rebuild hflags for Xscale sctlr writes (Peter).
>
> Changes since v6:
>   * Regen hflags in two more places for m-profile (patch 19).
>
> Changes since v5:
>   * Fix the debug assertion ifdef in the final patch.
>   * Add more calls to arm_rebuild_hflags: CPSR and M-profile
>     These become two new patches, 18 & 19.
>   * Update some comments per review. (Alex)
>
> Changes since v4:
>   * Split patch 1 into 15 smaller patches.
>   * Cache the new DEBUG_TARGET_EL field.
>   * Split out m-profile hflags separately from a-profile 32-bit.
>   * Move around non-cached tb flags as well, avoiding repetitive
>     checks for m-profile or other mutually exclusive conditions.
>
>   I haven't officially re-run the performance test quoted in the
>   last patch, but I have eyeballed "perf top", and have dug into
>   the compiled code a bit, which resulted in a few of the new
>   cleanup patches (e.g. cs_base, arm_mmu_idx_el, and
>   arm_cpu_data_is_big_endian).
> ...
>
>
> r~



Applied to target-arm.next, thanks.

-- PMM

Re: [PATCH v8 00/22] target/arm: Reduce overhead of cpu_get_tb_cpu_state
Posted by Peter Maydell 6 years ago
On Tue, 22 Oct 2019 at 13:47, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Fri, 18 Oct 2019 at 18:44, Richard Henderson
> <richard.henderson@linaro.org> wrote:
> >
> > Changes since v7:
> >   * Rebuild hflags for all successful nvic writes (Peter).
> >   * Rebuild hflags for Xscale sctlr writes (Peter).
> >
> > Changes since v6:
> >   * Regen hflags in two more places for m-profile (patch 19).
> >
> > Changes since v5:
> >   * Fix the debug assertion ifdef in the final patch.
> >   * Add more calls to arm_rebuild_hflags: CPSR and M-profile
> >     These become two new patches, 18 & 19.
> >   * Update some comments per review. (Alex)
> >
> > Changes since v4:
> >   * Split patch 1 into 15 smaller patches.
> >   * Cache the new DEBUG_TARGET_EL field.
> >   * Split out m-profile hflags separately from a-profile 32-bit.
> >   * Move around non-cached tb flags as well, avoiding repetitive
> >     checks for m-profile or other mutually exclusive conditions.
> >
> >   I haven't officially re-run the performance test quoted in the
> >   last patch, but I have eyeballed "perf top", and have dug into
> >   the compiled code a bit, which resulted in a few of the new
> >   cleanup patches (e.g. cs_base, arm_mmu_idx_el, and
> >   arm_cpu_data_is_big_endian).
> > ...
> >
> >
> > r~
>
>
>
> Applied to target-arm.next, thanks.

Turns out this asserts in qemu-armeb :-(

/home/petmay01/linaro/qemu-for-merges/build/all-linux-static/armeb-linux-user/qemu-armeb
-L ./gnemul/qemu-armeb armeb/ls -l dummyfile
qemu-armeb: /home/petmay01/linaro/qemu-for-merges/target/arm/helper.c:11267:
cpu_get_tb_cpu_state: Assertion `flags ==
rebuild_hflags_internal(env)' failed.
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
Segmentation fault (core dumped)

Dropping this series again for the moment.

thanks
-- PMM

Re: [PATCH v8 00/22] target/arm: Reduce overhead of cpu_get_tb_cpu_state
Posted by Richard Henderson 6 years ago
On 10/22/19 11:38 AM, Peter Maydell wrote:
> Turns out this asserts in qemu-armeb :-(
> 
> /home/petmay01/linaro/qemu-for-merges/build/all-linux-static/armeb-linux-user/qemu-armeb
> -L ./gnemul/qemu-armeb armeb/ls -l dummyfile
> qemu-armeb: /home/petmay01/linaro/qemu-for-merges/target/arm/helper.c:11267:
> cpu_get_tb_cpu_state: Assertion `flags ==
> rebuild_hflags_internal(env)' failed.
> qemu: uncaught target signal 11 (Segmentation fault) - core dumped
> Segmentation fault (core dumped)
> 
> Dropping this series again for the moment.

Argh!  I had forgotten that we have no testing of armeb in check-tcg.

Yes, I see now that we need a recompute in linux-user/{aarch64,arm}/cpu_loop.c
specific to TARGET_WORDS_BIGENDIAN.


r~

Re: [PATCH v8 00/22] target/arm: Reduce overhead of cpu_get_tb_cpu_state
Posted by Alex Bennée 6 years ago
Richard Henderson <richard.henderson@linaro.org> writes:

> On 10/22/19 11:38 AM, Peter Maydell wrote:
>> Turns out this asserts in qemu-armeb :-(
>>
>> /home/petmay01/linaro/qemu-for-merges/build/all-linux-static/armeb-linux-user/qemu-armeb
>> -L ./gnemul/qemu-armeb armeb/ls -l dummyfile
>> qemu-armeb: /home/petmay01/linaro/qemu-for-merges/target/arm/helper.c:11267:
>> cpu_get_tb_cpu_state: Assertion `flags ==
>> rebuild_hflags_internal(env)' failed.
>> qemu: uncaught target signal 11 (Segmentation fault) - core dumped
>> Segmentation fault (core dumped)
>>
>> Dropping this series again for the moment.
> Argh!  I had forgotten that we have no testing of armeb in check-tcg.

Does it need it's own toolchain or can it be done with flags?

>
> Yes, I see now that we need a recompute in linux-user/{aarch64,arm}/cpu_loop.c
> specific to TARGET_WORDS_BIGENDIAN.
>
>
> r~


--
Alex Bennée

Re: [PATCH v8 00/22] target/arm: Reduce overhead of cpu_get_tb_cpu_state
Posted by Richard Henderson 6 years ago
On 10/23/19 11:17 AM, Alex Bennée wrote:
>>> Dropping this series again for the moment.
>> Argh!  I had forgotten that we have no testing of armeb in check-tcg.
> 
> Does it need it's own toolchain or can it be done with flags?

I think the compiler only needs flags, so we might be able to gin something up
for {aarch64_eb,armeb}-linux-user from __start.


r~

Re: [PATCH v8 00/22] target/arm: Reduce overhead of cpu_get_tb_cpu_state
Posted by Alex Bennée 6 years ago
For system emulation? Sure. I think linux-user is harder because you need
the be crt and libs packaged.

On Wed, 23 Oct 2019, 17:13 Richard Henderson, <richard.henderson@linaro.org>
wrote:

> On 10/23/19 11:17 AM, Alex Bennée wrote:
> >>> Dropping this series again for the moment.
> >> Argh!  I had forgotten that we have no testing of armeb in check-tcg.
> >
> > Does it need it's own toolchain or can it be done with flags?
>
> I think the compiler only needs flags, so we might be able to gin
> something up
> for {aarch64_eb,armeb}-linux-user from __start.
>
>
> r~
>