[PATCH v6 00/20] target/arm: Reduce overhead of cpu_get_tb_cpu_state

Richard Henderson posted 20 patches 4 years, 6 months ago
Test asan passed
Test checkpatch passed
Test FreeBSD passed
Test docker-mingw@fedora passed
Test docker-clang@ubuntu passed
Test docker-quick@centos7 passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20191011155546.14342-1-richard.henderson@linaro.org
Maintainers: Riku Voipio <riku.voipio@iki.fi>, Laurent Vivier <laurent@vivier.eu>, Peter Maydell <peter.maydell@linaro.org>
There is a newer version of this series
target/arm/cpu.h           |  84 +++++---
target/arm/helper.h        |   4 +
target/arm/internals.h     |   9 +
linux-user/syscall.c       |   1 +
target/arm/cpu.c           |   1 +
target/arm/helper-a64.c    |   3 +
target/arm/helper.c        | 383 ++++++++++++++++++++++++-------------
target/arm/m_helper.c      |   6 +
target/arm/machine.c       |   1 +
target/arm/op_helper.c     |   4 +
target/arm/translate-a64.c |  13 +-
target/arm/translate.c     |  28 ++-
12 files changed, 363 insertions(+), 174 deletions(-)
[PATCH v6 00/20] target/arm: Reduce overhead of cpu_get_tb_cpu_state
Posted by Richard Henderson 4 years, 6 months ago
Changes since v5:
  * Fix the debug assertion ifdef in the final patch.
  * Add more calls to arm_rebuild_hflags: CPSR and M-profile
    These become two new patches, 18 & 19.
  * Update some comments per review. (Alex)

Changes since v4:
  * Split patch 1 into 15 smaller patches.
  * Cache the new DEBUG_TARGET_EL field.
  * Split out m-profile hflags separately from a-profile 32-bit.
  * Move around non-cached tb flags as well, avoiding repetitive
    checks for m-profile or other mutually exclusive conditions.

  I haven't officially re-run the performance test quoted in the
  last patch, but I have eyeballed "perf top", and have dug into
  the compiled code a bit, which resulted in a few of the new
  cleanup patches (e.g. cs_base, arm_mmu_idx_el, and
  arm_cpu_data_is_big_endian).

Changes since v3:
  * Rebase.
  * Do not cache XSCALE_CPAR now that it overlaps VECSTRIDE.
  * Leave the new v7m bits as uncached.  I haven't figured
    out all of the ways fpccr is modified.

Changes since v2:
  * Do not cache VECLEN, VECSTRIDE, VFPEN.
    These variables come from VFP_FPSCR and VFP_FPEXC, not from
    system control registers.
  * Move HANDLER and STACKCHECK to rebuild_hflags_a32,
    instead of building them in rebuild_hflags_common.

Changes since v1:
  * Apparently I had started a last-minute API change, and failed to
    covert all of the users, and also failed to re-test afterward.
  * Retain assertions for --enable-debug-tcg.


Richard Henderson (20):
  target/arm: Split out rebuild_hflags_common
  target/arm: Split out rebuild_hflags_a64
  target/arm: Split out rebuild_hflags_common_32
  target/arm: Split arm_cpu_data_is_big_endian
  target/arm: Split out rebuild_hflags_m32
  target/arm: Reduce tests vs M-profile in cpu_get_tb_cpu_state
  target/arm: Split out rebuild_hflags_a32
  target/arm: Split out rebuild_hflags_aprofile
  target/arm: Hoist XSCALE_CPAR, VECLEN, VECSTRIDE in
    cpu_get_tb_cpu_state
  target/arm: Simplify set of PSTATE_SS in cpu_get_tb_cpu_state
  target/arm: Hoist computation of TBFLAG_A32.VFPEN
  target/arm: Add arm_rebuild_hflags
  target/arm: Split out arm_mmu_idx_el
  target/arm: Hoist store to cs_base in cpu_get_tb_cpu_state
  target/arm: Add HELPER(rebuild_hflags_{a32,a64,m32})
  target/arm: Rebuild hflags at EL changes
  target/arm: Rebuild hflags at MSR writes
  target/arm: Rebuild hflags at CPSR writes
  target/arm: Rebuild hflags for M-profile.
  target/arm: Rely on hflags correct in cpu_get_tb_cpu_state

 target/arm/cpu.h           |  84 +++++---
 target/arm/helper.h        |   4 +
 target/arm/internals.h     |   9 +
 linux-user/syscall.c       |   1 +
 target/arm/cpu.c           |   1 +
 target/arm/helper-a64.c    |   3 +
 target/arm/helper.c        | 383 ++++++++++++++++++++++++-------------
 target/arm/m_helper.c      |   6 +
 target/arm/machine.c       |   1 +
 target/arm/op_helper.c     |   4 +
 target/arm/translate-a64.c |  13 +-
 target/arm/translate.c     |  28 ++-
 12 files changed, 363 insertions(+), 174 deletions(-)

-- 
2.17.1


Re: [PATCH v6 00/20] target/arm: Reduce overhead of cpu_get_tb_cpu_state
Posted by Peter Maydell 4 years, 6 months ago
On Fri, 11 Oct 2019 at 16:55, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Richard Henderson (20):
>   target/arm: Split out rebuild_hflags_common
>   target/arm: Split out rebuild_hflags_a64
>   target/arm: Split out rebuild_hflags_common_32
>   target/arm: Split arm_cpu_data_is_big_endian
>   target/arm: Split out rebuild_hflags_m32
>   target/arm: Reduce tests vs M-profile in cpu_get_tb_cpu_state
>   target/arm: Split out rebuild_hflags_a32
>   target/arm: Split out rebuild_hflags_aprofile
>   target/arm: Hoist XSCALE_CPAR, VECLEN, VECSTRIDE in
>     cpu_get_tb_cpu_state
>   target/arm: Simplify set of PSTATE_SS in cpu_get_tb_cpu_state
>   target/arm: Hoist computation of TBFLAG_A32.VFPEN
>   target/arm: Add arm_rebuild_hflags
>   target/arm: Split out arm_mmu_idx_el
>   target/arm: Hoist store to cs_base in cpu_get_tb_cpu_state
>   target/arm: Add HELPER(rebuild_hflags_{a32,a64,m32})
>   target/arm: Rebuild hflags at EL changes
>   target/arm: Rebuild hflags at MSR writes
>   target/arm: Rebuild hflags at CPSR writes
>   target/arm: Rebuild hflags for M-profile.
>   target/arm: Rely on hflags correct in cpu_get_tb_cpu_state

Don't we also need to do something to rebuild the hflags
for M-profile writes to the memory mapped system registers?
For instance rebuild_hflags_m32() bakes in state which
cares about env->v7m.ccr, which is set via nvic_writel(),
but I don't see anything whereby the write to the NVIC
register triggers a rebuild of the hflags value. Maybe I
missed it?

thanks
-- PMM

Re: [PATCH v6 00/20] target/arm: Reduce overhead of cpu_get_tb_cpu_state
Posted by Richard Henderson 4 years, 6 months ago
On 10/17/19 8:26 AM, Peter Maydell wrote:
> Don't we also need to do something to rebuild the hflags
> for M-profile writes to the memory mapped system registers?
> For instance rebuild_hflags_m32() bakes in state which
> cares about env->v7m.ccr, which is set via nvic_writel(),
> but I don't see anything whereby the write to the NVIC
> register triggers a rebuild of the hflags value. Maybe I
> missed it?

How do you end the TB after nvic_writel(), so that the new behaviour can be
noticed right now?  We can call arm_rebuild_hflags() within nvic_writel(), but
it still needs to be recognized somehow.  I would expect to place one rebuild
in the place you end the TB...


r~

Re: [PATCH v6 00/20] target/arm: Reduce overhead of cpu_get_tb_cpu_state
Posted by Peter Maydell 4 years, 6 months ago
On Thu, 17 Oct 2019 at 17:25, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 10/17/19 8:26 AM, Peter Maydell wrote:
> > Don't we also need to do something to rebuild the hflags
> > for M-profile writes to the memory mapped system registers?
> > For instance rebuild_hflags_m32() bakes in state which
> > cares about env->v7m.ccr, which is set via nvic_writel(),
> > but I don't see anything whereby the write to the NVIC
> > register triggers a rebuild of the hflags value. Maybe I
> > missed it?
>
> How do you end the TB after nvic_writel(), so that the new behaviour can be
> noticed right now?  We can call arm_rebuild_hflags() within nvic_writel(), but
> it still needs to be recognized somehow.  I would expect to place one rebuild
> in the place you end the TB...

To copy my reply from irc, just for the record:

we don't end the TB specifically after a write to the NVIC.
This is permitted by the architecture: rules R_BMLS and R_WQQB from
DDI05533B.h v8M Arm ARM:

# R_BMLS The side effects of any access to the SCS that performs
# a context-altering operation take effect when the access
# completes. A DSB instruction can be used to guarantee completion
# of a previous SCS access.
#
# R_WQQB A context synchronization event guarantees that the side
# effects of a previous SCS access are visible to all instructions
# in program order following the context synchronization event.

A context synchronization event is basically an ISB or an
exception entry/exit. Since we do end the TLB on an ISB
or exception entry/exit I think we're within the architectural
rules, but we do need to do a rebuild-hflags somewhere.
At the end of  nvic_sysreg_write() seems like the best place as
it means that QEMU is internally consistent and not likely to
trip over itself.

thanks
-- PMM