[Qemu-devel] [PATCH 0/8] target/alpha cleanups

Richard Henderson posted 8 patches 6 years, 8 months ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20170714001819.1660-1-rth@twiddle.net
Posted by Richard Henderson 6 years, 8 months ago
The new title holder for perf top is helper_lookup_tb_ptr.
Those targets that have a complicated cpu_get_tb_cpu_state
function are going to regret that.

This cleans up the Alpha version of that function such that it is
just two loads and one mask.  Which is one practically-free mask
away from being as minimal as one can get.
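
As a rough illustration of the "two loads and one mask" shape (presumably
the loads of the PC and of a single packed flags word), here is a minimal
sketch. Every name, the struct layout and the bit positions below are
hypothetical stand-ins chosen for the example; this is not the code from
the series.

```c
#include <stdint.h>

/* Hypothetical packed flags word: each former flag byte gets its own bits. */
#define SKETCH_FLAG_PAL_MODE  (1u << 0)   /* assumed: executing PALcode     */
#define SKETCH_FLAG_USER      (1u << 8)   /* assumed: user-mode bit         */
#define SKETCH_FLAG_RX        (1u << 16)  /* assumed: not translation state */
#define SKETCH_FLAG_FEN       (1u << 24)  /* assumed: FP enabled            */

/* Only the bits that affect translation belong in the TB flags. */
#define SKETCH_TB_MASK \
    (SKETCH_FLAG_PAL_MODE | SKETCH_FLAG_USER | SKETCH_FLAG_FEN)

/* Hypothetical, heavily reduced CPU state. */
typedef struct SketchEnv {
    uint64_t pc;
    uint32_t flags;
} SketchEnv;

/* Two loads (pc, flags) and one mask; cs_base is a constant. */
static inline void sketch_get_tb_cpu_state(const SketchEnv *env,
                                           uint64_t *pc,
                                           uint64_t *cs_base,
                                           uint32_t *tb_flags)
{
    *pc = env->pc;
    *cs_base = 0;
    *tb_flags = env->flags & SKETCH_TB_MASK;
}
```

In this sketch, a flags word holding SKETCH_FLAG_USER | SKETCH_FLAG_RX
yields TB flags of just SKETCH_FLAG_USER, since the RX bit is masked out.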

Also, in anticipation of Lluís' generic translation loop, fix all
of the temporary leaks.  They all seem to have been on insns that
end the TB, so in practice they weren't harmful, but...
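
To make the temp-leak pattern concrete, here is a small self-contained
sketch of the idea: count live temporaries, and complain if any are still
live once an insn has been translated. The SketchCtx type and all sketch_*
functions are hypothetical and only model the bookkeeping; they are not
the TCG API.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical translation context that only counts live temporaries. */
typedef struct SketchCtx {
    int live_temps;
} SketchCtx;

typedef int SketchTemp;

static SketchTemp sketch_temp_new(SketchCtx *c)
{
    return c->live_temps++;
}

static void sketch_temp_free(SketchCtx *c, SketchTemp t)
{
    (void)t;
    c->live_temps--;
}

/* Leaky variant: allocates a temporary and never releases it. */
static void sketch_gen_branch_leaky(SketchCtx *c)
{
    SketchTemp t = sketch_temp_new(c);
    (void)t;  /* ... would emit the compare and the TB exit here ... */
}

/* Fixed variant: the temporary is released before returning. */
static void sketch_gen_branch_fixed(SketchCtx *c)
{
    SketchTemp t = sketch_temp_new(c);
    /* ... would emit the compare and the TB exit here ... */
    sketch_temp_free(c, t);
}

/* Report a leak if any temporary is still live after an insn. */
static bool sketch_check_leak(const SketchCtx *c, unsigned long pc)
{
    if (c->live_temps != 0) {
        fprintf(stderr, "temp leak after insn at 0x%lx\n", pc);
        return true;
    }
    return false;
}
```

Calling sketch_check_leak() after each translated insn flags the leaky
variant immediately, even when the insn ends the block and the leak would
otherwise go unnoticed.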


r~


Richard Henderson (8):
  target/alpha: Remove amask from tb->flags
  target/alpha: Copy tb->flags into DisasContext
  target/alpha: Merge several flag bytes into ENV->FLAGS
  target/alpha: Fix temp leak in gen_bcond
  target/alpha: Fix temp leak in gen_mtpr
  target/alpha: Fix temp leak in gen_call_pal
  target/alpha: Fix temp leak in gen_fbcond
  target/alpha: Log temp leaks

 target/alpha/cpu.h       |  79 +++++++----------
 hw/alpha/dp264.c         |   1 -
 linux-user/main.c        |  25 +++---
 target/alpha/cpu.c       |   7 +-
 target/alpha/helper.c    |  12 +--
 target/alpha/machine.c   |  10 +--
 target/alpha/translate.c | 221 +++++++++++++++++++++++++++++------------------
 7 files changed, 194 insertions(+), 161 deletions(-)

-- 
2.9.4


Re: [Qemu-devel] [PATCH 0/8] target/alpha cleanups
Posted by Emilio G. Cota 6 years, 8 months ago
On Thu, Jul 13, 2017 at 14:18:11 -1000, Richard Henderson wrote:
> The new title holder for perf top is helper_lookup_tb_ptr.
> Those targets that have a complicated cpu_get_tb_cpu_state
> function are going to regret that.
> 
> This cleans up the Alpha version of that function such that it is
> just two loads and one mask.  Which is one practically-free mask
> away from being as minimal as one can get.

Tested-by: Emilio G. Cota <cota@braap.org>
for the series.

I tried to get some perf numbers, but booting Linux doesn't really
spend much time in lookup_tb_ptr, and neither does dbt-bench, so
I get very similar before/after numbers (a slight perf decrease for
booting, a tiny perf increase for dbt-bench). Numbers are below, FWIW.

		Emilio

* I modified the gentoo-alpha image I'm using [1] to shut down once
it has fully booted. Results before/after this patchset:

 Performance counter stats for 'taskset -c 0 alpha-softmmu/qemu-system-alpha \
	-m 512 -drive \
	file=../img/alpha/die-on-boot.img,media=disk,format=raw,index=0 \
	-kernel ../img/alpha/vmlinux -append root=/dev/sda2 \
	-accel accel=tcg,thread=single -smp 1 -nographic' (10 runs):

Before:

      30586.631281      task-clock (msec)         #    0.883 CPUs utilized            ( +-  0.56% )
            16,373      context-switches          #    0.535 K/sec                    ( +-  1.16% )
                 1      cpu-migrations            #    0.000 K/sec
            10,269      page-faults               #    0.336 K/sec                    ( +-  1.39% )
   128,287,167,139      cycles                    #    4.194 GHz                      ( +-  0.55% )
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
   244,179,137,606      instructions              #    1.90  insns per cycle          ( +-  0.66% )
    45,088,775,217      branches                  # 1474.133 M/sec                    ( +-  0.61% )
       267,065,722      branch-misses             #    0.59% of all branches          ( +-  0.84% )

      34.639115913 seconds time elapsed                                          ( +-  0.50% )

After:
      31358.851235      task-clock (msec)         #    0.892 CPUs utilized            ( +-  1.07% )
            16,352      context-switches          #    0.521 K/sec                    ( +-  1.59% )
                 1      cpu-migrations            #    0.000 K/sec
            10,643      page-faults               #    0.339 K/sec                    ( +-  1.18% )
   131,620,007,449      cycles                    #    4.197 GHz                      ( +-  1.07% )
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
   249,714,336,126      instructions              #    1.90  insns per cycle          ( +-  1.35% )
    46,259,663,064      branches                  # 1475.171 M/sec                    ( +-  1.27% )
       269,500,888      branch-misses             #    0.58% of all branches          ( +-  0.71% )

      35.136529309 seconds time elapsed                                          ( +-  0.99% )

perf diff doesn't show anything interesting (all differences are <1% and are due to kernel code).
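
For reference, the relative change between the two runs can be worked out
from the figures above. The helper below is only an illustrative sketch
(percent_change is a made-up name); fed the elapsed times it gives roughly
+1.4%, and fed the task-clock values roughly +2.5%, against a reported
run-to-run variation of about 0.5% to 1.1%.

```c
#include <stdio.h>

/* Relative change from `before` to `after`, in percent. */
static double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Print the before/after deltas for the boot test quoted above. */
static void print_boot_deltas(void)
{
    printf("elapsed:    %+.2f%%\n",
           percent_change(34.639115913, 35.136529309));
    printf("task-clock: %+.2f%%\n",
           percent_change(30586.631281, 31358.851235));
}
```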

* DBT-bench before/after:
			  NBench score, higher is better
  100 +-+---+-----+-----+----+-----+-----+-----+-----+-----+----+-----+---+-+
      |                    ***##       ***##                                |
   90 +-+..................*+*.#.......*.*.#.................before       +-+
      |                    * * #       * * #                  after         |
      |               ***# * * # +++++ * * #                                |
   80 +-+.......***##.*.*#.*.*.#.***##.*.*.#..............................+-+
      |         * * # * *# * * # * * # * * #                                |
   70 +-+.......*.*.#.*.*#.*.*.#.*.*.#.*.*.#..............................+-+
      |         * * # * *# * * # * * # * * #                                |
      |         * * # * *# * * # * * # * * #                                |
   60 +-+.......*.*.#.*.*#.*.*.#.*.*.#.*.*.#..............................+-+
      |         * * # * *# * * # * * # * * # ***##                          |
   50 +-+.......*.*.#.*.*#.*.*.#.*.*.#.*.*.#.*.*.#........................+-+
      |         * * # * *# * * # * * # * * # * * #                          |
      |         * * # * *# * * # * * # * * # * * #                          |
   40 +-+.......*.*.#.*.*#.*.*.#.*.*.#.*.*.#.*.*.#........................+-+
      |   ***## * * # * *# * * # * * # * * # * * #                          |
   30 +-+.*.*.#.*.*.#.*.*#.*.*.#.*.*.#.*.*.#.*.*.#........................+-+
      |   * * # * * # * *# * * # * * # * * # * * #                  ***##   |
      |   * * # * * # * *# * * # * * # * * # * * #                  * * #   |
   20 +-+.*.*.#.*.*.#.*.*#.*.*.#.*.*.#.*.*.#.*.*.#..................*.*.#.+-+
      |   * * # * * # * *# * * # * * # * * # * * #                  * * #   |
   10 +-+.*.*.#.*.*.#.*.*#.*.*.#.*.*.#.*.*.#.*.*.#..................*.*.#.+-+
      |   * * # * * # * *# * * # * * # * * # * * #                  * * #   |
      |   * * # * * # * *# * * # * * # * * # * * #       ***# ***## * * #   |
    0 +-+-***##-***##-***#-***##-***##-***##-***##-***##-***#-***##-***##-+-+
       STRING SOBFP EMULAASSIGNMENT  IDEHUFFMAFOLU DECOMPOSITION gmean
  png: http://imgur.com/oFFYSKd

[1] https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg00630.html