[Qemu-devel] [PATCH 0/7] target-mips: support MTTCG feature

Aleksandar Markovic posted 7 patches 6 years, 2 months ago

From: Aleksandar Markovic <aleksandar.markovic@mips.com>

This series introduces the MTTCG (multi-threaded TCG) feature for MIPS
targets by adding all the missing pieces and formally enabling the
corresponding QEMU builds to support such configurations.

PATCH ORGANIZATION
==================

The organization of patches is as follows:

  - patches 1 and 2 improve the emulation of MIPS LL/SC instructions
    for MTTCG. They are based on a previously sent patch series by
    Leon Alrae (the link below points to its last version, v3):
    http://lists.gnu.org/archive/html/qemu-devel/2016-09/msg06870.html

  - patches 3, 4, 5, and 6 fix locking/synchronization issues that
    surfaced while introducing MTTCG for MIPS. Similar sets of patches
    have already been integrated for some other platforms (ARM, x86,
    PPC, SPARC).

  - patch 7 enables MTTCG support for MIPS targets in the QEMU build
    system.
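This cover letter identifies patches 1 and 2 only by their titles: "compare virtual addresses in LL/SC sequence" and "reimplement SC instruction and use cmpxchg". The standalone C sketch below illustrates that general idea. It is not the QEMU implementation. VCpuLLSC, emu_ll(), emu_sc(), LLADDR_NONE and the word-array guest memory are hypothetical names chosen for illustration.

In the sketch, LL records the virtual address and the value it loaded. SC succeeds only under two conditions:

  - it targets the same virtual address as the preceding LL, and
  - an atomic compare-and-swap still finds the value LL observed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define GUEST_WORDS 16
#define LLADDR_NONE UINT32_MAX  /* hypothetical: no LL/SC sequence in progress */

/* Hypothetical guest memory: 32-bit words shared by all vCPU threads. */
static _Atomic uint32_t guest_mem[GUEST_WORDS];

/* Hypothetical per-vCPU LL/SC state. */
typedef struct {
    uint32_t lladdr;  /* virtual address of the last LL */
    uint32_t llval;   /* value that LL loaded from lladdr */
} VCpuLLSC;

static _Atomic uint32_t *word_at(uint32_t vaddr)
{
    /* The sketch only supports aligned words inside the toy memory. */
    assert(vaddr % 4 == 0 && vaddr / 4 < GUEST_WORDS);
    return &guest_mem[vaddr / 4];
}

/* LL: load the word and remember both where it came from and its value. */
static uint32_t emu_ll(VCpuLLSC *cpu, uint32_t vaddr)
{
    uint32_t v = atomic_load(word_at(vaddr));

    cpu->lladdr = vaddr;
    cpu->llval = v;
    return v;
}

/* SC: store newval only if vaddr matches the LL's virtual address and memory
 * still holds the value LL saw; compare and store happen as one atomic step.
 * Returns true on success, like SC writing 1 to its register operand. */
static bool emu_sc(VCpuLLSC *cpu, uint32_t vaddr, uint32_t newval)
{
    bool ok = false;

    if (cpu->lladdr == vaddr) {
        uint32_t expected = cpu->llval;
        ok = atomic_compare_exchange_strong(word_at(vaddr), &expected, newval);
    }
    cpu->lladdr = LLADDR_NONE;  /* any SC ends the LL/SC sequence */
    return ok;
}
```

Suppose two vCPUs both LL the same word and then race on SC. In the sketch, exactly one SC wins and the other fails.

This design has a trade-off. A value-comparing SC cannot detect another store that changed the word and then restored the original value between LL and SC (the ABA case). On real LL/SC hardware, such an SC would fail. In exchange, the sketch needs no global exclusive section around each SC.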

PERFORMANCE TESTING
===================

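Performance was measured with the atomic_add-bench test program,
which exercises LL/SC-related functionality in a multithreaded
environment. The observed performance gain was significant. With a
single thread, MTTCG throughput is about 8.6 times that of non-MTTCG.
With 64 threads, it is roughly 80 to 760 times higher, depending on
the range (see the numerical data below).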
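On MIPS, atomic read-modify-write operations are built from LL/SC retry loops. A benchmark of concurrent atomic increments therefore stresses the LL/SC emulation path. The C sketch below shows that kind of workload. It is a hypothetical stand-in, not the atomic_add-bench source. run_bench(), worker(), MAX_THREADS, MAX_RANGE and the xorshift32 counter selection are assumptions made for illustration.

Each thread performs a fixed number of atomic increments. It picks each counter pseudo-randomly from [0, range), so a smaller range makes more threads contend for the same counter.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

#define MAX_THREADS 64    /* hypothetical limits for the sketch */
#define MAX_RANGE 1024

/* Shared counters; a smaller range means more threads hit the same one. */
static _Atomic unsigned long counters[MAX_RANGE];

struct worker_args {
    unsigned long ops;   /* increments performed by this thread */
    unsigned range;      /* counters are picked from [0, range) */
    uint32_t seed;       /* per-thread PRNG state, must be nonzero */
};

/* xorshift32: cheap per-thread PRNG, so threads share no RNG state. */
static uint32_t xorshift32(uint32_t *s)
{
    uint32_t x = *s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *s = x;
}

static void *worker(void *opaque)
{
    struct worker_args *a = opaque;

    assert(a->range > 0 && a->range <= MAX_RANGE);
    for (unsigned long i = 0; i < a->ops; i++) {
        unsigned idx = xorshift32(&a->seed) % a->range;
        /* Compiled for a MIPS guest, this becomes an LL/SC retry loop. */
        atomic_fetch_add_explicit(&counters[idx], 1, memory_order_relaxed);
    }
    return NULL;
}

/* Runs the workload; returns the number of increments found in the
 * counters and stores throughput in millions of increments per second. */
static unsigned long run_bench(unsigned n_threads, unsigned long ops,
                               unsigned range, double *mops_per_sec)
{
    pthread_t tids[MAX_THREADS];
    struct worker_args args[MAX_THREADS];
    struct timespec t0, t1;
    unsigned started = 0;
    unsigned long total = 0;

    *mops_per_sec = 0.0;
    if (n_threads > MAX_THREADS || range == 0 || range > MAX_RANGE) {
        return 0;
    }
    for (unsigned i = 0; i < MAX_RANGE; i++) {
        atomic_store(&counters[i], 0);
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (; started < n_threads; started++) {
        args[started] = (struct worker_args){ ops, range, 2463534242u + started };
        if (pthread_create(&tids[started], NULL, worker, &args[started]) != 0) {
            break;  /* join only the threads that were actually started */
        }
    }
    for (unsigned i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    for (unsigned i = 0; i < range; i++) {
        total += atomic_load(&counters[i]);
    }
    double secs = (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (secs > 0.0) {
        *mops_per_sec = (double)total / secs / 1e6;
    }
    return total;
}
```

This shape of workload suggests a plausible reading of the data below. With a range of 1, all threads compete for a single counter, and MTTCG throughput stays at roughly 10 to 12 from 8 threads upward. With a range of 1024, throughput keeps growing, to about 145 at 64 threads.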

For comparison, the test cases are organized in the same way as in a
previously sent patch set:

target-arm: emulate aarch64's LL/SC using cmpxchg helpers
https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg06653.html

-----------------------------------------------------------------------

          atomic_add-bench: 1000000 ops/thread, [0,1] range
                                               
throughput                                  M - MTTCG     N - no MTTCG

 50 +---------+---------+---------+---------+---------+---------+----+
    |                                                                |
    |M                                                               |
 40 +.                                                               +
    |.                                                               |
    |.                                                               |
 30 +.                                                               +
    |.                                                               |
    |.                                                               |
 20 +.                                                               +
    | M                                                              |
    | .                                                              |
 10 +  .M...M.......M.......M.......M.......M.......M.......M.......M+
    |N                                                               |
    | N.N...N.......N.......N.......N.......N.......N.......N.......N|
  0 +---------+---------+---------+---------+---------+---------+----+
    0         10        20        30        40        50        60

                            number of threads

-----------------------------------------------------------------------

          atomic_add-bench: 1000000 ops/thread, [0,2] range
                                               
throughput                                  M - MTTCG     N - no MTTCG

 50 +---------+---------+---------+---------+---------+---------+----+
    |                                                                |
    |M                                                               |
 40 +.                                                               +
    |.                                                               |
    |.                                                               |
 30 + .                                                              +
    | M                                                              |
    | .                                                              |
 20 +  .M...M.......M.......M.......M.......M.......M.......M.......M+
    |                                                                |
    |                                                                |
 10 +                                                                +
    |N                                                               |
    | N.N...N.......N.......N.......N.......N.......N.......N.......N|
  0 +---------+---------+---------+---------+---------+---------+----+
    0         10        20        30        40        50        60

                            number of threads

-----------------------------------------------------------------------

          atomic_add-bench: 1000000 ops/thread, [0,128] range
                                               
throughput                                  M - MTTCG     N - no MTTCG

150 +---------+---------+---------+---------+---------+---------+----+
    |                                                                |
    |                                            ...M...        ....M|
120 +                   ....M.......M........M...       ....M...     +
    |           ....M...                                             |
    |     ..M...                                                     |
 90 +    .                                                           +
    |  .M                                                            |
    | .                                                              |
 60 + M                                                              +
    |.                                                               |
    |M                                                               |
 30 +                                                                +
    |                                                                |
    |NN.N...N.......N.......N.......N.......N.......N.......N.......N|
  0 +---------+---------+---------+---------+---------+---------+----+
    0         10        20        30        40        50        60

                            number of threads

-----------------------------------------------------------------------

          atomic_add-bench: 1000000 ops/thread, [0,1024] range
                                               
throughput                                  M - MTTCG     N - no MTTCG

150 +---------+---------+---------+---------+---------+---------+----+
    |                                            ...M.......M.......M|
    |                           ....M...       ..                    |
120 +           ....M.......M...        ....M..                      +
    |     ..M...                                                     |
    |   M.                                                           |
 90 +  .                                                             +
    | .                                                              |
    | .                                                              |
 60 + M                                                              +
    |.                                                               |
    |M                                                               |
 30 +                                                                +
    |                                                                |
    |NN.N...N.......N.......N.......N.......N.......N.......N.......N|
  0 +---------+---------+---------+---------+---------+---------+----+
    0         10        20        30        40        50        60

                            number of threads

-----------------------------------------------------------------------

Numerical data:

          range 1            range 2            range 128          range 1024
threads   no MTTCG   MTTCG   no MTTCG   MTTCG   no MTTCG   MTTCG   no MTTCG   MTTCG

      1       4.95   42.61       4.94   42.27       4.89   42.24       4.85   41.81
      2       1.23   18.41       1.29   25.71       1.33   57.41       1.36   60.34
      4       0.46   11.99       0.48   19.69       0.53   78.98       0.50   95.39
      8       0.18    9.59       0.18   19.11       0.19  104.66       0.20  112.66
     16       0.11   11.19       0.12   19.12       0.12  108.29       0.13  121.90
     24       0.10   10.18       0.09   19.14       0.11  115.53       0.10  127.40
     32       0.11   11.15       0.12   19.36       0.09  120.60       0.10  131.60
     40       0.08   10.47       0.11   20.88       0.12  124.59       0.10  124.74
     48       0.12   11.78       0.13   20.09       0.11  129.24       0.11  137.19
     56       0.14   12.40       0.13   22.13       0.15  124.16       0.15  138.52
     64       0.14   11.08       0.20   21.08       0.18  131.28       0.19  144.84

-----------------------------------------------------------------------

Graphical representation:

 https://i.imgur.com/OtNLpVX.png

-----------------------------------------------------------------------

REGRESSION TESTING
==================

Regression testing was also performed, mainly by running the LTP test
suite on a QEMU-emulated Debian mips64 system.

Some LTP tests (getrusage04, copy_file_range01) that fail on non-MTTCG
systems pass on MTTCG-enabled systems. Other LTP tests (nanosleep01,
poll02, pselect01) fail intermittently on both non-MTTCG and MTTCG
configurations, and therefore do not represent regressions.

The emulation itself showed no apparent problems while executing the
LTP test suite.

MTTCG-enabled QEMU user-mode emulation was also tested to some extent.

Aleksandar Markovic (2):
  Revert "target/mips: hold BQL for timer interrupts"
  target/mips: introduce MTTCG-enabled builds

Goran Ferenc (1):
  target/mips: hold BQL in mips_vpe_wake()

Leon Alrae (2):
  target/mips: compare virtual addresses in LL/SC sequence
  target/mips: reimplement SC instruction and use cmpxchg

Miodrag Dinic (2):
  hw/mips_int: hold BQL for all interrupt requests
  hw/mips_cpc: kick a VP when putting it into Run state

 configure               |   3 ++
 hw/mips/mips_int.c      |  12 +++++
 hw/misc/mips_cpc.c      |  17 ++++++-
 linux-user/main.c       |  58 ------------------------
 target/mips/cpu.h       |   9 ++--
 target/mips/helper.c    |   6 +--
 target/mips/helper.h    |   2 -
 target/mips/machine.c   |   7 +--
 target/mips/op_helper.c |  74 +++++++++---------------------
 target/mips/translate.c | 118 ++++++++++++++++--------------------------------
 10 files changed, 100 insertions(+), 206 deletions(-)

-- 
2.7.4