[PATCH v1 00/25] Lazily load PMU data

Ian Rogers posted 25 patches 2 years, 3 months ago
There is a newer version of this series
[PATCH v1 00/25] Lazily load PMU data
Posted by Ian Rogers 2 years, 3 months ago
Lazily load PMU data from both sysfs and json files. Reorganize the
json data to be more PMU oriented to facilitate this; for example,
json data is now sorted into per-PMU arrays.
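
As a rough illustration of the per-PMU layout (a sketch in Python, the
language of jevents.py, but not taken from it; the function names and
the event names are made up), grouping events into one sorted array per
PMU lets a lookup search only that PMU's array:

```python
from bisect import bisect_left
from collections import defaultdict


def group_events_by_pmu(events):
    """Build {pmu: sorted list of event names} from (pmu, name) pairs."""
    tables = defaultdict(list)
    for pmu, name in events:
        tables[pmu].append(name)
    # Sort each PMU's array so it can be binary searched on its own.
    return {pmu: sorted(names) for pmu, names in sorted(tables.items())}


def find_event(tables, pmu, name):
    """Binary search one PMU's array; other PMUs' events are never visited."""
    names = tables.get(pmu, [])
    i = bisect_left(names, name)
    return i < len(names) and names[i] == name


# Placeholder data, not real event names.
demo_events = [("cpu", "event_b"), ("uncore", "event_c"), ("cpu", "event_a")]
demo_tables = group_events_by_pmu(demo_events)
print(demo_tables)
print(find_event(demo_tables, "cpu", "event_a"))
```

With a single flat table every lookup would have to consider events
from all PMUs; here a miss on one PMU costs one search of its own array.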

While refactoring, the maximum encoding size for events (256 bytes)
was removed, with input files now passed directly to the lex-generated
code. There is also a small improvement to an event parse error
message.

Some results from an Intel Tiger Lake laptop running Debian:

Binary size reduction of 1.4% or 143,264 bytes because the PMU
name no longer appears in the string.

stat -e cpu/cycles/ minor faults reduced from 1733 to 1667, open calls reduced
from 171 to 94.

stat default minor faults reduced from 1085 to 1727, open calls reduced
from 654 to 343.

Average PMU scanning reduced from 4720.641usec to 2927.293usec.
Average core PMU scanning reduced from 1004.658usec to 232.668usec
(4.3x faster).
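
The ratios behind the scanning times and open call counts can be
rechecked with a few lines of Python (the helper names are only
illustrative; the inputs are the numbers quoted above):

```python
def speedup(before, after):
    """How many times faster 'after' is than 'before'."""
    return before / after


def percent_reduction(before, after):
    """Reduction from 'before' to 'after' as a percentage of 'before'."""
    return 100.0 * (before - after) / before


# Average scanning times in usec, as reported above.
print(f"all PMUs:  {speedup(4720.641, 2927.293):.2f}x")
print(f"core PMUs: {speedup(1004.658, 232.668):.2f}x")

# open calls for 'stat -e cpu/cycles/' and for 'stat' with default events.
print(f"open calls, cpu/cycles/: -{percent_reduction(171, 94):.1f}%")
print(f"open calls, default:     -{percent_reduction(654, 343):.1f}%")
```

The core PMU ratio rounds to the 4.3x quoted above.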

Ian Rogers (25):
  perf script ibs: Remove unused include
  perf pmu: Avoid a path name copy
  perf pmu: Move perf_pmu__set_format to pmu.y
  perf pmu: Reduce scope of perf_pmu_error
  perf pmu: Avoid passing format list to perf_pmu__config_terms
  perf pmu: Avoid passing format list to perf_pmu__format_type
  perf pmu: Avoid passing format list to perf_pmu__format_bits
  perf pmu: Pass PMU rather than aliases and format
  perf pmu: Make the loading of formats lazy
  perf pmu: Abstract alias/event struct
  perf pmu-events: Add extra underscore to function names
  perf jevents: Group events by PMU
  perf parse-events: Improve error message for double setting
  perf s390 s390_cpumcfdg_dump: Don't scan all PMUs
  perf pmu-events: Reduce processed events by passing PMU
  perf pmu-events: Add pmu_events_table__find_event
  perf pmu: Parse sysfs events directly from a file
  perf pmu: Prefer passing pmu to aliases list
  perf pmu: Merge json events with sysfs at load time
  perf pmu: Cache json events table
  perf pmu: Lazily add json events
  perf pmu: Scan type early to fail an invalid PMU quickly
  perf pmu: Be lazy about loading event info files from sysfs
  perf pmu: Lazily load sysfs aliases
  perf jevents: Sort strings in the big C string to reduce faults
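
Several of the patches above ("Make the loading of formats lazy",
"Lazily load sysfs aliases") follow the same pattern: defer reading
files until something asks for the data. A minimal sketch of that
pattern, in Python for brevity rather than perf's C, with every name
(LazyPmu, make_demo_format_dir, the directory contents) hypothetical:

```python
import os
import tempfile


class LazyPmu:
    """Toy PMU whose format files are read only on first use."""

    def __init__(self, format_dir):
        self.format_dir = format_dir
        self._formats = None   # None means "not loaded yet"
        self.open_calls = 0    # counts file opens, for illustration only

    def _load_formats(self):
        formats = {}
        for name in sorted(os.listdir(self.format_dir)):
            with open(os.path.join(self.format_dir, name)) as f:
                self.open_calls += 1
                formats[name] = f.read().strip()
        return formats

    def format(self, name):
        # Load on first access; later calls reuse the cached dict.
        if self._formats is None:
            self._formats = self._load_formats()
        return self._formats.get(name)


def make_demo_format_dir():
    """Create a throwaway directory standing in for a sysfs format dir."""
    d = tempfile.mkdtemp()
    for name, bits in (("event", "config:0-7"), ("umask", "config:8-15")):
        with open(os.path.join(d, name), "w") as f:
            f.write(bits + "\n")
    return d


pmu = LazyPmu(make_demo_format_dir())
print(pmu.open_calls)        # nothing opened at construction time
print(pmu.format("umask"))
print(pmu.open_calls)        # files opened once, on first lookup
```

A PMU that is created but never queried for a format costs no open
calls at all, which is the kind of saving the open call counts above
reflect.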

 tools/perf/arch/x86/util/intel-pt.c      |  39 +-
 tools/perf/bench/pmu-scan.c              |   8 +-
 tools/perf/pmu-events/empty-pmu-events.c |  49 +-
 tools/perf/pmu-events/jevents.py         | 319 +++++++--
 tools/perf/pmu-events/pmu-events.h       |  15 +-
 tools/perf/tests/parse-events.c          |   2 +-
 tools/perf/tests/pmu-events.c            | 183 ++---
 tools/perf/tests/pmu.c                   |  76 +-
 tools/perf/util/amd-sample-raw.c         |   1 -
 tools/perf/util/metricgroup.c            |  10 +-
 tools/perf/util/parse-events.c           |  91 ++-
 tools/perf/util/parse-events.h           |   3 +-
 tools/perf/util/pmu.c                    | 872 +++++++++++++++--------
 tools/perf/util/pmu.h                    | 110 ++-
 tools/perf/util/pmu.y                    |  32 +-
 tools/perf/util/pmus.c                   | 230 ++----
 tools/perf/util/s390-sample-raw.c        |  50 +-
 17 files changed, 1251 insertions(+), 839 deletions(-)

-- 
2.42.0.rc1.204.g551eb34607-goog
Re: [PATCH v1 00/25] Lazily load PMU data
Posted by Ian Rogers 2 years, 3 months ago
On Wed, Aug 23, 2023 at 1:08 AM Ian Rogers <irogers@google.com> wrote:
>
> Lazily load PMU data both from sysfs and json files. Reorganize
> json data to be more PMU oriented to facilitate this, for
> example, json data is now sorted into arrays for their PMU.
>
> In refactoring the code some changes were made to get rid of maximum
> encoding sizes for events (256 bytes), with input files being directly
> passed to the lex generated code. There is also a small event parse
> error message improvement.
>
> Some results from an Intel tigerlake laptop running Debian:
>
> Binary size reduction of 1.4% or 143,264 bytes because the PMU
> name no longer appears in the string.
>
> stat -e cpu/cycles/ minor faults reduced from 1733 to 1667, open calls reduced
> from 171 to 94.
>
> stat default minor faults reduced from 1085 to 1727, open calls reduced
> from 654 to 343.

s/1085/1805/, i.e. stat default minor faults were reduced from 1805 to
1727.

Thanks,
Ian

Re: [PATCH v1 00/25] Lazily load PMU data
Posted by Arnaldo Carvalho de Melo 2 years, 3 months ago
On Wed, Aug 23, 2023 at 01:08:03AM -0700, Ian Rogers wrote:
> Lazily load PMU data both from sysfs and json files. Reorganize
> json data to be more PMU oriented to facilitate this, for
> example, json data is now sorted into arrays for their PMU.
> 
> In refactoring the code some changes were made to get rid of maximum
> encoding sizes for events (256 bytes), with input files being directly
> passed to the lex generated code. There is also a small event parse
> error message improvement.
> 
> Some results from an Intel tigerlake laptop running Debian:
> 
> Binary size reduction of 1.4% or 143,264 bytes because the PMU
> name no longer appears in the string.
> 
> stat -e cpu/cycles/ minor faults reduced from 1733 to 1667, open calls reduced
> from 171 to 94.
> 
> stat default minor faults reduced from 1085 to 1727, open calls reduced
> from 654 to 343.
> 
> Average PMU scanning reduced from 4720.641usec to 2927.293usec.
> Average core PMU scanning reduced from 1004.658usec to 232.668usec
> (4.3x faster).

I'm now chasing this one, hit when building on Ubuntu arm64:

  CC      /tmp/build/perf/util/env.o
arch/arm64/util/../../arm/util/cs-etm.c: In function 'cs_etm_validate_context_id':
arch/arm64/util/../../arm/util/cs-etm.c:82:26: error: passing argument 1 of 'perf_pmu__format_bits' from incompatible pointer type [-Werror=incompatible-pointer-types]
   (perf_pmu__format_bits(&cs_etm_pmu->format, "contextid") |
                          ^
In file included from arch/arm64/util/../../arm/util/../../../util/header.h:13:0,
                 from arch/arm64/util/../../arm/util/../../../util/session.h:7,
                 from arch/arm64/util/../../arm/util/cs-etm.c:31:
arch/arm64/util/../../arm/util/../../../util/pmu.h:224:7: note: expected 'struct perf_pmu *' but argument is of type 'struct list_head *'
 __u64 perf_pmu__format_bits(struct perf_pmu *pmu, const char *name);
       ^~~~~~~~~~~~~~~~~~~~~
arch/arm64/util/../../arm/util/cs-etm.c:83:26: error: passing argument 1 of 'perf_pmu__format_bits' from incompatible pointer type [-Werror=incompatible-pointer-types]
    perf_pmu__format_bits(&cs_etm_pmu->format, "contextid1") |
                          ^
In file included from arch/arm64/util/../../arm/util/../../../util/header.h:13:0,
                 from arch/arm64/util/../../arm/util/../../../util/session.h:7,
                 from arch/arm64/util/../../arm/util/cs-etm.c:31:
arch/arm64/util/../../arm/util/../../../util/pmu.h:224:7: note: expected 'struct perf_pmu *' but argument is of type 'struct list_head *'
 __u64 perf_pmu__format_bits(struct perf_pmu *pmu, const char *name);
       ^~~~~~~~~~~~~~~~~~~~~
arch/arm64/util/../../arm/util/cs-etm.c:84:26: error: passing argument 1 of 'perf_pmu__format_bits' from incompatible pointer type [-Werror=incompatible-pointer-types]
    perf_pmu__format_bits(&cs_etm_pmu->format, "contextid2"));
                          ^
In file included from arch/arm64/util/../../arm/util/../../../util/header.h:13:0,
                 from arch/arm64/util/../../arm/util/../../../util/session.h:7,
                 from arch/arm64/util/../../arm/util/cs-etm.c:31:
arch/arm64/util/../../arm/util/../../../util/pmu.h:224:7: note: expected 'struct perf_pmu *' but argument is of type 'struct list_head *'
 __u64 perf_pmu__format_bits(struct perf_pmu *pmu, const char *name);
       ^~~~~~~~~~~~~~~~~~~~~
arch/arm64/util/../../arm/util/cs-etm.c:109:28: error: passing argument 1 of 'perf_pmu__format_bits' from incompatible pointer type [-Werror=incompatible-pointer-types]
      perf_pmu__format_bits(&cs_etm_pmu->format, "contextid1")) {
                            ^
In file included from arch/arm64/util/../../arm/util/../../../util/header.h:13:0,
                 from arch/arm64/util/../../arm/util/../../../util/session.h:7,
                 from arch/arm64/util/../../arm/util/cs-etm.c:31:
arch/arm64/util/../../arm/util/../../../util/pmu.h:224:7: note: expected 'struct perf_pmu *' but argument is of type 'struct list_head *'
 __u64 perf_pmu__format_bits(struct perf_pmu *pmu, const char *name);
       ^~~~~~~~~~~~~~~~~~~~~
arch/arm64/util/../../arm/util/cs-etm.c:125:28: error: passing argument 1 of 'perf_pmu__format_bits' from incompatible pointer type [-Werror=incompatible-pointer-types]
      perf_pmu__format_bits(&cs_etm_pmu->format, "contextid2")) {
                            ^
In file included from arch/arm64/util/../../arm/util/../../../util/header.h:13:0,
                 from arch/arm64/util/../../arm/util/../../../util/session.h:7,
                 from arch/arm64/util/../../arm/util/cs-etm.c:31:
arch/arm64/util/../../arm/util/../../../util/pmu.h:224:7: note: expected 'struct perf_pmu *' but argument is of type 'struct list_head *'
 __u64 perf_pmu__format_bits(struct perf_pmu *pmu, const char *name);
       ^~~~~~~~~~~~~~~~~~~~~
  CC      /tmp/build/perf/bench/epoll-ctl.o
arch/arm64/util/../../arm/util/cs-etm.c: In function 'cs_etm_validate_timestamp':
arch/arm64/util/../../arm/util/cs-etm.c:154:30: error: passing argument 1 of 'perf_pmu__format_bits' from incompatible pointer type [-Werror=incompatible-pointer-types]
        perf_pmu__format_bits(&cs_etm_pmu->format, "timestamp")))
                              ^
In file included from arch/arm64/util/../../arm/util/../../../util/header.h:13:0,
                 from arch/arm64/util/../../arm/util/../../../util/session.h:7,
                 from arch/arm64/util/../../arm/util/cs-etm.c:31:
arch/arm64/util/../../arm/util/../../../util/pmu.h:224:7: note: expected 'struct perf_pmu *' but argument is of type 'struct list_head *'
 __u64 perf_pmu__format_bits(struct perf_pmu *pmu, const char *name);
       ^~~~~~~~~~~~~~~~~~~~~
  CC      /tmp/build/perf/tests/vmlinux-kallsyms.o
  CC      /tmp/build/perf/arch/arm64/util/mem-events.o
  CC      /tmp/build/perf/bench/synthesize.o
  CC      /tmp/build/perf/tests/perf-record.o
  CC      /tmp/build/perf/bench/kallsyms-parse.o
arch/arm64/util/arm-spe.c: In function 'arm_spe_recording_options':
arch/arm64/util/arm-spe.c:233:30: error: passing argument 1 of 'perf_pmu__format_bits' from incompatible pointer type [-Werror=incompatible-pointer-types]
  bit = perf_pmu__format_bits(&arm_spe_pmu->format, "pa_enable");
                              ^
In file included from arch/arm64/util/../../../util/header.h:13:0,
                 from arch/arm64/util/../../../util/session.h:7,
                 from arch/arm64/util/arm-spe.c:19:
arch/arm64/util/../../../util/pmu.h:224:7: note: expected 'struct perf_pmu *' but argument is of type 'struct list_head *'
 __u64 perf_pmu__format_bits(struct perf_pmu *pmu, const char *name);
       ^~~~~~~~~~~~~~~~~~~~~
  CC      /tmp/build/perf/bench/find-bit-bench.o
  CC      /tmp/build/perf/tests/evsel-roundtrip-name.o
  CC      /tmp/build/perf/bench/inject-buildid.o
  CC      /tmp/build/perf/util/event.o