Hello,
perf mem uses PERF_SAMPLE_DATA_SRC, which carries a lot of information
about memory accesses. It has various sort keys to group related samples
together, but it's still cumbersome to read the result. While the perf c2c
command provides a way to investigate the data in a specific way, I'd
like to add more generic ways using new output fields.
For example, the following is the 'cache' output field, which breaks
down the sample weights into different cache levels.
$ perf mem record -a sleep 1
$ perf mem report -F cache,dso,sym --stdio
...
#
# -------------- Cache --------------
#      L1     L2     L3 L1-buf  Other  Shared Object                          Symbol
# ...................................  .....................................  .........................................
#
     0.0%   0.0%   0.0%   0.0% 100.0%  [kernel.kallsyms]                      [k] ioread8
   100.0%   0.0%   0.0%   0.0%   0.0%  [kernel.kallsyms]                      [k] _raw_spin_lock_irq
     0.0%   0.0%   0.0%   0.0% 100.0%  [xhci_hcd]                             [k] xhci_update_erst_dequeue
     0.0%   0.0%   0.0%  95.8%   4.2%  [kernel.kallsyms]                      [k] smaps_account
     0.6%   1.8%  22.7%  45.5%  29.5%  [kernel.kallsyms]                      [k] sched_balance_update_blocked_averages
    29.4%   0.0%   1.6%  58.8%  10.2%  [kernel.kallsyms]                      [k] __update_load_avg_cfs_rq
     0.0%   8.5%   4.3%   0.0%  87.2%  [kernel.kallsyms]                      [k] copy_mc_enhanced_fast_string
    63.9%   0.0%   8.0%  23.8%   4.3%  [kernel.kallsyms]                      [k] psi_group_change
     3.9%   0.0%   9.3%  35.7%  51.1%  [kernel.kallsyms]                      [k] timerqueue_add
    35.9%  10.9%   0.0%  39.0%  14.2%  [kernel.kallsyms]                      [k] memcpy
    94.1%   0.0%   0.0%   5.9%   0.0%  [kernel.kallsyms]                      [k] unmap_page_range
    25.7%   0.0%   4.9%  51.0%  18.4%  [kernel.kallsyms]                      [k] __update_load_avg_se
     0.0%  24.9%  19.4%   9.6%  46.1%  [kernel.kallsyms]                      [k] _copy_to_iter
    12.9%   0.0%   0.0%  87.1%   0.0%  [kernel.kallsyms]                      [k] next_uptodate_folio
    36.8%   0.0%   9.5%  16.6%  37.1%  [kernel.kallsyms]                      [k] update_curr
   100.0%   0.0%   0.0%   0.0%   0.0%  bpf_prog_b9611ccbbb3d1833_dfs_iter     [k] bpf_prog_b9611ccbbb3d1833_dfs_iter
    45.4%   1.8%  20.4%  23.6%   8.8%  [kernel.kallsyms]                      [k] audit_filter_rules.isra.0
    92.8%   0.0%   0.0%   7.2%   0.0%  [kernel.kallsyms]                      [k] filemap_map_pages
    10.6%   0.0%   0.0%  89.4%   0.0%  [kernel.kallsyms]                      [k] smaps_page_accumulate
    38.3%   0.0%  29.6%  27.1%   5.0%  [kernel.kallsyms]                      [k] __schedule
Please see the description of each commit for the other fields.

A new mem_stat field was added to the hist_entry to save this
information. It's a generic data structure (array) to handle
different types of information like cache-level, memory location,
snoop-result, etc.
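
To illustrate the idea of a generic per-entry array, here is a minimal C sketch. It is not the code from the patches: the names mem_stat_sketch, mem_stat_add, mem_stat_percent and the cache_idx enum are hypothetical, and the array length of 8 is taken from the "up to 8 entries" remark made later in this thread. Each sample adds its weight to one bucket, and a column is printed as that bucket's share of the row total.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; the real definitions are in the patches. */
#define MEM_STAT_LEN 8	/* assumption: "up to 8 entries", as stated in the thread */

/* Bucket indices for the 'cache' breakdown shown in the example output. */
enum cache_idx { CACHE_L1, CACHE_L2, CACHE_L3, CACHE_L1_BUF, CACHE_OTHER };

struct mem_stat_sketch {
	uint64_t entries[MEM_STAT_LEN];
};

/* Add one sample's weight to the bucket chosen from its data source. */
static void mem_stat_add(struct mem_stat_sketch *ms, unsigned int idx,
			 uint64_t weight)
{
	assert(idx < MEM_STAT_LEN);
	ms->entries[idx] += weight;
}

/* Share of one bucket in percent; 0 when nothing was recorded at all. */
static double mem_stat_percent(const struct mem_stat_sketch *ms,
			       unsigned int idx)
{
	uint64_t total = 0;

	assert(idx < MEM_STAT_LEN);
	for (unsigned int i = 0; i < MEM_STAT_LEN; i++)
		total += ms->entries[i];

	return total ? 100.0 * (double)ms->entries[idx] / (double)total : 0.0;
}
```

The same array could hold memory-location or snoop buckets instead of cache levels; only the meaning of the index changes.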

The first patch is a fix for the hierarchy mode and it was sent
separately. I just added it here so as not to break the hierarchy mode.
The second patch enables SAMPLE_DATA_SRC without SAMPLE_ADDR and
perf_event_attr.mmap_data, which generate a lot more data.

The names of some new fields are the same as the corresponding sort
keys (mem, op, snoop), so I had to change the order depending on whether
a name is applied as an output field or a sort key. Maybe it's better to
name them differently but I couldn't come up with better ideas.

That means you need to use the -F/--fields option to specify those fields
and the sort keys you want. Maybe we can change the default output
and sort keys for perf mem report with this.
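
One way to read the name-collision handling is as a context-dependent lookup. The C sketch below is only an interpretation of the paragraph above, not the logic in sort.c: the tables are illustrative subsets, and resolve_name, has_name and dim_kind are hypothetical names. The table that matches the option being parsed is searched first, so a shared name such as "mem" becomes an output field under -F and stays a sort key under -s.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* What a name given on the command line resolves to. */
enum dim_kind { DIM_NONE, DIM_SORT_KEY, DIM_OUTPUT_FIELD };

/* Illustrative subsets only; these are not the real perf tables. */
static const char *const sort_keys[] = { "dso", "sym", "mem", "op", "snoop" };
static const char *const output_fields[] = {
	"overhead", "sample", "cache", "mem", "op", "snoop"
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Return 1 if name is present in table, 0 otherwise. */
static int has_name(const char *const *table, size_t n, const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < n; i++)
		if (strcmp(table[i], name) == 0)
			return 1;
	return 0;
}

/*
 * parsing_fields is nonzero for -F/--fields and zero for -s/--sort.
 * The table matching the option is searched first, so a shared name
 * resolves differently in the two contexts.
 */
static enum dim_kind resolve_name(const char *name, int parsing_fields)
{
	int is_key = has_name(sort_keys, ARRAY_LEN(sort_keys), name);
	int is_field = has_name(output_fields, ARRAY_LEN(output_fields), name);

	if (parsing_fields)
		return is_field ? DIM_OUTPUT_FIELD :
		       is_key ? DIM_SORT_KEY : DIM_NONE;
	return is_key ? DIM_SORT_KEY :
	       is_field ? DIM_OUTPUT_FIELD : DIM_NONE;
}
```

Names that exist in only one table, such as "dso" or "cache" here, resolve the same way in both contexts.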
The code is available at 'perf/mem-field-v1' branch in
git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
Thanks,
Namhyung
Namhyung Kim (11):
perf hist: Remove output field from sort-list properly
perf record: Add --sample-mem-info option
perf hist: Support multi-line header
perf hist: Add struct he_mem_stat
perf hist: Basic support for mem_stat accounting
perf hist: Implement output fields for mem stats
perf mem: Add 'op' output field
perf hist: Hide unused mem stat columns
perf mem: Add 'cache' and 'memory' output fields
perf mem: Add 'snoop' output field
perf mem: Add 'dtlb' output field
tools/perf/Documentation/perf-record.txt | 7 +-
tools/perf/builtin-record.c | 6 +
tools/perf/ui/browsers/hists.c | 50 ++++-
tools/perf/ui/hist.c | 272 ++++++++++++++++++++++-
tools/perf/ui/stdio/hist.c | 57 +++--
tools/perf/util/evsel.c | 2 +-
tools/perf/util/hist.c | 78 +++++++
tools/perf/util/hist.h | 22 ++
tools/perf/util/mem-events.c | 183 ++++++++++++++-
tools/perf/util/mem-events.h | 57 +++++
tools/perf/util/record.h | 1 +
tools/perf/util/sort.c | 42 +++-
12 files changed, 718 insertions(+), 59 deletions(-)
--
2.49.0.906.g1f30a19c02-goog
Hi Namhyung,
I feel the overall idea is good. Running a few simple perf-mem commands
on AMD works fine too. Some general feedback below.
> The name of some new fields are the same as the corresponding sort
> keys (mem, op, snoop) so I had to change the order whether it's
> applied as an output field or a sort key. Maybe it's better to name
> them differently but I couldn't come up with better ideas.
1) These semantic changes of the field names seem counterintuitive
(to me). Example:
-F mem:
Without patch:
$ perf mem report -F overhead,sample,mem --stdio
# Overhead Samples Memory access
39.29% 1 L3 hit
37.50% 21 N/A
23.21% 13 L1 hit
With patch:
$ perf mem report -F overhead,sample,mem --stdio
# Memory
# Overhead Samples Other
100.00% 35 100.0%
-F 'snoop':
Without patch:
$ perf mem report -F overhead,sample,snoop --stdio
# Overhead Samples Snoop
60.71% 34 N/A
39.29% 1 HitM
With patchset:
$ perf mem report -F overhead,sample,snoop --stdio
# --- Snoop ----
# Overhead Samples HitM Other
100.00% 35 39.3% 60.7%
2) It was not intuitive (to me:)) that perf-mem overhead is calculated
using sample->weight by overwriting sample->period. I also don't see
it documented anywhere (or did I miss it?)
perf report:
$ perf report -F overhead,sample,period,dso --stdio
# Overhead Samples Period Shared Object
80.00% 28 2800000 [kernel.kallsyms]
5.71% 2 200000 ld-linux-x86-64.so.2
5.71% 2 200000 libc.so.6
5.71% 2 200000 ls
2.86% 1 100000 libpcre2-8.so.0.11.2
perf mem report:
$ perf mem report -F overhead,sample,period,dso --stdio
# Overhead Samples Period Shared Object
87.50% 28 49 [kernel.kallsyms]
3.57% 2 2 ld-linux-x86-64.so.2
3.57% 2 2 libc.so.6
3.57% 2 2 ls
1.79% 1 1 libpcre2-8.so.0.11.2
3) Similarly, it was not intuitive (again, to me:)) that -F op/snoop/dtlb
percentages are calculated based on sample->weight.
4) I have a similar recommended perf-mem command in the perf-amd-ibs man
page. Could you please update the alternate command there?
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/Documentation/perf-amd-ibs.txt?h=v6.15-rc5#n167
Please correct me if I'm missing anything.
Thanks,
Ravi
Hi Ravi,

On Thu, May 08, 2025 at 09:42:41AM +0530, Ravi Bangoria wrote:
> Hi Namhyung,
>
> I feel the overall idea is good. Running few simple perf-mem commands
> on AMD works fine too. Few general feedback below.

Thanks for your review!

> > The name of some new fields are the same as the corresponding sort
> > keys (mem, op, snoop) so I had to change the order whether it's
> > applied as an output field or a sort key. Maybe it's better to name
> > them differently but I couldn't come up with better ideas.
>
> 1) These semantic changes of the field name seems counter intuitive
> (to me). Example:
>
> -F mem:
>
> Without patch:
>
> $ perf mem report -F overhead,sample,mem --stdio
> # Overhead Samples Memory access
> 39.29% 1 L3 hit
> 37.50% 21 N/A
> 23.21% 13 L1 hit
>
> With patch:
>
> $ perf mem report -F overhead,sample,mem --stdio
> # Memory
> # Overhead Samples Other
> 100.00% 35 100.0%

Yep, that's because I split the 'mem' part to 'cache' and 'mem' because
he_mem_stat can handle up to 8 entries. As your samples hit mostly in
the caches, you'd get a similar result when you run:

  $ perf mem report -F overhead,sample,cache --stdio

> -F 'snoop':
>
> Without patch:
>
> $ perf mem report -F overhead,sample,snoop --stdio
> # Overhead Samples Snoop
> 60.71% 34 N/A
> 39.29% 1 HitM
>
> With patchset:
>
> $ perf mem report -F overhead,sample,snoop --stdio
> # --- Snoop ----
> # Overhead Samples HitM Other
> 100.00% 35 39.3% 60.7%

This matches the 'Overhead' distribution without the patch, right?

> 2) It was not intuitive (to me:)) that perf-mem overhead is calculated
> using sample->weight by overwriting sample->period. I also don't see
> it documented anywhere (or did I miss it?)

I don't see the documentation and I also find it confusing. Sometimes I
think the weight is better but sometimes not. :( At least we could add
an option to control that (like --use-weight ?).

Also we now have the 'weight' output field so users can see it, although
it shows averages.

> perf report:
>
> $ perf report -F overhead,sample,period,dso --stdio
> # Overhead Samples Period Shared Object
> 80.00% 28 2800000 [kernel.kallsyms]
> 5.71% 2 200000 ld-linux-x86-64.so.2
> 5.71% 2 200000 libc.so.6
> 5.71% 2 200000 ls
> 2.86% 1 100000 libpcre2-8.so.0.11.2
>
> perf mem report:
>
> $ perf mem report -F overhead,sample,period,dso --stdio
> # Overhead Samples Period Shared Object
> 87.50% 28 49 [kernel.kallsyms]
> 3.57% 2 2 ld-linux-x86-64.so.2
> 3.57% 2 2 libc.so.6
> 3.57% 2 2 ls
> 1.79% 1 1 libpcre2-8.so.0.11.2
>
> 3) Similarly, it was not intuitive (again, to me:)) that -F op/snoop/dtlb
> percentages are calculated based on sample->weight.

Hmm.. ok. Maybe better to use the original period for percentage
breakdown in the new output fields. For example, in the above result
you have 13 samples for L1 and 1 sample for L3 but the weight of L3
access is bigger. But I guess users probably want to see L1 access was
dominant.

> 4) I've similar recommended perf-mem command in perf-amd-ibs man page.
> Can you please update alternate command there.
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/Documentation/perf-amd-ibs.txt?h=v6.15-rc5#n167

Sure will do.

Thanks,
Namhyung

> Please correct me if I'm missing anything.
>
> Thanks,
> Ravi
Hi Namhyung,

>>> The name of some new fields are the same as the corresponding sort
>>> keys (mem, op, snoop) so I had to change the order whether it's
>>> applied as an output field or a sort key. Maybe it's better to name
>>> them differently but I couldn't come up with better ideas.
>>
>> 1) These semantic changes of the field name seems counter intuitive
>> (to me). Example:
>>
>> -F mem:
>>
>> Without patch:
>>
>> $ perf mem report -F overhead,sample,mem --stdio
>> # Overhead Samples Memory access
>> 39.29% 1 L3 hit
>> 37.50% 21 N/A
>> 23.21% 13 L1 hit
>>
>> With patch:
>>
>> $ perf mem report -F overhead,sample,mem --stdio
>> # Memory
>> # Overhead Samples Other
>> 100.00% 35 100.0%
>
> Yep, that's because I split the 'mem' part to 'cache' and 'mem' because
> he_mem_stat can handle up to 8 entries.

+1.

> As your samples hit mostly in
> the caches, you'd get the similar result when you run:
>
> $ perf mem report -F overhead,sample,cache --stdio
>
>>
>> -F 'snoop':
>>
>> Without patch:
>>
>> $ perf mem report -F overhead,sample,snoop --stdio
>> # Overhead Samples Snoop
>> 60.71% 34 N/A
>> 39.29% 1 HitM
>>
>> With patchset:
>>
>> $ perf mem report -F overhead,sample,snoop --stdio
>> # --- Snoop ----
>> # Overhead Samples HitM Other
>> 100.00% 35 39.3% 60.7%
>
> This matches to 'Overhead' distribution without patch, right?

Right, it does.

>> 2) It was not intuitive (to me:)) that perf-mem overhead is calculated
>> using sample->weight by overwriting sample->period. I also don't see
>> it documented anywhere (or did I miss it?)
>
> I don't see the documentation and I also find it confusing. Sometimes I
> think the weight is better but sometimes not. :( At least we could add
> an option to control that (like --use-weight ?).

this and below ...

> Also we now have 'weight' output field so users can see it, although it
> shows averages.
>
>>
>> perf report:
>>
>> $ perf report -F overhead,sample,period,dso --stdio
>> # Overhead Samples Period Shared Object
>> 80.00% 28 2800000 [kernel.kallsyms]
>> 5.71% 2 200000 ld-linux-x86-64.so.2
>> 5.71% 2 200000 libc.so.6
>> 5.71% 2 200000 ls
>> 2.86% 1 100000 libpcre2-8.so.0.11.2
>>
>> perf mem report:
>>
>> $ perf mem report -F overhead,sample,period,dso --stdio
>> # Overhead Samples Period Shared Object
>> 87.50% 28 49 [kernel.kallsyms]
>> 3.57% 2 2 ld-linux-x86-64.so.2
>> 3.57% 2 2 libc.so.6
>> 3.57% 2 2 ls
>> 1.79% 1 1 libpcre2-8.so.0.11.2
>>
>> 3) Similarly, it was not intuitive (again, to me:)) that -F op/snoop/dtlb
>> percentages are calculated based on sample->weight.
>
> Hmm.. ok. Maybe better to use the original period for percentage
> breakdown in the new output fields. For examples, in the above result
> you have 13 samples for L1 and 1 sample for L3 but the weight of L3
> access is bigger. But I guess users probably want to see L1 access was
> dominant.

... I'm also not sure. Logically, it makes sense to use weight as overhead.
Also it dates back to ~2014 and nobody has complained so far. So I'm just
being pedantic 🙂. For now, how about we just document it in the perf-mem
man page and leave it? Attaching the patch at the end.

>> 4) I've similar recommended perf-mem command in perf-amd-ibs man page.
>> Can you please update alternate command there.
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/Documentation/perf-amd-ibs.txt?h=v6.15-rc5#n167
>
> Sure will do.

Thanks!

------------><---------------
From 7e4393ab7b20f8d89a5dece08fdd925e3e50b15a Mon Sep 17 00:00:00 2001
From: Ravi Bangoria <ravi.bangoria@amd.com>
Date: Mon, 12 May 2025 06:22:57 +0000
Subject: [PATCH] perf mem doc: Describe overhead calculation in brief

Unlike perf-report which uses sample period for overhead calculation,
perf-mem overhead is calculated using sample weight. Describe perf-mem
overhead calculation method in its man page.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/perf/Documentation/perf-mem.txt | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/tools/perf/Documentation/perf-mem.txt b/tools/perf/Documentation/perf-mem.txt
index a9e3c71a2205..965e73d37772 100644
--- a/tools/perf/Documentation/perf-mem.txt
+++ b/tools/perf/Documentation/perf-mem.txt
@@ -137,6 +137,25 @@ REPORT OPTIONS
 In addition, for report all perf report options are valid, and for record
 all perf record options.
 
+OVERHEAD CALCULATION
+--------------------
+Unlike linkperf:perf-report[1], which calculates overhead from the actual
+sample period, perf-mem overhead is calculated using sample weight. E.g.
+there are two samples in perf.data file, both with the same sample period,
+but one sample with weight 180 and the other with weight 20:
+
+  $ perf script -F period,data_src,weight,ip,sym
+  100000 629080842 |OP LOAD|LVL L3 hit|... 20 7e69b93ca524 strcmp
+  100000 1a29081042 |OP LOAD|LVL RAM hit|... 180 ffffffff82429168 memcpy
+
+  $ perf report -F overhead,symbol
+  50% [.] strcmp
+  50% [k] memcpy
+
+  $ perf mem report -F overhead,symbol
+  90% [k] memcpy
+  10% [.] strcmp
+
 SEE ALSO
 --------
 linkperf:perf-record[1], linkperf:perf-report[1], linkperf:perf-arm-spe[1]
-- 
2.43.0

Thanks,
Ravi
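
The two-sample example from the documentation patch above can be checked with a small C sketch. The sample_sketch struct, the overhead_mode enum and the overhead_percent function are hypothetical names made up for this illustration; they do not come from perf. The only thing that changes between the two report styles is which per-sample value is summed: the period or the weight.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a sample for this illustration. */
struct sample_sketch {
	const char *sym;
	uint64_t period;
	uint64_t weight;
};

enum overhead_mode { BY_PERIOD, BY_WEIGHT };

/* Value of one sample under the chosen accounting mode. */
static uint64_t sample_value(const struct sample_sketch *s,
			     enum overhead_mode mode)
{
	return mode == BY_WEIGHT ? s->weight : s->period;
}

/* Overhead of samples[which] in percent of the total over all n samples. */
static double overhead_percent(const struct sample_sketch *samples, size_t n,
			       size_t which, enum overhead_mode mode)
{
	uint64_t total = 0;

	assert(which < n);
	for (size_t i = 0; i < n; i++)
		total += sample_value(&samples[i], mode);

	if (total == 0)
		return 0.0;
	return 100.0 * (double)sample_value(&samples[which], mode) /
	       (double)total;
}
```

With the values from the patch (period 100000 for both samples, weights 20 and 180), period-based accounting gives 50% each, while weight-based accounting gives 10% for strcmp and 90% for memcpy.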
On Wed, Apr 30, 2025 at 01:55:37PM -0700, Namhyung Kim wrote:
> Hello,
>
> The perf mem uses PERF_SAMPLE_DATA_SRC which has a lot of information
> for memory access. It has various sort keys to group related samples
> together but it's still cumbersome to see the result. While perf c2c
> command provides a way to investigate the data in a specific way, I'd
> like to add more generic ways using new output fields.
>
> For example, the following is the 'cache' output field which breaks
> down the sample weights into different level of caches.

Super cool!

> $ perf mem record -a sleep 1
>
> $ perf mem report -F cache,dso,sym --stdio
> ...
> #
> # -------------- Cache --------------
> #      L1     L2     L3 L1-buf  Other  Shared Object                          Symbol
> # ...................................  .....................................  .........................................
> #
>      0.0%   0.0%   0.0%   0.0% 100.0%  [kernel.kallsyms]                      [k] ioread8
>    100.0%   0.0%   0.0%   0.0%   0.0%  [kernel.kallsyms]                      [k] _raw_spin_lock_irq
>      0.0%   0.0%   0.0%   0.0% 100.0%  [xhci_hcd]                             [k] xhci_update_erst_dequeue
>      0.0%   0.0%   0.0%  95.8%   4.2%  [kernel.kallsyms]                      [k] smaps_account
>      0.6%   1.8%  22.7%  45.5%  29.5%  [kernel.kallsyms]                      [k] sched_balance_update_blocked_averages
>     29.4%   0.0%   1.6%  58.8%  10.2%  [kernel.kallsyms]                      [k] __update_load_avg_cfs_rq
>      0.0%   8.5%   4.3%   0.0%  87.2%  [kernel.kallsyms]                      [k] copy_mc_enhanced_fast_string
>     63.9%   0.0%   8.0%  23.8%   4.3%  [kernel.kallsyms]                      [k] psi_group_change
>      3.9%   0.0%   9.3%  35.7%  51.1%  [kernel.kallsyms]                      [k] timerqueue_add
>     35.9%  10.9%   0.0%  39.0%  14.2%  [kernel.kallsyms]                      [k] memcpy
>     94.1%   0.0%   0.0%   5.9%   0.0%  [kernel.kallsyms]                      [k] unmap_page_range
>     25.7%   0.0%   4.9%  51.0%  18.4%  [kernel.kallsyms]                      [k] __update_load_avg_se
>      0.0%  24.9%  19.4%   9.6%  46.1%  [kernel.kallsyms]                      [k] _copy_to_iter
>     12.9%   0.0%   0.0%  87.1%   0.0%  [kernel.kallsyms]                      [k] next_uptodate_folio
>     36.8%   0.0%   9.5%  16.6%  37.1%  [kernel.kallsyms]                      [k] update_curr
>    100.0%   0.0%   0.0%   0.0%   0.0%  bpf_prog_b9611ccbbb3d1833_dfs_iter     [k] bpf_prog_b9611ccbbb3d1833_dfs_iter
>     45.4%   1.8%  20.4%  23.6%   8.8%  [kernel.kallsyms]                      [k] audit_filter_rules.isra.0
>     92.8%   0.0%   0.0%   7.2%   0.0%  [kernel.kallsyms]                      [k] filemap_map_pages
>     10.6%   0.0%   0.0%  89.4%   0.0%  [kernel.kallsyms]                      [k] smaps_page_accumulate
>     38.3%   0.0%  29.6%  27.1%   5.0%  [kernel.kallsyms]                      [k] __schedule
>
> Please see the description of each commit for other fields.
>
> New mem_stat field was added to the hist_entry to save this
> information. It's a generic data structure (array) to handle
> different type of information like cache-level, memory location,
> snoop-result, etc.
>
> The first patch is a fix for the hierarchy mode and it was sent
> separately. I just add it here not to break the hierarchy mode. The
> second patch is to enable SAMPLE_DATA_SRC without SAMPLE_ADDR and
> perf_event_attr.mmap_data which generate a lot more data.

I merged it and added a test for the hierarchy mode as mentioned in my
reply to that patch.

> The name of some new fields are the same as the corresponding sort
> keys (mem, op, snoop) so I had to change the order whether it's
> applied as an output field or a sort key. Maybe it's better to name
> them differently but I couldn't come up with better ideas.

Looks ok at first sight.

> That means, you need to use -F/--fields option to specify those fields
> and the sort keys you want. Maybe we can change the default output
> and sort keys for perf mem report with this.

Maybe we can come up with aliases to help using these new features
without having to create a long command line, maybe:

  perf cache

Or some other more suitable name.

That would just be translated into the long command line for 'perf
report', kinda like 'perf kvm', but maybe we can do it like with 'perf
archive', i.e. just a shell wrapper?

> The code is available at 'perf/mem-field-v1' branch in

I'll test it, and I'm CCing Joe Mario, who I think will be very much
interested in trying this!

- Arnaldo

> git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
> Thanks,
> Namhyung
> Namhyung Kim (11):
> perf hist: Remove output field from sort-list properly
> perf record: Add --sample-mem-info option
> perf hist: Support multi-line header
> perf hist: Add struct he_mem_stat
> perf hist: Basic support for mem_stat accounting
> perf hist: Implement output fields for mem stats
> perf mem: Add 'op' output field
> perf hist: Hide unused mem stat columns
> perf mem: Add 'cache' and 'memory' output fields
> perf mem: Add 'snoop' output field
> perf mem: Add 'dtlb' output field
>
> tools/perf/Documentation/perf-record.txt | 7 +-
> tools/perf/builtin-record.c | 6 +
> tools/perf/ui/browsers/hists.c | 50 ++++-
> tools/perf/ui/hist.c | 272 ++++++++++++++++++++++-
> tools/perf/ui/stdio/hist.c | 57 +++--
> tools/perf/util/evsel.c | 2 +-
> tools/perf/util/hist.c | 78 +++++++
> tools/perf/util/hist.h | 22 ++
> tools/perf/util/mem-events.c | 183 ++++++++++++++-
> tools/perf/util/mem-events.h | 57 +++++
> tools/perf/util/record.h | 1 +
> tools/perf/util/sort.c | 42 +++-
> 12 files changed, 718 insertions(+), 59 deletions(-)
>
> --
> 2.49.0.906.g1f30a19c02-goog