[RFC/PATCHSET 00/11] perf mem: Add new output fields for data source (v1)

Namhyung Kim posted 11 patches 7 months, 3 weeks ago
Posted by Namhyung Kim 7 months, 3 weeks ago
Hello,

perf mem uses PERF_SAMPLE_DATA_SRC, which carries a lot of information
about memory accesses.  It has various sort keys to group related
samples together, but it's still cumbersome to see the result.  While
the perf c2c command provides a way to investigate the data in a
specific way, I'd like to add more generic ways using new output fields.

For example, the following is the 'cache' output field, which breaks
down the sample weights into different cache levels.

  $ perf mem record -a sleep 1
  
  $ perf mem report -F cache,dso,sym --stdio
  ...
  #
  # -------------- Cache --------------
  #      L1     L2     L3 L1-buf  Other  Shared Object                                  Symbol
  # ...................................  .....................................  .........................................
  #
       0.0%   0.0%   0.0%   0.0% 100.0%  [kernel.kallsyms]                      [k] ioread8
     100.0%   0.0%   0.0%   0.0%   0.0%  [kernel.kallsyms]                      [k] _raw_spin_lock_irq
       0.0%   0.0%   0.0%   0.0% 100.0%  [xhci_hcd]                             [k] xhci_update_erst_dequeue
       0.0%   0.0%   0.0%  95.8%   4.2%  [kernel.kallsyms]                      [k] smaps_account
       0.6%   1.8%  22.7%  45.5%  29.5%  [kernel.kallsyms]                      [k] sched_balance_update_blocked_averages
      29.4%   0.0%   1.6%  58.8%  10.2%  [kernel.kallsyms]                      [k] __update_load_avg_cfs_rq
       0.0%   8.5%   4.3%   0.0%  87.2%  [kernel.kallsyms]                      [k] copy_mc_enhanced_fast_string
      63.9%   0.0%   8.0%  23.8%   4.3%  [kernel.kallsyms]                      [k] psi_group_change
       3.9%   0.0%   9.3%  35.7%  51.1%  [kernel.kallsyms]                      [k] timerqueue_add
      35.9%  10.9%   0.0%  39.0%  14.2%  [kernel.kallsyms]                      [k] memcpy
      94.1%   0.0%   0.0%   5.9%   0.0%  [kernel.kallsyms]                      [k] unmap_page_range
      25.7%   0.0%   4.9%  51.0%  18.4%  [kernel.kallsyms]                      [k] __update_load_avg_se
       0.0%  24.9%  19.4%   9.6%  46.1%  [kernel.kallsyms]                      [k] _copy_to_iter
      12.9%   0.0%   0.0%  87.1%   0.0%  [kernel.kallsyms]                      [k] next_uptodate_folio
      36.8%   0.0%   9.5%  16.6%  37.1%  [kernel.kallsyms]                      [k] update_curr
     100.0%   0.0%   0.0%   0.0%   0.0%  bpf_prog_b9611ccbbb3d1833_dfs_iter     [k] bpf_prog_b9611ccbbb3d1833_dfs_iter
      45.4%   1.8%  20.4%  23.6%   8.8%  [kernel.kallsyms]                      [k] audit_filter_rules.isra.0
      92.8%   0.0%   0.0%   7.2%   0.0%  [kernel.kallsyms]                      [k] filemap_map_pages
      10.6%   0.0%   0.0%  89.4%   0.0%  [kernel.kallsyms]                      [k] smaps_page_accumulate
      38.3%   0.0%  29.6%  27.1%   5.0%  [kernel.kallsyms]                      [k] __schedule

Please see the description of each commit for other fields.

A new mem_stat field was added to the hist_entry to save this
information.  It's a generic data structure (array) to handle
different types of information like cache level, memory location,
snoop result, etc.

The first patch is a fix for the hierarchy mode and it was sent
separately.  I just added it here so as not to break the hierarchy
mode.  The second patch enables SAMPLE_DATA_SRC without SAMPLE_ADDR
and perf_event_attr.mmap_data, which generate a lot more data.

The names of some new fields are the same as the corresponding sort
keys (mem, op, snoop), so I had to change the order in which a name
is checked as an output field or a sort key.  Maybe it's better to
name them differently but I couldn't come up with better ideas.

That means you need to use the -F/--fields option to specify those
fields and the sort keys you want.  Maybe we can change the default
output and sort keys for perf mem report with this.

The code is available at 'perf/mem-field-v1' branch in

 git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Thanks,
Namhyung


Namhyung Kim (11):
  perf hist: Remove output field from sort-list properly
  perf record: Add --sample-mem-info option
  perf hist: Support multi-line header
  perf hist: Add struct he_mem_stat
  perf hist: Basic support for mem_stat accounting
  perf hist: Implement output fields for mem stats
  perf mem: Add 'op' output field
  perf hist: Hide unused mem stat columns
  perf mem: Add 'cache' and 'memory' output fields
  perf mem: Add 'snoop' output field
  perf mem: Add 'dtlb' output field

 tools/perf/Documentation/perf-record.txt |   7 +-
 tools/perf/builtin-record.c              |   6 +
 tools/perf/ui/browsers/hists.c           |  50 ++++-
 tools/perf/ui/hist.c                     | 272 ++++++++++++++++++++++-
 tools/perf/ui/stdio/hist.c               |  57 +++--
 tools/perf/util/evsel.c                  |   2 +-
 tools/perf/util/hist.c                   |  78 +++++++
 tools/perf/util/hist.h                   |  22 ++
 tools/perf/util/mem-events.c             | 183 ++++++++++++++-
 tools/perf/util/mem-events.h             |  57 +++++
 tools/perf/util/record.h                 |   1 +
 tools/perf/util/sort.c                   |  42 +++-
 12 files changed, 718 insertions(+), 59 deletions(-)

-- 
2.49.0.906.g1f30a19c02-goog
Re: [RFC/PATCHSET 00/11] perf mem: Add new output fields for data source (v1)
Posted by Ravi Bangoria 7 months, 2 weeks ago
Hi Namhyung,

I feel the overall idea is good. Running a few simple perf-mem commands
on AMD works fine too. A few general comments below.

> The names of some new fields are the same as the corresponding sort
> keys (mem, op, snoop), so I had to change the order in which a name
> is checked as an output field or a sort key.  Maybe it's better to
> name them differently but I couldn't come up with better ideas.

1) These semantic changes to the field names seem counterintuitive
   (to me). Example:

   -F mem:

     Without patch:

     $ perf mem report -F overhead,sample,mem --stdio
     # Overhead       Samples  Memory access
         39.29%             1  L3 hit
         37.50%            21  N/A
         23.21%            13  L1 hit

     With patch:

     $ perf mem report -F overhead,sample,mem --stdio
     #                          Memory
     # Overhead       Samples    Other
        100.00%            35   100.0%

   -F 'snoop':

     Without patch:

     $ perf mem report -F overhead,sample,snoop --stdio
     # Overhead       Samples  Snoop
         60.71%            34  N/A
         39.29%             1  HitM
   
     With patchset:

     $ perf mem report -F overhead,sample,snoop --stdio
     #                         --- Snoop ----
     # Overhead       Samples     HitM  Other
        100.00%            35    39.3%  60.7%

2) It was not intuitive (to me:)) that perf-mem overhead is calculated
   using sample->weight by overwriting sample->period. I also don't see
   it documented anywhere (or did I miss it?)

   perf report:

     $ perf report -F overhead,sample,period,dso --stdio
     # Overhead  Samples   Period  Shared Object
         80.00%       28  2800000  [kernel.kallsyms]
          5.71%        2   200000  ld-linux-x86-64.so.2
          5.71%        2   200000  libc.so.6
          5.71%        2   200000  ls
          2.86%        1   100000  libpcre2-8.so.0.11.2

   perf mem report:

     $ perf mem report -F overhead,sample,period,dso --stdio
     # Overhead  Samples   Period  Shared Object
         87.50%       28       49  [kernel.kallsyms]
          3.57%        2        2  ld-linux-x86-64.so.2
          3.57%        2        2  libc.so.6
          3.57%        2        2  ls
          1.79%        1        1  libpcre2-8.so.0.11.2

3) Similarly, it was not intuitive (again, to me:)) that -F op/snoop/dtlb
   percentages are calculated based on sample->weight.

4) I have a similar recommended perf-mem command in the perf-amd-ibs
   man page. Can you please update it with the alternate command?
   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/Documentation/perf-amd-ibs.txt?h=v6.15-rc5#n167

Please correct me if I'm missing anything.

Thanks,
Ravi
Re: [RFC/PATCHSET 00/11] perf mem: Add new output fields for data source (v1)
Posted by Namhyung Kim 7 months, 1 week ago
Hi Ravi,

On Thu, May 08, 2025 at 09:42:41AM +0530, Ravi Bangoria wrote:
> Hi Namhyung,
> 
> I feel the overall idea is good. Running a few simple perf-mem commands
> on AMD works fine too. A few general comments below.

Thanks for your review!

> 
> > The names of some new fields are the same as the corresponding sort
> > keys (mem, op, snoop), so I had to change the order in which a name
> > is checked as an output field or a sort key.  Maybe it's better to
> > name them differently but I couldn't come up with better ideas.
> 
> 1) These semantic changes to the field names seem counterintuitive
>    (to me). Example:
> 
>    -F mem:
> 
>      Without patch:
> 
>      $ perf mem report -F overhead,sample,mem --stdio
>      # Overhead       Samples  Memory access
>          39.29%             1  L3 hit
>          37.50%            21  N/A
>          23.21%            13  L1 hit
> 
>      With patch:
> 
>      $ perf mem report -F overhead,sample,mem --stdio
>      #                          Memory
>      # Overhead       Samples    Other
>         100.00%            35   100.0%

Yep, that's because I split the 'mem' part into 'cache' and 'mem' because
he_mem_stat can handle up to 8 entries.  As your samples mostly hit in
the caches, you'd get a similar result when you run:

  $ perf mem report -F overhead,sample,cache --stdio

> 
>    -F 'snoop':
> 
>      Without patch:
> 
>      $ perf mem report -F overhead,sample,snoop --stdio
>      # Overhead       Samples  Snoop
>          60.71%            34  N/A
>          39.29%             1  HitM
>    
>      With patchset:
> 
>      $ perf mem report -F overhead,sample,snoop --stdio
>      #                         --- Snoop ----
>      # Overhead       Samples     HitM  Other
>         100.00%            35    39.3%  60.7%

This matches the 'Overhead' distribution without the patch, right?

> 
> 2) It was not intuitive (to me:)) that perf-mem overhead is calculated
>    using sample->weight by overwriting sample->period. I also don't see
>    it documented anywhere (or did I miss it?)

I don't see the documentation and I also find it confusing.  Sometimes I
think the weight is better but sometimes not. :(  At least we could add
an option to control that (like --use-weight ?).

Also we now have the 'weight' output field so users can see it, although
it shows averages.

> 
>    perf report:
> 
>      $ perf report -F overhead,sample,period,dso --stdio
>      # Overhead  Samples   Period  Shared Object
>          80.00%       28  2800000  [kernel.kallsyms]
>           5.71%        2   200000  ld-linux-x86-64.so.2
>           5.71%        2   200000  libc.so.6
>           5.71%        2   200000  ls
>           2.86%        1   100000  libpcre2-8.so.0.11.2
> 
>    perf mem report:
> 
>      $ perf mem report -F overhead,sample,period,dso --stdio
>      # Overhead  Samples   Period  Shared Object
>          87.50%       28       49  [kernel.kallsyms]
>           3.57%        2        2  ld-linux-x86-64.so.2
>           3.57%        2        2  libc.so.6
>           3.57%        2        2  ls
>           1.79%        1        1  libpcre2-8.so.0.11.2
> 
> 3) Similarly, it was not intuitive (again, to me:)) that -F op/snoop/dtlb
>    percentages are calculated based on sample->weight.

Hmm.. ok.  Maybe it's better to use the original period for the percentage
breakdown in the new output fields.  For example, in the above result
you have 13 samples for L1 and 1 sample for L3, but the weight of the L3
access is bigger.  But I guess users probably want to see that L1 access
was dominant.

> 
> 4) I have a similar recommended perf-mem command in the perf-amd-ibs
>    man page. Can you please update it with the alternate command?
>    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/Documentation/perf-amd-ibs.txt?h=v6.15-rc5#n167

Sure will do.

Thanks,
Namhyung

> 
> Please correct me if I'm missing anything.
> 
> Thanks,
> Ravi
Re: [RFC/PATCHSET 00/11] perf mem: Add new output fields for data source (v1)
Posted by Ravi Bangoria 7 months, 1 week ago
Hi Namhyung,

>>> The names of some new fields are the same as the corresponding sort
>>> keys (mem, op, snoop), so I had to change the order in which a name
>>> is checked as an output field or a sort key.  Maybe it's better to
>>> name them differently but I couldn't come up with better ideas.
>>
>> 1) These semantic changes to the field names seem counterintuitive
>>    (to me). Example:
>>
>>    -F mem:
>>
>>      Without patch:
>>
>>      $ perf mem report -F overhead,sample,mem --stdio
>>      # Overhead       Samples  Memory access
>>          39.29%             1  L3 hit
>>          37.50%            21  N/A
>>          23.21%            13  L1 hit
>>
>>      With patch:
>>
>>      $ perf mem report -F overhead,sample,mem --stdio
>>      #                          Memory
>>      # Overhead       Samples    Other
>>         100.00%            35   100.0%
> 
> Yep, that's because I split the 'mem' part into 'cache' and 'mem' because
> he_mem_stat can handle up to 8 entries.

+1.

>  As your samples mostly hit in
> the caches, you'd get a similar result when you run:
> 
>   $ perf mem report -F overhead,sample,cache --stdio
> 
>>
>>    -F 'snoop':
>>
>>      Without patch:
>>
>>      $ perf mem report -F overhead,sample,snoop --stdio
>>      # Overhead       Samples  Snoop
>>          60.71%            34  N/A
>>          39.29%             1  HitM
>>    
>>      With patchset:
>>
>>      $ perf mem report -F overhead,sample,snoop --stdio
>>      #                         --- Snoop ----
>>      # Overhead       Samples     HitM  Other
>>         100.00%            35    39.3%  60.7%
> 
> This matches the 'Overhead' distribution without the patch, right?

Right, it does.

>> 2) It was not intuitive (to me:)) that perf-mem overhead is calculated
>>    using sample->weight by overwriting sample->period. I also don't see
>>    it documented anywhere (or did I miss it?)
> 
> I don't see the documentation and I also find it confusing.  Sometimes I
> think the weight is better but sometimes not. :(  At least we could add
> an option to control that (like --use-weight ?).

this and below ...

> Also we now have the 'weight' output field so users can see it, although
> it shows averages.
> 
>>
>>    perf report:
>>
>>      $ perf report -F overhead,sample,period,dso --stdio
>>      # Overhead  Samples   Period  Shared Object
>>          80.00%       28  2800000  [kernel.kallsyms]
>>           5.71%        2   200000  ld-linux-x86-64.so.2
>>           5.71%        2   200000  libc.so.6
>>           5.71%        2   200000  ls
>>           2.86%        1   100000  libpcre2-8.so.0.11.2
>>
>>    perf mem report:
>>
>>      $ perf mem report -F overhead,sample,period,dso --stdio
>>      # Overhead  Samples   Period  Shared Object
>>          87.50%       28       49  [kernel.kallsyms]
>>           3.57%        2        2  ld-linux-x86-64.so.2
>>           3.57%        2        2  libc.so.6
>>           3.57%        2        2  ls
>>           1.79%        1        1  libpcre2-8.so.0.11.2
>>
>> 3) Similarly, it was not intuitive (again, to me:)) that -F op/snoop/dtlb
>>    percentages are calculated based on sample->weight.
> 
> Hmm.. ok.  Maybe it's better to use the original period for the percentage
> breakdown in the new output fields.  For example, in the above result
> you have 13 samples for L1 and 1 sample for L3, but the weight of the L3
> access is bigger.  But I guess users probably want to see that L1 access
> was dominant.

... I'm also not sure. Logically, it makes sense to use weight as overhead.
Also, it dates back to ~2014 and nobody has complained so far, so I'm just
being pedantic 🙂. For now, how about we just document it in the perf-mem
man page and leave it as is? Attaching the patch at the end.

>> 4) I have a similar recommended perf-mem command in the perf-amd-ibs
>>    man page. Can you please update it with the alternate command?
>>    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/Documentation/perf-amd-ibs.txt?h=v6.15-rc5#n167
> 
> Sure will do.

Thanks!

------------><---------------
From 7e4393ab7b20f8d89a5dece08fdd925e3e50b15a Mon Sep 17 00:00:00 2001
From: Ravi Bangoria <ravi.bangoria@amd.com>
Date: Mon, 12 May 2025 06:22:57 +0000
Subject: [PATCH] perf mem doc: Describe overhead calculation in brief

Unlike perf-report, which uses the sample period for overhead calculation,
perf-mem overhead is calculated using the sample weight. Describe the
perf-mem overhead calculation method in its man page.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/perf/Documentation/perf-mem.txt | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/tools/perf/Documentation/perf-mem.txt b/tools/perf/Documentation/perf-mem.txt
index a9e3c71a2205..965e73d37772 100644
--- a/tools/perf/Documentation/perf-mem.txt
+++ b/tools/perf/Documentation/perf-mem.txt
@@ -137,6 +137,25 @@ REPORT OPTIONS
 In addition, for report all perf report options are valid, and for record
 all perf record options.
 
+OVERHEAD CALCULATION
+--------------------
+Unlike linkperf:perf-report[1], which calculates overhead from the actual
+sample period, perf-mem overhead is calculated using sample weight. For
+example, suppose the perf.data file has two samples with the same sample
+period, but one with weight 180 and the other with weight 20:
+
+  $ perf script -F period,data_src,weight,ip,sym
+  100000    629080842 |OP LOAD|LVL L3 hit|...     20       7e69b93ca524 strcmp
+  100000   1a29081042 |OP LOAD|LVL RAM hit|...   180   ffffffff82429168 memcpy
+
+  $ perf report -F overhead,symbol
+  50%   [.] strcmp
+  50%   [k] memcpy
+
+  $ perf mem report -F overhead,symbol
+  90%   [k] memcpy
+  10%   [.] strcmp
+
 SEE ALSO
 --------
 linkperf:perf-record[1], linkperf:perf-report[1], linkperf:perf-arm-spe[1]
-- 
2.43.0

Thanks,
Ravi
Re: [RFC/PATCHSET 00/11] perf mem: Add new output fields for data source (v1)
Posted by Arnaldo Carvalho de Melo 7 months, 2 weeks ago
On Wed, Apr 30, 2025 at 01:55:37PM -0700, Namhyung Kim wrote:
> Hello,
 
> perf mem uses PERF_SAMPLE_DATA_SRC, which carries a lot of information
> about memory accesses.  It has various sort keys to group related
> samples together, but it's still cumbersome to see the result.  While
> the perf c2c command provides a way to investigate the data in a
> specific way, I'd like to add more generic ways using new output fields.
 
> For example, the following is the 'cache' output field, which breaks
> down the sample weights into different cache levels.

Super cool!
 
>   $ perf mem record -a sleep 1
>   
>   $ perf mem report -F cache,dso,sym --stdio
>   ...
>   #
>   # -------------- Cache --------------
>   #      L1     L2     L3 L1-buf  Other  Shared Object                                  Symbol
>   # ...................................  .....................................  .........................................
>   #
>        0.0%   0.0%   0.0%   0.0% 100.0%  [kernel.kallsyms]                      [k] ioread8
>      100.0%   0.0%   0.0%   0.0%   0.0%  [kernel.kallsyms]                      [k] _raw_spin_lock_irq
>        0.0%   0.0%   0.0%   0.0% 100.0%  [xhci_hcd]                             [k] xhci_update_erst_dequeue
>        0.0%   0.0%   0.0%  95.8%   4.2%  [kernel.kallsyms]                      [k] smaps_account
>        0.6%   1.8%  22.7%  45.5%  29.5%  [kernel.kallsyms]                      [k] sched_balance_update_blocked_averages
>       29.4%   0.0%   1.6%  58.8%  10.2%  [kernel.kallsyms]                      [k] __update_load_avg_cfs_rq
>        0.0%   8.5%   4.3%   0.0%  87.2%  [kernel.kallsyms]                      [k] copy_mc_enhanced_fast_string
>       63.9%   0.0%   8.0%  23.8%   4.3%  [kernel.kallsyms]                      [k] psi_group_change
>        3.9%   0.0%   9.3%  35.7%  51.1%  [kernel.kallsyms]                      [k] timerqueue_add
>       35.9%  10.9%   0.0%  39.0%  14.2%  [kernel.kallsyms]                      [k] memcpy
>       94.1%   0.0%   0.0%   5.9%   0.0%  [kernel.kallsyms]                      [k] unmap_page_range
>       25.7%   0.0%   4.9%  51.0%  18.4%  [kernel.kallsyms]                      [k] __update_load_avg_se
>        0.0%  24.9%  19.4%   9.6%  46.1%  [kernel.kallsyms]                      [k] _copy_to_iter
>       12.9%   0.0%   0.0%  87.1%   0.0%  [kernel.kallsyms]                      [k] next_uptodate_folio
>       36.8%   0.0%   9.5%  16.6%  37.1%  [kernel.kallsyms]                      [k] update_curr
>      100.0%   0.0%   0.0%   0.0%   0.0%  bpf_prog_b9611ccbbb3d1833_dfs_iter     [k] bpf_prog_b9611ccbbb3d1833_dfs_iter
>       45.4%   1.8%  20.4%  23.6%   8.8%  [kernel.kallsyms]                      [k] audit_filter_rules.isra.0
>       92.8%   0.0%   0.0%   7.2%   0.0%  [kernel.kallsyms]                      [k] filemap_map_pages
>       10.6%   0.0%   0.0%  89.4%   0.0%  [kernel.kallsyms]                      [k] smaps_page_accumulate
>       38.3%   0.0%  29.6%  27.1%   5.0%  [kernel.kallsyms]                      [k] __schedule
 
> Please see the description of each commit for other fields.
 
> A new mem_stat field was added to the hist_entry to save this
> information.  It's a generic data structure (array) to handle
> different types of information like cache level, memory location,
> snoop result, etc.
 
> The first patch is a fix for the hierarchy mode and it was sent
> separately.  I just added it here so as not to break the hierarchy
> mode.  The second patch enables SAMPLE_DATA_SRC without SAMPLE_ADDR
> and perf_event_attr.mmap_data, which generate a lot more data.

I merged it and added a test for the hierarchy mode as mentioned in my
reply to that patch.
 
> The names of some new fields are the same as the corresponding sort
> keys (mem, op, snoop), so I had to change the order in which a name
> is checked as an output field or a sort key.  Maybe it's better to
> name them differently but I couldn't come up with better ideas.

Looks ok at first sight.
 
> That means you need to use the -F/--fields option to specify those
> fields and the sort keys you want.  Maybe we can change the default
> output and sort keys for perf mem report with this.

Maybe we can come up with aliases to help use these new features
without having to create a long command line, maybe:

perf cache

Or some other more suitable name.

That would just be translated into the long command line for 'perf
report', kinda like 'perf kvm', but maybe we can do it like with 'perf
archive', i.e. just a shell wrapper?
 
> The code is available at 'perf/mem-field-v1' branch in

I'll test it, and I'm CCing Joe Mario, who I think will be very much
interested in trying this!

- Arnaldo
 
>  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
 
> Thanks,
> Namhyung
 
> Namhyung Kim (11):
>   perf hist: Remove output field from sort-list properly
>   perf record: Add --sample-mem-info option
>   perf hist: Support multi-line header
>   perf hist: Add struct he_mem_stat
>   perf hist: Basic support for mem_stat accounting
>   perf hist: Implement output fields for mem stats
>   perf mem: Add 'op' output field
>   perf hist: Hide unused mem stat columns
>   perf mem: Add 'cache' and 'memory' output fields
>   perf mem: Add 'snoop' output field
>   perf mem: Add 'dtlb' output field
> 
>  tools/perf/Documentation/perf-record.txt |   7 +-
>  tools/perf/builtin-record.c              |   6 +
>  tools/perf/ui/browsers/hists.c           |  50 ++++-
>  tools/perf/ui/hist.c                     | 272 ++++++++++++++++++++++-
>  tools/perf/ui/stdio/hist.c               |  57 +++--
>  tools/perf/util/evsel.c                  |   2 +-
>  tools/perf/util/hist.c                   |  78 +++++++
>  tools/perf/util/hist.h                   |  22 ++
>  tools/perf/util/mem-events.c             | 183 ++++++++++++++-
>  tools/perf/util/mem-events.h             |  57 +++++
>  tools/perf/util/record.h                 |   1 +
>  tools/perf/util/sort.c                   |  42 +++-
>  12 files changed, 718 insertions(+), 59 deletions(-)
> 
> -- 
> 2.49.0.906.g1f30a19c02-goog