[PATCH V3 1/9] perf report: Fix --total-cycles --stdio output error

kan.liang@linux.intel.com posted 9 patches 1 year, 5 months ago
Posted by kan.liang@linux.intel.com 1 year, 5 months ago
From: Kan Liang <kan.liang@linux.intel.com>

The --total-cycles option may output wrong information when used with --stdio.

For example,
  perf record -e "{cycles,instructions}",cache-misses -b sleep 1
  perf report --total-cycles --stdio

The total-cycles output for the {cycles,instructions} group and for
cache-misses is almost the same.

 # Samples: 938  of events 'anon group { cycles, instructions }'
 # Event count (approx.): 938
 #
 # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles
 # ...............  ..............  ...........  ..........
  ..................................................>
 #
           11.19%            2.6K        0.10%          21
                          [perf_iterate_ctx+48 -> >
            5.79%            1.4K        0.45%          97
            [__intel_pmu_enable_all.constprop.0+80 -> __intel_>
            5.11%            1.2K        0.33%          71
                             [native_write_msr+0 ->>

 # Samples: 293  of event 'cache-misses'
 # Event count (approx.): 293
 #
 # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles
                                                  [>
 # ...............  ..............  ...........  ..........
   ..................................................>
 #
           11.19%            2.6K        0.13%          21
                          [perf_iterate_ctx+48 -> >
            5.79%            1.4K        0.59%          97
[__intel_pmu_enable_all.constprop.0+80 -> __intel_>
            5.11%            1.2K        0.43%          71
                             [native_write_msr+0 ->>

With symbol_conf.event_group set, perf report should report the block
information only for the leader event of each group.
However, the current implementation advances the index only when a report
is printed, so after skipping a group member it retrieves the next event's
block information rather than the next group leader's block information.

Make sure the index is updated even if the event is skipped.
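
The indexing problem can be illustrated outside of perf with a minimal
sketch. It assumes, based on the description above, that block_reports
holds one slot per event in evlist order and that symbol_conf.event_group
is set. struct sketch_evsel, slots_before_patch() and slots_after_patch()
are hypothetical names used only for illustration; they are not part of
perf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct evsel: only the leader flag matters here. */
struct sketch_evsel {
	bool is_group_leader;
};

/*
 * Pre-patch indexing: i advances only when a report is printed, so every
 * skipped group member shifts later leaders onto the wrong slot.
 * Stores the slot used for each printed event in out[] and returns how
 * many events were printed. out[] must have room for n entries.
 */
size_t slots_before_patch(const struct sketch_evsel *evs, size_t n, size_t *out)
{
	size_t i = 0, printed = 0;

	assert(evs && out);
	for (size_t k = 0; k < n; k++) {
		if (!evs[k].is_group_leader)
			continue;		/* i is not advanced here */
		out[printed++] = i++;
	}
	return printed;
}

/*
 * Post-patch indexing: i advances for every event, skipped or not, and the
 * printed event reads slot i - 1, which is its own position in the list.
 */
size_t slots_after_patch(const struct sketch_evsel *evs, size_t n, size_t *out)
{
	size_t i = 0, printed = 0;

	assert(evs && out);
	for (size_t k = 0; k < n; k++) {
		i++;
		if (!evs[k].is_group_leader)
			continue;
		out[printed++] = i - 1;
	}
	return printed;
}
```

For the layout in the example above (cycles as leader, instructions as
member, cache-misses as leader), the pre-patch variant reads slots 0 and 1,
so cache-misses is shown with the slot that belongs to instructions. The
post-patch variant reads slots 0 and 2.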

With the patch,

 # Samples: 293  of event 'cache-misses'
 # Event count (approx.): 293
 #
 # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles
                                                  [>
 # ...............  ..............  ...........  ..........
   ..................................................>
 #
           37.98%            9.0K        4.05%         299
   [perf_event_addr_filters_exec+0 -> perf_event_a>
           11.19%            2.6K        0.28%          21
                          [perf_iterate_ctx+48 -> >
            5.79%            1.4K        1.32%          97
[__intel_pmu_enable_all.constprop.0+80 -> __intel_>

Fixes: 6f7164fa231a ("perf report: Sort by sampled cycles percent per block for stdio")
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/builtin-report.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index dfb47fa85e5c..04b9a5c1bc7e 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -565,6 +565,7 @@ static int evlist__tty_browse_hists(struct evlist *evlist, struct report *rep, c
 		struct hists *hists = evsel__hists(pos);
 		const char *evname = evsel__name(pos);
 
+		i++;
 		if (symbol_conf.event_group && !evsel__is_group_leader(pos))
 			continue;
 
@@ -574,7 +575,7 @@ static int evlist__tty_browse_hists(struct evlist *evlist, struct report *rep, c
 		hists__fprintf_nr_sample_events(hists, rep, evname, stdout);
 
 		if (rep->total_cycles_mode) {
-			report__browse_block_hists(&rep->block_reports[i++].hist,
+			report__browse_block_hists(&rep->block_reports[i - 1].hist,
 						   rep->min_percent, pos, NULL);
 			continue;
 		}
-- 
2.38.1
Re: [PATCH V3 1/9] perf report: Fix --total-cycles --stdio output error
Posted by Arnaldo Carvalho de Melo 1 year, 5 months ago
On Tue, Aug 13, 2024 at 09:02:00AM -0700, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> The --total-cycles may output wrong information with the --stdio.

Hey, I tried --total-cycles with --group, but that didn't work. Do you
think that would make sense?

Anyway, all applied, now testing and reviewing the changes,

thanks!

- Arnaldo
 
Re: [PATCH V3 1/9] perf report: Fix --total-cycles --stdio output error
Posted by Liang, Kan 1 year, 5 months ago

On 2024-08-13 2:44 p.m., Arnaldo Carvalho de Melo wrote:
> On Tue, Aug 13, 2024 at 09:02:00AM -0700, kan.liang@linux.intel.com wrote:
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> The --total-cycles may output wrong information with the --stdio.
> 
> Hey, I tried --total-cycles with --group but that didn't work, do you
> think that would make sense?

The current implementation doesn't handle symbol_conf.event_group in
TUI mode.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/builtin-report.c#n543

I will send a separate patch to fix it and make it consistent.

Thanks,
Kan
