[PATCH v4 2/4] perf stat: Use counter cpumask to skip zero values

Ian Rogers posted 4 patches 11 months, 2 weeks ago
Posted by Ian Rogers 11 months, 2 weeks ago
When a counter's value is 0 it may or may not be skipped. Uncore
counters are commonly only valid on 1 logical CPU, and all other CPUs
should be skipped. The PMU's cpumask was used for the skip
calculation, but that cpumask may not reflect user
overrides. Similarly, a counter on a core PMU may explicitly not
request that a CPU be gathered. If the counter's value on such a CPU
is 0 then the counter should be skipped as it wasn't requested. Switch
from using the PMU cpumask to the one associated with the evsel to
support these cases.

Avoid potential crash with --per-thread mode where config->aggr_get_id
is NULL. Add some examples for the tool event 0 counter skipping.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/stat-display.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index ba79f73e1cf5..32badf623267 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -1042,8 +1042,16 @@ static bool should_skip_zero_counter(struct perf_stat_config *config,
 		return true;
 
 	/*
-	 * Many tool events are only gathered on the first index, skip other
-	 * zero values.
+	 * In per-thread mode the aggr_map and aggr_get_id functions may be
+	 * NULL, assume all 0 values should be output in that case.
+	 */
+	if (!config->aggr_map || !config->aggr_get_id)
+		return false;
+
+	/*
+	 * Tool events may be gathered on all logical CPUs, for example
+	 * system_time, but for many the first index is the only one used, for
+	 * example num_cores. Don't skip for the first index.
 	 */
 	if (evsel__is_tool(counter)) {
 		struct aggr_cpu_id own_id =
@@ -1051,15 +1059,12 @@ static bool should_skip_zero_counter(struct perf_stat_config *config,
 
 		return !aggr_cpu_id__equal(id, &own_id);
 	}
-
 	/*
-	 * Skip value 0 when it's an uncore event and the given aggr id
-	 * does not belong to the PMU cpumask.
+	 * Skip value 0 when the counter's cpumask doesn't match the given aggr
+	 * id.
 	 */
-	if (!counter->pmu || !counter->pmu->is_uncore)
-		return false;
 
-	perf_cpu_map__for_each_cpu(cpu, idx, counter->pmu->cpus) {
+	perf_cpu_map__for_each_cpu(cpu, idx, counter->core.cpus) {
 		struct aggr_cpu_id own_id = config->aggr_get_id(config, cpu);
 
 		if (aggr_cpu_id__equal(id, &own_id))
-- 
2.47.1.613.gc27f4b7a9f-goog
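For readers following the hunks above, the post-patch decision can be modelled as a small standalone C function. This is a minimal sketch, not the perf code: `struct model_counter`, `model_aggr_get_id_fn`, `model_core_id` and `model_should_skip_zero` are hypothetical names, and aggregation ids are reduced to plain ints. The diff context ends before the loop's return statements, so the sketch assumes a match means "do not skip" and no match means "skip", as the new comment suggests. It also assumes the tool-event "first index" is the aggregation id of CPU 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified model; these are not perf's real types. */
struct model_counter {
	bool is_tool;		/* stands in for evsel__is_tool(counter) */
	const int *cpus;	/* stands in for counter->core.cpus */
	size_t nr_cpus;
};

/* Hypothetical stand-in for config->aggr_get_id: maps a CPU to an aggr id. */
typedef int (*model_aggr_get_id_fn)(int cpu);

/* Example mapping (assumption): two logical CPUs per core. */
static int model_core_id(int cpu)
{
	return cpu / 2;
}

/* Decide whether a counter whose value is 0 should be hidden for aggr id. */
static bool model_should_skip_zero(const struct model_counter *counter,
				   model_aggr_get_id_fn aggr_get_id, int id)
{
	assert(counter != NULL);

	/* Per-thread mode: no aggregation callback, so output every 0. */
	if (aggr_get_id == NULL)
		return false;

	/* Tool events: keep the 0 only for the first index (assumed CPU 0). */
	if (counter->is_tool)
		return id != aggr_get_id(0);

	/* Keep the 0 if any CPU in the counter's own cpumask maps to id. */
	for (size_t i = 0; i < counter->nr_cpus; i++) {
		if (aggr_get_id(counter->cpus[i]) == id)
			return false;
	}

	/* No requested CPU belongs to this aggr id: hide the 0. */
	return true;
}
```

With a counter whose cpumask is only CPU 0 and `model_core_id` as the callback, a 0 value is kept for core 0 and skipped for core 1. Passing a NULL callback, the per-thread case, keeps every 0.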
Re: [PATCH v4 2/4] perf stat: Use counter cpumask to skip zero values
Posted by Namhyung Kim 11 months, 2 weeks ago
On Tue, Jan 07, 2025 at 09:34:26PM -0800, Ian Rogers wrote:
> When a counter's value is 0 it may or may not be skipped. Uncore
> counters are commonly only valid on 1 logical CPU, and all other CPUs
> should be skipped. The PMU's cpumask was used for the skip
> calculation, but that cpumask may not reflect user overrides.

It's not clear to me how an uncore PMU works with CPU overrides.
I thought the override was ignored and the kernel changed the CPU
internally using the cpumask.  But that may be transparent to
userspace, so we can treat it as working the way we expect.

Anyway, commit dd15480a3d67b9cf ("perf stat: Hide invalid uncore
event output for aggr mode") added this code, and the concern was a
case like

  $ sudo ./perf stat -a --per-core -e power/energy-pkg/ sleep 1

So it should be fine as long as the output remains the same.


> Similarly, a counter on a core PMU may explicitly not
> request that a CPU be gathered. If the counter's value on such a CPU
> is 0 then the counter should be skipped as it wasn't requested. Switch
> from using the PMU cpumask to the one associated with the evsel to
> support these cases.

Do you mean hybrid PMUs?  I guess they won't open events on
unsupported or unrequested CPUs in the first place, right?

Thanks,
Namhyung

Re: [PATCH v4 2/4] perf stat: Use counter cpumask to skip zero values
Posted by Ian Rogers 11 months, 1 week ago
On Wed, Jan 8, 2025 at 11:45 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Tue, Jan 07, 2025 at 09:34:26PM -0800, Ian Rogers wrote:
> > When a counter's value is 0 it may or may not be skipped. Uncore
> > counters are commonly only valid on 1 logical CPU, and all other CPUs
> > should be skipped. The PMU's cpumask was used for the skip
> > calculation, but that cpumask may not reflect user overrides.
>
> It's not clear to me how an uncore PMU works with CPU overrides.
> I thought the override was ignored and the kernel changed the CPU
> internally using the cpumask.  But that may be transparent to
> userspace, so we can treat it as working the way we expect.
>
> Anyway, commit dd15480a3d67b9cf ("perf stat: Hide invalid uncore
> event output for aggr mode") added this code, and the concern was a
> case like
>
>   $ sudo ./perf stat -a --per-core -e power/energy-pkg/ sleep 1
>
> So it should be fine as long as the output remains the same.

Confirmed the output remains the same:
```
$ perf stat -a --per-core -e energy-pkg sleep 1

 Performance counter stats for 'system wide':

S0-D0-C0              1              22.94 Joules energy-pkg

       1.000934566 seconds time elapsed
```

> > Similarly, a counter on a core PMU may explicitly not
> > request that a CPU be gathered. If the counter's value on such a CPU
> > is 0 then the counter should be skipped as it wasn't requested. Switch
> > from using the PMU cpumask to the one associated with the evsel to
> > support these cases.
>
> Do you mean hybrid PMUs?  I guess they won't open events on
> unsupported or unrequested CPUs in the first place, right?

Right. The notion of uncore on a PMU is not the opposite of the notion
of core; it's all a bit of a muddle because of the kernel PMU drivers.
The previous code always showed a 0 when `!pmu->is_uncore`, and
is_uncore is set when a PMU has a `/sys/devices/<pmu name>/cpumask`
file - core PMUs should either have no cpumask or a cpus file instead.
In general the evsel cpumask should match the PMU cpumask. The change
here is that we will use the evsel cpumask regardless of whether the
PMU has the `/sys/devices/<pmu name>/cpumask` file. Not having the
file may reflect hybrid, a single core PMU, a PMU driver bug,
different core PMUs like AMD IBS and ARM SPE, etc. The output change
from this could be that a 0 on a `!pmu->is_uncore` PMU was previously
shown but now is not. For that to happen the aggregation would need
to skip that CPU and, as you say, that shouldn't happen.

Thanks,
Ian
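
The is_uncore test described above comes down to whether a cpumask file exists for the PMU. A minimal sketch of such a check follows. `pmu_has_cpumask_file` is a hypothetical helper, not perf's implementation, and the sysfs root is passed in as a parameter (an assumption, made so the check can be tried against any directory rather than only `/sys/devices`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper, not perf's implementation: report whether
 * <sysfs_devices>/<pmu_name>/cpumask exists and is readable.
 */
static bool pmu_has_cpumask_file(const char *sysfs_devices,
				 const char *pmu_name)
{
	char path[4096];
	int len;

	assert(sysfs_devices != NULL && pmu_name != NULL);

	len = snprintf(path, sizeof(path), "%s/%s/cpumask",
		       sysfs_devices, pmu_name);
	if (len < 0 || (size_t)len >= sizeof(path))
		return false;	/* formatting failed or path truncated */

	/* access() returns 0 when the file can be read by the caller. */
	return access(path, R_OK) == 0;
}
```

A call such as `pmu_has_cpumask_file("/sys/devices", "power")` would then report whether that PMU exposes a cpumask on the running system; the result depends on the machine and kernel drivers.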