[PATCH v2] perf metrics: Add literal for system TSC frequency

Ian Rogers posted 1 patch 3 years, 11 months ago
tools/perf/tests/expr.c | 15 +++++++++++++
tools/perf/util/expr.c  | 50 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 65 insertions(+)
[PATCH v2] perf metrics: Add literal for system TSC frequency
Posted by Ian Rogers 3 years, 11 months ago
Such a literal is useful to calculate things like the average frequency
[1]. The TSC frequency isn't exposed by sysfs although some experimental
drivers look to add it [2]. This change computes the value using the
frequency in /proc/cpuinfo which is accurate at least on Intel
processors.

v2. Adds warnings to make clear if things have changed/broken on future
    Intel platforms. It also adds caching and an Intel specific that a
    value is computed.

[1] https://github.com/intel/perfmon-metrics/blob/5ad9ef7056f31075e8178b9f1fb732af183b2c8d/SKX/metrics/perf/skx_metric_perf.json#L11
[2] https://github.com/trailofbits/tsc_freq_khz

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/tests/expr.c | 15 +++++++++++++
 tools/perf/util/expr.c  | 50 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

diff --git a/tools/perf/tests/expr.c b/tools/perf/tests/expr.c
index 5c0032fe93ae..45afe4f24859 100644
--- a/tools/perf/tests/expr.c
+++ b/tools/perf/tests/expr.c
@@ -1,8 +1,10 @@
 // SPDX-License-Identifier: GPL-2.0
 #include "util/debug.h"
 #include "util/expr.h"
+#include "util/header.h"
 #include "util/smt.h"
 #include "tests.h"
+#include <math.h>
 #include <stdlib.h>
 #include <string.h>
 #include <linux/zalloc.h>
@@ -69,6 +71,11 @@ static int test__expr(struct test_suite *t __maybe_unused, int subtest __maybe_u
 	double val, num_cpus, num_cores, num_dies, num_packages;
 	int ret;
 	struct expr_parse_ctx *ctx;
+	bool is_intel = false;
+	char buf[128];
+
+	if (!get_cpuid(buf, sizeof(buf)))
+		is_intel = strstr(buf, "Intel") != NULL;
 
 	TEST_ASSERT_EQUAL("ids_union", test_ids_union(), 0);
 
@@ -175,6 +182,14 @@ static int test__expr(struct test_suite *t __maybe_unused, int subtest __maybe_u
 	if (num_dies) // Some platforms do not have CPU die support, for example s390
 		TEST_ASSERT_VAL("#num_dies >= #num_packages", num_dies >= num_packages);
 
+	if (is_intel) {
+		double system_tsc_freq;
+
+		TEST_ASSERT_VAL("#system_tsc_freq", expr__parse(&system_tsc_freq, ctx,
+								"#system_tsc_freq") == 0);
+		TEST_ASSERT_VAL("!isnan(#system_tsc_freq)", !isnan(system_tsc_freq));
+	}
+
 	/*
 	 * Source count returns the number of events aggregating in a leader
 	 * event including the leader. Check parsing yields an id.
diff --git a/tools/perf/util/expr.c b/tools/perf/util/expr.c
index 675f318ce7c1..f33aeb1e6faa 100644
--- a/tools/perf/util/expr.c
+++ b/tools/perf/util/expr.c
@@ -402,6 +402,51 @@ double expr_id_data__source_count(const struct expr_id_data *data)
 	return data->val.source_count;
 }
 
+/*
+ * Derive the TSC frequency in Hz from the /proc/cpuinfo, for example:
+ * ...
+ * model name      : Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz
+ * ...
+ * will return 3000000000.
+ */
+static double system_tsc_freq(void)
+{
+	static double result;
+	static bool computed;
+	FILE *cpuinfo;
+	char *line = NULL;
+	size_t len = 0;
+
+	if (computed)
+		return result;
+
+	computed = true;
+	result = NAN;
+	cpuinfo = fopen("/proc/cpuinfo", "r");
+	if (!cpuinfo) {
+		pr_err("Failed to read /proc/cpuinfo for TSC frequency");
+		return NAN;
+	}
+	while (getline(&line, &len, cpuinfo) > 0) {
+		if (!strncmp(line, "model name", 10)) {
+			char *pos = strstr(line + 11, " @ ");
+
+			if (pos && sscanf(pos, " @ %lfGHz", &result) == 1) {
+				result *= 1000000000;
+				goto out;
+			}
+		}
+	}
+
+out:
+	if (isnan(result))
+		pr_err("Failed to find TSC frequency in /proc/cpuinfo");
+
+	free(line);
+	fclose(cpuinfo);
+	return result;
+}
+
 double expr__get_literal(const char *literal)
 {
 	static struct cpu_topology *topology;
@@ -417,6 +462,11 @@ double expr__get_literal(const char *literal)
 		goto out;
 	}
 
+	if (!strcasecmp("#system_tsc_freq", literal)) {
+		result = system_tsc_freq();
+		goto out;
+	}
+
 	/*
 	 * Assume that topology strings are consistent, such as CPUs "0-1"
 	 * wouldn't be listed as "0,1", and so after deduplication the number of
-- 
2.36.1.124.g0e6072fb45-goog
Re: [PATCH v2] perf metrics: Add literal for system TSC frequency
Posted by Peter Zijlstra 3 years, 11 months ago
On Thu, May 26, 2022 at 09:04:07PM -0700, Ian Rogers wrote:
> Such a literal is useful to calculate things like the average frequency
> [1]. The TSC frequency isn't exposed by sysfs although some experimental
> drivers look to add it [2]. This change computes the value using the
> frequency in /proc/cpuinfo which is accurate at least on Intel
> processors.
> 
> v2. Adds warnings to make clear if things have changed/broken on future
>     Intel platforms. It also adds caching and an Intel specific that a
>     value is computed.
> 
> [1] https://github.com/intel/perfmon-metrics/blob/5ad9ef7056f31075e8178b9f1fb732af183b2c8d/SKX/metrics/perf/skx_metric_perf.json#L11
> [2] https://github.com/trailofbits/tsc_freq_khz

This all seems bonghits inspired... and perf actually does expose the
tsc frequency. What do you think is in perf_event_mmap_page::time_* ?

> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/perf/tests/expr.c | 15 +++++++++++++
>  tools/perf/util/expr.c  | 50 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 65 insertions(+)
> 
> diff --git a/tools/perf/tests/expr.c b/tools/perf/tests/expr.c
> index 5c0032fe93ae..45afe4f24859 100644
> --- a/tools/perf/tests/expr.c
> +++ b/tools/perf/tests/expr.c
> @@ -1,8 +1,10 @@
>  // SPDX-License-Identifier: GPL-2.0
>  #include "util/debug.h"
>  #include "util/expr.h"
> +#include "util/header.h"
>  #include "util/smt.h"
>  #include "tests.h"
> +#include <math.h>
>  #include <stdlib.h>
>  #include <string.h>
>  #include <linux/zalloc.h>
> @@ -69,6 +71,11 @@ static int test__expr(struct test_suite *t __maybe_unused, int subtest __maybe_u
>  	double val, num_cpus, num_cores, num_dies, num_packages;
>  	int ret;
>  	struct expr_parse_ctx *ctx;
> +	bool is_intel = false;
> +	char buf[128];
> +
> +	if (!get_cpuid(buf, sizeof(buf)))
> +		is_intel = strstr(buf, "Intel") != NULL;
>  
>  	TEST_ASSERT_EQUAL("ids_union", test_ids_union(), 0);
>  
> @@ -175,6 +182,14 @@ static int test__expr(struct test_suite *t __maybe_unused, int subtest __maybe_u
>  	if (num_dies) // Some platforms do not have CPU die support, for example s390
>  		TEST_ASSERT_VAL("#num_dies >= #num_packages", num_dies >= num_packages);
>  
> +	if (is_intel) {
> +		double system_tsc_freq;
> +
> +		TEST_ASSERT_VAL("#system_tsc_freq", expr__parse(&system_tsc_freq, ctx,
> +								"#system_tsc_freq") == 0);
> +		TEST_ASSERT_VAL("!isnan(#system_tsc_freq)", !isnan(system_tsc_freq));
> +	}
> +
>  	/*
>  	 * Source count returns the number of events aggregating in a leader
>  	 * event including the leader. Check parsing yields an id.
> diff --git a/tools/perf/util/expr.c b/tools/perf/util/expr.c
> index 675f318ce7c1..f33aeb1e6faa 100644
> --- a/tools/perf/util/expr.c
> +++ b/tools/perf/util/expr.c
> @@ -402,6 +402,51 @@ double expr_id_data__source_count(const struct expr_id_data *data)
>  	return data->val.source_count;
>  }
>  
> +/*
> + * Derive the TSC frequency in Hz from the /proc/cpuinfo, for example:
> + * ...
> + * model name      : Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz
> + * ...
> + * will return 3000000000.
> + */
> +static double system_tsc_freq(void)
> +{
> +	static double result;
> +	static bool computed;
> +	FILE *cpuinfo;
> +	char *line = NULL;
> +	size_t len = 0;
> +
> +	if (computed)
> +		return result;
> +
> +	computed = true;
> +	result = NAN;
> +	cpuinfo = fopen("/proc/cpuinfo", "r");
> +	if (!cpuinfo) {
> +		pr_err("Failed to read /proc/cpuinfo for TSC frequency");
> +		return NAN;
> +	}
> +	while (getline(&line, &len, cpuinfo) > 0) {
> +		if (!strncmp(line, "model name", 10)) {
> +			char *pos = strstr(line + 11, " @ ");
> +
> +			if (pos && sscanf(pos, " @ %lfGHz", &result) == 1) {
> +				result *= 1000000000;
> +				goto out;
> +			}
> +		}
> +	}
> +
> +out:
> +	if (isnan(result))
> +		pr_err("Failed to find TSC frequency in /proc/cpuinfo");
> +
> +	free(line);
> +	fclose(cpuinfo);
> +	return result;
> +}
> +
>  double expr__get_literal(const char *literal)
>  {
>  	static struct cpu_topology *topology;
> @@ -417,6 +462,11 @@ double expr__get_literal(const char *literal)
>  		goto out;
>  	}
>  
> +	if (!strcasecmp("#system_tsc_freq", literal)) {
> +		result = system_tsc_freq();
> +		goto out;
> +	}
> +
>  	/*
>  	 * Assume that topology strings are consistent, such as CPUs "0-1"
>  	 * wouldn't be listed as "0,1", and so after deduplication the number of
> -- 
> 2.36.1.124.g0e6072fb45-goog
>
Re: [PATCH v2] perf metrics: Add literal for system TSC frequency
Posted by Andi Kleen 3 years, 11 months ago
On 5/27/2022 2:49 AM, Peter Zijlstra wrote:
> On Thu, May 26, 2022 at 09:04:07PM -0700, Ian Rogers wrote:
>> Such a literal is useful to calculate things like the average frequency
>> [1]. The TSC frequency isn't exposed by sysfs although some experimental
>> drivers look to add it [2]. This change computes the value using the
>> frequency in /proc/cpuinfo which is accurate at least on Intel
>> processors.

It would be better to use CPUID if available which is far more accurate. 
Also I believe newer Intel CPUs drop the frequency in the brand string.

BTW it also has the problem that if perf script is run on some other 
system to compute metrics it won't work.

>
> This all seems bonghits inspired... and perf actually does expose the
> tsc frequency. What do you think is in perf_event_mmap_page::time_* ?


That's not really available to perf stat, which is the primary metrics user.


-Andi
Re: [PATCH v2] perf metrics: Add literal for system TSC frequency
Posted by Peter Zijlstra 3 years, 11 months ago
On Fri, May 27, 2022 at 07:54:19AM -0700, Andi Kleen wrote:

> > This all seems bonghits inspired... and perf actually does expose the
> > tsc frequency. What do you think is in perf_event_mmap_page::time_* ?
> 
> 
> That's not really available to perf stat, which is the primary metrics user.

Why not? You can mmap any perf-fd (even software events) and these
fields should be filled out.

It should work on any x86 CPU that has a TSC. The only caveat is that
the kernel must not have marked the TSC unstable.

It could even work for virt -- all you need is for virt to use
native_sched_clock() instead of the paravirt nonsense.
Re: [PATCH v2] perf metrics: Add literal for system TSC frequency
Posted by Ian Rogers 3 years, 11 months ago
On Sat, May 28, 2022 at 7:02 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, May 27, 2022 at 07:54:19AM -0700, Andi Kleen wrote:
>
> > > This all seems bonghits inspired... and perf actually does expose the
> > > tsc frequency. What do you think is in perf_event_mmap_page::time_* ?
> >
> >
> > That's not really available to perf stat, which is the primary metrics user.
>
> Why not? You can mmap any perf-fd (even software events) and these
> fields should be filled out.
>
> It should work on any x86 CPU that has a TSC. The only caveat is that
> the kernel must not have marked the TSC unstable.
>
> It could even work for virt -- all you need is for virt to use
> native_sched_clock() instead of the paravirt nonsense.

It will at least fail if inherit is enabled, no?

Thanks,
Ian
Re: [PATCH v2] perf metrics: Add literal for system TSC frequency
Posted by Peter Zijlstra 3 years, 11 months ago
On Sat, May 28, 2022 at 07:50:40AM -0700, Ian Rogers wrote:
> On Sat, May 28, 2022 at 7:02 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Fri, May 27, 2022 at 07:54:19AM -0700, Andi Kleen wrote:
> >
> > > > This all seems bonghits inspired... and perf actually does expose the
> > > > tsc frequency. What do you think is in perf_event_mmap_page::time_* ?
> > >
> > >
> > > That's not really available to perf stat, which is the primary metrics user.
> >
> > Why not? You can mmap any perf-fd (even software events) and these
> > fields should be filled out.
> >
> > It should work on any x86 CPU that has a TSC. The only caveat is that
> > the kernel must not have marked the TSC unstable.
> >
> > It could even work for virt -- all you need is for virt to use
> > native_sched_clock() instead of the paravirt nonsense.
> 
> It will at least fail if inherit is enabled, no?

For per-task events, yes, I suppose it will. I'd forgotten about that
restriction on perf_mmap() :/
Re: [PATCH v2] perf metrics: Add literal for system TSC frequency
Posted by Ian Rogers 3 years, 11 months ago
On Fri, May 27, 2022 at 7:54 AM Andi Kleen <ak@linux.intel.com> wrote:
>
>
> On 5/27/2022 2:49 AM, Peter Zijlstra wrote:
> > On Thu, May 26, 2022 at 09:04:07PM -0700, Ian Rogers wrote:
> >> Such a literal is useful to calculate things like the average frequency
> >> [1]. The TSC frequency isn't exposed by sysfs although some experimental
> >> drivers look to add it [2]. This change computes the value using the
> >> frequency in /proc/cpuinfo which is accurate at least on Intel
> >> processors.
>
> It would be better to use CPUID if available which is far more accurate.
> Also I believe newer Intel CPUs drop the frequency in the brand string.

But on v1 you also said it was broken for certain architectures. The
point of the tests and warnings is to identify this case.

> BTW it also has the problem that if perf script is run on some other
> system to compute metrics it won't work.

Yep. We don't yet support recording then reporting metrics on a
different system. There would be a bunch to do to support this.

> >
> > This all seems bonghits inspired... and perf actually does expose the
> > tsc frequency. What do you think is in perf_event_mmap_page::time_* ?
>
>
> That's not really available to perf stat, which is the primary metrics user.

So we have 3 approaches:

1) read /proc/cpuinfo: pros: easy in comparison to other approaches.
cons: will break in the future, will need a cross system strategy
2) use cpuid: pros: can be made to work across systems cons: not
available for all Intel architectures
3) mmap: pros: more stable API than cpuinfo vendor string. cons: could
end up with new events purely to gather a TSC frequency and associated
complexity around permissions, etc.

I feel we can bike shed and go between the 3 approaches almost
endlessly. I'd rather get something in and iterate. How the approach
breaks can motivate the next steps and the tests added here will
capture when it does break, not just when a metric is first used. I
mentioned this on V1 and requested feedback, this is just implementing
what I said I'd do to V1.

Thanks,
Ian

> -Andi
>
>
>