[RFC PATCH v8 1/7] perf Document: Add TPEBS to Documents

weilin.wang@intel.com posted 7 patches 1 year, 9 months ago
There is a newer version of this series
Posted by weilin.wang@intel.com 1 year, 9 months ago
From: Weilin Wang <weilin.wang@intel.com>

TPEBS is a new feature in the next generation of Intel PMU. It will be used in
new TMA releases. Add a related introduction to the documentation alongside the
new code that supports it in perf stat.

Signed-off-by: Weilin Wang <weilin.wang@intel.com>
---
 tools/perf/Documentation/perf-list.txt |  1 +
 tools/perf/Documentation/topdown.txt   | 18 ++++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt
index 6bf2468f59d3..dea005410ec0 100644
--- a/tools/perf/Documentation/perf-list.txt
+++ b/tools/perf/Documentation/perf-list.txt
@@ -72,6 +72,7 @@ counted. The following modifiers exist:
  W - group is weak and will fallback to non-group if not schedulable,
  e - group or event are exclusive and do not share the PMU
  b - use BPF aggregration (see perf stat --bpf-counters)
+ R - retire latency value of the event
 
 The 'p' modifier can be used for specifying how precise the instruction
 address should be. The 'p' modifier can be specified multiple times:
diff --git a/tools/perf/Documentation/topdown.txt b/tools/perf/Documentation/topdown.txt
index ae0aee86844f..e6c4424e8bf2 100644
--- a/tools/perf/Documentation/topdown.txt
+++ b/tools/perf/Documentation/topdown.txt
@@ -325,6 +325,24 @@ other four level 2 metrics by subtracting corresponding metrics as below.
     Fetch_Bandwidth = Frontend_Bound - Fetch_Latency
     Core_Bound = Backend_Bound - Memory_Bound
 
+TPEBS in TopDown
+================
+
+TPEBS is one of the features provided by the next generation of Intel PMU. The
+TPEBS feature adds a 16-bit retire latency field to the Basic Info group of the
+PEBS record. It records the core cycles from the retirement of the previous
+instruction to the retirement of the current instruction. Please refer to
+Section 8.4.1 of the "Intel® Architecture Instruction Set Extensions
+Programming Reference" for more details about this feature.
+
+In the most recent release of TMA, some metric formulas begin to use event
+retire_latency values on processors that support the TPEBS feature. For
+previous generations that do not support TPEBS, the values are static and
+predefined per processor family by the hardware architects. Because workloads
+in real execution environments are diverse, retire latency values measured at
+run time are more accurate. Therefore, new TMA metrics that use TPEBS will
+provide more accurate performance analysis results.
+
 
 [1] https://software.intel.com/en-us/top-down-microarchitecture-analysis-method-win
 [2] https://sites.google.com/site/analysismethods/yasin-pubs
-- 
2.43.0

Re: [RFC PATCH v8 1/7] perf Document: Add TPEBS to Documents
Posted by Ian Rogers 1 year, 9 months ago
On Tue, May 14, 2024 at 10:44 PM <weilin.wang@intel.com> wrote:
>
> From: Weilin Wang <weilin.wang@intel.com>
>
> TPEBS is a new feature in next generation of Intel PMU. It will be used in new
> TMA releases. Adding related introduction to documents while adding new code to
> support it in perf stat.
>
> Signed-off-by: Weilin Wang <weilin.wang@intel.com>
> ---
>  tools/perf/Documentation/perf-list.txt |  1 +
>  tools/perf/Documentation/topdown.txt   | 18 ++++++++++++++++++
>  2 files changed, 19 insertions(+)
>
> diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt
> index 6bf2468f59d3..dea005410ec0 100644
> --- a/tools/perf/Documentation/perf-list.txt
> +++ b/tools/perf/Documentation/perf-list.txt
> @@ -72,6 +72,7 @@ counted. The following modifiers exist:
>   W - group is weak and will fallback to non-group if not schedulable,
>   e - group or event are exclusive and do not share the PMU
>   b - use BPF aggregration (see perf stat --bpf-counters)
> + R - retire latency value of the event
>
>  The 'p' modifier can be used for specifying how precise the instruction
>  address should be. The 'p' modifier can be specified multiple times:
> diff --git a/tools/perf/Documentation/topdown.txt b/tools/perf/Documentation/topdown.txt
> index ae0aee86844f..e6c4424e8bf2 100644
> --- a/tools/perf/Documentation/topdown.txt
> +++ b/tools/perf/Documentation/topdown.txt
> @@ -325,6 +325,24 @@ other four level 2 metrics by subtracting corresponding metrics as below.
>      Fetch_Bandwidth = Frontend_Bound - Fetch_Latency
>      Core_Bound = Backend_Bound - Memory_Bound
>
> +TPEBS in TopDown
> +================
> +
> +TPEBS is one of the features provided by the next generation of Intel PMU. The

As this documentation will live a while, "next generation" could become
ambiguous. I think it would be better to mention Core Ultra or some
other term that more specifically describes which PMUs have TPEBS.

> +TPEBS feature adds a 16 bit retire latency field in the Basic Info group of the
> +PEBS record. It records the Core cycles since the retirement of the previous
> +instruction to the retirement of current instruction. Please refer to Section
> +8.4.1 of "Intel® Architecture Instruction Set Extensions Programming Reference"
> +for more details about this feature.

Perhaps capture that this is placed in the perf event sample in the
weights section as TPEBS isn't exposed except within the kernel PMU
driver.

> +
> +In the most recent release of TMA, the metrics begin to use event retire_latency
> +values in some of the metrics’ formulas on processors that support TPEBS feature.
> +For previous generations that do not support TPEBS, the values are static and
> +predefined per processor family by the hardware architects. Due to the diversity
> +of workloads in execution environments, retire latency values measured at real
> +time are more accurate. Therefore, new TMA metrics that use TPEBS will provide
> +more accurate performance analysis results.

Do you want to capture what the value will be when there hasn't been a
sample? This corner case could be considered broken in the new
approach.

Thanks,
Ian

> +
>
>  [1] https://software.intel.com/en-us/top-down-microarchitecture-analysis-method-win
>  [2] https://sites.google.com/site/analysismethods/yasin-pubs
> --
> 2.43.0
>
RE: [RFC PATCH v8 1/7] perf Document: Add TPEBS to Documents
Posted by Wang, Weilin 1 year, 9 months ago

> -----Original Message-----
> From: Ian Rogers <irogers@google.com>
> Sent: Thursday, May 16, 2024 9:11 AM
> To: Wang, Weilin <weilin.wang@intel.com>
> Cc: Namhyung Kim <namhyung@kernel.org>; Arnaldo Carvalho de Melo
> <acme@kernel.org>; Peter Zijlstra <peterz@infradead.org>; Ingo Molnar
> <mingo@redhat.com>; Alexander Shishkin
> <alexander.shishkin@linux.intel.com>; Jiri Olsa <jolsa@kernel.org>; Hunter,
> Adrian <adrian.hunter@intel.com>; Kan Liang <kan.liang@linux.intel.com>;
> linux-perf-users@vger.kernel.org; linux-kernel@vger.kernel.org; Taylor, Perry
> <perry.taylor@intel.com>; Alt, Samantha <samantha.alt@intel.com>; Biggers,
> Caleb <caleb.biggers@intel.com>
> Subject: Re: [RFC PATCH v8 1/7] perf Document: Add TPEBS to Documents
> 
> On Tue, May 14, 2024 at 10:44 PM <weilin.wang@intel.com> wrote:
> >
> > From: Weilin Wang <weilin.wang@intel.com>
> >
> > TPEBS is a new feature in next generation of Intel PMU. It will be used in new
> > TMA releases. Adding related introduction to documents while adding new
> code to
> > support it in perf stat.
> >
> > Signed-off-by: Weilin Wang <weilin.wang@intel.com>
> > ---
> >  tools/perf/Documentation/perf-list.txt |  1 +
> >  tools/perf/Documentation/topdown.txt   | 18 ++++++++++++++++++
> >  2 files changed, 19 insertions(+)
> >
> > diff --git a/tools/perf/Documentation/perf-list.txt
> b/tools/perf/Documentation/perf-list.txt
> > index 6bf2468f59d3..dea005410ec0 100644
> > --- a/tools/perf/Documentation/perf-list.txt
> > +++ b/tools/perf/Documentation/perf-list.txt
> > @@ -72,6 +72,7 @@ counted. The following modifiers exist:
> >   W - group is weak and will fallback to non-group if not schedulable,
> >   e - group or event are exclusive and do not share the PMU
> >   b - use BPF aggregration (see perf stat --bpf-counters)
> > + R - retire latency value of the event
> >
> >  The 'p' modifier can be used for specifying how precise the instruction
> >  address should be. The 'p' modifier can be specified multiple times:
> > diff --git a/tools/perf/Documentation/topdown.txt
> b/tools/perf/Documentation/topdown.txt
> > index ae0aee86844f..e6c4424e8bf2 100644
> > --- a/tools/perf/Documentation/topdown.txt
> > +++ b/tools/perf/Documentation/topdown.txt
> > @@ -325,6 +325,24 @@ other four level 2 metrics by subtracting
> corresponding metrics as below.
> >      Fetch_Bandwidth = Frontend_Bound - Fetch_Latency
> >      Core_Bound = Backend_Bound - Memory_Bound
> >
> > +TPEBS in TopDown
> > +================
> > +
> > +TPEBS is one of the features provided by the next generation of Intel PMU.
> The
> 
> As this documentation will live a while "next generation" could become
> ambiguous. I think it would be better to mention core ultra or some
> other term to more specifically describe which PMUs have TPEBS.

Hi Ian, 

Yes, you are right, I will update it. 
> 
> > +TPEBS feature adds a 16 bit retire latency field in the Basic Info group of the
> > +PEBS record. It records the Core cycles since the retirement of the previous
> > +instruction to the retirement of current instruction. Please refer to Section
> > +8.4.1 of "Intel® Architecture Instruction Set Extensions Programming
> Reference"
> > +for more details about this feature.
> 
> Perhaps capture that this is placed in the perf event sample in the
> weights section as TPEBS isn't exposed except within the kernel PMU
> driver.
> 
> > +
> > +In the most recent release of TMA, the metrics begin to use event
> retire_latency
> > +values in some of the metrics’ formulas on processors that support TPEBS
> feature.
> > +For previous generations that do not support TPEBS, the values are static
> and
> > +predefined per processor family by the hardware architects. Due to the
> diversity
> > +of workloads in execution environments, retire latency values measured at
> real
> > +time are more accurate. Therefore, new TMA metrics that use TPEBS will
> provide
> > +more accurate performance analysis results.
> 
> Do you want to capture what the value will be when there hasn't been a
> sample? This corner case could be considered broken in the new
> approach.


When there is no sample, we should expect it to be 0 or use a default value. I will
add this information here. I don’t think this is broken from the approach's
perspective, but we do need to add code to return this value when there is no sample.

Thanks, 
Weilin
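
A minimal sketch of the no-sample fallback discussed above, written in Python for illustration only. It is not code from this series: the function name `retire_latency_value`, the `default` parameter, and the choice to average the samples are all assumptions, since the thread only says the value should be 0 or a default when no sample exists.

```python
from statistics import mean
from typing import Sequence


def retire_latency_value(samples: Sequence[int], default: float = 0.0) -> float:
    """Return one retire latency value for an event.

    `samples` holds the retire latencies (in core cycles) seen for the event.
    Averaging them is an assumption made for this sketch; the thread does not
    specify how samples are aggregated.
    """
    if not samples:
        # No sample was collected: fall back to 0 or a predefined default.
        return default
    return float(mean(samples))
```

With no samples the function returns `default`, so a metric formula that reads the value still gets a defined number instead of failing.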

> Thanks,
> Ian
> 
> > +
> >
> >  [1] https://software.intel.com/en-us/top-down-microarchitecture-analysis-
> method-win
> >  [2] https://sites.google.com/site/analysismethods/yasin-pubs
> > --
> > 2.43.0
> >
Re: [RFC PATCH v8 1/7] perf Document: Add TPEBS to Documents
Posted by Namhyung Kim 1 year, 8 months ago
Hello,

On Thu, May 16, 2024 at 10:37 AM Wang, Weilin <weilin.wang@intel.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Ian Rogers <irogers@google.com>
> > Sent: Thursday, May 16, 2024 9:11 AM
> > To: Wang, Weilin <weilin.wang@intel.com>
> > Cc: Namhyung Kim <namhyung@kernel.org>; Arnaldo Carvalho de Melo
> > <acme@kernel.org>; Peter Zijlstra <peterz@infradead.org>; Ingo Molnar
> > <mingo@redhat.com>; Alexander Shishkin
> > <alexander.shishkin@linux.intel.com>; Jiri Olsa <jolsa@kernel.org>; Hunter,
> > Adrian <adrian.hunter@intel.com>; Kan Liang <kan.liang@linux.intel.com>;
> > linux-perf-users@vger.kernel.org; linux-kernel@vger.kernel.org; Taylor, Perry
> > <perry.taylor@intel.com>; Alt, Samantha <samantha.alt@intel.com>; Biggers,
> > Caleb <caleb.biggers@intel.com>
> > Subject: Re: [RFC PATCH v8 1/7] perf Document: Add TPEBS to Documents
> >
> > On Tue, May 14, 2024 at 10:44 PM <weilin.wang@intel.com> wrote:
> > >
> > > From: Weilin Wang <weilin.wang@intel.com>
> > >
> > > TPEBS is a new feature in next generation of Intel PMU. It will be used in new
> > > TMA releases. Adding related introduction to documents while adding new
> > code to
> > > support it in perf stat.
> > >
> > > Signed-off-by: Weilin Wang <weilin.wang@intel.com>
> > > ---
> > >  tools/perf/Documentation/perf-list.txt |  1 +
> > >  tools/perf/Documentation/topdown.txt   | 18 ++++++++++++++++++
> > >  2 files changed, 19 insertions(+)
> > >
> > > diff --git a/tools/perf/Documentation/perf-list.txt
> > b/tools/perf/Documentation/perf-list.txt
> > > index 6bf2468f59d3..dea005410ec0 100644
> > > --- a/tools/perf/Documentation/perf-list.txt
> > > +++ b/tools/perf/Documentation/perf-list.txt
> > > @@ -72,6 +72,7 @@ counted. The following modifiers exist:
> > >   W - group is weak and will fallback to non-group if not schedulable,
> > >   e - group or event are exclusive and do not share the PMU
> > >   b - use BPF aggregration (see perf stat --bpf-counters)
> > > + R - retire latency value of the event
> > >
> > >  The 'p' modifier can be used for specifying how precise the instruction
> > >  address should be. The 'p' modifier can be specified multiple times:
> > > diff --git a/tools/perf/Documentation/topdown.txt
> > b/tools/perf/Documentation/topdown.txt
> > > index ae0aee86844f..e6c4424e8bf2 100644
> > > --- a/tools/perf/Documentation/topdown.txt
> > > +++ b/tools/perf/Documentation/topdown.txt
> > > @@ -325,6 +325,24 @@ other four level 2 metrics by subtracting
> > corresponding metrics as below.
> > >      Fetch_Bandwidth = Frontend_Bound - Fetch_Latency
> > >      Core_Bound = Backend_Bound - Memory_Bound
> > >
> > > +TPEBS in TopDown
> > > +================
> > > +
> > > +TPEBS is one of the features provided by the next generation of Intel PMU.
> > The
> >
> > As this documentation will live a while "next generation" could become
> > ambiguous. I think it would be better to mention core ultra or some
> > other term to more specifically describe which PMUs have TPEBS.
>
> Hi Ian,
>
> Yes, you are right, I will update it.

Also, it'd be nice if you could tell what 'T' means in TPEBS. :)

> >
> > > +TPEBS feature adds a 16 bit retire latency field in the Basic Info group of the
> > > +PEBS record. It records the Core cycles since the retirement of the previous
> > > +instruction to the retirement of current instruction. Please refer to Section
> > > +8.4.1 of "Intel® Architecture Instruction Set Extensions Programming
> > Reference"
> > > +for more details about this feature.
> >
> > Perhaps capture that this is placed in the perf event sample in the
> > weights section as TPEBS isn't exposed except within the kernel PMU
> > driver.
> >
> > > +
> > > +In the most recent release of TMA, the metrics begin to use event
> > retire_latency
> > > +values in some of the metrics’ formulas on processors that support TPEBS
> > feature.
> > > +For previous generations that do not support TPEBS, the values are static
> > and
> > > +predefined per processor family by the hardware architects. Due to the
> > diversity
> > > +of workloads in execution environments, retire latency values measured at
> > real
> > > +time are more accurate. Therefore, new TMA metrics that use TPEBS will
> > provide
> > > +more accurate performance analysis results.
> >
> > Do you want to capture what the value will be when there hasn't been a
> > sample? This corner case could be considered broken in the new
> > approach.
>
>
> When there is no sample, we should expect it to be 0 or use default value. I will
> add this information here. I don’t think this is broken from the approach's perspective.
> We do need to add code to return this value when there is no sample.

More importantly, I think the documentation should say that the
retire_latency comes from PEBS, which means it needs samples from
precise events.  So perf stat would run perf record in the background
for events with retire_latency, even if users just want to see the
values of counters or metrics.

Thanks,
Namhyung
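
The point above, that events with retire_latency need sampling while the rest can simply be counted, can be sketched as a partition over event strings. This is an illustration in Python, not how perf parses events: the helper `split_by_retire_latency`, the event names, and the simplified `name:modifiers` form are assumptions for this example.

```python
from typing import Iterable, List, Tuple


def split_by_retire_latency(events: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Partition event strings into (needs_sampling, counting_only).

    Assumes each event is written as "name" or "name:modifiers", and that
    an 'R' among the modifiers requests the retire latency value.
    """
    needs_sampling: List[str] = []
    counting_only: List[str] = []
    for event in events:
        # rpartition yields an empty separator when the event has no ':'.
        _name, sep, modifiers = event.rpartition(":")
        if sep and "R" in modifiers:
            needs_sampling.append(event)
        else:
            counting_only.append(event)
    return needs_sampling, counting_only


# Hypothetical event names, used only to show the partition.
sampled, counted = split_by_retire_latency(["event_a:R", "event_b", "event_c:u"])
```

Here `sampled` holds the events that would need a background sampling session, and `counted` holds the events that ordinary counting can serve.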