[v10] RE: [RFC PATCH v10 0/8] TPEBS counting mode support

RE: [RFC PATCH v10 0/8] TPEBS counting mode support

Posted by Wang, Weilin 1 year, 8 months ago


> -----Original Message-----
> From: Namhyung Kim <namhyung@kernel.org>
> Sent: Friday, May 31, 2024 2:30 PM
> To: Wang, Weilin <weilin.wang@intel.com>
> Cc: Ian Rogers <irogers@google.com>; Arnaldo Carvalho de Melo
> <acme@kernel.org>; Peter Zijlstra <peterz@infradead.org>; Ingo Molnar
> <mingo@redhat.com>; Alexander Shishkin
> <alexander.shishkin@linux.intel.com>; Jiri Olsa <jolsa@kernel.org>; Hunter,
> Adrian <adrian.hunter@intel.com>; Kan Liang <kan.liang@linux.intel.com>;
> linux-perf-users@vger.kernel.org; linux-kernel@vger.kernel.org; Taylor, Perry
> <perry.taylor@intel.com>; Alt, Samantha <samantha.alt@intel.com>; Biggers,
> Caleb <caleb.biggers@intel.com>
> Subject: Re: [RFC PATCH v10 0/8] TPEBS counting mode support
> 
> On Fri, May 31, 2024 at 12:00 AM Wang, Weilin <weilin.wang@intel.com>
> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Namhyung Kim <namhyung@kernel.org>
> > > Sent: Thursday, May 30, 2024 11:37 PM
> > > To: Wang, Weilin <weilin.wang@intel.com>
> > > Cc: Ian Rogers <irogers@google.com>; Arnaldo Carvalho de Melo
> > > <acme@kernel.org>; Peter Zijlstra <peterz@infradead.org>; Ingo Molnar
> > > <mingo@redhat.com>; Alexander Shishkin
> > > <alexander.shishkin@linux.intel.com>; Jiri Olsa <jolsa@kernel.org>; Hunter,
> > > Adrian <adrian.hunter@intel.com>; Kan Liang <kan.liang@linux.intel.com>;
> > > linux-perf-users@vger.kernel.org; linux-kernel@vger.kernel.org; Taylor,
> Perry
> > > <perry.taylor@intel.com>; Alt, Samantha <samantha.alt@intel.com>;
> Biggers,
> > > Caleb <caleb.biggers@intel.com>
> > > Subject: Re: [RFC PATCH v10 0/8] TPEBS counting mode support
> > >
> > > Hello,
> > >
> > > On Tue, May 28, 2024 at 11:43 PM <weilin.wang@intel.com> wrote:
> > > >
> > > > From: Weilin Wang <weilin.wang@intel.com>
> > > >
> > > > I have tried not to count retire_latency events but did not succeed.
> > > > In particular, I tried the following methods:
> > > >  - Convert retire_latency event to dummy event in event parser.
> > > >  - Early bail out in evsel__open_cpu() and store_evsel_ids().
> > > >
> > > > The first method fails and causes non-retire_latency events with the same
> > > event
> > > > name return 0 count.
> > > >
> > > > The second method fails and causes all the events in the same group
> > > returning
> > > > "<not counted>" results.
> > >
> > > Can you please describe where it fails?  Is it failing on other events
> > > because the tpebs event is a leader of the group?  I think you wanted
> > > to avoid having it in the leader position.  If we can skip any actual
> > > operations (open/close/enable/disable/read) for the tpebs events, then
> > > it could be fine..
> >
> > It does not fail with the code in this patch set. But if I make it return directly
> > from tpebs_start() in evsel__open_cpu(), it will cause segfault. The segfault is
> > caused by store_evsel_id(). I could add another early return from
> store_evsel_id()
> > if the evsel->retire_lat is true.
> 
> Yeah, I think event:R should not go to kernel from perf stat and you need to
> handle that in the tools.
> 
> >
> > After this change, it will eventually run and give me <not counted> results
> > like below:
> >
> >         <not counted> event1
> >         <not counted> event2
> >         xx event1:R
> >
> > In a different case, it may seem to work (xxxxxx stands for some valid value):
> >
> >         xxxxxxx event1
> >         xxxxxxx event2
> >         xxxxxxx event3
> >         xx event1:R
> >
> > In the first case, the event1, event2 and event1:R are scheduled in the same
> > group. On the other hand, in the second case, event1, event2 and event3
> are
> > in one group, while event1:R is in a different group.
> 
> If you don't open event1:R then the kernel only sees event1 and event2.
> 
> >
> > Based on these two different type of results, I believe the failure happens in
> > the group that include a :R event. I've added the change to
> arch_evlist__cmp()
> > so that a :R event would not be a leader of the group.
> >
> > I think I've made evsel__open_cpu() return before it create fd and make
> > store_evsel_id() not to read and store fd. I'm not sure where I'm missing.
> Please
> > let me know if you have any suggestions.
> 
> As I said, please don't open event1:R (for perf stat) and let tpebs_stop() set
> the value using the data from perf record in background.

I think this is exactly what I'm trying to achieve, not open event1:R for perf stat. 
The problem is that I'm not sure how to do it properly in the code. Please give 
me some suggestion here. 

Thanks,
Weilin

> 
> Thanks,
> Namhyung

Re: [RFC PATCH v10 0/8] TPEBS counting mode support

Posted by Namhyung Kim 1 year, 8 months ago

On Fri, May 31, 2024 at 11:03:25PM +0000, Wang, Weilin wrote:
> 
> > -----Original Message-----
> > From: Namhyung Kim <namhyung@kernel.org>
> > As I said, please don't open event1:R (for perf stat) and let tpebs_stop() set
> > the value using the data from perf record in background.
> 
> I think this is exactly what I'm trying to achieve, not open event1:R for perf stat. 
> The problem is that I'm not sure how to do it properly in the code. Please give 
> me some suggestion here. 

Ok, I think the problem is in the read code.  It requires the number of
entries and the data size to match with what it calculates from the
member events.  It should not count TPEBS events as we don't want to
open them.

Something like below might work (on top of your series).  Probably
libperf should be aware of this..

Thanks,
Namhyung

---8<---

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index ac7a98ff6b19..7913db4a99e0 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1559,6 +1559,60 @@ static void evsel__set_count(struct evsel *counter, int cpu_map_idx, int thread,
 	perf_counts__set_loaded(counter->counts, cpu_map_idx, thread, true);
 }
 
+static bool evsel__group_has_tpebs(struct evsel *leader)
+{
+	struct evsel *evsel;
+
+	for_each_group_evsel(evsel, leader) {
+		if (evsel__is_retire_lat(evsel))
+			return true;
+	}
+	return false;
+}
+
+static u64 evsel__group_read_nr_members(struct evsel *leader)
+{
+	u64 nr = leader->core.nr_members;
+	struct evsel *evsel;
+
+	for_each_group_evsel(evsel, leader) {
+		if (evsel__is_retire_lat(evsel))
+			nr--;
+	}
+	return nr;
+}
+
+static u64 evsel__group_read_size(struct evsel *leader)
+{
+	u64 read_format = leader->core.attr.read_format;
+	int entry = sizeof(u64); /* value */
+	int size = 0;
+	int nr = 1;
+
+	if (!evsel__group_has_tpebs(leader))
+		return perf_evsel__read_size(&leader->core);
+
+	if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
+		size += sizeof(u64);
+
+	if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
+		size += sizeof(u64);
+
+	if (read_format & PERF_FORMAT_ID)
+		entry += sizeof(u64);
+
+	if (read_format & PERF_FORMAT_LOST)
+		entry += sizeof(u64);
+
+	if (read_format & PERF_FORMAT_GROUP) {
+		nr = evsel__group_read_nr_members(leader);
+		size += sizeof(u64);
+	}
+
+	size += entry * nr;
+	return size;
+}
+
 static int evsel__process_group_data(struct evsel *leader, int cpu_map_idx, int thread, u64 *data)
 {
 	u64 read_format = leader->core.attr.read_format;
@@ -1567,7 +1621,7 @@ static int evsel__process_group_data(struct evsel *leader, int cpu_map_idx, int
 
 	nr = *data++;
 
-	if (nr != (u64) leader->core.nr_members)
+	if (nr != evsel__group_read_nr_members(leader))
 		return -EINVAL;
 
 	if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
@@ -1597,7 +1651,7 @@ static int evsel__read_group(struct evsel *leader, int cpu_map_idx, int thread)
 {
 	struct perf_stat_evsel *ps = leader->stats;
 	u64 read_format = leader->core.attr.read_format;
-	int size = perf_evsel__read_size(&leader->core);
+	int size = evsel__group_read_size(leader);
 	u64 *data = ps->group_data;
 
 	if (!(read_format & PERF_FORMAT_ID))
@@ -2210,8 +2264,7 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
 
 	if (evsel__is_retire_lat(evsel)) {
 		err = tpebs_start(evsel->evlist, cpus);
-		if (err)
-			return err;
+		return err;
 	}
 
 	err = __evsel__prepare_open(evsel, cpus, threads);
diff --git a/tools/perf/util/intel-tpebs.c b/tools/perf/util/intel-tpebs.c
index d099fc8080e1..a3857e88af96 100644
--- a/tools/perf/util/intel-tpebs.c
+++ b/tools/perf/util/intel-tpebs.c
@@ -354,10 +354,11 @@ int tpebs_set_evsel(struct evsel *evsel, int cpu_map_idx, int thread)
 	 */
 	if (cpu_map_idx == 0 && thread == 0) {
 	/* Lost precision when casting from double to __u64. Any improvement? */
-		val = t->val;
+		val = t->val * 1000;
 	} else
 		val = 0;
 
+	evsel->scale = 1e-3;
 	count->val = val;
 	return 0;
 }

RE: [RFC PATCH v10 0/8] TPEBS counting mode support

Posted by Wang, Weilin 1 year, 8 months ago


> -----Original Message-----
> From: Namhyung Kim <namhyung@kernel.org>
> Sent: Sunday, June 2, 2024 2:18 PM
> To: Wang, Weilin <weilin.wang@intel.com>
> Cc: Ian Rogers <irogers@google.com>; Arnaldo Carvalho de Melo
> <acme@kernel.org>; Peter Zijlstra <peterz@infradead.org>; Ingo Molnar
> <mingo@redhat.com>; Alexander Shishkin
> <alexander.shishkin@linux.intel.com>; Jiri Olsa <jolsa@kernel.org>; Hunter,
> Adrian <adrian.hunter@intel.com>; Kan Liang <kan.liang@linux.intel.com>;
> linux-perf-users@vger.kernel.org; linux-kernel@vger.kernel.org; Taylor, Perry
> <perry.taylor@intel.com>; Alt, Samantha <samantha.alt@intel.com>; Biggers,
> Caleb <caleb.biggers@intel.com>
> Subject: Re: [RFC PATCH v10 0/8] TPEBS counting mode support
> 
> On Fri, May 31, 2024 at 11:03:25PM +0000, Wang, Weilin wrote:
> >
> > > -----Original Message-----
> > > From: Namhyung Kim <namhyung@kernel.org>
> > > As I said, please don't open event1:R (for perf stat) and let tpebs_stop() set
> > > the value using the data from perf record in background.
> >
> > I think this is exactly what I'm trying to achieve, not open event1:R for perf
> stat.
> > The problem is that I'm not sure how to do it properly in the code. Please
> give
> > me some suggestion here.
> 
> Ok, I think the problem is in the read code.  It requires the number of
> entries and the data size to match with what it calculates from the
> member events.  It should not count TPEBS events as we don't want to
> open them.
> 
> Something like below might work (on top of your series).  Probably
> libperf should be aware of this..
> 
Thanks Namhyung!  I will integrate this patch and test it out. 

Thanks,
Weilin

> Thanks,
> Namhyung
> 
> ---8<---
> 
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index ac7a98ff6b19..7913db4a99e0 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -1559,6 +1559,60 @@ static void evsel__set_count(struct evsel
> *counter, int cpu_map_idx, int thread,
>  	perf_counts__set_loaded(counter->counts, cpu_map_idx, thread,
> true);
>  }
> 
> +static bool evsel__group_has_tpebs(struct evsel *leader)
> +{
> +	struct evsel *evsel;
> +
> +	for_each_group_evsel(evsel, leader) {
> +		if (evsel__is_retire_lat(evsel))
> +			return true;
> +	}
> +	return false;
> +}
> +
> +static u64 evsel__group_read_nr_members(struct evsel *leader)
> +{
> +	u64 nr = leader->core.nr_members;
> +	struct evsel *evsel;
> +
> +	for_each_group_evsel(evsel, leader) {
> +		if (evsel__is_retire_lat(evsel))
> +			nr--;
> +	}
> +	return nr;
> +}
> +
> +static u64 evsel__group_read_size(struct evsel *leader)
> +{
> +	u64 read_format = leader->core.attr.read_format;
> +	int entry = sizeof(u64); /* value */
> +	int size = 0;
> +	int nr = 1;
> +
> +	if (!evsel__group_has_tpebs(leader))
> +		return perf_evsel__read_size(&leader->core);
> +
> +	if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
> +		size += sizeof(u64);
> +
> +	if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
> +		size += sizeof(u64);
> +
> +	if (read_format & PERF_FORMAT_ID)
> +		entry += sizeof(u64);
> +
> +	if (read_format & PERF_FORMAT_LOST)
> +		entry += sizeof(u64);
> +
> +	if (read_format & PERF_FORMAT_GROUP) {
> +		nr = evsel__group_read_nr_members(leader);
> +		size += sizeof(u64);
> +	}
> +
> +	size += entry * nr;
> +	return size;
> +}
> +
>  static int evsel__process_group_data(struct evsel *leader, int cpu_map_idx,
> int thread, u64 *data)
>  {
>  	u64 read_format = leader->core.attr.read_format;
> @@ -1567,7 +1621,7 @@ static int evsel__process_group_data(struct evsel
> *leader, int cpu_map_idx, int
> 
>  	nr = *data++;
> 
> -	if (nr != (u64) leader->core.nr_members)
> +	if (nr != evsel__group_read_nr_members(leader))
>  		return -EINVAL;
> 
>  	if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
> @@ -1597,7 +1651,7 @@ static int evsel__read_group(struct evsel *leader,
> int cpu_map_idx, int thread)
>  {
>  	struct perf_stat_evsel *ps = leader->stats;
>  	u64 read_format = leader->core.attr.read_format;
> -	int size = perf_evsel__read_size(&leader->core);
> +	int size = evsel__group_read_size(leader);
>  	u64 *data = ps->group_data;
> 
>  	if (!(read_format & PERF_FORMAT_ID))
> @@ -2210,8 +2264,7 @@ static int evsel__open_cpu(struct evsel *evsel,
> struct perf_cpu_map *cpus,
> 
>  	if (evsel__is_retire_lat(evsel)) {
>  		err = tpebs_start(evsel->evlist, cpus);
> -		if (err)
> -			return err;
> +		return err;
>  	}
> 
>  	err = __evsel__prepare_open(evsel, cpus, threads);
> diff --git a/tools/perf/util/intel-tpebs.c b/tools/perf/util/intel-tpebs.c
> index d099fc8080e1..a3857e88af96 100644
> --- a/tools/perf/util/intel-tpebs.c
> +++ b/tools/perf/util/intel-tpebs.c
> @@ -354,10 +354,11 @@ int tpebs_set_evsel(struct evsel *evsel, int
> cpu_map_idx, int thread)
>  	 */
>  	if (cpu_map_idx == 0 && thread == 0) {
>  	/* Lost precision when casting from double to __u64. Any
> improvement? */
> -		val = t->val;
> +		val = t->val * 1000;
>  	} else
>  		val = 0;
> 
> +	evsel->scale = 1e-3;
>  	count->val = val;
>  	return 0;
>  }