[v3] perf: Default use of build IDs and improvements

[PATCH v3 5/9] perf build-id: Mark DSO in sample callchains

Posted by Ian Rogers 9 months, 2 weeks ago

Previously only the sample IP's map DSO would be marked hit for the
purposes of populating the build ID cache. Walk the call chain to mark
all IPs and DSOs.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/build-id.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
index aa35dceace90..3386fa8e1e7e 100644
--- a/tools/perf/util/build-id.c
+++ b/tools/perf/util/build-id.c
@@ -42,10 +42,20 @@
 
 static bool no_buildid_cache;
 
+static int mark_dso_hit_callback(struct callchain_cursor_node *node, void *data __maybe_unused)
+{
+	struct map *map = node->ms.map;
+
+	if (map)
+		dso__set_hit(map__dso(map));
+
+	return 0;
+}
+
 int build_id__mark_dso_hit(const struct perf_tool *tool __maybe_unused,
 			   union perf_event *event,
 			   struct perf_sample *sample,
-			   struct evsel *evsel __maybe_unused,
+			   struct evsel *evsel,
 			   struct machine *machine)
 {
 	struct addr_location al;
@@ -63,6 +73,11 @@ int build_id__mark_dso_hit(const struct perf_tool *tool __maybe_unused,
 		dso__set_hit(map__dso(al.map));
 
 	addr_location__exit(&al);
+
+	sample__for_each_callchain_node(thread, evsel, sample, PERF_MAX_STACK_DEPTH,
+					/*symbols=*/false, mark_dso_hit_callback, /*data=*/NULL);
+
+
 	thread__put(thread);
 	return 0;
 }
-- 
2.49.0.901.g37484f566f-goog

Re: [PATCH v3 5/9] perf build-id: Mark DSO in sample callchains

Posted by Namhyung Kim 8 months, 2 weeks ago

On Mon, Apr 28, 2025 at 02:34:04PM -0700, Ian Rogers wrote:
> Previously only the sample IP's map DSO would be marked hit for the
> purposes of populating the build ID cache. Walk the call chain to mark
> all IPs and DSOs.

I think this is correct, but I'm afraid it'd also increase the processing
time.  Do you happen to have any numbers?

Thanks,
Namhyung

> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/perf/util/build-id.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
> index aa35dceace90..3386fa8e1e7e 100644
> --- a/tools/perf/util/build-id.c
> +++ b/tools/perf/util/build-id.c
> @@ -42,10 +42,20 @@
>  
>  static bool no_buildid_cache;
>  
> +static int mark_dso_hit_callback(struct callchain_cursor_node *node, void *data __maybe_unused)
> +{
> +	struct map *map = node->ms.map;
> +
> +	if (map)
> +		dso__set_hit(map__dso(map));
> +
> +	return 0;
> +}
> +
>  int build_id__mark_dso_hit(const struct perf_tool *tool __maybe_unused,
>  			   union perf_event *event,
>  			   struct perf_sample *sample,
> -			   struct evsel *evsel __maybe_unused,
> +			   struct evsel *evsel,
>  			   struct machine *machine)
>  {
>  	struct addr_location al;
> @@ -63,6 +73,11 @@ int build_id__mark_dso_hit(const struct perf_tool *tool __maybe_unused,
>  		dso__set_hit(map__dso(al.map));
>  
>  	addr_location__exit(&al);
> +
> +	sample__for_each_callchain_node(thread, evsel, sample, PERF_MAX_STACK_DEPTH,
> +					/*symbols=*/false, mark_dso_hit_callback, /*data=*/NULL);
> +
> +
>  	thread__put(thread);
>  	return 0;
>  }
> -- 
> 2.49.0.901.g37484f566f-goog
>

Re: [PATCH v3 5/9] perf build-id: Mark DSO in sample callchains

Posted by Ian Rogers 8 months, 2 weeks ago

On Wed, May 28, 2025 at 1:41 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Mon, Apr 28, 2025 at 02:34:04PM -0700, Ian Rogers wrote:
> > Previously only the sample IP's map DSO would be marked hit for the
> > purposes of populating the build ID cache. Walk the call chain to mark
> > all IPs and DSOs.
>
> I think this is correct, but I'm afraid it'd also increase the processing
> time.  Do you happen to have any numbers?

It increases time spent processing the data file but to get a large
data file I had to run for multiple seconds and I struggled to get the
performance cost of this to be in the milliseconds (ie a tiny fraction
of the record time). Ultimately I found the change imperceptible and
couldn't think of a good command line to make it perceptible.

If the time is spent populating ~/.debug because more DSOs are marked
then this is fixing a bug and isn't a problem with the patch.

My personal opinion is that it is somewhat surprising `perf record` is
post-processing the perf.data file at all, and -B and -N would be my
expected defaults - just as --buildid-mmap implies --no-buildid (-B).
I didn't want to modify existing behaviors in these changes, however,
in this case I was just trying to make the existing behavior correct,
similar to fixing the same bug in `perf inject`.

Thanks,
Ian

> Thanks,
> Namhyung
>
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> >  tools/perf/util/build-id.c | 17 ++++++++++++++++-
> >  1 file changed, 16 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
> > index aa35dceace90..3386fa8e1e7e 100644
> > --- a/tools/perf/util/build-id.c
> > +++ b/tools/perf/util/build-id.c
> > @@ -42,10 +42,20 @@
> >
> >  static bool no_buildid_cache;
> >
> > +static int mark_dso_hit_callback(struct callchain_cursor_node *node, void *data __maybe_unused)
> > +{
> > +     struct map *map = node->ms.map;
> > +
> > +     if (map)
> > +             dso__set_hit(map__dso(map));
> > +
> > +     return 0;
> > +}
> > +
> >  int build_id__mark_dso_hit(const struct perf_tool *tool __maybe_unused,
> >                          union perf_event *event,
> >                          struct perf_sample *sample,
> > -                        struct evsel *evsel __maybe_unused,
> > +                        struct evsel *evsel,
> >                          struct machine *machine)
> >  {
> >       struct addr_location al;
> > @@ -63,6 +73,11 @@ int build_id__mark_dso_hit(const struct perf_tool *tool __maybe_unused,
> >               dso__set_hit(map__dso(al.map));
> >
> >       addr_location__exit(&al);
> > +
> > +     sample__for_each_callchain_node(thread, evsel, sample, PERF_MAX_STACK_DEPTH,
> > +                                     /*symbols=*/false, mark_dso_hit_callback, /*data=*/NULL);
> > +
> > +
> >       thread__put(thread);
> >       return 0;
> >  }
> > --
> > 2.49.0.901.g37484f566f-goog
> >

Re: [PATCH v3 5/9] perf build-id: Mark DSO in sample callchains

Posted by Namhyung Kim 8 months, 2 weeks ago

On Wed, May 28, 2025 at 01:54:41PM -0700, Ian Rogers wrote:
> On Wed, May 28, 2025 at 1:41 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Mon, Apr 28, 2025 at 02:34:04PM -0700, Ian Rogers wrote:
> > > Previously only the sample IP's map DSO would be marked hit for the
> > > purposes of populating the build ID cache. Walk the call chain to mark
> > > all IPs and DSOs.
> >
> > I think this is correct, but I'm afraid it'd also increase the processing
> > time.  Do you happen to have any numbers?
> 
> It increases time spent processing the data file but to get a large
> data file I had to run for multiple seconds and I struggled to get the
> performance cost of this to be in the milliseconds (ie a tiny fraction
> of the record time). Ultimately I found the change imperceptible and
> couldn't think of a good command line to make it perceptible.

The worst case would be dwarf unwinding.  Maybe we can skip the
processing if it takes too long..

> 
> If the time is spent populating ~/.debug because more DSOs are marked
> then this is fixing a bug and isn't a problem with the patch.

Right, it's a good thing.

> 
> My personal opinion is that it is somewhat surprising `perf record` is
> post-processing the perf.data file at all, and -B and -N would be my
> expected defaults - just as --buildid-mmap implies --no-buildid (-B).

Otherwise nobody will run perf buildid-cache to add the info. :)


> I didn't want to modify existing behaviors in these changes, however,
> in this case I was just trying to make the existing behavior correct,
> similar to fixing the same bug in `perf inject`.

Thanks for your work,
Namhyung

Re: [PATCH v3 5/9] perf build-id: Mark DSO in sample callchains

Posted by Ian Rogers 8 months, 2 weeks ago

On Wed, May 28, 2025 at 3:16 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Wed, May 28, 2025 at 01:54:41PM -0700, Ian Rogers wrote:
> > On Wed, May 28, 2025 at 1:41 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > On Mon, Apr 28, 2025 at 02:34:04PM -0700, Ian Rogers wrote:
> > > > Previously only the sample IP's map DSO would be marked hit for the
> > > > purposes of populating the build ID cache. Walk the call chain to mark
> > > > all IPs and DSOs.
> > >
> > > I think this is correct, but I'm afraid it'd also increase the processing
> > > time.  Do you happen to have any numbers?
> >
> > It increases time spent processing the data file but to get a large
> > data file I had to run for multiple seconds and I struggled to get the
> > performance cost of this to be in the milliseconds (ie a tiny fraction
> > of the record time). Ultimately I found the change imperceptible and
> > couldn't think of a good command line to make it perceptible.
>
> The worst case would be dwarf unwinding.  Maybe we can skip the
> processing if it takes too long..

This doesn't sound unreasonable but is somewhat beyond the scope of
what I wanted to do here, which relates to migrating from inodes to
buildids as identifiers for DSOs. It would be useful to get a bug
report on this being too slow.

> >
> > If the time is spent populating ~/.debug because more DSOs are marked
> > then this is fixing a bug and isn't a problem with the patch.
>
> Right, it's a good thing.
>
> >
> > My personal opinion is that it is somewhat surprising `perf record` is
> > post-processing the perf.data file at all, and -B and -N would be my
> > expected defaults - just as --buildid-mmap implies --no-buildid (-B).
>
> Otherwise nobody will run perf buildid-cache to add the info. :)

Right, but we know it is high overhead as we run a number of tests with -B:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/shell/record_sideband.sh?h=perf-tools-next#n25
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/shell/test_arm_spe.sh?h=perf-tools-next#n94

I agree that populating the buildid cache immediately after perf
record minimizes a gap where DSOs may change prior to perf report. If
the DSO changes midway through perf record it doesn't help as the
buildid information is only computed at the end of perf record.

Isn't there an argument that because we have build IDs we don't care
about the record to report race any more as debuginfod can find the
prior version by means of the build ID? What I mean:

Current perf default way:
1) Run perf record with mmap2 inode rather the buildid data
1.1) To avoid any DSOs in the perf.data from not being available
populate the buildid header in the perf.data and the buildid cache at
the end of perf record
2) Replace some DSO that's got a sample in the perf.data with a different DSO
3) Run perf report
3.1) The DSO's buildid is known via the header and the buildid cache
already contains the DSO

Current way with -N:
1) Run perf record with mmap2 inode rather the buildid data
1.1) To ensure DSOs have buildid data populate the buildid header in
the perf.data  at the end of perf record
2) Replace some DSO that's got a sample in the perf.data with a different DSO
3) Run perf report
3.1) The DSO's buildid is known via the header, debuginfod can
populate the buildid cache as needed

With --buildid-mmap:
1) Run perf record with mmap2 buildid data
1.1) No need to post process file to gather buildids
2) Replace some DSO that's got a sample in the perf.data with a different DSO
3) Run perf report
3.1) The DSO's buildid is known via the mmap2 events, debuginfod can
populate the buildid cache as needed

With the current way and the current way with -N there is a race with
the DSO changing midway through perf record. The buildid mmap closes
the race.
With -N and --buildid-mmap the buildid cache is populated based on use
by a tool trying to read the DSO, rather than ahead of time at the end
of perf record.

Is the risk of the race that much of an issue? I'm not sure, that's
why I'd say default to using -B (if we didn't switch to buildid mmaps
by default, but this series does that). You could opt into to covering
the race by adding a flag so the data is processed at the end of perf
record. You could use perf inject to add the build IDs.

As you say there's the cost at the end of perf record and I'm not sure
it is worth it, which is why I'd expect the default to be to opt into
having the cost. With --buildid-mmap I'm not seeing a race to cover
but this series populates the buildid cache as I know you've argued
that perf record should do this by default.

Thanks,
Ian

> > I didn't want to modify existing behaviors in these changes, however,
> > in this case I was just trying to make the existing behavior correct,
> > similar to fixing the same bug in `perf inject`.
>
> Thanks for your work,
> Namhyung
>

Re: [PATCH v3 5/9] perf build-id: Mark DSO in sample callchains

Posted by Namhyung Kim 8 months, 2 weeks ago

On Wed, May 28, 2025 at 04:11:13PM -0700, Ian Rogers wrote:
> On Wed, May 28, 2025 at 3:16 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Wed, May 28, 2025 at 01:54:41PM -0700, Ian Rogers wrote:
> > > On Wed, May 28, 2025 at 1:41 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > On Mon, Apr 28, 2025 at 02:34:04PM -0700, Ian Rogers wrote:
> > > > > Previously only the sample IP's map DSO would be marked hit for the
> > > > > purposes of populating the build ID cache. Walk the call chain to mark
> > > > > all IPs and DSOs.
> > > >
> > > > I think this is correct, but I'm afraid it'd also increase the processing
> > > > time.  Do you happen to have any numbers?
> > >
> > > It increases time spent processing the data file but to get a large
> > > data file I had to run for multiple seconds and I struggled to get the
> > > performance cost of this to be in the milliseconds (ie a tiny fraction
> > > of the record time). Ultimately I found the change imperceptible and
> > > couldn't think of a good command line to make it perceptible.
> >
> > The worst case would be dwarf unwinding.  Maybe we can skip the
> > processing if it takes too long..
> 
> This doesn't sound unreasonable but is somewhat beyond the scope of
> what I wanted to do here, which relates to migrating from inodes to
> buildids as identifiers for DSOs. It would be useful to get a bug
> report on this being too slow.
> 
> > >
> > > If the time is spent populating ~/.debug because more DSOs are marked
> > > then this is fixing a bug and isn't a problem with the patch.
> >
> > Right, it's a good thing.
> >
> > >
> > > My personal opinion is that it is somewhat surprising `perf record` is
> > > post-processing the perf.data file at all, and -B and -N would be my
> > > expected defaults - just as --buildid-mmap implies --no-buildid (-B).
> >
> > Otherwise nobody will run perf buildid-cache to add the info. :)
> 
> Right, but we know it is high overhead as we run a number of tests with -B:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/shell/record_sideband.sh?h=perf-tools-next#n25
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/shell/test_arm_spe.sh?h=perf-tools-next#n94
> 
> I agree that populating the buildid cache immediately after perf
> record minimizes a gap where DSOs may change prior to perf report. If
> the DSO changes midway through perf record it doesn't help as the
> buildid information is only computed at the end of perf record.
> 
> Isn't there an argument that because we have build IDs we don't care
> about the record to report race any more as debuginfod can find the
> prior version by means of the build ID? What I mean:
> 
> Current perf default way:
> 1) Run perf record with mmap2 inode rather the buildid data
> 1.1) To avoid any DSOs in the perf.data from not being available
> populate the buildid header in the perf.data and the buildid cache at
> the end of perf record
> 2) Replace some DSO that's got a sample in the perf.data with a different DSO
> 3) Run perf report
> 3.1) The DSO's buildid is known via the header and the buildid cache
> already contains the DSO
> 
> Current way with -N:
> 1) Run perf record with mmap2 inode rather the buildid data
> 1.1) To ensure DSOs have buildid data populate the buildid header in
> the perf.data  at the end of perf record
> 2) Replace some DSO that's got a sample in the perf.data with a different DSO
> 3) Run perf report
> 3.1) The DSO's buildid is known via the header, debuginfod can
> populate the buildid cache as needed
> 
> With --buildid-mmap:
> 1) Run perf record with mmap2 buildid data
> 1.1) No need to post process file to gather buildids
> 2) Replace some DSO that's got a sample in the perf.data with a different DSO
> 3) Run perf report
> 3.1) The DSO's buildid is known via the mmap2 events, debuginfod can
> populate the buildid cache as needed
> 
> With the current way and the current way with -N there is a race with
> the DSO changing midway through perf record. The buildid mmap closes
> the race.
> With -N and --buildid-mmap the buildid cache is populated based on use
> by a tool trying to read the DSO, rather than ahead of time at the end
> of perf record.
> 
> Is the risk of the race that much of an issue? I'm not sure, that's
> why I'd say default to using -B (if we didn't switch to buildid mmaps
> by default, but this series does that). You could opt into to covering
> the race by adding a flag so the data is processed at the end of perf
> record. You could use perf inject to add the build IDs.
> 
> As you say there's the cost at the end of perf record and I'm not sure
> it is worth it, which is why I'd expect the default to be to opt into
> having the cost. With --buildid-mmap I'm not seeing a race to cover
> but this series populates the buildid cache as I know you've argued
> that perf record should do this by default.

Yeah, I just wanted to keep the original behavior to save the used
binaries in the build-ID cache.  But we can use debuginfod if it's
available as you said.

I think it's a separate topic whether we prefer the build-ID cache or
debuginfod.  We should be able to change the default behavior with a
config option later.  So let's keep the caching for now.

Thanks,
Namhyung