[PATCH v1 0/3] Fix and improve __maps__fixup_overlap_and_insert

Ian Rogers posted 3 patches 1 year, 8 months ago
tools/perf/util/maps.c | 113 +++++++++++++++++++++++++++++++++--------
1 file changed, 92 insertions(+), 21 deletions(-)
[PATCH v1 0/3] Fix and improve __maps__fixup_overlap_and_insert
Posted by Ian Rogers 1 year, 8 months ago
Fix latent unlikely bugs in __maps__fixup_overlap_and_insert.

Improve __maps__fixup_overlap_and_insert's performance 21x in the case
of overlapping mmaps. sesse@google.com reported slowness opening
perf.data files from chromium where the files contained a large number
of overlapping mappings. Improve this case primarily by avoiding
unnecessary sorting.

Unscientific timing data processing a perf.data file with overlapping
mmap events from chromium:

Before:
real    0m9.856s
user    0m9.637s
sys     0m0.204s

After:
real    0m0.675s
user    0m0.454s
sys     0m0.196s

Tested with address/leak sanitizer, invariant checks and validating
the before and after output are identical.

Ian Rogers (3):
  perf maps: Fix use after free in __maps__fixup_overlap_and_insert
  perf maps: Reduce sorting for overlapping mappings
  perf maps: Add/use a sorted insert for fixup overlap and insert

 tools/perf/util/maps.c | 113 +++++++++++++++++++++++++++++++++--------
 1 file changed, 92 insertions(+), 21 deletions(-)

-- 
2.45.0.rc1.225.g2a3ae87e7f-goog
Re: [PATCH v1 0/3] Fix and improve __maps__fixup_overlap_and_insert
Posted by Namhyung Kim 1 year, 8 months ago
On Tue, 21 May 2024 09:51:06 -0700, Ian Rogers wrote:

> Fix latent unlikely bugs in __maps__fixup_overlap_and_insert.
> 
> Improve __maps__fixup_overlap_and_insert's performance 21x in the case
> of overlapping mmaps. sesse@google.com reported slowness opening
> perf.data files from chromium where the files contained a large number
> of overlapping mappings. Improve this case primarily by avoiding
> unnecessary sorting.
> 
> [...]

Applied to perf-tools-next, thanks!

Best regards,
Namhyung
Re: [PATCH v1 0/3] Fix and improve __maps__fixup_overlap_and_insert
Posted by James Clark 1 year, 8 months ago

On 21/05/2024 17:51, Ian Rogers wrote:
> Fix latent unlikely bugs in __maps__fixup_overlap_and_insert.
> 
> Improve __maps__fixup_overlap_and_insert's performance 21x in the case
> of overlapping mmaps. sesse@google.com reported slowness opening
> perf.data files from chromium where the files contained a large number
> of overlapping mappings. Improve this case primarily by avoiding
> unnecessary sorting.
> 
> Unscientific timing data processing a perf.data file with overlapping
> mmap events from chromium:
> 
> Before:
> real    0m9.856s
> user    0m9.637s
> sys     0m0.204s
> 
> After:
> real    0m0.675s
> user    0m0.454s
> sys     0m0.196s
> 
> Tested with address/leak sanitizer, invariant checks and validating
> the before and after output are identical.
> 
> Ian Rogers (3):
>   perf maps: Fix use after free in __maps__fixup_overlap_and_insert
>   perf maps: Reduce sorting for overlapping mappings
>   perf maps: Add/use a sorted insert for fixup overlap and insert
> 
>  tools/perf/util/maps.c | 113 +++++++++++++++++++++++++++++++++--------
>  1 file changed, 92 insertions(+), 21 deletions(-)
> 

Reviewed-by: James Clark <james.clark@arm.com>

I'm wondering if there is any point in the non sorted insert any more?

Maps could be always either sorted by name or sorted by address and
insert() is always a sorted/fixup-overlaps insert depending on the sort
style of the list.
Re: [PATCH v1 0/3] Fix and improve __maps__fixup_overlap_and_insert
Posted by Ian Rogers 1 year, 8 months ago
On Thu, Jun 6, 2024 at 3:56 AM James Clark <james.clark@arm.com> wrote:
>
>
>
> On 21/05/2024 17:51, Ian Rogers wrote:
> > Fix latent unlikely bugs in __maps__fixup_overlap_and_insert.
> >
> > Improve __maps__fixup_overlap_and_insert's performance 21x in the case
> > of overlapping mmaps. sesse@google.com reported slowness opening
> > perf.data files from chromium where the files contained a large number
> > of overlapping mappings. Improve this case primarily by avoiding
> > unnecessary sorting.
> >
> > Unscientific timing data processing a perf.data file with overlapping
> > mmap events from chromium:
> >
> > Before:
> > real    0m9.856s
> > user    0m9.637s
> > sys     0m0.204s
> >
> > After:
> > real    0m0.675s
> > user    0m0.454s
> > sys     0m0.196s
> >
> > Tested with address/leak sanitizer, invariant checks and validating
> > the before and after output are identical.
> >
> > Ian Rogers (3):
> >   perf maps: Fix use after free in __maps__fixup_overlap_and_insert
> >   perf maps: Reduce sorting for overlapping mappings
> >   perf maps: Add/use a sorted insert for fixup overlap and insert
> >
> >  tools/perf/util/maps.c | 113 +++++++++++++++++++++++++++++++++--------
> >  1 file changed, 92 insertions(+), 21 deletions(-)
> >
>
> Reviewed-by: James Clark <james.clark@arm.com>
>
> I'm wondering if there is any point in the non sorted insert any more?
>
> Maps could be always either sorted by name or sorted by address and
> insert() is always a sorted/fixup-overlaps insert depending on the sort
> style of the list.

One of the motivations for the sorted array, instead of an rbtree, was
to simplify reference count checking. We're in much better shape with
that these days, I think the worst "memory leak" is the dsos holding
onto a dso and its symbols indefinitely, instead of removing older
unused dsos. It'd be hard to go back to an rb tree with reference
counting.

For the rbtree insert it was O(log N), the sorted insert these changes
add is O(N) and the regular insert without sorting is O(1). A common
case from scanning /proc/pid/maps is that when map entries are
inserted they are inserted in order. The O(1) insert checks that and
maintains that the maps are sorted.
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/maps.c?h=perf-tools-next#n469
For the synthesized maps we should be able to insert N map entries in
O(N). The sorted insert would be similar as the memmoves would always
copy 0 elements. So I can see an argument for keeping the array always
sorted. For the perf.data files we commonly process synthesized mmaps
dominate. In the case for this patch, JIT data dominated, with
frequent overlapping mapping changes. I guess the biggest hurdle to
just always keeping things sorted is the mental hurdle of ignoring the
worst insert complexity, which should never apply given how the maps
are commonly built.

Thanks,
Ian
Re: [PATCH v1 0/3] Fix and improve __maps__fixup_overlap_and_insert
Posted by Ian Rogers 1 year, 8 months ago
On Tue, May 21, 2024 at 9:51 AM Ian Rogers <irogers@google.com> wrote:
>
> Fix latent unlikely bugs in __maps__fixup_overlap_and_insert.
>
> Improve __maps__fixup_overlap_and_insert's performance 21x in the case
> of overlapping mmaps. sesse@google.com reported slowness opening
> perf.data files from chromium where the files contained a large number
> of overlapping mappings. Improve this case primarily by avoiding
> unnecessary sorting.
>
> Unscientific timing data processing a perf.data file with overlapping
> mmap events from chromium:
>
> Before:
> real    0m9.856s
> user    0m9.637s
> sys     0m0.204s
>
> After:
> real    0m0.675s
> user    0m0.454s
> sys     0m0.196s
>
> Tested with address/leak sanitizer, invariant checks and validating
> the before and after output are identical.
>
> Ian Rogers (3):
>   perf maps: Fix use after free in __maps__fixup_overlap_and_insert
>   perf maps: Reduce sorting for overlapping mappings
>   perf maps: Add/use a sorted insert for fixup overlap and insert

Ping. Thanks,
Ian

>  tools/perf/util/maps.c | 113 +++++++++++++++++++++++++++++++++--------
>  1 file changed, 92 insertions(+), 21 deletions(-)
>
> --
> 2.45.0.rc1.225.g2a3ae87e7f-goog
>