[PATCHES 0/8] perf tools: Diagnostic offsets in skip messages + two hardening fixes

Arnaldo Carvalho de Melo posted 8 patches 5 days, 4 hours ago
15 files changed, 261 insertions(+), 101 deletions(-)
Posted by Arnaldo Carvalho de Melo 5 days, 4 hours ago
When perf report, perf sched, or perf timechart skip a malformed or
unprocessable event, the warning message doesn't say where in the
perf.data file the problem occurred.  This makes it hard to
cross-reference with 'perf report -D' output or to locate the
corrupted region with a hex editor.

This series adds a file_offset field to struct perf_sample, set in the
event delivery path (including the deferred callchain path), and
retrofits all skip/stop/error messages to include:

  - The file offset where the event was found
  - The event type name via perf_event__name() with the numeric
    type value in parentheses

For example, instead of:

  problem processing 10 event, skipping it.

a user now sees:

  WARNING: at offset 0x1a3f0: MMAP2 (10) event size 24 too small (min 64), skipping

The peek_event() path, which validates events during initial file
scanning, also gains file offsets in its three warning messages
(misaligned size, unsupported type, undersized event).

Two pre-existing bugs found by sashiko-bot are fixed:

  - builtin-timechart.c cat_backtrace(): use-after-free and
    double-free when an invalid callchain context triggers zfree()
    before fclose() on an open_memstream buffer.  The open_memstream
    contract requires fclose() before the buffer can be freed — see
    open_memstream(3).

  - builtin-sched.c: three BUG_ON(cpu >= MAX_CPUS || cpu < 0)
    assertions that abort perf sched when PERF_SAMPLE_CPU is absent
    from the sample type and the CPU sentinel (u32)-1 is cast to
    signed -1.
    perf.data is untrusted input — a corrupted or truncated file
    should produce a warning, not an abort.

Arnaldo Carvalho de Melo (8):
  perf sample: Add file_offset field to struct perf_sample
  perf session: Include file offset in event skip/stop messages
  perf sched: Include file offset in event skip messages
  perf timechart: Include file offset in CPU bounds check messages
  perf tools: Include file offset and event type name in skip messages
  perf timechart: Fix cat_backtrace() use-after-free on corrupted callchain
  perf sched: Replace BUG_ON on invalid CPU with graceful skip
  perf test: Add file offset diagnostic test for corrupted perf.data

 15 files changed, 261 insertions(+), 101 deletions(-)

Developed with AI assistance (Claude/sashiko), tagged in commits.

Best regards,

- Arnaldo
Re: [PATCHES 0/8] perf tools: Diagnostic offsets in skip messages + two hardening fixes
Posted by Ian Rogers 4 days, 13 hours ago
On Tue, Jun 2, 2026 at 4:57 PM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> [...]
>
> Two pre-existing bugs found by sashiko-bot are fixed:
>
>   - builtin-timechart.c cat_backtrace(): use-after-free and
>     double-free when an invalid callchain context triggers zfree()
>     before fclose() on an open_memstream buffer.  The open_memstream
>     contract requires fclose() before the buffer can be freed — see
>     open_memstream(3).

FWIW, I've also been going through the timechart code, prompted by AI
review and while trying to clean up tests with AddressSanitizer:
https://lore.kernel.org/linux-perf-users/agzWqrn6XPEwTAsb@google.com/

Thanks,
Ian

Re: [PATCHES 0/8] perf tools: Diagnostic offsets in skip messages + two hardening fixes
Posted by Arnaldo Carvalho de Melo 4 days, 8 hours ago
On Wed, Jun 03, 2026 at 08:06:48AM -0700, Ian Rogers wrote:
> On Tue, Jun 2, 2026 at 4:57 PM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > [...]
> >
> > Two pre-existing bugs found by sashiko-bot are fixed:
> >
> >   - builtin-timechart.c cat_backtrace(): use-after-free and
> >     double-free when an invalid callchain context triggers zfree()
> >     before fclose() on an open_memstream buffer.  The open_memstream
> >     contract requires fclose() before the buffer can be freed — see
> >     open_memstream(3).
> 
> FWIW, I've also been going through the timechart code, prompted by AI
> review and while trying to clean up tests with AddressSanitizer:
> https://lore.kernel.org/linux-perf-users/agzWqrn6XPEwTAsb@google.com/

Thanks for all the reviews. I'll merge this series, since sashiko found
just one endianness issue (in the new 'perf test' entry) and the other
comments are about pre-existing problems that we've added to TODO lists.
Then you can rebase your timechart leak fixes on top of it, ok?

- Arnaldo
 
Re: [PATCHES 0/8] perf tools: Diagnostic offsets in skip messages + two hardening fixes
Posted by Ian Rogers 4 days, 8 hours ago
On Wed, Jun 3, 2026 at 12:27 PM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> On Wed, Jun 03, 2026 at 08:06:48AM -0700, Ian Rogers wrote:
> > On Tue, Jun 2, 2026 at 4:57 PM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > > [...]
> > >
> > > Two pre-existing bugs found by sashiko-bot are fixed:
> > >
> > >   - builtin-timechart.c cat_backtrace(): use-after-free and
> > >     double-free when an invalid callchain context triggers zfree()
> > >     before fclose() on an open_memstream buffer.  The open_memstream
> > >     contract requires fclose() before the buffer can be freed — see
> > >     open_memstream(3).
> >
> > FWIW, I've also been going through the timechart code, prompted by AI
> > review and while trying to clean up tests with AddressSanitizer:
> > https://lore.kernel.org/linux-perf-users/agzWqrn6XPEwTAsb@google.com/
>
> Thanks for all the reviews. I'll merge this series, since sashiko found
> just one endianness issue (in the new 'perf test' entry) and the other
> comments are about pre-existing problems that we've added to TODO lists.
> Then you can rebase your timechart leak fixes on top of it, ok?

Sure. I thought we merged the timechart issue fix with this:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/commit/tools/perf/builtin-timechart.c?h=perf-tools-next&id=00b36b394c15f625fa166ba3b399cad5bd5065f9
So you may well be fixing other issues I introduced while trying to
implement that fix.

Thanks,
Ian
