[PATCH RFC 0/3] perf: add logic to collect off-cpu samples

Ajay Kaher posted 3 patches 1 year, 5 months ago
include/linux/perf_event.h            | 16 ++++++++++++++
include/uapi/linux/perf_event.h       |  3 ++-
kernel/events/core.c                  | 27 ++++++++++++++++++-----
tools/include/uapi/linux/perf_event.h |  3 ++-
tools/perf/builtin-record.c           |  2 ++
tools/perf/util/events_stats.h        |  2 ++
tools/perf/util/evsel.c               |  4 ++++
tools/perf/util/hist.c                | 31 ++++++++++++++++++++++++---
tools/perf/util/hist.h                |  1 +
tools/perf/util/record.h              |  1 +
tools/perf/util/sample.h              |  1 +
11 files changed, 81 insertions(+), 10 deletions(-)
[PATCH RFC 0/3] perf: add logic to collect off-cpu samples
Posted by Ajay Kaher 1 year, 5 months ago
Add --off-cpu-kernel option to capture off-cpu sample alongwith on-cpu
samples.

off-cpu samples represent time spent by task when it was on wait queue
(schedule out to waiting for events, blocked on I/O, locks, timers,
paging/swapping, etc)

Refer following links for more details:
https://lpc.events/event/17/contributions/1556/
https://www.youtube.com/watch?v=sF2faKGRnjs

Ajay Kaher (3):
  perf/core: add logic to collect off-cpu sample
  perf/record: add options --off-cpu-kernel
  perf/report: add off-cpu samples

 include/linux/perf_event.h            | 16 ++++++++++++++
 include/uapi/linux/perf_event.h       |  3 ++-
 kernel/events/core.c                  | 27 ++++++++++++++++++-----
 tools/include/uapi/linux/perf_event.h |  3 ++-
 tools/perf/builtin-record.c           |  2 ++
 tools/perf/util/events_stats.h        |  2 ++
 tools/perf/util/evsel.c               |  4 ++++
 tools/perf/util/hist.c                | 31 ++++++++++++++++++++++++---
 tools/perf/util/hist.h                |  1 +
 tools/perf/util/record.h              |  1 +
 tools/perf/util/sample.h              |  1 +
 11 files changed, 81 insertions(+), 10 deletions(-)

-- 
2.39.0
Re: [PATCH RFC 0/3] perf: add logic to collect off-cpu samples
Posted by Ian Rogers 1 year, 5 months ago
On Thu, Jul 11, 2024 at 5:16 AM Ajay Kaher <ajay.kaher@broadcom.com> wrote:
>
> Add --off-cpu-kernel option to capture off-cpu sample alongwith on-cpu
> samples.
>
> off-cpu samples represent time spent by task when it was on wait queue
> (schedule out to waiting for events, blocked on I/O, locks, timers,
> paging/swapping, etc)
>
> Refer following links for more details:
> https://lpc.events/event/17/contributions/1556/
> https://www.youtube.com/watch?v=sF2faKGRnjs

Hi Ajay,

I wonder if Howard's improvements (not landed) for `perf record
--off-cpu` would solve this problem for you?
https://lore.kernel.org/lkml/20240424024805.144759-1-howardchu95@gmail.com/
Or is that approach problematic due to the use of BPF?

Thanks,
Ian

> Ajay Kaher (3):
>   perf/core: add logic to collect off-cpu sample
>   perf/record: add options --off-cpu-kernel
>   perf/report: add off-cpu samples
>
>  include/linux/perf_event.h            | 16 ++++++++++++++
>  include/uapi/linux/perf_event.h       |  3 ++-
>  kernel/events/core.c                  | 27 ++++++++++++++++++-----
>  tools/include/uapi/linux/perf_event.h |  3 ++-
>  tools/perf/builtin-record.c           |  2 ++
>  tools/perf/util/events_stats.h        |  2 ++
>  tools/perf/util/evsel.c               |  4 ++++
>  tools/perf/util/hist.c                | 31 ++++++++++++++++++++++++---
>  tools/perf/util/hist.h                |  1 +
>  tools/perf/util/record.h              |  1 +
>  tools/perf/util/sample.h              |  1 +
>  11 files changed, 81 insertions(+), 10 deletions(-)
>
> --
> 2.39.0
>
Re: [PATCH RFC 0/3] perf: add logic to collect off-cpu samples
Posted by Ajay Kaher 1 year, 5 months ago
On Fri, Jul 12, 2024 at 3:28 AM Ian Rogers <irogers@google.com> wrote:
>
> On Thu, Jul 11, 2024 at 5:16 AM Ajay Kaher <ajay.kaher@broadcom.com> wrote:
> >
> > Add --off-cpu-kernel option to capture off-cpu sample alongwith on-cpu
> > samples.
> >
> > off-cpu samples represent time spent by task when it was on wait queue
> > (schedule out to waiting for events, blocked on I/O, locks, timers,
> > paging/swapping, etc)
> >
> > Refer following links for more details:
> > https://lpc.events/event/17/contributions/1556/
> > https://www.youtube.com/watch?v=sF2faKGRnjs
>
> Hi Ajay,
>
> I wonder if Howard's improvements (not landed) for `perf record
> --off-cpu` would solve this problem for you?
> https://lore.kernel.org/lkml/20240424024805.144759-1-howardchu95@gmail.com/
> Or is that approach problematic due to the use of BPF?
>

Thanks Ian for your response and sharing Howard's improvements.

Yes, perf --off-cpu is based upon BPF and having following restrictions:

- target binary should be compiled with frame pointer, same mentioned
  in tools/perf/Documentation/perf-record.txt:
  Note that BPF can collect stack traces using frame pointer ("fp") only,
  as of now. So the applications built without the frame pointer might see
  bogus addresses.

- perf should be complied with BUILD_BPF_SKEL=1:
  Warning: option `off-cpu' is being ignored because no BUILD_BPF_SKEL=1

- off-cpu, on-cpu samples are not on the same result page.
  (I guess Howard has improve this, not tried his patches)

I have tried to collect the off-cpu sample same as on-cpu sample with the help
of  kernel/events/core.c. We will get one off-cpu sample from the target task
sched-out to sched-in. Or we can say off-cpu samples are not dependent on
frequency provided by the user to perf record.

I am also worried about having so many samples if sched-in/out
frequency is high.
Thinking to merge samples if attributes are the same (i.e. stacktrace)
and add the
off-cpu period to previous samples with the same attribute.

-Ajay

> Thanks,
> Ian
>
> > Ajay Kaher (3):
> >   perf/core: add logic to collect off-cpu sample
> >   perf/record: add options --off-cpu-kernel
> >   perf/report: add off-cpu samples
> >
> >  include/linux/perf_event.h            | 16 ++++++++++++++
> >  include/uapi/linux/perf_event.h       |  3 ++-
> >  kernel/events/core.c                  | 27 ++++++++++++++++++-----
> >  tools/include/uapi/linux/perf_event.h |  3 ++-
> >  tools/perf/builtin-record.c           |  2 ++
> >  tools/perf/util/events_stats.h        |  2 ++
> >  tools/perf/util/evsel.c               |  4 ++++
> >  tools/perf/util/hist.c                | 31 ++++++++++++++++++++++++---
> >  tools/perf/util/hist.h                |  1 +
> >  tools/perf/util/record.h              |  1 +
> >  tools/perf/util/sample.h              |  1 +
> >  11 files changed, 81 insertions(+), 10 deletions(-)
> >
> > --
> > 2.39.0
> >
Re: [PATCH RFC 0/3] perf: add logic to collect off-cpu samples
Posted by Ian Rogers 1 year, 5 months ago
On Sat, Jul 13, 2024 at 12:43 AM Ajay Kaher <ajay.kaher@broadcom.com> wrote:
>
> On Fri, Jul 12, 2024 at 3:28 AM Ian Rogers <irogers@google.com> wrote:
> >
> > On Thu, Jul 11, 2024 at 5:16 AM Ajay Kaher <ajay.kaher@broadcom.com> wrote:
> > >
> > > Add --off-cpu-kernel option to capture off-cpu sample alongwith on-cpu
> > > samples.
> > >
> > > off-cpu samples represent time spent by task when it was on wait queue
> > > (schedule out to waiting for events, blocked on I/O, locks, timers,
> > > paging/swapping, etc)
> > >
> > > Refer following links for more details:
> > > https://lpc.events/event/17/contributions/1556/
> > > https://www.youtube.com/watch?v=sF2faKGRnjs
> >
> > Hi Ajay,
> >
> > I wonder if Howard's improvements (not landed) for `perf record
> > --off-cpu` would solve this problem for you?
> > https://lore.kernel.org/lkml/20240424024805.144759-1-howardchu95@gmail.com/
> > Or is that approach problematic due to the use of BPF?
> >
>
> Thanks Ian for your response and sharing Howard's improvements.
>
> Yes, perf --off-cpu is based upon BPF and having following restrictions:
>
> - target binary should be compiled with frame pointer, same mentioned
>   in tools/perf/Documentation/perf-record.txt:
>   Note that BPF can collect stack traces using frame pointer ("fp") only,
>   as of now. So the applications built without the frame pointer might see
>   bogus addresses.

Agreed, if you want more than the leaf function. There's some evidence
that frame pointers may become the default:
https://www.brendangregg.com/blog/2024-03-17/the-return-of-the-frame-pointers.html

> - perf should be complied with BUILD_BPF_SKEL=1:
>   Warning: option `off-cpu' is being ignored because no BUILD_BPF_SKEL=1

We've made this the default build behavior but probably a bigger issue
is that generally to have permissions you need to run as root.

> - off-cpu, on-cpu samples are not on the same result page.
>   (I guess Howard has improve this, not tried his patches)
>
> I have tried to collect the off-cpu sample same as on-cpu sample with the help
> of  kernel/events/core.c. We will get one off-cpu sample from the target task
> sched-out to sched-in. Or we can say off-cpu samples are not dependent on
> frequency provided by the user to perf record.
>
> I am also worried about having so many samples if sched-in/out
> frequency is high.
> Thinking to merge samples if attributes are the same (i.e. stacktrace)
> and add the
> off-cpu period to previous samples with the same attribute.

Right, this would be useful to look at in the context of Howard's
patches. The proposal in his changes are that short off-cpu times are
aggregated in a BPF map and dumped at the end of perf record - this
matches the existing perf record off-CPU behavior. Longer off-CPU
times, where long is open for debate and will be a parameter (perhaps
100 microseconds) and cause a sample to be created. BPF programs
create BPF output events, Howard's last patch series would rewrite the
events in perf record to be off-CPU samples. At the last office hours
it was discussed that we should dump the BPF output events directly,
to keep perf record overhead minimal, and add the ability to rewrite
into samples in tools like perf report, etc.

Thanks,
Ian

> -Ajay
>
> > Thanks,
> > Ian
> >
> > > Ajay Kaher (3):
> > >   perf/core: add logic to collect off-cpu sample
> > >   perf/record: add options --off-cpu-kernel
> > >   perf/report: add off-cpu samples
> > >
> > >  include/linux/perf_event.h            | 16 ++++++++++++++
> > >  include/uapi/linux/perf_event.h       |  3 ++-
> > >  kernel/events/core.c                  | 27 ++++++++++++++++++-----
> > >  tools/include/uapi/linux/perf_event.h |  3 ++-
> > >  tools/perf/builtin-record.c           |  2 ++
> > >  tools/perf/util/events_stats.h        |  2 ++
> > >  tools/perf/util/evsel.c               |  4 ++++
> > >  tools/perf/util/hist.c                | 31 ++++++++++++++++++++++++---
> > >  tools/perf/util/hist.h                |  1 +
> > >  tools/perf/util/record.h              |  1 +
> > >  tools/perf/util/sample.h              |  1 +
> > >  11 files changed, 81 insertions(+), 10 deletions(-)
> > >
> > > --
> > > 2.39.0
> > >