[v3] Move uid filtering to BPF filters

[PATCH v3 00/10] Move uid filtering to BPF filters

Posted by Ian Rogers 9 months, 2 weeks ago

Rather than scanning /proc and skipping PIDs based on their UIDs, use
BPF filters for uid filtering. The /proc scanning in thread_map is
racy as the PID may exit before the perf_event_open causing perf to
abort. BPF UID filters are more robust as they avoid the race. The
/proc scanning also misses processes starting after the perf
command. Add a helper for commands that support UID filtering and wire
up. Remove the non-BPF UID filtering support given it doesn't work.

v3: Add lengthier commit messages as requested by Arnaldo. Rebase on
    tmp.perf-tools-next.

v2: Add a perf record uid test (Namhyung) and force setting
    system-wide for perf trace and perf record (Namhyung). Ensure the
    uid filter isn't set on tracepoint evsels.

v1: https://lore.kernel.org/lkml/20250111190143.1029906-1-irogers@google.com/

Ian Rogers (10):
  perf parse-events filter: Use evsel__find_pmu
  perf target: Separate parse_uid into its own function
  perf parse-events: Add parse_uid_filter helper
  perf record: Switch user option to use BPF filter
  perf tests record: Add basic uid filtering test
  perf top: Switch user option to use BPF filter
  perf trace: Switch user option to use BPF filter
  perf bench evlist-open-close: Switch user option to use BPF filter
  perf target: Remove uid from target
  perf thread_map: Remove uid options

 tools/perf/bench/evlist-open-close.c        | 36 ++++++++------
 tools/perf/builtin-ftrace.c                 |  1 -
 tools/perf/builtin-kvm.c                    |  2 -
 tools/perf/builtin-record.c                 | 27 ++++++-----
 tools/perf/builtin-stat.c                   |  4 +-
 tools/perf/builtin-top.c                    | 22 +++++----
 tools/perf/builtin-trace.c                  | 27 +++++++----
 tools/perf/tests/backward-ring-buffer.c     |  1 -
 tools/perf/tests/event-times.c              |  8 ++-
 tools/perf/tests/keep-tracking.c            |  2 +-
 tools/perf/tests/mmap-basic.c               |  2 +-
 tools/perf/tests/openat-syscall-all-cpus.c  |  2 +-
 tools/perf/tests/openat-syscall-tp-fields.c |  1 -
 tools/perf/tests/openat-syscall.c           |  2 +-
 tools/perf/tests/perf-record.c              |  1 -
 tools/perf/tests/perf-time-to-tsc.c         |  2 +-
 tools/perf/tests/shell/record.sh            | 26 ++++++++++
 tools/perf/tests/switch-tracking.c          |  2 +-
 tools/perf/tests/task-exit.c                |  1 -
 tools/perf/tests/thread-map.c               |  2 +-
 tools/perf/util/bpf-filter.c                |  2 +-
 tools/perf/util/evlist.c                    |  3 +-
 tools/perf/util/parse-events.c              | 33 ++++++++-----
 tools/perf/util/parse-events.h              |  1 +
 tools/perf/util/python.c                    | 10 ++--
 tools/perf/util/target.c                    | 54 +++------------------
 tools/perf/util/target.h                    | 15 ++----
 tools/perf/util/thread_map.c                | 32 ++----------
 tools/perf/util/thread_map.h                |  6 +--
 tools/perf/util/top.c                       |  4 +-
 tools/perf/util/top.h                       |  1 +
 31 files changed, 150 insertions(+), 182 deletions(-)

-- 
2.49.0.850.g28803427d3-goog

Re: [PATCH v3 00/10] Move uid filtering to BPF filters

Posted by Ian Rogers 8 months, 2 weeks ago

On Fri, Apr 25, 2025 at 2:40 PM Ian Rogers <irogers@google.com> wrote:
>
> Rather than scanning /proc and skipping PIDs based on their UIDs, use
> BPF filters for uid filtering. The /proc scanning in thread_map is
> racy as the PID may exit before the perf_event_open causing perf to
> abort. BPF UID filters are more robust as they avoid the race. The
> /proc scanning also misses processes starting after the perf
> command. Add a helper for commands that support UID filtering and wire
> up. Remove the non-BPF UID filtering support given it doesn't work.
>
> v3: Add lengthier commit messages as requested by Arnaldo. Rebase on
>     tmp.perf-tools-next.
>
> v2: Add a perf record uid test (Namhyung) and force setting
>     system-wide for perf trace and perf record (Namhyung). Ensure the
>     uid filter isn't set on tracepoint evsels.
>
> v1: https://lore.kernel.org/lkml/20250111190143.1029906-1-irogers@google.com/

Ping. Thanks,
Ian

> Ian Rogers (10):
>   perf parse-events filter: Use evsel__find_pmu
>   perf target: Separate parse_uid into its own function
>   perf parse-events: Add parse_uid_filter helper
>   perf record: Switch user option to use BPF filter
>   perf tests record: Add basic uid filtering test
>   perf top: Switch user option to use BPF filter
>   perf trace: Switch user option to use BPF filter
>   perf bench evlist-open-close: Switch user option to use BPF filter
>   perf target: Remove uid from target
>   perf thread_map: Remove uid options
>
>  tools/perf/bench/evlist-open-close.c        | 36 ++++++++------
>  tools/perf/builtin-ftrace.c                 |  1 -
>  tools/perf/builtin-kvm.c                    |  2 -
>  tools/perf/builtin-record.c                 | 27 ++++++-----
>  tools/perf/builtin-stat.c                   |  4 +-
>  tools/perf/builtin-top.c                    | 22 +++++----
>  tools/perf/builtin-trace.c                  | 27 +++++++----
>  tools/perf/tests/backward-ring-buffer.c     |  1 -
>  tools/perf/tests/event-times.c              |  8 ++-
>  tools/perf/tests/keep-tracking.c            |  2 +-
>  tools/perf/tests/mmap-basic.c               |  2 +-
>  tools/perf/tests/openat-syscall-all-cpus.c  |  2 +-
>  tools/perf/tests/openat-syscall-tp-fields.c |  1 -
>  tools/perf/tests/openat-syscall.c           |  2 +-
>  tools/perf/tests/perf-record.c              |  1 -
>  tools/perf/tests/perf-time-to-tsc.c         |  2 +-
>  tools/perf/tests/shell/record.sh            | 26 ++++++++++
>  tools/perf/tests/switch-tracking.c          |  2 +-
>  tools/perf/tests/task-exit.c                |  1 -
>  tools/perf/tests/thread-map.c               |  2 +-
>  tools/perf/util/bpf-filter.c                |  2 +-
>  tools/perf/util/evlist.c                    |  3 +-
>  tools/perf/util/parse-events.c              | 33 ++++++++-----
>  tools/perf/util/parse-events.h              |  1 +
>  tools/perf/util/python.c                    | 10 ++--
>  tools/perf/util/target.c                    | 54 +++------------------
>  tools/perf/util/target.h                    | 15 ++----
>  tools/perf/util/thread_map.c                | 32 ++----------
>  tools/perf/util/thread_map.h                |  6 +--
>  tools/perf/util/top.c                       |  4 +-
>  tools/perf/util/top.h                       |  1 +
>  31 files changed, 150 insertions(+), 182 deletions(-)
>
> --
> 2.49.0.850.g28803427d3-goog
>

Re: [PATCH v3 00/10] Move uid filtering to BPF filters

Posted by Namhyung Kim 8 months, 1 week ago

Hi Ian,

On Tue, May 27, 2025 at 01:39:21PM -0700, Ian Rogers wrote:
> On Fri, Apr 25, 2025 at 2:40 PM Ian Rogers <irogers@google.com> wrote:
> >
> > Rather than scanning /proc and skipping PIDs based on their UIDs, use
> > BPF filters for uid filtering. The /proc scanning in thread_map is
> > racy as the PID may exit before the perf_event_open causing perf to
> > abort. BPF UID filters are more robust as they avoid the race. The
> > /proc scanning also misses processes starting after the perf
> > command. Add a helper for commands that support UID filtering and wire
> > up. Remove the non-BPF UID filtering support given it doesn't work.
> >
> > v3: Add lengthier commit messages as requested by Arnaldo. Rebase on
> >     tmp.perf-tools-next.
> >
> > v2: Add a perf record uid test (Namhyung) and force setting
> >     system-wide for perf trace and perf record (Namhyung). Ensure the
> >     uid filter isn't set on tracepoint evsels.
> >
> > v1: https://lore.kernel.org/lkml/20250111190143.1029906-1-irogers@google.com/
> 
> Ping. Thanks,

I'm ok with preferring BPF over /proc scanning, but still hesitate to
remove it since some people don't use BPF.  Can you please drop that
part and make parse_uid_filter() conditional on BPF?

Thanks,
Namhyung

 
> > Ian Rogers (10):
> >   perf parse-events filter: Use evsel__find_pmu
> >   perf target: Separate parse_uid into its own function
> >   perf parse-events: Add parse_uid_filter helper
> >   perf record: Switch user option to use BPF filter
> >   perf tests record: Add basic uid filtering test
> >   perf top: Switch user option to use BPF filter
> >   perf trace: Switch user option to use BPF filter
> >   perf bench evlist-open-close: Switch user option to use BPF filter
> >   perf target: Remove uid from target
> >   perf thread_map: Remove uid options
> >
> >  tools/perf/bench/evlist-open-close.c        | 36 ++++++++------
> >  tools/perf/builtin-ftrace.c                 |  1 -
> >  tools/perf/builtin-kvm.c                    |  2 -
> >  tools/perf/builtin-record.c                 | 27 ++++++-----
> >  tools/perf/builtin-stat.c                   |  4 +-
> >  tools/perf/builtin-top.c                    | 22 +++++----
> >  tools/perf/builtin-trace.c                  | 27 +++++++----
> >  tools/perf/tests/backward-ring-buffer.c     |  1 -
> >  tools/perf/tests/event-times.c              |  8 ++-
> >  tools/perf/tests/keep-tracking.c            |  2 +-
> >  tools/perf/tests/mmap-basic.c               |  2 +-
> >  tools/perf/tests/openat-syscall-all-cpus.c  |  2 +-
> >  tools/perf/tests/openat-syscall-tp-fields.c |  1 -
> >  tools/perf/tests/openat-syscall.c           |  2 +-
> >  tools/perf/tests/perf-record.c              |  1 -
> >  tools/perf/tests/perf-time-to-tsc.c         |  2 +-
> >  tools/perf/tests/shell/record.sh            | 26 ++++++++++
> >  tools/perf/tests/switch-tracking.c          |  2 +-
> >  tools/perf/tests/task-exit.c                |  1 -
> >  tools/perf/tests/thread-map.c               |  2 +-
> >  tools/perf/util/bpf-filter.c                |  2 +-
> >  tools/perf/util/evlist.c                    |  3 +-
> >  tools/perf/util/parse-events.c              | 33 ++++++++-----
> >  tools/perf/util/parse-events.h              |  1 +
> >  tools/perf/util/python.c                    | 10 ++--
> >  tools/perf/util/target.c                    | 54 +++------------------
> >  tools/perf/util/target.h                    | 15 ++----
> >  tools/perf/util/thread_map.c                | 32 ++----------
> >  tools/perf/util/thread_map.h                |  6 +--
> >  tools/perf/util/top.c                       |  4 +-
> >  tools/perf/util/top.h                       |  1 +
> >  31 files changed, 150 insertions(+), 182 deletions(-)
> >
> > --
> > 2.49.0.850.g28803427d3-goog
> >

Re: [PATCH v3 00/10] Move uid filtering to BPF filters

Posted by Ian Rogers 8 months, 1 week ago

On Mon, Jun 2, 2025 at 9:41 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hi Ian,
>
> On Tue, May 27, 2025 at 01:39:21PM -0700, Ian Rogers wrote:
> > On Fri, Apr 25, 2025 at 2:40 PM Ian Rogers <irogers@google.com> wrote:
> > >
> > > Rather than scanning /proc and skipping PIDs based on their UIDs, use
> > > BPF filters for uid filtering. The /proc scanning in thread_map is
> > > racy as the PID may exit before the perf_event_open causing perf to
> > > abort. BPF UID filters are more robust as they avoid the race. The
> > > /proc scanning also misses processes starting after the perf
> > > command. Add a helper for commands that support UID filtering and wire
> > > up. Remove the non-BPF UID filtering support given it doesn't work.
> > >
> > > v3: Add lengthier commit messages as requested by Arnaldo. Rebase on
> > >     tmp.perf-tools-next.
> > >
> > > v2: Add a perf record uid test (Namhyung) and force setting
> > >     system-wide for perf trace and perf record (Namhyung). Ensure the
> > >     uid filter isn't set on tracepoint evsels.
> > >
> > > v1: https://lore.kernel.org/lkml/20250111190143.1029906-1-irogers@google.com/
> >
> > Ping. Thanks,
>
> I'm ok with preferring BPF over /proc scanning, but still hesitate to
> remove it since some people don't use BPF.  Can you please drop that
> part and make parse_uid_filter() conditional on BPF?

Hi Namhyung,

The approach of scanning /proc fails as:
1) processes that start after perf starts will be missed,
2) processes that terminate between being scanned in /proc and
perf_event_open will cause perf to fail (essentially the -u option is
just sugar to scan /proc and then provide the processes as if they
were a -p option - such an approach doesn't need building into the
tool).

This patch series adds a test [1] and perf test has lots of processes
starting and exiting, matching condition (2) above*. If this series
were changed to an approach that uses BPF and falls back on /proc
scanning then the -u option would be broken for both reasons above but
also prove a constant source of test flakes.

Rather than give the users something both frustrating to use (keeps
quitting due to failed opens) and broken (missing processes) I think
it is better to quit perf at that point informing the user they need
more permissions to load the BPF program. This also makes the -u
option testable.

So the request for a change I don't think is sensible as it provides a
worse user and testing experience. There is also the cognitive load of
having the /proc scanning code in the code base, whereas the BPF
filter is largely isolated.

Thanks,
Ian

[1] https://lore.kernel.org/lkml/20250425214008.176100-6-irogers@google.com/
* rescord.sh is marked as exclusive currently, but this shouldn't
really be necessary.



> Thanks,
> Namhyung
>
>
> > > Ian Rogers (10):
> > >   perf parse-events filter: Use evsel__find_pmu
> > >   perf target: Separate parse_uid into its own function
> > >   perf parse-events: Add parse_uid_filter helper
> > >   perf record: Switch user option to use BPF filter
> > >   perf tests record: Add basic uid filtering test
> > >   perf top: Switch user option to use BPF filter
> > >   perf trace: Switch user option to use BPF filter
> > >   perf bench evlist-open-close: Switch user option to use BPF filter
> > >   perf target: Remove uid from target
> > >   perf thread_map: Remove uid options
> > >
> > >  tools/perf/bench/evlist-open-close.c        | 36 ++++++++------
> > >  tools/perf/builtin-ftrace.c                 |  1 -
> > >  tools/perf/builtin-kvm.c                    |  2 -
> > >  tools/perf/builtin-record.c                 | 27 ++++++-----
> > >  tools/perf/builtin-stat.c                   |  4 +-
> > >  tools/perf/builtin-top.c                    | 22 +++++----
> > >  tools/perf/builtin-trace.c                  | 27 +++++++----
> > >  tools/perf/tests/backward-ring-buffer.c     |  1 -
> > >  tools/perf/tests/event-times.c              |  8 ++-
> > >  tools/perf/tests/keep-tracking.c            |  2 +-
> > >  tools/perf/tests/mmap-basic.c               |  2 +-
> > >  tools/perf/tests/openat-syscall-all-cpus.c  |  2 +-
> > >  tools/perf/tests/openat-syscall-tp-fields.c |  1 -
> > >  tools/perf/tests/openat-syscall.c           |  2 +-
> > >  tools/perf/tests/perf-record.c              |  1 -
> > >  tools/perf/tests/perf-time-to-tsc.c         |  2 +-
> > >  tools/perf/tests/shell/record.sh            | 26 ++++++++++
> > >  tools/perf/tests/switch-tracking.c          |  2 +-
> > >  tools/perf/tests/task-exit.c                |  1 -
> > >  tools/perf/tests/thread-map.c               |  2 +-
> > >  tools/perf/util/bpf-filter.c                |  2 +-
> > >  tools/perf/util/evlist.c                    |  3 +-
> > >  tools/perf/util/parse-events.c              | 33 ++++++++-----
> > >  tools/perf/util/parse-events.h              |  1 +
> > >  tools/perf/util/python.c                    | 10 ++--
> > >  tools/perf/util/target.c                    | 54 +++------------------
> > >  tools/perf/util/target.h                    | 15 ++----
> > >  tools/perf/util/thread_map.c                | 32 ++----------
> > >  tools/perf/util/thread_map.h                |  6 +--
> > >  tools/perf/util/top.c                       |  4 +-
> > >  tools/perf/util/top.h                       |  1 +
> > >  31 files changed, 150 insertions(+), 182 deletions(-)
> > >
> > > --
> > > 2.49.0.850.g28803427d3-goog
> > >

Re: [PATCH v3 00/10] Move uid filtering to BPF filters

Posted by Namhyung Kim 8 months, 1 week ago

On Mon, Jun 02, 2025 at 11:26:12PM -0700, Ian Rogers wrote:
> On Mon, Jun 2, 2025 at 9:41 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Hi Ian,
> >
> > On Tue, May 27, 2025 at 01:39:21PM -0700, Ian Rogers wrote:
> > > On Fri, Apr 25, 2025 at 2:40 PM Ian Rogers <irogers@google.com> wrote:
> > > >
> > > > Rather than scanning /proc and skipping PIDs based on their UIDs, use
> > > > BPF filters for uid filtering. The /proc scanning in thread_map is
> > > > racy as the PID may exit before the perf_event_open causing perf to
> > > > abort. BPF UID filters are more robust as they avoid the race. The
> > > > /proc scanning also misses processes starting after the perf
> > > > command. Add a helper for commands that support UID filtering and wire
> > > > up. Remove the non-BPF UID filtering support given it doesn't work.
> > > >
> > > > v3: Add lengthier commit messages as requested by Arnaldo. Rebase on
> > > >     tmp.perf-tools-next.
> > > >
> > > > v2: Add a perf record uid test (Namhyung) and force setting
> > > >     system-wide for perf trace and perf record (Namhyung). Ensure the
> > > >     uid filter isn't set on tracepoint evsels.
> > > >
> > > > v1: https://lore.kernel.org/lkml/20250111190143.1029906-1-irogers@google.com/
> > >
> > > Ping. Thanks,
> >
> > I'm ok with preferring BPF over /proc scanning, but still hesitate to
> > remove it since some people don't use BPF.  Can you please drop that
> > part and make parse_uid_filter() conditional on BPF?
> 
> Hi Namhyung,
> 
> The approach of scanning /proc fails as:
> 1) processes that start after perf starts will be missed,
> 2) processes that terminate between being scanned in /proc and
> perf_event_open will cause perf to fail (essentially the -u option is
> just sugar to scan /proc and then provide the processes as if they
> were a -p option - such an approach doesn't need building into the
> tool).

Yeah, I remember we had this discussion before.  I think (1) is not true
as perf events will be inherited to children (but there is a race).  And
(2) is a real problem but it's also about a race and it can succeed.

Maybe we could change it to skip failed events when the target is a
user but that's not the direction you want.

> 
> This patch series adds a test [1] and perf test has lots of processes
> starting and exiting, matching condition (2) above*. If this series
> were changed to an approach that uses BPF and falls back on /proc
> scanning then the -u option would be broken for both reasons above but
> also prove a constant source of test flakes.
> 
> Rather than give the users something both frustrating to use (keeps
> quitting due to failed opens) and broken (missing processes) I think
> it is better to quit perf at that point informing the user they need
> more permissions to load the BPF program. This also makes the -u
> option testable.
> 
> So the request for a change I don't think is sensible as it provides a
> worse user and testing experience. There is also the cognitive load of
> having the /proc scanning code in the code base, whereas the BPF
> filter is largely isolated.

But I think the problem is that it has different requirements - BPF and
root privilege.  So it should be used after checking the requirements
and fail or fallback.

Does it print proper error messages if not?  With that we can deprecate
the existing behavior and remove it later.

Thanks,
Namhyung

Re: [PATCH v3 00/10] Move uid filtering to BPF filters

Posted by Ian Rogers 8 months, 1 week ago

On Tue, Jun 3, 2025 at 3:32 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Mon, Jun 02, 2025 at 11:26:12PM -0700, Ian Rogers wrote:
> > On Mon, Jun 2, 2025 at 9:41 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > Hi Ian,
> > >
> > > On Tue, May 27, 2025 at 01:39:21PM -0700, Ian Rogers wrote:
> > > > On Fri, Apr 25, 2025 at 2:40 PM Ian Rogers <irogers@google.com> wrote:
> > > > >
> > > > > Rather than scanning /proc and skipping PIDs based on their UIDs, use
> > > > > BPF filters for uid filtering. The /proc scanning in thread_map is
> > > > > racy as the PID may exit before the perf_event_open causing perf to
> > > > > abort. BPF UID filters are more robust as they avoid the race. The
> > > > > /proc scanning also misses processes starting after the perf
> > > > > command. Add a helper for commands that support UID filtering and wire
> > > > > up. Remove the non-BPF UID filtering support given it doesn't work.
> > > > >
> > > > > v3: Add lengthier commit messages as requested by Arnaldo. Rebase on
> > > > >     tmp.perf-tools-next.
> > > > >
> > > > > v2: Add a perf record uid test (Namhyung) and force setting
> > > > >     system-wide for perf trace and perf record (Namhyung). Ensure the
> > > > >     uid filter isn't set on tracepoint evsels.
> > > > >
> > > > > v1: https://lore.kernel.org/lkml/20250111190143.1029906-1-irogers@google.com/
> > > >
> > > > Ping. Thanks,
> > >
> > > I'm ok with preferring BPF over /proc scanning, but still hesitate to
> > > remove it since some people don't use BPF.  Can you please drop that
> > > part and make parse_uid_filter() conditional on BPF?
> >
> > Hi Namhyung,
> >
> > The approach of scanning /proc fails as:
> > 1) processes that start after perf starts will be missed,
> > 2) processes that terminate between being scanned in /proc and
> > perf_event_open will cause perf to fail (essentially the -u option is
> > just sugar to scan /proc and then provide the processes as if they
> > were a -p option - such an approach doesn't need building into the
> > tool).
>
> Yeah, I remember we had this discussion before.  I think (1) is not true
> as perf events will be inherited to children (but there is a race).

If you log in from another terminal? Anything that creates a new
process for that user but isn't inherited will be missed, which isn't
merely a race.

>  And
> (2) is a real problem but it's also about a race and it can succeed.
>
> Maybe we could change it to skip failed events when the target is a
> user but that's not the direction you want.

We could have other events and try to discover new processes via them,
do things like dummy events to cover races. It is just a lot of
complexity for something that is a trivial amount of BPF. In something
like 10 years nobody has bothered to fix this up.

> >
> > This patch series adds a test [1] and perf test has lots of processes
> > starting and exiting, matching condition (2) above*. If this series
> > were changed to an approach that uses BPF and falls back on /proc
> > scanning then the -u option would be broken for both reasons above but
> > also prove a constant source of test flakes.
> >
> > Rather than give the users something both frustrating to use (keeps
> > quitting due to failed opens) and broken (missing processes) I think
> > it is better to quit perf at that point informing the user they need
> > more permissions to load the BPF program. This also makes the -u
> > option testable.
> >
> > So the request for a change I don't think is sensible as it provides a
> > worse user and testing experience. There is also the cognitive load of
> > having the /proc scanning code in the code base, whereas the BPF
> > filter is largely isolated.
>
> But I think the problem is that it has different requirements - BPF and
> root privilege.  So it should be used after checking the requirements
> and fail or fallback.
>
> Does it print proper error messages if not?  With that we can deprecate
> the existing behavior and remove it later.

For `perf top` with TUI you get an error message in a box of:
```
failed to set filter "BPF" on event cpu_atom/cycles/P with 1
(Operation not permitted)
```
With --stdio you get:
```
libbpf: Error in bpf_object__probe_loading(): -EPERM. Couldn't load
trivial BPF program. Make sure your kernel supports BPF
(CONFIG_BPF_SYSCALL=y) and/or that RLIMIT_MEMLOCK is set to big enough
value.
libbpf: failed to load object 'sample_filter_bpf'
libbpf: failed to load BPF skeleton 'sample_filter_bpf': -EPERM
Failed to load perf sample-filter BPF skeleton
failed to set filter "BPF" on event cpu_atom/cycles/P with 1
(Operation not permitted)
```
This matches the existing behavior if you put a filter on an event.

Thanks,
Ian

> Thanks,
> Namhyung
>

Re: [PATCH v3 00/10] Move uid filtering to BPF filters

Posted by Namhyung Kim 8 months, 1 week ago

On Tue, Jun 03, 2025 at 04:22:53PM -0700, Ian Rogers wrote:
> On Tue, Jun 3, 2025 at 3:32 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Mon, Jun 02, 2025 at 11:26:12PM -0700, Ian Rogers wrote:
> > > On Mon, Jun 2, 2025 at 9:41 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > Hi Ian,
> > > >
> > > > On Tue, May 27, 2025 at 01:39:21PM -0700, Ian Rogers wrote:
> > > > > On Fri, Apr 25, 2025 at 2:40 PM Ian Rogers <irogers@google.com> wrote:
> > > > > >
> > > > > > Rather than scanning /proc and skipping PIDs based on their UIDs, use
> > > > > > BPF filters for uid filtering. The /proc scanning in thread_map is
> > > > > > racy as the PID may exit before the perf_event_open causing perf to
> > > > > > abort. BPF UID filters are more robust as they avoid the race. The
> > > > > > /proc scanning also misses processes starting after the perf
> > > > > > command. Add a helper for commands that support UID filtering and wire
> > > > > > up. Remove the non-BPF UID filtering support given it doesn't work.
> > > > > >
> > > > > > v3: Add lengthier commit messages as requested by Arnaldo. Rebase on
> > > > > >     tmp.perf-tools-next.
> > > > > >
> > > > > > v2: Add a perf record uid test (Namhyung) and force setting
> > > > > >     system-wide for perf trace and perf record (Namhyung). Ensure the
> > > > > >     uid filter isn't set on tracepoint evsels.
> > > > > >
> > > > > > v1: https://lore.kernel.org/lkml/20250111190143.1029906-1-irogers@google.com/
> > > > >
> > > > > Ping. Thanks,
> > > >
> > > > I'm ok with preferring BPF over /proc scanning, but still hesitate to
> > > > remove it since some people don't use BPF.  Can you please drop that
> > > > part and make parse_uid_filter() conditional on BPF?
> > >
> > > Hi Namhyung,
> > >
> > > The approach of scanning /proc fails as:
> > > 1) processes that start after perf starts will be missed,
> > > 2) processes that terminate between being scanned in /proc and
> > > perf_event_open will cause perf to fail (essentially the -u option is
> > > just sugar to scan /proc and then provide the processes as if they
> > > were a -p option - such an approach doesn't need building into the
> > > tool).
> >
> > Yeah, I remember we had this discussion before.  I think (1) is not true
> > as perf events will be inherited to children (but there is a race).
> 
> If you log in from another terminal? Anything that creates a new
> process for that user but isn't inherited will be missed, which isn't
> merely a race.

As long as the another terminal is owned by the same user, any new
process from the terminal will inherit events, no?

> 
> >  And
> > (2) is a real problem but it's also about a race and it can succeed.
> >
> > Maybe we could change it to skip failed events when the target is a
> > user but that's not the direction you want.
> 
> We could have other events and try to discover new processes via them,
> do things like dummy events to cover races. It is just a lot of
> complexity for something that is a trivial amount of BPF. In something
> like 10 years nobody has bothered to fix this up.

I don't want any complex solution for this.  Let's not touch this.

> 
> > >
> > > This patch series adds a test [1] and perf test has lots of processes
> > > starting and exiting, matching condition (2) above*. If this series
> > > were changed to an approach that uses BPF and falls back on /proc
> > > scanning then the -u option would be broken for both reasons above but
> > > also prove a constant source of test flakes.
> > >
> > > Rather than give the users something both frustrating to use (keeps
> > > quitting due to failed opens) and broken (missing processes) I think
> > > it is better to quit perf at that point informing the user they need
> > > more permissions to load the BPF program. This also makes the -u
> > > option testable.
> > >
> > > So the request for a change I don't think is sensible as it provides a
> > > worse user and testing experience. There is also the cognitive load of
> > > having the /proc scanning code in the code base, whereas the BPF
> > > filter is largely isolated.
> >
> > But I think the problem is that it has different requirements - BPF and
> > root privilege.  So it should be used after checking the requirements
> > and fail or fallback.
> >
> > Does it print proper error messages if not?  With that we can deprecate
> > the existing behavior and remove it later.
> 
> For `perf top` with TUI you get an error message in a box of:
> ```
> failed to set filter "BPF" on event cpu_atom/cycles/P with 1
> (Operation not permitted)
> ```
> With --stdio you get:
> ```
> libbpf: Error in bpf_object__probe_loading(): -EPERM. Couldn't load
> trivial BPF program. Make sure your kernel supports BPF
> (CONFIG_BPF_SYSCALL=y) and/or that RLIMIT_MEMLOCK is set to big enough
> value.
> libbpf: failed to load object 'sample_filter_bpf'
> libbpf: failed to load BPF skeleton 'sample_filter_bpf': -EPERM
> Failed to load perf sample-filter BPF skeleton
> failed to set filter "BPF" on event cpu_atom/cycles/P with 1
> (Operation not permitted)
> ```
> This matches the existing behavior if you put a filter on an event.

But that's different as user directly asked the BPF filter.
The following message would be better (unless you fallback to the old
behavior).

"-u/--uid option is using BPF filter but perf is not built with BPF.
Please make sure to build with libbpf and bpf skeletons."

and/or

"-u/--uid option is using BPF filter which requires root privilege."

You may check if the filter program and map is pinned already.

Thanks,
Namhyung

Re: [PATCH v3 00/10] Move uid filtering to BPF filters

Posted by Ian Rogers 8 months, 1 week ago

On Tue, Jun 3, 2025 at 4:41 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Tue, Jun 03, 2025 at 04:22:53PM -0700, Ian Rogers wrote:
> > On Tue, Jun 3, 2025 at 3:32 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > On Mon, Jun 02, 2025 at 11:26:12PM -0700, Ian Rogers wrote:
> > > > On Mon, Jun 2, 2025 at 9:41 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > >
> > > > > Hi Ian,
> > > > >
> > > > > On Tue, May 27, 2025 at 01:39:21PM -0700, Ian Rogers wrote:
> > > > > > On Fri, Apr 25, 2025 at 2:40 PM Ian Rogers <irogers@google.com> wrote:
> > > > > > >
> > > > > > > Rather than scanning /proc and skipping PIDs based on their UIDs, use
> > > > > > > BPF filters for uid filtering. The /proc scanning in thread_map is
> > > > > > > racy as the PID may exit before the perf_event_open causing perf to
> > > > > > > abort. BPF UID filters are more robust as they avoid the race. The
> > > > > > > /proc scanning also misses processes starting after the perf
> > > > > > > command. Add a helper for commands that support UID filtering and wire
> > > > > > > up. Remove the non-BPF UID filtering support given it doesn't work.
> > > > > > >
> > > > > > > v3: Add lengthier commit messages as requested by Arnaldo. Rebase on
> > > > > > >     tmp.perf-tools-next.
> > > > > > >
> > > > > > > v2: Add a perf record uid test (Namhyung) and force setting
> > > > > > >     system-wide for perf trace and perf record (Namhyung). Ensure the
> > > > > > >     uid filter isn't set on tracepoint evsels.
> > > > > > >
> > > > > > > v1: https://lore.kernel.org/lkml/20250111190143.1029906-1-irogers@google.com/
> > > > > >
> > > > > > Ping. Thanks,
> > > > >
> > > > > I'm ok with preferring BPF over /proc scanning, but still hesitate to
> > > > > remove it since some people don't use BPF.  Can you please drop that
> > > > > part and make parse_uid_filter() conditional on BPF?
> > > >
> > > > Hi Namhyung,
> > > >
> > > > The approach of scanning /proc fails as:
> > > > 1) processes that start after perf starts will be missed,
> > > > 2) processes that terminate between being scanned in /proc and
> > > > perf_event_open will cause perf to fail (essentially the -u option is
> > > > just sugar to scan /proc and then provide the processes as if they
> > > > were a -p option - such an approach doesn't need building into the
> > > > tool).
> > >
> > > Yeah, I remember we had this discussion before.  I think (1) is not true
> > > as perf events will be inherited to children (but there is a race).
> >
> > If you log in from another terminal? Anything that creates a new
> > process for that user but isn't inherited will be missed, which isn't
> > merely a race.
>
> As long as the another terminal is owned by the same user, any new
> process from the terminal will inherit events, no?
>
> >
> > >  And
> > > (2) is a real problem but it's also about a race and it can succeed.
> > >
> > > Maybe we could change it to skip failed events when the target is a
> > > user but that's not the direction you want.
> >
> > We could have other events and try to discover new processes via them,
> > do things like dummy events to cover races. It is just a lot of
> > complexity for something that is a trivial amount of BPF. In something
> > like 10 years nobody has bothered to fix this up.
>
> I don't want any complex solution for this.  Let's not touch this.
>
> >
> > > >
> > > > This patch series adds a test [1] and perf test has lots of processes
> > > > starting and exiting, matching condition (2) above*. If this series
> > > > were changed to an approach that uses BPF and falls back on /proc
> > > > scanning then the -u option would be broken for both reasons above but
> > > > also prove a constant source of test flakes.
> > > >
> > > > Rather than give the users something both frustrating to use (keeps
> > > > quitting due to failed opens) and broken (missing processes) I think
> > > > it is better to quit perf at that point informing the user they need
> > > > more permissions to load the BPF program. This also makes the -u
> > > > option testable.
> > > >
> > > > So the request for a change I don't think is sensible as it provides a
> > > > worse user and testing experience. There is also the cognitive load of
> > > > having the /proc scanning code in the code base, whereas the BPF
> > > > filter is largely isolated.
> > >
> > > But I think the problem is that it has different requirements - BPF and
> > > root privilege.  So it should be used after checking the requirements
> > > and fail or fallback.
> > >
> > > Does it print proper error messages if not?  With that we can deprecate
> > > the existing behavior and remove it later.
> >
> > For `perf top` with TUI you get an error message in a box of:
> > ```
> > failed to set filter "BPF" on event cpu_atom/cycles/P with 1
> > (Operation not permitted)
> > ```
> > With --stdio you get:
> > ```
> > libbpf: Error in bpf_object__probe_loading(): -EPERM. Couldn't load
> > trivial BPF program. Make sure your kernel supports BPF
> > (CONFIG_BPF_SYSCALL=y) and/or that RLIMIT_MEMLOCK is set to big enough
> > value.
> > libbpf: failed to load object 'sample_filter_bpf'
> > libbpf: failed to load BPF skeleton 'sample_filter_bpf': -EPERM
> > Failed to load perf sample-filter BPF skeleton
> > failed to set filter "BPF" on event cpu_atom/cycles/P with 1
> > (Operation not permitted)
> > ```
> > This matches the existing behavior if you put a filter on an event.
>
> But that's different as user directly asked the BPF filter.
> The following message would be better (unless you fallback to the old
> behavior).
>
> "-u/--uid option is using BPF filter but perf is not built with BPF.
> Please make sure to build with libbpf and bpf skeletons."
>
> and/or
>
> "-u/--uid option is using BPF filter which requires root privilege."
>
> You may check if the filter program and map is pinned already.

I don't disagree that these error messages would be better, shouldn't
the existing BPF filter code also produce these more user friendly
error messages? Once it does that we can adapt it when the caller is
the '-u' option so that the error message doesn't imply '--filter' was
used?

Thanks,
Ian

> Thanks,
> Namhyung
>