[PATCH v4 3/5] perf/core: Account dropped samples from BPF
Posted by Namhyung Kim 1 month ago
Like in the software events, the BPF overflow handler can drop samples
by returning 0.  Let's count the dropped samples here too.

Acked-by: Kyle Huey <me@kylehuey.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 kernel/events/core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5d24597180dec167..b41c17a0bc19f7c2 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9831,8 +9831,10 @@ static int __perf_event_overflow(struct perf_event *event,
 	ret = __perf_event_account_interrupt(event, throttle);
 
 	if (event->prog && event->prog->type == BPF_PROG_TYPE_PERF_EVENT &&
-	    !bpf_overflow_handler(event, data, regs))
+	    !bpf_overflow_handler(event, data, regs)) {
+		atomic64_inc(&event->dropped_samples);
 		return ret;
+	}
 
 	/*
 	 * XXX event_limit might not quite work as expected on inherited
-- 
2.47.0.105.g07ac214952-goog
Re: [PATCH v4 3/5] perf/core: Account dropped samples from BPF
Posted by Andrii Nakryiko 1 month ago
On Tue, Oct 22, 2024 at 5:09 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Like in the software events, the BPF overflow handler can drop samples
> by returning 0.  Let's count the dropped samples here too.
>
> Acked-by: Kyle Huey <me@kylehuey.com>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Andrii Nakryiko <andrii@kernel.org>
> Cc: Song Liu <song@kernel.org>
> Cc: bpf@vger.kernel.org
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  kernel/events/core.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 5d24597180dec167..b41c17a0bc19f7c2 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -9831,8 +9831,10 @@ static int __perf_event_overflow(struct perf_event *event,
>         ret = __perf_event_account_interrupt(event, throttle);
>
>         if (event->prog && event->prog->type == BPF_PROG_TYPE_PERF_EVENT &&
> -           !bpf_overflow_handler(event, data, regs))
> +           !bpf_overflow_handler(event, data, regs)) {
> +               atomic64_inc(&event->dropped_samples);

I don't see the full patch set (please cc relevant people and mailing
list on each patch in the patch set), but do we really want to pay the
price of atomic increment on what's the very typical situation of a
BPF program returning 0?

At least from a BPF perspective this is not "dropping a sample"; it's
just processing it in BPF and not paying the overhead of the perf
subsystem continuing to process it afterwards. So the "dropping" part is
also misleading, IMO.

>                 return ret;
> +       }
>
>         /*
>          * XXX event_limit might not quite work as expected on inherited
> --
> 2.47.0.105.g07ac214952-goog
>
Re: [PATCH v4 3/5] perf/core: Account dropped samples from BPF
Posted by Namhyung Kim 1 month ago
Hello,

On Wed, Oct 23, 2024 at 09:12:52AM -0700, Andrii Nakryiko wrote:
> On Tue, Oct 22, 2024 at 5:09 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Like in the software events, the BPF overflow handler can drop samples
> > by returning 0.  Let's count the dropped samples here too.
> >
> > Acked-by: Kyle Huey <me@kylehuey.com>
> > Cc: Alexei Starovoitov <ast@kernel.org>
> > Cc: Andrii Nakryiko <andrii@kernel.org>
> > Cc: Song Liu <song@kernel.org>
> > Cc: bpf@vger.kernel.org
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  kernel/events/core.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 5d24597180dec167..b41c17a0bc19f7c2 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -9831,8 +9831,10 @@ static int __perf_event_overflow(struct perf_event *event,
> >         ret = __perf_event_account_interrupt(event, throttle);
> >
> >         if (event->prog && event->prog->type == BPF_PROG_TYPE_PERF_EVENT &&
> > -           !bpf_overflow_handler(event, data, regs))
> > +           !bpf_overflow_handler(event, data, regs)) {
> > +               atomic64_inc(&event->dropped_samples);
> 
> I don't see the full patch set (please cc relevant people and mailing
> list on each patch in the patch set), but do we really want to pay the

Sorry, you can find the whole series here.

https://lore.kernel.org/lkml/20241023000928.957077-1-namhyung@kernel.org

I thought it was mostly about the perf part so I didn't CC the bpf
folks, but I'll do so in the next version.


> price of atomic increment on what's the very typical situation of a
> BPF program returning 0?

Is it typical for BPF_PROG_TYPE_PERF_EVENT?  I guess TRACING programs
usually return 0 but PERF_EVENT should care about the return values.

> 
> At least from a BPF perspective this is no "dropping sample", it's
> just processing it in BPF and not paying the overhead of the perf
> subsystem continuing processing it afterwards. So the dropping part is
> also misleading, IMO.

In the perf tools, we have filtering logic in BPF that reads sample
values and decides whether we want a given sample or not.  In that case
users would be interested in the exact number of samples.
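
Roughly along these lines, just to illustrate the idea (only a sketch,
not the actual perf tools BPF code; the checked field and the threshold
are made up):

#include <linux/bpf.h>
#include <linux/bpf_perf_event.h>
#include <bpf/bpf_helpers.h>

/* keep or drop a sample based on its contents */
SEC("perf_event")
int filter_sample(struct bpf_perf_event_data *ctx)
{
        /* hypothetical condition: drop samples with a small period */
        if (ctx->sample_period < 10000)
                return 0;       /* drop: perf does not emit the sample */

        return 1;               /* keep: perf writes out the sample as usual */
}

char LICENSE[] SEC("license") = "GPL";

With something like that attached, a user filtering aggressively would
want to know how many samples went through the return 0 path.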

Thanks,
Namhyung

> 
> >                 return ret;
> > +       }
> >
> >         /*
> >          * XXX event_limit might not quite work as expected on inherited
> > --
> > 2.47.0.105.g07ac214952-goog
> >
Re: [PATCH v4 3/5] perf/core: Account dropped samples from BPF
Posted by Andrii Nakryiko 1 month ago
On Wed, Oct 23, 2024 at 11:47 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hello,
>
> On Wed, Oct 23, 2024 at 09:12:52AM -0700, Andrii Nakryiko wrote:
> > On Tue, Oct 22, 2024 at 5:09 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > Like in the software events, the BPF overflow handler can drop samples
> > > by returning 0.  Let's count the dropped samples here too.
> > >
> > > Acked-by: Kyle Huey <me@kylehuey.com>
> > > Cc: Alexei Starovoitov <ast@kernel.org>
> > > Cc: Andrii Nakryiko <andrii@kernel.org>
> > > Cc: Song Liu <song@kernel.org>
> > > Cc: bpf@vger.kernel.org
> > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > ---
> > >  kernel/events/core.c | 4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > index 5d24597180dec167..b41c17a0bc19f7c2 100644
> > > --- a/kernel/events/core.c
> > > +++ b/kernel/events/core.c
> > > @@ -9831,8 +9831,10 @@ static int __perf_event_overflow(struct perf_event *event,
> > >         ret = __perf_event_account_interrupt(event, throttle);
> > >
> > >         if (event->prog && event->prog->type == BPF_PROG_TYPE_PERF_EVENT &&
> > > -           !bpf_overflow_handler(event, data, regs))
> > > +           !bpf_overflow_handler(event, data, regs)) {
> > > +               atomic64_inc(&event->dropped_samples);
> >
> > I don't see the full patch set (please cc relevant people and mailing
> > list on each patch in the patch set), but do we really want to pay the
>
> Sorry, you can find the whole series here.
>
> https://lore.kernel.org/lkml/20241023000928.957077-1-namhyung@kernel.org
>
> I thought it's mostly for the perf part so I didn't CC bpf folks but
> I'll do in the next version.
>
>
> > price of atomic increment on what's the very typical situation of a
> > BPF program returning 0?
>
> Is it typical for BPF_PROG_TYPE_PERF_EVENT?  I guess TRACING programs
> usually return 0 but PERF_EVENT should care about the return values.
>

Yeah, it's pretty much always `return 0;` for perf_event-based BPF
profilers. It's rather unusual to return non-zero, actually.

> >
> > At least from a BPF perspective this is no "dropping sample", it's
> > just processing it in BPF and not paying the overhead of the perf
> > subsystem continuing processing it afterwards. So the dropping part is
> > also misleading, IMO.
>
> In the perf tools, we have a filtering logic in BPF to read sample
> values and to decide whether we want this sample or not.  In that case
> users would be interested in the exact number of samples.
>
> Thanks,
> Namhyung
>
> >
> > >                 return ret;
> > > +       }
> > >
> > >         /*
> > >          * XXX event_limit might not quite work as expected on inherited
> > > --
> > > 2.47.0.105.g07ac214952-goog
> > >
Re: [PATCH v4 3/5] perf/core: Account dropped samples from BPF
Posted by Namhyung Kim 1 month ago
On Wed, Oct 23, 2024 at 12:13:31PM -0700, Andrii Nakryiko wrote:
> On Wed, Oct 23, 2024 at 11:47 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Hello,
> >
> > On Wed, Oct 23, 2024 at 09:12:52AM -0700, Andrii Nakryiko wrote:
> > > On Tue, Oct 22, 2024 at 5:09 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > Like in the software events, the BPF overflow handler can drop samples
> > > > by returning 0.  Let's count the dropped samples here too.
> > > >
> > > > Acked-by: Kyle Huey <me@kylehuey.com>
> > > > Cc: Alexei Starovoitov <ast@kernel.org>
> > > > Cc: Andrii Nakryiko <andrii@kernel.org>
> > > > Cc: Song Liu <song@kernel.org>
> > > > Cc: bpf@vger.kernel.org
> > > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > > ---
> > > >  kernel/events/core.c | 4 +++-
> > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > > index 5d24597180dec167..b41c17a0bc19f7c2 100644
> > > > --- a/kernel/events/core.c
> > > > +++ b/kernel/events/core.c
> > > > @@ -9831,8 +9831,10 @@ static int __perf_event_overflow(struct perf_event *event,
> > > >         ret = __perf_event_account_interrupt(event, throttle);
> > > >
> > > >         if (event->prog && event->prog->type == BPF_PROG_TYPE_PERF_EVENT &&
> > > > -           !bpf_overflow_handler(event, data, regs))
> > > > +           !bpf_overflow_handler(event, data, regs)) {
> > > > +               atomic64_inc(&event->dropped_samples);
> > >
> > > I don't see the full patch set (please cc relevant people and mailing
> > > list on each patch in the patch set), but do we really want to pay the
> >
> > Sorry, you can find the whole series here.
> >
> > https://lore.kernel.org/lkml/20241023000928.957077-1-namhyung@kernel.org
> >
> > I thought it's mostly for the perf part so I didn't CC bpf folks but
> > I'll do in the next version.
> >
> >
> > > price of atomic increment on what's the very typical situation of a
> > > BPF program returning 0?
> >
> > Is it typical for BPF_PROG_TYPE_PERF_EVENT?  I guess TRACING programs
> > usually return 0 but PERF_EVENT should care about the return values.
> >
> 
> Yeah, it's pretty much always `return 0;` for perf_event-based BPF
> profilers. It's rather unusual to return non-zero, actually.

Ok, then it could be a local_t or a plain unsigned long.  It would be
updated on overflow only.  Reads can be racy but I think it's ok to
miss some counts.  If someone really needs a precise count, they can
read it after disabling the event IMHO.
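
IOW the hunk above would become something like this (just a sketch,
assuming the field is changed from atomic64_t to a plain u64 counter):

        if (event->prog && event->prog->type == BPF_PROG_TYPE_PERF_EVENT &&
            !bpf_overflow_handler(event, data, regs)) {
                event->dropped_samples++;       /* plain, non-atomic increment */
                return ret;
        }

or a local_inc() on a local_t if the update needs to stay safe against
interrupts/NMIs on the owning CPU.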

What do you think?

Thanks,
Namhyung

Re: [PATCH v4 3/5] perf/core: Account dropped samples from BPF
Posted by Andrii Nakryiko 1 month ago
On Wed, Oct 23, 2024 at 1:32 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Wed, Oct 23, 2024 at 12:13:31PM -0700, Andrii Nakryiko wrote:
> > On Wed, Oct 23, 2024 at 11:47 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > Hello,
> > >
> > > On Wed, Oct 23, 2024 at 09:12:52AM -0700, Andrii Nakryiko wrote:
> > > > On Tue, Oct 22, 2024 at 5:09 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > >
> > > > > Like in the software events, the BPF overflow handler can drop samples
> > > > > by returning 0.  Let's count the dropped samples here too.
> > > > >
> > > > > Acked-by: Kyle Huey <me@kylehuey.com>
> > > > > Cc: Alexei Starovoitov <ast@kernel.org>
> > > > > Cc: Andrii Nakryiko <andrii@kernel.org>
> > > > > Cc: Song Liu <song@kernel.org>
> > > > > Cc: bpf@vger.kernel.org
> > > > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > > > ---
> > > > >  kernel/events/core.c | 4 +++-
> > > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > > > index 5d24597180dec167..b41c17a0bc19f7c2 100644
> > > > > --- a/kernel/events/core.c
> > > > > +++ b/kernel/events/core.c
> > > > > @@ -9831,8 +9831,10 @@ static int __perf_event_overflow(struct perf_event *event,
> > > > >         ret = __perf_event_account_interrupt(event, throttle);
> > > > >
> > > > >         if (event->prog && event->prog->type == BPF_PROG_TYPE_PERF_EVENT &&
> > > > > -           !bpf_overflow_handler(event, data, regs))
> > > > > +           !bpf_overflow_handler(event, data, regs)) {
> > > > > +               atomic64_inc(&event->dropped_samples);
> > > >
> > > > I don't see the full patch set (please cc relevant people and mailing
> > > > list on each patch in the patch set), but do we really want to pay the
> > >
> > > Sorry, you can find the whole series here.
> > >
> > > https://lore.kernel.org/lkml/20241023000928.957077-1-namhyung@kernel.org
> > >
> > > I thought it's mostly for the perf part so I didn't CC bpf folks but
> > > I'll do in the next version.
> > >
> > >
> > > > price of atomic increment on what's the very typical situation of a
> > > > BPF program returning 0?
> > >
> > > Is it typical for BPF_PROG_TYPE_PERF_EVENT?  I guess TRACING programs
> > > usually return 0 but PERF_EVENT should care about the return values.
> > >
> >
> > Yeah, it's pretty much always `return 0;` for perf_event-based BPF
> > profilers. It's rather unusual to return non-zero, actually.
>
> Ok, then it may be local_t or plain unsigned long.  It should be
> updated on overflow only.  Read can be racy but I think it's ok to
> miss some numbers.  If someone really needs a precise count, they can
> read it after disabling the event IMHO.
>
> What do you think?
>

See [0] for unsynchronized increment absolutely killing the
performance due to cache line bouncing between CPUs. If the event is
high-frequency and can be triggered across multiple CPUs in short
succession, even an imprecise counter might be harmful.

In general, I'd say that if BPF attachment is involved, we probably
should avoid maintaining unnecessary statistics. Things like this
event->dropped_samples increment can be done very efficiently and
trivially from inside the BPF program, if at all necessary.
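
E.g. roughly like this on the BPF side (only a sketch; the map name and
the filter condition are made up):

#include <linux/bpf.h>
#include <linux/bpf_perf_event.h>
#include <bpf/bpf_helpers.h>

/* per-CPU counter, so there is no cross-CPU cache line bouncing */
struct {
        __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
        __uint(max_entries, 1);
        __type(key, __u32);
        __type(value, __u64);
} dropped SEC(".maps");

SEC("perf_event")
int profiler(struct bpf_perf_event_data *ctx)
{
        __u32 zero = 0;
        __u64 *cnt;

        if (ctx->sample_period < 10000) {       /* hypothetical filter */
                cnt = bpf_map_lookup_elem(&dropped, &zero);
                if (cnt)
                        (*cnt)++;               /* per-CPU, no atomics needed */
                return 0;                       /* perf drops the sample */
        }

        return 1;
}

char LICENSE[] SEC("license") = "GPL";

User space can then sum the per-CPU values whenever it wants the count.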

Having said that, if it's unlikely to have perf_event bouncing between
multiple CPUs, it's probably not that big of a deal.


  [0] https://lore.kernel.org/linux-trace-kernel/20240813203409.3985398-1-andrii@kernel.org/

> Thanks,
> Namhyung
>
Re: [PATCH v4 3/5] perf/core: Account dropped samples from BPF
Posted by Namhyung Kim 4 weeks ago
On Wed, Oct 23, 2024 at 02:24:13PM -0700, Andrii Nakryiko wrote:
> On Wed, Oct 23, 2024 at 1:32 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Wed, Oct 23, 2024 at 12:13:31PM -0700, Andrii Nakryiko wrote:
> > > On Wed, Oct 23, 2024 at 11:47 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > Hello,
> > > >
> > > > On Wed, Oct 23, 2024 at 09:12:52AM -0700, Andrii Nakryiko wrote:
> > > > > On Tue, Oct 22, 2024 at 5:09 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > >
> > > > > > Like in the software events, the BPF overflow handler can drop samples
> > > > > > by returning 0.  Let's count the dropped samples here too.
> > > > > >
> > > > > > Acked-by: Kyle Huey <me@kylehuey.com>
> > > > > > Cc: Alexei Starovoitov <ast@kernel.org>
> > > > > > Cc: Andrii Nakryiko <andrii@kernel.org>
> > > > > > Cc: Song Liu <song@kernel.org>
> > > > > > Cc: bpf@vger.kernel.org
> > > > > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > > > > ---
> > > > > >  kernel/events/core.c | 4 +++-
> > > > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > > > > index 5d24597180dec167..b41c17a0bc19f7c2 100644
> > > > > > --- a/kernel/events/core.c
> > > > > > +++ b/kernel/events/core.c
> > > > > > @@ -9831,8 +9831,10 @@ static int __perf_event_overflow(struct perf_event *event,
> > > > > >         ret = __perf_event_account_interrupt(event, throttle);
> > > > > >
> > > > > >         if (event->prog && event->prog->type == BPF_PROG_TYPE_PERF_EVENT &&
> > > > > > -           !bpf_overflow_handler(event, data, regs))
> > > > > > +           !bpf_overflow_handler(event, data, regs)) {
> > > > > > +               atomic64_inc(&event->dropped_samples);
> > > > >
> > > > > I don't see the full patch set (please cc relevant people and mailing
> > > > > list on each patch in the patch set), but do we really want to pay the
> > > >
> > > > Sorry, you can find the whole series here.
> > > >
> > > > https://lore.kernel.org/lkml/20241023000928.957077-1-namhyung@kernel.org
> > > >
> > > > I thought it's mostly for the perf part so I didn't CC bpf folks but
> > > > I'll do in the next version.
> > > >
> > > >
> > > > > price of atomic increment on what's the very typical situation of a
> > > > > BPF program returning 0?
> > > >
> > > > Is it typical for BPF_PROG_TYPE_PERF_EVENT?  I guess TRACING programs
> > > > usually return 0 but PERF_EVENT should care about the return values.
> > > >
> > >
> > > Yeah, it's pretty much always `return 0;` for perf_event-based BPF
> > > profilers. It's rather unusual to return non-zero, actually.
> >
> > Ok, then it may be local_t or plain unsigned long.  It should be
> > updated on overflow only.  Read can be racy but I think it's ok to
> > miss some numbers.  If someone really needs a precise count, they can
> > read it after disabling the event IMHO.
> >
> > What do you think?
> >
> 
> See [0] for unsynchronized increment absolutely killing the
> performance due to cache line bouncing between CPUs. If the event is
> high-frequency and can be triggered across multiple CPUs in short
> succession, even an imprecise counter might be harmful.

Ok.

> 
> In general, I'd say that if BPF attachment is involved, we probably
> should avoid maintaining unnecessary statistics. Things like this
> event->dropped_samples increment can be done very efficiently and
> trivially from inside the BPF program, if at all necessary.

Right, we can do that in the BPF too.

> 
> Having said that, if it's unlikely to have perf_event bouncing between
> multiple CPUs, it's probably not that big of a deal.

Yeah, perf_event is dedicated to a CPU or a task and the counter is
updated only in the overflow handler.  So I don't think it'd cause cache
line bouncing between CPUs.

Thanks,
Namhyung

> 
> 
>   [0] https://lore.kernel.org/linux-trace-kernel/20240813203409.3985398-1-andrii@kernel.org/
> 
> > Thanks,
> > Namhyung
> >