[RFC/PATCH 4/4] perf/x86: Relax privilege filter restriction on AMD IBS

Namhyung Kim posted 4 patches 1 year, 3 months ago
[RFC/PATCH 4/4] perf/x86: Relax privilege filter restriction on AMD IBS
Posted by Namhyung Kim 1 year, 3 months ago
While IBS is available for per-thread profiling, still regular users
cannot open an event due to the default paranoid setting (2) which
doesn't allow unprivileged users to get kernel samples.  That means
it needs to set exclude_kernel bit in the attribute but IBS driver
would reject it since it has PERF_PMU_CAP_NO_EXCLUDE.  This is not what
we want and I've been getting requests to fix this issue.

This should be done in the hardware, but until we get the HW fix we may
allow exclude_{kernel,user} in the attribute and silently drop the
samples in the PMU IRQ handler.  It won't guarantee the sampling
frequency or even it'd miss some with fixed period too.  Not ideal,
but that'd still be helpful to regular users.

This uses perf_exclude_event() which checks regs->cs.  But it should be
fine because set_linear_ip() also updates the CS according to the RIP
provided by IBS.

Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 arch/x86/events/amd/ibs.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index e91970b01d62..e40e2255239a 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -290,6 +290,11 @@ static int perf_ibs_init(struct perf_event *event)
 	if (has_branch_stack(event))
 		return -EOPNOTSUPP;
 
+	/* handle exclude_{user,kernel} in the IRQ handler */
+	if (event->attr.exclude_hv || event->attr.exclude_idle ||
+	    event->attr.exclude_host || event->attr.exclude_guest)
+		return -EINVAL;
+
 	ret = validate_group(event);
 	if (ret)
 		return ret;
@@ -667,7 +672,6 @@ static struct perf_ibs perf_ibs_fetch = {
 		.start		= perf_ibs_start,
 		.stop		= perf_ibs_stop,
 		.read		= perf_ibs_read,
-		.capabilities	= PERF_PMU_CAP_NO_EXCLUDE,
 	},
 	.msr			= MSR_AMD64_IBSFETCHCTL,
 	.config_mask		= IBS_FETCH_CONFIG_MASK,
@@ -691,7 +695,6 @@ static struct perf_ibs perf_ibs_op = {
 		.start		= perf_ibs_start,
 		.stop		= perf_ibs_stop,
 		.read		= perf_ibs_read,
-		.capabilities	= PERF_PMU_CAP_NO_EXCLUDE,
 	},
 	.msr			= MSR_AMD64_IBSOPCTL,
 	.config_mask		= IBS_OP_CONFIG_MASK,
@@ -1111,6 +1114,12 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 		regs.flags |= PERF_EFLAGS_EXACT;
 	}
 
+	if (perf_exclude_event(event, &regs)) {
+		throttle = perf_event_account_interrupt(event);
+		atomic64_inc(&event->dropped_samples);
+		goto out;
+	}
+
 	if (event->attr.sample_type & PERF_SAMPLE_RAW) {
 		raw = (struct perf_raw_record){
 			.frag = {
-- 
2.46.0.469.g59c65b2a67-goog
Re: [RFC/PATCH 4/4] perf/x86: Relax privilege filter restriction on AMD IBS
Posted by Peter Zijlstra 1 year, 3 months ago
On Fri, Aug 30, 2024 at 04:29:10PM -0700, Namhyung Kim wrote:
> While IBS is available for per-thread profiling, still regular users
> cannot open an event due to the default paranoid setting (2) which
> doesn't allow unprivileged users to get kernel samples.  That means
> it needs to set exclude_kernel bit in the attribute but IBS driver
> would reject it since it has PERF_PMU_CAP_NO_EXCLUDE.  This is not what
> we want and I've been getting requests to fix this issue.
> 
> This should be done in the hardware, but until we get the HW fix we may
> allow exclude_{kernel,user} in the attribute and silently drop the
> samples in the PMU IRQ handler.  It won't guarantee the sampling
> frequency or even it'd miss some with fixed period too.  Not ideal,
> but that'd still be helpful to regular users.

Urgh.... this is really rather bad. And I'm sure a bunch of people are
going to be spending a lot of time trying to figure out why their
results don't make sense.

I realize that having entry hooks to disable/enable the counters is also
not going to happen, this has a ton of problems too.

Also, that PMU passthrough patch set has guest hooks, so you can
actually do the exclude_host/guest nonsense with those, right?

> This uses perf_exclude_event() which checks regs->cs.  But it should be
> fine because set_linear_ip() also updates the CS according to the RIP
> provided by IBS.
> 
> Cc: Ravi Bangoria <ravi.bangoria@amd.com>
> Cc: Stephane Eranian <eranian@google.com>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  arch/x86/events/amd/ibs.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
> index e91970b01d62..e40e2255239a 100644
> --- a/arch/x86/events/amd/ibs.c
> +++ b/arch/x86/events/amd/ibs.c
> @@ -290,6 +290,11 @@ static int perf_ibs_init(struct perf_event *event)
>  	if (has_branch_stack(event))
>  		return -EOPNOTSUPP;
>  
> +	/* handle exclude_{user,kernel} in the IRQ handler */
> +	if (event->attr.exclude_hv || event->attr.exclude_idle ||
> +	    event->attr.exclude_host || event->attr.exclude_guest)
> +		return -EINVAL;
> +
>  	ret = validate_group(event);
>  	if (ret)
>  		return ret;
> @@ -667,7 +672,6 @@ static struct perf_ibs perf_ibs_fetch = {
>  		.start		= perf_ibs_start,
>  		.stop		= perf_ibs_stop,
>  		.read		= perf_ibs_read,
> -		.capabilities	= PERF_PMU_CAP_NO_EXCLUDE,
>  	},
>  	.msr			= MSR_AMD64_IBSFETCHCTL,
>  	.config_mask		= IBS_FETCH_CONFIG_MASK,
> @@ -691,7 +695,6 @@ static struct perf_ibs perf_ibs_op = {
>  		.start		= perf_ibs_start,
>  		.stop		= perf_ibs_stop,
>  		.read		= perf_ibs_read,
> -		.capabilities	= PERF_PMU_CAP_NO_EXCLUDE,
>  	},
>  	.msr			= MSR_AMD64_IBSOPCTL,
>  	.config_mask		= IBS_OP_CONFIG_MASK,
> @@ -1111,6 +1114,12 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
>  		regs.flags |= PERF_EFLAGS_EXACT;
>  	}
>  
> +	if (perf_exclude_event(event, &regs)) {
> +		throttle = perf_event_account_interrupt(event);
> +		atomic64_inc(&event->dropped_samples);
> +		goto out;
> +	}
> +
>  	if (event->attr.sample_type & PERF_SAMPLE_RAW) {
>  		raw = (struct perf_raw_record){
>  			.frag = {
> -- 
> 2.46.0.469.g59c65b2a67-goog
>
Re: [RFC/PATCH 4/4] perf/x86: Relax privilege filter restriction on AMD IBS
Posted by Namhyung Kim 1 year, 3 months ago
On Mon, Sep 2, 2024 at 2:12 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Aug 30, 2024 at 04:29:10PM -0700, Namhyung Kim wrote:
> > While IBS is available for per-thread profiling, still regular users
> > cannot open an event due to the default paranoid setting (2) which
> > doesn't allow unprivileged users to get kernel samples.  That means
> > it needs to set exclude_kernel bit in the attribute but IBS driver
> > would reject it since it has PERF_PMU_CAP_NO_EXCLUDE.  This is not what
> > we want and I've been getting requests to fix this issue.
> >
> > This should be done in the hardware, but until we get the HW fix we may
> > allow exclude_{kernel,user} in the attribute and silently drop the
> > samples in the PMU IRQ handler.  It won't guarantee the sampling
> > frequency or even it'd miss some with fixed period too.  Not ideal,
> > but that'd still be helpful to regular users.
>
> Urgh.... this is really rather bad. And I'm sure a bunch of people are
> going to be spending a lot of time trying to figure out why their
> results don't make sense.

I agree it can be confusing but there are use cases where regular users
want IBS information like memory data source, data address and so on.
Also I realized that software events like cpu-clock use the same logic to
discard samples by privilege mode already.

>
> I realize that having entry hooks to disable/enable the counters is also
> not going to happen, this has a ton of problems too.

Do you mean kernel/user mode change hook?  I guess it'd be too costly.

>
> Also, that PMU passthrough patch set has guest hooks, so you can
> actually do the exclude_host/guest nonsense with those, right?

Oh.. this patch is about exclude_user/kernel not host/guest.  Anyway
it'd be great if IBS could support the guest hooks and allow the exclude
bits.

Thanks,
Namhyung

>
> > This uses perf_exclude_event() which checks regs->cs.  But it should be
> > fine because set_linear_ip() also updates the CS according to the RIP
> > provided by IBS.
> >
> > Cc: Ravi Bangoria <ravi.bangoria@amd.com>
> > Cc: Stephane Eranian <eranian@google.com>
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  arch/x86/events/amd/ibs.c | 13 +++++++++++--
> >  1 file changed, 11 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
> > index e91970b01d62..e40e2255239a 100644
> > --- a/arch/x86/events/amd/ibs.c
> > +++ b/arch/x86/events/amd/ibs.c
> > @@ -290,6 +290,11 @@ static int perf_ibs_init(struct perf_event *event)
> >       if (has_branch_stack(event))
> >               return -EOPNOTSUPP;
> >
> > +     /* handle exclude_{user,kernel} in the IRQ handler */
> > +     if (event->attr.exclude_hv || event->attr.exclude_idle ||
> > +         event->attr.exclude_host || event->attr.exclude_guest)
> > +             return -EINVAL;
> > +
> >       ret = validate_group(event);
> >       if (ret)
> >               return ret;
> > @@ -667,7 +672,6 @@ static struct perf_ibs perf_ibs_fetch = {
> >               .start          = perf_ibs_start,
> >               .stop           = perf_ibs_stop,
> >               .read           = perf_ibs_read,
> > -             .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
> >       },
> >       .msr                    = MSR_AMD64_IBSFETCHCTL,
> >       .config_mask            = IBS_FETCH_CONFIG_MASK,
> > @@ -691,7 +695,6 @@ static struct perf_ibs perf_ibs_op = {
> >               .start          = perf_ibs_start,
> >               .stop           = perf_ibs_stop,
> >               .read           = perf_ibs_read,
> > -             .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
> >       },
> >       .msr                    = MSR_AMD64_IBSOPCTL,
> >       .config_mask            = IBS_OP_CONFIG_MASK,
> > @@ -1111,6 +1114,12 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
> >               regs.flags |= PERF_EFLAGS_EXACT;
> >       }
> >
> > +     if (perf_exclude_event(event, &regs)) {
> > +             throttle = perf_event_account_interrupt(event);
> > +             atomic64_inc(&event->dropped_samples);
> > +             goto out;
> > +     }
> > +
> >       if (event->attr.sample_type & PERF_SAMPLE_RAW) {
> >               raw = (struct perf_raw_record){
> >                       .frag = {
> > --
> > 2.46.0.469.g59c65b2a67-goog
> >
Re: [RFC/PATCH 4/4] perf/x86: Relax privilege filter restriction on AMD IBS
Posted by Peter Zijlstra 1 year, 3 months ago
On Mon, Sep 02, 2024 at 10:30:19AM -0700, Namhyung Kim wrote:
> On Mon, Sep 2, 2024 at 2:12 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Fri, Aug 30, 2024 at 04:29:10PM -0700, Namhyung Kim wrote:
> > > While IBS is available for per-thread profiling, still regular users
> > > cannot open an event due to the default paranoid setting (2) which
> > > doesn't allow unprivileged users to get kernel samples.  That means
> > > it needs to set exclude_kernel bit in the attribute but IBS driver
> > > would reject it since it has PERF_PMU_CAP_NO_EXCLUDE.  This is not what
> > > we want and I've been getting requests to fix this issue.
> > >
> > > This should be done in the hardware, but until we get the HW fix we may
> > > allow exclude_{kernel,user} in the attribute and silently drop the
> > > samples in the PMU IRQ handler.  It won't guarantee the sampling
> > > frequency or even it'd miss some with fixed period too.  Not ideal,
> > > but that'd still be helpful to regular users.
> >
> > Urgh.... this is really rather bad. And I'm sure a bunch of people are
> > going to be spending a lot of time trying to figure out why their
> > results don't make sense.
> 
> I agree it can be confusing but there are use cases where regular users
> want IBS information like memory data source, data address and so on.

Sure, but I'm a bit worried about users not being aware of this
trickery. This makes IBS events that have exclude_kernel=1 behave
significantly different from those that don't have it.

At the very least you should kill the IBS forward in amd_pmu_hw_config()
when this is active. But perhaps we should also emit a one time
KERN_INFO message when such an event gets created?

> Also I realized that software events like cpu-clock use the same logic to
> discard samples by privilege mode already.

Right, but everybody expects the software things to suck :-) And they
always suck, unconditionally.

While the IBS thing only sucks when you use exclude_[user,kernel]
things. Stealth suckage is bad and enrages people.

> > I realize that having entry hooks to disable/enable the counters is also
> > not going to happen, this has a ton of problems too.
> 
> Do you mean kernel/user mode change hook?  I guess it'd be too costly.

Yep, insanely expensive :-)

> > Also, that PMU passthrough patch set has guest hooks, so you can
> > actually do the exclude_host/guest nonsense with those, right?
> 
> Oh.. this patch is about exclude_user/kernel not host/guest.  Anyway
> it'd be great if IBS could support the guest hooks and allow the exclude
> bits.

Right, but since your other patchset about disabling exclude_guest
because IBS don't support it I figured I'd mention that it would be
fairly simple to fix.