[RFC/PATCH] perf/x86: Relax privilege filter restriction on AMD IBS

Namhyung Kim posted 1 patch 1 year, 5 months ago
arch/x86/events/amd/ibs.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
[RFC/PATCH] perf/x86: Relax privilege filter restriction on AMD IBS
Posted by Namhyung Kim 1 year, 5 months ago
While IBS is available for per-thread profiling, still regular users
cannot open an event due to the default paranoid setting (2) which
doesn't allow unprivileged users to get kernel samples.  That means
it needs to set exclude_kernel bit in the attribute but IBS driver
would reject it since it has PERF_PMU_CAP_NO_EXCLUDE.  This is not what
we want and I've been getting requests to fix this issue.

This should be done in the hardware, but until we get the fix we may
allow exclude_{kernel,user} in the attribute and silently drop the
samples in the PMU IRQ handler.  It won't guarantee the sampling
frequency or even it'd miss some with fixed period too.  Not ideal,
but that'd still be helpful to regular users.

I also think that it should be able to inform the number of dropped
samples to userspace so I've increased the lost_samples count.  This
can be read with the PERF_FORMAT_LOST (perf tools set it by default)
so I didn't emit the PERF_RECORD_LOST_SAMPLES for this.

I'm not sure if it's acceptable since it might be confusing to figure
out whether it's because of a lack of the ring buffer or it's dropped
due to privileges.  Or we can add a new record format for this.  Let me
know if you have a better idea.

Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 arch/x86/events/amd/ibs.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index e91970b01d62..6f514072c440 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -290,6 +290,11 @@ static int perf_ibs_init(struct perf_event *event)
 	if (has_branch_stack(event))
 		return -EOPNOTSUPP;
 
+	/* handle exclude_{user,kernel} in the IRQ handler */
+	if (event->attr.exclude_hv || event->attr.exclude_idle ||
+	    event->attr.exclude_host || event->attr.exclude_guest)
+		return -EINVAL;
+
 	ret = validate_group(event);
 	if (ret)
 		return ret;
@@ -667,7 +672,6 @@ static struct perf_ibs perf_ibs_fetch = {
 		.start		= perf_ibs_start,
 		.stop		= perf_ibs_stop,
 		.read		= perf_ibs_read,
-		.capabilities	= PERF_PMU_CAP_NO_EXCLUDE,
 	},
 	.msr			= MSR_AMD64_IBSFETCHCTL,
 	.config_mask		= IBS_FETCH_CONFIG_MASK,
@@ -691,7 +695,6 @@ static struct perf_ibs perf_ibs_op = {
 		.start		= perf_ibs_start,
 		.stop		= perf_ibs_stop,
 		.read		= perf_ibs_read,
-		.capabilities	= PERF_PMU_CAP_NO_EXCLUDE,
 	},
 	.msr			= MSR_AMD64_IBSOPCTL,
 	.config_mask		= IBS_OP_CONFIG_MASK,
@@ -1062,6 +1065,13 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 	if (!perf_ibs_set_period(perf_ibs, hwc, &period))
 		goto out;	/* no sw counter overflow */
 
+	if ((event->attr.exclude_kernel && !user_mode(iregs)) ||
+	    (event->attr.exclude_user && user_mode(iregs))) {
+		throttle = perf_event_account_interrupt(event);
+		atomic64_inc(&event->lost_samples);
+		goto out;
+	}
+
 	ibs_data.caps = ibs_caps;
 	size = 1;
 	offset = 1;
-- 
2.46.0.295.g3b9ea8a38a-goog
Re: [RFC/PATCH] perf/x86: Relax privilege filter restriction on AMD IBS
Posted by Ravi Bangoria 1 year, 5 months ago
Hi Namhyung,

On 23-Aug-24 4:38 AM, Namhyung Kim wrote:
> While IBS is available for per-thread profiling, still regular users
> cannot open an event due to the default paranoid setting (2) which
> doesn't allow unprivileged users to get kernel samples.  That means
> it needs to set exclude_kernel bit in the attribute but IBS driver
> would reject it since it has PERF_PMU_CAP_NO_EXCLUDE.  This is not what
> we want and I've been getting requests to fix this issue.
> 
> This should be done in the hardware, but until we get the fix we may
> allow exclude_{kernel,user} in the attribute and silently drop the
> samples in the PMU IRQ handler.  It won't guarantee the sampling
> frequency or even it'd miss some with fixed period too.  Not ideal,
> but that'd still be helpful to regular users.
> 
> I also think that it should be able to inform the number of dropped
> samples to userspace so I've increased the lost_samples count.  This
> can be read with the PERF_FORMAT_LOST (perf tools set it by default)
> so I didn't emit the PERF_RECORD_LOST_SAMPLES for this.

Samples are discarded not lost. Should we count them as lost?

> I'm not sure if it's acceptable since it might be confusing to figure
> out whether it's because of a lack of the ring buffer or it's dropped
> due to privileges.  Or we can add a new record format for this.  Let me
> know if you have a better idea.
> 
> Cc: Ravi Bangoria <ravi.bangoria@amd.com>
> Cc: Stephane Eranian <eranian@google.com>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  arch/x86/events/amd/ibs.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
> index e91970b01d62..6f514072c440 100644
> --- a/arch/x86/events/amd/ibs.c
> +++ b/arch/x86/events/amd/ibs.c
> @@ -290,6 +290,11 @@ static int perf_ibs_init(struct perf_event *event)
>  	if (has_branch_stack(event))
>  		return -EOPNOTSUPP;
>  
> +	/* handle exclude_{user,kernel} in the IRQ handler */
> +	if (event->attr.exclude_hv || event->attr.exclude_idle ||
> +	    event->attr.exclude_host || event->attr.exclude_guest)
> +		return -EINVAL;
> +
>  	ret = validate_group(event);
>  	if (ret)
>  		return ret;
> @@ -667,7 +672,6 @@ static struct perf_ibs perf_ibs_fetch = {
>  		.start		= perf_ibs_start,
>  		.stop		= perf_ibs_stop,
>  		.read		= perf_ibs_read,
> -		.capabilities	= PERF_PMU_CAP_NO_EXCLUDE,
>  	},
>  	.msr			= MSR_AMD64_IBSFETCHCTL,
>  	.config_mask		= IBS_FETCH_CONFIG_MASK,
> @@ -691,7 +695,6 @@ static struct perf_ibs perf_ibs_op = {
>  		.start		= perf_ibs_start,
>  		.stop		= perf_ibs_stop,
>  		.read		= perf_ibs_read,
> -		.capabilities	= PERF_PMU_CAP_NO_EXCLUDE,
>  	},
>  	.msr			= MSR_AMD64_IBSOPCTL,
>  	.config_mask		= IBS_OP_CONFIG_MASK,
> @@ -1062,6 +1065,13 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
>  	if (!perf_ibs_set_period(perf_ibs, hwc, &period))
>  		goto out;	/* no sw counter overflow */
>  
> +	if ((event->attr.exclude_kernel && !user_mode(iregs)) ||
> +	    (event->attr.exclude_user && user_mode(iregs))) {

Should we use kernel_ip() instead? That would be accurate since RIP is
provided by IBS hw itself.

user_mode() relies on CS which is captured at interrupt time, not at the
sample capture time, and processor might have switched privilege mode by
the time IBS interrupt arrives. We might need to fallback to user_mode()
if ibs_op_data->op_rip_invalid is set.

Wondering, should perf_misc_flags() also switch to kernel_ip() ?

> +		throttle = perf_event_account_interrupt(event);
> +		atomic64_inc(&event->lost_samples);
> +		goto out;
> +	}
> +
>  	ibs_data.caps = ibs_caps;
>  	size = 1;
>  	offset = 1;

Thanks,
Ravi
Re: [RFC/PATCH] perf/x86: Relax privilege filter restriction on AMD IBS
Posted by Namhyung Kim 1 year, 5 months ago
Hi Ravi,

Thanks for your review!

On Mon, Aug 26, 2024 at 9:42 PM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>
> Hi Namhyung,
>
> On 23-Aug-24 4:38 AM, Namhyung Kim wrote:
> > While IBS is available for per-thread profiling, still regular users
> > cannot open an event due to the default paranoid setting (2) which
> > doesn't allow unprivileged users to get kernel samples.  That means
> > it needs to set exclude_kernel bit in the attribute but IBS driver
> > would reject it since it has PERF_PMU_CAP_NO_EXCLUDE.  This is not what
> > we want and I've been getting requests to fix this issue.
> >
> > This should be done in the hardware, but until we get the fix we may
> > allow exclude_{kernel,user} in the attribute and silently drop the
> > samples in the PMU IRQ handler.  It won't guarantee the sampling
> > frequency or even it'd miss some with fixed period too.  Not ideal,
> > but that'd still be helpful to regular users.
> >
> > I also think that it should be able to inform the number of dropped
> > samples to userspace so I've increased the lost_samples count.  This
> > can be read with the PERF_FORMAT_LOST (perf tools set it by default)
> > so I didn't emit the PERF_RECORD_LOST_SAMPLES for this.
>
> Samples are discarded not lost. Should we count them as lost?

Yeah, I'm asking for help on how to handle them properly.  We could:

1. ignore dropped samples
2. count them as lost
3. count them separately and emit a (new) record in the ring buffer
4. count them separately and let users can read(2) them with a new format
5. anything else?

>
> > I'm not sure if it's acceptable since it might be confusing to figure
> > out whether it's because of a lack of the ring buffer or it's dropped
> > due to privileges.  Or we can add a new record format for this.  Let me
> > know if you have a better idea.
> >
> > Cc: Ravi Bangoria <ravi.bangoria@amd.com>
> > Cc: Stephane Eranian <eranian@google.com>
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  arch/x86/events/amd/ibs.c | 14 ++++++++++++--
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
> > index e91970b01d62..6f514072c440 100644
> > --- a/arch/x86/events/amd/ibs.c
> > +++ b/arch/x86/events/amd/ibs.c
> > @@ -290,6 +290,11 @@ static int perf_ibs_init(struct perf_event *event)
> >       if (has_branch_stack(event))
> >               return -EOPNOTSUPP;
> >
> > +     /* handle exclude_{user,kernel} in the IRQ handler */
> > +     if (event->attr.exclude_hv || event->attr.exclude_idle ||
> > +         event->attr.exclude_host || event->attr.exclude_guest)
> > +             return -EINVAL;
> > +
> >       ret = validate_group(event);
> >       if (ret)
> >               return ret;
> > @@ -667,7 +672,6 @@ static struct perf_ibs perf_ibs_fetch = {
> >               .start          = perf_ibs_start,
> >               .stop           = perf_ibs_stop,
> >               .read           = perf_ibs_read,
> > -             .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
> >       },
> >       .msr                    = MSR_AMD64_IBSFETCHCTL,
> >       .config_mask            = IBS_FETCH_CONFIG_MASK,
> > @@ -691,7 +695,6 @@ static struct perf_ibs perf_ibs_op = {
> >               .start          = perf_ibs_start,
> >               .stop           = perf_ibs_stop,
> >               .read           = perf_ibs_read,
> > -             .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
> >       },
> >       .msr                    = MSR_AMD64_IBSOPCTL,
> >       .config_mask            = IBS_OP_CONFIG_MASK,
> > @@ -1062,6 +1065,13 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
> >       if (!perf_ibs_set_period(perf_ibs, hwc, &period))
> >               goto out;       /* no sw counter overflow */
> >
> > +     if ((event->attr.exclude_kernel && !user_mode(iregs)) ||
> > +         (event->attr.exclude_user && user_mode(iregs))) {
>
> Should we use kernel_ip() instead? That would be accurate since RIP is
> provided by IBS hw itself.
>
> user_mode() relies on CS which is captured at interrupt time, not at the
> sample capture time, and processor might have switched privilege mode by
> the time IBS interrupt arrives. We might need to fallback to user_mode()
> if ibs_op_data->op_rip_invalid is set.

Sure.  I'll update it in v2.

>
> Wondering, should perf_misc_flags() also switch to kernel_ip() ?

Probably.

Thanks,
Namhyung

>
> > +             throttle = perf_event_account_interrupt(event);
> > +             atomic64_inc(&event->lost_samples);
> > +             goto out;
> > +     }
> > +
> >       ibs_data.caps = ibs_caps;
> >       size = 1;
> >       offset = 1;
>
> Thanks,
> Ravi