While IBS is available for per-thread profiling, still regular users
cannot open an event due to the default paranoid setting (2) which
doesn't allow unprivileged users to get kernel samples. That means
it needs to set exclude_kernel bit in the attribute but IBS driver
would reject it since it has PERF_PMU_CAP_NO_EXCLUDE. This is not what
we want and I've been getting requests to fix this issue.
This should be done in the hardware, but until we get the HW fix we may
allow exclude_{kernel,user} in the attribute and silently drop the
samples in the PMU IRQ handler. It won't guarantee the sampling
frequency or even it'd miss some with fixed period too. Not ideal,
but that'd still be helpful to regular users.
To minimize the confusion, let's add 'swfilt' bit to attr.config2 which
is exposed in the sysfs format directory so that users can figure out
if the kernel support the privilege filters by software.
$ perf record -e ibs_op/swfilt=1/uh true
This uses perf_exclude_event() which checks regs->cs. But it should be
fine because set_linear_ip() also updates the CS according to the RIP
provided by IBS.
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
arch/x86/events/amd/ibs.c | 50 ++++++++++++++++++++++++++++-----------
1 file changed, 36 insertions(+), 14 deletions(-)
diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index e91970b01d6243e4..7fea9527971a7aae 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -31,6 +31,8 @@ static u32 ibs_caps;
#define IBS_FETCH_CONFIG_MASK (IBS_FETCH_RAND_EN | IBS_FETCH_MAX_CNT)
#define IBS_OP_CONFIG_MASK IBS_OP_MAX_CNT
+/* attr.config2 */
+#define IBS_SW_FILTER_MASK 1
/*
* IBS states:
@@ -290,6 +292,15 @@ static int perf_ibs_init(struct perf_event *event)
if (has_branch_stack(event))
return -EOPNOTSUPP;
+ /* handle exclude_{user,kernel} in the IRQ handler */
+ if (event->attr.exclude_hv || event->attr.exclude_idle ||
+ event->attr.exclude_host || event->attr.exclude_guest)
+ return -EINVAL;
+
+ if (!(event->attr.config2 & IBS_SW_FILTER_MASK) &&
+ (event->attr.exclude_kernel || event->attr.exclude_user))
+ return -EINVAL;
+
ret = validate_group(event);
if (ret)
return ret;
@@ -550,24 +561,14 @@ static struct attribute *attrs_empty[] = {
NULL,
};
-static struct attribute_group empty_format_group = {
- .name = "format",
- .attrs = attrs_empty,
-};
-
static struct attribute_group empty_caps_group = {
.name = "caps",
.attrs = attrs_empty,
};
-static const struct attribute_group *empty_attr_groups[] = {
- &empty_format_group,
- &empty_caps_group,
- NULL,
-};
-
PMU_FORMAT_ATTR(rand_en, "config:57");
PMU_FORMAT_ATTR(cnt_ctl, "config:19");
+PMU_FORMAT_ATTR(swfilt, "config2:0");
PMU_EVENT_ATTR_STRING(l3missonly, fetch_l3missonly, "config:59");
PMU_EVENT_ATTR_STRING(l3missonly, op_l3missonly, "config:16");
PMU_EVENT_ATTR_STRING(zen4_ibs_extensions, zen4_ibs_extensions, "1");
@@ -583,6 +584,11 @@ static struct attribute *rand_en_attrs[] = {
NULL,
};
+static struct attribute *swfilt_attrs[] = {
+ &format_attr_swfilt.attr,
+ NULL,
+};
+
static struct attribute *fetch_l3missonly_attrs[] = {
&fetch_l3missonly.attr.attr,
NULL,
@@ -598,6 +604,11 @@ static struct attribute_group group_rand_en = {
.attrs = rand_en_attrs,
};
+static struct attribute_group group_swfilt = {
+ .name = "format",
+ .attrs = swfilt_attrs,
+};
+
static struct attribute_group group_fetch_l3missonly = {
.name = "format",
.attrs = fetch_l3missonly_attrs,
@@ -612,6 +623,7 @@ static struct attribute_group group_zen4_ibs_extensions = {
static const struct attribute_group *fetch_attr_groups[] = {
&group_rand_en,
+ &group_swfilt,
&empty_caps_group,
NULL,
};
@@ -650,6 +662,12 @@ static struct attribute_group group_op_l3missonly = {
.is_visible = zen4_ibs_extensions_is_visible,
};
+static const struct attribute_group *op_attr_groups[] = {
+ &group_swfilt,
+ &empty_caps_group,
+ NULL,
+};
+
static const struct attribute_group *op_attr_update[] = {
&group_cnt_ctl,
&group_op_l3missonly,
@@ -667,7 +685,6 @@ static struct perf_ibs perf_ibs_fetch = {
.start = perf_ibs_start,
.stop = perf_ibs_stop,
.read = perf_ibs_read,
- .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
},
.msr = MSR_AMD64_IBSFETCHCTL,
.config_mask = IBS_FETCH_CONFIG_MASK,
@@ -691,7 +708,6 @@ static struct perf_ibs perf_ibs_op = {
.start = perf_ibs_start,
.stop = perf_ibs_stop,
.read = perf_ibs_read,
- .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
},
.msr = MSR_AMD64_IBSOPCTL,
.config_mask = IBS_OP_CONFIG_MASK,
@@ -1111,6 +1127,12 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
regs.flags |= PERF_EFLAGS_EXACT;
}
+ if ((event->attr.config2 & IBS_SW_FILTER_MASK) &&
+ perf_exclude_event(event, ®s)) {
+ throttle = perf_event_account_interrupt(event);
+ goto out;
+ }
+
if (event->attr.sample_type & PERF_SAMPLE_RAW) {
raw = (struct perf_raw_record){
.frag = {
@@ -1228,7 +1250,7 @@ static __init int perf_ibs_op_init(void)
if (ibs_caps & IBS_CAPS_ZEN4)
perf_ibs_op.config_mask |= IBS_OP_L3MISSONLY;
- perf_ibs_op.pmu.attr_groups = empty_attr_groups;
+ perf_ibs_op.pmu.attr_groups = op_attr_groups;
perf_ibs_op.pmu.attr_update = op_attr_update;
return perf_ibs_pmu_init(&perf_ibs_op, "ibs_op");
--
2.46.0.469.g59c65b2a67-goog
Hi Namhyung,
On 05-Sep-24 8:40 AM, Namhyung Kim wrote:
> While IBS is available for per-thread profiling, still regular users
> cannot open an event due to the default paranoid setting (2) which
> doesn't allow unprivileged users to get kernel samples. That means
> it needs to set exclude_kernel bit in the attribute but IBS driver
> would reject it since it has PERF_PMU_CAP_NO_EXCLUDE. This is not what
> we want and I've been getting requests to fix this issue.
>
> This should be done in the hardware, but until we get the HW fix we may
> allow exclude_{kernel,user} in the attribute and silently drop the
> samples in the PMU IRQ handler. It won't guarantee the sampling
> frequency or even it'd miss some with fixed period too. Not ideal,
> but that'd still be helpful to regular users.
>
> To minimize the confusion, let's add 'swfilt' bit to attr.config2 which
> is exposed in the sysfs format directory so that users can figure out
> if the kernel support the privilege filters by software.
>
> $ perf record -e ibs_op/swfilt=1/uh true
Shall we add an example in tools/perf/Documentation/perf-amd-ibs.txt?
> +static struct attribute *swfilt_attrs[] = {
> + &format_attr_swfilt.attr,
> + NULL,
> +};
> +
> static struct attribute *fetch_l3missonly_attrs[] = {
> &fetch_l3missonly.attr.attr,
> NULL,
> @@ -598,6 +604,11 @@ static struct attribute_group group_rand_en = {
> .attrs = rand_en_attrs,
> };
>
> +static struct attribute_group group_swfilt = {
> + .name = "format",
> + .attrs = swfilt_attrs,
> +};
> +
> static struct attribute_group group_fetch_l3missonly = {
> .name = "format",
> .attrs = fetch_l3missonly_attrs,
> @@ -612,6 +623,7 @@ static struct attribute_group group_zen4_ibs_extensions = {
>
> static const struct attribute_group *fetch_attr_groups[] = {
> &group_rand_en,
> + &group_swfilt,
> &empty_caps_group,
> NULL,
> };
Causes:
# dmesg
sysfs: cannot create duplicate filename '/devices/ibs_fetch/format'
Failed to register pmu: ibs_fetch, reason -17
Rename rand_en_attrs[] to fetch_attrs[], add &format_attr_swfilt.attr
to it and remove &group_swfilt from fetch_attr_groups[]. And I guess
it should work.
Thanks,
Ravi
On Tue, Oct 15, 2024 at 07:06:24PM +0530, Ravi Bangoria wrote:
> Hi Namhyung,
>
> On 05-Sep-24 8:40 AM, Namhyung Kim wrote:
> > While IBS is available for per-thread profiling, still regular users
> > cannot open an event due to the default paranoid setting (2) which
> > doesn't allow unprivileged users to get kernel samples. That means
> > it needs to set exclude_kernel bit in the attribute but IBS driver
> > would reject it since it has PERF_PMU_CAP_NO_EXCLUDE. This is not what
> > we want and I've been getting requests to fix this issue.
> >
> > This should be done in the hardware, but until we get the HW fix we may
> > allow exclude_{kernel,user} in the attribute and silently drop the
> > samples in the PMU IRQ handler. It won't guarantee the sampling
> > frequency or even it'd miss some with fixed period too. Not ideal,
> > but that'd still be helpful to regular users.
> >
> > To minimize the confusion, let's add 'swfilt' bit to attr.config2 which
> > is exposed in the sysfs format directory so that users can figure out
> > if the kernel support the privilege filters by software.
> >
> > $ perf record -e ibs_op/swfilt=1/uh true
>
> Shall we add an example in tools/perf/Documentation/perf-amd-ibs.txt?
Yep, I'll update the document when I send the tooling changes.
>
>
> > +static struct attribute *swfilt_attrs[] = {
> > + &format_attr_swfilt.attr,
> > + NULL,
> > +};
> > +
> > static struct attribute *fetch_l3missonly_attrs[] = {
> > &fetch_l3missonly.attr.attr,
> > NULL,
> > @@ -598,6 +604,11 @@ static struct attribute_group group_rand_en = {
> > .attrs = rand_en_attrs,
> > };
> >
> > +static struct attribute_group group_swfilt = {
> > + .name = "format",
> > + .attrs = swfilt_attrs,
> > +};
> > +
> > static struct attribute_group group_fetch_l3missonly = {
> > .name = "format",
> > .attrs = fetch_l3missonly_attrs,
> > @@ -612,6 +623,7 @@ static struct attribute_group group_zen4_ibs_extensions = {
> >
> > static const struct attribute_group *fetch_attr_groups[] = {
> > &group_rand_en,
> > + &group_swfilt,
> > &empty_caps_group,
> > NULL,
> > };
>
> Causes:
>
> # dmesg
> sysfs: cannot create duplicate filename '/devices/ibs_fetch/format'
> Failed to register pmu: ibs_fetch, reason -17
>
> Rename rand_en_attrs[] to fetch_attrs[], add &format_attr_swfilt.attr
> to it and remove &group_swfilt from fetch_attr_groups[]. And I guess
> it should work.
Thanks for the review, I'll update the code as you suggested.
Namhyung
Hi Namhyung,
> While IBS is available for per-thread profiling, still regular users
> cannot open an event due to the default paranoid setting (2) which
> doesn't allow unprivileged users to get kernel samples. That means
> it needs to set exclude_kernel bit in the attribute but IBS driver
> would reject it since it has PERF_PMU_CAP_NO_EXCLUDE. This is not what
> we want and I've been getting requests to fix this issue.
I'm working on some IBS improvements that impacts this change as well.
Is it be possible to hold off this patch for some time. I'll try to
post my patches soon.
> @@ -1111,6 +1127,12 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
> regs.flags |= PERF_EFLAGS_EXACT;
> }
>
> + if ((event->attr.config2 & IBS_SW_FILTER_MASK) &&
> + perf_exclude_event(event, ®s)) {
> + throttle = perf_event_account_interrupt(event);
> + goto out;
> + }
Throttling can give surprises when the sample period is very small.
For ex,
$ ./perf record -e cycles:uh -c 192 -- make
[ perf record: Woken up 52 times to write data ]
[ perf record: Captured and wrote 23.016 MB perf.data (705634 samples) ]
$ ./perf record -e ibs_op/swfilt=1/uh -c 192 -- make
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.608 MB perf.data (19 samples) ]
It seems like the IBS event gets throttled (and disabled) before the
cpu get a chance to go back to userspace), hence we end up with very
few samples.
Thanks,
Ravi
Hello,
On Mon, Sep 23, 2024 at 04:03:47PM +0530, Ravi Bangoria wrote:
> Hi Namhyung,
>
> > While IBS is available for per-thread profiling, still regular users
> > cannot open an event due to the default paranoid setting (2) which
> > doesn't allow unprivileged users to get kernel samples. That means
> > it needs to set exclude_kernel bit in the attribute but IBS driver
> > would reject it since it has PERF_PMU_CAP_NO_EXCLUDE. This is not what
> > we want and I've been getting requests to fix this issue.
>
> I'm working on some IBS improvements that impacts this change as well.
> Is it be possible to hold off this patch for some time. I'll try to
> post my patches soon.
>
> > @@ -1111,6 +1127,12 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
> > regs.flags |= PERF_EFLAGS_EXACT;
> > }
> >
> > + if ((event->attr.config2 & IBS_SW_FILTER_MASK) &&
> > + perf_exclude_event(event, ®s)) {
> > + throttle = perf_event_account_interrupt(event);
> > + goto out;
> > + }
>
> Throttling can give surprises when the sample period is very small.
> For ex,
>
> $ ./perf record -e cycles:uh -c 192 -- make
> [ perf record: Woken up 52 times to write data ]
> [ perf record: Captured and wrote 23.016 MB perf.data (705634 samples) ]
>
> $ ./perf record -e ibs_op/swfilt=1/uh -c 192 -- make
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 1.608 MB perf.data (19 samples) ]
>
> It seems like the IBS event gets throttled (and disabled) before the
> cpu get a chance to go back to userspace), hence we end up with very
> few samples.
Thanks for raising this issue. This indeed looks like a surprising
result. Not sure what we can do here other than adding a documentation
to refrain from using such a small period. I don't think we want to
skip the throttling logic for the filtered samples. Otherwise it can be
used for DoS-like attacks IMHO.
Thanks,
Namhyung
© 2016 - 2025 Red Hat, Inc.