Assign/unassign counters on resctrl group creation/deletion. Two counters
are required per group, one for MBM total event and one for MBM local
event.

There are a limited number of counters available for assignment. If these
counters are exhausted, the kernel will display the error message: "Out of
MBM assignable counters". However, it is not necessary to fail the
creation of a group due to assignment failures. Users have the flexibility
to modify the assignments at a later time.
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v11: Moved mbm_cntr_reset() to monitor.c.
Added code to reset non-architectural state in mbm_cntr_reset().
Added missing rdtgroup_unassign_cntrs() calls on failure path.
v10: Assigned the counter before exposing the event files.
Moved the call rdtgroup_assign_cntrs() inside mkdir_rdt_prepare_rmid_alloc().
This is called for both CTRL_MON and MON group creation.
Call mbm_cntr_reset() when unmounted to clear all the assignments.
Taken care of a few other feedback comments.
v9: Changed rdtgroup_assign_cntrs() and rdtgroup_unassign_cntrs() to return void.
Updated a couple of rdtgroup_unassign_cntrs() calls properly.
Updated function comments.
v8: Renamed rdtgroup_assign_grp to rdtgroup_assign_cntrs.
Renamed rdtgroup_unassign_grp to rdtgroup_unassign_cntrs.
Fixed the problem with unassigning the child MON groups of CTRL_MON group.
v7: Reworded the commit message.
Removed the reference of ABMC with mbm_cntr_assign.
Renamed the function rdtgroup_assign_cntrs to rdtgroup_assign_grp.
v6: Removed the redundant comments on all the calls of
rdtgroup_assign_cntrs. Updated the commit message.
Dropped printing error message on every call of rdtgroup_assign_cntrs.
v5: Removed the code to enable/disable ABMC during the mount.
That will be another patch.
Added arch callers to get the arch specific data.
Renamed functions to match the other abmc function.
Added code comments for assignment failures.
v4: Few name changes based on the upstream discussion.
Commit message update.
v3: This is a new patch. Patch addresses the upstream comment to enable
ABMC feature by default if the feature is available.
---
arch/x86/kernel/cpu/resctrl/internal.h | 1 +
arch/x86/kernel/cpu/resctrl/monitor.c | 27 +++++++++++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 63 +++++++++++++++++++++++++-
3 files changed, 89 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index a5b8eadc7f5c..c979abb3d3b0 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -731,4 +731,5 @@ int resctrl_assign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
int resctrl_unassign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d,
struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
+void mbm_cntr_reset(struct rdt_resource *r);
#endif /* _ASM_X86_RESCTRL_INTERNAL_H */
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index b6d188d0f9b7..118b39fbb01e 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1557,3 +1557,30 @@ int resctrl_unassign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d
return ret;
}
+
+void mbm_cntr_reset(struct rdt_resource *r)
+{
+ u32 idx_limit = resctrl_arch_system_num_rmid_idx();
+ struct rdt_mon_domain *dom;
+
+ /*
+ * Reset the domain counter configuration. Hardware counters
+ * will reset after switching the monitor mode. So, reset the
+ * architectural and non-architectural state so that reading
+ * of hardware counter is not considered as an overflow in the
+ * next update.
+ */
+ if (is_mbm_enabled() && r->mon.mbm_cntr_assignable) {
+ list_for_each_entry(dom, &r->mon_domains, hdr.list) {
+ memset(dom->cntr_cfg, 0,
+ sizeof(*dom->cntr_cfg) * r->mon.num_mbm_cntrs);
+ if (is_mbm_total_enabled())
+ memset(dom->mbm_total, 0,
+ sizeof(struct mbm_state) * idx_limit);
+ if (is_mbm_local_enabled())
+ memset(dom->mbm_local, 0,
+ sizeof(struct mbm_state) * idx_limit);
+ resctrl_arch_reset_rmid_all(r, dom);
+ }
+ }
+}
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 2b86124c336b..f61f0cd032ef 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2687,6 +2687,46 @@ static void schemata_list_destroy(void)
}
}
+/*
+ * Called when a new group is created. If "mbm_cntr_assign" mode is enabled,
+ * counters are automatically assigned. Each group can accommodate two counters:
+ * one for the total event and one for the local event. Assignments may fail
+ * due to the limited number of counters. However, it is not necessary to fail
+ * the group creation and thus no failure is returned. Users have the option
+ * to modify the counter assignments after the group has been created.
+ */
+static void rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
+{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+
+ if (!resctrl_arch_mbm_cntr_assign_enabled(r))
+ return;
+
+ if (is_mbm_total_enabled())
+ resctrl_assign_cntr_event(r, NULL, rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID);
+
+ if (is_mbm_local_enabled())
+ resctrl_assign_cntr_event(r, NULL, rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID);
+}
+
+/*
+ * Called when a group is deleted. Counters are unassigned if it was in
+ * assigned state.
+ */
+static void rdtgroup_unassign_cntrs(struct rdtgroup *rdtgrp)
+{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+
+ if (!resctrl_arch_mbm_cntr_assign_enabled(r))
+ return;
+
+ if (is_mbm_total_enabled())
+ resctrl_unassign_cntr_event(r, NULL, rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID);
+
+ if (is_mbm_local_enabled())
+ resctrl_unassign_cntr_event(r, NULL, rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID);
+}
+
static int rdt_get_tree(struct fs_context *fc)
{
struct rdt_fs_context *ctx = rdt_fc2context(fc);
@@ -2741,6 +2781,8 @@ static int rdt_get_tree(struct fs_context *fc)
if (ret < 0)
goto out_info;
+ rdtgroup_assign_cntrs(&rdtgroup_default);
+
ret = mkdir_mondata_all(rdtgroup_default.kn,
&rdtgroup_default, &kn_mondata);
if (ret < 0)
@@ -2779,8 +2821,10 @@ static int rdt_get_tree(struct fs_context *fc)
if (resctrl_arch_mon_capable())
kernfs_remove(kn_mondata);
out_mongrp:
- if (resctrl_arch_mon_capable())
+ if (resctrl_arch_mon_capable()) {
+ rdtgroup_unassign_cntrs(&rdtgroup_default);
kernfs_remove(kn_mongrp);
+ }
out_info:
kernfs_remove(kn_info);
out_schemata_free:
@@ -2956,6 +3000,7 @@ static void free_all_child_rdtgrp(struct rdtgroup *rdtgrp)
head = &rdtgrp->mon.crdtgrp_list;
list_for_each_entry_safe(sentry, stmp, head, mon.crdtgrp_list) {
+ rdtgroup_unassign_cntrs(sentry);
free_rmid(sentry->closid, sentry->mon.rmid);
list_del(&sentry->mon.crdtgrp_list);
@@ -2996,6 +3041,8 @@ static void rmdir_all_sub(void)
cpumask_or(&rdtgroup_default.cpu_mask,
&rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask);
+ rdtgroup_unassign_cntrs(rdtgrp);
+
free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
kernfs_remove(rdtgrp->kn);
@@ -3027,6 +3074,8 @@ static void rdt_kill_sb(struct super_block *sb)
for_each_alloc_capable_rdt_resource(r)
reset_all_ctrls(r);
rmdir_all_sub();
+ rdtgroup_unassign_cntrs(&rdtgroup_default);
+ mbm_cntr_reset(&rdt_resources_all[RDT_RESOURCE_L3].r_resctrl);
rdt_pseudo_lock_release();
rdtgroup_default.mode = RDT_MODE_SHAREABLE;
schemata_list_destroy();
@@ -3490,9 +3539,12 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
}
rdtgrp->mon.rmid = ret;
+ rdtgroup_assign_cntrs(rdtgrp);
+
ret = mkdir_mondata_all(rdtgrp->kn, rdtgrp, &rdtgrp->mon.mon_data_kn);
if (ret) {
rdt_last_cmd_puts("kernfs subdir error\n");
+ rdtgroup_unassign_cntrs(rdtgrp);
free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
return ret;
}
@@ -3502,8 +3554,10 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
static void mkdir_rdt_prepare_rmid_free(struct rdtgroup *rgrp)
{
- if (resctrl_arch_mon_capable())
+ if (resctrl_arch_mon_capable()) {
+ rdtgroup_unassign_cntrs(rgrp);
free_rmid(rgrp->closid, rgrp->mon.rmid);
+ }
}
static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
@@ -3764,6 +3818,9 @@ static int rdtgroup_rmdir_mon(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
update_closid_rmid(tmpmask, NULL);
rdtgrp->flags = RDT_DELETED;
+
+ rdtgroup_unassign_cntrs(rdtgrp);
+
free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
/*
@@ -3810,6 +3867,8 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
update_closid_rmid(tmpmask, NULL);
+ rdtgroup_unassign_cntrs(rdtgrp);
+
free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
closid_free(rdtgrp->closid);
--
2.34.1
Hi,

On Wed, Jan 22, 2025 at 02:20:25PM -0600, Babu Moger wrote:
> Assign/unassign counters on resctrl group creation/deletion. Two counters
> are required per group, one for MBM total event and one for MBM local
> event.
>
> There are a limited number of counters available for assignment. If these
> counters are exhausted, the kernel will display the error message: "Out of
> MBM assignable counters". However, it is not necessary to fail the
> creation of a group due to assignment failures. Users have the flexibility
> to modify the assignments at a later time.

If we are doing this, should turning mbm_cntr_assign mode on also
trigger auto-assignment for all extant monitoring groups?

Either way though, this auto-assignment feels like a potential nuisance
for userspace.

If the userspace use-case requires too many monitoring groups for the
available counters, then the kernel will auto-assign counters to a
random subset of groups which may or may not be the ones that userspace
wanted to monitor; then userspace must manually look for the assigned
counters and unassign some of them before they can be assigned where
userspace actually wanted them.

This is not impossible for userspace to cope with, but it feels
awkward.

Is there a way to inhibit auto-assignment?

Or could automatic assignments be considered somehow "weak", so that
new explicit assignments can steal automatically assigned counters
without the need to unassign them explicitly?

[...]

Cheers
---Dave
Hi Dave,

On Wed, Feb 19, 2025 at 2:41 PM Dave Martin <Dave.Martin@arm.com> wrote:
> [...]
> Is there a way to inhibit auto-assignment?
>
> Or could automatic assignments be considered somehow "weak", so that
> new explicit assignments can steal automatically assigned counters
> without the need to unassign them explicitly?

We had an incomplete discussion about this early on[1]. I guess I
didn't revisit it because I found it was trivial to add a flag that
inhibits the assignment behavior during mkdir and had moved on to
bigger issues.

If an agent creating directories isn't coordinated with the agent
managing counters, a series of creating and destroying a group could
prevent a monitor assignment from ever succeeding because it's not
possible to atomically discover the name of the new directory that
stole the previously-available counter and reassign it.

However, if the counter-manager can get all the counters assigned once
and only move them with atomic reassignments, it will become
impossible to snatch them with a mkdir.

-Peter

[1] https://lore.kernel.org/lkml/CALPaoCihfQ9VtLYzyHB9-PsQzXLc06BW8bzhBXwj9-i+Q8RVFQ@mail.gmail.com/
Hi Dave and Peter,

On 2/19/25 6:09 AM, Peter Newman wrote:
> Hi Dave,
>
> On Wed, Feb 19, 2025 at 2:41 PM Dave Martin <Dave.Martin@arm.com> wrote:
>> [...]
>> If we are doing this, should turning mbm_cntr_assign mode on also
>> trigger auto-assignment for all extant monitoring groups?
>>
>> Either way though, this auto-assignment feels like a potential nuisance
>> for userspace.

hmmm ... this auto-assignment was created with the goal to help userspace.
In mbm_cntr_assign mode the user will only see data when a counter is
assigned to an event. mbm_cntr_assign mode is selected as default on a
system that supports ABMC. Without auto assignment a user will thus see
different behavior when reading the monitoring events when the user
switches to a kernel with assignable counter support: before assignable
counter support events will have data, with assignable counter support
the events will not have data.

I understood that interfaces should not behave differently when user space
switches kernels and that is what the auto assignment aims to solve.

>> [...]
>> Is there a way to inhibit auto-assignment?
>>
>> Or could automatic assignments be considered somehow "weak", so that
>> new explicit assignments can steal automatically assigned counters
>> without the need to unassign them explicitly?
>
> We had an incomplete discussion about this early on[1]. I guess I
> didn't revisit it because I found it was trivial to add a flag that
> inhibits the assignment behavior during mkdir and had moved on to
> bigger issues.

Could you please remind me how a user will set this flag?

> If an agent creating directories isn't coordinated with the agent
> managing counters, a series of creating and destroying a group could
> prevent a monitor assignment from ever succeeding because it's not
> possible to atomically discover the name of the new directory that
> stole the previously-available counter and reassign it.
>
> However, if the counter-manager can get all the counters assigned once
> and only move them with atomic reassignments, it will become
> impossible to snatch them with a mkdir.

You have many points that makes auto-assignment not be ideal but I
remain concerned that not doing something like this will break
existing users who are not as familiar with resctrl internals.

Reinette
Hi Reinette,

On Wed, Feb 19, 2025 at 6:55 PM Reinette Chatre
<reinette.chatre@intel.com> wrote:
> [...]
> Could you please remind me how a user will set this flag?

Quoting my original suggestion[1]:

"info/L3_MON/mbm_assign_on_mkdir?

boolean (parsed with kstrtobool()), defaulting to true?"

After mount, any groups that got counters on creation would have to be
cleaned up, but at least that can be done with forward progress once
the flag is cleared.

I was able to live with that as long as there aren't users polling for
resctrl to be mounted and immediately creating groups. For us, a
single container manager service manages resctrl.

> You have many points that makes auto-assignment not be ideal but I
> remain concerned that not doing something like this will break
> existing users who are not as familiar with resctrl internals.

I agree auto-assignment should be the default. I just want an official
way to turn it off.

Thanks!
-Peter

[1] https://lore.kernel.org/lkml/CALPaoCiJ9ELXkij-zsAhxC1hx8UUR+KMPJH6i8c8AT6_mtXs+Q@mail.gmail.com/
On Thu, Feb 20, 2025 at 11:35:56AM +0100, Peter Newman wrote:
> Hi Reinette,
>
> [...]
>
> I agree auto-assignment should be the default. I just want an official
> way to turn it off.

+1

That's basically my position -- the auto-assignment feels like a
_potential_ nuisance for ABMC-aware users, but it depends on what they
are trying to do. Migration of non-ABMC-aware users will be easier for
basic use cases if auto-assignment occurs by default (as in this
series).

Having an explicit way to turn this off seems perfectly reasonable
(and could be added later on, if not provided in this series).

What about the question re whether turning mbm_cntr_assign mode on
should trigger auto-assignment?

Currently turning this mode off and then on again has the effect of
removing all automatic assignments for extant groups. This feels
surprising and/or unintentional (?)

Cheers
---Dave
Hi Dave,

On 2/20/25 5:40 AM, Dave Martin wrote:
> [...]
>
> What about the question re whether turning mbm_cntr_assign mode on
> should trigger auto-assignment?
>
> Currently turning this mode off and then on again has the effect of
> removing all automatic assignments for extant groups. This feels
> surprising and/or unintentional (?)

Connecting to what you start off by saying I also see auto-assignment
as the way to provide a smooth transition for "non-ABMC-aware" users.

To me a user that turns this mode off and then on again can be
considered as a user that is "ABMC-aware" and turning it "off and then
on again" seems like an intuitive way to get to a "clean slate" wrt
counter assignments. This may also be a convenient way for an
"ABMC-aware" user space to unassign all counters and thus also helpful
if resctrl supports the flag that Peter proposed.

The flag seems to already keep something like this in its context with
a name of "mbm_assign_on_mkdir" that could be interpreted as "only
auto assign on mkdir"?

I am not taking a stand for one or the other approach but instead
trying to be more specific about pros/cons. Could you please provide
more insight in the use case you have in mind so that we can see how
resctrl could behave with few surprises?

Reinette
Hi,

On Thu, Feb 20, 2025 at 09:08:17AM -0800, Reinette Chatre wrote:
> Hi Dave,
>
> [...]
>
> Connecting to what you start off by saying I also see auto-assignment
> as the way to provide a smooth transition for "non-ABMC-aware" users.

I agree, and having this on by default also helps non-ABMC-aware users.

> To me a user that turns this mode off and then on again can be
> considered as a user that is "ABMC-aware" and turning it "off and then
> on again" seems like an intuitive way to get to a "clean slate"
> wrt counter assignments. This may also be a convenient way for
> an "ABMC-aware" user space to unassign all counters and thus also
> helpful if resctrl supports the flag that Peter proposed. The flag
> seems to already keep something like this in its context with
> a name of "mbm_assign_on_mkdir" that could be interpreted as
> "only auto assign on mkdir"?

Yes, that's reasonable.

It could be a good idea to document this behaviour of switching the
mbm_cntr_assign mode, if we think it is useful and people are likely
to rely on it.

Since mkdir is an implementation detail of the resctrl interface, I'd
be tempted to go for a more generic name, say,
"mbm_assign_new_mon_groups". But that's just bikeshedding. The
proposed behaviour seems fine.

Either way, if this is not included in this series, it could be added
later without breaking anything.

> I am not taking a stand for one or the other approach but instead
> trying to be more specific about pros/cons. Could you please provide
> more insight in the use case you have in mind so that we can see how
> resctrl could behave with few surprises?

I don't have a strong view either. I don't have a concrete use case
here -- I was just trying to imagine the experience of an ABMC-aware
user who wants full control over what counters get assigned.

I agree that the convenience of the non-ABMC-aware user should
probably take priority over that of the ABMC-aware user, at least in
situations where the expected behaviour is achievable (i.e., where we
didn't run out of counters to auto-assign.)

Cheers
---Dave
Hi All,

On 2/21/2025 11:14 AM, Dave Martin wrote:

[...]

> Yes, that's reasonable. It could be a good idea to document this
> behaviour of switching the mbm_cntr_assign mode, if we think it is
> useful and people are likely to rely on it.
>
> Since mkdir is an implementation detail of the resctrl interface, I'd
> be tempted to go for a more generic name, say,
> "mbm_assign_new_mon_groups". But that's just bikeshedding. The
> proposed behaviour seems fine.
>
> Either way, if this is not included in this series, it could be added
> later without breaking anything.

How about the more generic "mbm_cntr_assign_auto"?

We can add it as part of "struct resctrl_mon" and set it "on" when ABMC
is detected. It will be part of the check in rdtgroup_assign_cntrs(),
which is called when new groups are created. Also, provide a user
interface to disable it. Seems simple to me.

Thanks
Babu

[...]
Hi Babu,

On 2/21/25 10:23 AM, Moger, Babu wrote:

[...]

>> Since mkdir is an implementation detail of the resctrl interface, I'd
>> be tempted to go for a more generic name, say,
>> "mbm_assign_new_mon_groups". But that's just bikeshedding.
>> The proposed behaviour seems fine.
>>
>> Either way, if this is not included in this series, it could be added
>> later without breaking anything.
>
> How about more generic "mbm_cntr_assign_auto" ?

I would like to be careful to not make it _too_ generic. Dave already
pointed out that users may be surprised that counters are not
auto-assigned when switching between the different modes, so using the
name to help highlight when this auto-assignment can be expected to
happen seems very useful.

Reinette
Hi Reinette,

On 2/21/2025 4:48 PM, Reinette Chatre wrote:

[...]

>>> Since mkdir is an implementation detail of the resctrl interface, I'd
>>> be tempted to go for a more generic name, say,
>>> "mbm_assign_new_mon_groups". But that's just bikeshedding.
>>> The proposed behaviour seems fine.
>>
>> How about more generic "mbm_cntr_assign_auto" ?
>
> I would like to be careful to not make it _too_ generic. Dave already
> pointed out that users may be surprised that counters are not
> auto-assigned when switching between the different modes, so using the
> name to help highlight when this auto-assignment can be expected to
> happen seems very useful.

In that case "mbm_assign_on_mkdir" seems on point and precise.

Thanks
Babu
Hi Babu,

On Sat, Feb 22, 2025 at 12:43 AM Moger, Babu <bmoger@amd.com> wrote:

[...]

> > I would like to be careful to not make it _too_ generic. Dave already
> > pointed out that users may be surprised that counters are not
> > auto-assigned when switching between the different modes, so using the
> > name to help highlight when this auto-assignment can be expected to
> > happen seems very useful.
>
> In that case "mbm_assign_on_mkdir" seems on point and precise.
> Thanks

It also looks like counters are not assigned when a domain is
hotplugged, so explicitly stating that it's on mkdir gets us off the
hook for that.

-Peter
Hi Babu,
On 1/22/25 12:20 PM, Babu Moger wrote:
> Assign/unassign counters on resctrl group creation/deletion. Two counters
> are required per group, one for MBM total event and one for MBM local
> event.
>
> There are a limited number of counters available for assignment. If these
> counters are exhausted, the kernel will display the error message: "Out of
> MBM assignable counters". However, it is not necessary to fail the
> creation of a group due to assignment failures. Users have the flexibility
> to modify the assignments at a later time.
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index b6d188d0f9b7..118b39fbb01e 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -1557,3 +1557,30 @@ int resctrl_unassign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d
>
> return ret;
> }
> +
> +void mbm_cntr_reset(struct rdt_resource *r)
> +{
> + u32 idx_limit = resctrl_arch_system_num_rmid_idx();
> + struct rdt_mon_domain *dom;
> +
> + /*
> + * Reset the domain counter configuration. Hardware counters
> + * will reset after switching the monitor mode. So, reset the
> + * architectural amd non-architectural state so that reading
"amd" -> "and"
> + * of hardware counter is not considered as an overflow in the
> + * next update.
> + */
> + if (is_mbm_enabled() && r->mon.mbm_cntr_assignable) {
> + list_for_each_entry(dom, &r->mon_domains, hdr.list) {
> + memset(dom->cntr_cfg, 0,
> + sizeof(*dom->cntr_cfg) * r->mon.num_mbm_cntrs);
> + if (is_mbm_total_enabled())
> + memset(dom->mbm_total, 0,
> + sizeof(struct mbm_state) * idx_limit);
> + if (is_mbm_local_enabled())
> + memset(dom->mbm_local, 0,
> + sizeof(struct mbm_state) * idx_limit);
> + resctrl_arch_reset_rmid_all(r, dom);
> + }
> + }
> +}
I looked back at the previous versions to better understand how this function
came about and I do not think it actually solves the problem it aims to solve.
rdtgroup_unassign_cntrs() can fail and when it does the counter is not free'd. That
leaves a monitoring domain's array with an entry that points to a resource group
that no longer exists (unless it is the default resource group) since
rdtgroup_unassign_cntrs() does not check the return and proceeds to remove the
resource group. mbm_cntr_reset() is called on umount of resctrl but
rdtgroup_unassign_cntrs() is called on every group remove and those scenarios
are not handled.
To address this I believe that I need to go back on a previous request to have
resctrl_arch_config_cntr() return an error code. AMD does not need this and
it is difficult to predict what will work for MPAM. I originally wanted to be
flexible here but this appears to be impractical. With a new requirement that
resctrl_arch_config_cntr() always succeeds the counter will in turn always
be free'd and not leave dangling pointers. I believe doing so eliminates
the need for mbm_cntr_reset() as used in this patch. My apologies for the
misdirection. We can re-evaluate these flows if MPAM needs anything different.
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 2b86124c336b..f61f0cd032ef 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -2687,6 +2687,46 @@ static void schemata_list_destroy(void)
> }
> }
>
> +/*
> + * Called when a new group is created. If "mbm_cntr_assign" mode is enabled,
> + * counters are automatically assigned. Each group can accommodate two counters:
> + * one for the total event and one for the local event. Assignments may fail
> + * due to the limited number of counters. However, it is not necessary to fail
> + * the group creation and thus no failure is returned. Users have the option
> + * to modify the counter assignments after the group has been created.
> + */
> +static void rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
> +{
> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> +
> + if (!resctrl_arch_mbm_cntr_assign_enabled(r))
> + return;
> +
> + if (is_mbm_total_enabled())
> + resctrl_assign_cntr_event(r, NULL, rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID);
> +
> + if (is_mbm_local_enabled())
> + resctrl_assign_cntr_event(r, NULL, rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID);
> +}
> +
> +/*
> + * Called when a group is deleted. Counters are unassigned if it was in
> + * assigned state.
> + */
> +static void rdtgroup_unassign_cntrs(struct rdtgroup *rdtgrp)
> +{
> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> +
> + if (!resctrl_arch_mbm_cntr_assign_enabled(r))
> + return;
> +
It looks to me as though there are a couple of places (rmdir_all_sub(), rdt_kill_sb(),
and rdtgroup_rmdir_ctrl()) where rdtgroup_unassign_cntrs() could be called on a system that
does not support monitoring and/or only supports cache allocation monitoring.
In these paths it is only the architecture's resctrl_arch_mbm_cntr_assign_enabled(r) that
gates the resctrl flow. I think rdtgroup_unassign_cntrs(), and rdtgroup_assign_cntrs() to
match, could do with at least an r->mon_capable check.
> + if (is_mbm_total_enabled())
> + resctrl_unassign_cntr_event(r, NULL, rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID);
> +
> + if (is_mbm_local_enabled())
> + resctrl_unassign_cntr_event(r, NULL, rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID);
> +}
> +
> static int rdt_get_tree(struct fs_context *fc)
> {
> struct rdt_fs_context *ctx = rdt_fc2context(fc);
> @@ -2741,6 +2781,8 @@ static int rdt_get_tree(struct fs_context *fc)
> if (ret < 0)
> goto out_info;
>
> + rdtgroup_assign_cntrs(&rdtgroup_default);
> +
> ret = mkdir_mondata_all(rdtgroup_default.kn,
> &rdtgroup_default, &kn_mondata);
> if (ret < 0)
> @@ -2779,8 +2821,10 @@ static int rdt_get_tree(struct fs_context *fc)
> if (resctrl_arch_mon_capable())
> kernfs_remove(kn_mondata);
> out_mongrp:
> - if (resctrl_arch_mon_capable())
> + if (resctrl_arch_mon_capable()) {
> + rdtgroup_unassign_cntrs(&rdtgroup_default);
> kernfs_remove(kn_mongrp);
> + }
> out_info:
> kernfs_remove(kn_info);
> out_schemata_free:
> @@ -2956,6 +3000,7 @@ static void free_all_child_rdtgrp(struct rdtgroup *rdtgrp)
>
> head = &rdtgrp->mon.crdtgrp_list;
> list_for_each_entry_safe(sentry, stmp, head, mon.crdtgrp_list) {
> + rdtgroup_unassign_cntrs(sentry);
> free_rmid(sentry->closid, sentry->mon.rmid);
> list_del(&sentry->mon.crdtgrp_list);
>
> @@ -2996,6 +3041,8 @@ static void rmdir_all_sub(void)
> cpumask_or(&rdtgroup_default.cpu_mask,
> &rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask);
>
> + rdtgroup_unassign_cntrs(rdtgrp);
> +
> free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>
> kernfs_remove(rdtgrp->kn);
> @@ -3027,6 +3074,8 @@ static void rdt_kill_sb(struct super_block *sb)
> for_each_alloc_capable_rdt_resource(r)
> reset_all_ctrls(r);
> rmdir_all_sub();
> + rdtgroup_unassign_cntrs(&rdtgroup_default);
> + mbm_cntr_reset(&rdt_resources_all[RDT_RESOURCE_L3].r_resctrl);
> rdt_pseudo_lock_release();
> rdtgroup_default.mode = RDT_MODE_SHAREABLE;
> schemata_list_destroy();
> @@ -3490,9 +3539,12 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
> }
> rdtgrp->mon.rmid = ret;
>
> + rdtgroup_assign_cntrs(rdtgrp);
> +
> ret = mkdir_mondata_all(rdtgrp->kn, rdtgrp, &rdtgrp->mon.mon_data_kn);
> if (ret) {
> rdt_last_cmd_puts("kernfs subdir error\n");
> + rdtgroup_unassign_cntrs(rdtgrp);
> free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
> return ret;
> }
> @@ -3502,8 +3554,10 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
>
> static void mkdir_rdt_prepare_rmid_free(struct rdtgroup *rgrp)
> {
> - if (resctrl_arch_mon_capable())
> + if (resctrl_arch_mon_capable()) {
> + rdtgroup_unassign_cntrs(rgrp);
> free_rmid(rgrp->closid, rgrp->mon.rmid);
> + }
> }
>
> static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
> @@ -3764,6 +3818,9 @@ static int rdtgroup_rmdir_mon(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
> update_closid_rmid(tmpmask, NULL);
>
> rdtgrp->flags = RDT_DELETED;
> +
> + rdtgroup_unassign_cntrs(rdtgrp);
> +
> free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>
> /*
> @@ -3810,6 +3867,8 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
> cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
> update_closid_rmid(tmpmask, NULL);
>
> + rdtgroup_unassign_cntrs(rdtgrp);
> +
> free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
> closid_free(rdtgrp->closid);
>
Reinette
Hi Reinette,
On 2/6/25 12:03, Reinette Chatre wrote:
> Hi Babu,
>
> On 1/22/25 12:20 PM, Babu Moger wrote:
>> Assign/unassign counters on resctrl group creation/deletion. Two counters
>> are required per group, one for MBM total event and one for MBM local
>> event.
>>
>> There are a limited number of counters available for assignment. If these
>> counters are exhausted, the kernel will display the error message: "Out of
>> MBM assignable counters". However, it is not necessary to fail the
>> creation of a group due to assignment failures. Users have the flexibility
>> to modify the assignments at a later time.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index b6d188d0f9b7..118b39fbb01e 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -1557,3 +1557,30 @@ int resctrl_unassign_cntr_event(struct rdt_resource *r, struct rdt_mon_domain *d
>>
>> return ret;
>> }
>> +
>> +void mbm_cntr_reset(struct rdt_resource *r)
>> +{
>> + u32 idx_limit = resctrl_arch_system_num_rmid_idx();
>> + struct rdt_mon_domain *dom;
>> +
>> + /*
>> + * Reset the domain counter configuration. Hardware counters
>> + * will reset after switching the monitor mode. So, reset the
>> + * architectural amd non-architectural state so that reading
>
> "amd" -> "and"
Sure.
>
>> + * of hardware counter is not considered as an overflow in the
>> + * next update.
>> + */
>> + if (is_mbm_enabled() && r->mon.mbm_cntr_assignable) {
>> + list_for_each_entry(dom, &r->mon_domains, hdr.list) {
>> + memset(dom->cntr_cfg, 0,
>> + sizeof(*dom->cntr_cfg) * r->mon.num_mbm_cntrs);
>> + if (is_mbm_total_enabled())
>> + memset(dom->mbm_total, 0,
>> + sizeof(struct mbm_state) * idx_limit);
>> + if (is_mbm_local_enabled())
>> + memset(dom->mbm_local, 0,
>> + sizeof(struct mbm_state) * idx_limit);
>> + resctrl_arch_reset_rmid_all(r, dom);
>> + }
>> + }
>> +}
>
> I looked back at the previous versions to better understand how this function
> came about and I do not think it actually solves the problem it aims to solve.
>
> rdtgroup_unassign_cntrs() can fail and when it does the counter is not free'd. That
> leaves a monitoring domain's array with an entry that points to a resource group
> that no longer exists (unless it is the default resource group) since
> rdtgroup_unassign_cntrs() does not check the return and proceeds to remove the
> resource group. mbm_cntr_reset() is called on umount of resctrl but
> rdtgroup_unassign_cntrs() is called on every group remove and those scenarios
> are not handled.
>
> To address this I believe that I need to go back on a previous request to have
> resctrl_arch_config_cntr() return an error code. AMD does not need this and
> it is difficult to predict what will work for MPAM. I originally wanted to be
> flexible here but this appears to be impractical. With a new requirement that
> resctrl_arch_config_cntr() always succeeds the counter will in turn always
> be free'd and not leave dangling pointers. I believe doing so eliminates
> the need for mbm_cntr_reset() as used in this patch. My apologies for the
> misdirection. We can re-evaluate these flows if MPAM needs anything different.
So, the new requirement is to free the counter even if the
resctrl_arch_config_cntr() call fails. That way, after calling
rdtgroup_unassign_cntrs(), the counter is freed and the state is clean,
so we don't need to call mbm_cntr_reset() at the end to clear all the
entries.

Here is the call sequence:

rdtgroup_unassign_cntrs() -> resctrl_unassign_cntr_event() ->
resctrl_free_config_cntr() -> resctrl_config_cntr() ->
resctrl_arch_config_cntr().

So, the only change here is:
/*
 * Unassign and free the counter if assigned, else return success.
 */
static int resctrl_free_config_cntr(struct rdt_resource *r,
				    struct rdt_mon_domain *d,
				    struct rdtgroup *rdtgrp,
				    enum resctrl_event_id evtid)
{
	int cntr_id, ret = 0;

	cntr_id = mbm_cntr_get(r, d, rdtgrp, evtid);
	if (cntr_id < 0)
		return ret;

	/* Unassign and free the counter */
	ret = resctrl_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
				  rdtgrp->closid, cntr_id, false);
	mbm_cntr_free(d, cntr_id);

	return ret;
}
>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 2b86124c336b..f61f0cd032ef 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -2687,6 +2687,46 @@ static void schemata_list_destroy(void)
>> }
>> }
>>
>> +/*
>> + * Called when a new group is created. If "mbm_cntr_assign" mode is enabled,
>> + * counters are automatically assigned. Each group can accommodate two counters:
>> + * one for the total event and one for the local event. Assignments may fail
>> + * due to the limited number of counters. However, it is not necessary to fail
>> + * the group creation and thus no failure is returned. Users have the option
>> + * to modify the counter assignments after the group has been created.
>> + */
>> +static void rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
>> +{
>> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>> +
>> + if (!resctrl_arch_mbm_cntr_assign_enabled(r))
>> + return;
>> +
>> + if (is_mbm_total_enabled())
>> + resctrl_assign_cntr_event(r, NULL, rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID);
>> +
>> + if (is_mbm_local_enabled())
>> + resctrl_assign_cntr_event(r, NULL, rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID);
>> +}
>> +
>> +/*
>> + * Called when a group is deleted. Counters are unassigned if it was in
>> + * assigned state.
>> + */
>> +static void rdtgroup_unassign_cntrs(struct rdtgroup *rdtgrp)
>> +{
>> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>> +
>> + if (!resctrl_arch_mbm_cntr_assign_enabled(r))
>> + return;
>> +
>
> It looks to me as though there are a couple of places (rmdir_all_sub(), rdt_kill_sb(),
> and rdtgroup_rmdir_ctrl()) where rdtgroup_unassign_cntrs() could be called on a system that
> does not support monitoring and/or only supports cache allocation monitoring.
>
> In these paths it is only the architecture's resctrl_arch_mbm_cntr_assign_enabled(r) that
> gates the resctrl flow. I think rdtgroup_unassign_cntrs(), and rdtgroup_assign_cntrs() to
> match, could do with at least an r->mon_capable check.
Ok. Will add the following check:
if (!r->mon_capable || !resctrl_arch_mbm_cntr_assign_enabled(r))
return;
>
>> + if (is_mbm_total_enabled())
>> + resctrl_unassign_cntr_event(r, NULL, rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID);
>> +
>> + if (is_mbm_local_enabled())
>> + resctrl_unassign_cntr_event(r, NULL, rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID);
>> +}
>> +
>> static int rdt_get_tree(struct fs_context *fc)
>> {
>> struct rdt_fs_context *ctx = rdt_fc2context(fc);
>> @@ -2741,6 +2781,8 @@ static int rdt_get_tree(struct fs_context *fc)
>> if (ret < 0)
>> goto out_info;
>>
>> + rdtgroup_assign_cntrs(&rdtgroup_default);
>> +
>> ret = mkdir_mondata_all(rdtgroup_default.kn,
>> &rdtgroup_default, &kn_mondata);
>> if (ret < 0)
>> @@ -2779,8 +2821,10 @@ static int rdt_get_tree(struct fs_context *fc)
>> if (resctrl_arch_mon_capable())
>> kernfs_remove(kn_mondata);
>> out_mongrp:
>> - if (resctrl_arch_mon_capable())
>> + if (resctrl_arch_mon_capable()) {
>> + rdtgroup_unassign_cntrs(&rdtgroup_default);
>> kernfs_remove(kn_mongrp);
>> + }
>> out_info:
>> kernfs_remove(kn_info);
>> out_schemata_free:
>> @@ -2956,6 +3000,7 @@ static void free_all_child_rdtgrp(struct rdtgroup *rdtgrp)
>>
>> head = &rdtgrp->mon.crdtgrp_list;
>> list_for_each_entry_safe(sentry, stmp, head, mon.crdtgrp_list) {
>> + rdtgroup_unassign_cntrs(sentry);
>> free_rmid(sentry->closid, sentry->mon.rmid);
>> list_del(&sentry->mon.crdtgrp_list);
>>
>> @@ -2996,6 +3041,8 @@ static void rmdir_all_sub(void)
>> cpumask_or(&rdtgroup_default.cpu_mask,
>> &rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask);
>>
>> + rdtgroup_unassign_cntrs(rdtgrp);
>> +
>> free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>>
>> kernfs_remove(rdtgrp->kn);
>> @@ -3027,6 +3074,8 @@ static void rdt_kill_sb(struct super_block *sb)
>> for_each_alloc_capable_rdt_resource(r)
>> reset_all_ctrls(r);
>> rmdir_all_sub();
>> + rdtgroup_unassign_cntrs(&rdtgroup_default);
>> + mbm_cntr_reset(&rdt_resources_all[RDT_RESOURCE_L3].r_resctrl);
>> rdt_pseudo_lock_release();
>> rdtgroup_default.mode = RDT_MODE_SHAREABLE;
>> schemata_list_destroy();
>> @@ -3490,9 +3539,12 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
>> }
>> rdtgrp->mon.rmid = ret;
>>
>> + rdtgroup_assign_cntrs(rdtgrp);
>> +
>> ret = mkdir_mondata_all(rdtgrp->kn, rdtgrp, &rdtgrp->mon.mon_data_kn);
>> if (ret) {
>> rdt_last_cmd_puts("kernfs subdir error\n");
>> + rdtgroup_unassign_cntrs(rdtgrp);
>> free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>> return ret;
>> }
>> @@ -3502,8 +3554,10 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
>>
>> static void mkdir_rdt_prepare_rmid_free(struct rdtgroup *rgrp)
>> {
>> - if (resctrl_arch_mon_capable())
>> + if (resctrl_arch_mon_capable()) {
>> + rdtgroup_unassign_cntrs(rgrp);
>> free_rmid(rgrp->closid, rgrp->mon.rmid);
>> + }
>> }
>>
>> static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
>> @@ -3764,6 +3818,9 @@ static int rdtgroup_rmdir_mon(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
>> update_closid_rmid(tmpmask, NULL);
>>
>> rdtgrp->flags = RDT_DELETED;
>> +
>> + rdtgroup_unassign_cntrs(rdtgrp);
>> +
>> free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>>
>> /*
>> @@ -3810,6 +3867,8 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
>> cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
>> update_closid_rmid(tmpmask, NULL);
>>
>> + rdtgroup_unassign_cntrs(rdtgrp);
>> +
>> free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>> closid_free(rdtgrp->closid);
>>
>
> Reinette
>
--
Thanks
Babu Moger
Hi Babu,
On 2/10/25 9:27 AM, Moger, Babu wrote:
> On 2/6/25 12:03, Reinette Chatre wrote:
>> On 1/22/25 12:20 PM, Babu Moger wrote:
>>
>>> + * of hardware counter is not considered as an overflow in the
>>> + * next update.
>>> + */
>>> + if (is_mbm_enabled() && r->mon.mbm_cntr_assignable) {
>>> + list_for_each_entry(dom, &r->mon_domains, hdr.list) {
>>> + memset(dom->cntr_cfg, 0,
>>> + sizeof(*dom->cntr_cfg) * r->mon.num_mbm_cntrs);
>>> + if (is_mbm_total_enabled())
>>> + memset(dom->mbm_total, 0,
>>> + sizeof(struct mbm_state) * idx_limit);
>>> + if (is_mbm_local_enabled())
>>> + memset(dom->mbm_local, 0,
>>> + sizeof(struct mbm_state) * idx_limit);
>>> + resctrl_arch_reset_rmid_all(r, dom);
>>> + }
>>> + }
>>> +}
>>
>> I looked back at the previous versions to better understand how this function
>> came about and I do not think it actually solves the problem it aims to solve.
>>
>> rdtgroup_unassign_cntrs() can fail and when it does the counter is not free'd. That
>> leaves a monitoring domain's array with an entry that points to a resource group
>> that no longer exists (unless it is the default resource group) since
>> rdtgroup_unassign_cntrs() does not check the return and proceeds to remove the
>> resource group. mbm_cntr_reset() is called on umount of resctrl but
>> rdtgroup_unassign_cntrs() is called on every group remove and those scenarios
>> are not handled.
>>
>> To address this I believe that I need to go back on a previous request to have
>> resctrl_arch_config_cntr() return an error code. AMD does not need this and
>> it is difficult to predict what will work for MPAM. I originally wanted to be
>> flexible here but this appears to be impractical. With a new requirement that
>> resctrl_arch_config_cntr() always succeeds the counter will in turn always
>> be free'd and not leave dangling pointers. I believe doing so eliminates
>> the need for mbm_cntr_reset() as used in this patch. My apologies for the
>> misdirection. We can re-evaluate these flows if MPAM needs anything different.
>
> So, the new requirement is to free the counter even if the
> resctrl_arch_config_cntr() call fails. That way after calling
No. Quoting above: "new requirement that resctrl_arch_config_cntr() always succeeds".
As I see it this will eliminate a lot of error checking on the calling path,
not ignore errors.
Reinette