[PATCH v9 19/26] x86/resctrl: Add the interface to unassign a MBM counter

Babu Moger posted 26 patches 3 weeks, 5 days ago
[PATCH v9 19/26] x86/resctrl: Add the interface to unassign a MBM counter
Posted by Babu Moger 3 weeks, 5 days ago
The mbm_cntr_assign mode provides a limited number of hardware counters
that can be assigned to an RMID, event pair to monitor bandwidth while
assigned. If all counters are in use, the kernel will show an error
message: "Out of MBM assignable counters" when a new assignment is
requested. To make space for a new assignment, users must unassign an
already assigned counter.

Introduce an interface that allows for the unassignment of counter IDs
from both the group and the domain. Additionally, ensure that the global
counter is released if it is no longer assigned to any domains.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v9: Changes related to addition of new function resctrl_config_cntr().
    The removed rdtgroup_mbm_cntr_is_assigned() as it was introduced
    already.
    Text changes to take care comments.

v8: Renamed rdtgroup_mbm_cntr_is_assigned to mbm_cntr_assigned_to_domain
    Added return error handling in resctrl_arch_config_cntr().

v7: Merged rdtgroup_unassign_cntr and rdtgroup_free_cntr functions.
    Renamed rdtgroup_mbm_cntr_test() to rdtgroup_mbm_cntr_is_assigned().
    Reworded the commit log little bit.

v6: Removed mbm_cntr_free from this patch.
    Added counter test in all the domains and free if it is not assigned to
    any domains.

v5: Few name changes to match cntr_id.
    Changed the function names to rdtgroup_unassign_cntr
    More comments on commit log.

v4: Added domain specific unassign feature.
    Few name changes.

v3: Removed the static from the prototype of rdtgroup_unassign_abmc.
    The function is not called directly from user anymore. These
    changes are related to global assignment interface.

v2: No changes.
---
 arch/x86/kernel/cpu/resctrl/internal.h |  2 ++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 41 ++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index cb496bd97007..66de0ce12aba 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -719,6 +719,8 @@ int resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
 			     u32 cntr_id, bool assign);
 int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
 			       struct rdt_mon_domain *d, enum resctrl_event_id evtid);
+int rdtgroup_unassign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
+				 struct rdt_mon_domain *d, enum resctrl_event_id evtid);
 void rdt_staged_configs_clear(void);
 bool closid_allocated(unsigned int closid);
 int resctrl_find_cleanest_closid(void);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index bc3752967c44..b0cce3dfd062 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2011,6 +2011,47 @@ int rdtgroup_assign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
 	return ret;
 }
 
+/*
+ * Unassign a hardware counter associated with @evtid from the domain and
+ * the group. Unassign the counters from all the domains if rdt_mon_domain
+ * is NULL else unassign from the specific domain.
+ * Free the global counter once unassigned from all the domains.
+ */
+int rdtgroup_unassign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
+				 struct rdt_mon_domain *d, enum resctrl_event_id evtid)
+{
+	int index = MBM_EVENT_ARRAY_INDEX(evtid);
+	int cntr_id = rdtgrp->mon.cntr_id[index];
+	int ret;
+
+	/* Return early if the counter is unassigned already */
+	if (cntr_id == MON_CNTR_UNSET)
+		return 0;
+
+	if (!d) {
+		list_for_each_entry(d, &r->mon_domains, hdr.list) {
+			ret = resctrl_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
+						  rdtgrp->closid, cntr_id, false);
+			if (ret)
+				goto out_done_unassign;
+		}
+	} else {
+		ret = resctrl_config_cntr(r, d, evtid, rdtgrp->mon.rmid,
+					  rdtgrp->closid, cntr_id, false);
+		if (ret)
+			goto out_done_unassign;
+	}
+
+	/* Free the counter id once it is unassigned from all the domains */
+	if (!mbm_cntr_assigned_to_domain(r, cntr_id)) {
+		mbm_cntr_free(r, cntr_id);
+		rdtgroup_cntr_id_init(rdtgrp, evtid);
+	}
+
+out_done_unassign:
+	return ret;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_common_files[] = {
 	{
-- 
2.34.1
Re: [PATCH v9 19/26] x86/resctrl: Add the interface to unassign a MBM counter
Posted by Peter Newman 2 weeks, 6 days ago
Hi Babu,

On Wed, Oct 30, 2024 at 12:25 AM Babu Moger <babu.moger@amd.com> wrote:
>
> The mbm_cntr_assign mode provides a limited number of hardware counters
> that can be assigned to an RMID, event pair to monitor bandwidth while
> assigned. If all counters are in use, the kernel will show an error
> message: "Out of MBM assignable counters" when a new assignment is
> requested. To make space for a new assignment, users must unassign an
> already assigned counter.
>
> Introduce an interface that allows for the unassignment of counter IDs
> from both the group and the domain. Additionally, ensure that the global
> counter is released if it is no longer assigned to any domains.

This seems unnecessarily restrictive. What's wrong with monitoring
different groups in different domains?


-Peter
Re: [PATCH v9 19/26] x86/resctrl: Add the interface to unassign a MBM counter
Posted by Moger, Babu 2 weeks, 6 days ago
Hi Peter,

On 11/4/24 08:16, Peter Newman wrote:
> Hi Babu,
> 
> On Wed, Oct 30, 2024 at 12:25 AM Babu Moger <babu.moger@amd.com> wrote:
>>
>> The mbm_cntr_assign mode provides a limited number of hardware counters
>> that can be assigned to an RMID, event pair to monitor bandwidth while
>> assigned. If all counters are in use, the kernel will show an error
>> message: "Out of MBM assignable counters" when a new assignment is
>> requested. To make space for a new assignment, users must unassign an
>> already assigned counter.
>>
>> Introduce an interface that allows for the unassignment of counter IDs
>> from both the group and the domain. Additionally, ensure that the global
>> counter is released if it is no longer assigned to any domains.
> 
> This seems unnecessarily restrictive. What's wrong with monitoring
> different groups in different domains?

Yes. User can monitor different groups in different domains. But, they
will have to use different global counter for each group.

Here is an example.

#cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
groupA//0=t;1=_;
groupB//0=_;1=l;

Group A - counter 0 (Assigned to total event in Domain 0)
Group B - counter 1 (Assigned to local event in Domain 1)

We allocate two different counters here.  Now we are left with 30 counters
(max 32).


This is similar to CLOSID management we follow in resctrl. This is not a
new restriction,
-- 
Thanks
Babu Moger
Re: [PATCH v9 19/26] x86/resctrl: Add the interface to unassign a MBM counter
Posted by Peter Newman 2 weeks, 5 days ago
Hi Babu,

On Mon, Nov 4, 2024 at 7:21 PM Moger, Babu <babu.moger@amd.com> wrote:
>
> Hi Peter,
>
> On 11/4/24 08:16, Peter Newman wrote:
> > Hi Babu,
> >
> > On Wed, Oct 30, 2024 at 12:25 AM Babu Moger <babu.moger@amd.com> wrote:
> >>
> >> The mbm_cntr_assign mode provides a limited number of hardware counters
> >> that can be assigned to an RMID, event pair to monitor bandwidth while
> >> assigned. If all counters are in use, the kernel will show an error
> >> message: "Out of MBM assignable counters" when a new assignment is
> >> requested. To make space for a new assignment, users must unassign an
> >> already assigned counter.
> >>
> >> Introduce an interface that allows for the unassignment of counter IDs
> >> from both the group and the domain. Additionally, ensure that the global
> >> counter is released if it is no longer assigned to any domains.
> >
> > This seems unnecessarily restrictive. What's wrong with monitoring
> > different groups in different domains?
>
> Yes. User can monitor different groups in different domains. But, they
> will have to use different global counter for each group.

What is a global counter anyways? It sounds like an artifact of an
earlier revision. This concept does not sound intuitive to the user.

>
> Here is an example.
>
> #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> groupA//0=t;1=_;
> groupB//0=_;1=l;
>
> Group A - counter 0 (Assigned to total event in Domain 0)
> Group B - counter 1 (Assigned to local event in Domain 1)
>
> We allocate two different counters here.  Now we are left with 30 counters
> (max 32).
>
>
> This is similar to CLOSID management we follow in resctrl. This is not a
> new restriction,

It is a restriction in a new feature that resembles a restriction in
an existing feature.

I don't see what function the global allocator serves now that there
is already a per-domain allocator. My best guess is that it avoids the
case of an mbm_assign_control write that succeeds in some domains but
fails in others.

I admit I said earlier that I was only planning to allocate globally,
but now that I'm evaluating how to make resctrl's monitoring
functionality scale on large systems, I'm being forced to reconsider.

As long as this is only a limitation I can fix later, I don't see it
as an obstacle. There would just need to be better documentation of
what sort of internal data structures the user needs to visualize in
order to use this feature successfully.

-Peter
Re: [PATCH v9 19/26] x86/resctrl: Add the interface to unassign a MBM counter
Posted by Moger, Babu 2 weeks, 5 days ago
Hi Peter,

On 11/5/24 04:35, Peter Newman wrote:
> Hi Babu,
> 
> On Mon, Nov 4, 2024 at 7:21 PM Moger, Babu <babu.moger@amd.com> wrote:
>>
>> Hi Peter,
>>
>> On 11/4/24 08:16, Peter Newman wrote:
>>> Hi Babu,
>>>
>>> On Wed, Oct 30, 2024 at 12:25 AM Babu Moger <babu.moger@amd.com> wrote:
>>>>
>>>> The mbm_cntr_assign mode provides a limited number of hardware counters
>>>> that can be assigned to an RMID, event pair to monitor bandwidth while
>>>> assigned. If all counters are in use, the kernel will show an error
>>>> message: "Out of MBM assignable counters" when a new assignment is
>>>> requested. To make space for a new assignment, users must unassign an
>>>> already assigned counter.
>>>>
>>>> Introduce an interface that allows for the unassignment of counter IDs
>>>> from both the group and the domain. Additionally, ensure that the global
>>>> counter is released if it is no longer assigned to any domains.
>>>
>>> This seems unnecessarily restrictive. What's wrong with monitoring
>>> different groups in different domains?
>>
>> Yes. User can monitor different groups in different domains. But, they
>> will have to use different global counter for each group.
> 
> What is a global counter anyways? It sounds like an artifact of an
> earlier revision. This concept does not sound intuitive to the user.


# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
32

This is a global counter. We have totally 32 hardware counters.

This is tracked by the bitmap mbm_cntr_free_map.


> 
>>
>> Here is an example.
>>
>> #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>> groupA//0=t;1=_;
>> groupB//0=_;1=l;
>>
>> Group A - counter 0 (Assigned to total event in Domain 0)
>> Group B - counter 1 (Assigned to local event in Domain 1)
>>
>> We allocate two different counters here.  Now we are left with 30 counters
>> (max 32).
>>
>>
>> This is similar to CLOSID management we follow in resctrl. This is not a
>> new restriction,
> 
> It is a restriction in a new feature that resembles a restriction in
> an existing feature.
> 
> I don't see what function the global allocator serves now that there
> is already a per-domain allocator. My best guess is that it avoids the
> case of an mbm_assign_control write that succeeds in some domains but
> fails in others.
> 
> I admit I said earlier that I was only planning to allocate globally,
> but now that I'm evaluating how to make resctrl's monitoring
> functionality scale on large systems, I'm being forced to reconsider.
> 
> As long as this is only a limitation I can fix later, I don't see it
> as an obstacle. There would just need to be better documentation of
> what sort of internal data structures the user needs to visualize in
> order to use this feature successfully.


We have totally 32 global counters. That means we can assign up to 32 events.

Assigning events requires sending an IPI to write the MSR
(MSR_IA32_L3_QOS_ABMC_CFG) on every domain affected.

So, we wanted another bitmap to track if status of the assignment on each
domain. This is tracked by mbm_cntr_map. This bit is updated when we send
the IPI on that domain.

I dont consider this as a limitation. This helps to avoid sending
unnecessary IPIs to all the domains when user wants to assign an event.
This is kind of improvement I would say.

We still have the option to applying the assignment to all the domains by
setting "*" for the domain.

-- 
Thanks
Babu Moger