[PATCH v10 19/24] x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign mode

Babu Moger posted 24 patches 1 year ago
There is a newer version of this series
Posted by Babu Moger 1 year ago
In mbm_cntr_assign mode, a hardware counter must be assigned to an MBM
event before the event can be read.

Report 'Unassigned' if the user attempts to read an event that has no
counter assigned.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v10: Moved the code to check the assign state inside mon_event_read().
     Fixed a few text comments.

v9: Used is_mbm_event() to check the event type.
    Minor user documentation update.

v8: Used MBM_EVENT_ARRAY_INDEX to get the index for the MBM event.
    Documentation update to make the text generic.

v7: Moved the documentation under "mon_data".
    Updated the text a little bit.

v6: Added more explanation in resctrl.rst.
    Added checks to detect "Unassigned" before reading RMID.

v5: New patch.
---
 Documentation/arch/x86/resctrl.rst        | 10 ++++++++++
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  8 ++++++++
 arch/x86/kernel/cpu/resctrl/internal.h    |  2 ++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    |  4 ++--
 4 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index c075fcee96b7..3ec14c314606 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -430,6 +430,16 @@ When monitoring is enabled all MON groups will also contain:
 	for the L3 cache they occupy). These are named "mon_sub_L3_YY"
 	where "YY" is the node number.
 
+	When supported the mbm_cntr_assign mode allows users to assign a
+	counter to mon_hw_id, event pair enabling bandwidth monitoring for
+	as long as the counter remains assigned. The hardware will continue
+	tracking the assigned mon_hw_id until the user manually unassigns
+	it, ensuring that counters are not reset during this period. With
+	a limited number of counters, the system may run out of assignable
+	counters. In that case, MBM event counters will return 'Unassigned'
+	when the event is read. Users must manually assign a counter to read
+	the events.
+
 "mon_hw_id":
 	Available only with debug option. The identifier used by hardware
 	for the monitor group. On x86 this is the RMID.
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 200d89a64027..8e265a86e524 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -527,6 +527,12 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 	/* When picking a CPU from cpu_mask, ensure it can't race with cpuhp */
 	lockdep_assert_cpus_held();
 
+	if (resctrl_arch_mbm_cntr_assign_enabled(r) && is_mbm_event(evtid) &&
+	    !mbm_cntr_assigned(r, d, rdtgrp, evtid)) {
+		rr->err = -ENOENT;
+		return;
+	}
+
 	/*
 	 * Setup the parameters to pass to mon_event_count() to read the data.
 	 */
@@ -618,6 +624,8 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 		seq_puts(m, "Error\n");
 	else if (rr.err == -EINVAL)
 		seq_puts(m, "Unavailable\n");
+	else if (rr.err == -ENOENT)
+		seq_puts(m, "Unassigned\n");
 	else
 		seq_printf(m, "%llu\n", rr.val);
 
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index f858098dbe4b..bb3213a7993e 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -710,4 +710,6 @@ int rdtgroup_unassign_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp
 				 struct rdt_mon_domain *d, enum resctrl_event_id evtid);
 struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
 				u32 rmid, enum resctrl_event_id evtid);
+bool mbm_cntr_assigned(struct rdt_resource *r, struct rdt_mon_domain *d,
+		       struct rdtgroup *rdtgrp, enum resctrl_event_id evtid);
 #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 5acae525881a..8d00b1689a80 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1967,8 +1967,8 @@ static int resctrl_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
 	return ret;
 }
 
-static bool mbm_cntr_assigned(struct rdt_resource *r, struct rdt_mon_domain *d,
-			      struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
+bool mbm_cntr_assigned(struct rdt_resource *r, struct rdt_mon_domain *d,
+		       struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
 {
 	int cntr_id;
 
-- 
2.34.1
Re: [PATCH v10 19/24] x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign mode
Posted by Reinette Chatre 11 months, 4 weeks ago
Hi Babu,

On 12/12/24 12:15 PM, Babu Moger wrote:
> In mbm_cntr_assign mode, the hardware counter should be assigned to read
> the MBM events.
> 
> Report 'Unassigned' in case the user attempts to read the events without
> assigning the counter.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---

..

> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index c075fcee96b7..3ec14c314606 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -430,6 +430,16 @@ When monitoring is enabled all MON groups will also contain:
>  	for the L3 cache they occupy). These are named "mon_sub_L3_YY"
>  	where "YY" is the node number.
>  
> +	When supported the mbm_cntr_assign mode allows users to assign a

"When supported" -> "When enabled"? Or perhaps just drop that and start with
"mbm_cntr_assign mode allows users ..."


> +	counter to mon_hw_id, event pair enabling bandwidth monitoring for
> +	as long as the counter remains assigned. The hardware will continue
> +	tracking the assigned mon_hw_id until the user manually unassigns
> +	it, ensuring that counters are not reset during this period. With
> +	a limited number of counters, the system may run out of assignable
> +	counters. In that case, MBM event counters will return 'Unassigned'
> +	when the event is read. Users must manually assign a counter to read
> +	the events.
> +
>  "mon_hw_id":
>  	Available only with debug option. The identifier used by hardware
>  	for the monitor group. On x86 this is the RMID.
> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> index 200d89a64027..8e265a86e524 100644
> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> @@ -527,6 +527,12 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
>  	/* When picking a CPU from cpu_mask, ensure it can't race with cpuhp */
>  	lockdep_assert_cpus_held();
>  
> +	if (resctrl_arch_mbm_cntr_assign_enabled(r) && is_mbm_event(evtid) &&
> +	    !mbm_cntr_assigned(r, d, rdtgrp, evtid)) {
> +		rr->err = -ENOENT;
> +		return;
> +	}
> +

hmmm ... d can be NULL here after the SNC support. Since the file that needs a
sum is essentially software backed I do not think assigning counters would
apply to it (but it may theoretically apply to the domains it consists of).
I think it may be safer to just move this check into rdtgroup_mondata_show()
where it reads data for a single domain.

I am not sure if we need to change the documentation because of this. One option
could be a rewording to "MBM event counters may return 'Unassigned' or
'Unavailable' when the event is read".

Reinette
Re: [PATCH v10 19/24] x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign mode
Posted by Moger, Babu 11 months, 4 weeks ago
Hi Reinette,

On 12/19/2024 5:59 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 12/12/24 12:15 PM, Babu Moger wrote:
>> In mbm_cntr_assign mode, the hardware counter should be assigned to read
>> the MBM events.
>>
>> Report 'Unassigned' in case the user attempts to read the events without
>> assigning the counter.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
> 
> ..
> 
>> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
>> index c075fcee96b7..3ec14c314606 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -430,6 +430,16 @@ When monitoring is enabled all MON groups will also contain:
>>   	for the L3 cache they occupy). These are named "mon_sub_L3_YY"
>>   	where "YY" is the node number.
>>   
>> +	When supported the mbm_cntr_assign mode allows users to assign a
> 
> "When supported" -> "When enabled"? Or perhaps just drop that and start with
> "mbm_cntr_assign mode allows users ..."

ok.

> 
> 
>> +	counter to mon_hw_id, event pair enabling bandwidth monitoring for
>> +	as long as the counter remains assigned. The hardware will continue
>> +	tracking the assigned mon_hw_id until the user manually unassigns
>> +	it, ensuring that counters are not reset during this period. With
>> +	a limited number of counters, the system may run out of assignable
>> +	counters. In that case, MBM event counters will return 'Unassigned'
>> +	when the event is read. Users must manually assign a counter to read
>> +	the events.
>> +
>>   "mon_hw_id":
>>   	Available only with debug option. The identifier used by hardware
>>   	for the monitor group. On x86 this is the RMID.
>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> index 200d89a64027..8e265a86e524 100644
>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> @@ -527,6 +527,12 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
>>   	/* When picking a CPU from cpu_mask, ensure it can't race with cpuhp */
>>   	lockdep_assert_cpus_held();
>>   
>> +	if (resctrl_arch_mbm_cntr_assign_enabled(r) && is_mbm_event(evtid) &&
>> +	    !mbm_cntr_assigned(r, d, rdtgrp, evtid)) {
>> +		rr->err = -ENOENT;
>> +		return;
>> +	}
>> +
> 
> hmmm ... d can be NULL here after the SNC support. Since the file that needs a
> sum is essentially software backed I do not think assigning counters would
> apply to it (but it may theoretically apply to the domains it consists of).
> I think it may be safer to just move this check into rdtgroup_mondata_show()
> where it reads data for a single domain.

Sure.

> 
> I am not sure if we need to change the documentation because of this. One option
> could be a rewording to "MBM event counters may return 'Unassigned' or
> 'Unavailable' when the event is read".

ok.
Thanks
Babu
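
For illustration, the restructuring agreed on above can be modelled outside
the kernel. This is a minimal user-space sketch, not the code from any later
version of the series: the types, the model_* names, the event enum, and the
placeholder value 42 are assumptions made for this example, and every error
other than -EINVAL and -ENOENT is collapsed into "Error" for brevity. It only
shows the control flow: the assignment check runs where a single domain is
read, and a NULL domain (the SNC sum case) skips the check instead of
dereferencing d.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum event_id { EVT_LLC_OCCUPANCY, EVT_MBM_TOTAL, EVT_MBM_LOCAL };

struct mon_domain {
	bool cntr_assigned[3];	/* indexed by enum event_id */
};

struct rmid_read {
	int err;
	unsigned long long val;
};

static bool is_mbm_event(enum event_id evtid)
{
	return evtid == EVT_MBM_TOTAL || evtid == EVT_MBM_LOCAL;
}

/* Model of the read path: it must tolerate d == NULL (sum over domains). */
static void model_event_read(struct rmid_read *rr, const struct mon_domain *d)
{
	(void)d;
	rr->err = 0;
	rr->val = 42;	/* placeholder counter value */
}

/*
 * Model of the show path with the assignment check moved into it.
 * Returns the status string to print, or NULL when rr->val should be
 * printed instead.
 */
static const char *model_mondata_show(bool assign_mode,
				      const struct mon_domain *d,
				      enum event_id evtid,
				      struct rmid_read *rr)
{
	assert(rr != NULL);
	rr->err = 0;
	rr->val = 0;

	/* Only a single-domain read (d != NULL) consults the assignment. */
	if (d && assign_mode && is_mbm_event(evtid) &&
	    !d->cntr_assigned[evtid])
		rr->err = -ENOENT;
	else
		model_event_read(rr, d);

	if (rr->err == -EINVAL)
		return "Unavailable";
	if (rr->err == -ENOENT)
		return "Unassigned";
	if (rr->err)
		return "Error";
	return NULL;
}
```

With the check guarded by d, a read of the summed file never reaches the
assignment lookup, while a single-domain read of an MBM event without a
counter yields "Unassigned" before any hardware read is attempted.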