[PATCH v1 10/31] x86/resctrl: Move monitor init work to a resctrl init call

James Morse posted 31 patches 1 year, 10 months ago
There is a newer version of this series
[PATCH v1 10/31] x86/resctrl: Move monitor init work to a resctrl init call
Posted by James Morse 1 year, 10 months ago
rdt_get_mon_l3_config() is called from the architecture's
resctrl_arch_late_init(), and initialises both architecture specific
fields, such as hw_res->mon_scale and resctrl filesystem fields
by calling dom_data_init().

To separate the filesystem and architecture parts of resctrl, this
function needs splitting up.

Add resctrl_mon_resource_init() to do the filesystem specific work,
and call it from resctrl_init(). This runs later, but is still before
the filesystem is mounted and the rmid_ptrs[] array can be used.

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/x86/kernel/cpu/resctrl/internal.h |  1 +
 arch/x86/kernel/cpu/resctrl/monitor.c  | 24 +++++++++++++++++-------
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |  4 ++++
 3 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 031948322eab..7a0c74779c53 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -540,6 +540,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg);
 void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 		    struct rdt_domain *d, struct rdtgroup *rdtgrp,
 		    int evtid, int first);
+int resctrl_mon_resource_init(void);
 void mbm_setup_overflow_handler(struct rdt_domain *dom,
 				unsigned long delay_ms,
 				int exclude_cpu);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 06565153ceb2..929ec1430b45 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1003,12 +1003,28 @@ static void l3_mon_evt_init(struct rdt_resource *r)
 		list_add_tail(&mbm_local_event.list, &r->evt_list);
 }
 
+int resctrl_mon_resource_init(void)
+{
+	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
+	int ret;
+
+	if (!r->mon_capable)
+		return 0;
+
+	ret = dom_data_init(r);
+	if (ret)
+		return ret;
+
+	l3_mon_evt_init(r);
+
+	return 0;
+}
+
 int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 {
 	unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 	unsigned int threshold;
-	int ret;
 
 	resctrl_rmid_realloc_limit = boot_cpu_data.x86_cache_size * 1024;
 	hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale;
@@ -1036,10 +1052,6 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 	 */
 	resctrl_rmid_realloc_threshold = resctrl_arch_round_mon_val(threshold);
 
-	ret = dom_data_init(r);
-	if (ret)
-		return ret;
-
 	if (rdt_cpu_has(X86_FEATURE_BMEC)) {
 		u32 eax, ebx, ecx, edx;
 
@@ -1057,8 +1069,6 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 		}
 	}
 
-	l3_mon_evt_init(r);
-
 	r->mon_capable = true;
 
 	return 0;
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 13c24cb18d76..7a9696f53f2b 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -4138,6 +4138,10 @@ int __init resctrl_init(void)
 
 	rdtgroup_setup_default();
 
+	ret = resctrl_mon_resource_init();
+	if (ret)
+		return ret;
+
 	ret = sysfs_create_mount_point(fs_kobj, "resctrl");
 	if (ret)
 		return ret;
-- 
2.39.2
Re: [PATCH v1 10/31] x86/resctrl: Move monitor init work to a resctrl init call
Posted by Shawn Wang 1 year, 10 months ago
Hi James,

On 3/22/24 12:50 AM, James Morse wrote:
> rdt_get_mon_l3_config() is called from the architecture's
> resctrl_arch_late_init(), and initialises both architecture specific
> fields, such as hw_res->mon_scale and resctrl filesystem fields
> by calling dom_data_init().
> 
> To separate the filesystem and architecture parts of resctrl, this
> function needs splitting up.
> 
> Add resctrl_mon_resource_init() to do the filesystem specific work,
> and call it from resctrl_init(). This runs later, but is still before
> the filesystem is mounted and the rmid_ptrs[] array can be used.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>   arch/x86/kernel/cpu/resctrl/internal.h |  1 +
>   arch/x86/kernel/cpu/resctrl/monitor.c  | 24 +++++++++++++++++-------
>   arch/x86/kernel/cpu/resctrl/rdtgroup.c |  4 ++++
>   3 files changed, 22 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 031948322eab..7a0c74779c53 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -540,6 +540,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg);
>   void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
>   		    struct rdt_domain *d, struct rdtgroup *rdtgrp,
>   		    int evtid, int first);
> +int resctrl_mon_resource_init(void);
>   void mbm_setup_overflow_handler(struct rdt_domain *dom,
>   				unsigned long delay_ms,
>   				int exclude_cpu);
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 06565153ceb2..929ec1430b45 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -1003,12 +1003,28 @@ static void l3_mon_evt_init(struct rdt_resource *r)
>   		list_add_tail(&mbm_local_event.list, &r->evt_list);
>   }
>   
> +int resctrl_mon_resource_init(void)
> +{
> +	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
> +	int ret;
> +
> +	if (!r->mon_capable)
> +		return 0;
> +
> +	ret = dom_data_init(r);
> +	if (ret)
> +		return ret;
> +
> +	l3_mon_evt_init(r);
> +
> +	return 0;
> +}
> +

Now x86 platform defaults to all monitoring features on the L3 cache, but some monitoring features may also be on the L2 cache or memory controller according to the MPAM spec. For example, the memory bandwidth monitors could be on the memory controller.

Will resctrl_mon_resource_init() consider this scenario?

Best Regards,
Shawn
Re: [PATCH v1 10/31] x86/resctrl: Move monitor init work to a resctrl init call
Posted by James Morse 1 year, 8 months ago
Hi Shawn,

On 10/04/2024 03:57, Shawn Wang wrote:
> On 3/22/24 12:50 AM, James Morse wrote:
>> rdt_get_mon_l3_config() is called from the architecture's
>> resctrl_arch_late_init(), and initialises both architecture specific
>> fields, such as hw_res->mon_scale and resctrl filesystem fields
>> by calling dom_data_init().
>>
>> To separate the filesystem and architecture parts of resctrl, this
>> function needs splitting up.
>>
>> Add resctrl_mon_resource_init() to do the filesystem specific work,
>> and call it from resctrl_init(). This runs later, but is still before
>> the filesystem is mounted and the rmid_ptrs[] array can be used.


> Now x86 platform defaults to all monitoring features on the L3 cache, but some monitoring
> features may also be on the L2 cache or memory controller according to the MPAM spec. For
> example, the memory bandwidth monitors could be on the memory controller.
> 
> Will resctrl_mon_resource_init() consider this scenario?

Nope.

The aim here is to get existing user-space software working on machines with MPAM.
You can build an MPAM machine with monitors on the L1, L2 and L5 - but resctrl can't
expose them, and there is no existing software that makes use of them.

The result is the MPAM driver plays fast and loose with some of this stuff to try and slot
the machine into a Xeon shaped hole.

I'd like to support this in the future, but it will need user visible filesystem changes -
its not worth discussing now.
I strongly suspect that declaring all monitors platform-specific, and plumbing them out
via perf is the best thing for everyone...



Thanks,

James
Re: [PATCH v1 10/31] x86/resctrl: Move monitor init work to a resctrl init call
Posted by Dave Martin 1 year, 10 months ago
On Wed, Apr 10, 2024 at 10:57:30AM +0800, Shawn Wang wrote:
> Hi James,
> 
> On 3/22/24 12:50 AM, James Morse wrote:
> > rdt_get_mon_l3_config() is called from the architecture's
> > resctrl_arch_late_init(), and initialises both architecture specific
> > fields, such as hw_res->mon_scale and resctrl filesystem fields
> > by calling dom_data_init().
> > 
> > To separate the filesystem and architecture parts of resctrl, this
> > function needs splitting up.
> > 
> > Add resctrl_mon_resource_init() to do the filesystem specific work,
> > and call it from resctrl_init(). This runs later, but is still before
> > the filesystem is mounted and the rmid_ptrs[] array can be used.
> > 
> > Signed-off-by: James Morse <james.morse@arm.com>
> > ---
> >   arch/x86/kernel/cpu/resctrl/internal.h |  1 +
> >   arch/x86/kernel/cpu/resctrl/monitor.c  | 24 +++++++++++++++++-------
> >   arch/x86/kernel/cpu/resctrl/rdtgroup.c |  4 ++++
> >   3 files changed, 22 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> > index 031948322eab..7a0c74779c53 100644
> > --- a/arch/x86/kernel/cpu/resctrl/internal.h
> > +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> > @@ -540,6 +540,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg);
> >   void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
> >   		    struct rdt_domain *d, struct rdtgroup *rdtgrp,
> >   		    int evtid, int first);
> > +int resctrl_mon_resource_init(void);
> >   void mbm_setup_overflow_handler(struct rdt_domain *dom,
> >   				unsigned long delay_ms,
> >   				int exclude_cpu);
> > diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> > index 06565153ceb2..929ec1430b45 100644
> > --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> > +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> > @@ -1003,12 +1003,28 @@ static void l3_mon_evt_init(struct rdt_resource *r)
> >   		list_add_tail(&mbm_local_event.list, &r->evt_list);
> >   }
> > +int resctrl_mon_resource_init(void)
> > +{
> > +	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
> > +	int ret;
> > +
> > +	if (!r->mon_capable)
> > +		return 0;
> > +
> > +	ret = dom_data_init(r);
> > +	if (ret)
> > +		return ret;
> > +
> > +	l3_mon_evt_init(r);
> > +
> > +	return 0;
> > +}
> > +
> 
> Now x86 platform defaults to all monitoring features on the L3 cache, but some monitoring features may also be on the L2 cache or memory controller according to the MPAM spec. For example, the memory bandwidth monitors could be on the memory controller.
> 
> Will resctrl_mon_resource_init() consider this scenario?
> 
> Best Regards,
> Shawn

My understanding is limited here, so this will probably have to wait
until James gets back for a full answer.

The resctrl userspace filesystem interface looks like it can describe
monitors at multiple levels of the memory system, so my assumption would
be that it should be possible to wire it up.

This series focuses on refactoring and does not try to add new
functionality, though.

Cheers
---Dave
Re: [PATCH v1 10/31] x86/resctrl: Move monitor init work to a resctrl init call
Posted by Reinette Chatre 1 year, 10 months ago
Hi James,

On 3/21/2024 9:50 AM, James Morse wrote:

...

> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 13c24cb18d76..7a9696f53f2b 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -4138,6 +4138,10 @@ int __init resctrl_init(void)
>  
>  	rdtgroup_setup_default();
>  
> +	ret = resctrl_mon_resource_init();
> +	if (ret)
> +		return ret;
> +
>  	ret = sysfs_create_mount_point(fs_kobj, "resctrl");
>  	if (ret)
>  		return ret;

Should the mon data be cleaned up if sysfs_create_mount_point() fails?

Reinette