[PATCH v3 16/38] x86/resctrl: Move monitor init work to a resctrl init call

James Morse posted 38 patches 1 year, 6 months ago
There is a newer version of this series
[PATCH v3 16/38] x86/resctrl: Move monitor init work to a resctrl init call
Posted by James Morse 1 year, 6 months ago
rdt_get_mon_l3_config() is called from the architecture's
resctrl_arch_late_init(), and initialises both architecture specific
fields, such as hw_res->mon_scale and resctrl filesystem fields
by calling dom_data_init().

To separate the filesystem and architecture parts of resctrl, this
function needs splitting up.

Add resctrl_mon_resource_init() to do the filesystem specific work,
and call it from resctrl_init(). This runs later, but is still before
the filesystem is mounted and the rmid_ptrs[] array can be used.

Signed-off-by: James Morse <james.morse@arm.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
---
Changes since v2:
 * Added error handling for the case sysfs files can't be created.
---
 arch/x86/kernel/cpu/resctrl/internal.h |  1 +
 arch/x86/kernel/cpu/resctrl/monitor.c  | 24 +++++++++++++++++-------
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |  9 ++++++++-
 3 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 9aa7f587484c..eaf458967fa1 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -542,6 +542,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg);
 void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 		    struct rdt_domain *d, struct rdtgroup *rdtgrp,
 		    int evtid, int first);
+int resctrl_mon_resource_init(void);
 void mbm_setup_overflow_handler(struct rdt_domain *dom,
 				unsigned long delay_ms,
 				int exclude_cpu);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 7d6aebce75c1..527c0e9d7b2e 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1016,12 +1016,28 @@ static void l3_mon_evt_init(struct rdt_resource *r)
 		list_add_tail(&mbm_local_event.list, &r->evt_list);
 }
 
+int resctrl_mon_resource_init(void)
+{
+	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
+	int ret;
+
+	if (!r->mon_capable)
+		return 0;
+
+	ret = dom_data_init(r);
+	if (ret)
+		return ret;
+
+	l3_mon_evt_init(r);
+
+	return 0;
+}
+
 int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 {
 	unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 	unsigned int threshold;
-	int ret;
 
 	resctrl_rmid_realloc_limit = boot_cpu_data.x86_cache_size * 1024;
 	hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale;
@@ -1049,10 +1065,6 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 	 */
 	resctrl_rmid_realloc_threshold = resctrl_arch_round_mon_val(threshold);
 
-	ret = dom_data_init(r);
-	if (ret)
-		return ret;
-
 	if (rdt_cpu_has(X86_FEATURE_BMEC)) {
 		u32 eax, ebx, ecx, edx;
 
@@ -1070,8 +1082,6 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 		}
 	}
 
-	l3_mon_evt_init(r);
-
 	r->mon_capable = true;
 
 	return 0;
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 8c380f389b93..9c03e973a5f6 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -4168,9 +4168,15 @@ int __init resctrl_init(void)
 
 	rdtgroup_setup_default();
 
+	ret = resctrl_mon_resource_init();
+	if (ret)
+		return ret;
+
 	ret = sysfs_create_mount_point(fs_kobj, "resctrl");
-	if (ret)
+	if (ret) {
+		resctrl_mon_resource_exit();
 		return ret;
+	}
 
 	ret = register_filesystem(&rdt_fs_type);
 	if (ret)
@@ -4203,6 +4209,7 @@ int __init resctrl_init(void)
 
 cleanup_mountpoint:
 	sysfs_remove_mount_point(fs_kobj, "resctrl");
+	resctrl_mon_resource_exit();
 
 	return ret;
 }
-- 
2.39.2
Re: [PATCH v3 16/38] x86/resctrl: Move monitor init work to a resctrl init call
Posted by Reinette Chatre 1 year, 5 months ago
Hi James,

On 6/14/24 8:00 AM, James Morse wrote:
> rdt_get_mon_l3_config() is called from the architecture's
> resctrl_arch_late_init(), and initialises both architecture specific
> fields, such as hw_res->mon_scale and resctrl filesystem fields
> by calling dom_data_init().
> 
> To separate the filesystem and architecture parts of resctrl, this
> function needs splitting up.
> 
> Add resctrl_mon_resource_init() to do the filesystem specific work,
> and call it from resctrl_init(). This runs later, but is still before
> the filesystem is mounted and the rmid_ptrs[] array can be used.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> ---
> Changes since v2:
>   * Added error handling for the case sysfs files can't be created.
> ---
>   arch/x86/kernel/cpu/resctrl/internal.h |  1 +
>   arch/x86/kernel/cpu/resctrl/monitor.c  | 24 +++++++++++++++++-------
>   arch/x86/kernel/cpu/resctrl/rdtgroup.c |  9 ++++++++-
>   3 files changed, 26 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 9aa7f587484c..eaf458967fa1 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -542,6 +542,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg);
>   void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
>   		    struct rdt_domain *d, struct rdtgroup *rdtgrp,
>   		    int evtid, int first);
> +int resctrl_mon_resource_init(void);
>   void mbm_setup_overflow_handler(struct rdt_domain *dom,
>   				unsigned long delay_ms,
>   				int exclude_cpu);
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 7d6aebce75c1..527c0e9d7b2e 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -1016,12 +1016,28 @@ static void l3_mon_evt_init(struct rdt_resource *r)
>   		list_add_tail(&mbm_local_event.list, &r->evt_list);
>   }
>   
> +int resctrl_mon_resource_init(void)

(Lack of an __init is unexpected but I assume it was done since that will be removed
in later patch anyway?)

This function needs a big warning to deter anybody from considering this to
be the place where any and all monitor related allocations happen. It needs
to warn developers that only resources that can only be touched after fs mount
may be allocated here.

Reinette
Re: [PATCH v3 16/38] x86/resctrl: Move monitor init work to a resctrl init call
Posted by James Morse 1 year, 5 months ago
Hi Reinette,

On 28/06/2024 17:47, Reinette Chatre wrote:
> On 6/14/24 8:00 AM, James Morse wrote:
>> rdt_get_mon_l3_config() is called from the architecture's
>> resctrl_arch_late_init(), and initialises both architecture specific
>> fields, such as hw_res->mon_scale and resctrl filesystem fields
>> by calling dom_data_init().
>>
>> To separate the filesystem and architecture parts of resctrl, this
>> function needs splitting up.
>>
>> Add resctrl_mon_resource_init() to do the filesystem specific work,
>> and call it from resctrl_init(). This runs later, but is still before
>> the filesystem is mounted and the rmid_ptrs[] array can be used.

>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index 7d6aebce75c1..527c0e9d7b2e 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -1016,12 +1016,28 @@ static void l3_mon_evt_init(struct rdt_resource *r)
>>           list_add_tail(&mbm_local_event.list, &r->evt_list);
>>   }
>>   +int resctrl_mon_resource_init(void)
> 
> (Lack of an __init is unexpected but I assume it was done since that will be removed
> in later patch anyway?)

Yup - I'll add and remove that if you find it surprising.


> This function needs a big warning to deter anybody from considering this to
> be the place where any and all monitor related allocations happen. It needs
> to warn developers that only resources that can only be touched after fs mount
> may be allocated here.

I'm afraid I don't follow. Can you give an example of the scenario you are worried about?

This is called from resctrl_init(), which is called once the architecture code has done
its setup, and reckons resctrl is something that can be supported on this platform. It
would be safe for the limbo/overflow callbacks to start ticking after this point - but
there is no point if the filesystem isn't mounted yet.
Filesystem mount is triggered through rdt_get_tree(). The filesystem can't be mounted
until resctrl_init() goes on to call register_filesystem().
These allocations could be made later (at mount time), but they're allocated once up-front
today.


I've added:
/**
 * resctrl_mon_resource_init() - Initialise monitoring structures.
 *
 * Allocate and initialise the rmid_ptrs[] used for the limbo and free lists.
 * Called once during boot after the struct rdt_resource's have been configured
 * but before the filesystem is mounted.
 */


Thanks,

James
Re: [PATCH v3 16/38] x86/resctrl: Move monitor init work to a resctrl init call
Posted by Reinette Chatre 1 year, 5 months ago
Hi James,

On 7/1/24 11:17 AM, James Morse wrote:
> Hi Reinette,
> 
> On 28/06/2024 17:47, Reinette Chatre wrote:
>> On 6/14/24 8:00 AM, James Morse wrote:
>>> rdt_get_mon_l3_config() is called from the architecture's
>>> resctrl_arch_late_init(), and initialises both architecture specific
>>> fields, such as hw_res->mon_scale and resctrl filesystem fields
>>> by calling dom_data_init().
>>>
>>> To separate the filesystem and architecture parts of resctrl, this
>>> function needs splitting up.
>>>
>>> Add resctrl_mon_resource_init() to do the filesystem specific work,
>>> and call it from resctrl_init(). This runs later, but is still before
>>> the filesystem is mounted and the rmid_ptrs[] array can be used.
> 
>>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>>> index 7d6aebce75c1..527c0e9d7b2e 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>>> @@ -1016,12 +1016,28 @@ static void l3_mon_evt_init(struct rdt_resource *r)
>>>            list_add_tail(&mbm_local_event.list, &r->evt_list);
>>>    }
>>>    +int resctrl_mon_resource_init(void)
>>
>> (Lack of an __init is unexpected but I assume it was done since that will be removed
>> in later patch anyway?)
> 
> Yup - I'll add and remove that if you find it surprising.
> 
> 
>> This function needs a big warning to deter anybody from considering this to
>> be the place where any and all monitor related allocations happen. It needs
>> to warn developers that only resources that can only be touched after fs mount
>> may be allocated here.
> 
> I'm afraid I don't follow. Can you give an example of the scenario you are worried about?

My concern is not a scenario with current code flow but a request for informational
comments to prevent future mistakes. Specifically, as I understand the CPU online/offline
handlers can run before this function is called. Those handlers do a lot of setup, getting
resctrl and the system ready. It can be reasonable that some future action may need to touch
a new monitoring structure and with a name like resctrl_mon_resource_init() it seems appropriate
to allocate this new monitoring structure there. I am hoping that resctrl_mon_resource_init()
will have sufficient comments to deter that.

> This is called from resctrl_init(), which is called once the architecture code has done
> its setup, and reckons resctrl is something that can be supported on this platform. It
> would be safe for the limbo/overflow callbacks to start ticking after this point - but
> there is no point if the filesystem isn't mounted yet.
> Filesystem mount is triggered through rdt_get_tree(). The filesystem can't be mounted
> until resctrl_init() goes on to call register_filesystem().
> These allocations could be made later (at mount time), but they're allocated once up-front
> today.
> 
> 
> I've added:
> /**
>   * resctrl_mon_resource_init() - Initialise monitoring structures.

How about a more specific "Initialise monitoring structures used after filesystem mount"?

>   *
>   * Allocate and initialise the rmid_ptrs[] used for the limbo and free lists.
>   * Called once during boot after the struct rdt_resource's have been configured
>   * but before the filesystem is mounted.

Can there be a warning (please feel free to improve):
	"Only for resources used after filesystem mount. For example, do not allocate resources
	 needed by the CPU online/offline handlers since these handlers may run before this
	 function."

>   */
> 
> 
> Thanks,
> 
> James

Reinette
Re: [PATCH v3 16/38] x86/resctrl: Move monitor init work to a resctrl init call
Posted by James Morse 1 year, 4 months ago
Hi Reinette,

On 01/07/2024 22:11, Reinette Chatre wrote:
> On 7/1/24 11:17 AM, James Morse wrote:
>> On 28/06/2024 17:47, Reinette Chatre wrote:
>>> On 6/14/24 8:00 AM, James Morse wrote:
>>>> rdt_get_mon_l3_config() is called from the architecture's
>>>> resctrl_arch_late_init(), and initialises both architecture specific
>>>> fields, such as hw_res->mon_scale and resctrl filesystem fields
>>>> by calling dom_data_init().
>>>>
>>>> To separate the filesystem and architecture parts of resctrl, this
>>>> function needs splitting up.
>>>>
>>>> Add resctrl_mon_resource_init() to do the filesystem specific work,
>>>> and call it from resctrl_init(). This runs later, but is still before
>>>> the filesystem is mounted and the rmid_ptrs[] array can be used.
>>
>>>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
>>>> b/arch/x86/kernel/cpu/resctrl/monitor.c
>>>> index 7d6aebce75c1..527c0e9d7b2e 100644
>>>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>>>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>>>> @@ -1016,12 +1016,28 @@ static void l3_mon_evt_init(struct rdt_resource *r)
>>>>            list_add_tail(&mbm_local_event.list, &r->evt_list);
>>>>    }
>>>>    +int resctrl_mon_resource_init(void)
>>>
>>> (Lack of an __init is unexpected but I assume it was done since that will be removed
>>> in later patch anyway?)
>>
>> Yup - I'll add and remove that if you find it surprising.
>>
>>
>>> This function needs a big warning to deter anybody from considering this to
>>> be the place where any and all monitor related allocations happen. It needs
>>> to warn developers that only resources that can only be touched after fs mount
>>> may be allocated here.
>>
>> I'm afraid I don't follow. Can you give an example of the scenario you are worried about?

> My concern is not a scenario with current code flow but a request for informational
> comments to prevent future mistakes. Specifically, as I understand the CPU online/offline
> handlers can run before this function is called. Those handlers do a lot of setup, getting
> resctrl and the system ready. It can be reasonable that some future action may need to touch
> a new monitoring structure and with a name like resctrl_mon_resource_init() it seems
> appropriate
> to allocate this new monitoring structure there. I am hoping that resctrl_mon_resource_init()
> will have sufficient comments to deter that.

Ah, Of course! ... this is about 'global' allocations that don't belong to a specific domain.

I've reworded the comment above the function as:
| * Allocate and initialise global monitor resources that do not belong to a
| * specific domain. i.e. the rmid_ptrs[] used for the limbo and free lists.
| * Called once during boot after the struct rdt_resource's have been configured
| * but before the filesystem is mounted.
| * Resctrl's cpuhp callbacks may be called before this point to bring a domain
| * online.

and a similar comment above domain_setup_mon_state:
| * Allocate monitor resources that belong to this domain.
| * Called when the first CPU of a domain comes online, regardless of whether
| * the filesystem is mounted.
| * During boot this may be called before global allocations have been made by
| * resctrl_mon_resource_init().



>> This is called from resctrl_init(), which is called once the architecture code has done
>> its setup, and reckons resctrl is something that can be supported on this platform. It
>> would be safe for the limbo/overflow callbacks to start ticking after this point - but
>> there is no point if the filesystem isn't mounted yet.
>> Filesystem mount is triggered through rdt_get_tree(). The filesystem can't be mounted
>> until resctrl_init() goes on to call register_filesystem().
>> These allocations could be made later (at mount time), but they're allocated once up-front
>> today.
>>
>>
>> I've added:
>> /**
>>   * resctrl_mon_resource_init() - Initialise monitoring structures.
> 
> How about a more specific "Initialise monitoring structures used after filesystem mount"?

Sure, this has become;
| * resctrl_mon_resource_init() - Initialise global monitoring structures used
| *				  after filesystem mount.


>>   *
>>   * Allocate and initialise the rmid_ptrs[] used for the limbo and free lists.
>>   * Called once during boot after the struct rdt_resource's have been configured
>>   * but before the filesystem is mounted.
> 
> Can there be a warning (please feel free to improve):
>     "Only for resources used after filesystem mount. For example, do not allocate resources
>      needed by the CPU online/offline handlers since these handlers may run before this
>      function."

Enumerating what not to do feels like the beginning of a never ending story!
I think describing these as global/specific-to-a-domain makes it clear what kind of
allocation should go here.


Thanks,

James