[PATCH v6 33/42] x86/resctrl: resctrl_exit() teardown resctrl but leave the mount point

James Morse posted 42 patches 1 year ago
There is a newer version of this series
[PATCH v6 33/42] x86/resctrl: resctrl_exit() teardown resctrl but leave the mount point
Posted by James Morse 1 year ago
resctrl_exit() was intended for use when the 'resctrl' module was unloaded.
resctrl can't be built as a module, and the kernfs helpers are not exported
so this is unlikely to change. MPAM has an error interrupt which indicates
the MPAM driver has gone haywire. Should this occur tasks could run with
the wrong control values, leading to bad performance for important tasks.
The MPAM driver needs a way to tell resctrl that no further configuration
should be attempted.

Using resctrl_exit() for this leaves the system in a funny state as
resctrl is still mounted, but cannot be un-mounted because the sysfs
directory that is typically used has been removed. Dave Martin suggests
this may cause systemd trouble in the future as not all filesystems
can be unmounted.

Add calls to remove all the files and directories in resctrl, and
remove the sysfs_remove_mount_point() call that leaves the system
in a funny state. When triggered, this causes all the resctrl files
to disappear. resctrl can be unmounted, but not mounted again.

Signed-off-by: James Morse <james.morse@arm.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
Changes since v5:
 * Serialise rdtgroup_destroy_root() against umount().
 * Check rdtgroup_default.kn to protect against duplicate calls.
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 6e30283358d4..424622d2f959 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -4092,8 +4092,12 @@ static int rdtgroup_setup_root(struct rdt_fs_context *ctx)
 
 static void rdtgroup_destroy_root(void)
 {
-	kernfs_destroy_root(rdt_root);
-	rdtgroup_default.kn = NULL;
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	if (rdtgroup_default.kn) {
+		kernfs_destroy_root(rdt_root);
+		rdtgroup_default.kn = NULL;
+	}
 }
 
 static void __init rdtgroup_setup_default(void)
@@ -4371,9 +4375,12 @@ int __init resctrl_init(void)
 
 void __exit resctrl_exit(void)
 {
+	mutex_lock(&rdtgroup_mutex);
+	rdtgroup_destroy_root();
+	mutex_unlock(&rdtgroup_mutex);
+
 	debugfs_remove_recursive(debugfs_resctrl);
 	unregister_filesystem(&rdt_fs_type);
-	sysfs_remove_mount_point(fs_kobj, "resctrl");
 
 	resctrl_mon_resource_exit();
 }
-- 
2.39.2
Re: [PATCH v6 33/42] x86/resctrl: resctrl_exit() teardown resctrl but leave the mount point
Posted by Reinette Chatre 11 months, 3 weeks ago
Hi James,

On 2/7/25 10:18 AM, James Morse wrote:
> resctrl_exit() was intended for use when the 'resctrl' module was unloaded.
> resctrl can't be built as a module, and the kernfs helpers are not exported
> so this is unlikely to change. MPAM has an error interrupt which indicates
> the MPAM driver has gone haywire. Should this occur tasks could run with
> the wrong control values, leading to bad performance for important tasks.
> The MPAM driver needs a way to tell resctrl that no further configuration
> should be attempted.
> 
> Using resctrl_exit() for this leaves the system in a funny state as
> resctrl is still mounted, but cannot be un-mounted because the sysfs
> directory that is typically used has been removed. Dave Martin suggests
> this may cause systemd trouble in the future as not all filesystems
> can be unmounted.
> 
> Add calls to remove all the files and directories in resctrl, and
> remove the sysfs_remove_mount_point() call that leaves the system
> in a funny state. When triggered, this causes all the resctrl files
> to disappear. resctrl can be unmounted, but not mounted again.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> ---
> Changes since v5:
>  * Serialise rdtgroup_destroy_root() against umount().
>  * Check rdtgroup_default.kn to protect against duplicate calls.
> ---
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 6e30283358d4..424622d2f959 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -4092,8 +4092,12 @@ static int rdtgroup_setup_root(struct rdt_fs_context *ctx)
>  
>  static void rdtgroup_destroy_root(void)
>  {
> -	kernfs_destroy_root(rdt_root);
> -	rdtgroup_default.kn = NULL;
> +	lockdep_assert_held(&rdtgroup_mutex);
> +
> +	if (rdtgroup_default.kn) {
> +		kernfs_destroy_root(rdt_root);
> +		rdtgroup_default.kn = NULL;
> +	}
>  }
>  
>  static void __init rdtgroup_setup_default(void)
> @@ -4371,9 +4375,12 @@ int __init resctrl_init(void)
>  

Could you please add the kerneldoc you proposed in
https://lore.kernel.org/lkml/f2ecb501-bc65-49a9-903d-80ba1737845f@arm.com/ ?

>  void __exit resctrl_exit(void)
>  {
> +	mutex_lock(&rdtgroup_mutex);
> +	rdtgroup_destroy_root();
> +	mutex_unlock(&rdtgroup_mutex);
> +
>  	debugfs_remove_recursive(debugfs_resctrl);
>  	unregister_filesystem(&rdt_fs_type);
> -	sysfs_remove_mount_point(fs_kobj, "resctrl");
>  
>  	resctrl_mon_resource_exit();
>  }

It is difficult for me to follow the kernfs reference counting required
to make this work. Specifically, the root kn is "destroyed" here but it
is required to stick around until unmount when the rest of the files
are removed. Have you been able to test this flow? I think you mentioned
something like this before but I cannot find the details now.

Reinette
Re: [PATCH v6 33/42] x86/resctrl: resctrl_exit() teardown resctrl but leave the mount point
Posted by James Morse 11 months, 2 weeks ago
Hi Reinette,

On 20/02/2025 04:42, Reinette Chatre wrote:
> On 2/7/25 10:18 AM, James Morse wrote:
>> resctrl_exit() was intended for use when the 'resctrl' module was unloaded.
>> resctrl can't be built as a module, and the kernfs helpers are not exported
>> so this is unlikely to change. MPAM has an error interrupt which indicates
>> the MPAM driver has gone haywire. Should this occur tasks could run with
>> the wrong control values, leading to bad performance for important tasks.
>> The MPAM driver needs a way to tell resctrl that no further configuration
>> should be attempted.
>>
>> Using resctrl_exit() for this leaves the system in a funny state as
>> resctrl is still mounted, but cannot be un-mounted because the sysfs
>> directory that is typically used has been removed. Dave Martin suggests
>> this may cause systemd trouble in the future as not all filesystems
>> can be unmounted.
>>
>> Add calls to remove all the files and directories in resctrl, and
>> remove the sysfs_remove_mount_point() call that leaves the system
>> in a funny state. When triggered, this causes all the resctrl files
>> to disappear. resctrl can be unmounted, but not mounted again.

>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 6e30283358d4..424622d2f959 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -4371,9 +4375,12 @@ int __init resctrl_init(void)
>>  
> 
> Could you please add the kerneldoc you proposed in
> https://lore.kernel.org/lkml/f2ecb501-bc65-49a9-903d-80ba1737845f@arm.com/ ?

Huh. The way that is indented means I copied it out the file - I'm not sure went wrong
there. Thanks for fishing out the link!


>>  void __exit resctrl_exit(void)
>>  {
>> +	mutex_lock(&rdtgroup_mutex);
>> +	rdtgroup_destroy_root();
>> +	mutex_unlock(&rdtgroup_mutex);
>> +
>>  	debugfs_remove_recursive(debugfs_resctrl);
>>  	unregister_filesystem(&rdt_fs_type);
>> -	sysfs_remove_mount_point(fs_kobj, "resctrl");
>>  
>>  	resctrl_mon_resource_exit();
>>  }
> 
> It is difficult for me to follow the kernfs reference counting required
> to make this work. Specifically, the root kn is "destroyed" here but it
> is required to stick around until unmount when the rest of the files
> are removed.

This drops resctrl's reference to all of the files, which would make the files disappear.
unmount is what calls kernfs_kill_sb(), which gets rid of the root of the filesystem.


> Have you been able to test this flow? I think you mentioned
> something like this before but I cannot find the details now.

Yes:
https://web.git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/commit/?h=mpam/snapshot%2bextras/v6.14-rc1&id=8c96f858b25aa42694c5db56a2afe255ed8262dd

This is a debugfs file that schedules the threaded bit of the MPAM error interrupt
handler. I figure its MPAM specific because there is no way into this code on x86.
(the aim is to get the CI to tickle this)


Thanks,

James
Re: [PATCH v6 33/42] x86/resctrl: resctrl_exit() teardown resctrl but leave the mount point
Posted by Reinette Chatre 11 months, 2 weeks ago
Hi James,

On 2/28/25 11:54 AM, James Morse wrote:
> Hi Reinette,
> 
> On 20/02/2025 04:42, Reinette Chatre wrote:
>> On 2/7/25 10:18 AM, James Morse wrote:
>>> resctrl_exit() was intended for use when the 'resctrl' module was unloaded.
>>> resctrl can't be built as a module, and the kernfs helpers are not exported
>>> so this is unlikely to change. MPAM has an error interrupt which indicates
>>> the MPAM driver has gone haywire. Should this occur tasks could run with
>>> the wrong control values, leading to bad performance for important tasks.
>>> The MPAM driver needs a way to tell resctrl that no further configuration
>>> should be attempted.
>>>
>>> Using resctrl_exit() for this leaves the system in a funny state as
>>> resctrl is still mounted, but cannot be un-mounted because the sysfs
>>> directory that is typically used has been removed. Dave Martin suggests
>>> this may cause systemd trouble in the future as not all filesystems
>>> can be unmounted.
>>>
>>> Add calls to remove all the files and directories in resctrl, and
>>> remove the sysfs_remove_mount_point() call that leaves the system
>>> in a funny state. When triggered, this causes all the resctrl files
>>> to disappear. resctrl can be unmounted, but not mounted again.
> 
>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> index 6e30283358d4..424622d2f959 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> @@ -4371,9 +4375,12 @@ int __init resctrl_init(void)
>>>  
>>
>> Could you please add the kerneldoc you proposed in
>> https://lore.kernel.org/lkml/f2ecb501-bc65-49a9-903d-80ba1737845f@arm.com/ ?
> 
> Huh. The way that is indented means I copied it out the file - I'm not sure went wrong
> there. Thanks for fishing out the link!
> 
> 
>>>  void __exit resctrl_exit(void)
>>>  {
>>> +	mutex_lock(&rdtgroup_mutex);
>>> +	rdtgroup_destroy_root();
>>> +	mutex_unlock(&rdtgroup_mutex);
>>> +
>>>  	debugfs_remove_recursive(debugfs_resctrl);
>>>  	unregister_filesystem(&rdt_fs_type);
>>> -	sysfs_remove_mount_point(fs_kobj, "resctrl");
>>>  
>>>  	resctrl_mon_resource_exit();
>>>  }
>>
>> It is difficult for me to follow the kernfs reference counting required
>> to make this work. Specifically, the root kn is "destroyed" here but it
>> is required to stick around until unmount when the rest of the files
>> are removed.
> 
> This drops resctrl's reference to all of the files, which would make the files disappear.
> unmount is what calls kernfs_kill_sb(), which gets rid of the root of the filesystem.

My concern is mostly with the kernfs_remove() calls in the rdt_kill_sb()->rmdir_all_sub()
flow. For example:
	kernfs_remove(kn_info);
	kernfs_remove(kn_mongrp);
	kernfs_remove(kn_mondata);

As I understand the above require the destroyed root to still be around.

> 
> 
>> Have you been able to test this flow? I think you mentioned
>> something like this before but I cannot find the details now.
> 
> Yes:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/commit/?h=mpam/snapshot%2bextras/v6.14-rc1&id=8c96f858b25aa42694c5db56a2afe255ed8262dd
> 
> This is a debugfs file that schedules the threaded bit of the MPAM error interrupt
> handler. I figure its MPAM specific because there is no way into this code on x86.
> (the aim is to get the CI to tickle this)

Thank you.

Reinette
Re: [PATCH v6 33/42] x86/resctrl: resctrl_exit() teardown resctrl but leave the mount point
Posted by James Morse 11 months, 1 week ago
Hi Reinette,

On 01/03/2025 02:35, Reinette Chatre wrote:
> On 2/28/25 11:54 AM, James Morse wrote:
>> On 20/02/2025 04:42, Reinette Chatre wrote:
>>> On 2/7/25 10:18 AM, James Morse wrote:
>>>> resctrl_exit() was intended for use when the 'resctrl' module was unloaded.
>>>> resctrl can't be built as a module, and the kernfs helpers are not exported
>>>> so this is unlikely to change. MPAM has an error interrupt which indicates
>>>> the MPAM driver has gone haywire. Should this occur tasks could run with
>>>> the wrong control values, leading to bad performance for important tasks.
>>>> The MPAM driver needs a way to tell resctrl that no further configuration
>>>> should be attempted.
>>>>
>>>> Using resctrl_exit() for this leaves the system in a funny state as
>>>> resctrl is still mounted, but cannot be un-mounted because the sysfs
>>>> directory that is typically used has been removed. Dave Martin suggests
>>>> this may cause systemd trouble in the future as not all filesystems
>>>> can be unmounted.
>>>>
>>>> Add calls to remove all the files and directories in resctrl, and
>>>> remove the sysfs_remove_mount_point() call that leaves the system
>>>> in a funny state. When triggered, this causes all the resctrl files
>>>> to disappear. resctrl can be unmounted, but not mounted again.
>>
>>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>> index 6e30283358d4..424622d2f959 100644
>>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>> @@ -4371,9 +4375,12 @@ int __init resctrl_init(void)
>>>>  
>>>
>>> Could you please add the kerneldoc you proposed in
>>> https://lore.kernel.org/lkml/f2ecb501-bc65-49a9-903d-80ba1737845f@arm.com/ ?
>>
>> Huh. The way that is indented means I copied it out the file - I'm not sure went wrong
>> there. Thanks for fishing out the link!
>>
>>
>>>>  void __exit resctrl_exit(void)
>>>>  {
>>>> +	mutex_lock(&rdtgroup_mutex);
>>>> +	rdtgroup_destroy_root();
>>>> +	mutex_unlock(&rdtgroup_mutex);
>>>> +
>>>>  	debugfs_remove_recursive(debugfs_resctrl);
>>>>  	unregister_filesystem(&rdt_fs_type);
>>>> -	sysfs_remove_mount_point(fs_kobj, "resctrl");
>>>>  
>>>>  	resctrl_mon_resource_exit();
>>>>  }
>>>
>>> It is difficult for me to follow the kernfs reference counting required
>>> to make this work. Specifically, the root kn is "destroyed" here but it
>>> is required to stick around until unmount when the rest of the files
>>> are removed.
>>
>> This drops resctrl's reference to all of the files, which would make the files disappear.
>> unmount is what calls kernfs_kill_sb(), which gets rid of the root of the filesystem.
> 
> My concern is mostly with the kernfs_remove() calls in the rdt_kill_sb()->rmdir_all_sub()
> flow. For example:
> 	kernfs_remove(kn_info);
> 	kernfs_remove(kn_mongrp);
> 	kernfs_remove(kn_mondata);
> 
> As I understand the above require the destroyed root to still be around.

Right - because rdt_get_tree() has these global pointers into the hierarchy, but doesn't
take a reference. rmdir_all_sub() relies on always being called before
rdtgroup_destroy_root().

The point hack would be for rdtgroup_destroy_root() to NULL out those global pointers, (I
note they are left dangling) - that would make a subsequent call to rmdir_all_sub() harmless.

A better fix would be to pull out all the filesystem relevant parts from rdt_kill_sb(),
make that safe for multiple calls and get resctrl_exit() to call that.
A call to rdt_kill_sb() after resctrl_exit() would just cleanup the super-block.
This will leave things in a more predictable state.


>>> Have you been able to test this flow? I think you mentioned
>>> something like this before but I cannot find the details now.
>>
>> Yes:
>> https://web.git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/commit/?h=mpam/snapshot%2bextras/v6.14-rc1&id=8c96f858b25aa42694c5db56a2afe255ed8262dd
>>
>> This is a debugfs file that schedules the threaded bit of the MPAM error interrupt
>> handler. I figure its MPAM specific because there is no way into this code on x86.
>> (the aim is to get the CI to tickle this)



Thanks,

James
Re: [PATCH v6 33/42] x86/resctrl: resctrl_exit() teardown resctrl but leave the mount point
Posted by Reinette Chatre 11 months, 1 week ago
Hi James,

On 3/6/25 11:28 AM, James Morse wrote:
> Hi Reinette,
> 
> On 01/03/2025 02:35, Reinette Chatre wrote:
>> On 2/28/25 11:54 AM, James Morse wrote:
>>> On 20/02/2025 04:42, Reinette Chatre wrote:
>>>> On 2/7/25 10:18 AM, James Morse wrote:
>>>>> resctrl_exit() was intended for use when the 'resctrl' module was unloaded.
>>>>> resctrl can't be built as a module, and the kernfs helpers are not exported
>>>>> so this is unlikely to change. MPAM has an error interrupt which indicates
>>>>> the MPAM driver has gone haywire. Should this occur tasks could run with
>>>>> the wrong control values, leading to bad performance for important tasks.
>>>>> The MPAM driver needs a way to tell resctrl that no further configuration
>>>>> should be attempted.
>>>>>
>>>>> Using resctrl_exit() for this leaves the system in a funny state as
>>>>> resctrl is still mounted, but cannot be un-mounted because the sysfs
>>>>> directory that is typically used has been removed. Dave Martin suggests
>>>>> this may cause systemd trouble in the future as not all filesystems
>>>>> can be unmounted.
>>>>>
>>>>> Add calls to remove all the files and directories in resctrl, and
>>>>> remove the sysfs_remove_mount_point() call that leaves the system
>>>>> in a funny state. When triggered, this causes all the resctrl files
>>>>> to disappear. resctrl can be unmounted, but not mounted again.
>>>
>>>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>>> index 6e30283358d4..424622d2f959 100644
>>>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>>> @@ -4371,9 +4375,12 @@ int __init resctrl_init(void)
>>>>>  
>>>>
>>>> Could you please add the kerneldoc you proposed in
>>>> https://lore.kernel.org/lkml/f2ecb501-bc65-49a9-903d-80ba1737845f@arm.com/ ?
>>>
>>> Huh. The way that is indented means I copied it out the file - I'm not sure went wrong
>>> there. Thanks for fishing out the link!
>>>
>>>
>>>>>  void __exit resctrl_exit(void)
>>>>>  {
>>>>> +	mutex_lock(&rdtgroup_mutex);
>>>>> +	rdtgroup_destroy_root();
>>>>> +	mutex_unlock(&rdtgroup_mutex);
>>>>> +
>>>>>  	debugfs_remove_recursive(debugfs_resctrl);
>>>>>  	unregister_filesystem(&rdt_fs_type);
>>>>> -	sysfs_remove_mount_point(fs_kobj, "resctrl");
>>>>>  
>>>>>  	resctrl_mon_resource_exit();
>>>>>  }
>>>>
>>>> It is difficult for me to follow the kernfs reference counting required
>>>> to make this work. Specifically, the root kn is "destroyed" here but it
>>>> is required to stick around until unmount when the rest of the files
>>>> are removed.
>>>
>>> This drops resctrl's reference to all of the files, which would make the files disappear.
>>> unmount is what calls kernfs_kill_sb(), which gets rid of the root of the filesystem.
>>
>> My concern is mostly with the kernfs_remove() calls in the rdt_kill_sb()->rmdir_all_sub()
>> flow. For example:
>> 	kernfs_remove(kn_info);
>> 	kernfs_remove(kn_mongrp);
>> 	kernfs_remove(kn_mondata);
>>
>> As I understand the above require the destroyed root to still be around.
> 
> Right - because rdt_get_tree() has these global pointers into the hierarchy, but doesn't
> take a reference. rmdir_all_sub() relies on always being called before
> rdtgroup_destroy_root().
> 
> The point hack would be for rdtgroup_destroy_root() to NULL out those global pointers, (I
> note they are left dangling) - that would make a subsequent call to rmdir_all_sub() harmless.
> 
> A better fix would be to pull out all the filesystem relevant parts from rdt_kill_sb(),
> make that safe for multiple calls and get resctrl_exit() to call that.
> A call to rdt_kill_sb() after resctrl_exit() would just cleanup the super-block.
> This will leave things in a more predictable state.
> 
> 

Since V7 has been posted already I try to keep things coherent by copying this
exchange and responded to you there [1].

Reinette

[1] https://lore.kernel.org/lkml/053d8a62-022b-4bf8-8e47-651e7c3a2d59@intel.com/