[v9] x86/resctrl: Move the resctrl filesystem code to /fs/resctrl

[PATCH v9 04/27] x86/resctrl: resctrl_exit() teardown resctrl but leave the mount point

Posted by James Morse 9 months, 2 weeks ago

resctrl_exit() was intended for use when the 'resctrl' module was unloaded.
resctrl can't be built as a module, and the kernfs helpers are not exported
so this is unlikely to change. MPAM has an error interrupt which indicates
the MPAM driver has gone haywire. Should this occur tasks could run with
the wrong control values, leading to bad performance for important tasks.
In this scenario the MPAM driver will reset the hardware, but it needs
a way to tell resctrl that no further configuration should be attempted.

In particular, moving tasks between control or monitor groups does not
interact with the architecture code, so there is no opportunity for the
arch code to indicate that the hardware is no-longer functioning.

Using resctrl_exit() for this leaves the system in a funny state as
resctrl is still mounted, but cannot be un-mounted because the sysfs
directory that is typically used has been removed. Dave Martin suggests
this may cause systemd trouble in the future as not all filesystems
can be unmounted.

Add calls to remove all the files and directories in resctrl, and
remove the sysfs_remove_mount_point() call that leaves the system
in a funny state. When triggered, this causes all the resctrl files
to disappear. resctrl can be unmounted, but not mounted again.

Signed-off-by: James Morse <james.morse@arm.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
Changes since v8:
 * Call resctrl_fs_teardown() under cpuhp lock, for good measure.
 * Set debugfs_resctrl to NULL to be robust against a second call.
 * Added more documentation commentary.
 * Moved some words around to change the tone.

Changes since v7:
 * Moved kernfs cleanup functions to resctrl_fs_teardown(), and call this
   from rescrl_exit().
 * Added description of MPAM resetting the hardware to the commit message.

Changes since v6:
 * Added kdoc and comment to resctrl_exit().

Changes since v5:
 * Serialise rdtgroup_destroy_root() against umount().
 * Check rdtgroup_default.kn to protect against duplicate calls.
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 48 +++++++++++++++++++++-----
 1 file changed, 40 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index f617ac97758b..3d105b546842 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -3078,6 +3078,22 @@ static void rmdir_all_sub(void)
 	kernfs_remove(kn_mondata);
 }
 
+static void resctrl_fs_teardown(void)
+{
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	/* Cleared by rdtgroup_destroy_root() */
+	if (!rdtgroup_default.kn)
+		return;
+
+	rmdir_all_sub();
+	rdt_pseudo_lock_release();
+	rdtgroup_default.mode = RDT_MODE_SHAREABLE;
+	closid_exit();
+	schemata_list_destroy();
+	rdtgroup_destroy_root();
+}
+
 static void rdt_kill_sb(struct super_block *sb)
 {
 	struct rdt_resource *r;
@@ -3091,12 +3107,7 @@ static void rdt_kill_sb(struct super_block *sb)
 	for_each_alloc_capable_rdt_resource(r)
 		resctrl_arch_reset_all_ctrls(r);
 
-	rmdir_all_sub();
-	rdt_pseudo_lock_release();
-	rdtgroup_default.mode = RDT_MODE_SHAREABLE;
-	closid_exit();
-	schemata_list_destroy();
-	rdtgroup_destroy_root();
+	resctrl_fs_teardown();
 	if (resctrl_arch_alloc_capable())
 		resctrl_arch_disable_alloc();
 	if (resctrl_arch_mon_capable())
@@ -4127,6 +4138,8 @@ static int rdtgroup_setup_root(struct rdt_fs_context *ctx)
 
 static void rdtgroup_destroy_root(void)
 {
+	lockdep_assert_held(&rdtgroup_mutex);
+
 	kernfs_destroy_root(rdt_root);
 	rdtgroup_default.kn = NULL;
 }
@@ -4432,23 +4445,42 @@ static bool __exit resctrl_online_domains_exist(void)
 	return false;
 }
 
-/*
+/**
  * resctrl_exit() - Remove the resctrl filesystem and free resources.
  *
+ * Called by the architecture code in response to a fatal error.
+ * Removes resctrl files and structures from kernfs to prevent further
+ * configuration.
+ *
  * When called by the architecture code, all CPUs and resctrl domains must be
  * offline. This ensures the limbo and overflow handlers are not scheduled to
  * run, meaning the data structures they access can be freed by
  * resctrl_mon_resource_exit().
+ *
+ * After this function has returned, the architecture code should return an
+ * from all resctrl_arch_ functions that can do this.
+ * resctrl_arch_get_resource() must continue to return struct rdt_resources
+ * with the correct rid field to ensure the filesystem can be unmounted.
  */
 void __exit resctrl_exit(void)
 {
 	cpus_read_lock();
 	WARN_ON_ONCE(resctrl_online_domains_exist());
+
+	mutex_lock(&rdtgroup_mutex);
+	resctrl_fs_teardown();
+	mutex_unlock(&rdtgroup_mutex);
+
 	cpus_read_unlock();
 
 	debugfs_remove_recursive(debugfs_resctrl);
+	debugfs_resctrl = NULL;
 	unregister_filesystem(&rdt_fs_type);
-	sysfs_remove_mount_point(fs_kobj, "resctrl");
+
+	/*
+	 * Do not remove the sysfs mount point added by resctrl_init() so that
+	 * it can be used to umount resctrl.
+	 */
 
 	resctrl_mon_resource_exit();
 }
-- 
2.39.5

Re: [PATCH v9 04/27] x86/resctrl: resctrl_exit() teardown resctrl but leave the mount point

Posted by Reinette Chatre 9 months, 1 week ago

Hi James,

On 4/25/25 10:37 AM, James Morse wrote:
> @@ -4432,23 +4445,42 @@ static bool __exit resctrl_online_domains_exist(void)
>  	return false;
>  }
>  
> -/*
> +/**

Why make the switch to kernel-doc now? The benefit is not clear considering
resctrl_init() is not using kernel-doc.

>   * resctrl_exit() - Remove the resctrl filesystem and free resources.
>   *
> + * Called by the architecture code in response to a fatal error.
> + * Removes resctrl files and structures from kernfs to prevent further
> + * configuration.
> + *
>   * When called by the architecture code, all CPUs and resctrl domains must be
>   * offline. This ensures the limbo and overflow handlers are not scheduled to
>   * run, meaning the data structures they access can be freed by
>   * resctrl_mon_resource_exit().
> + *
> + * After this function has returned, the architecture code should return an

nit: "After this function has returned," -> "After resctrl_exit() returns, "

"should return an" -> "should return an error"?

> + * from all resctrl_arch_ functions that can do this.
> + * resctrl_arch_get_resource() must continue to return struct rdt_resources
> + * with the correct rid field to ensure the filesystem can be unmounted.

Is this to get through set_mba_sc() and the for_each_alloc_capable_rdt_resource(r)
loop in rdt_kill_sb() or is there something more subtle?

>   */
>  void __exit resctrl_exit(void)
>  {
>  	cpus_read_lock();
>  	WARN_ON_ONCE(resctrl_online_domains_exist());
> +
> +	mutex_lock(&rdtgroup_mutex);
> +	resctrl_fs_teardown();
> +	mutex_unlock(&rdtgroup_mutex);
> +
>  	cpus_read_unlock();
>  
>  	debugfs_remove_recursive(debugfs_resctrl);
> +	debugfs_resctrl = NULL;
>  	unregister_filesystem(&rdt_fs_type);
> -	sysfs_remove_mount_point(fs_kobj, "resctrl");
> +
> +	/*
> +	 * Do not remove the sysfs mount point added by resctrl_init() so that
> +	 * it can be used to umount resctrl.
> +	 */
>  
>  	resctrl_mon_resource_exit();
>  }

Looks good to me.

Thank you.

Reinette

Re: [PATCH v9 04/27] x86/resctrl: resctrl_exit() teardown resctrl but leave the mount point

Posted by James Morse 9 months, 1 week ago

Hi Reinette,

On 01/05/2025 18:03, Reinette Chatre wrote:
> On 4/25/25 10:37 AM, James Morse wrote:
>> @@ -4432,23 +4445,42 @@ static bool __exit resctrl_online_domains_exist(void)
>>  	return false;
>>  }
>>  
>> -/*
>> +/**

> Why make the switch to kernel-doc now? The benefit is not clear considering
> resctrl_init() is not using kernel-doc.

Just the ratcheting of 'add a comment' eventually leading to 'put it in kernel-doc'
once the comment becomes sufficiently long.


>>   * resctrl_exit() - Remove the resctrl filesystem and free resources.
>>   *
>> + * Called by the architecture code in response to a fatal error.
>> + * Removes resctrl files and structures from kernfs to prevent further
>> + * configuration.
>> + *
>>   * When called by the architecture code, all CPUs and resctrl domains must be
>>   * offline. This ensures the limbo and overflow handlers are not scheduled to
>>   * run, meaning the data structures they access can be freed by
>>   * resctrl_mon_resource_exit().
>> + *
>> + * After this function has returned, the architecture code should return an

> nit: "After this function has returned," -> "After resctrl_exit() returns, "

Sure,

> "should return an" -> "should return an error"?

Fixed, thanks!


>> + * from all resctrl_arch_ functions that can do this.
>> + * resctrl_arch_get_resource() must continue to return struct rdt_resources
>> + * with the correct rid field to ensure the filesystem can be unmounted.

> Is this to get through set_mba_sc() and the for_each_alloc_capable_rdt_resource(r)
> loop in rdt_kill_sb() or is there something more subtle?

The for_each walkers, which may also get used by the arch code. I don't have an example of
where this would go wrong, but felt it was worth noting that resctrl_arch_get_resource()
should not return NULL for all possible resources in this case - resctrl doesn't expect
that for any entry in the enum. Adding that error handling was too noisy, given that today
x86 has all the resources.

Tony suggested that get changed to searching a list if the list of possible resources
starts to grow.


>>   */
>>  void __exit resctrl_exit(void)
>>  {
>>  	cpus_read_lock();
>>  	WARN_ON_ONCE(resctrl_online_domains_exist());
>> +
>> +	mutex_lock(&rdtgroup_mutex);
>> +	resctrl_fs_teardown();
>> +	mutex_unlock(&rdtgroup_mutex);
>> +
>>  	cpus_read_unlock();
>>  
>>  	debugfs_remove_recursive(debugfs_resctrl);
>> +	debugfs_resctrl = NULL;
>>  	unregister_filesystem(&rdt_fs_type);
>> -	sysfs_remove_mount_point(fs_kobj, "resctrl");
>> +
>> +	/*
>> +	 * Do not remove the sysfs mount point added by resctrl_init() so that
>> +	 * it can be used to umount resctrl.
>> +	 */
>>  
>>  	resctrl_mon_resource_exit();
>>  }
> 
> Looks good to me.


Thanks!

James

Re: [PATCH v9 04/27] x86/resctrl: resctrl_exit() teardown resctrl but leave the mount point

Posted by Reinette Chatre 9 months, 1 week ago

Hi James,

On 5/7/25 9:48 AM, James Morse wrote:
> Hi Reinette,
> 
> On 01/05/2025 18:03, Reinette Chatre wrote:
>> On 4/25/25 10:37 AM, James Morse wrote:
>>> @@ -4432,23 +4445,42 @@ static bool __exit resctrl_online_domains_exist(void)
>>>  	return false;
>>>  }
>>>  
>>> -/*
>>> +/**
> 
>> Why make the switch to kernel-doc now? The benefit is not clear considering
>> resctrl_init() is not using kernel-doc.
> 
> Just the ratcheting of 'add a comment' eventually leading to 'put it in kernel-doc'
> once the comment becomes sufficiently long.

I see. Not something to pick on since resctrl surely is not consistent in this regard.

>>> + * from all resctrl_arch_ functions that can do this.
>>> + * resctrl_arch_get_resource() must continue to return struct rdt_resources
>>> + * with the correct rid field to ensure the filesystem can be unmounted.
> 
>> Is this to get through set_mba_sc() and the for_each_alloc_capable_rdt_resource(r)
>> loop in rdt_kill_sb() or is there something more subtle?
> 
> The for_each walkers, which may also get used by the arch code. I don't have an example of
> where this would go wrong, but felt it was worth noting that resctrl_arch_get_resource()
> should not return NULL for all possible resources in this case - resctrl doesn't expect
> that for any entry in the enum. Adding that error handling was too noisy, given that today
> x86 has all the resources.

I see. I also find resctrl_arch_get_resource()'s comment in include/linux/resctrl.h 
supportive of this. Having the extra note here is enforces this requirement.

> 
> Tony suggested that get changed to searching a list if the list of possible resources
> starts to grow.
> 

ack.

Reinette