[PATCH v9 03/27] x86/resctrl: Check all domains are offline in resctrl_exit()

James Morse posted 27 patches 7 months, 3 weeks ago
There is a newer version of this series
Posted by James Morse 7 months, 3 weeks ago
resctrl_exit() removes things like the resctrl mount point directory
and unregisters the filesystem prior to freeing data structures that
were allocated during resctrl_init().

This assumes that there are no online domains when resctrl_exit() is
called. If any domain were online, the limbo or overflow handler could
be scheduled to run.

Add a check for any online control or monitor domains, and document that
the architecture code is required to do this.

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v8:
 * This patch is new.
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 88197afbbb8a..f617ac97758b 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -4420,8 +4420,32 @@ int __init resctrl_init(void)
 	return ret;
 }
 
+static bool __exit resctrl_online_domains_exist(void)
+{
+	struct rdt_resource *r;
+
+	for_each_rdt_resource(r) {
+		if (!list_empty(&r->ctrl_domains) || !list_empty(&r->mon_domains))
+			return true;
+	}
+
+	return false;
+}
+
+/*
+ * resctrl_exit() - Remove the resctrl filesystem and free resources.
+ *
+ * When called by the architecture code, all CPUs and resctrl domains must be
+ * offline. This ensures the limbo and overflow handlers are not scheduled to
+ * run, meaning the data structures they access can be freed by
+ * resctrl_mon_resource_exit().
+ */
 void __exit resctrl_exit(void)
 {
+	cpus_read_lock();
+	WARN_ON_ONCE(resctrl_online_domains_exist());
+	cpus_read_unlock();
+
 	debugfs_remove_recursive(debugfs_resctrl);
 	unregister_filesystem(&rdt_fs_type);
 	sysfs_remove_mount_point(fs_kobj, "resctrl");
-- 
2.39.5
Re: [PATCH v9 03/27] x86/resctrl: Check all domains are offline in resctrl_exit()
Posted by Reinette Chatre 7 months, 3 weeks ago
Hi James,

On 4/25/25 10:37 AM, James Morse wrote:
> resctrl_exit() removes things like the resctrl mount point directory
> and unregisters the filesystem prior to freeing data structures that
> were allocated during resctrl_init().
> 
> This assumes that there are no online domains when resctrl_exit() is
> called. If any domain were online, the limbo or overflow handler could
> be scheduled to run.
> 
> Add a check for any online control or monitor domains, and document that
> the architecture code is required to do this.

nit: It may not be obvious at this point what "this" means. Above could be:

	Add a check for any online control or monitor domains, and document that
	the architecture code is required to offline all monitor and control
	domains before calling resctrl_exit().

> 
> Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v8:
>  * This patch is new.

Thank you for adding this.

> ---
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 88197afbbb8a..f617ac97758b 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -4420,8 +4420,32 @@ int __init resctrl_init(void)
>  	return ret;
>  }
>  
> +static bool __exit resctrl_online_domains_exist(void)
> +{
> +	struct rdt_resource *r;
> +
> +	for_each_rdt_resource(r) {
> +		if (!list_empty(&r->ctrl_domains) || !list_empty(&r->mon_domains))

A list needs to be initialized for list_empty() to behave as intended. A list within
an uninitialized or "kzalloc()'ed" struct will not be considered empty.
resctrl_arch_get_resource() as used by for_each_rdt_resource() already establishes
that if an architecture does not support a particular resource then it can (should?)
return a "dummy/not-capable" resource. I do not think resctrl should require
anything additionally like initializing the lists of a dummy/not-capable resource
to support things like this loop.
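
To illustrate the list_empty() point, here is a minimal user-space sketch, not kernel code. It assumes a stand-in struct list_head and re-implements list_empty() as the "head points at itself" comparison. init_list_head(), zeroed_head_is_empty() and initialized_head_is_empty() are hypothetical helper names used only for this illustration.

```c
#include <stdbool.h>
#include <string.h>

/* User-space stand-in for the kernel's struct list_head (assumption). */
struct list_head {
	struct list_head *next, *prev;
};

/* Same comparison list_empty() relies on: an empty head points at itself. */
static bool list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Mirrors what INIT_LIST_HEAD() does; the name here is hypothetical. */
static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* A zero-filled head, as it would appear inside a kzalloc()'ed struct. */
static bool zeroed_head_is_empty(void)
{
	struct list_head head;

	memset(&head, 0, sizeof(head));
	/* head.next is NULL, which is not &head, so this is "not empty". */
	return list_empty(&head);
}

/* A properly initialized head with no entries added. */
static bool initialized_head_is_empty(void)
{
	struct list_head head;

	init_list_head(&head);
	return list_empty(&head);
}
```

In this sketch zeroed_head_is_empty() returns false and initialized_head_is_empty() returns true, which is why a loop over dummy/not-capable resources could report domains that do not exist.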

Considering this, could this be made more specific? For example,

	for_each_alloc_capable_rdt_resource(r) {
		if (!list_empty(&r->ctrl_domains))
			return true;
	}

	for_each_mon_capable_rdt_resource(r) {
		if (!list_empty(&r->mon_domains))
			return true;
	}
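
The effect of splitting the loop can be modelled in user space as follows. This is only a sketch under stated assumptions: struct model_resource is a hypothetical, trimmed-down stand-in for struct rdt_resource, the capability checks are written out by hand instead of using the for_each_*_capable_rdt_resource() iterators, and model_setup() builds one capable resource with initialized lists plus one zero-filled dummy resource.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* User-space stand-in for the kernel's struct list_head (assumption). */
struct list_head {
	struct list_head *next, *prev;
};

static bool list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Hypothetical, trimmed-down model of struct rdt_resource. */
struct model_resource {
	bool alloc_capable;
	bool mon_capable;
	struct list_head ctrl_domains;
	struct list_head mon_domains;
};

/* Checks every resource, including dummy ones with zeroed list heads. */
static bool online_domains_exist_unfiltered(struct model_resource *res, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!list_empty(&res[i].ctrl_domains) || !list_empty(&res[i].mon_domains))
			return true;
	}
	return false;
}

/* Only looks at a list when the resource advertises the matching capability. */
static bool online_domains_exist_filtered(struct model_resource *res, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (res[i].alloc_capable && !list_empty(&res[i].ctrl_domains))
			return true;
	}
	for (size_t i = 0; i < n; i++) {
		if (res[i].mon_capable && !list_empty(&res[i].mon_domains))
			return true;
	}
	return false;
}

/* res[0]: capable, lists initialized and empty. res[1]: zeroed dummy. */
static void model_setup(struct model_resource res[2])
{
	memset(res, 0, 2 * sizeof(res[0]));
	res[0].alloc_capable = true;
	res[0].mon_capable = true;
	res[0].ctrl_domains.next = res[0].ctrl_domains.prev = &res[0].ctrl_domains;
	res[0].mon_domains.next = res[0].mon_domains.prev = &res[0].mon_domains;
}
```

With the two resources from model_setup(), the unfiltered check reports online domains because of the dummy's zeroed heads, while the filtered check does not.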
		
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
> +/*
> + * resctrl_exit() - Remove the resctrl filesystem and free resources.
> + *
> + * When called by the architecture code, all CPUs and resctrl domains must be
> + * offline. This ensures the limbo and overflow handlers are not scheduled to
> + * run, meaning the data structures they access can be freed by
> + * resctrl_mon_resource_exit().
> + */
>  void __exit resctrl_exit(void)
>  {
> +	cpus_read_lock();
> +	WARN_ON_ONCE(resctrl_online_domains_exist());
> +	cpus_read_unlock();
> +
>  	debugfs_remove_recursive(debugfs_resctrl);
>  	unregister_filesystem(&rdt_fs_type);
>  	sysfs_remove_mount_point(fs_kobj, "resctrl");

Thank you.

Reinette