rmid_ptrs[] is allocated from dom_data_init() but never free()d.
While the exit text ends up in the linker script's DISCARD section,
the direction of travel is for resctrl to be/have loadable modules.
Add resctrl_exit_mon_l3_config() to clean up any memory allocated
by rdt_get_mon_l3_config().
There is no reason to backport this to a stable kernel.
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v5:
* This patch is new
Changes since v6:
* Removed struct rdt_resource argument, added __exit markers to match the
only caller.
* Added a whole stack of functions to maintain symmetry.
---
arch/x86/kernel/cpu/resctrl/core.c | 6 ++++++
arch/x86/kernel/cpu/resctrl/internal.h | 1 +
arch/x86/kernel/cpu/resctrl/monitor.c | 15 +++++++++++++++
3 files changed, 22 insertions(+)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 19e0681f0435..0056c9962a44 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -992,7 +992,13 @@ late_initcall(resctrl_late_init);
static void __exit resctrl_exit(void)
{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+
cpuhp_remove_state(rdt_online);
+
+ if (r->mon_capable)
+ rdt_put_mon_l3_config(r);
+
rdtgroup_exit();
}
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index a4f1aa15f0a2..f68c6aecfa66 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -546,6 +546,7 @@ void closid_free(int closid);
int alloc_rmid(void);
void free_rmid(u32 rmid);
int rdt_get_mon_l3_config(struct rdt_resource *r);
+void __exit rdt_put_mon_l3_config(struct rdt_resource *r);
bool __init rdt_cpu_has(int flag);
void mon_event_count(void *info);
int rdtgroup_mondata_show(struct seq_file *m, void *arg);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index f136ac046851..5d9864919f1c 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -741,6 +741,16 @@ static int dom_data_init(struct rdt_resource *r)
return 0;
}
+static void __exit dom_data_exit(struct rdt_resource *r)
+{
+ mutex_lock(&rdtgroup_mutex);
+
+ kfree(rmid_ptrs);
+ rmid_ptrs = NULL;
+
+ mutex_unlock(&rdtgroup_mutex);
+}
+
static struct mon_evt llc_occupancy_event = {
.name = "llc_occupancy",
.evtid = QOS_L3_OCCUP_EVENT_ID,
@@ -830,6 +840,11 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
return 0;
}
+void __exit rdt_put_mon_l3_config(struct rdt_resource *r)
+{
+ dom_data_exit(r);
+}
+
void __init intel_rdt_mbm_apply_quirk(void)
{
int cf_index;
--
2.39.2
Hi James,
On 10/25/23 13:03, James Morse wrote:
> rmid_ptrs[] is allocated from dom_data_init() but never free()d.
>
> While the exit text ends up in the linker script's DISCARD section,
> the direction of travel is for resctrl to be/have loadable modules.
>
> Add resctrl_exit_mon_l3_config() to cleanup any memory allocated
> by rdt_get_mon_l3_config().
>
> There is no reason to backport this to a stable kernel.
>
> Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
> [...]
--
Thanks
Babu Moger
Hi Babu,

On 09/11/2023 20:28, Moger, Babu wrote:
> On 10/25/23 13:03, James Morse wrote:
> [...]
> Reviewed-by: Babu Moger <babu.moger@amd.com>

Thanks!

James
Hi James,
Subject refers to rdtgroup_exit() but the patch is actually changing
resctrl_exit().
On 10/25/2023 11:03 AM, James Morse wrote:
> rmid_ptrs[] is allocated from dom_data_init() but never free()d.
>
> While the exit text ends up in the linker script's DISCARD section,
> the direction of travel is for resctrl to be/have loadable modules.
>
> Add resctrl_exit_mon_l3_config() to cleanup any memory allocated
> by rdt_get_mon_l3_config().
To match what patch actually does it looks like this should rather be:
"Add resctrl_exit_mon_l3_config()" -> "Add resctrl_put_mon_l3_config()"
> [...]
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 19e0681f0435..0056c9962a44 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -992,7 +992,13 @@ late_initcall(resctrl_late_init);
>
> static void __exit resctrl_exit(void)
> {
> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> +
> cpuhp_remove_state(rdt_online);
> +
> + if (r->mon_capable)
> + rdt_put_mon_l3_config(r);
> +
> rdtgroup_exit();
> }
I expect cleanup to do the inverse of init. I do not know what was the
motivation for the rdtgroup_exit() to follow cpuhp_remove_state(), but I
was expecting this new cleanup to be done after rdtgroup_exit() to be the inverse
of init. This cleanup is inserted in the middle of two existing cleanups - could
you please elaborate on how this location was chosen?
Reinette
Hi Reinette
On 09/11/2023 17:39, Reinette Chatre wrote:
> Hi James,
>
> Subject refers to rdtgroup_exit() but the patch is actually changing
> resctrl_exit().
I'll fix that,
> On 10/25/2023 11:03 AM, James Morse wrote:
>> [...]
>
> To match what patch actually does it looks like this should rather be:
> "Add resctrl_exit_mon_l3_config()" -> "Add resctrl_put_mon_l3_config()"
[...]
>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>> index 19e0681f0435..0056c9962a44 100644
>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>> @@ -992,7 +992,13 @@ late_initcall(resctrl_late_init);
>>
>> static void __exit resctrl_exit(void)
>> {
>> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>> +
>> cpuhp_remove_state(rdt_online);
>> +
>> + if (r->mon_capable)
>> + rdt_put_mon_l3_config(r);
>> +
>> rdtgroup_exit();
>> }
>
> I expect cleanup to do the inverse of init. I do not know what was the
> motivation for the rdtgroup_exit() to follow cpuhp_remove_state()
This will invoke the hotplug callbacks, making it look to resctrl like all CPUs are
offline. This means it is then impossible for rdtgroup_exit() to race with the hotplug
notifiers. (if you could run this code...)
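
To be concrete about the pairing (a sketch; I'm quoting the callback names from memory, so
check core.c before relying on them):

	/* init: resctrl_late_init() registers the callbacks once all setup is done */
	rdt_online = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/resctrl/cat:online:",
				       resctrl_online_cpu, resctrl_offline_cpu);

	/* exit: runs resctrl_offline_cpu() on every online CPU before returning */
	cpuhp_remove_state(rdt_online);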
> but I
> was expecting this new cleanup to be done after rdtgroup_exit() to be inverse
> of init. This cleanup is inserted in middle of two existing cleanup - could
> you please elaborate how this location was chosen?
rdtgroup_exit() does nothing with the resctrl structures, it removes sysfs and debugfs
entries, and unregisters the filesystem.
Hypothetically, you can't observe any effect of the rmid_ptrs array being freed as
all the CPUs are offline and the overflow/limbo threads should have been cancelled.
Once cpuhp_remove_state() has been called, this really doesn't matter.
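
The cancellation happens on the domain-offline path; from memory it looks something like
this in domain_remove_cpu() (the exact guards may differ):

	if (is_mbm_enabled() && cpu == d->mbm_work_cpu)
		cancel_delayed_work(&d->mbm_over);	/* overflow counter work */

	if (is_llc_occupancy_enabled() && cpu == d->cqm_work_cpu)
		cancel_delayed_work(&d->cqm_limbo);	/* limbo polling work */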
Thanks,
James
Hi James,
On 12/13/2023 10:03 AM, James Morse wrote:
> On 09/11/2023 17:39, Reinette Chatre wrote:
>> On 10/25/2023 11:03 AM, James Morse wrote:
...
>>> [...]
>>
>> I expect cleanup to do the inverse of init. I do not know what was the
>> motivation for the rdtgroup_exit() to follow cpuhp_remove_state()
>
> This will invoke the hotplug callbacks, making it look to resctrl like all CPUs are
> offline. This means it is then impossible for rdtgroup_exit() to race with the hotplug
> notifiers. (if you could run this code...)
>
hmmm ... if there is a risk of such a race would the init code not also be
vulnerable to that with the notifiers up before rdtgroup_init()? The races you mention
are not obvious to me. I see the filesystem and hotplug code protected against races via
the mutex and static keys. Could you please elaborate on the flows of concern?
I am not advocating for cpuhp_remove_state() to be called later. I understand that
it simplifies the flows to consider.
>> but I
>> was expecting this new cleanup to be done after rdtgroup_exit() to be inverse
>> of init. This cleanup is inserted in middle of two existing cleanup - could
>> you please elaborate how this location was chosen?
>
> rdtgroup_exit() does nothing with the resctrl structures, it removes sysfs and debugfs
> entries, and unregisters the filesystem.
>
> Hypothetically, you can't observe any effect of the rmid_ptrs array being freed as
> all the CPUs are offline and the overflow/limbo threads should have been cancelled.
> Once cpuhp_remove_state() has been called, this really doesn't matter.
Sounds like nothing prevents this code from following the custom of cleanup to be
inverse of init (yet keep cpuhp_remove_state() first).
Reinette
Hi Reinette,
On 13/12/2023 23:27, Reinette Chatre wrote:
> Hi James,
>
> On 12/13/2023 10:03 AM, James Morse wrote:
>> On 09/11/2023 17:39, Reinette Chatre wrote:
>>> On 10/25/2023 11:03 AM, James Morse wrote:
>
> ...
>
>>>> [...]
>>>
>>> I expect cleanup to do the inverse of init. I do not know what was the
>>> motivation for the rdtgroup_exit() to follow cpuhp_remove_state()
>>
>> This will invoke the hotplug callbacks, making it look to resctrl like all CPUs are
>> offline. This means it is then impossible for rdtgroup_exit() to race with the hotplug
>> notifiers. (if you could run this code...)
> hmmm ... if there is a risk of such a race would the init code not also be
> vulnerable to that with the notifiers up before rdtgroup_init()?
Nope, because this array is allocated behind rdt_get_mon_l3_config(), which ultimately
comes from get_rdt_resources() in resctrl_late_init() - which calls cpuhp_setup_state()
after all this init work has been done.
(cpu hp always gives me a headache!)
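
For anyone following along, the allocation we're talking about is roughly this (paraphrased
from dom_data_init() in monitor.c; details may have drifted):

	rmid_ptrs = kcalloc(r->num_rmid, sizeof(struct rmid_entry), GFP_KERNEL);
	if (!rmid_ptrs)
		return -ENOMEM;

	for (i = 0; i < r->num_rmid; i++) {
		entry = &rmid_ptrs[i];
		INIT_LIST_HEAD(&entry->list);
		entry->rmid = i;
		list_add_tail(&entry->list, &rmid_free_lru);	/* free-list of RMIDs */
	}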
> The races you mention
> are not obvious to me. I see the filesystem and hotplug code protected against races via
> the mutex and static keys. Could you please elaborate on the flows of concern?
Functions like __check_limbo() (calling __rmid_entry()) are called under the
rdtgroup_mutex, but they don't consider that rmid_ptrs[] may be NULL.
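
__rmid_entry() is roughly this (again paraphrased, the exact shape may differ):

	static struct rmid_entry *__rmid_entry(u32 rmid)
	{
		struct rmid_entry *entry;

		/* No NULL check on rmid_ptrs[] - relies on the teardown ordering */
		entry = &rmid_ptrs[rmid];
		WARN_ON(entry->rmid != rmid);

		return entry;
	}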
But this could only happen if the limbo work ran after cpuhp_remove_state() - this can't
happen because the hotplug callbacks cancel the limbo work, and won't reschedule it if the
domain is going offline.
The only other path is via free_rmid(); I've not thought too much about this, as
resctrl_exit() can't actually be invoked - this code is discarded by the linker.
It could be run on MPAM, but only in response to an 'error interrupt' (which is optional)
- and all the MPAM error interrupts indicate a software bug.
I've only invoked this path once, and rdtgroup_exit()'s unregister_filesystem() didn't
remove all the files. I anticipate digging into this teardown code more once the bulk of
the MPAM driver is upstream.
> I am not advocating for cpuhp_remove_state() to be called later. I understand that
> it simplifies the flows to consider.
>
>>> but I
>>> was expecting this new cleanup to be done after rdtgroup_exit() to be inverse
>>> of init. This cleanup is inserted in middle of two existing cleanup - could
>>> you please elaborate how this location was chosen?
>>
>> rdtgroup_exit() does nothing with the resctrl structures, it removes sysfs and debugfs
>> entries, and unregisters the filesystem.
>>
>> Hypothetically, you can't observe any effect of the rmid_ptrs array being freed as
>> all the CPUs are offline and the overflow/limbo threads should have been cancelled.
>> Once cpuhp_remove_state() has been called, this really doesn't matter.
> Sounds like nothing prevents this code from following the custom of cleanup to be
> inverse of init (yet keep cpuhp_remove_state() first).
I'll put the rdt_put_mon_l3_config() call after rdtgroup_exit()...
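
That is, keeping cpuhp_remove_state() first per the above, the exit path would end up
roughly as:

	static void __exit resctrl_exit(void)
	{
		struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;

		cpuhp_remove_state(rdt_online);

		rdtgroup_exit();

		if (r->mon_capable)
			rdt_put_mon_l3_config(r);
	}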
Thanks,
James
Hi James,
On 12/14/2023 10:28 AM, James Morse wrote:
> Hi Reinette,
>
> On 13/12/2023 23:27, Reinette Chatre wrote:
>> Hi James,
>>
>> On 12/13/2023 10:03 AM, James Morse wrote:
>>> On 09/11/2023 17:39, Reinette Chatre wrote:
>>>> On 10/25/2023 11:03 AM, James Morse wrote:
>>
>> ...
>>
>>>>> [...]
>>>>
>>>> I expect cleanup to do the inverse of init. I do not know what was the
>>>> motivation for the rdtgroup_exit() to follow cpuhp_remove_state()
>>>
>>> This will invoke the hotplug callbacks, making it look to resctrl like all CPUs are
>>> offline. This means it is then impossible for rdtgroup_exit() to race with the hotplug
>>> notifiers. (if you could run this code...)
>
>> hmmm ... if there is a risk of such a race would the init code not also be
>> vulnerable to that with the notifiers up before rdtgroup_init()?
>
> Nope, because this array is allocated behind rdt_get_mon_l3_config(), which ultimately
> comes from get_rdt_resources() in resctrl_late_init() - which calls cpuhp_setup_state()
> after all this init work has been done.
>
> (cpu hp always gives me a headache!)
Right. My comment was specifically about rdtgroup_init(): I was attempting to understand
your view of its races with the hotplug notifiers, in response to your comment about the
hotplug notifiers' races with rdtgroup_exit().
The current order - the state initialization you mention, followed by the hotplug notifiers
that need that state - is sane, and implies that teardown should be expected in the inverse order.
>> The races you mention
>> are not obvious to me. I see the filesystem and hotplug code protected against races via
>> the mutex and static keys. Could you please elaborate on the flows of concern?
>
> Functions like __check_limbo() (calling __rmid_entry()) are called under the
> rdtgroup_mutex, but they don't consider that rmid_ptrs[] may be NULL.
>
> But this could only happen if the limbo work ran after cpuhp_remove_state() - this can't
> happen because the hotplug callbacks cancel the limbo work, and won't reschedule it if the
> domain is going offline.
>
>
> The only other path is via free_rmid(), I've not thought too much about this as
> resctrl_exit() can't actually be invoked - this code is discarded by the linker.
>
> It could be run on MPAM, but only in response to an 'error interrupt' (which is optional)
> - and all the MPAM error interrupts indicate a software bug.
This still just considers the resctrl state and hotplug notifiers.
I clearly am missing something. It is still not clear to me how this connects to your earlier
comment about races with the rdtgroup_exit() code ... how the hotplug notifiers race with the
filesystem register/unregister code.
>
> I've only invoked this path once, and rdtgroup_exit()s unregister_filesystem() didn't
> remove all the files. I anticipate digging into this teardown code more once the bulk of
> the MPAM driver is upstream.
>
>
>> I am not advocating for cpuhp_remove_state() to be called later. I understand that
>> it simplifies the flows to consider.
>>
>>>> but I
>>>> was expecting this new cleanup to be done after rdtgroup_exit() to be inverse
>>>> of init. This cleanup is inserted in middle of two existing cleanup - could
>>>> you please elaborate how this location was chosen?
>>>
>>> rdtgroup_exit() does nothing with the resctrl structures, it removes sysfs and debugfs
>>> entries, and unregisters the filesystem.
>>>
>>> Hypothetically, you can't observe any effect of the rmid_ptrs array being freed as
>>> all the CPUs are offline and the overflow/limbo threads should have been cancelled.
>>> Once cpuhp_remove_state() has been called, this really doesn't matter.
>
>> Sounds like nothing prevents this code from following the custom of cleanup to be
>> inverse of init (yet keep cpuhp_remove_state() first).
>
> I'll put the rdt_put_mon_l3_config() call after rdtgroup_exit()...
thank you
Reinette
Hi Reinette,

On 12/14/23 19:06, Reinette Chatre wrote:
> On 12/14/2023 10:28 AM, James Morse wrote:
>> On 13/12/2023 23:27, Reinette Chatre wrote:
>>> On 12/13/2023 10:03 AM, James Morse wrote:
>>>> On 09/11/2023 17:39, Reinette Chatre wrote:
> [...]
>
> This still just considers the resctrl state and hotplug notifiers.
>
> I clearly am missing something. It is still not clear to me how this connects to your earlier
> comment about races with the rdtgroup_exit() code ... how the hotplug notifiers race with the
> filesystem register/unregister code.

I don't think there is a specific problem there, this was mostly about unexpected surprises
because cpuhp/limbo_handler/overflow_handler all run asynchronously.

I may also have added confusion because the code added here moves into rdtgroup_exit(), which
is renamed resctrl_exit() as part of dragging all this out to /fs/. (This is also why I tried
to initially add it in its final location.)

Thanks,

James