From: Chen Ridong <chenridong@huawei.com>
Commit 406100f3da08 ("cpuset: fix race between hotplug work and later CPU
offline") added a check for empty effective_cpus in partitions for cgroup
v2. However, this check did not account for remote partitions, which were
introduced later.
After commit 2125c0034c5d ("cgroup/cpuset: Make cpuset hotplug processing
synchronous"), cgroup v2's cpuset hotplug handling is now synchronous. This
eliminates the race condition with subsequent CPU offline operations that
the original check aimed to fix.
Instead of extending the check to support remote partitions, this patch
removes the redundant partition effective_cpus check. Additionally, it adds
a check and warning to verify that all generated sched domains consist of
active CPUs, preventing partition_sched_domains from being invoked with
offline CPUs.
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
kernel/cgroup/cpuset.c | 29 ++++++-----------------------
1 file changed, 6 insertions(+), 23 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index daf813386260..1ac58e3f26b4 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1084,11 +1084,10 @@ void dl_rebuild_rd_accounting(void)
  */
 void rebuild_sched_domains_locked(void)
 {
-        struct cgroup_subsys_state *pos_css;
         struct sched_domain_attr *attr;
         cpumask_var_t *doms;
-        struct cpuset *cs;
         int ndoms;
+        int i;
 
         lockdep_assert_cpus_held();
         lockdep_assert_held(&cpuset_mutex);
@@ -1107,30 +1106,14 @@ void rebuild_sched_domains_locked(void)
             !cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
                 return;
 
-        /*
-         * With subpartition CPUs, however, the effective CPUs of a partition
-         * root should be only a subset of the active CPUs. Since a CPU in any
-         * partition root could be offlined, all must be checked.
-         */
-        if (!cpumask_empty(subpartitions_cpus)) {
-                rcu_read_lock();
-                cpuset_for_each_descendant_pre(cs, pos_css, &top_cpuset) {
-                        if (!is_partition_valid(cs)) {
-                                pos_css = css_rightmost_descendant(pos_css);
-                                continue;
-                        }
-                        if (!cpumask_subset(cs->effective_cpus,
-                                            cpu_active_mask)) {
-                                rcu_read_unlock();
-                                return;
-                        }
-                }
-                rcu_read_unlock();
-        }
-
         /* Generate domain masks and attrs */
         ndoms = generate_sched_domains(&doms, &attr);
 
+        for (i = 0; i < ndoms; ++i) {
+                if (WARN_ON_ONCE(!cpumask_subset(doms[i], cpu_active_mask)))
+                        return;
+        }
+
         /* Have scheduler rebuild the domains */
         partition_sched_domains(ndoms, doms, attr);
 }
--
2.34.1
On 11/18/25 3:36 AM, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> Commit 406100f3da08 ("cpuset: fix race between hotplug work and later CPU
> offline")added a check for empty effective_cpus in partitions for cgroup
> v2. However, thischeck did not account for remote partitions, which were
> introduced later.
>
> After commit 2125c0034c5d ("cgroup/cpuset: Make cpuset hotplug processing
> synchronous"),cgroup v2's cpuset hotplug handling is now synchronous. This
> eliminates the race condition with subsequent CPU offline operations that
> the original check aimed to fix.
That is true. The original asynchronous cpuset_hotplug_workfn() is
called after the hotplug operation finishes. So cpuset can be in a state
where cpu_active_mask was updated, but not the effective cpumasks in
cpuset.
>
> Instead of extending the check to support remote partitions, this patch
> removes the redundant partition effective_cpus check. Additionally, it adds
> a check and warningto verify that all generated sched domains consist of
"warningto" => "warning to"
> active CPUs, preventing partition_sched_domains from being invoked with
> offline CPUs.
>
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
> kernel/cgroup/cpuset.c | 29 ++++++-----------------------
> 1 file changed, 6 insertions(+), 23 deletions(-)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index daf813386260..1ac58e3f26b4 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -1084,11 +1084,10 @@ void dl_rebuild_rd_accounting(void)
>   */
>  void rebuild_sched_domains_locked(void)
>  {
> -        struct cgroup_subsys_state *pos_css;
>          struct sched_domain_attr *attr;
>          cpumask_var_t *doms;
> -        struct cpuset *cs;
>          int ndoms;
> +        int i;
>
>          lockdep_assert_cpus_held();
>          lockdep_assert_held(&cpuset_mutex);
In fact, the following code and the comments above in
rebuild_sched_domains_locked() are also no longer relevant. So you may
remove them as well.
        if (!top_cpuset.nr_subparts_cpus &&
            !cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
                return;
> @@ -1107,30 +1106,14 @@ void rebuild_sched_domains_locked(void)
>              !cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
>                  return;
>
> -        /*
> -         * With subpartition CPUs, however, the effective CPUs of a partition
> -         * root should be only a subset of the active CPUs. Since a CPU in any
> -         * partition root could be offlined, all must be checked.
> -         */
> -        if (!cpumask_empty(subpartitions_cpus)) {
> -                rcu_read_lock();
> -                cpuset_for_each_descendant_pre(cs, pos_css, &top_cpuset) {
> -                        if (!is_partition_valid(cs)) {
> -                                pos_css = css_rightmost_descendant(pos_css);
> -                                continue;
> -                        }
> -                        if (!cpumask_subset(cs->effective_cpus,
> -                                            cpu_active_mask)) {
> -                                rcu_read_unlock();
> -                                return;
> -                        }
> -                }
> -                rcu_read_unlock();
> -        }
> -
>          /* Generate domain masks and attrs */
>          ndoms = generate_sched_domains(&doms, &attr);
>
> +        for (i = 0; i < ndoms; ++i) {
> +                if (WARN_ON_ONCE(!cpumask_subset(doms[i], cpu_active_mask)))
> +                        return;
> +        }
> +
If the purpose of the WARN_ON_ONCE() call is not clear, we should add a
comment explaining that cpu_active_mask will not be out of sync with
cpuset's effective cpumasks, so the warning should never be triggered.
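For example, something along these lines (the exact wording is just a
suggestion):

        /*
         * With synchronous cpuset hotplug processing, cpuset's effective
         * cpumasks can no longer be out of sync with cpu_active_mask, so
         * none of the generated domains should contain an inactive CPU
         * and this warning is not expected to trigger.
         */
        for (i = 0; i < ndoms; ++i) {
                if (WARN_ON_ONCE(!cpumask_subset(doms[i], cpu_active_mask)))
                        return;
        }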
Cheers,
Longman
>          /* Have scheduler rebuild the domains */
>          partition_sched_domains(ndoms, doms, attr);
>  }
On 2025/11/26 2:16, Waiman Long wrote:
> On 11/18/25 3:36 AM, Chen Ridong wrote:
>> From: Chen Ridong <chenridong@huawei.com>
>>
>> Commit 406100f3da08 ("cpuset: fix race between hotplug work and later CPU
>> offline")added a check for empty effective_cpus in partitions for cgroup
>> v2. However, thischeck did not account for remote partitions, which were
>> introduced later.
>>
>> After commit 2125c0034c5d ("cgroup/cpuset: Make cpuset hotplug processing
>> synchronous"),cgroup v2's cpuset hotplug handling is now synchronous. This
>> eliminates the race condition with subsequent CPU offline operations that
>> the original check aimed to fix.
> That is true. The original asynchronous cpuset_hotplug_workfn() is called after the hotplug
> operation finishes. So cpuset can be in a state where cpu_active_mask was updated, but not the
> effective cpumasks in cpuset.
>>
>> Instead of extending the check to support remote partitions, this patch
>> removes the redundant partition effective_cpus check. Additionally, it adds
>> a check and warningto verify that all generated sched domains consist of
> "warningto" => "warning to"
Thank you, Longman,
will update.
>> active CPUs, preventing partition_sched_domains from being invoked with
>> offline CPUs.
>>
>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>> ---
>> kernel/cgroup/cpuset.c | 29 ++++++-----------------------
>> 1 file changed, 6 insertions(+), 23 deletions(-)
>>
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index daf813386260..1ac58e3f26b4 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -1084,11 +1084,10 @@ void dl_rebuild_rd_accounting(void)
>>   */
>>  void rebuild_sched_domains_locked(void)
>>  {
>> -        struct cgroup_subsys_state *pos_css;
>>          struct sched_domain_attr *attr;
>>          cpumask_var_t *doms;
>> -        struct cpuset *cs;
>>          int ndoms;
>> +        int i;
>>
>>          lockdep_assert_cpus_held();
>>          lockdep_assert_held(&cpuset_mutex);
>
> In fact, the following code and the comments above in rebuild_sched_domains_locked() are also no
> longer relevant. So you may remove them as well.
>
>         if (!top_cpuset.nr_subparts_cpus &&
>             !cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
>                 return;
>
Thank you for reminding me.
I initially retained this code because I believed it was still required for cgroup v1, as I recalled
that synchronous operation is exclusive to cgroup v2.
However, upon re-examining the code, I confirm it can be safely removed. For cgroup v1,
rebuild_sched_domains_locked is called synchronously, and only the migration task (handled by
cpuset_migrate_tasks_workfn) operates asynchronously. Consequently, cpuset_hotplug_workfn is
guaranteed to complete before the hotplug workflow finishes.
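That is, my understanding of the current flow is roughly as follows (a
sketch of the call chain, not exact code):

    CPU hotplug operation
      -> cpuset hotplug handling        /* synchronous since 2125c0034c5d */
           -> update effective cpumasks
           -> rebuild_sched_domains_locked()
           -> queue task migration      /* cpuset_migrate_tasks_workfn,
                                           v1 only, still asynchronous */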
>> @@ -1107,30 +1106,14 @@ void rebuild_sched_domains_locked(void)
>>              !cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
>>                  return;
>>
>> -        /*
>> -         * With subpartition CPUs, however, the effective CPUs of a partition
>> -         * root should be only a subset of the active CPUs. Since a CPU in any
>> -         * partition root could be offlined, all must be checked.
>> -         */
>> -        if (!cpumask_empty(subpartitions_cpus)) {
>> -                rcu_read_lock();
>> -                cpuset_for_each_descendant_pre(cs, pos_css, &top_cpuset) {
>> -                        if (!is_partition_valid(cs)) {
>> -                                pos_css = css_rightmost_descendant(pos_css);
>> -                                continue;
>> -                        }
>> -                        if (!cpumask_subset(cs->effective_cpus,
>> -                                            cpu_active_mask)) {
>> -                                rcu_read_unlock();
>> -                                return;
>> -                        }
>> -                }
>> -                rcu_read_unlock();
>> -        }
>> -
>>          /* Generate domain masks and attrs */
>>          ndoms = generate_sched_domains(&doms, &attr);
>>
>> +        for (i = 0; i < ndoms; ++i) {
>> +                if (WARN_ON_ONCE(!cpumask_subset(doms[i], cpu_active_mask)))
>> +                        return;
>> +        }
>> +
>
> If it is not clear about the purpose of the WARN_ON_ONCE() call, we should add a comment to explain
> that cpu_active_mask will not be out of sync with cpuset's effective cpumasks. So the warning should
> not be triggered.
>
Will add.
> Cheers,
> Longman
>
>>          /* Have scheduler rebuild the domains */
>>          partition_sched_domains(ndoms, doms, attr);
>>  }
>
--
Best regards,
Ridong
On 11/25/25 8:01 PM, Chen Ridong wrote:
>
> On 2025/11/26 2:16, Waiman Long wrote:
>>> active CPUs, preventing partition_sched_domains from being invoked with
>>> offline CPUs.
>>>
>>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>>> ---
>>> kernel/cgroup/cpuset.c | 29 ++++++-----------------------
>>> 1 file changed, 6 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>> index daf813386260..1ac58e3f26b4 100644
>>> --- a/kernel/cgroup/cpuset.c
>>> +++ b/kernel/cgroup/cpuset.c
>>> @@ -1084,11 +1084,10 @@ void dl_rebuild_rd_accounting(void)
>>>   */
>>>  void rebuild_sched_domains_locked(void)
>>>  {
>>> -        struct cgroup_subsys_state *pos_css;
>>>          struct sched_domain_attr *attr;
>>>          cpumask_var_t *doms;
>>> -        struct cpuset *cs;
>>>          int ndoms;
>>> +        int i;
>>>
>>>          lockdep_assert_cpus_held();
>>>          lockdep_assert_held(&cpuset_mutex);
>> In fact, the following code and the comments above in rebuild_sched_domains_locked() are also no
>> longer relevant. So you may remove them as well.
>>
>>         if (!top_cpuset.nr_subparts_cpus &&
>>             !cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
>>                 return;
>>
> Thank you for reminding me.
>
> I initially retained this code because I believed it was still required for cgroup v1, as I recalled
> that synchronous operation is exclusive to cgroup v2.
>
> However, upon re-examining the code, I confirm it can be safely removed. For cgroup v1,
> rebuild_sched_domains_locked is called synchronously, and only the migration task (handled by
> cpuset_migrate_tasks_workfn) operates asynchronously. Consequently, cpuset_hotplug_workfn is
> guaranteed to complete before the hotplug workflow finishes.
Yes, v1 still has a task migration part that is done asynchronously
because of the lock ordering issue. Even if this code had to be left in
because of v1, you should still update the comment to reflect that.
Please try to keep comments updated to help others better understand
what the code is doing.
Thanks,
Longman
On 2025/11/26 10:33, Waiman Long wrote:
>
> On 11/25/25 8:01 PM, Chen Ridong wrote:
>>
>> On 2025/11/26 2:16, Waiman Long wrote:
>>>> active CPUs, preventing partition_sched_domains from being invoked with
>>>> offline CPUs.
>>>>
>>>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>>>> ---
>>>> kernel/cgroup/cpuset.c | 29 ++++++-----------------------
>>>> 1 file changed, 6 insertions(+), 23 deletions(-)
>>>>
>>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>>> index daf813386260..1ac58e3f26b4 100644
>>>> --- a/kernel/cgroup/cpuset.c
>>>> +++ b/kernel/cgroup/cpuset.c
>>>> @@ -1084,11 +1084,10 @@ void dl_rebuild_rd_accounting(void)
>>>>   */
>>>>  void rebuild_sched_domains_locked(void)
>>>>  {
>>>> -        struct cgroup_subsys_state *pos_css;
>>>>          struct sched_domain_attr *attr;
>>>>          cpumask_var_t *doms;
>>>> -        struct cpuset *cs;
>>>>          int ndoms;
>>>> +        int i;
>>>>
>>>>          lockdep_assert_cpus_held();
>>>>          lockdep_assert_held(&cpuset_mutex);
>>> In fact, the following code and the comments above in rebuild_sched_domains_locked() are also no
>>> longer relevant. So you may remove them as well.
>>>
>>>         if (!top_cpuset.nr_subparts_cpus &&
>>>             !cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
>>>                 return;
>>>
>> Thank you for reminding me.
>>
>> I initially retained this code because I believed it was still required for cgroup v1, as I recalled
>> that synchronous operation is exclusive to cgroup v2.
>>
>> However, upon re-examining the code, I confirm it can be safely removed. For cgroup v1,
>> rebuild_sched_domains_locked is called synchronously, and only the migration task (handled by
>> cpuset_migrate_tasks_workfn) operates asynchronously. Consequently, cpuset_hotplug_workfn is
>> guaranteed to complete before the hotplug workflow finishes.
>
> Yes, v1 still have a task migration part that is done asynchronously because of the lock ordering
> issue. Even if this code has to be left because of v1, you should still update the comment to
> reflect that. Please try to keep the comment updated to help others to have a better understanding
> of what the code is doing.
>
> Thanks,
> Longman
>
Hi Longman,
Just to confirm (in case I misunderstood): I believe it is safe to remove the check on
top_cpuset.effective_cpus (for both cgroup v1 and v2). I will proceed to remove both the
corresponding code and its associated comment (rather than just updating the comment).
        if (!top_cpuset.nr_subparts_cpus &&
            !cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
                return;
Additionally, I should add a comment to clarify the rationale for introducing the
WARN_ON_ONCE(!cpumask_subset(doms[i], cpu_active_mask)) warning.
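Concretely, the resulting function would look roughly like this (a sketch;
the final comment wording may differ, and unchanged code between the
lockdep asserts and the generate call is elided):

    void rebuild_sched_domains_locked(void)
    {
            struct sched_domain_attr *attr;
            cpumask_var_t *doms;
            int ndoms;
            int i;

            lockdep_assert_cpus_held();
            lockdep_assert_held(&cpuset_mutex);
            ...

            /* Generate domain masks and attrs */
            ndoms = generate_sched_domains(&doms, &attr);

            /*
             * Hotplug processing keeps cpuset's effective cpumasks in sync
             * with cpu_active_mask, so no generated domain should contain
             * an inactive CPU.
             */
            for (i = 0; i < ndoms; ++i) {
                    if (WARN_ON_ONCE(!cpumask_subset(doms[i], cpu_active_mask)))
                            return;
            }

            /* Have scheduler rebuild the domains */
            partition_sched_domains(ndoms, doms, attr);
    }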
Does this approach look good to you? Please let me know if I’ve missed anything or if further
adjustments are needed.
--
Best regards,
Ridong
On 11/25/25 10:17 PM, Chen Ridong wrote:
>
> On 2025/11/26 10:33, Waiman Long wrote:
>> On 11/25/25 8:01 PM, Chen Ridong wrote:
>>> On 2025/11/26 2:16, Waiman Long wrote:
>>>>> active CPUs, preventing partition_sched_domains from being invoked with
>>>>> offline CPUs.
>>>>>
>>>>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>>>>> ---
>>>>> kernel/cgroup/cpuset.c | 29 ++++++-----------------------
>>>>> 1 file changed, 6 insertions(+), 23 deletions(-)
>>>>>
>>>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>>>> index daf813386260..1ac58e3f26b4 100644
>>>>> --- a/kernel/cgroup/cpuset.c
>>>>> +++ b/kernel/cgroup/cpuset.c
>>>>> @@ -1084,11 +1084,10 @@ void dl_rebuild_rd_accounting(void)
>>>>>   */
>>>>>  void rebuild_sched_domains_locked(void)
>>>>>  {
>>>>> -        struct cgroup_subsys_state *pos_css;
>>>>>          struct sched_domain_attr *attr;
>>>>>          cpumask_var_t *doms;
>>>>> -        struct cpuset *cs;
>>>>>          int ndoms;
>>>>> +        int i;
>>>>>
>>>>>          lockdep_assert_cpus_held();
>>>>>          lockdep_assert_held(&cpuset_mutex);
>>>> In fact, the following code and the comments above in rebuild_sched_domains_locked() are also no
>>>> longer relevant. So you may remove them as well.
>>>>
>>>>         if (!top_cpuset.nr_subparts_cpus &&
>>>>             !cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
>>>>                 return;
>>>>
>>> Thank you for reminding me.
>>>
>>> I initially retained this code because I believed it was still required for cgroup v1, as I recalled
>>> that synchronous operation is exclusive to cgroup v2.
>>>
>>> However, upon re-examining the code, I confirm it can be safely removed. For cgroup v1,
>>> rebuild_sched_domains_locked is called synchronously, and only the migration task (handled by
>>> cpuset_migrate_tasks_workfn) operates asynchronously. Consequently, cpuset_hotplug_workfn is
>>> guaranteed to complete before the hotplug workflow finishes.
>> Yes, v1 still have a task migration part that is done asynchronously because of the lock ordering
>> issue. Even if this code has to be left because of v1, you should still update the comment to
>> reflect that. Please try to keep the comment updated to help others to have a better understanding
>> of what the code is doing.
>>
>> Thanks,
>> Longman
>>
> Hi Longman,
>
> Just to confirm (in case I misunderstood): I believe it is safe to remove the check on
> top_cpuset.effective_cpus (for both cgroup v1 and v2). I will proceed to remove both the
> corresponding code and its associated comment(not update the comment).
>
>         if (!top_cpuset.nr_subparts_cpus &&
>             !cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
>                 return;
>
> Additionally, I should add a comment to clarify the rationale for introducing the
> WARN_ON_ONCE(!cpumask_subset(doms[i], cpu_active_mask)) warning.
>
> Does this approach look good to you? Please let me know if I’ve missed anything or if further
> adjustments are needed.
>
Yes, that is fine by me. I was just talking about a hypothetical
situation, not saying that you have to update the comment.
Cheers,
Longman
On 2025/11/26 11:26, Waiman Long wrote:
> On 11/25/25 10:17 PM, Chen Ridong wrote:
>>
>> On 2025/11/26 10:33, Waiman Long wrote:
>>> On 11/25/25 8:01 PM, Chen Ridong wrote:
>>>> On 2025/11/26 2:16, Waiman Long wrote:
>>>>>> active CPUs, preventing partition_sched_domains from being invoked with
>>>>>> offline CPUs.
>>>>>>
>>>>>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>>>>>> ---
>>>>>> kernel/cgroup/cpuset.c | 29 ++++++-----------------------
>>>>>> 1 file changed, 6 insertions(+), 23 deletions(-)
>>>>>>
>>>>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>>>>> index daf813386260..1ac58e3f26b4 100644
>>>>>> --- a/kernel/cgroup/cpuset.c
>>>>>> +++ b/kernel/cgroup/cpuset.c
>>>>>> @@ -1084,11 +1084,10 @@ void dl_rebuild_rd_accounting(void)
>>>>>>   */
>>>>>>  void rebuild_sched_domains_locked(void)
>>>>>>  {
>>>>>> -        struct cgroup_subsys_state *pos_css;
>>>>>>          struct sched_domain_attr *attr;
>>>>>>          cpumask_var_t *doms;
>>>>>> -        struct cpuset *cs;
>>>>>>          int ndoms;
>>>>>> +        int i;
>>>>>>
>>>>>>          lockdep_assert_cpus_held();
>>>>>>          lockdep_assert_held(&cpuset_mutex);
>>>>> In fact, the following code and the comments above in rebuild_sched_domains_locked() are also no
>>>>> longer relevant. So you may remove them as well.
>>>>>
>>>>>         if (!top_cpuset.nr_subparts_cpus &&
>>>>>             !cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
>>>>>                 return;
>>>>>
>>>> Thank you for reminding me.
>>>>
>>>> I initially retained this code because I believed it was still required for cgroup v1, as I
>>>> recalled
>>>> that synchronous operation is exclusive to cgroup v2.
>>>>
>>>> However, upon re-examining the code, I confirm it can be safely removed. For cgroup v1,
>>>> rebuild_sched_domains_locked is called synchronously, and only the migration task (handled by
>>>> cpuset_migrate_tasks_workfn) operates asynchronously. Consequently, cpuset_hotplug_workfn is
>>>> guaranteed to complete before the hotplug workflow finishes.
>>> Yes, v1 still have a task migration part that is done asynchronously because of the lock ordering
>>> issue. Even if this code has to be left because of v1, you should still update the comment to
>>> reflect that. Please try to keep the comment updated to help others to have a better understanding
>>> of what the code is doing.
>>>
>>> Thanks,
>>> Longman
>>>
>> Hi Longman,
>>
>> Just to confirm (in case I misunderstood): I believe it is safe to remove the check on
>> top_cpuset.effective_cpus (for both cgroup v1 and v2). I will proceed to remove both the
>> corresponding code and its associated comment(not update the comment).
>>
>>         if (!top_cpuset.nr_subparts_cpus &&
>>             !cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
>>                 return;
>>
>> Additionally, I should add a comment to clarify the rationale for introducing the
>> WARN_ON_ONCE(!cpumask_subset(doms[i], cpu_active_mask)) warning.
>>
>> Does this approach look good to you? Please let me know if I’ve missed anything or if further
>> adjustments are needed.
>>
> Yes, that is good for me. I was just talking about a hypothetical situation, not that you have to
> update the comment.
>
I see. Thanks.
--
Best regards,
Ridong
On 2025/11/18 16:36, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> Commit 406100f3da08 ("cpuset: fix race between hotplug work and later CPU
> offline")added a check for empty effective_cpus in partitions for cgroup
> v2. However, thischeck did not account for remote partitions, which were
> introduced later.
>
> After commit 2125c0034c5d ("cgroup/cpuset: Make cpuset hotplug processing
> synchronous"),cgroup v2's cpuset hotplug handling is now synchronous. This
> eliminates the race condition with subsequent CPU offline operations that
> the original check aimed to fix.
>
> Instead of extending the check to support remote partitions, this patch
> removes the redundant partition effective_cpus check. Additionally, it adds
> a check and warningto verify that all generated sched domains consist of
> active CPUs, preventing partition_sched_domains from being invoked with
> offline CPUs.
>
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
> kernel/cgroup/cpuset.c | 29 ++++++-----------------------
> 1 file changed, 6 insertions(+), 23 deletions(-)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index daf813386260..1ac58e3f26b4 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -1084,11 +1084,10 @@ void dl_rebuild_rd_accounting(void)
>   */
>  void rebuild_sched_domains_locked(void)
>  {
> -        struct cgroup_subsys_state *pos_css;
>          struct sched_domain_attr *attr;
>          cpumask_var_t *doms;
> -        struct cpuset *cs;
>          int ndoms;
> +        int i;
>
>          lockdep_assert_cpus_held();
>          lockdep_assert_held(&cpuset_mutex);
> @@ -1107,30 +1106,14 @@ void rebuild_sched_domains_locked(void)
>              !cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
>                  return;
>
> -        /*
> -         * With subpartition CPUs, however, the effective CPUs of a partition
> -         * root should be only a subset of the active CPUs. Since a CPU in any
> -         * partition root could be offlined, all must be checked.
> -         */
> -        if (!cpumask_empty(subpartitions_cpus)) {
> -                rcu_read_lock();
> -                cpuset_for_each_descendant_pre(cs, pos_css, &top_cpuset) {
> -                        if (!is_partition_valid(cs)) {
> -                                pos_css = css_rightmost_descendant(pos_css);
> -                                continue;
> -                        }
> -                        if (!cpumask_subset(cs->effective_cpus,
> -                                            cpu_active_mask)) {
> -                                rcu_read_unlock();
> -                                return;
> -                        }
> -                }
> -                rcu_read_unlock();
> -        }
> -
>          /* Generate domain masks and attrs */
>          ndoms = generate_sched_domains(&doms, &attr);
>
> +        for (i = 0; i < ndoms; ++i) {
> +                if (WARN_ON_ONCE(!cpumask_subset(doms[i], cpu_active_mask)))
> +                        return;
> +        }
> +
>          /* Have scheduler rebuild the domains */
>          partition_sched_domains(ndoms, doms, attr);
>  }
Friendly ping.
--
Best regards,
Ridong