[PATCH v6 0/8] cgroup/cpuset: Fix partition related locking issues

Posted by Waiman Long 1 month, 1 week ago
 v6:
  - Rebase on top of the latest v7.0 pre-RC linux tree.
  - Add another patch to fix an issue found during code inspection.
  - Revert to the v4 idea of just deferring the housekeeping_update()
    call to a workqueue to keep things simple, as the v5 change would
    add quite a bit more complexity to the cpuset code.

 v5:
  - https://lore.kernel.org/lkml/20260212164640.2408295-1-longman@redhat.com/

After booting the latest Linux debug kernel with the latest cgroup
changes as well as Frederic's "cpuset/isolation: Honour kthreads
preferred affinity" patch series [1] merged on top and running the
test_cpuset_prs.sh test, a circular locking dependency lockdep splat
was reported. See patch 5 for details.

To fix this issue, the cpuset code is modified to not call
housekeeping_update() with cpu_hotplug_lock held.  The cpuset hotplug
code is also modified to defer the housekeeping_update() call, if needed,
to a workqueue.  A new top-level cpuset_top_mutex is added to provide
more exclusion control.
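
As a rough illustration of the deferral idea only, here is a minimal
userspace sketch in plain C. It is not the actual cpuset code: every
name prefixed with model_ is hypothetical, the lock is reduced to a
boolean flag, and the workqueue is reduced to a single pending flag
that a later call drains. It only shows the ordering rule: the hotplug
path records that an update is needed while the lock is held, and the
update itself runs afterwards with the lock released.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are the kernel's definitions. */
bool model_hotplug_lock_held;   /* models cpu_hotplug_lock being held   */
bool model_update_pending;      /* models a queued work item            */
int  model_updates_done;        /* counts completed updates             */

/* Models housekeeping_update(): must not run with the lock held. */
void model_housekeeping_update(void)
{
	assert(!model_hotplug_lock_held);
	model_updates_done++;
}

/* Models the hotplug path: the lock is held, so only queue the work. */
void model_hotplug_handler(bool isolated_cpus_changed)
{
	model_hotplug_lock_held = true;
	if (isolated_cpus_changed)
		model_update_pending = true;  /* defer, do not call directly */
	model_hotplug_lock_held = false;
}

/* Models the work function: runs later, outside the hotplug lock. */
void model_run_deferred_work(void)
{
	if (!model_update_pending)
		return;
	model_update_pending = false;
	model_housekeeping_update();
}
```

In this model, calling model_hotplug_handler(true) only sets the
pending flag; the update is counted once model_run_deferred_work() is
called afterwards.  Until the deferred work runs, the update has not
been applied yet.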

With these changes in place, the cpuset test ran to completion with no
failure and no lockdep splat.

[1] https://lore.kernel.org/lkml/20260125224541.50226-1-frederic@kernel.org/

Waiman Long (8):
  cgroup/cpuset: Fix incorrect change to effective_xcpus in
    partition_xcpus_del()
  cgroup/cpuset: Fix incorrect use of cpuset_update_tasks_cpumask() in
    update_cpumasks_hier()
  cgroup/cpuset: Clarify exclusion rules for cpuset internal variables
  cgroup/cpuset: Set isolated_cpus_updating only if isolated_cpus is
    changed
  kselftest/cgroup: Simplify test_cpuset_prs.sh by removing "S+" command
  cgroup/cpuset: Move housekeeping_update()/rebuild_sched_domains()
    together
  cgroup/cpuset: Defer housekeeping_update() calls from CPU hotplug to
    workqueue
  cgroup/cpuset: Call housekeeping_update() without holding
    cpus_read_lock

 kernel/cgroup/cpuset.c                        | 220 +++++++++++------
 kernel/sched/isolation.c                      |   4 +-
 kernel/time/timer_migration.c                 |   4 +-
 .../selftests/cgroup/test_cpuset_prs.sh       | 225 +++++++++---------
 4 files changed, 265 insertions(+), 188 deletions(-)

-- 
2.53.0
Re: [PATCH v6 0/8] cgroup/cpuset: Fix partition related locking issues
Posted by Tejun Heo 1 month, 1 week ago
Hello,

> Waiman Long (8):
>   cgroup/cpuset: Fix incorrect change to effective_xcpus in partition_xcpus_del()
>   cgroup/cpuset: Fix incorrect use of cpuset_update_tasks_cpumask() in update_cpumasks_hier()
>   cgroup/cpuset: Clarify exclusion rules for cpuset internal variables
>   cgroup/cpuset: Set isolated_cpus_updating only if isolated_cpus is changed
>   kselftest/cgroup: Simplify test_cpuset_prs.sh by removing "S+" command
>   cgroup/cpuset: Move housekeeping_update()/rebuild_sched_domains() together
>   cgroup/cpuset: Defer housekeeping_update() calls from CPU hotplug to workqueue
>   cgroup/cpuset: Call housekeeping_update() without holding cpus_read_lock

Applied 1-8 to cgroup/for-7.0-fixes with the following minor fixups:

- 5/8: Removed a duplicate test entry that resulted from the "S+"
  removal (two previously-different lines becoming identical).

- 8/8: Fixed typos in commit message ("essentally" -> "essentially",
  "beforce" -> "before") and code comment ("top_cpuset_mutex" ->
  "cpuset_top_mutex").

This has gone through more than enough iterations. We can resolve any
further issues incrementally.

Thanks.

--
tejun
Re: [PATCH v6 0/8] cgroup/cpuset: Fix partition related locking issues
Posted by Frederic Weisbecker 1 month ago
On Mon, Feb 23, 2026 at 10:57:24AM -1000, Tejun Heo wrote:
> Hello,
> 
> > Waiman Long (8):
> >   cgroup/cpuset: Fix incorrect change to effective_xcpus in partition_xcpus_del()
> >   cgroup/cpuset: Fix incorrect use of cpuset_update_tasks_cpumask() in update_cpumasks_hier()
> >   cgroup/cpuset: Clarify exclusion rules for cpuset internal variables
> >   cgroup/cpuset: Set isolated_cpus_updating only if isolated_cpus is changed
> >   kselftest/cgroup: Simplify test_cpuset_prs.sh by removing "S+" command
> >   cgroup/cpuset: Move housekeeping_update()/rebuild_sched_domains() together
> >   cgroup/cpuset: Defer housekeeping_update() calls from CPU hotplug to workqueue
> >   cgroup/cpuset: Call housekeeping_update() without holding cpus_read_lock
> 
> Applied 1-8 to cgroup/for-7.0-fixes with the following minor fixups:
> 
> - 5/8: Removed a duplicate test entry that resulted from the "S+"
>   removal (two previously-different lines becoming identical).
> 
> - 8/8: Fixed typos in commit message ("essentally" -> "essentially",
>   "beforce" -> "before") and code comment ("top_cpuset_mutex" ->
>   "cpuset_top_mutex").
> 
> This has gone through more than enough iterations. We can resolve further
> issues if there's any incrementally.

We really need to check the consequences of the workqueue not being
flushed at any relevant point in hotplug, namely:

- An offline CPU might now appear in the live topology, which is quite
  dangerous.

- CPUs might not be (un)isolated in a timely manner when they are
  expected to be.

Thanks.

> 
> Thanks.
> 
> --
> tejun
Re: [PATCH v6 0/8] cgroup/cpuset: Fix partition related locking issues
Posted by Waiman Long 1 month, 1 week ago
On 2/23/26 3:57 PM, Tejun Heo wrote:
> Hello,
>
>> Waiman Long (8):
>>    cgroup/cpuset: Fix incorrect change to effective_xcpus in partition_xcpus_del()
>>    cgroup/cpuset: Fix incorrect use of cpuset_update_tasks_cpumask() in update_cpumasks_hier()
>>    cgroup/cpuset: Clarify exclusion rules for cpuset internal variables
>>    cgroup/cpuset: Set isolated_cpus_updating only if isolated_cpus is changed
>>    kselftest/cgroup: Simplify test_cpuset_prs.sh by removing "S+" command
>>    cgroup/cpuset: Move housekeeping_update()/rebuild_sched_domains() together
>>    cgroup/cpuset: Defer housekeeping_update() calls from CPU hotplug to workqueue
>>    cgroup/cpuset: Call housekeeping_update() without holding cpus_read_lock
> Applied 1-8 to cgroup/for-7.0-fixes with the following minor fixups:
>
> - 5/8: Removed a duplicate test entry that resulted from the "S+"
>    removal (two previously-different lines becoming identical).
>
> - 8/8: Fixed typos in commit message ("essentally" -> "essentially",
>    "beforce" -> "before") and code comment ("top_cpuset_mutex" ->
>    "cpuset_top_mutex").
>
> This has gone through more than enough iterations. We can resolve further
> issues if there's any incrementally.

Thanks for fixing the errors.

Cheers,
Longman
Re: [PATCH v6 0/8] cgroup/cpuset: Fix partition related locking issues
Posted by Chen Ridong 1 month ago

On 2026/2/24 5:11, Waiman Long wrote:
> 
> On 2/23/26 3:57 PM, Tejun Heo wrote:
>> Hello,
>>
>>> Waiman Long (8):
>>>    cgroup/cpuset: Fix incorrect change to effective_xcpus in
>>> partition_xcpus_del()
>>>    cgroup/cpuset: Fix incorrect use of cpuset_update_tasks_cpumask() in
>>> update_cpumasks_hier()
>>>    cgroup/cpuset: Clarify exclusion rules for cpuset internal variables
>>>    cgroup/cpuset: Set isolated_cpus_updating only if isolated_cpus is changed
>>>    kselftest/cgroup: Simplify test_cpuset_prs.sh by removing "S+" command
>>>    cgroup/cpuset: Move housekeeping_update()/rebuild_sched_domains() together
>>>    cgroup/cpuset: Defer housekeeping_update() calls from CPU hotplug to
>>> workqueue
>>>    cgroup/cpuset: Call housekeeping_update() without holding cpus_read_lock
>> Applied 1-8 to cgroup/for-7.0-fixes with the following minor fixups:
>>
>> - 5/8: Removed a duplicate test entry that resulted from the "S+"
>>    removal (two previously-different lines becoming identical).
>>
>> - 8/8: Fixed typos in commit message ("essentally" -> "essentially",
>>    "beforce" -> "before") and code comment ("top_cpuset_mutex" ->
>>    "cpuset_top_mutex").
>>
>> This has gone through more than enough iterations. We can resolve further
>> issues if there's any incrementally.
> 
> Thanks for fixing the errors.
> 
> Cheers,
> Longman
> 

This series looks good to me; it's much clearer now.

Thanks.

-- 
Best regards,
Ridong