[PATCH 00/33 v2] cpuset/isolation: Honour kthreads preferred affinity

Frederic Weisbecker posted 33 patches 1 month ago
[PATCH 00/33 v2] cpuset/isolation: Honour kthreads preferred affinity
Posted by Frederic Weisbecker 1 month ago
Hi,

The kthread code was enhanced lately to provide an infrastructure which
manages the preferred affinity of unbound kthreads (node or custom
cpumask) against housekeeping constraints and CPU hotplug events.

One crucial missing piece is cpuset: when an isolated partition is
created, deleted, or its CPUs updated, all the unbound kthreads in the
top cpuset are made affine to _all_ the non-isolated CPUs, possibly
breaking their preferred affinity along the way.

Solve this by moving the kthreads affinity update out of cpuset and
into the consolidated kthread affinity code, so that preferred
affinities are honoured.
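
To illustrate what honouring a preferred affinity can mean, here is a
minimal userspace sketch that models cpumasks as 64-bit words. The type
cpumask_model_t, the function effective_affinity() and the fallback
policy are assumptions made for illustration, not the code from this
series: the preferred mask is intersected with the housekeeping mask,
and the whole housekeeping mask is used if the intersection is empty.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, up to 64 CPUs. */
typedef uint64_t cpumask_model_t;

/*
 * Restrict a kthread's preferred CPUs to the housekeeping set.
 * If none of the preferred CPUs is a housekeeping CPU, fall back
 * to the whole housekeeping set (assumed policy for this sketch).
 */
static cpumask_model_t effective_affinity(cpumask_model_t preferred,
                                          cpumask_model_t housekeeping)
{
    cpumask_model_t mask;

    /* At least one housekeeping CPU must exist. */
    assert(housekeeping != 0);

    mask = preferred & housekeeping;
    return mask ? mask : housekeeping;
}
```

With CPUs 0-3 preferred (0x0F) and CPUs 2-5 as housekeeping (0x3C), the
sketch yields CPUs 2-3 (0x0C). With no overlap at all, it yields the
housekeeping set unchanged.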

The dispatch of the new cpumasks to workqueues and kthreads is performed
by housekeeping, as per Tejun's nice suggestion.

As a welcome side effect, HK_TYPE_DOMAIN then integrates both the set
from isolcpus= and cpuset isolated partitions. Housekeeping cpumasks are
now modifiable with specific synchronization. This is a big step toward
making nohz_full= also mutable through cpuset in the future.
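
As a rough model of that combination, the userspace sketch below
computes a housekeeping domain mask from the possible CPUs minus the
CPUs isolated at boot and the CPUs isolated through cpuset. The names
cpumask_model_t and hk_domain_update() are hypothetical, and rejecting
an update that would leave no housekeeping CPU is an assumption of the
sketch rather than a description of the actual patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, up to 64 CPUs. */
typedef uint64_t cpumask_model_t;

/*
 * Compute the housekeeping domain mask: every possible CPU that is
 * isolated neither at boot (isolcpus=) nor by a cpuset isolated
 * partition. Returns false and leaves *out untouched if no
 * housekeeping CPU would remain.
 */
static bool hk_domain_update(cpumask_model_t possible,
                             cpumask_model_t boot_isolated,
                             cpumask_model_t cpuset_isolated,
                             cpumask_model_t *out)
{
    cpumask_model_t hk;

    assert(out != NULL);

    hk = possible & ~(boot_isolated | cpuset_isolated);
    if (!hk)
        return false;

    *out = hk;
    return true;
}
```

On an 8-CPU model with CPUs 0-1 isolated at boot and CPUs 2-3 isolated
by cpuset, the resulting housekeeping mask covers CPUs 4-7.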

Changes since v1:

- Drop the housekeeping lock and use RCU to synchronize housekeeping
  against cpuset changes.

- Add housekeeping documentation

- Simplify CPU hotplug handling

- Collect ack from Shakeel Butt

- Handle sched/arm64's task fallback cpumask move to HK_TYPE_DOMAIN

- Fix genirq kthreads affinity

- Add missing kernel doc
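
For readers unfamiliar with the RCU publish/read pattern mentioned in
the first item above, here is a single-threaded userspace sketch using
C11 atomics. The names hk_mask_model, hk_domain_model, hk_read() and
hk_update() are hypothetical. The atomic load and exchange only stand
in for rcu_dereference() and rcu_assign_pointer(), and the sketch frees
the old mask immediately because it has no concurrent readers, whereas
kernel code has to wait for a grace period first.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of a housekeeping cpumask: one bit per CPU. */
struct hk_mask_model {
    uint64_t bits;
};

/* Currently published mask; NULL means nothing was published yet. */
static _Atomic(struct hk_mask_model *) hk_domain_model;

/*
 * Reader side: fetch the current mask (stands in for
 * rcu_dereference()). Without a published mask, every CPU is
 * treated as housekeeping (assumed default for this sketch).
 */
static uint64_t hk_read(void)
{
    struct hk_mask_model *m;

    m = atomic_load_explicit(&hk_domain_model, memory_order_acquire);
    return m ? m->bits : UINT64_MAX;
}

/*
 * Updater side: publish a new mask (stands in for
 * rcu_assign_pointer()). Returns 0 on success, -1 if allocation
 * fails.
 */
static int hk_update(uint64_t bits)
{
    struct hk_mask_model *new_mask, *old_mask;

    /* A housekeeping mask must never be empty. */
    assert(bits != 0);

    new_mask = malloc(sizeof(*new_mask));
    if (!new_mask)
        return -1;
    new_mask->bits = bits;

    old_mask = atomic_exchange_explicit(&hk_domain_model, new_mask,
                                        memory_order_acq_rel);
    /*
     * Kernel code would wait for a grace period before freeing the
     * old mask. This model has no concurrent readers, so it frees
     * right away (free(NULL) is a no-op).
     */
    free(old_mask);
    return 0;
}
```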

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	kthread/core-v2

HEAD: 092784f7df0aa6415c91ae5edc1c1a72603b5c50
Thanks,
	Frederic
---

Frederic Weisbecker (32):
      sched/isolation: Remove housekeeping static key
      PCI: Protect against concurrent change of housekeeping cpumask
      cpu: Revert "cpu/hotplug: Prevent self deadlock on CPU hot-unplug"
      memcg: Prepare to protect against concurrent isolated cpuset change
      mm: vmstat: Prepare to protect against concurrent isolated cpuset change
      sched/isolation: Save boot defined domain flags
      cpuset: Convert boot_hk_cpus to use HK_TYPE_DOMAIN_BOOT
      driver core: cpu: Convert /sys/devices/system/cpu/isolated to use HK_TYPE_DOMAIN_BOOT
      net: Keep ignoring isolated cpuset change
      block: Protect against concurrent isolated cpuset change
      cpu: Provide lockdep check for CPU hotplug lock write-held
      cpuset: Provide lockdep check for cpuset lock held
      sched/isolation: Convert housekeeping cpumasks to rcu pointers
      cpuset: Update HK_TYPE_DOMAIN cpumask from cpuset
      sched/isolation: Flush memcg workqueues on cpuset isolated partition change
      sched/isolation: Flush vmstat workqueues on cpuset isolated partition change
      cpuset: Propagate cpuset isolation update to workqueue through housekeeping
      cpuset: Remove cpuset_cpu_is_isolated()
      sched/isolation: Remove HK_TYPE_TICK test from cpu_is_isolated()
      PCI: Remove superfluous HK_TYPE_WQ check
      kthread: Refine naming of affinity related fields
      kthread: Include unbound kthreads in the managed affinity list
      kthread: Include kthreadd to the managed affinity list
      kthread: Rely on HK_TYPE_DOMAIN for preferred affinity management
      sched: Switch the fallback task allowed cpumask to HK_TYPE_DOMAIN
      sched/arm64: Move fallback task cpumask to HK_TYPE_DOMAIN
      kthread: Honour kthreads preferred affinity after cpuset changes
      kthread: Comment on the purpose and placement of kthread_affine_node() call
      kthread: Add API to update preferred affinity on kthread runtime
      kthread: Document kthread_affine_preferred()
      genirq: Correctly handle preferred kthreads affinity
      doc: Add housekeeping documentation

Gabriele Monaco (1):
      cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping

 Documentation/cpu_isolation/housekeeping.rst | 111 +++++++++++++++
 arch/arm64/kernel/cpufeature.c               |  18 ++-
 block/blk-mq.c                               |   6 +-
 drivers/base/cpu.c                           |   2 +-
 drivers/pci/pci-driver.c                     |  50 ++++---
 include/linux/cpu.h                          |   4 +
 include/linux/cpuhplock.h                    |   1 +
 include/linux/cpuset.h                       |   8 +-
 include/linux/kthread.h                      |   2 +
 include/linux/memcontrol.h                   |   4 +
 include/linux/mmu_context.h                  |   2 +-
 include/linux/percpu-rwsem.h                 |   1 +
 include/linux/sched/isolation.h              |  30 +++--
 include/linux/vmstat.h                       |   2 +
 include/linux/workqueue.h                    |   2 +-
 init/Kconfig                                 |   1 +
 kernel/cgroup/cpuset.c                       | 131 +++++++++++++-----
 kernel/cpu.c                                 |  42 +++---
 kernel/irq/manage.c                          |  47 ++++---
 kernel/kthread.c                             | 195 +++++++++++++++++++--------
 kernel/sched/isolation.c                     | 185 ++++++++++++++++---------
 kernel/sched/sched.h                         |   4 +
 kernel/workqueue.c                           |   2 +-
 mm/memcontrol.c                              |  25 +++-
 mm/vmstat.c                                  |  15 ++-
 net/core/net-sysfs.c                         |   2 +-
 26 files changed, 639 insertions(+), 253 deletions(-)
Re: [PATCH 00/33 v2] cpuset/isolation: Honour kthreads preferred affinity
Posted by Waiman Long 1 month ago
On 8/29/25 11:47 AM, Frederic Weisbecker wrote:
> [...]

I have finally finished the review of this long patch series. I like
your current approach and I will adapt my RFC patch series to be based
on yours. However, I do have comments and am looking forward to your
response.

Thanks,
Longman
Re: [PATCH 00/33 v2] cpuset/isolation: Honour kthreads preferred affinity
Posted by Frederic Weisbecker 1 week, 3 days ago
On Tue, Sep 02, 2025 at 03:12:04PM -0400, Waiman Long wrote:
> On 8/29/25 11:47 AM, Frederic Weisbecker wrote:
> > Hi,
> > 
> > The kthread code was enhanced lately to provide an infrastructure which
> > manages the preferred affinity of unbound kthreads (node or custom
> > cpumask) against housekeeping constraints and CPU hotplug events.
> > 
> > One crucial missing piece is cpuset: when an isolated partition is
> > created, deleted, or its CPUs updated, all the unbound kthreads in the
> > top cpuset are affine to _all_ the non-isolated CPUs, possibly breaking
> > their preferred affinity along the way
> > 
> > Solve this with performing the kthreads affinity update from cpuset to
> > the kthreads consolidated relevant code instead so that preferred
> > affinities are honoured.
> > 
> > The dispatch of the new cpumasks to workqueues and kthreads is performed
> > by housekeeping, as per the nice Tejun's suggestion.
> > 
> > As a welcome side effect, HK_TYPE_DOMAIN then integrates both the set
> > from isolcpus= and cpuset isolated partitions. Housekeeping cpumasks are
> > now modifyable with specific synchronization. A big step toward making
> > nohz_full= also mutable through cpuset in the future.
> > 
> > Changes since v1:
> > 
> > - Drop the housekeeping lock and use RCU to synchronize housekeeping
> >    against cpuset changes.
> > 
> > - Add housekeeping documentation
> > 
> > - Simplify CPU hotplug handling
> > 
> > - Collect ack from Shakeel Butt
> > 
> > - Handle sched/arm64's task fallback cpumask move to HK_TYPE_DOMAIN
> > 
> > - Fix genirq kthreads affinity
> > 
> > - Add missing kernel doc
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> > 	kthread/core-v2
> > 
> > HEAD: 092784f7df0aa6415c91ae5edc1c1a72603b5c50
> > Thanks,
> > 	Frederic
> 
> I have finally finished the review of this long patch series. I like your
> current approach and I will adapt my RFC patch series to be based on yours.
> However, I do have comments and am looking forward to your response.
> 
> Thanks,
> Longman

Thanks a lot for the detailed reviews. I'll try to address all of that and repost!

-- 
Frederic Weisbecker
SUSE Labs