From nobody Fri Oct 3 14:34:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7F060322DCE for ; Fri, 29 Aug 2025 15:48:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482503; cv=none; b=IByCYGgs1UUXc/jKEFxonD2vZtVeDOr20xA6rEW/nQXaD4tBajQ71D6dDr7Js9+KRy1W7vhTSG3lt75xFpt8CABd9FUIgoCkDH0kgKdO4wxU1Xi1RViove3QLJKds1BM04af/TWqR0pjPHtYGLme0ob1LoEzQrLlEu8fb5Cifu0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482503; c=relaxed/simple; bh=1NsscIZdy+zc/Q1UQVJIqNLBjDKmHl6bm+ib3E4jCdY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=MNkcCYg2RqaLWrD9/gKq2pI5eNfcbmTgbkbycAXpY79C+PxwLlqXKI1uIc3TtT4xO/7mJIZysGTm/rC9Re4OCr6YqRo+FQ789wPEjMMocRlk0RzfDSrr5jyx9RWEHz2w7N5wbYzZnuP9t9IJ30dZEDWqUu6PJNjjTtSpCBP0EVA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=moJhYSwv; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="moJhYSwv" Received: by smtp.kernel.org (Postfix) with ESMTPSA id BB9CCC4CEF5; Fri, 29 Aug 2025 15:48:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1756482503; bh=1NsscIZdy+zc/Q1UQVJIqNLBjDKmHl6bm+ib3E4jCdY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=moJhYSwvkJTae0v8crpHZX4NyzTE8GHMQGDWQusW0tf/X9qrGl0rTpLFXZFdtrMC2 WfQSZPTnxE5QMn0Eh6uC/M22U4IebQeVbi5bAAD5V+1Xom+DRJtcCfuizaG70sftou +la+F9CZjXfeU09KCcR/rpn4L3loSCW0TB4swqDmHGYd0iO9UdnmDxbyUS52NBOCBR 
tKKgJBf/8IQvYwe+lLU04FGc4/0TWy/JeQBFo3sqUKPm296IRoPbk4Cpqny+TG6qVg JTqtHNAHFmAG1l4r/rV+Flr8AeDCsNyQbkPMzFgifjigDfnj4GQeVCqo1VVygbZift 9bD3RksIsGYlw== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , Ingo Molnar , Marco Crivellari , Michal Hocko , Peter Zijlstra , Tejun Heo , Thomas Gleixner , Vlastimil Babka , Waiman Long Subject: [PATCH 01/33] sched/isolation: Remove housekeeping static key Date: Fri, 29 Aug 2025 17:47:42 +0200 Message-ID: <20250829154814.47015-2-frederic@kernel.org> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20250829154814.47015-1-frederic@kernel.org> References: <20250829154814.47015-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The housekeeping static key in its current use is mostly irrelevant. Most of the time, a housekeeping function call has already been issued before the static key gets a chance to be evaluated, which defeats the purpose of the optimization. housekeeping_cpu() is the sole correct user, testing the static key before the actual slow-path function call, but it is seldom used in fast paths. Finally, the static key prevents correct synchronization against dynamic updates of the housekeeping cpumasks through cpusets. Use a simple flag test instead. 
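For illustration only, the flag test described above can be modelled in userspace C. This is a minimal sketch, not the kernel implementation: each cpumask is reduced to a single 64-bit word, and `MODEL_NR_CPUS` and `hk_model_set()` are hypothetical helpers. Only the names `housekeeping_flags`, `housekeeping_test_cpu()` and `housekeeping_cpu()` are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model: each cpumask is one 64-bit word (assumption). */
#define MODEL_NR_CPUS 64
#define BIT(n) (UINT64_C(1) << (n))

enum hk_type {
	HK_TYPE_DOMAIN,
	HK_TYPE_MANAGED_IRQ,
	HK_TYPE_KERNEL_NOISE,
	HK_TYPE_MAX
};

static uint64_t housekeeping_flags;
static uint64_t housekeeping_cpumasks[HK_TYPE_MAX];

/* Hypothetical setup helper standing in for the boot-time parsing. */
static void hk_model_set(enum hk_type type, uint64_t mask)
{
	housekeeping_cpumasks[type] = mask;
	housekeeping_flags |= BIT(type);
}

/* Slow path: consult the mask only for types that were configured. */
static bool housekeeping_test_cpu(int cpu, enum hk_type type)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (housekeeping_flags & BIT(type))
		return (housekeeping_cpumasks[type] & BIT(cpu)) != 0;
	return true;
}

/* Fast path: a plain flag test takes the place of the static key. */
static inline bool housekeeping_cpu(int cpu, enum hk_type type)
{
	if (housekeeping_flags & BIT(type))
		return housekeeping_test_cpu(cpu, type);
	return true;
}
```

In this model every CPU counts as a housekeeping CPU for a type until a mask has been registered for that type, which is the behaviour the inline helper preserves without any code patching.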
Signed-off-by: Frederic Weisbecker Reviewed-by: Phil Auld --- include/linux/sched/isolation.h | 25 +++++---- kernel/sched/isolation.c | 90 ++++++++++++++------------------- 2 files changed, 55 insertions(+), 60 deletions(-) diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolatio= n.h index d8501f4709b5..f98ba0d71c52 100644 --- a/include/linux/sched/isolation.h +++ b/include/linux/sched/isolation.h @@ -25,12 +25,22 @@ enum hk_type { }; =20 #ifdef CONFIG_CPU_ISOLATION -DECLARE_STATIC_KEY_FALSE(housekeeping_overridden); +extern unsigned long housekeeping_flags; + extern int housekeeping_any_cpu(enum hk_type type); extern const struct cpumask *housekeeping_cpumask(enum hk_type type); extern bool housekeeping_enabled(enum hk_type type); extern void housekeeping_affine(struct task_struct *t, enum hk_type type); extern bool housekeeping_test_cpu(int cpu, enum hk_type type); + +static inline bool housekeeping_cpu(int cpu, enum hk_type type) +{ + if (housekeeping_flags & BIT(type)) + return housekeeping_test_cpu(cpu, type); + else + return true; +} + extern void __init housekeeping_init(void); =20 #else @@ -58,17 +68,14 @@ static inline bool housekeeping_test_cpu(int cpu, enum = hk_type type) return true; } =20 +static inline bool housekeeping_cpu(int cpu, enum hk_type type) +{ + return true; +} + static inline void housekeeping_init(void) { } #endif /* CONFIG_CPU_ISOLATION */ =20 -static inline bool housekeeping_cpu(int cpu, enum hk_type type) -{ -#ifdef CONFIG_CPU_ISOLATION - if (static_branch_unlikely(&housekeeping_overridden)) - return housekeeping_test_cpu(cpu, type); -#endif - return true; -} =20 static inline bool cpu_is_isolated(int cpu) { diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c index a4cf17b1fab0..2a6fc6fc46fb 100644 --- a/kernel/sched/isolation.c +++ b/kernel/sched/isolation.c @@ -16,19 +16,13 @@ enum hk_flags { HK_FLAG_KERNEL_NOISE =3D BIT(HK_TYPE_KERNEL_NOISE), }; =20 -DEFINE_STATIC_KEY_FALSE(housekeeping_overridden); 
-EXPORT_SYMBOL_GPL(housekeeping_overridden); - -struct housekeeping { - cpumask_var_t cpumasks[HK_TYPE_MAX]; - unsigned long flags; -}; - -static struct housekeeping housekeeping; +static cpumask_var_t housekeeping_cpumasks[HK_TYPE_MAX]; +unsigned long housekeeping_flags; +EXPORT_SYMBOL_GPL(housekeeping_flags); =20 bool housekeeping_enabled(enum hk_type type) { - return !!(housekeeping.flags & BIT(type)); + return !!(housekeeping_flags & BIT(type)); } EXPORT_SYMBOL_GPL(housekeeping_enabled); =20 @@ -36,50 +30,46 @@ int housekeeping_any_cpu(enum hk_type type) { int cpu; =20 - if (static_branch_unlikely(&housekeeping_overridden)) { - if (housekeeping.flags & BIT(type)) { - cpu =3D sched_numa_find_closest(housekeeping.cpumasks[type], smp_proces= sor_id()); - if (cpu < nr_cpu_ids) - return cpu; + if (housekeeping_flags & BIT(type)) { + cpu =3D sched_numa_find_closest(housekeeping_cpumasks[type], smp_process= or_id()); + if (cpu < nr_cpu_ids) + return cpu; =20 - cpu =3D cpumask_any_and_distribute(housekeeping.cpumasks[type], cpu_onl= ine_mask); - if (likely(cpu < nr_cpu_ids)) - return cpu; - /* - * Unless we have another problem this can only happen - * at boot time before start_secondary() brings the 1st - * housekeeping CPU up. - */ - WARN_ON_ONCE(system_state =3D=3D SYSTEM_RUNNING || - type !=3D HK_TYPE_TIMER); - } + cpu =3D cpumask_any_and_distribute(housekeeping_cpumasks[type], cpu_onli= ne_mask); + if (likely(cpu < nr_cpu_ids)) + return cpu; + /* + * Unless we have another problem this can only happen + * at boot time before start_secondary() brings the 1st + * housekeeping CPU up. 
+ */ + WARN_ON_ONCE(system_state =3D=3D SYSTEM_RUNNING || + type !=3D HK_TYPE_TIMER); } + return smp_processor_id(); } EXPORT_SYMBOL_GPL(housekeeping_any_cpu); =20 const struct cpumask *housekeeping_cpumask(enum hk_type type) { - if (static_branch_unlikely(&housekeeping_overridden)) - if (housekeeping.flags & BIT(type)) - return housekeeping.cpumasks[type]; + if (housekeeping_flags & BIT(type)) + return housekeeping_cpumasks[type]; return cpu_possible_mask; } EXPORT_SYMBOL_GPL(housekeeping_cpumask); =20 void housekeeping_affine(struct task_struct *t, enum hk_type type) { - if (static_branch_unlikely(&housekeeping_overridden)) - if (housekeeping.flags & BIT(type)) - set_cpus_allowed_ptr(t, housekeeping.cpumasks[type]); + if (housekeeping_flags & BIT(type)) + set_cpus_allowed_ptr(t, housekeeping_cpumasks[type]); } EXPORT_SYMBOL_GPL(housekeeping_affine); =20 bool housekeeping_test_cpu(int cpu, enum hk_type type) { - if (static_branch_unlikely(&housekeeping_overridden)) - if (housekeeping.flags & BIT(type)) - return cpumask_test_cpu(cpu, housekeeping.cpumasks[type]); + if (housekeeping_flags & BIT(type)) + return cpumask_test_cpu(cpu, housekeeping_cpumasks[type]); return true; } EXPORT_SYMBOL_GPL(housekeeping_test_cpu); @@ -88,17 +78,15 @@ void __init housekeeping_init(void) { enum hk_type type; =20 - if (!housekeeping.flags) + if (!housekeeping_flags) return; =20 - static_branch_enable(&housekeeping_overridden); - - if (housekeeping.flags & HK_FLAG_KERNEL_NOISE) + if (housekeeping_flags & HK_FLAG_KERNEL_NOISE) sched_tick_offload_init(); =20 - for_each_set_bit(type, &housekeeping.flags, HK_TYPE_MAX) { + for_each_set_bit(type, &housekeeping_flags, HK_TYPE_MAX) { /* We need at least one CPU to handle housekeeping work */ - WARN_ON_ONCE(cpumask_empty(housekeeping.cpumasks[type])); + WARN_ON_ONCE(cpumask_empty(housekeeping_cpumasks[type])); } } =20 @@ -106,8 +94,8 @@ static void __init housekeeping_setup_type(enum hk_type = type, cpumask_var_t housekeeping_staging) { =20 - 
alloc_bootmem_cpumask_var(&housekeeping.cpumasks[type]); - cpumask_copy(housekeeping.cpumasks[type], + alloc_bootmem_cpumask_var(&housekeeping_cpumasks[type]); + cpumask_copy(housekeeping_cpumasks[type], housekeeping_staging); } =20 @@ -117,7 +105,7 @@ static int __init housekeeping_setup(char *str, unsigne= d long flags) unsigned int first_cpu; int err =3D 0; =20 - if ((flags & HK_FLAG_KERNEL_NOISE) && !(housekeeping.flags & HK_FLAG_KERN= EL_NOISE)) { + if ((flags & HK_FLAG_KERNEL_NOISE) && !(housekeeping_flags & HK_FLAG_KERN= EL_NOISE)) { if (!IS_ENABLED(CONFIG_NO_HZ_FULL)) { pr_warn("Housekeeping: nohz unsupported." " Build with CONFIG_NO_HZ_FULL\n"); @@ -139,7 +127,7 @@ static int __init housekeeping_setup(char *str, unsigne= d long flags) if (first_cpu >=3D nr_cpu_ids || first_cpu >=3D setup_max_cpus) { __cpumask_set_cpu(smp_processor_id(), housekeeping_staging); __cpumask_clear_cpu(smp_processor_id(), non_housekeeping_mask); - if (!housekeeping.flags) { + if (!housekeeping_flags) { pr_warn("Housekeeping: must include one present CPU, " "using boot CPU:%d\n", smp_processor_id()); } @@ -148,7 +136,7 @@ static int __init housekeeping_setup(char *str, unsigne= d long flags) if (cpumask_empty(non_housekeeping_mask)) goto free_housekeeping_staging; =20 - if (!housekeeping.flags) { + if (!housekeeping_flags) { /* First setup call ("nohz_full=3D" or "isolcpus=3D") */ enum hk_type type; =20 @@ -157,26 +145,26 @@ static int __init housekeeping_setup(char *str, unsig= ned long flags) } else { /* Second setup call ("nohz_full=3D" after "isolcpus=3D" or the reverse)= */ enum hk_type type; - unsigned long iter_flags =3D flags & housekeeping.flags; + unsigned long iter_flags =3D flags & housekeeping_flags; =20 for_each_set_bit(type, &iter_flags, HK_TYPE_MAX) { if (!cpumask_equal(housekeeping_staging, - housekeeping.cpumasks[type])) { + housekeeping_cpumasks[type])) { pr_warn("Housekeeping: nohz_full=3D must match isolcpus=3D\n"); goto free_housekeeping_staging; } } =20 - 
iter_flags =3D flags & ~housekeeping.flags; + iter_flags =3D flags & ~housekeeping_flags; =20 for_each_set_bit(type, &iter_flags, HK_TYPE_MAX) housekeeping_setup_type(type, housekeeping_staging); } =20 - if ((flags & HK_FLAG_KERNEL_NOISE) && !(housekeeping.flags & HK_FLAG_KERN= EL_NOISE)) + if ((flags & HK_FLAG_KERNEL_NOISE) && !(housekeeping_flags & HK_FLAG_KERN= EL_NOISE)) tick_nohz_full_setup(non_housekeeping_mask); =20 - housekeeping.flags |=3D flags; + housekeeping_flags |=3D flags; err =3D 1; =20 free_housekeeping_staging: --=20 2.51.0 From nobody Fri Oct 3 14:34:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3F422326D6E; Fri, 29 Aug 2025 15:48:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482506; cv=none; b=hcSnGMc4iJ8O6s9bMhGU2kyt68bvg/MiIVF2r2MArtAq0oNoNePeMbbSoeVaY25g/GvCKA6DA0utplmxw+ncYImmckFNAibTHicqK+t3+myczKDhB0WSVJc1zHfs/8FEhlN6zoSn+tJe3IL7F4HYPwABI+6eW7I48/BHemZYtJM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482506; c=relaxed/simple; bh=SuR/4PvyX3AQ13jiY8VYnneI4pJyFeidjVrUVvNAUMQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=lhjZKnvgdMeLfwG3FW9tT0WqXcVyUueLUOGy2UYsLth4a1/ngQNpbsWXvPOxdkOR0K6MMnrqAgVE0TDBcdzYHpoLf9PnXBT5blnyh85PhmQAreZtDk7qF/4MJmmYOx0Uvs52usGfhUfgCyVFREAG4/WOMzC4/6KugZoORofjRBo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=LIZDY4St; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="LIZDY4St" Received: by smtp.kernel.org 
(Postfix) with ESMTPSA id 6E01CC4CEF7; Fri, 29 Aug 2025 15:48:23 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1756482505; bh=SuR/4PvyX3AQ13jiY8VYnneI4pJyFeidjVrUVvNAUMQ=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=LIZDY4StLXVa/LB5d9TsXGTHLSuUzB04YE5nCWjZ/7J6S/WvboX0d3QndN9OcCQVu b2qvGly3nZnPQ2KS+LiPs5Gm+BG5nEp1o05HyNvGbJRTgLfN4fDbcEW2Wk+vIsB21t 8jdEC59wBRKTAAW8ItIIqBMzLw/MquU4IJN0tDHei2eSs8AsGV7Fm7XZZKo8Jh6Uaf Xypbie2z45y0uYQ4z4ZfuK4bQOisvTaDlPuIxtx/Dao7AqMxTwlRF2RAObSQflYDGj hCKr2J7EqQFBFEu5z6URC1OexkAmkg76LhVY/kPYY5J+pv7S+6V7tF6y7V9VtlYBJh +cU35ODJSZQkA== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , Bjorn Helgaas , Marco Crivellari , Michal Hocko , Peter Zijlstra , Tejun Heo , Thomas Gleixner , Vlastimil Babka , Waiman Long , linux-pci@vger.kernel.org Subject: [PATCH 02/33] PCI: Protect against concurrent change of housekeeping cpumask Date: Fri, 29 Aug 2025 17:47:43 +0200 Message-ID: <20250829154814.47015-3-frederic@kernel.org> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20250829154814.47015-1-frederic@kernel.org> References: <20250829154814.47015-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" HK_TYPE_DOMAIN will soon integrate cpuset isolated partitions and therefore become modifiable at runtime. Synchronize against the cpumask update using RCU. The RCU read-side critical section covers both the election of the housekeeping CPU targeted by the PCI probe work and the enqueue of that work. This way, the housekeeping update side only needs to flush the pending related works after updating the housekeeping mask to make sure that no PCI work ever executes on an isolated CPU. 
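The election step described above can be sketched as a small userspace model. This is only an illustration under stated assumptions: cpumasks are single 64-bit words, and `MODEL_NR_CPUS`, `model_elect_probe_cpu()` and `model_pci_probe_cpu()` are hypothetical names that do not exist in the kernel. The model returns the lowest eligible CPU, whereas the kernel helper only promises some CPU of the intersection. RCU itself is not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 64

/*
 * Elect a probe CPU: it must belong to the device's node and to both
 * housekeeping masks. Returns MODEL_NR_CPUS when no CPU qualifies,
 * mirroring the "cpu >= nr_cpu_ids" convention used in the patch.
 */
static int model_elect_probe_cpu(uint64_t node_mask, uint64_t hk_wq,
				 uint64_t hk_domain)
{
	uint64_t candidates = node_mask & hk_wq & hk_domain;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (candidates & (UINT64_C(1) << cpu))
			return cpu;
	return MODEL_NR_CPUS;
}

/*
 * Return the CPU the probe would run on. In the patch, the election and
 * the enqueue both sit between rcu_read_lock() and rcu_read_unlock();
 * that locking is deliberately left out of this model.
 */
static int model_pci_probe_cpu(int local_cpu, uint64_t node_mask,
			       uint64_t hk_wq, uint64_t hk_domain)
{
	int cpu = model_elect_probe_cpu(node_mask, hk_wq, hk_domain);

	assert(local_cpu >= 0 && local_cpu < MODEL_NR_CPUS);
	/* No eligible housekeeping CPU on the node: probe locally. */
	return cpu < MODEL_NR_CPUS ? cpu : local_cpu;
}
```

The point of keeping election and enqueue in one read-side section is that an updater which changes the mask and then waits for readers can be sure any work elected against the old mask is already queued, and can therefore be flushed.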
Signed-off-by: Frederic Weisbecker --- drivers/pci/pci-driver.c | 40 +++++++++++++++++++++++++++++++--------- 1 file changed, 31 insertions(+), 9 deletions(-) diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 63665240ae87..cf2b83004886 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -302,9 +302,8 @@ struct drv_dev_and_id { const struct pci_device_id *id; }; =20 -static long local_pci_probe(void *_ddi) +static int local_pci_probe(struct drv_dev_and_id *ddi) { - struct drv_dev_and_id *ddi =3D _ddi; struct pci_dev *pci_dev =3D ddi->dev; struct pci_driver *pci_drv =3D ddi->drv; struct device *dev =3D &pci_dev->dev; @@ -338,6 +337,19 @@ static long local_pci_probe(void *_ddi) return 0; } =20 +struct pci_probe_arg { + struct drv_dev_and_id *ddi; + struct work_struct work; + int ret; +}; + +static void local_pci_probe_callback(struct work_struct *work) +{ + struct pci_probe_arg *arg =3D container_of(work, struct pci_probe_arg, wo= rk); + + arg->ret =3D local_pci_probe(arg->ddi); +} + static bool pci_physfn_is_probed(struct pci_dev *dev) { #ifdef CONFIG_PCI_IOV @@ -362,34 +374,44 @@ static int pci_call_probe(struct pci_driver *drv, str= uct pci_dev *dev, dev->is_probed =3D 1; =20 cpu_hotplug_disable(); - /* * Prevent nesting work_on_cpu() for the case where a Virtual Function * device is probed from work_on_cpu() of the Physical device. 
*/ if (node < 0 || node >=3D MAX_NUMNODES || !node_online(node) || pci_physfn_is_probed(dev)) { - cpu =3D nr_cpu_ids; + error =3D local_pci_probe(&ddi); } else { cpumask_var_t wq_domain_mask; + struct pci_probe_arg arg =3D { .ddi =3D &ddi }; + + INIT_WORK_ONSTACK(&arg.work, local_pci_probe_callback); =20 if (!zalloc_cpumask_var(&wq_domain_mask, GFP_KERNEL)) { error =3D -ENOMEM; goto out; } + + rcu_read_lock(); cpumask_and(wq_domain_mask, housekeeping_cpumask(HK_TYPE_WQ), housekeeping_cpumask(HK_TYPE_DOMAIN)); =20 cpu =3D cpumask_any_and(cpumask_of_node(node), wq_domain_mask); + if (cpu < nr_cpu_ids) { + schedule_work_on(cpu, &arg.work); + rcu_read_unlock(); + flush_work(&arg.work); + error =3D arg.ret; + } else { + rcu_read_unlock(); + error =3D local_pci_probe(&ddi); + } + free_cpumask_var(wq_domain_mask); + destroy_work_on_stack(&arg.work); } - - if (cpu < nr_cpu_ids) - error =3D work_on_cpu(cpu, local_pci_probe, &ddi); - else - error =3D local_pci_probe(&ddi); out: dev->is_probed =3D 0; cpu_hotplug_enable(); --=20 2.51.0 From nobody Fri Oct 3 14:34:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7DDC032A3E9 for ; Fri, 29 Aug 2025 15:48:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482508; cv=none; b=BcldlwzUEsRQPDgmgK9JQA1rTgNAggwCNq9n3xMREp7hZaxi4yY43OXZPHiHuzprPHks1vc5xIz1i9TW2688kRPtCPETZwITk4aPvmCaRz8ZIh8MDbGIsM3K5xhSVybRNiNi+PPOAC07Zh2DV5UKtIpG5LOjl0gc+eX47ozRKrw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482508; c=relaxed/simple; bh=bjPVH71yYHRT38x+a2qhkcGYmz0Liwe3xJMHVBXj3C4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=aHwd5Cm6Im09svjhitutE8d5gjfauwztcKDaFCDBrydNKANq36e/Ich6qDcnJgdpbP8NA/eZkf4+1t7H5x/uPGEHyULTlSwrmlPmk0X9oTYnE7rcCF/NZ6dCt6JaXQmi10/gKAtt6FTN1M8DYqHVAkQstJfoWTC5JSoQODwr+VM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=T4GMj8XN; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="T4GMj8XN" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 36AFDC4CEF8; Fri, 29 Aug 2025 15:48:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1756482508; bh=bjPVH71yYHRT38x+a2qhkcGYmz0Liwe3xJMHVBXj3C4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=T4GMj8XNJnYSDWsfs1/7aLKXCLVOKZ7yv8HQwWtwyZnEetNUlm8SA50kW9PP6OEDx NGZzW0FTJk705rXkrEXLxamFE/2oOEUqpN9grsWkpCIGVwU8Hij/csR5bpcbN0bYwG yXeWLIwpIMRMmXTmy3XkRTX6y/oL4gmj76nkT/OANpVMGtS+p8vYXpRLWrrx2BB9km 4dTHsBupNUvqOzMwS80ThTT/Q7kjRw5OLPg2vyuc3e6FBhFBev3YVA85acO1APtp6E T1Q1Fd/n0RblJ8pM6UudIMidLJlEG7i3iWh7/rv7saey7ZAOMmpYaEuguTQglYGiKi S+8j6wmij56kQ== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , Marco Crivellari , Michal Hocko , Peter Zijlstra , Tejun Heo , Thomas Gleixner , Waiman Long Subject: [PATCH 03/33] cpu: Revert "cpu/hotplug: Prevent self deadlock on CPU hot-unplug" Date: Fri, 29 Aug 2025 17:47:44 +0200 Message-ID: <20250829154814.47015-4-frederic@kernel.org> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20250829154814.47015-1-frederic@kernel.org> References: <20250829154814.47015-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" 1) The commit: 2b8272ff4a70 ("cpu/hotplug: Prevent self deadlock on CPU hot-unplug") was added to fix an issue where the hotplug control task (BP) was 
throttled between CPUHP_AP_IDLE_DEAD and CPUHP_HRTIMERS_PREPARE waiting in the hrtimer blindspot for the bandwidth callback queued on the dead CPU. 2) Later on, the commit: 38685e2a0476 ("cpu/hotplug: Don't offline the last non-isolated CPU") hooked into the target selection of the workqueue-offloaded CPU down process to avoid destroying the last CPU domain. 3) Finally: 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier") entirely removed the conditions for the race exposed and partially fixed in 1). Offloading the CPU down process to a workqueue on another CPU thus becomes unnecessary, but the last CPU belonging to scheduler domains must still remain online. Therefore revert the now obsolete commit 2b8272ff4a70b866106ae13c36be7ecbef5d5da2 and move the housekeeping check under the write-held cpu_hotplug_lock. Since HK_TYPE_DOMAIN will include both isolcpus and cpuset isolated partitions, the hotplug lock will synchronize against concurrent cpuset partition updates. Signed-off-by: Frederic Weisbecker --- kernel/cpu.c | 37 +++++++++++-------------------------- 1 file changed, 11 insertions(+), 26 deletions(-) diff --git a/kernel/cpu.c b/kernel/cpu.c index db9f6c539b28..453a806af2ee 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -1410,6 +1410,16 @@ static int __ref _cpu_down(unsigned int cpu, int tas= ks_frozen, =20 cpus_write_lock(); =20 + /* + * Keep at least one housekeeping cpu onlined to avoid generating + * an empty sched_domain span. 
+ */ + if (cpumask_any_and(cpu_online_mask, + housekeeping_cpumask(HK_TYPE_DOMAIN)) >=3D nr_cpu_ids) { + ret =3D -EBUSY; + goto out; + } + cpuhp_tasks_frozen =3D tasks_frozen; =20 prev_state =3D cpuhp_set_state(cpu, st, target); @@ -1456,22 +1466,8 @@ static int __ref _cpu_down(unsigned int cpu, int tas= ks_frozen, return ret; } =20 -struct cpu_down_work { - unsigned int cpu; - enum cpuhp_state target; -}; - -static long __cpu_down_maps_locked(void *arg) -{ - struct cpu_down_work *work =3D arg; - - return _cpu_down(work->cpu, 0, work->target); -} - static int cpu_down_maps_locked(unsigned int cpu, enum cpuhp_state target) { - struct cpu_down_work work =3D { .cpu =3D cpu, .target =3D target, }; - /* * If the platform does not support hotplug, report it explicitly to * differentiate it from a transient offlining failure. @@ -1480,18 +1476,7 @@ static int cpu_down_maps_locked(unsigned int cpu, en= um cpuhp_state target) return -EOPNOTSUPP; if (cpu_hotplug_disabled) return -EBUSY; - - /* - * Ensure that the control task does not run on the to be offlined - * CPU to prevent a deadlock against cfs_b->period_timer. - * Also keep at least one housekeeping cpu onlined to avoid generating - * an empty sched_domain span. 
- */ - for_each_cpu_and(cpu, cpu_online_mask, housekeeping_cpumask(HK_TYPE_DOMAI= N)) { - if (cpu !=3D work.cpu) - return work_on_cpu(cpu, __cpu_down_maps_locked, &work); - } - return -EBUSY; + return _cpu_down(cpu, 0, target); } =20 static int cpu_down(unsigned int cpu, enum cpuhp_state target) --=20 2.51.0 From nobody Fri Oct 3 14:34:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E269F326D54 for ; Fri, 29 Aug 2025 15:48:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482512; cv=none; b=M59JDWiO9tA1kDbx3hhZ63afMW79oQWCF67A2pIDzsRr1ZeYTpW+T38+B0Fe0Dp/HI+XkA2o55i2u+bCSdI0y1b3o0zBRX4LF83B5PN/dITvBRi/xRCa1qPo2+E4sCcNgK+w88zFmInnmZedC707XzzTj8R1bPkfI/sZkJFheQA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482512; c=relaxed/simple; bh=Vc04EDItjBp0X7n1yxJukk3IkSy0glb+ldt8bl29/7s=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=RrzKzdsT7k0AGkI5GDMayd5n+a0BxldvKoEAWg3BfHFkvsX/snvZnWi9wg1jlxxxNO5iUdEMdirRFAk9geYE7wAoAxuNq9MdUdlcYYe3wFjgpvem0INMIcRrvqDsV+8Od1VxALv2EL7b4/vEXRUQ3IEeZZNIPljz1FgA7pCOLn0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Wa8/zBzn; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Wa8/zBzn" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 63FD9C4CEF0; Fri, 29 Aug 2025 15:48:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1756482511; bh=Vc04EDItjBp0X7n1yxJukk3IkSy0glb+ldt8bl29/7s=; 
h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Wa8/zBzn24d587hcSzZFIZNPrU+pNKrPHUthBF8v1o9NY8ztws8/YKkyWqZwno5P2 VTEQsxifGJt5CmuaACbgMPuimdEei+byFBX16twmP13JEbO8x+YpK7uM8Ox6u9rBj3 CdTShUAmqDj4W7arX0GKOrKIz+xomVva87kiSgdK2LlGkrLnwYIoToVNGV/gJ4nyNr ZJGNLdJQrZxeYuwBIxAiXUN+0Z3tkfnOECX30OP+/MCiNpEG1I9OP31pEY5Hpi4hXn XM6bfvHsdjgD3GNOR91xi2W51AMsy+dyQ0XbogVdwkhgcGGbikebtofFx3znCEUvNX A+nPcxZjv5W4g== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , Andrew Morton , Johannes Weiner , Marco Crivellari , Michal Hocko , Muchun Song , Peter Zijlstra , Roman Gushchin , Shakeel Butt , Tejun Heo , Thomas Gleixner , Vlastimil Babka , Waiman Long Subject: [PATCH 04/33] memcg: Prepare to protect against concurrent isolated cpuset change Date: Fri, 29 Aug 2025 17:47:45 +0200 Message-ID: <20250829154814.47015-5-frederic@kernel.org> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20250829154814.47015-1-frederic@kernel.org> References: <20250829154814.47015-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The HK_TYPE_DOMAIN housekeeping cpumask will soon be made modifiable at runtime. To synchronize against the memcg workqueue and make sure that no asynchronous draining is pending or executing on a newly isolated CPU, check the target CPU and queue the drain work under the same RCU read-side critical section. A further change will make housekeeping issue a memcg workqueue flush whenever it updates the HK_TYPE_DOMAIN cpumask, so that no work remains pending after a CPU has been made isolated. 
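As a rough userspace illustration of "check and queue under the same critical section", the sketch below imitates a scope-based guard with the GCC/Clang `cleanup` attribute, which is, as far as I know, the mechanism the kernel's `guard()` helpers build on. Everything here is an assumption made for the example: `rcu_depth` is a plain counter standing in for the RCU read-side section, and `MODEL_GUARD_RCU()`, `isolated_mask`, `queued_mask` and `schedule_drain_work_model()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static int rcu_depth;          /* stand-in for the read-side nesting count */
static uint64_t isolated_mask; /* CPUs currently considered isolated */
static uint64_t queued_mask;   /* CPUs that had a drain work queued */

/* Runs automatically when the guard variable goes out of scope. */
static void model_rcu_unlock(int *unused)
{
	(void)unused;
	rcu_depth--;
}

/* Hypothetical analogue of guard(rcu)(): enter now, leave at scope exit. */
#define MODEL_GUARD_RCU() \
	int model_guard __attribute__((cleanup(model_rcu_unlock), unused)) = \
		(rcu_depth++, 0)

/* The isolation check and the enqueue share one guarded scope. */
static bool schedule_drain_work_model(int cpu)
{
	MODEL_GUARD_RCU();

	assert(rcu_depth > 0);
	if (isolated_mask & (UINT64_C(1) << cpu))
		return false;	/* guard is released on this path too */
	queued_mask |= UINT64_C(1) << cpu;
	return true;
}
```

The early return shows why a scope-based guard is convenient here: every exit path leaves the section, so the check and the enqueue cannot accidentally end up in different sections.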
Signed-off-by: Frederic Weisbecker --- mm/memcontrol.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 8dd7fbed5a94..2649d6c09160 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1970,6 +1970,13 @@ static bool is_memcg_drain_needed(struct memcg_stock= _pcp *stock, return flush; } =20 +static void schedule_drain_work(int cpu, struct work_struct *work) +{ + guard(rcu)(); + if (!cpu_is_isolated(cpu)) + schedule_work_on(cpu, work); +} + /* * Drains all per-CPU charge caches for given root_memcg resp. subtree * of the hierarchy under it. @@ -1999,8 +2006,8 @@ void drain_all_stock(struct mem_cgroup *root_memcg) &memcg_st->flags)) { if (cpu =3D=3D curcpu) drain_local_memcg_stock(&memcg_st->work); - else if (!cpu_is_isolated(cpu)) - schedule_work_on(cpu, &memcg_st->work); + else + schedule_drain_work(cpu, &memcg_st->work); } =20 if (!test_bit(FLUSHING_CACHED_CHARGE, &obj_st->flags) && @@ -2009,8 +2016,8 @@ void drain_all_stock(struct mem_cgroup *root_memcg) &obj_st->flags)) { if (cpu =3D=3D curcpu) drain_local_obj_stock(&obj_st->work); - else if (!cpu_is_isolated(cpu)) - schedule_work_on(cpu, &obj_st->work); + else + schedule_drain_work(cpu, &obj_st->work); } } migrate_enable(); --=20 2.51.0 From nobody Fri Oct 3 14:34:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3D37932C327 for ; Fri, 29 Aug 2025 15:48:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482514; cv=none; b=CNZKf0LwF9T7zK99sWoVPKM8RupM7y4Q1bC/sdEa+zco4d7/04wkoNKws09jnu49tthneEub96H/IfzBvUlMUQX4LRD1LfJkkH+91IQ6nzzS7gt8TWWuft+jcGhgUiJEdwXgOSd3/o7aer7cPFOBHy1+kEn1kxYKcUOp8rzaOBw= 
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482514; c=relaxed/simple; bh=tKKwBK9xX0xg3Dm/NJEdQ7r5p4nUDr3fqs7FMV7Y/xo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=bBPY9125fZBl+/1fIfCGFzQGmToWQ7svMt3wZQHfY3plzHd3hkKrtur7N0WBW/g5Hy7aP19EMyPC1Gt61PITwxWrYYWyHsxtOAnFqX54vf2GgkDNvZA+oZmGhUx3i5Rar+pJzeMrGK53NBJnWnSTFVE5NL6vKdi8XWrob7eGx/A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=JBCCVwqS; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="JBCCVwqS" Received: by smtp.kernel.org (Postfix) with ESMTPSA id C938AC4CEF5; Fri, 29 Aug 2025 15:48:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1756482514; bh=tKKwBK9xX0xg3Dm/NJEdQ7r5p4nUDr3fqs7FMV7Y/xo=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=JBCCVwqSjumn7XT4pVDMZEQLI5ogqSOLLqGspHQDWXhvD0P1ULdzN5pTeo2YYyls9 R1X0QNhPHFsw7zRULaD26WedF9faUpfaOdmDue3V+AlLjmpiuVjqW6ACLSQh0SNmbr dsb63Wt0czGmui5C6jv6RImZDkq7MhoITNU7+llg+qFQy+uOOTL0n6jNdOXsVzugv5 onzVLJ0B+qvls4RBWOQ5FmzyvIUvBjo6pcO4XanBU+cvUhEGyDwWroQcN/+opq93XB hNCXxaTRCVJ94BZun8eg+Rl+qFXPj1mkU88P5VcLTsoRIioeEpdrZpQDgXZhpkmvMe vvWARYaBRDQGQ== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , Andrew Morton , Marco Crivellari , Michal Hocko , Peter Zijlstra , Tejun Heo , Thomas Gleixner , Vlastimil Babka , Waiman Long , linux-mm@kvack.org Subject: [PATCH 05/33] mm: vmstat: Prepare to protect against concurrent isolated cpuset change Date: Fri, 29 Aug 2025 17:47:46 +0200 Message-ID: <20250829154814.47015-6-frederic@kernel.org> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20250829154814.47015-1-frederic@kernel.org> References: <20250829154814.47015-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org 
List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The HK_TYPE_DOMAIN housekeeping cpumask will soon be made modifiable at runtime. To synchronize against the vmstat workqueue and make sure that no asynchronous vmstat work is pending or executing on a newly isolated CPU, check the target CPU and queue the vmstat work under the same RCU read-side critical section. A further change will make housekeeping issue a vmstat workqueue flush whenever it updates the HK_TYPE_DOMAIN cpumask, so that no work remains pending after a CPU has been made isolated. Signed-off-by: Frederic Weisbecker --- mm/vmstat.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/mm/vmstat.c b/mm/vmstat.c index 71cd1ceba191..b90325ee49d3 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -2133,11 +2133,13 @@ static void vmstat_shepherd(struct work_struct *w) * infrastructure ever noticing. Skip regular flushing from vmstat_sheph= erd * for all isolated CPUs to avoid interference with the isolated workloa= d. 
*/ - if (cpu_is_isolated(cpu)) - continue; + scoped_guard(rcu) { + if (cpu_is_isolated(cpu)) + continue; =20 - if (!delayed_work_pending(dw) && need_update(cpu)) - queue_delayed_work_on(cpu, mm_percpu_wq, dw, 0); + if (!delayed_work_pending(dw) && need_update(cpu)) + queue_delayed_work_on(cpu, mm_percpu_wq, dw, 0); + } =20 cond_resched(); } --=20 2.51.0 From nobody Fri Oct 3 14:34:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0AE823314A7 for ; Fri, 29 Aug 2025 15:48:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482517; cv=none; b=n/CA+QAFNfqKWGHPWpNxUby8RRQzwwEAj0MGQ9Aqk+yI8yQU9LfoD9QfYOSyqWf9vPn5PX/ckcWQ9uGgh4yEeeyfMLI8ijlFnxuAR9LWUMDkJsVVvQpFG8lWqfcMErbCdXWkMJ2vUxuNcfADMmVdRzCaM2YSMgoZkAjKsaLyg9Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482517; c=relaxed/simple; bh=YqIjpVaY5D0ks0POHkX9szyfGsLXZLStsFdd5bGAgPs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=D6shwqHaoWmm18qXQT8YDJ5griLoxg7pM8+guKZ7mNFNlI+vlLl6gqydass3ejjy9UjS72RAPpZubj05TESBRskuATyDT9vJgLwERDrSqE1SX7KDP4l2byG3Kq88gpAerbxBvWzCMxVBwwRNXTzNWLRdjTzJ8sUReRz3Ys49whg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=mYqJB6IP; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="mYqJB6IP" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 85E13C4CEF5; Fri, 29 Aug 2025 15:48:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1756482516; 
bh=YqIjpVaY5D0ks0POHkX9szyfGsLXZLStsFdd5bGAgPs=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=mYqJB6IPNNUciPYkOIHvqfSulF3gPDnDb22Q5YExqL+GXCfA3hr9bXcTxWXfK8NDy fM4Dg0JRSqe3pMHJI3xTd74WnFm9s/nAHxTRypSuf6mazdlJQgRdQODOn4yZe8wIvV 3i3zZVVDiDgPJY8Gue2WRHq8fJHTAVdBDk1Wp0FUT/i5bbDFD0oG9AqMCp3+GuO1Ue xbgidQrxS6ErGnHMXMEb6HZ8DdJ+gzhWBZMtAxdKihoNrCNp3kdbYZdembO7xsORlY HivMSjCcJqj8nJXLJiTmFEpvxmWI4W4F0AHBk9bFC4Fox8PVbjoPKCK5cYao1ZiSR1 JrBQ69q2HPDmg== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , Ingo Molnar , Marco Crivellari , Michal Hocko , Peter Zijlstra , Tejun Heo , Thomas Gleixner , Vlastimil Babka , Waiman Long Subject: [PATCH 06/33] sched/isolation: Save boot defined domain flags Date: Fri, 29 Aug 2025 17:47:47 +0200 Message-ID: <20250829154814.47015-7-frederic@kernel.org> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20250829154814.47015-1-frederic@kernel.org> References: <20250829154814.47015-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" HK_TYPE_DOMAIN will soon integrate not only boot defined isolcpus=3D CPUs but also cpuset isolated partitions. Housekeeping still needs a way to record what was initially passed to isolcpus=3D in order to keep these CPUs isolated after a cpuset isolated partition is modified or destroyed while containing some of them. Create a new HK_TYPE_DOMAIN_BOOT to keep track of those. 
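Illustration only, not part of the patch: a minimal user-space sketch of why the boot-time mask has to be recorded separately. It assumes plain unsigned long bitmasks (one bit per CPU, 8 CPUs) in place of struct cpumask. The helpers model_boot_setup() and model_cpuset_update() are hypothetical names, and the recomputation rule is an assumption about how a later change may combine the two sources; it is not taken from this series.

```c
/* User-space model, not kernel code: bit N set means CPU N is housekeeping. */
#define NR_CPUS_MODEL 8
#define ALL_CPUS ((1UL << NR_CPUS_MODEL) - 1)

enum hk_type { HK_TYPE_DOMAIN_BOOT, HK_TYPE_DOMAIN, HK_TYPE_MAX };

unsigned long hk_masks[HK_TYPE_MAX];

/* Hypothetical: record what isolcpus= isolated at boot, in both masks. */
void model_boot_setup(unsigned long isolcpus)
{
	hk_masks[HK_TYPE_DOMAIN_BOOT] = ALL_CPUS & ~isolcpus;
	hk_masks[HK_TYPE_DOMAIN] = hk_masks[HK_TYPE_DOMAIN_BOOT];
}

/*
 * Hypothetical: recompute HK_TYPE_DOMAIN from the saved boot mask and the
 * CPUs currently isolated by cpuset partitions. Because the boot mask is
 * never modified, boot-isolated CPUs stay isolated even after a partition
 * that contained them is destroyed.
 */
void model_cpuset_update(unsigned long cpuset_isolated)
{
	hk_masks[HK_TYPE_DOMAIN] =
		hk_masks[HK_TYPE_DOMAIN_BOOT] & ~cpuset_isolated;
}
```

With isolcpus= covering CPUs 2-3, creating and then destroying a partition on CPUs 4-5 brings HK_TYPE_DOMAIN back to the boot mask in this model, rather than to all CPUs.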
Signed-off-by: Frederic Weisbecker Reviewed-by: Phil Auld --- include/linux/sched/isolation.h | 1 + kernel/sched/isolation.c | 5 +++-- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolatio= n.h index f98ba0d71c52..9262378760b1 100644 --- a/include/linux/sched/isolation.h +++ b/include/linux/sched/isolation.h @@ -7,6 +7,7 @@ #include =20 enum hk_type { + HK_TYPE_DOMAIN_BOOT, HK_TYPE_DOMAIN, HK_TYPE_MANAGED_IRQ, HK_TYPE_KERNEL_NOISE, diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c index 2a6fc6fc46fb..fb414e28706d 100644 --- a/kernel/sched/isolation.c +++ b/kernel/sched/isolation.c @@ -11,6 +11,7 @@ #include "sched.h" =20 enum hk_flags { + HK_FLAG_DOMAIN_BOOT =3D BIT(HK_TYPE_DOMAIN_BOOT), HK_FLAG_DOMAIN =3D BIT(HK_TYPE_DOMAIN), HK_FLAG_MANAGED_IRQ =3D BIT(HK_TYPE_MANAGED_IRQ), HK_FLAG_KERNEL_NOISE =3D BIT(HK_TYPE_KERNEL_NOISE), @@ -204,7 +205,7 @@ static int __init housekeeping_isolcpus_setup(char *str) =20 if (!strncmp(str, "domain,", 7)) { str +=3D 7; - flags |=3D HK_FLAG_DOMAIN; + flags |=3D HK_FLAG_DOMAIN | HK_FLAG_DOMAIN_BOOT; continue; } =20 @@ -234,7 +235,7 @@ static int __init housekeeping_isolcpus_setup(char *str) =20 /* Default behaviour for isolcpus without flags */ if (!flags) - flags |=3D HK_FLAG_DOMAIN; + flags |=3D HK_FLAG_DOMAIN | HK_FLAG_DOMAIN_BOOT; =20 return housekeeping_setup(str, flags); } --=20 2.51.0 From nobody Fri Oct 3 14:34:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0C482322DB9; Fri, 29 Aug 2025 15:48:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482520; cv=none; 
b=epq7a1pP5l+cxOhwKS0Ru5/gDXBJJCOBWItxuHbc5wh9Y1QkJybN8nrXvZvrNp2Oy6sEAnACPnR/os6BuFVnjkiM9u0COzAb/uUK7z8XtqyE2/P8LSxxUUd17Ccik6r8P97jqB3iivzUgMVFGocmOu0WrDCWmOZqOIe3QaXRM9E= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482520; c=relaxed/simple; bh=78cfTk//lEQ7a6zMtD55eixVrCWYYR/yhAFKP0zepco=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ODOZChezX8BKOwiOwnXiZWMxZYqcKvsRUMdqQXoOnHjdXuwWuA5DILhnAln8NAT0x0jZDrwwZdpwDbKrwAEdCeMSYo97RdOZ4cGXFJA4GhIyro/FX/1WE6t+eFlFeboILE50VZ1haUTLbTYyQHCgJM9OBd9wXeQNqyp3jeIpbWo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=rQtAqNGN; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="rQtAqNGN" Received: by smtp.kernel.org (Postfix) with ESMTPSA id EEA25C4CEF6; Fri, 29 Aug 2025 15:48:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1756482519; bh=78cfTk//lEQ7a6zMtD55eixVrCWYYR/yhAFKP0zepco=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=rQtAqNGNqOxuGB4np2oOSlxOm6TShBeQ8RMxJ5ignz+97LVJp3l+gLQ5y/f1lGABl 7wSsgru6emmQ3FxUQRIvLP1Jn90aIL8d/U3od5zuuvc+H1neeqNp/kob/lwCKgaGuk ZsR0vU7UgdQxILNcigf61ln/eXaZnUO18HFoVIampFSaYSQH9JJqOutRO3Pc6H0E9a hsf7bouSN2BBnqj5Qe5xem6aTvQOlSFrgOJWmVIFmEpul2BB041qUpjm+iMkulhsms pxr8JawDYtMwk8q8Zm8y9YRJhYy3JCO/ShLTerLV0eKZYEzAvlJrclPuMqFFnzkMSE d3PYwgx10s5/g== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , Johannes Weiner , Marco Crivellari , Michal Hocko , Michal Koutny , Peter Zijlstra , Tejun Heo , Thomas Gleixner , Vlastimil Babka , Waiman Long , cgroups@vger.kernel.org Subject: [PATCH 07/33] cpuset: Convert boot_hk_cpus to use HK_TYPE_DOMAIN_BOOT Date: Fri, 29 Aug 2025 17:47:48 +0200 Message-ID: <20250829154814.47015-8-frederic@kernel.org> X-Mailer: git-send-email 2.51.0 
In-Reply-To: <20250829154814.47015-1-frederic@kernel.org> References: <20250829154814.47015-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" boot_hk_cpus is an ad-hoc copy of HK_TYPE_DOMAIN_BOOT. Remove it and use the official version. Signed-off-by: Frederic Weisbecker Reviewed-by: Phil Auld --- kernel/cgroup/cpuset.c | 22 +++++++--------------- 1 file changed, 7 insertions(+), 15 deletions(-) diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 27adb04df675..b00d8e3c30ba 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -80,12 +80,6 @@ static cpumask_var_t subpartitions_cpus; */ static cpumask_var_t isolated_cpus; =20 -/* - * Housekeeping (HK_TYPE_DOMAIN) CPUs at boot - */ -static cpumask_var_t boot_hk_cpus; -static bool have_boot_isolcpus; - /* List of remote partition root children */ static struct list_head remote_children; =20 @@ -1601,15 +1595,16 @@ static void remote_cpus_update(struct cpuset *cs, s= truct cpumask *xcpus, * @new_cpus: cpu mask * Return: true if there is conflict, false otherwise * - * CPUs outside of boot_hk_cpus, if defined, can only be used in an + * CPUs outside of HK_TYPE_DOMAIN_BOOT, if defined, can only be used in an * isolated partition. 
*/ static bool prstate_housekeeping_conflict(int prstate, struct cpumask *new= _cpus) { - if (!have_boot_isolcpus) + if (!housekeeping_enabled(HK_TYPE_DOMAIN_BOOT)) return false; =20 - if ((prstate !=3D PRS_ISOLATED) && !cpumask_subset(new_cpus, boot_hk_cpus= )) + if ((prstate !=3D PRS_ISOLATED) && + !cpumask_subset(new_cpus, housekeeping_cpumask(HK_TYPE_DOMAIN_BOOT))) return true; =20 return false; @@ -3764,12 +3759,9 @@ int __init cpuset_init(void) =20 BUG_ON(!alloc_cpumask_var(&cpus_attach, GFP_KERNEL)); =20 - have_boot_isolcpus =3D housekeeping_enabled(HK_TYPE_DOMAIN); - if (have_boot_isolcpus) { - BUG_ON(!alloc_cpumask_var(&boot_hk_cpus, GFP_KERNEL)); - cpumask_copy(boot_hk_cpus, housekeeping_cpumask(HK_TYPE_DOMAIN)); - cpumask_andnot(isolated_cpus, cpu_possible_mask, boot_hk_cpus); - } + if (housekeeping_enabled(HK_TYPE_DOMAIN_BOOT)) + cpumask_andnot(isolated_cpus, cpu_possible_mask, + housekeeping_cpumask(HK_TYPE_DOMAIN_BOOT)); =20 return 0; } --=20 2.51.0 From nobody Fri Oct 3 14:34:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 20635326D4F for ; Fri, 29 Aug 2025 15:48:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482523; cv=none; b=C1p9JirNJUbqhMTx//XDc9j8LlQGsPIHGPYmB4Qrcn58q8QxyMrob9M+SUraP+bfKASzzp0foi5V8UENs6Q6baDevAR7ZlsPr654hZoZmMth2TWq+2oTHNGvFXkBu20HsvyfgtiXUmz0c3RfB/B/q/p48DQ6flTl4QBiHJ88nJo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482523; c=relaxed/simple; bh=fgejeggFTJF5KymiXlA1EoAj6cO6G66DeOSRRZSU6VE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=TYO4pfaNDvfKNh7rW/e1AyFM+OtNTOFK9F7qNod0BREwYk7TCMVaU+zu9NjbQX800qTX6WQXTClIAejtFxRwxQJxUzgAYNKVJEiis3X2ONTknRjn9Qe3p2y21jhdr1c0Jfs3agsK19wT6/2OCCm+p0ISy5f289xlDeWarOrP2+g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=RA4C123Y; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="RA4C123Y" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0B295C4CEF7; Fri, 29 Aug 2025 15:48:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1756482522; bh=fgejeggFTJF5KymiXlA1EoAj6cO6G66DeOSRRZSU6VE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=RA4C123YmO1uTOx/KKumY63ZjvO2S5ckV0mZi4L6dSyjJhtk2epPfdl+vktamFs5M Ah+6oScoyYKi5QIXO9wQXavJ09G9hstFVe33A3RycYqkGiYfv393n+Q8OrJzkEu8NU xzykTnkDJUg+Z1/DOcnIt3AqK59uvglAOjrq8ifALFC2QKivkyyAdRFgT9bZh6wWxR csFLv6dN0+cPtMfHL6SBB2SudoM3F6r9ZgHAzQXHPmPc6sp6wFMA0kjyKb9QMJT+aJ rIXkDYu4uim1WlenScn5KIYvoDeaoOBfdq16hHdghTQhgLl3eFcllMRRn+qVSbOvcc 3dDUJ20NF7Czg== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , Danilo Krummrich , Greg Kroah-Hartman , Marco Crivellari , Michal Hocko , Peter Zijlstra , "Rafael J . 
Wysocki" , Tejun Heo , Thomas Gleixner , Vlastimil Babka , Waiman Long Subject: [PATCH 08/33] driver core: cpu: Convert /sys/devices/system/cpu/isolated to use HK_TYPE_DOMAIN_BOOT Date: Fri, 29 Aug 2025 17:47:49 +0200 Message-ID: <20250829154814.47015-9-frederic@kernel.org> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20250829154814.47015-1-frederic@kernel.org> References: <20250829154814.47015-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Make sure /sys/devices/system/cpu/isolated only prints what was passed through the isolcpus=3D parameter before HK_TYPE_DOMAIN will also integrate cpuset isolated partitions. Signed-off-by: Frederic Weisbecker --- drivers/base/cpu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c index efc575a00edd..f448e0b8e56d 100644 --- a/drivers/base/cpu.c +++ b/drivers/base/cpu.c @@ -291,7 +291,7 @@ static ssize_t print_cpus_isolated(struct device *dev, return -ENOMEM; =20 cpumask_andnot(isolated, cpu_possible_mask, - housekeeping_cpumask(HK_TYPE_DOMAIN)); + housekeeping_cpumask(HK_TYPE_DOMAIN_BOOT)); len =3D sysfs_emit(buf, "%*pbl\n", cpumask_pr_args(isolated)); =20 free_cpumask_var(isolated); --=20 2.51.0 From nobody Fri Oct 3 14:34:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B5330322DAB; Fri, 29 Aug 2025 15:48:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482526; cv=none; 
b=QgC+kip4zBzw598q8JeffhUx14jEp5Cceue9O5Px/8+hejs76KWkAHZRwxUaVbf6TRpGpfesSqcilX//R7bvOMqpIOFdguZjEmA1H8sT0WCMko8eI8YIQIz+EASivisjJWZUORuq2IlsHaeHaenemRixtBNghZZK6tD7uFgnSqk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482526; c=relaxed/simple; bh=U7eyQBqVG5JlKE7cyq1KJwK1IWz3fkCZph3lla7l9vg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=L1pL7Qt9MhZt4FdLZ4B5pE1os7i/7qOBove6TUFHLH64oiUotjKlq4peQoXK9wR3k9952fDkCaFGtpE3aLuLH71u8/0pIo2R1GWxqqlVSP6tD6lZSgv4SP7X87VDRaEtog0xfs7cRO82jUmoBtVK8C+j/36WJNLfhI2icZS+KZ0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=hp0OYd+2; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="hp0OYd+2" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 1E329C4CEF5; Fri, 29 Aug 2025 15:48:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1756482526; bh=U7eyQBqVG5JlKE7cyq1KJwK1IWz3fkCZph3lla7l9vg=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=hp0OYd+2n5wRqQ6ZKiUimFOUYgcVS8QRf9udh1+NKUfkIjnZskUmuCzBKJ0DaVeNH 9Y7ltHZWPFHAuztmnULe0zLBsabJh5T4s/irezDhU3OUtke1q//NqJ6egIaelEg99w /P5OGicva6Gy+D2NTRU1G4N7AS3QhzkLiN2cxez+7VfrYo6poRd5MRVWbivp1N57or DTMAMHWpSuJ9EW/otuIQa3dGuVJEbRw5MQrGsQNiBz82SH61iriKFVtChdlD+GKehK 8P+h8iNnQmT+KcM5kkAo5Oo82OdVJuATqfXVVmCZ/pKoHjlQlil8BlD3luhMzxkRLW Evp3Xrf8z1ztg== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , "David S . 
Miller" , Eric Dumazet , Jakub Kicinski , Marco Crivellari , Michal Hocko , Paolo Abeni , Peter Zijlstra , Simon Horman , Tejun Heo , Thomas Gleixner , Vlastimil Babka , Waiman Long , netdev@vger.kernel.org Subject: [PATCH 09/33] net: Keep ignoring isolated cpuset change Date: Fri, 29 Aug 2025 17:47:50 +0200 Message-ID: <20250829154814.47015-10-frederic@kernel.org> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20250829154814.47015-1-frederic@kernel.org> References: <20250829154814.47015-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The RPS cpumask can be overridden through sysfs/sysctl. The boot defined isolated CPUs are then excluded from that cpumask. However, HK_TYPE_DOMAIN will soon integrate cpuset isolated CPU updates, and the RPS infrastructure needs more thought before it can propagate such changes and synchronize against them. Keep handling only what was passed through "isolcpus=3D" for now. 
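Illustration only, not part of the patch: a minimal user-space sketch of the filtering that rps_cpumask_housekeeping() applies. It assumes unsigned long bitmasks instead of struct cpumask, and model_rps_cpumask_housekeeping() is a hypothetical name. The final return 0 is an assumption, since the hunk does not show the end of the function.

```c
#include <errno.h>

/*
 * User-space model, not kernel code: bit N set means CPU N is in the mask.
 * hk_domain_boot stands in for housekeeping_cpumask(HK_TYPE_DOMAIN_BOOT),
 * hk_wq for housekeeping_cpumask(HK_TYPE_WQ).
 */
int model_rps_cpumask_housekeeping(unsigned long *mask,
				   unsigned long hk_domain_boot,
				   unsigned long hk_wq)
{
	/* An empty mask is left untouched. */
	if (*mask != 0) {
		*mask &= hk_domain_boot;
		*mask &= hk_wq;
		/* Nothing left once isolated CPUs are excluded: reject. */
		if (*mask == 0)
			return -EINVAL;
	}
	return 0;
}
```

In this model, a requested mask that only contains boot-isolated CPUs is rejected, while CPUs isolated later through cpuset are not filtered, which is the behaviour the patch keeps for now.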
Signed-off-by: Frederic Weisbecker --- net/core/net-sysfs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c index c28cd6665444..9b0081e444d6 100644 --- a/net/core/net-sysfs.c +++ b/net/core/net-sysfs.c @@ -1022,7 +1022,7 @@ static int netdev_rx_queue_set_rps_mask(struct netdev= _rx_queue *queue, int rps_cpumask_housekeeping(struct cpumask *mask) { if (!cpumask_empty(mask)) { - cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_DOMAIN)); + cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_DOMAIN_BOOT)); cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_WQ)); if (cpumask_empty(mask)) return -EINVAL; --=20 2.51.0 From nobody Fri Oct 3 14:34:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5E891335BAF; Fri, 29 Aug 2025 15:48:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482529; cv=none; b=XgtyK0G8g03zCADaNMhLv6kLgIXUPuCWtYxEc3ockIcxgKX6GURF4RjdowSrLTYuBFBvSzxZGr9laB+Nl4LQF45uG/nf3pghygpwiitCszXa3r8Uc5OsB0WKws+Y3V2cCFjsRrJ2UyvUH2lYcxF2oJB6o4USjGr4AMaijPwbVa0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482529; c=relaxed/simple; bh=N7iuauUh7P5zsvAS2I3TSB3cjF4+WxMa7Hofz2VW+Gs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=r8P7FyrL+4vlLYUG08BRQYZaUVwYIOSrEs5BRlM6nMVY4x9M1c8kXfn0I3eQ6hOXUQ6pdOMADC/QJgj0ZFSQccO7XW/Ior8O4WIVt7BRoXMgn6yIuFkn6H5pF+zqzRY/Qcev5UorlSIm+HM0uU25nEZi1d1Z2kF4INJga8kTK3Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=lj7ikhfS; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: 
smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="lj7ikhfS" Received: by smtp.kernel.org (Postfix) with ESMTPSA id C60BAC4CEF0; Fri, 29 Aug 2025 15:48:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1756482529; bh=N7iuauUh7P5zsvAS2I3TSB3cjF4+WxMa7Hofz2VW+Gs=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=lj7ikhfSVP3EdMyviJUXZclCiQF+8nfJ/fekhphSH8XBK4gzNwmLLHl5eZ8c8sIn5 WRCMUoM5chRM73MZR3GKYx/elVgVRCPsPRdhyoUf3U4VF7PaCW6PGfq8kLk/AKyzA4 92tzab9lPVkIWEtb37ChtS1mTCGTLBoTnGQt5GpOS3j1MgYIgxWI5rZHX8geQX/nbW rOqHmg+JAPbbKBcg6oSCU8VVr5ITtvuQA+Kle/2ne/csvIeyB2elA9PEBK1Iu3C9BE 3FfOH38eyqPl37vnPRLzrNlYUxiFjHz0E0t+CUIiWxuKB1n0ZoXc1QdjQLD2tvLknZ qvJzhFMVDTBTQ== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , Jens Axboe , Marco Crivellari , Michal Hocko , Peter Zijlstra , Tejun Heo , Thomas Gleixner , Vlastimil Babka , Waiman Long , linux-block@vger.kernel.org Subject: [PATCH 10/33] block: Protect against concurrent isolated cpuset change Date: Fri, 29 Aug 2025 17:47:51 +0200 Message-ID: <20250829154814.47015-11-frederic@kernel.org> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20250829154814.47015-1-frederic@kernel.org> References: <20250829154814.47015-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The block subsystem prevents its workqueue from running on isolated CPUs, including those defined by cpuset isolated partitions. Since HK_TYPE_DOMAIN will soon contain both and be subject to runtime modifications, synchronize against housekeeping using the relevant lock. For full support of cpuset changes, the block subsystem may need to propagate changes to the isolated cpumask through the workqueue in the future. 
Signed-off-by: Frederic Weisbecker --- block/blk-mq.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index ba3a4b77f578..f2d1f2531fca 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -4241,12 +4241,16 @@ static void blk_mq_map_swqueue(struct request_queue= *q) =20 /* * Rule out isolated CPUs from hctx->cpumask to avoid - * running block kworker on isolated CPUs + * running block kworker on isolated CPUs. + * FIXME: cpuset should propagate further changes to isolated CPUs + * here. */ + rcu_read_lock(); for_each_cpu(cpu, hctx->cpumask) { if (cpu_is_isolated(cpu)) cpumask_clear_cpu(cpu, hctx->cpumask); } + rcu_read_unlock(); =20 /* * Initialize batch roundrobin counts --=20 2.51.0 From nobody Fri Oct 3 14:34:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CCDC53277A7 for ; Fri, 29 Aug 2025 15:48:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482532; cv=none; b=n4l8cwICahxKAL2L6uN12J+h16J183JqVBjfuOsq6mdsUgFCIhLtRJlmN9J3uxNuqLa/TjWCcs6NVD9CSnMU8yzb+H253vrnCbkpOs+Twph7DOkwLfh2g77AGcDXTayS+rQ3mK++NXIGT8fpi4L9gYHNvfwmHmatg/OlAGt2caw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482532; c=relaxed/simple; bh=069wMqK9bF6D74YIJY8nZAw2J7YjllAfy3UAu+j/YUM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=DYZCuJPc0JbF9raQzeWkbyFGT5CPv0wQ3L7wsXKXOSR82Nt3Ka8s2C02Y8dq9Qirtn6olpnzFt98vW7UDvsCUZTAc6XIUh16KSVWIl85dRACiup7/8s/j9WRD67PuRgJRO3108PXGvlP2vbd+rGe6zBT9LWjK6drW17BOdlJ9pA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org 
header.b=Rjmbh3Ol; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Rjmbh3Ol" Received: by smtp.kernel.org (Postfix) with ESMTPSA id A59EAC4CEF0; Fri, 29 Aug 2025 15:48:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1756482531; bh=069wMqK9bF6D74YIJY8nZAw2J7YjllAfy3UAu+j/YUM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Rjmbh3OlGvhaN0dbZ5yNeJrUoVWLD/BjCk7ULf1qSy3haIqWabdL8HqM4GG1mFsRx jgzgaB5upUO/2BF4fmUF0Lzx00uVPiWqDTb7I62rqAaUFKlypIeXs8gXmGhu3XgnsL PZuRkBC4xg0+VssfKSaBa6K/pTdflh7yGllFEfB/AvR++fP0oz2H/E2kcCPPS3ZF/4 SlqIvBX3iapid3lzmeBU0aKhXqD+N/a8lOK9PNjUg92/KFjzdNkq0BLQkI/aNi/xBP 0dPZZR5sn0+sESUqseEoAMZto60pDaHnTGdJI48djTiSgWEaZ0IHuIxaSn8j4WWqu0 Gz8P9EKsOQGJA== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , Marco Crivellari , Michal Hocko , Peter Zijlstra , Tejun Heo , Thomas Gleixner , Waiman Long Subject: [PATCH 11/33] cpu: Provide lockdep check for CPU hotplug lock write-held Date: Fri, 29 Aug 2025 17:47:52 +0200 Message-ID: <20250829154814.47015-12-frederic@kernel.org> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20250829154814.47015-1-frederic@kernel.org> References: <20250829154814.47015-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" cpuset modifies partitions, including isolated ones, while read-holding the CPU hotplug lock. This means that write-holding the CPU hotplug lock is enough to synchronize against housekeeping cpumask changes. Provide a lockdep check to validate that. 
Signed-off-by: Frederic Weisbecker --- include/linux/cpuhplock.h | 1 + include/linux/percpu-rwsem.h | 1 + kernel/cpu.c | 5 +++++ 3 files changed, 7 insertions(+) diff --git a/include/linux/cpuhplock.h b/include/linux/cpuhplock.h index f7aa20f62b87..286b3ab92e15 100644 --- a/include/linux/cpuhplock.h +++ b/include/linux/cpuhplock.h @@ -13,6 +13,7 @@ struct device; =20 extern int lockdep_is_cpus_held(void); +extern int lockdep_is_cpus_write_held(void); =20 #ifdef CONFIG_HOTPLUG_CPU void cpus_write_lock(void); diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h index 288f5235649a..c8cb010d655e 100644 --- a/include/linux/percpu-rwsem.h +++ b/include/linux/percpu-rwsem.h @@ -161,6 +161,7 @@ extern void percpu_free_rwsem(struct percpu_rw_semaphor= e *); __percpu_init_rwsem(sem, #sem, &rwsem_key); \ }) =20 +#define percpu_rwsem_is_write_held(sem) lockdep_is_held_type(sem, 0) #define percpu_rwsem_is_held(sem) lockdep_is_held(sem) #define percpu_rwsem_assert_held(sem) lockdep_assert_held(sem) =20 diff --git a/kernel/cpu.c b/kernel/cpu.c index 453a806af2ee..3b0443f7c486 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -534,6 +534,11 @@ int lockdep_is_cpus_held(void) { return percpu_rwsem_is_held(&cpu_hotplug_lock); } + +int lockdep_is_cpus_write_held(void) +{ + return percpu_rwsem_is_write_held(&cpu_hotplug_lock); +} #endif =20 static void lockdep_acquire_cpus_lock(void) --=20 2.51.0 From nobody Fri Oct 3 14:34:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 51B6A32A3C7; Fri, 29 Aug 2025 15:48:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482534; cv=none; 
b=PgHKlYfcXNt1c63nFkO5qp6hEs2oDiqyw09MYQiwmoMsrwDIir3sKQXczwtKZfjOwh7ojzelBWKiQ0S9Akm9hkTJ30wD6MxUpNG3W2qXQOh8jypEhWtnTWAxZ/V64D7brwfay0tahGuIGTnYpXtyLWInhLnY62DUtUywvok7ZTs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482534; c=relaxed/simple; bh=LPfGRQUCouLlYMYagUni4E5aLlawCBT0lOrgRGdT9YM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=uwDBbhi8FJbe1Lz0+BCORwQRIMh4wgW5V2mKqdP65Kr6x/CkdwqjSGawF7bMDAv64FeKrw8Mhk0ul1lgS/+06QkeNvFRZdWrKbKiR08XBFPrsEIY8s1enc/pld7K5hsEr5XTPRQEGW9c0NPP7Hv0OEbwEYF9QmgJtevpyewy50c= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=ENEga7Da; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="ENEga7Da" Received: by smtp.kernel.org (Postfix) with ESMTPSA id C3396C4CEF6; Fri, 29 Aug 2025 15:48:51 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1756482534; bh=LPfGRQUCouLlYMYagUni4E5aLlawCBT0lOrgRGdT9YM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=ENEga7Dam+kNI8dPxGhGkV8vI6v/8iw/viFTgf36zq726dg9TrY/N8US8GvzpfpFE l5KBzDsoznlioAIEtdNOcA7JpgrNvgYM7FBLPBLpmViRE2xLTtzcbBAOce4GIycC1c UvWwmQZnnbpfONbkQ32MGghq9QWK+bWWITEf1TOUNxLuuYZFD7+WD4DRCYm2RJt2H2 rMf5iaKEkP/Abfc7UmAZaHt+HASHdqxbrfFM0sZzRB+XoU1VIIKilMkAVDg7ZJ9xzA 81l2ACdYvTlJbyTw3+qrJpOLClcffmd4UoJvAfPTQMp4VupfosmAVh9JWEwSHE6r9z GVbTFOkLpLpyA== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , =?UTF-8?q?Michal=20Koutn=C3=BD?= , Johannes Weiner , Marco Crivellari , Michal Hocko , Peter Zijlstra , Tejun Heo , Thomas Gleixner , Vlastimil Babka , Waiman Long , cgroups@vger.kernel.org Subject: [PATCH 12/33] cpuset: Provide lockdep check for cpuset lock held Date: Fri, 29 Aug 2025 17:47:53 +0200 Message-ID: <20250829154814.47015-13-frederic@kernel.org> 
X-Mailer: git-send-email 2.51.0 In-Reply-To: <20250829154814.47015-1-frederic@kernel.org> References: <20250829154814.47015-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" cpuset modifies partitions, including isolated, while holding the cpuset mutex. This means that holding the cpuset mutex is safe to synchronize against housekeeping cpumask changes. Provide a lockdep check to validate that. Signed-off-by: Frederic Weisbecker --- include/linux/cpuset.h | 2 ++ kernel/cgroup/cpuset.c | 7 +++++++ 2 files changed, 9 insertions(+) diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h index 2ddb256187b5..051d36fec578 100644 --- a/include/linux/cpuset.h +++ b/include/linux/cpuset.h @@ -18,6 +18,8 @@ #include #include =20 +extern bool lockdep_is_cpuset_held(void); + #ifdef CONFIG_CPUSETS =20 /* diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index b00d8e3c30ba..2d2fc74bc00c 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -254,6 +254,13 @@ void cpuset_unlock(void) mutex_unlock(&cpuset_mutex); } =20 +#ifdef CONFIG_LOCKDEP +bool lockdep_is_cpuset_held(void) +{ + return lockdep_is_held(&cpuset_mutex); +} +#endif + static DEFINE_SPINLOCK(callback_lock); =20 void cpuset_callback_lock_irq(void) --=20 2.51.0 From nobody Fri Oct 3 14:34:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E64043277B4 for ; Fri, 29 Aug 2025 15:48:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482537; cv=none; 
b=QeODJUd1O+05+qnvEh6IORg9sJtXjgfurTwGUIBdpR0oFTiFP9pzW0oNjT7lqzzP1UpRSB6O2QHGm3TARfccHzKCxGuVCDiVkvsQqYrtUzipo2OPvF7vPfeJ1Ank8aTtySX3jTdNUDTAgrtQWN7yz8IRHSXfLR+3hotoTjvEQkQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482537; c=relaxed/simple; bh=JhJw3/T9u/hnpbb5ctajFxg6YPqtzMSvHrfpSvw6ITg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=kKLrf/RBUNX3TvdDzfbzlg7kAqfthYhrt1DNhs4Xw9TuDhK1ikIhjDO8wn0Mtx3P4MHWd+GZhatB9TRaXBWLAAGBElaLE+jr0xDYYMGF3TcQWaFo+rfsd0sMN/Gtp+dQbF3ISgtK6EmuaQHpwUHDAtjvSBnjJPRSu5h5+UchuOY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=k5wNbLiO; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="k5wNbLiO" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 90E69C4CEF0; Fri, 29 Aug 2025 15:48:54 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1756482536; bh=JhJw3/T9u/hnpbb5ctajFxg6YPqtzMSvHrfpSvw6ITg=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=k5wNbLiOZzRvWvaRCJwSAcgWMUTKX0/llfwJIUnSlAWSODxwpafPzpuSR7UFeTHDa 8FRZg/DFMUwculwr27bBoYkMqMLQwHpXWAXW5EE0HHM4yceYoxGVGLXttHw5W9zHUd chwG6QpK/2RxhGztQb1IqxKxRbhEZw96boL1S1YOVGM+QxyAdpSKbYlFQ6xU0BzMU3 6fjIAk1xQ+rRySBbdg4uTTgSLZwSHeud6yeGNbm2e7FhXokUQL9YadKU72PPZ80Lr7 ZUa04sFVTfXpnaQRgP8vpCL0Rh0rB0nphlS90GMIDjfJ7qaHoJvAqlwUrQcU4z+muk hh2VYwgsZPrJw== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , Ingo Molnar , Marco Crivellari , Michal Hocko , Peter Zijlstra , Tejun Heo , Thomas Gleixner , Vlastimil Babka , Waiman Long Subject: [PATCH 13/33] sched/isolation: Convert housekeeping cpumasks to rcu pointers Date: Fri, 29 Aug 2025 17:47:54 +0200 Message-ID: <20250829154814.47015-14-frederic@kernel.org> X-Mailer: git-send-email 2.51.0 In-Reply-To: 
<20250829154814.47015-1-frederic@kernel.org> References: <20250829154814.47015-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" HK_TYPE_DOMAIN's cpumask will soon be made modifiable by cpuset. A synchronization mechanism is then needed to synchronize the updates with the housekeeping cpumask readers. Turn the housekeeping cpumasks into RCU pointers. Once a housekeeping cpumask is modified, the update side will wait for an RCU grace period and propagate the change to the interested subsystems when deemed necessary. Signed-off-by: Frederic Weisbecker --- kernel/sched/isolation.c | 52 ++++++++++++++++++++++++++-------------- kernel/sched/sched.h | 1 + 2 files changed, 35 insertions(+), 18 deletions(-) diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c index fb414e28706d..5ddb8dc5ca91 100644 --- a/kernel/sched/isolation.c +++ b/kernel/sched/isolation.c @@ -17,7 +17,7 @@ enum hk_flags { HK_FLAG_KERNEL_NOISE =3D BIT(HK_TYPE_KERNEL_NOISE), }; =20 -static cpumask_var_t housekeeping_cpumasks[HK_TYPE_MAX]; +static struct cpumask __rcu *housekeeping_cpumasks[HK_TYPE_MAX]; unsigned long housekeeping_flags; EXPORT_SYMBOL_GPL(housekeeping_flags); =20 @@ -27,16 +27,25 @@ bool housekeeping_enabled(enum hk_type type) } EXPORT_SYMBOL_GPL(housekeeping_enabled); =20 +const struct cpumask *housekeeping_cpumask(enum hk_type type) +{ + if (housekeeping_flags & BIT(type)) { + return rcu_dereference_check(housekeeping_cpumasks[type], 1); + } + return cpu_possible_mask; +} +EXPORT_SYMBOL_GPL(housekeeping_cpumask); + int housekeeping_any_cpu(enum hk_type type) { int cpu; =20 if (housekeeping_flags & BIT(type)) { - cpu =3D sched_numa_find_closest(housekeeping_cpumasks[type], smp_process= or_id()); + cpu =3D sched_numa_find_closest(housekeeping_cpumask(type), smp_processo= r_id()); if (cpu < 
nr_cpu_ids)
 			return cpu;
 
-		cpu = cpumask_any_and_distribute(housekeeping_cpumasks[type], cpu_online_mask);
+		cpu = cpumask_any_and_distribute(housekeeping_cpumask(type), cpu_online_mask);
 		if (likely(cpu < nr_cpu_ids))
 			return cpu;
 		/*
@@ -52,25 +61,17 @@ int housekeeping_any_cpu(enum hk_type type)
 }
 EXPORT_SYMBOL_GPL(housekeeping_any_cpu);
 
-const struct cpumask *housekeeping_cpumask(enum hk_type type)
-{
-	if (housekeeping_flags & BIT(type))
-		return housekeeping_cpumasks[type];
-	return cpu_possible_mask;
-}
-EXPORT_SYMBOL_GPL(housekeeping_cpumask);
-
 void housekeeping_affine(struct task_struct *t, enum hk_type type)
 {
 	if (housekeeping_flags & BIT(type))
-		set_cpus_allowed_ptr(t, housekeeping_cpumasks[type]);
+		set_cpus_allowed_ptr(t, housekeeping_cpumask(type));
 }
 EXPORT_SYMBOL_GPL(housekeeping_affine);
 
 bool housekeeping_test_cpu(int cpu, enum hk_type type)
 {
 	if (housekeeping_flags & BIT(type))
-		return cpumask_test_cpu(cpu, housekeeping_cpumasks[type]);
+		return cpumask_test_cpu(cpu, housekeeping_cpumask(type));
 	return true;
 }
 EXPORT_SYMBOL_GPL(housekeeping_test_cpu);
@@ -85,9 +86,23 @@ void __init housekeeping_init(void)
 	if (housekeeping_flags & HK_FLAG_KERNEL_NOISE)
 		sched_tick_offload_init();
 
+	/*
+	 * Realloc with a proper allocator so that any cpumask update
+	 * can indifferently free the old version with kfree().
+	 */
 	for_each_set_bit(type, &housekeeping_flags, HK_TYPE_MAX) {
+		struct cpumask *omask, *nmask = kmalloc(cpumask_size(), GFP_KERNEL);
+
+		if (WARN_ON_ONCE(!nmask))
+			return;
+
+		omask = rcu_dereference(housekeeping_cpumasks[type]);
+
 		/* We need at least one CPU to handle housekeeping work */
-		WARN_ON_ONCE(cpumask_empty(housekeeping_cpumasks[type]));
+		WARN_ON_ONCE(cpumask_empty(omask));
+		cpumask_copy(nmask, omask);
+		RCU_INIT_POINTER(housekeeping_cpumasks[type], nmask);
+		memblock_free(omask, cpumask_size());
 	}
 }
 
@@ -95,9 +110,10 @@ static void __init housekeeping_setup_type(enum hk_type type,
 					   cpumask_var_t housekeeping_staging)
 {
 
-	alloc_bootmem_cpumask_var(&housekeeping_cpumasks[type]);
-	cpumask_copy(housekeeping_cpumasks[type],
-		     housekeeping_staging);
+	struct cpumask *mask = memblock_alloc_or_panic(cpumask_size(), SMP_CACHE_BYTES);
+
+	cpumask_copy(mask, housekeeping_staging);
+	RCU_INIT_POINTER(housekeeping_cpumasks[type], mask);
 }
 
 static int __init housekeeping_setup(char *str, unsigned long flags)
@@ -150,7 +166,7 @@ static int __init housekeeping_setup(char *str, unsigned long flags)
 
 	for_each_set_bit(type, &iter_flags, HK_TYPE_MAX) {
 		if (!cpumask_equal(housekeeping_staging,
-				   housekeeping_cpumasks[type])) {
+				   housekeeping_cpumask(type))) {
 			pr_warn("Housekeeping: nohz_full= must match isolcpus=\n");
 			goto free_housekeeping_staging;
 		}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index be9745d104f7..0b1a233dcabf 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -42,6 +42,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
-- 
2.51.0

From nobody Fri Oct 3 14:34:24 2025
From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Michal Koutný, Ingo Molnar, Johannes Weiner, Marco Crivellari, Michal Hocko, Peter Zijlstra, Tejun Heo, Thomas Gleixner, Vlastimil Babka, Waiman Long,
 cgroups@vger.kernel.org
Subject: [PATCH 14/33] cpuset: Update HK_TYPE_DOMAIN cpumask from cpuset
Date: Fri, 29 Aug 2025 17:47:55 +0200
Message-ID: <20250829154814.47015-15-frederic@kernel.org>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20250829154814.47015-1-frederic@kernel.org>
References: <20250829154814.47015-1-frederic@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Until now, HK_TYPE_DOMAIN only included the boot-defined isolated CPUs
passed through the isolcpus= boot option. Users interested in also
knowing the runtime-defined isolated CPUs through cpuset must use
different APIs: cpuset_cpu_is_isolated(), cpu_is_isolated(), etc.

There are many drawbacks to that approach:

1) Most interested subsystems want to know about all isolated CPUs, not
   just those defined at boot time.

2) cpuset_cpu_is_isolated() / cpu_is_isolated() are not synchronized with
   concurrent cpuset changes.

3) Further cpuset modifications are not propagated to subsystems.

Solve 1) and 2) by centralizing all isolated CPUs within the
HK_TYPE_DOMAIN housekeeping cpumask. Subsystems can rely on RCU to
synchronize against concurrent changes.

The propagation mentioned in 3) will be handled in further patches.
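
For illustration only, here is a minimal userspace sketch of the mask
arithmetic this patch performs in housekeeping_update(): the new
HK_TYPE_DOMAIN mask is the boot-time housekeeping mask minus the
cpuset-isolated CPUs, and the update is refused when no online CPU would
remain. The names toy_cpumask_t and toy_housekeeping_trial() are
hypothetical and not part of the kernel; a 64-bit word stands in for a
real cpumask, and the RCU publication step is deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: bit N set means CPU N. */
typedef uint64_t toy_cpumask_t;

/*
 * Model of the trial-mask computation in housekeeping_update().
 * On success, stores the new housekeeping mask in *trial and returns 0.
 * Returns -EINVAL, leaving *trial untouched, when the resulting mask
 * would contain no online CPU.
 */
static int toy_housekeeping_trial(toy_cpumask_t boot_hk,
				  toy_cpumask_t isolated,
				  toy_cpumask_t online,
				  toy_cpumask_t *trial)
{
	toy_cpumask_t t;

	assert(trial != NULL);

	/* Equivalent of cpumask_andnot(trial, boot_hk, isolated). */
	t = boot_hk & ~isolated;

	/* Equivalent of !cpumask_intersects(trial, cpu_online_mask). */
	if (!(t & online))
		return -EINVAL;

	*trial = t;
	return 0;
}
```

With CPUs 0-3 as boot-time housekeeping CPUs and CPUs 2-3 isolated by
cpuset, the sketch yields CPUs 0-1. Isolating all four CPUs is rejected,
which mirrors the requirement that at least one online CPU must remain
available for housekeeping work.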

Signed-off-by: Frederic Weisbecker
---
 include/linux/sched/isolation.h |  4 +-
 kernel/cgroup/cpuset.c          |  2 +
 kernel/sched/isolation.c        | 65 ++++++++++++++++++++++++++++++---
 kernel/sched/sched.h            |  1 +
 4 files changed, 65 insertions(+), 7 deletions(-)

diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index 9262378760b1..199d0fc4646f 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -36,12 +36,13 @@ extern bool housekeeping_test_cpu(int cpu, enum hk_type type);
 
 static inline bool housekeeping_cpu(int cpu, enum hk_type type)
 {
-	if (housekeeping_flags & BIT(type))
+	if (READ_ONCE(housekeeping_flags) & BIT(type))
 		return housekeeping_test_cpu(cpu, type);
 	else
 		return true;
 }
 
+extern int housekeeping_update(struct cpumask *mask, enum hk_type type);
 extern void __init housekeeping_init(void);
 
 #else
@@ -74,6 +75,7 @@ static inline bool housekeeping_cpu(int cpu, enum hk_type type)
 	return true;
 }
 
+static inline int housekeeping_update(struct cpumask *mask, enum hk_type type) { return 0; }
 static inline void housekeeping_init(void) { }
 #endif /* CONFIG_CPU_ISOLATION */
 
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 2d2fc74bc00c..4f2bc68332a7 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1351,6 +1351,8 @@ static void update_unbound_workqueue_cpumask(bool isolcpus_updated)
 
 	ret = workqueue_unbound_exclude_cpumask(isolated_cpus);
 	WARN_ON_ONCE(ret < 0);
+	ret = housekeeping_update(isolated_cpus, HK_TYPE_DOMAIN);
+	WARN_ON_ONCE(ret < 0);
 }
 
 /**
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 5ddb8dc5ca91..48f3b6b20604 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -23,16 +23,39 @@ EXPORT_SYMBOL_GPL(housekeeping_flags);
 
 bool housekeeping_enabled(enum hk_type type)
 {
-	return !!(housekeeping_flags & BIT(type));
+	return !!(READ_ONCE(housekeeping_flags) & BIT(type));
 }
 EXPORT_SYMBOL_GPL(housekeeping_enabled);
 
+static bool housekeeping_dereference_check(enum hk_type type)
+{
+	if (type == HK_TYPE_DOMAIN) {
+		if (IS_ENABLED(CONFIG_HOTPLUG_CPU) && lockdep_is_cpus_write_held())
+			return true;
+		if (IS_ENABLED(CONFIG_CPUSETS) && lockdep_is_cpuset_held())
+			return true;
+
+		return false;
+	}
+
+	return true;
+}
+
+static inline struct cpumask *__housekeeping_cpumask(enum hk_type type)
+{
+	return rcu_dereference_check(housekeeping_cpumasks[type],
+				     housekeeping_dereference_check(type));
+}
+
 const struct cpumask *housekeeping_cpumask(enum hk_type type)
 {
-	if (housekeeping_flags & BIT(type)) {
-		return rcu_dereference_check(housekeeping_cpumasks[type], 1);
-	}
-	return cpu_possible_mask;
+	const struct cpumask *mask = NULL;
+
+	if (READ_ONCE(housekeeping_flags) & BIT(type))
+		mask = __housekeeping_cpumask(type);
+	if (!mask)
+		mask = cpu_possible_mask;
+	return mask;
 }
 EXPORT_SYMBOL_GPL(housekeeping_cpumask);
 
@@ -70,12 +93,42 @@ EXPORT_SYMBOL_GPL(housekeeping_affine);
 
 bool housekeeping_test_cpu(int cpu, enum hk_type type)
 {
-	if (housekeeping_flags & BIT(type))
+	if (READ_ONCE(housekeeping_flags) & BIT(type))
 		return cpumask_test_cpu(cpu, housekeeping_cpumask(type));
 	return true;
 }
 EXPORT_SYMBOL_GPL(housekeeping_test_cpu);
 
+int housekeeping_update(struct cpumask *mask, enum hk_type type)
+{
+	struct cpumask *trial, *old = NULL;
+
+	if (type != HK_TYPE_DOMAIN)
+		return -ENOTSUPP;
+
+	trial = kmalloc(sizeof(*trial), GFP_KERNEL);
+	if (!trial)
+		return -ENOMEM;
+
+	cpumask_andnot(trial, housekeeping_cpumask(HK_TYPE_DOMAIN_BOOT), mask);
+	if (!cpumask_intersects(trial, cpu_online_mask)) {
+		kfree(trial);
+		return -EINVAL;
+	}
+
+	if (housekeeping_flags & BIT(type))
+		old = __housekeeping_cpumask(type);
+	else
+		WRITE_ONCE(housekeeping_flags, housekeeping_flags | BIT(type));
+	rcu_assign_pointer(housekeeping_cpumasks[type], trial);
+
+	synchronize_rcu();
+
+	kfree(old);
+
+	return 0;
+}
+
 void __init
housekeeping_init(void)
 {
 	enum hk_type type;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 0b1a233dcabf..d3512138027b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -30,6 +30,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
-- 
2.51.0

From nobody Fri Oct 3 14:34:24 2025
From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Andrew Morton, Ingo Molnar, Johannes Weiner, Marco Crivellari, Michal Hocko, Muchun Song, Peter Zijlstra, Roman Gushchin, Shakeel Butt, Tejun Heo, Thomas Gleixner, Vlastimil Babka, Waiman Long, cgroups@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 15/33] sched/isolation: Flush memcg workqueues on cpuset isolated partition change
Date: Fri, 29 Aug 2025 17:47:56 +0200
Message-ID: <20250829154814.47015-16-frederic@kernel.org>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20250829154814.47015-1-frederic@kernel.org>
References: <20250829154814.47015-1-frederic@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

The HK_TYPE_DOMAIN housekeeping cpumask is now modifiable at runtime.
To make sure that no asynchronous memcg draining is still pending or
executing on a newly isolated CPU, the housekeeping subsystem must flush
the memcg workqueues.

However, the memcg work items can't be flushed easily since they are
queued to the main per-CPU workqueue pool.

Solve this by creating a memcg-specific workqueue and by providing and
using the appropriate flushing API.

Acked-by: Shakeel Butt
Signed-off-by: Frederic Weisbecker
---
 include/linux/memcontrol.h |  4 ++++
 kernel/sched/isolation.c   |  2 ++
 kernel/sched/sched.h       |  1 +
 mm/memcontrol.c            | 12 +++++++++++-
 4 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 785173aa0739..8b23ff000473 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1048,6 +1048,8 @@ static inline u64 cgroup_id_from_mm(struct mm_struct *mm)
 	return id;
 }
 
+void mem_cgroup_flush_workqueue(void);
+
 extern int mem_cgroup_init(void);
 #else /* CONFIG_MEMCG */
 
@@ -1453,6 +1455,8 @@ static inline u64 cgroup_id_from_mm(struct mm_struct *mm)
 	return 0;
 }
 
+static inline void mem_cgroup_flush_workqueue(void) { }
+
 static inline int mem_cgroup_init(void) { return 0; }
 #endif /* CONFIG_MEMCG */
 
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 48f3b6b20604..e85f402b103a 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -124,6 +124,8 @@ int housekeeping_update(struct cpumask *mask, enum hk_type type)
 
 	synchronize_rcu();
 
+	mem_cgroup_flush_workqueue();
+
 	kfree(old);
 
 	return 0;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index d3512138027b..1dad1ac7fc61 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -44,6 +44,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2649d6c09160..1aa2dfa32ccd 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -95,6 +95,8 @@ static bool cgroup_memory_nokmem __ro_after_init;
 /* BPF memory accounting disabled?
*/
 static bool cgroup_memory_nobpf __ro_after_init;
 
+static struct workqueue_struct *memcg_wq __ro_after_init;
+
 static struct kmem_cache *memcg_cachep;
 static struct kmem_cache *memcg_pn_cachep;
 
@@ -1974,7 +1976,7 @@ static void schedule_drain_work(int cpu, struct work_struct *work)
 {
 	guard(rcu)();
 	if (!cpu_is_isolated(cpu))
-		schedule_work_on(cpu, work);
+		queue_work_on(cpu, memcg_wq, work);
 }
 
 /*
@@ -5071,6 +5073,11 @@ void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
 	refill_stock(memcg, nr_pages);
 }
 
+void mem_cgroup_flush_workqueue(void)
+{
+	flush_workqueue(memcg_wq);
+}
+
 static int __init cgroup_memory(char *s)
 {
 	char *token;
@@ -5113,6 +5120,9 @@ int __init mem_cgroup_init(void)
 	cpuhp_setup_state_nocalls(CPUHP_MM_MEMCQ_DEAD, "mm/memctrl:dead", NULL,
 				  memcg_hotplug_cpu_dead);
 
+	memcg_wq = alloc_workqueue("memcg", 0, 0);
+	WARN_ON(!memcg_wq);
+
 	for_each_possible_cpu(cpu) {
 		INIT_WORK(&per_cpu_ptr(&memcg_stock, cpu)->work,
 			  drain_local_memcg_stock);
-- 
2.51.0

From nobody Fri Oct 3 14:34:24 2025
From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Andrew Morton, Ingo Molnar, Marco Crivellari, Michal Hocko, Peter Zijlstra, Tejun Heo, Thomas Gleixner, Vlastimil Babka, Waiman Long, linux-mm@kvack.org
Subject: [PATCH 16/33] sched/isolation: Flush vmstat workqueues on cpuset isolated partition change
Date: Fri, 29 Aug 2025 17:47:57 +0200
Message-ID: <20250829154814.47015-17-frederic@kernel.org>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20250829154814.47015-1-frederic@kernel.org>
References: <20250829154814.47015-1-frederic@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

The HK_TYPE_DOMAIN housekeeping cpumask is now modifiable at runtime.
To make sure that no asynchronous vmstat work is still pending or
executing on a newly isolated CPU, the housekeeping subsystem must flush
the vmstat workqueues.

This involves flushing the whole mm_percpu_wq workqueue, which is shared
with LRU drain, a welcome side effect.

Signed-off-by: Frederic Weisbecker
---
 include/linux/vmstat.h   | 2 ++
 kernel/sched/isolation.c | 1 +
 kernel/sched/sched.h     | 1 +
 mm/vmstat.c              | 5 +++++
 4 files changed, 9 insertions(+)

diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index c287998908bf..a81aa5635b47 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -303,6 +303,7 @@ int calculate_pressure_threshold(struct zone *zone);
 int calculate_normal_threshold(struct zone *zone);
 void set_pgdat_percpu_threshold(pg_data_t *pgdat,
 				int (*calculate_pressure)(struct zone *));
+void vmstat_flush_workqueue(void);
 #else /* CONFIG_SMP */
 
 /*
@@ -403,6 +404,7 @@ static inline void __dec_node_page_state(struct page *page,
 static inline void refresh_zone_stat_thresholds(void) { }
 static inline void cpu_vm_stats_fold(int cpu) { }
 static inline void quiet_vmstat(void) { }
+static inline void vmstat_flush_workqueue(void) { }
 
 static inline void drain_zonestat(struct zone *zone,
 			struct per_cpu_zonestat *pzstats) { }
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index e85f402b103a..86ce39aa1e9f 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -125,6 +125,7 @@ int housekeeping_update(struct cpumask *mask, enum hk_type type)
 	synchronize_rcu();
 
 	mem_cgroup_flush_workqueue();
+	vmstat_flush_workqueue();
 
 	kfree(old);
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1dad1ac7fc61..2d4de083200a 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -68,6 +68,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
diff --git a/mm/vmstat.c b/mm/vmstat.c
index
b90325ee49d3..69412b61fe1b 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -2113,6 +2113,11 @@ static void vmstat_shepherd(struct work_struct *w);
 
 static DECLARE_DEFERRABLE_WORK(shepherd, vmstat_shepherd);
 
+void vmstat_flush_workqueue(void)
+{
+	flush_workqueue(mm_percpu_wq);
+}
+
 static void vmstat_shepherd(struct work_struct *w)
 {
 	int cpu;
-- 
2.51.0

From nobody Fri Oct 3 14:34:24 2025
From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Michal Koutný, Ingo Molnar, Johannes Weiner, Lai Jiangshan, Marco Crivellari, Michal Hocko, Peter Zijlstra, Tejun Heo, Thomas Gleixner, Vlastimil Babka, Waiman Long, cgroups@vger.kernel.org
Subject: [PATCH 17/33] cpuset: Propagate cpuset isolation update to workqueue through housekeeping
Date: Fri, 29 Aug 2025 17:47:58 +0200
Message-ID: <20250829154814.47015-18-frederic@kernel.org>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20250829154814.47015-1-frederic@kernel.org>
References: <20250829154814.47015-1-frederic@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Until now, cpuset propagated isolated partition changes to workqueues so
that unbound workers get properly reaffined.

Since housekeeping now centralizes, synchronizes and propagates isolation
cpumask changes, perform the work from that subsystem for consolidation
and consistency purposes.

Suggested-by: Tejun Heo
Signed-off-by: Frederic Weisbecker
---
 include/linux/workqueue.h |  2 +-
 init/Kconfig              |  1 +
 kernel/cgroup/cpuset.c    | 14 ++++++--------
 kernel/sched/isolation.c  |  4 +++-
 kernel/workqueue.c        |  2 +-
 5 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 45d5dd470ff6..19fee865ce2a 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -588,7 +588,7 @@ struct workqueue_attrs *alloc_workqueue_attrs_noprof(void);
 void free_workqueue_attrs(struct workqueue_attrs *attrs);
 int apply_workqueue_attrs(struct workqueue_struct *wq,
 			  const struct workqueue_attrs *attrs);
-extern int workqueue_unbound_exclude_cpumask(cpumask_var_t cpumask);
+extern int workqueue_unbound_exclude_cpumask(const struct cpumask *cpumask);
 
 extern bool queue_work_on(int cpu, struct workqueue_struct *wq,
 			struct work_struct *work);
diff --git a/init/Kconfig b/init/Kconfig
index 836320251219..af05cf89db12 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1230,6 +1230,7 @@ config CPUSETS
 	bool "Cpuset controller"
 	depends on SMP
 	select UNION_FIND
+	select CPU_ISOLATION
 	help
 	  This option will let you create and manage CPUSETs which
 	  allow dynamically partitioning a system into sets of CPUs and
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 4f2bc68332a7..eb8d01d23af6 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1340,7 +1340,7 @@ static bool partition_xcpus_del(int old_prs, struct cpuset *parent,
 	return isolcpus_updated;
 }
 
-static void update_unbound_workqueue_cpumask(bool isolcpus_updated)
+static void update_housekeeping_cpumask(bool isolcpus_updated)
 {
 	int ret;
 
@@ -1349,8 +1349,6 @@ static void update_unbound_workqueue_cpumask(bool isolcpus_updated)
 	if (!isolcpus_updated)
 		return;
 
-	ret = workqueue_unbound_exclude_cpumask(isolated_cpus);
-	WARN_ON_ONCE(ret < 0);
 	ret = housekeeping_update(isolated_cpus, HK_TYPE_DOMAIN);
 	WARN_ON_ONCE(ret <
0);
 }
@@ -1473,7 +1471,7 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
 	list_add(&cs->remote_sibling, &remote_children);
 	cpumask_copy(cs->effective_xcpus, tmp->new_cpus);
 	spin_unlock_irq(&callback_lock);
-	update_unbound_workqueue_cpumask(isolcpus_updated);
+	update_housekeeping_cpumask(isolcpus_updated);
 	cpuset_force_rebuild();
 	cs->prs_err = 0;
 
@@ -1514,7 +1512,7 @@ static void remote_partition_disable(struct cpuset *cs, struct tmpmasks *tmp)
 	compute_effective_exclusive_cpumask(cs, NULL, NULL);
 	reset_partition_data(cs);
 	spin_unlock_irq(&callback_lock);
-	update_unbound_workqueue_cpumask(isolcpus_updated);
+	update_housekeeping_cpumask(isolcpus_updated);
 	cpuset_force_rebuild();
 
 	/*
@@ -1583,7 +1581,7 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *xcpus,
 	if (xcpus)
 		cpumask_copy(cs->exclusive_cpus, xcpus);
 	spin_unlock_irq(&callback_lock);
-	update_unbound_workqueue_cpumask(isolcpus_updated);
+	update_housekeeping_cpumask(isolcpus_updated);
 	if (adding || deleting)
 		cpuset_force_rebuild();
 
@@ -1947,7 +1945,7 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
 		WARN_ON_ONCE(parent->nr_subparts < 0);
 	}
 	spin_unlock_irq(&callback_lock);
-	update_unbound_workqueue_cpumask(isolcpus_updated);
+	update_housekeeping_cpumask(isolcpus_updated);
 
 	if ((old_prs != new_prs) && (cmd == partcmd_update))
 		update_partition_exclusive_flag(cs, new_prs);
@@ -2972,7 +2970,7 @@ static int update_prstate(struct cpuset *cs, int new_prs)
 	else if (isolcpus_updated)
 		isolated_cpus_update(old_prs, new_prs, cs->effective_xcpus);
 	spin_unlock_irq(&callback_lock);
-	update_unbound_workqueue_cpumask(isolcpus_updated);
+	update_housekeeping_cpumask(isolcpus_updated);
 
 	/* Force update if switching back to member & update effective_xcpus */
 	update_cpumasks_hier(cs, &tmpmask, !new_prs);
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 86ce39aa1e9f..5baf1621a56e 100644
---
a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -102,6 +102,7 @@ EXPORT_SYMBOL_GPL(housekeeping_test_cpu);
 int housekeeping_update(struct cpumask *mask, enum hk_type type)
 {
 	struct cpumask *trial, *old = NULL;
+	int err;
 
 	if (type != HK_TYPE_DOMAIN)
 		return -ENOTSUPP;
@@ -126,10 +127,11 @@ int housekeeping_update(struct cpumask *mask, enum hk_type type)
 
 	mem_cgroup_flush_workqueue();
 	vmstat_flush_workqueue();
+	err = workqueue_unbound_exclude_cpumask(housekeeping_cpumask(type));
 
 	kfree(old);
 
-	return 0;
+	return err;
 }
 
 void __init housekeeping_init(void)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c6b79b3675c3..63dcc1d8b317 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -6921,7 +6921,7 @@ static int workqueue_apply_unbound_cpumask(const cpumask_var_t unbound_cpumask)
  * This function can be called from cpuset code to provide a set of isolated
  * CPUs that should be excluded from wq_unbound_cpumask.
  */
-int workqueue_unbound_exclude_cpumask(cpumask_var_t exclude_cpumask)
+int workqueue_unbound_exclude_cpumask(const struct cpumask *exclude_cpumask)
 {
 	cpumask_var_t cpumask;
 	int ret = 0;
-- 
2.51.0

From nobody Fri Oct 3 14:34:24 2025
From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Michal Koutný, Johannes Weiner, Marco Crivellari, Michal Hocko, Peter Zijlstra, Tejun Heo, Thomas Gleixner, Vlastimil Babka, Waiman Long, cgroups@vger.kernel.org
Subject: [PATCH 18/33] cpuset: Remove cpuset_cpu_is_isolated()
Date: Fri, 29 Aug 2025 17:47:59 +0200
Message-ID: <20250829154814.47015-19-frederic@kernel.org>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20250829154814.47015-1-frederic@kernel.org>
References: <20250829154814.47015-1-frederic@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
quoted-printable Content-Type: text/plain; charset="utf-8" The set of cpuset isolated CPUs is now included in HK_TYPE_DOMAIN housekeeping cpumask. There is no usecase left interested in just checking what is isolated by cpuset and not by the isolcpus=3D kernel boot parameter. Signed-off-by: Frederic Weisbecker --- include/linux/cpuset.h | 6 ------ include/linux/sched/isolation.h | 3 +-- kernel/cgroup/cpuset.c | 12 ------------ 3 files changed, 1 insertion(+), 20 deletions(-) diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h index 051d36fec578..a10775a4f702 100644 --- a/include/linux/cpuset.h +++ b/include/linux/cpuset.h @@ -78,7 +78,6 @@ extern void cpuset_lock(void); extern void cpuset_unlock(void); extern void cpuset_cpus_allowed(struct task_struct *p, struct cpumask *mas= k); extern bool cpuset_cpus_allowed_fallback(struct task_struct *p); -extern bool cpuset_cpu_is_isolated(int cpu); extern nodemask_t cpuset_mems_allowed(struct task_struct *p); #define cpuset_current_mems_allowed (current->mems_allowed) void cpuset_init_current_mems_allowed(void); @@ -208,11 +207,6 @@ static inline bool cpuset_cpus_allowed_fallback(struct= task_struct *p) return false; } =20 -static inline bool cpuset_cpu_is_isolated(int cpu) -{ - return false; -} - static inline nodemask_t cpuset_mems_allowed(struct task_struct *p) { return node_possible_map; diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolatio= n.h index 199d0fc4646f..c02923ed4cbe 100644 --- a/include/linux/sched/isolation.h +++ b/include/linux/sched/isolation.h @@ -83,8 +83,7 @@ static inline void housekeeping_init(void) { } static inline bool cpu_is_isolated(int cpu) { return !housekeeping_test_cpu(cpu, HK_TYPE_DOMAIN) || - !housekeeping_test_cpu(cpu, HK_TYPE_TICK) || - cpuset_cpu_is_isolated(cpu); + !housekeeping_test_cpu(cpu, HK_TYPE_TICK); } =20 #endif /* _LINUX_SCHED_ISOLATION_H */ diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index eb8d01d23af6..df1dfacf5f9d 100644 --- 
a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -29,7 +29,6 @@ #include #include #include -#include #include #include #include @@ -1353,17 +1352,6 @@ static void update_housekeeping_cpumask(bool isolcpu= s_updated) WARN_ON_ONCE(ret < 0); } =20 -/** - * cpuset_cpu_is_isolated - Check if the given CPU is isolated - * @cpu: the CPU number to be checked - * Return: true if CPU is used in an isolated partition, false otherwise - */ -bool cpuset_cpu_is_isolated(int cpu) -{ - return cpumask_test_cpu(cpu, isolated_cpus); -} -EXPORT_SYMBOL_GPL(cpuset_cpu_is_isolated); - /* * compute_effective_exclusive_cpumask - compute effective exclusive CPUs * @cs: cpuset --=20 2.51.0 From nobody Fri Oct 3 14:34:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D40B633EAF4 for ; Fri, 29 Aug 2025 15:49:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482555; cv=none; b=IW6eUEAo2kYIsTOheWgqB6NX8tBQdvDNveAcX30lb4KhkkagdSEephTPT4/pZV2PwYUz0D8xH3HHs3qxD9yyjVRpQMR7GHiMNuxCbpXcHBw1zV610+2tP98tQdF7zZwRfCsYyPWKUVrRNr2Flwt+EOSfol6bpJW7jAYX7VEQRY8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482555; c=relaxed/simple; bh=H42PRIJFhc68hczE5JFr8BulcuyrR8cWi/byZNSlUrw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Ky+MNkhfKI1No9O95PwhHP4dcl09YgjujzPmNMnrPojP8i3YHJJIc+jTCP+FrFDVxm7O30U4Ss5jj3HRTG3ASwT1grMOGc77zgdWsm04Pazv1c3x/z1+A42WW7dGYD1ms9e+Tc1lLq3zz19Onn2VG6ts/yxPEBXWxE7ZX9A9sGY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=AcVGpPnl; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: 
From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Ingo Molnar, Marco Crivellari, Michal Hocko, Peter Zijlstra, Tejun Heo, Thomas Gleixner, Vlastimil Babka, Waiman Long
Subject: [PATCH 19/33] sched/isolation: Remove HK_TYPE_TICK test from cpu_is_isolated()
Date: Fri, 29 Aug 2025 17:48:00 +0200
Message-ID: <20250829154814.47015-20-frederic@kernel.org>
In-Reply-To: <20250829154814.47015-1-frederic@kernel.org>
References: <20250829154814.47015-1-frederic@kernel.org>

It doesn't make sense to use nohz_full without also isolating the related
CPUs from the domain topology, either through the use of isolcpus= or
cpuset isolated partitions.

And now HK_TYPE_DOMAIN includes all kinds of domain isolated CPUs.

This means that HK_TYPE_KERNEL_NOISE (of which HK_TYPE_TICK is only an
alias) is always a superset of HK_TYPE_DOMAIN.

Therefore if a CPU is not HK_TYPE_KERNEL_NOISE, it can't be
HK_TYPE_DOMAIN either. Testing the latter is then enough.

Simplify cpu_is_isolated() accordingly.

Signed-off-by: Frederic Weisbecker
---
 include/linux/sched/isolation.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index c02923ed4cbe..8d6d26d3fdf5 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -82,8 +82,7 @@ static inline void housekeeping_init(void) { }
 
 static inline bool cpu_is_isolated(int cpu)
 {
-	return !housekeeping_test_cpu(cpu, HK_TYPE_DOMAIN) ||
-	       !housekeeping_test_cpu(cpu, HK_TYPE_TICK);
+	return !housekeeping_test_cpu(cpu, HK_TYPE_DOMAIN);
 }
 
 #endif /* _LINUX_SCHED_ISOLATION_H */
-- 
2.51.0

From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Bjorn Helgaas, Marco Crivellari, Michal Hocko, Peter Zijlstra, Tejun Heo, Thomas Gleixner, Vlastimil Babka, Waiman Long, linux-pci@vger.kernel.org
Subject: [PATCH 20/33] PCI: Remove superfluous HK_TYPE_WQ check
Date: Fri, 29 Aug 2025 17:48:01 +0200
Message-ID: <20250829154814.47015-21-frederic@kernel.org>
In-Reply-To: <20250829154814.47015-1-frederic@kernel.org>
References: <20250829154814.47015-1-frederic@kernel.org>

It doesn't make sense to use nohz_full without also isolating the related
CPUs from the domain topology, either through the use of isolcpus= or
cpuset isolated partitions.

And now HK_TYPE_DOMAIN includes all kinds of domain isolated CPUs.

This means that HK_TYPE_KERNEL_NOISE (of which HK_TYPE_WQ is only an
alias) is always a superset of HK_TYPE_DOMAIN.

Therefore:

	HK_TYPE_KERNEL_NOISE & HK_TYPE_DOMAIN = HK_TYPE_DOMAIN

Simplify the PCI probe target election accordingly.

Signed-off-by: Frederic Weisbecker
---
 drivers/pci/pci-driver.c | 16 +++-------------
 1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index cf2b83004886..326112ec516e 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -382,23 +382,14 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
 	    pci_physfn_is_probed(dev)) {
 		error = local_pci_probe(&ddi);
 	} else {
-		cpumask_var_t wq_domain_mask;
 		struct pci_probe_arg arg = { .ddi = &ddi };
 
 		INIT_WORK_ONSTACK(&arg.work, local_pci_probe_callback);
 
-		if (!zalloc_cpumask_var(&wq_domain_mask, GFP_KERNEL)) {
-			error = -ENOMEM;
-			goto out;
-		}
-
 		rcu_read_lock();
-		cpumask_and(wq_domain_mask,
-			    housekeeping_cpumask(HK_TYPE_WQ),
-			    housekeeping_cpumask(HK_TYPE_DOMAIN));
-
 		cpu = cpumask_any_and(cpumask_of_node(node),
-				      wq_domain_mask);
+				      housekeeping_cpumask(HK_TYPE_DOMAIN));
+
 		if (cpu < nr_cpu_ids) {
 			schedule_work_on(cpu, &arg.work);
 			rcu_read_unlock();
@@ -409,10 +400,9 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
 			error = local_pci_probe(&ddi);
 		}
 
-		free_cpumask_var(wq_domain_mask);
 		destroy_work_on_stack(&arg.work);
 	}
-out:
+
 	dev->is_probed = 0;
 	cpu_hotplug_enable();
 	return error;
-- 
2.51.0

From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Marco Crivellari, Michal Hocko, Peter Zijlstra, Tejun Heo, Thomas Gleixner, Vlastimil Babka, Waiman Long
Subject: [PATCH 21/33] kthread: Refine naming of affinity related fields
Date: Fri, 29 Aug 2025 17:48:02 +0200
Message-ID: <20250829154814.47015-22-frederic@kernel.org>
In-Reply-To: <20250829154814.47015-1-frederic@kernel.org>
References: <20250829154814.47015-1-frederic@kernel.org>

The kthreads preferred affinity related fields use "hotplug" as the base
of their naming because the affinity management was initially deemed to
deal with CPU hotplug.

The scope of this role is going to broaden now and also deal with
cpuset isolated partition updates.

Switch the naming accordingly.

Signed-off-by: Frederic Weisbecker
---
 kernel/kthread.c | 38 +++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 31b072e8d427..c4dd967e9e9c 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -35,8 +35,8 @@ static DEFINE_SPINLOCK(kthread_create_lock);
 static LIST_HEAD(kthread_create_list);
 struct task_struct *kthreadd_task;
 
-static LIST_HEAD(kthreads_hotplug);
-static DEFINE_MUTEX(kthreads_hotplug_lock);
+static LIST_HEAD(kthread_affinity_list);
+static DEFINE_MUTEX(kthread_affinity_lock);
 
 struct kthread_create_info
 {
@@ -69,7 +69,7 @@ struct kthread {
 	/* To store the full name if task comm is truncated. */
 	char *full_name;
 	struct task_struct *task;
-	struct list_head hotplug_node;
+	struct list_head affinity_node;
 	struct cpumask *preferred_affinity;
 };
 
@@ -128,7 +128,7 @@ bool set_kthread_struct(struct task_struct *p)
 
 	init_completion(&kthread->exited);
 	init_completion(&kthread->parked);
-	INIT_LIST_HEAD(&kthread->hotplug_node);
+	INIT_LIST_HEAD(&kthread->affinity_node);
 	p->vfork_done = &kthread->exited;
 
 	kthread->task = p;
@@ -323,10 +323,10 @@ void __noreturn kthread_exit(long result)
 {
 	struct kthread *kthread = to_kthread(current);
 	kthread->result = result;
-	if (!list_empty(&kthread->hotplug_node)) {
-		mutex_lock(&kthreads_hotplug_lock);
-		list_del(&kthread->hotplug_node);
-		mutex_unlock(&kthreads_hotplug_lock);
+	if (!list_empty(&kthread->affinity_node)) {
+		mutex_lock(&kthread_affinity_lock);
+		list_del(&kthread->affinity_node);
+		mutex_unlock(&kthread_affinity_lock);
 
 		if (kthread->preferred_affinity) {
 			kfree(kthread->preferred_affinity);
@@ -390,9 +390,9 @@ static void kthread_affine_node(void)
 			return;
 		}
 
-		mutex_lock(&kthreads_hotplug_lock);
-		WARN_ON_ONCE(!list_empty(&kthread->hotplug_node));
-		list_add_tail(&kthread->hotplug_node, &kthreads_hotplug);
+		mutex_lock(&kthread_affinity_lock);
+		WARN_ON_ONCE(!list_empty(&kthread->affinity_node));
+		list_add_tail(&kthread->affinity_node, &kthread_affinity_list);
 		/*
 		 * The node cpumask is racy when read from kthread() but:
 		 * - a racing CPU going down will either fail on the subsequent
@@ -402,7 +402,7 @@ static void kthread_affine_node(void)
 		 */
 		kthread_fetch_affinity(kthread, affinity);
 		set_cpus_allowed_ptr(current, affinity);
-		mutex_unlock(&kthreads_hotplug_lock);
+		mutex_unlock(&kthread_affinity_lock);
 
 		free_cpumask_var(affinity);
 	}
@@ -876,10 +876,10 @@ int kthread_affine_preferred(struct task_struct *p, const struct cpumask *mask)
 		goto out;
 	}
 
-	mutex_lock(&kthreads_hotplug_lock);
+	mutex_lock(&kthread_affinity_lock);
 	cpumask_copy(kthread->preferred_affinity, mask);
-	WARN_ON_ONCE(!list_empty(&kthread->hotplug_node));
-	list_add_tail(&kthread->hotplug_node, &kthreads_hotplug);
+	WARN_ON_ONCE(!list_empty(&kthread->affinity_node));
+	list_add_tail(&kthread->affinity_node, &kthread_affinity_list);
 	kthread_fetch_affinity(kthread, affinity);
 
 	/* It's safe because the task is inactive. */
@@ -887,7 +887,7 @@ int kthread_affine_preferred(struct task_struct *p, const struct cpumask *mask)
 	do_set_cpus_allowed(p, affinity);
 	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
 
-	mutex_unlock(&kthreads_hotplug_lock);
+	mutex_unlock(&kthread_affinity_lock);
 out:
 	free_cpumask_var(affinity);
 
@@ -908,9 +908,9 @@ static int kthreads_online_cpu(unsigned int cpu)
 	struct kthread *k;
 	int ret;
 
-	guard(mutex)(&kthreads_hotplug_lock);
+	guard(mutex)(&kthread_affinity_lock);
 
-	if (list_empty(&kthreads_hotplug))
+	if (list_empty(&kthread_affinity_list))
 		return 0;
 
 	if (!zalloc_cpumask_var(&affinity, GFP_KERNEL))
@@ -918,7 +918,7 @@ static int kthreads_online_cpu(unsigned int cpu)
 
 	ret = 0;
 
-	list_for_each_entry(k, &kthreads_hotplug, hotplug_node) {
+	list_for_each_entry(k, &kthread_affinity_list, affinity_node) {
 		if (WARN_ON_ONCE((k->task->flags & PF_NO_SETAFFINITY) ||
 				 kthread_is_per_cpu(k->task))) {
 			ret = -EINVAL;
-- 
2.51.0

From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Marco Crivellari, Michal Hocko, Peter Zijlstra, Tejun Heo, Thomas Gleixner, Vlastimil Babka, Waiman Long
Subject: [PATCH 22/33] kthread: Include unbound kthreads in the managed affinity list
Date: Fri, 29 Aug 2025 17:48:03 +0200
Message-ID: <20250829154814.47015-23-frederic@kernel.org>
In-Reply-To: <20250829154814.47015-1-frederic@kernel.org>
References: <20250829154814.47015-1-frederic@kernel.org>

The managed affinity list currently contains only unbound kthreads that
have affinity preferences. Unbound kthreads globally affine by default
are outside of the list because their affinity is automatically managed
by the scheduler (through the fallback housekeeping mask) and by cpuset.

However, in order to preserve the preferred affinity of kthreads, cpuset
will delegate the isolated partition update propagation to the
housekeeping and kthread code.

Prepare for that by including all unbound kthreads in the managed
affinity list.

Signed-off-by: Frederic Weisbecker
---
 kernel/kthread.c | 59 ++++++++++++++++++++++++------------------------
 1 file changed, 30 insertions(+), 29 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index c4dd967e9e9c..cba3d297f267 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -365,9 +365,10 @@ static void kthread_fetch_affinity(struct kthread *kthread, struct cpumask *cpum
 	if (kthread->preferred_affinity) {
 		pref = kthread->preferred_affinity;
 	} else {
-		if (WARN_ON_ONCE(kthread->node == NUMA_NO_NODE))
-			return;
-		pref = cpumask_of_node(kthread->node);
+		if (kthread->node == NUMA_NO_NODE)
+			pref = housekeeping_cpumask(HK_TYPE_KTHREAD);
+		else
+			pref = cpumask_of_node(kthread->node);
 	}
 
 	cpumask_and(cpumask, pref, housekeeping_cpumask(HK_TYPE_KTHREAD));
@@ -380,32 +381,29 @@ static void kthread_affine_node(void)
 	struct kthread *kthread = to_kthread(current);
 	cpumask_var_t affinity;
 
-	WARN_ON_ONCE(kthread_is_per_cpu(current));
+	if (WARN_ON_ONCE(kthread_is_per_cpu(current)))
+		return;
 
-	if (kthread->node == NUMA_NO_NODE) {
-		housekeeping_affine(current, HK_TYPE_KTHREAD);
-	} else {
-		if (!zalloc_cpumask_var(&affinity, GFP_KERNEL)) {
-			WARN_ON_ONCE(1);
-			return;
-		}
-
-		mutex_lock(&kthread_affinity_lock);
-		WARN_ON_ONCE(!list_empty(&kthread->affinity_node));
-		list_add_tail(&kthread->affinity_node, &kthread_affinity_list);
-		/*
-		 * The node cpumask is racy when read from kthread() but:
-		 * - a racing CPU going down will either fail on the subsequent
-		 *   call to set_cpus_allowed_ptr() or be migrated to housekeepers
-		 *   afterwards by the scheduler.
-		 * - a racing CPU going up will be handled by kthreads_online_cpu()
-		 */
-		kthread_fetch_affinity(kthread, affinity);
-		set_cpus_allowed_ptr(current, affinity);
-		mutex_unlock(&kthread_affinity_lock);
-
-		free_cpumask_var(affinity);
+	if (!zalloc_cpumask_var(&affinity, GFP_KERNEL)) {
+		WARN_ON_ONCE(1);
+		return;
 	}
+
+	mutex_lock(&kthread_affinity_lock);
+	WARN_ON_ONCE(!list_empty(&kthread->affinity_node));
+	list_add_tail(&kthread->affinity_node, &kthread_affinity_list);
+	/*
+	 * The node cpumask is racy when read from kthread() but:
+	 * - a racing CPU going down will either fail on the subsequent
+	 *   call to set_cpus_allowed_ptr() or be migrated to housekeepers
+	 *   afterwards by the scheduler.
+	 * - a racing CPU going up will be handled by kthreads_online_cpu()
+	 */
+	kthread_fetch_affinity(kthread, affinity);
+	set_cpus_allowed_ptr(current, affinity);
+	mutex_unlock(&kthread_affinity_lock);
+
+	free_cpumask_var(affinity);
 }
 
 static int kthread(void *_create)
@@ -924,8 +922,11 @@ static int kthreads_online_cpu(unsigned int cpu)
 			ret = -EINVAL;
 			continue;
 		}
-		kthread_fetch_affinity(k, affinity);
-		set_cpus_allowed_ptr(k->task, affinity);
+
+		if (k->preferred_affinity || k->node != NUMA_NO_NODE) {
+			kthread_fetch_affinity(k, affinity);
+			set_cpus_allowed_ptr(k->task, affinity);
+		}
 	}
 
 	free_cpumask_var(affinity);
-- 
2.51.0

From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Marco Crivellari, Michal Hocko, Peter Zijlstra, Tejun Heo, Thomas Gleixner, Vlastimil Babka, Waiman Long
Subject: [PATCH 23/33] kthread: Include kthreadd to the managed affinity list
Date: Fri, 29 Aug 2025 17:48:04 +0200
Message-ID: <20250829154814.47015-24-frederic@kernel.org>
In-Reply-To: <20250829154814.47015-1-frederic@kernel.org>
References: <20250829154814.47015-1-frederic@kernel.org>

The unbound kthreads affinity management performed by cpuset is going to
be imported to the kthread core code for consolidation purposes.

Treat kthreadd just like any other kthread.

Signed-off-by: Frederic Weisbecker
---
 kernel/kthread.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index cba3d297f267..cb0be05d6091 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -820,12 +820,13 @@ int kthreadd(void *unused)
 	/* Setup a clean context for our children to inherit. */
 	set_task_comm(tsk, comm);
 	ignore_signals(tsk);
-	set_cpus_allowed_ptr(tsk, housekeeping_cpumask(HK_TYPE_KTHREAD));
 	set_mems_allowed(node_states[N_MEMORY]);
 
 	current->flags |= PF_NOFREEZE;
 	cgroup_init_kthreadd();
 
+	kthread_affine_node();
+
 	for (;;) {
 		set_current_state(TASK_INTERRUPTIBLE);
 		if (list_empty(&kthread_create_list))
-- 
2.51.0

From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Marco Crivellari, Michal Hocko, Peter Zijlstra, Tejun Heo, Thomas Gleixner, Vlastimil Babka, Waiman Long
Subject: [PATCH 24/33] kthread: Rely on HK_TYPE_DOMAIN for preferred affinity management
Date: Fri, 29 Aug 2025 17:48:05 +0200
Message-ID: <20250829154814.47015-25-frederic@kernel.org>
In-Reply-To: <20250829154814.47015-1-frederic@kernel.org>
References: <20250829154814.47015-1-frederic@kernel.org>

Unbound kthreads want to run neither on nohz_full CPUs nor on domain
isolated CPUs. And since nohz_full implies domain isolation, checking
the latter is enough to verify both.

Therefore exclude kthreads from domain isolation.

Signed-off-by: Frederic Weisbecker
---
 kernel/kthread.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index cb0be05d6091..8d0c8c4c7e46 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -362,18 +362,20 @@ static void kthread_fetch_affinity(struct kthread *kthread, struct cpumask *cpum
 {
 	const struct cpumask *pref;
 
+	guard(rcu)();
+
 	if (kthread->preferred_affinity) {
 		pref = kthread->preferred_affinity;
 	} else {
 		if (kthread->node == NUMA_NO_NODE)
-			pref = housekeeping_cpumask(HK_TYPE_KTHREAD);
+			pref = housekeeping_cpumask(HK_TYPE_DOMAIN);
 		else
 			pref = cpumask_of_node(kthread->node);
 	}
 
-	cpumask_and(cpumask, pref, housekeeping_cpumask(HK_TYPE_KTHREAD));
+	cpumask_and(cpumask, pref, housekeeping_cpumask(HK_TYPE_DOMAIN));
 	if (cpumask_empty(cpumask))
-		cpumask_copy(cpumask, housekeeping_cpumask(HK_TYPE_KTHREAD));
+		cpumask_copy(cpumask, housekeeping_cpumask(HK_TYPE_DOMAIN));
 }
 
 static void kthread_affine_node(void)
-- 
2.51.0

ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482570; cv=none; b=KyvpV03uU7bOVjHBt3+6Zn433xvd0fPLLoUmCdytyuJaPaGpYl3iVSJWuHOhikO1i/+xllftsguTok8fOOgXKZHWB9lYyCAbWYnTk9C1MUY56hkBoPTXSWjiLjes5yGom9sJTyKyDm06zKQDiWbWz6mUIoL4jDksfqI+3WV2aQc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482570; c=relaxed/simple; bh=Rz519K4RzNFN9C7FTifkrc2s4js15D9og5elk9stWlE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=IWsWEHIDz+qKXuXlhQ5Zr1nNDcnSKWmhPTTT7p2P3ML8Tpt2VbKWuDvGobfykgaSuneit5by92Ry22bF0jkTPRE6aZE59KGGmW6UihouiIWS9wjrrBb17Aj5fTzRysKTQl2C7Q/oMSnzClV0XVsbBxue8Kg/Nv4LTl/rbB2yh0M= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=b746JEi7; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="b746JEi7" Received: by smtp.kernel.org (Postfix) with ESMTPSA id AA898C4CEF6; Fri, 29 Aug 2025 15:49:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1756482570; bh=Rz519K4RzNFN9C7FTifkrc2s4js15D9og5elk9stWlE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=b746JEi7ofbu/Uhds2/RNNzOPXnS7NWLS66+0qFYAYdB06+Utq0poYU59EjL1x1H8 dezSaOR27Iq42u6uaWRjc0o/96wwcsMJAiN39CyBlkvhI7CEX4TDv/4c8o1TVSQNuq YJm8ix9mud4MyQuOE04B2Uce5NFehL9MJbWBkyAmP4z3jflL9uWZVMd0fL9MNCMFta ubplVfs0bNlY0fF8820ia4Zf3NegD9Fj7qfeg/9O6tXuRg41IIusMhbt9Tzjb38Soz is41qw93F1ecHTsp+2PGo249tkyHvtYsos2tbi2AkSALvzDsQEUI8XU02jqckILXj8 rbZrZMlUlU9Bw== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , Catalin Marinas , Marco Crivellari , Michal Hocko , Peter Zijlstra , Tejun Heo , Thomas Gleixner , Vlastimil Babka , Waiman Long , Will Deacon , linux-arm-kernel@lists.infradead.org Subject: [PATCH 25/33] sched: Switch the fallback task allowed cpumask to HK_TYPE_DOMAIN Date: Fri, 29 
Aug 2025 17:48:06 +0200 Message-ID: <20250829154814.47015-26-frederic@kernel.org> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20250829154814.47015-1-frederic@kernel.org> References: <20250829154814.47015-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Tasks that have all their allowed CPUs offline don't want their affinity to fall back on either nohz_full CPUs or domain isolated CPUs. And since nohz_full implies domain isolation, checking the latter is enough to verify both. Therefore exclude domain isolated CPUs from the fallback task affinity. Signed-off-by: Frederic Weisbecker --- include/linux/mmu_context.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/mmu_context.h b/include/linux/mmu_context.h index ac01dc4eb2ce..ed3dd0f3fe19 100644 --- a/include/linux/mmu_context.h +++ b/include/linux/mmu_context.h @@ -24,7 +24,7 @@ static inline void leave_mm(void) { } #ifndef task_cpu_possible_mask # define task_cpu_possible_mask(p) cpu_possible_mask # define task_cpu_possible(cpu, p) true -# define task_cpu_fallback_mask(p) housekeeping_cpumask(HK_TYPE_TICK) +# define task_cpu_fallback_mask(p) housekeeping_cpumask(HK_TYPE_DOMAIN) #else # define task_cpu_possible(cpu, p) cpumask_test_cpu((cpu), task_cpu_possib= le_mask(p)) #endif --=20 2.51.0 From nobody Fri Oct 3 14:34:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 71C19343213; Fri, 29 Aug 2025 15:49:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482573; cv=none; 
b=ckj5ETJTY8bzPxhc6/uwG1eMGlMkM1P4GPDQE8vbwn5gL79ZqxSWRtcr+XD9zx6RZTvWXBg/YassXW65wFdPU8uIlD0KcmWfJHhJ967fXdt5WIZRbWbILNxctosFM4fAOqg8zngh01w/vbVTpnR3TGcxZaP3r3mKOvOGRnfyhTE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482573; c=relaxed/simple; bh=EmolSYetz1s/vfgGS0udWvK8bCbBeW5BQ9rcUHulJgU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=azgr67X4FhVxRy7RiszL1C6B/K0yo9J2rsNESX5s/SW8kr8PrhvbsTIuSckCSxgFw6fMwoWn+ETNfAWDDP+vL1JArUs/qYKZoRl54EJ5cq1sAzz1TLJtn5cVuTZa0qQcsXCx3ygEWVlWP+csWFwU63KCsiPV9SpmlCeQ5BRaWPU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=X/pUjtlN; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="X/pUjtlN" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9BA13C4CEF5; Fri, 29 Aug 2025 15:49:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1756482573; bh=EmolSYetz1s/vfgGS0udWvK8bCbBeW5BQ9rcUHulJgU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=X/pUjtlNA3q8T67DkXk64bvz6y4vf1gZ6dHbOkQSJuTI4OF6qrBqu5Azg6hPuGhFm u3YuLghkohNMiRlCWgTxFPn+r7RzhNeUwsX0DPcfpy1DdOwQd7MovxWTkwJ4PlBVnT QFFD1nUfxfhqp/BE2iFnFd0sxdqvXSC2SGI0VyF9fqvD+5kBx4kqStNdTT4HeCTKPh LZIrbLMvl/ZXJnLXM/Ek1PZ4ioC5muDBFSob5aphSZM6Itx38A5pdFvsvqV/omLn/I IQqUr5C2LEXVwhoAxB1qRTVza99aUGghEmAXVK6MjNsSQPDxkwVocHc7TEPuDKgpVg WNwn2SD2W3XHQ== From: Frederic Weisbecker To: LKML Cc: Gabriele Monaco , Johannes Weiner , Marco Crivellari , Michal Hocko , =?UTF-8?q?Michal=20Koutn=C3=BD?= , Peter Zijlstra , Tejun Heo , Thomas Gleixner , Waiman Long , cgroups@vger.kernel.org, Frederic Weisbecker Subject: [PATCH 26/33] cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping Date: Fri, 29 Aug 2025 17:48:07 +0200 Message-ID: 
<20250829154814.47015-27-frederic@kernel.org> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20250829154814.47015-1-frederic@kernel.org> References: <20250829154814.47015-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Gabriele Monaco Currently the user can set up isolated CPUs via cpuset and nohz_full in a way that leaves no housekeeping CPU (i.e. no CPU that is neither domain isolated nor nohz_full). This can be a problem for other subsystems (e.g. the timer wheel migration). Prevent this configuration by rejecting any assignment that would cause the union of domain isolated CPUs and nohz_full CPUs to cover all CPUs. Acked-by: Frederic Weisbecker Signed-off-by: Gabriele Monaco Signed-off-by: Frederic Weisbecker --- kernel/cgroup/cpuset.c | 57 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 57 insertions(+) diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index df1dfacf5f9d..8260dd699fd8 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -1275,6 +1275,19 @@ static void isolated_cpus_update(int old_prs, int ne= w_prs, struct cpumask *xcpus cpumask_andnot(isolated_cpus, isolated_cpus, xcpus); } =20 +/* + * isolated_cpus_should_update - Returns if the isolated_cpus mask needs u= pdate + * @prs: new or old partition_root_state + * @parent: parent cpuset + * Return: true if isolated_cpus needs modification, false otherwise + */ +static bool isolated_cpus_should_update(int prs, struct cpuset *parent) +{ + if (!parent) + parent =3D &top_cpuset; + return prs !=3D parent->partition_root_state; +} + /* * partition_xcpus_add - Add new exclusive CPUs to partition * @new_prs: new partition_root_state @@ -1339,6 +1352,36 @@ static bool partition_xcpus_del(int old_prs, struct = cpuset *parent, return isolcpus_updated; } =20 +/* + * 
isolcpus_nohz_conflict - check for isolated & nohz_full conflicts + * @new_cpus: cpu mask for cpus that are going to be isolated + * Return: true if there is conflict, false otherwise + * + * If nohz_full is enabled and we have isolated CPUs, their combination mu= st + * still leave housekeeping CPUs. + */ +static bool isolcpus_nohz_conflict(struct cpumask *new_cpus) +{ + cpumask_var_t full_hk_cpus; + int res =3D false; + + if (!housekeeping_enabled(HK_TYPE_KERNEL_NOISE)) + return false; + + if (!alloc_cpumask_var(&full_hk_cpus, GFP_KERNEL)) + return true; + + cpumask_and(full_hk_cpus, housekeeping_cpumask(HK_TYPE_KERNEL_NOISE), + housekeeping_cpumask(HK_TYPE_DOMAIN)); + cpumask_andnot(full_hk_cpus, full_hk_cpus, isolated_cpus); + cpumask_and(full_hk_cpus, full_hk_cpus, cpu_online_mask); + if (!cpumask_weight_andnot(full_hk_cpus, new_cpus)) + res =3D true; + + free_cpumask_var(full_hk_cpus); + return res; +} + static void update_housekeeping_cpumask(bool isolcpus_updated) { int ret; @@ -1453,6 +1496,9 @@ static int remote_partition_enable(struct cpuset *cs,= int new_prs, if (!cpumask_intersects(tmp->new_cpus, cpu_active_mask) || cpumask_subset(top_cpuset.effective_cpus, tmp->new_cpus)) return PERR_INVCPUS; + if (isolated_cpus_should_update(new_prs, NULL) && + isolcpus_nohz_conflict(tmp->new_cpus)) + return PERR_HKEEPING; =20 spin_lock_irq(&callback_lock); isolcpus_updated =3D partition_xcpus_add(new_prs, NULL, tmp->new_cpus); @@ -1552,6 +1598,9 @@ static void remote_cpus_update(struct cpuset *cs, str= uct cpumask *xcpus, else if (cpumask_intersects(tmp->addmask, subpartitions_cpus) || cpumask_subset(top_cpuset.effective_cpus, tmp->addmask)) cs->prs_err =3D PERR_NOCPUS; + else if (isolated_cpus_should_update(prs, NULL) && + isolcpus_nohz_conflict(tmp->addmask)) + cs->prs_err =3D PERR_HKEEPING; if (cs->prs_err) goto invalidate; } @@ -1904,6 +1953,12 @@ static int update_parent_effective_cpumask(struct cp= uset *cs, int cmd, return err; } =20 + if (deleting && 
isolated_cpus_should_update(new_prs, parent) && + isolcpus_nohz_conflict(tmp->delmask)) { + cs->prs_err =3D PERR_HKEEPING; + return PERR_HKEEPING; + } + /* * Change the parent's effective_cpus & effective_xcpus (top cpuset * only). @@ -2924,6 +2979,8 @@ static int update_prstate(struct cpuset *cs, int new_= prs) * Need to update isolated_cpus. */ isolcpus_updated =3D true; + if (isolcpus_nohz_conflict(cs->effective_xcpus)) + err =3D PERR_HKEEPING; } else { /* * Switching back to member is always allowed even if it --=20 2.51.0 From nobody Fri Oct 3 14:34:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5779E3314AC for ; Fri, 29 Aug 2025 15:49:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482576; cv=none; b=Zdm2Kdmua7mY6dnEFlGjJZNtp6wg3JgHOn245kK+JkJdwo/ZpYMHO9O0x7C4A04l5fYlkU0mXJZI09ZYzQjpMaxM6dqhkNfHLWE3+u4ZXpy914gkaPOr2GCi3pMZloVxQguz7vpUIDO0P9MPr2D+pyEEzc9LL92N19B1B8E2FHQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482576; c=relaxed/simple; bh=QmHeJLzWp0BmqCz+ztUiz8/RmLhDblGFFLV72Ii4meA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=C8ICXXjx1TOxDYwvIpaYsVfxbKrwnHAq4WyA+wq94Dqm/JTAvs0IAvQrShOEQ0cb1vFg+DmAiUbe2nEx+zE6ODknuz4wf+lbRRy9yDGWsfZnYlkBI2jvXmJpg4v5j4wqxXVVE+QMVIsA4Yok1KJZrg6FZ2JtGpwnW5boSYSzY2A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Onb0u1kZ; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Onb0u1kZ" Received: by smtp.kernel.org (Postfix) 
with ESMTPSA id B8C8FC4CEF5; Fri, 29 Aug 2025 15:49:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1756482576; bh=QmHeJLzWp0BmqCz+ztUiz8/RmLhDblGFFLV72Ii4meA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Onb0u1kZ3slfgnN8k1ly/s8A1GZzWzaV5YsJT04FaL2a7EmkG7WZ3lbDCaky2ngSN o7ZF3jsOYN4lEWAMdZtfl0JFKyx5fSrKh4x2mYeE/HmmZuBEQpy85BLaG5wwrcd11s h+wKTCtCEIlEU1wdg/jgRKM6OzrTUVBdKR/hCp6OR3R2751odueFoposm69H7XAj9K RD6i4peSxwqffTswmTtIGkiV+4LKuDy4XriHKU7lqZ+7AKXlG0mEpLrcR908qdK4uo M6A55zOpjr50gyZPvKHra5/tNlZAXcm6JqjbQOPgOIMQVxJnTF1Db4TOV5sp48ZXaq de/A2kTsqqemA== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , Catalin Marinas , Marco Crivellari , Michal Hocko , Peter Zijlstra , Tejun Heo , Thomas Gleixner , Waiman Long , Will Deacon , linux-arm-kernel@lists.infradead.org Subject: [PATCH 27/33] sched/arm64: Move fallback task cpumask to HK_TYPE_DOMAIN Date: Fri, 29 Aug 2025 17:48:08 +0200 Message-ID: <20250829154814.47015-28-frederic@kernel.org> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20250829154814.47015-1-frederic@kernel.org> References: <20250829154814.47015-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" When none of the allowed CPUs of a task are online, it gets migrated to the fallback cpumask which is all the non nohz_full CPUs. However just like nohz_full CPUs, domain isolated CPUs don't want to be disturbed by tasks that have lost their CPU affinities. 
And since nohz_full relies on domain isolation to work correctly, the housekeeping mask of domain isolated CPUs is always a subset of the housekeeping mask of nohz_full CPUs (there can be CPUs that are domain isolated but not nohz_full; on the other hand, there can't be nohz_full CPUs that are not domain isolated): HK_TYPE_DOMAIN & HK_TYPE_KERNEL_NOISE =3D=3D HK_TYPE_DOMAIN Therefore use HK_TYPE_DOMAIN as the appropriate fallback target for tasks. And since this cpumask can be modified at runtime, make sure that 32-bit capable CPUs on ARM64 mismatched systems are not isolated by cpusets. CC: linux-arm-kernel@lists.infradead.org Signed-off-by: Frederic Weisbecker --- arch/arm64/kernel/cpufeature.c | 18 ++++++++++++--- include/linux/cpu.h | 4 ++++ kernel/cgroup/cpuset.c | 40 +++++++++++++++++++++++----------- 3 files changed, 46 insertions(+), 16 deletions(-) diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 9ad065f15f1d..38046489d2ea 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -1653,6 +1653,18 @@ has_cpuid_feature(const struct arm64_cpu_capabilitie= s *entry, int scope) return feature_matches(val, entry); } =20 +/* + * 32 bits support CPUs can't be isolated because tasks may be + * arbitrarily affine to them, defeating the purpose of isolation. 
+ */ +bool arch_isolated_cpus_can_update(struct cpumask *new_cpus) +{ + if (static_branch_unlikely(&arm64_mismatched_32bit_el0)) + return !cpumask_intersects(cpu_32bit_el0_mask, new_cpus); + else + return true; +} + const struct cpumask *system_32bit_el0_cpumask(void) { if (!system_supports_32bit_el0()) @@ -1666,7 +1678,7 @@ const struct cpumask *system_32bit_el0_cpumask(void) =20 const struct cpumask *task_cpu_fallback_mask(struct task_struct *p) { - return __task_cpu_possible_mask(p, housekeeping_cpumask(HK_TYPE_TICK)); + return __task_cpu_possible_mask(p, housekeeping_cpumask(HK_TYPE_DOMAIN)); } =20 static int __init parse_32bit_el0_param(char *str) @@ -3963,8 +3975,8 @@ static int enable_mismatched_32bit_el0(unsigned int c= pu) bool cpu_32bit =3D false; =20 if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0)) { - if (!housekeeping_cpu(cpu, HK_TYPE_TICK)) - pr_info("Treating adaptive-ticks CPU %u as 64-bit only\n", cpu); + if (!housekeeping_cpu(cpu, HK_TYPE_DOMAIN)) + pr_info("Treating domain isolated CPU %u as 64-bit only\n", cpu); else cpu_32bit =3D true; } diff --git a/include/linux/cpu.h b/include/linux/cpu.h index b91b993f58ee..8bb239080534 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -228,4 +228,8 @@ static inline bool cpu_attack_vector_mitigated(enum cpu= _attack_vectors v) #define smt_mitigations SMT_MITIGATIONS_OFF #endif =20 +struct cpumask; + +bool arch_isolated_cpus_can_update(struct cpumask *new_cpus); + #endif /* _LINUX_CPU_H_ */ diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 8260dd699fd8..cf99ea844c1d 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -1352,33 +1352,47 @@ static bool partition_xcpus_del(int old_prs, struct= cpuset *parent, return isolcpus_updated; } =20 +bool __weak arch_isolated_cpus_can_update(struct cpumask *new_cpus) +{ + return true; +} + /* - * isolcpus_nohz_conflict - check for isolated & nohz_full conflicts + * isolated_cpus_can_update - check for conflicts against 
housekeeping and + * CPUs capabilities. * @new_cpus: cpu mask for cpus that are going to be isolated - * Return: true if there is conflict, false otherwise + * Return: true if there no conflict, false otherwise * - * If nohz_full is enabled and we have isolated CPUs, their combination mu= st - * still leave housekeeping CPUs. + * Check for conflicts: + * - If nohz_full is enabled and there are isolated CPUs, their combinatio= n must + * still leave housekeeping CPUs. + * - Architecture has CPU capabilities incompatible with being isolated */ -static bool isolcpus_nohz_conflict(struct cpumask *new_cpus) +static bool isolated_cpus_can_update(struct cpumask *new_cpus) { cpumask_var_t full_hk_cpus; - int res =3D false; + bool res; + + if (!arch_isolated_cpus_can_update(new_cpus)) + return false; =20 if (!housekeeping_enabled(HK_TYPE_KERNEL_NOISE)) - return false; + return true; =20 if (!alloc_cpumask_var(&full_hk_cpus, GFP_KERNEL)) - return true; + return false; + + res =3D true; =20 cpumask_and(full_hk_cpus, housekeeping_cpumask(HK_TYPE_KERNEL_NOISE), housekeeping_cpumask(HK_TYPE_DOMAIN)); cpumask_andnot(full_hk_cpus, full_hk_cpus, isolated_cpus); cpumask_and(full_hk_cpus, full_hk_cpus, cpu_online_mask); if (!cpumask_weight_andnot(full_hk_cpus, new_cpus)) - res =3D true; + res =3D false; =20 free_cpumask_var(full_hk_cpus); + return res; } =20 @@ -1497,7 +1511,7 @@ static int remote_partition_enable(struct cpuset *cs,= int new_prs, cpumask_subset(top_cpuset.effective_cpus, tmp->new_cpus)) return PERR_INVCPUS; if (isolated_cpus_should_update(new_prs, NULL) && - isolcpus_nohz_conflict(tmp->new_cpus)) + !isolated_cpus_can_update(tmp->new_cpus)) return PERR_HKEEPING; =20 spin_lock_irq(&callback_lock); @@ -1599,7 +1613,7 @@ static void remote_cpus_update(struct cpuset *cs, str= uct cpumask *xcpus, cpumask_subset(top_cpuset.effective_cpus, tmp->addmask)) cs->prs_err =3D PERR_NOCPUS; else if (isolated_cpus_should_update(prs, NULL) && - isolcpus_nohz_conflict(tmp->addmask)) + 
!isolated_cpus_can_update(tmp->addmask)) cs->prs_err =3D PERR_HKEEPING; if (cs->prs_err) goto invalidate; @@ -1954,7 +1968,7 @@ static int update_parent_effective_cpumask(struct cpu= set *cs, int cmd, } =20 if (deleting && isolated_cpus_should_update(new_prs, parent) && - isolcpus_nohz_conflict(tmp->delmask)) { + !isolated_cpus_can_update(tmp->delmask)) { cs->prs_err =3D PERR_HKEEPING; return PERR_HKEEPING; } @@ -2979,7 +2993,7 @@ static int update_prstate(struct cpuset *cs, int new_= prs) * Need to update isolated_cpus. */ isolcpus_updated =3D true; - if (isolcpus_nohz_conflict(cs->effective_xcpus)) + if (!isolated_cpus_can_update(cs->effective_xcpus)) err =3D PERR_HKEEPING; } else { /* --=20 2.51.0 From nobody Fri Oct 3 14:34:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 59F8E343D74; Fri, 29 Aug 2025 15:49:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482579; cv=none; b=OcCW+aYzfTSaYQeTNNEhVxiS/vjaaBMS+l+iW5OVee7XvpThOYYyTlcF+I7V7dW2Z6rrX7FUiEgig1VYlo0RimPOZi9xkQ8VesD/qmRRCQYG4slCrtehX6t283fe5zlHj6hQQR/9kc7JxafbQYlvcuDMP9yliQ4J8DPkW2uj9QU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482579; c=relaxed/simple; bh=hpG0dj5iV2xivxvSMNh4a8MoQwil4rwPsTJqI4z5o6w=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=NVfDnMdVU8RLaMnJbctdmpLvyqdOjvxTgiIJIIbeWwSe5su7UXGUwXbTDAstmWMwsDvkfqJpZJ4OZXR5l2xl9qnvheEGAOqmerGQpTla+LfQ6TcBcjeahMXDkYMXlVqeiB9vP5+cYeyaSLKM6vh9n2ihkfuqmEPRceMduy3JRx8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=rDs5VGew; arc=none 
smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="rDs5VGew" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9E17CC4CEF6; Fri, 29 Aug 2025 15:49:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1756482579; bh=hpG0dj5iV2xivxvSMNh4a8MoQwil4rwPsTJqI4z5o6w=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=rDs5VGew0+CnTFohXhAoiPDU0BkhoLd+XDyY4j6GP4lx9U0ivy8IfGFyQzhIbdQyq EM86AKCYd15Nyz156BWF16K6jqlA88Et+JA5A7rxo17pBS1oeCbt9GwyhzxZ5FsbSL NVoO8J49QMSVjTANp24I8JzP+r2nQ2pJLXIvMe8gHxHMYthGWq+OrgNG+P4I3+ezjz CeePJ4k44ZN0rzgv/q5TLpptoXowwYGRVmBMhuV5A/tiwFERvEqbBCxy0x/FDGOgnD ekTIf/1+to/q75hHZ+84Fovd1+b88ukUT9XioYG72xOUKiP3mOJI6XkLNnxEdg8YjK ypB4P4tEKZj8A== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , Ingo Molnar , Johannes Weiner , Marco Crivellari , Michal Hocko , =?UTF-8?q?Michal=20Koutn=C3=BD?= , Peter Zijlstra , Tejun Heo , Thomas Gleixner , Vlastimil Babka , Waiman Long , cgroups@vger.kernel.org Subject: [PATCH 28/33] kthread: Honour kthreads preferred affinity after cpuset changes Date: Fri, 29 Aug 2025 17:48:09 +0200 Message-ID: <20250829154814.47015-29-frederic@kernel.org> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20250829154814.47015-1-frederic@kernel.org> References: <20250829154814.47015-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" When cpuset isolated partitions get updated, unbound kthreads get indifferently affine to all non isolated CPUs, regardless of their individual affinity preferences. For example kswapd is a per-node kthread that prefers to be affine to the node it refers to. 
Whenever an isolated partition is created, updated or deleted, kswapd's node affinity is broken if any CPU in the related node is not isolated, because kswapd then becomes affine globally. Fix this by letting the consolidated kthread managed affinity code do the affinity update on behalf of cpuset. Signed-off-by: Frederic Weisbecker --- include/linux/kthread.h | 1 + kernel/cgroup/cpuset.c | 5 ++--- kernel/kthread.c | 38 +++++++++++++++++++++++++++++--------- kernel/sched/isolation.c | 2 ++ 4 files changed, 34 insertions(+), 12 deletions(-) diff --git a/include/linux/kthread.h b/include/linux/kthread.h index 8d27403888ce..c92c1149ee6e 100644 --- a/include/linux/kthread.h +++ b/include/linux/kthread.h @@ -100,6 +100,7 @@ void kthread_unpark(struct task_struct *k); void kthread_parkme(void); void kthread_exit(long result) __noreturn; void kthread_complete_and_exit(struct completion *, long) __noreturn; +int kthreads_update_housekeeping(void); =20 int kthreadd(void *unused); extern struct task_struct *kthreadd_task; diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index cf99ea844c1d..e76711fa7d34 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -1130,11 +1130,10 @@ void cpuset_update_tasks_cpumask(struct cpuset *cs,= struct cpumask *new_cpus) =20 if (top_cs) { /* + * PF_KTHREAD tasks are handled by housekeeping. * PF_NO_SETAFFINITY tasks are ignored. - * All per cpu kthreads should have PF_NO_SETAFFINITY - * flag set, see kthread_set_per_cpu(). 
*/ - if (task->flags & PF_NO_SETAFFINITY) + if (task->flags & (PF_KTHREAD | PF_NO_SETAFFINITY)) continue; cpumask_andnot(new_cpus, possible_mask, subpartitions_cpus); } else { diff --git a/kernel/kthread.c b/kernel/kthread.c index 8d0c8c4c7e46..4d3cc04e5e8b 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -896,14 +896,7 @@ int kthread_affine_preferred(struct task_struct *p, co= nst struct cpumask *mask) } EXPORT_SYMBOL_GPL(kthread_affine_preferred); =20 -/* - * Re-affine kthreads according to their preferences - * and the newly online CPU. The CPU down part is handled - * by select_fallback_rq() which default re-affines to - * housekeepers from other nodes in case the preferred - * affinity doesn't apply anymore. - */ -static int kthreads_online_cpu(unsigned int cpu) +static int kthreads_update_affinity(bool force) { cpumask_var_t affinity; struct kthread *k; @@ -926,7 +919,7 @@ static int kthreads_online_cpu(unsigned int cpu) continue; } =20 - if (k->preferred_affinity || k->node !=3D NUMA_NO_NODE) { + if (force || k->preferred_affinity || k->node !=3D NUMA_NO_NODE) { kthread_fetch_affinity(k, affinity); set_cpus_allowed_ptr(k->task, affinity); } @@ -937,6 +930,33 @@ static int kthreads_online_cpu(unsigned int cpu) return ret; } =20 +/** + * kthreads_update_housekeeping - Update kthreads affinity on cpuset change + * + * When cpuset changes a partition type to/from "isolated" or updates rela= ted + * cpumasks, propagate the housekeeping cpumask change to preferred kthrea= ds + * affinity. + * + * Returns 0 if successful, -ENOMEM if temporary mask couldn't + * be allocated or -EINVAL in case of internal error. + */ +int kthreads_update_housekeeping(void) +{ + return kthreads_update_affinity(true); +} + +/* + * Re-affine kthreads according to their preferences + * and the newly online CPU. 
The CPU down part is handled + * by select_fallback_rq() which default re-affines to + * housekeepers from other nodes in case the preferred + * affinity doesn't apply anymore. + */ +static int kthreads_online_cpu(unsigned int cpu) +{ + return kthreads_update_affinity(false); +} + static int kthreads_init(void) { return cpuhp_setup_state(CPUHP_AP_KTHREADS_ONLINE, "kthreads:online", diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c index 5baf1621a56e..51392eb9b221 100644 --- a/kernel/sched/isolation.c +++ b/kernel/sched/isolation.c @@ -128,6 +128,8 @@ int housekeeping_update(struct cpumask *mask, enum hk_t= ype type) mem_cgroup_flush_workqueue(); vmstat_flush_workqueue(); err =3D workqueue_unbound_exclude_cpumask(housekeeping_cpumask(type)); + WARN_ON_ONCE(err < 0); + err =3D kthreads_update_housekeeping(); =20 kfree(old); =20 --=20 2.51.0 From nobody Fri Oct 3 14:34:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9C4DC3451D5 for ; Fri, 29 Aug 2025 15:49:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482581; cv=none; b=uGof+BfW7Mw1AEGEUfithJt7uFVGQmbylSVv1DTH9ahlOJNoH//f6KAl4WFKFBqrn/Zxj7B6qkPyL2wPCPOpZS8n5Pw5FNW/lVlocqnq0dJOC0bpY8OjvwnRDUL1r+WoQHnASV+W6wVY91nOOUT5MuonNEHelbKmAAm1+/Cqf2A= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482581; c=relaxed/simple; bh=n52cAu7bEMYLK1x1CARbNE9gVHxK/1DrMBlj3TyiG6U=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=e2HGndQTMRi/UtkInXnlmq4x1tbVRD07gEMtBYYnFB42aQXU96gLUFWUDtcxQCP6hCP8bgy99bdoGh5av+SuhXOhFanQRVqzkStfijzjWpUFbfbQjFXAp5Ib0wSkptDXYdpp2oivJ7ulRi8x6BaP6VeOdf/lY7XZ+HIvHb2psHE= 
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=hCsXZhJJ; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="hCsXZhJJ" Received: by smtp.kernel.org (Postfix) with ESMTPSA id A33CDC4CEF0; Fri, 29 Aug 2025 15:49:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1756482581; bh=n52cAu7bEMYLK1x1CARbNE9gVHxK/1DrMBlj3TyiG6U=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=hCsXZhJJuSzcbOV2emvF2IXQgJKhCJ5oXujhVYoIeYfxonhIaXF1jE2yHzbGZYJXd 1Tu8OzFhVZWH7+vWC7MhmuMqTOYc5Fq5Wnw8cNre6suETrByplGsG3SKt04IhqgeND oJDUGuQ/Fau+jhxaHoGCUkrCtVsiWgSI1a21uUFqCIzh3iYNJj2GWhcQqSc2+BIWVH O6sLuILxwt7yS+fb7mNUEAdKUU7pIUJymgQcll6mi70e8TFJYyR3pB2sxFkyOT0zJ1 klNAg/j5Ikn09eV0Sqv5cleve0i75Eat0A2vpSXtBZjm0ZVvSwioiUiTK8LveYC+7u jFRGj4rua0aLA== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , Marco Crivellari , Michal Hocko , Peter Zijlstra , Tejun Heo , Thomas Gleixner , Vlastimil Babka , Waiman Long Subject: [PATCH 29/33] kthread: Comment on the purpose and placement of kthread_affine_node() call Date: Fri, 29 Aug 2025 17:48:10 +0200 Message-ID: <20250829154814.47015-30-frederic@kernel.org> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20250829154814.47015-1-frederic@kernel.org> References: <20250829154814.47015-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" It may not appear obvious why kthread_affine_node() is not called before the kthread creation completion instead of after the first wake-up. 
The reason is that kthread_affine_node() applies a default affinity behaviour that only takes place if no affinity preference has already been passed by the kthread creation call site. Add a comment to clarify that. Reported-by: Peter Zijlstra Signed-off-by: Frederic Weisbecker --- kernel/kthread.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/kernel/kthread.c b/kernel/kthread.c index 4d3cc04e5e8b..d36bdfbd004e 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -453,6 +453,10 @@ static int kthread(void *_create) =20 self->started =3D 1; =20 + /* + * Apply default node affinity if no call to kthread_bind[_mask]() nor + * kthread_affine_preferred() was issued before the first wake-up. + */ if (!(current->flags & PF_NO_SETAFFINITY) && !self->preferred_affinity) kthread_affine_node(); =20 --=20 2.51.0 From nobody Fri Oct 3 14:34:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 09EB6345741 for ; Fri, 29 Aug 2025 15:49:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482584; cv=none; b=sPlrLnwlUYuVJOGhE3kU37K7HdyTxQlBxk6ML70md92sy+VjjluqKXWdmBotf0CHvYcgf85kDHdTM8NDlVWa/yfT3Zsl3e1upAJQGCdJ1NwKc4bMSLAQNtIITLj8yEMC9+S+27Mk/kr2uDpE3NFGh2CaWjbd7o3xhVvEsE235/E= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482584; c=relaxed/simple; bh=tmme1GHCRZttEH0fzbesyJQ/RQ0GFTr99io8B0/CrOM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Zh+36ly9G9O6bMyGpBw5eYsuoiRz8nAj5lP7ZOHkIbu1NJ9FOg/U4PZUiLxfHjpeybGtV+OaoTTCkGwWbslOF3R6IgyKiqU9RdRHzmRmDVnuXkFIRPk2TpTOi+bmlVnMN9AqnGCFdthH7rTCdVh1W8QiPitH+s90QgNtrYS4R6E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; 
dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=YD8acBty; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="YD8acBty" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 08CE1C4CEF0; Fri, 29 Aug 2025 15:49:41 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1756482583; bh=tmme1GHCRZttEH0fzbesyJQ/RQ0GFTr99io8B0/CrOM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=YD8acBty7uTe7lkSuCFwOxAFRxYC5vf1qSBSyYfi9nuYsNCz00UrngZDNNcqK3V9U yIoIfAd26AaLADYX44IpcqZfR0Hl/v0/yPPzOiBoSqDJfi2+4M9mw2/rT8NIUA2s74 nGys6NvCAR+TnGTIZe48AVOU6jQo+632thLRetDWZzANMdXHvvhGSDp635jVK90gPp P2/geXdYLXNwNvet6/gvEwe4j4/hrdyt7hqyE46adZxVBU8BnLOZYaRg9kM/p25E3J 7c2ANkdQfiq6lD5HrtoUxYHMBtBrGZXCA6FajhotONZnjOx6lDeIhAkeLoA0H55UvP b3Yhb81AeUJQQ== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , Marco Crivellari , Michal Hocko , Peter Zijlstra , Tejun Heo , Thomas Gleixner , Waiman Long Subject: [PATCH 30/33] kthread: Add API to update preferred affinity on kthread runtime Date: Fri, 29 Aug 2025 17:48:11 +0200 Message-ID: <20250829154814.47015-31-frederic@kernel.org> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20250829154814.47015-1-frederic@kernel.org> References: <20250829154814.47015-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Kthreads can apply for a preferred affinity upon creation but they have no means to update that preferred affinity after the first wake up. kthread_affine_preferred() is optimized by assuming the kthread is sleeping while applying the allowed cpumask. Therefore introduce a new API to further update the preferred affinity. It will be used by IRQ kthreads. 
Signed-off-by: Frederic Weisbecker --- include/linux/kthread.h | 1 + kernel/kthread.c | 55 +++++++++++++++++++++++++++++++++++------ 2 files changed, 48 insertions(+), 8 deletions(-) diff --git a/include/linux/kthread.h b/include/linux/kthread.h index c92c1149ee6e..a06cae7f2c55 100644 --- a/include/linux/kthread.h +++ b/include/linux/kthread.h @@ -86,6 +86,7 @@ void free_kthread_struct(struct task_struct *k); void kthread_bind(struct task_struct *k, unsigned int cpu); void kthread_bind_mask(struct task_struct *k, const struct cpumask *mask); int kthread_affine_preferred(struct task_struct *p, const struct cpumask *= mask); +int kthread_affine_preferred_update(struct task_struct *p, const struct cp= umask *mask); int kthread_stop(struct task_struct *k); int kthread_stop_put(struct task_struct *k); bool kthread_should_stop(void); diff --git a/kernel/kthread.c b/kernel/kthread.c index d36bdfbd004e..f3397cf7542a 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -322,17 +322,16 @@ EXPORT_SYMBOL_GPL(kthread_parkme); void __noreturn kthread_exit(long result) { struct kthread *kthread =3D to_kthread(current); + struct cpumask *to_free =3D NULL; kthread->result =3D result; - if (!list_empty(&kthread->affinity_node)) { - mutex_lock(&kthread_affinity_lock); - list_del(&kthread->affinity_node); - mutex_unlock(&kthread_affinity_lock); =20 - if (kthread->preferred_affinity) { - kfree(kthread->preferred_affinity); - kthread->preferred_affinity =3D NULL; - } + scoped_guard(mutex, &kthread_affinity_lock) { + if (!list_empty(&kthread->affinity_node)) + list_del_init(&kthread->affinity_node); + to_free =3D kthread->preferred_affinity; + kthread->preferred_affinity =3D NULL; } + kfree(to_free); do_exit(0); } EXPORT_SYMBOL(kthread_exit); @@ -900,6 +899,46 @@ int kthread_affine_preferred(struct task_struct *p, co= nst struct cpumask *mask) } EXPORT_SYMBOL_GPL(kthread_affine_preferred); =20 +/** + * kthread_affine_preferred_update - update a kthread's preferred affinity + * @p: 
thread created by kthread_create(). + * @mask: new mask of CPUs (might not be online, must be possible) for = @p + * to run on. + * + * Update the cpumask of the desired kthread's affinity that was passed by + * a previous call to kthread_affine_preferred(). This can be called either + * before or after the first wakeup of the kthread. + * + * Returns 0 if the affinity has been applied, -ENOMEM or -EINVAL otherwise. + */ +int kthread_affine_preferred_update(struct task_struct *p, + const struct cpumask *mask) +{ + struct kthread *kthread =3D to_kthread(p); + cpumask_var_t affinity; + int ret =3D 0; + + if (!zalloc_cpumask_var(&affinity, GFP_KERNEL)) + return -ENOMEM; + + scoped_guard(mutex, &kthread_affinity_lock) { + if (WARN_ON_ONCE(!kthread->preferred_affinity || + list_empty(&kthread->affinity_node))) { + ret =3D -EINVAL; + goto out; + } + + cpumask_copy(kthread->preferred_affinity, mask); + kthread_fetch_affinity(kthread, affinity); + set_cpus_allowed_ptr(p, affinity); + } +out: + free_cpumask_var(affinity); + + return ret; +} +EXPORT_SYMBOL_GPL(kthread_affine_preferred_update); + static int kthreads_update_affinity(bool force) { cpumask_var_t affinity; --=20 2.51.0 From nobody Fri Oct 3 14:34:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 79DEB3469EC for ; Fri, 29 Aug 2025 15:49:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482586; cv=none; b=FHsSyxABA8y8xC4Lg+hd0LfxqoDbTLwvj120iCA+6rYMscY4BfRTsRO9VhOiqH3rGN/zC6lzpS1OMSC2R03sOzGFza7GLQePnGxsRTtnyTJeU0HKq3hYmMMZdrUkJRADHYNgFIrrsqSdXQ9nK2DNvcUgYjwyTtik4tuuuI8vv0A= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482586; c=relaxed/simple;
bh=VRca4O5xWSvVtV07vu7T/vayv/vKzk4YtiAHpk59aYg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=hxPJE1p3jc+K380OFYpnhLaFF2cBPAmEcbcGQemA3JsMbT5NxJUQ8/TAx6srDZVwYPveUEPDdvt15rt8h5vyHGtHiccVuNqRIwZ9M4LTRjv9etQ8Pq4sYvy2sQv8RJazndootcnts/5T1GhW4JhuYzYnK+29Y50ryWT4u7vnONI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=eeY+u4yR; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="eeY+u4yR" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2141AC4CEF6; Fri, 29 Aug 2025 15:49:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1756482586; bh=VRca4O5xWSvVtV07vu7T/vayv/vKzk4YtiAHpk59aYg=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=eeY+u4yRCSDgJjtSz5hmOoH3x+19w/I5H1KLqf5o9IzSy4VLTHJHmCL04nCDqjl/A Mi3x3LwMHRRn+lb+eRFvvD0VNC/J8aFrykShAPe2Qyw3DFCrMaXcUjLoMtmE5Zqk5v KEY/GMXoPLxWzLoKGwA1a1EU0rS1reEksM2jpm33gu1U3bjqhY20oFcA7TosiH/GOV rUtyYwdKFmyOcDXFLxXk7mOf6qqGFqPUlbXsYWhEPy978x01HZUHGDVX1ckPDmrT8j M9m3hbV2fQHdHo0C6h2vE7P64y7inEVxIublpIAIQq2kbRtatTh4jaUoflsYZPVrL4 ToZ6s8NuGB8Mg== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , Marco Crivellari , Michal Hocko , Peter Zijlstra , Tejun Heo , Thomas Gleixner , Waiman Long Subject: [PATCH 31/33] kthread: Document kthread_affine_preferred() Date: Fri, 29 Aug 2025 17:48:12 +0200 Message-ID: <20250829154814.47015-32-frederic@kernel.org> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20250829154814.47015-1-frederic@kernel.org> References: <20250829154814.47015-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The documentation of this new API has been 
overlooked during its introduction. Fill the gap. Signed-off-by: Frederic Weisbecker --- kernel/kthread.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/kernel/kthread.c b/kernel/kthread.c index f3397cf7542a..b989aeaa441a 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -857,6 +857,18 @@ int kthreadd(void *unused) return 0; } =20 +/** + * kthread_affine_preferred - Define a kthread's preferred affinity + * @p: thread created by kthread_create(). + * @mask: preferred mask of CPUs (might not be online, must be possible= ) for @p + * to run on. + * + * Similar to kthread_bind_mask() except that the affinity is not a requir= ement + * but rather a preference that can be constrained by CPU isolation or CPU= hotplug. + * Must be called before the first wakeup of the kthread. + * + * Returns 0 if the affinity has been applied. + */ int kthread_affine_preferred(struct task_struct *p, const struct cpumask *= mask) { struct kthread *kthread =3D to_kthread(p); --=20 2.51.0 From nobody Fri Oct 3 14:34:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B0DA7346A17 for ; Fri, 29 Aug 2025 15:49:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482589; cv=none; b=GlymJ7eMj+ocISySJfLjgzX5FOWoLR81/VVYyQ4o75IXRuV/WKcgWMwVI9AZtoi/+3G+W9kC3bPMe2dLAeh4DyeGnyjf5qhqsRBIkUh22HmFh15tGT46UyoB0xXX7T1BHjUSXo0wYReeIVPFLR/a/pUyvMizE3zDQar/U9+Jc5o= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482589; c=relaxed/simple; bh=chMAaHPdvZjXfYZQTfmqroMaCahKvWi7hxvycGkBc0k=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version;
b=e9YZ4hYhd2qIC27TBmS2FKV4cxmxH7KqsfcTQtSV3Qbl2owvd0fSCQ5nI/rGmBVSuvWiyVMnI/u/YlqYSonAgPcWax0uGdTr7a4TkpcsOrqei1gRiRdVgnJSPXfkWk6TyRtdsdpLqSDG9enMFJPqXmMS/0HJ9yJfAuodfdfxmuA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=es0FMyW1; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="es0FMyW1" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 69DDBC4CEF0; Fri, 29 Aug 2025 15:49:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1756482588; bh=chMAaHPdvZjXfYZQTfmqroMaCahKvWi7hxvycGkBc0k=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=es0FMyW1SaqVKkgezlAZUhevVLjYaqYFRexaNvJcGe+5ANF//BXhMnqwba1S7xffE Ds6tnsQta6Bqz1F39kztfqbU1kdVjQPnaSVAGTneqthxva9slurGG+63EdbqHMSwDA nEM/aTvaaz+NNKYGx2KE1l/mzecJQNJa+eBsoyfwOLn64InXzi8YDQvRKgwiCirvZW SnI600a6tyIG+pCffJvrESqXiAnw5mtZwW5NwK4Qjfs3ZeJSqQdxP7jAGHJLKBdau4 7LVZnKjDhLEsdKeZ7+RkcdudrBImSiZaEGro9s/DF5pZbfWw1LlQ7GMKGE8yA74Jmr mH0a24lK4gqlA== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , Marco Crivellari , Michal Hocko , Peter Zijlstra , Tejun Heo , Thomas Gleixner , Waiman Long Subject: [RFC PATCH 32/33] genirq: Correctly handle preferred kthreads affinity Date: Fri, 29 Aug 2025 17:48:13 +0200 Message-ID: <20250829154814.47015-33-frederic@kernel.org> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20250829154814.47015-1-frederic@kernel.org> References: <20250829154814.47015-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" [CHECKME: Do some IRQ threads have strong affinity requirements? In which case they should use kthread_bind()...] 
The affinity of IRQ threads is applied through a direct call to the scheduler. As a result, this affinity may not be carried correctly across hotplug events or cpuset isolated partition updates, and may not honour housekeeping constraints. For example, simply creating a cpuset isolated partition will reset the affinity of all IRQ threads to the non-isolated CPUs. To prevent that, use the appropriate kthread affinity APIs, which take care of the preferred affinity during these kinds of events. Signed-off-by: Frederic Weisbecker --- kernel/irq/manage.c | 47 +++++++++++++++++++++++++++------------------ 1 file changed, 28 insertions(+), 19 deletions(-) diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c index c94837382037..d96f6675c888 100644 --- a/kernel/irq/manage.c +++ b/kernel/irq/manage.c @@ -176,15 +176,15 @@ bool irq_can_set_affinity_usr(unsigned int irq) } =20 /** - * irq_set_thread_affinity - Notify irq threads to adjust affinity + * irq_thread_notify_affinity - Notify irq threads to adjust affinity * @desc: irq descriptor which has affinity changed * * Just set IRQTF_AFFINITY and delegate the affinity setting to the - * interrupt thread itself. We can not call set_cpus_allowed_ptr() here as - * we hold desc->lock and this code can be called from hard interrupt + * interrupt thread itself. We can not call kthread_affine_preferred_updat= e() + * here as we hold desc->lock and this code can be called from hard interr= upt * context.
*/ -static void irq_set_thread_affinity(struct irq_desc *desc) +static void irq_thread_notify_affinity(struct irq_desc *desc) { struct irqaction *action; =20 @@ -283,7 +283,7 @@ int irq_do_set_affinity(struct irq_data *data, const st= ruct cpumask *mask, fallthrough; case IRQ_SET_MASK_OK_NOCOPY: irq_validate_effective_affinity(data); - irq_set_thread_affinity(desc); + irq_thread_notify_affinity(desc); ret =3D 0; } =20 @@ -1032,11 +1032,26 @@ static void irq_thread_check_affinity(struct irq_de= sc *desc, struct irqaction *a } =20 if (valid) - set_cpus_allowed_ptr(current, mask); + kthread_affine_preferred_update(current, mask); free_cpumask_var(mask); } + +static inline void irq_thread_set_affinity(struct task_struct *t, + struct irq_desc *desc) +{ + const struct cpumask *mask; + + if (cpumask_available(desc->irq_common_data.affinity)) + mask =3D irq_data_get_effective_affinity_mask(&desc->irq_data); + else + mask =3D cpu_possible_mask; + + kthread_affine_preferred(t, mask); +} #else static inline void irq_thread_check_affinity(struct irq_desc *desc, struct= irqaction *action) { } +static inline void irq_thread_set_affinity(struct task_struct *t, + struct irq_desc *desc) { } #endif =20 static int irq_wait_for_interrupt(struct irq_desc *desc, @@ -1384,7 +1399,8 @@ static void irq_nmi_teardown(struct irq_desc *desc) } =20 static int -setup_irq_thread(struct irqaction *new, unsigned int irq, bool secondary) +setup_irq_thread(struct irqaction *new, struct irq_desc *desc, + unsigned int irq, bool secondary) { struct task_struct *t; =20 @@ -1405,16 +1421,9 @@ setup_irq_thread(struct irqaction *new, unsigned int= irq, bool secondary) * references an already freed task_struct. */ new->thread =3D get_task_struct(t); - /* - * Tell the thread to set its affinity. This is - * important for shared interrupt handlers as we do - * not invoke setup_affinity() for the secondary - * handlers as everything is already set up. 
Even for - * interrupts marked with IRQF_NO_BALANCE this is - * correct as we want the thread to move to the cpu(s) - * on which the requesting code placed the interrupt. - */ - set_bit(IRQTF_AFFINITY, &new->thread_flags); + + irq_thread_set_affinity(t, desc); + return 0; } =20 @@ -1486,11 +1495,11 @@ __setup_irq(unsigned int irq, struct irq_desc *desc= , struct irqaction *new) * thread. */ if (new->thread_fn && !nested) { - ret =3D setup_irq_thread(new, irq, false); + ret =3D setup_irq_thread(new, desc, irq, false); if (ret) goto out_mput; if (new->secondary) { - ret =3D setup_irq_thread(new->secondary, irq, true); + ret =3D setup_irq_thread(new->secondary, desc, irq, true); if (ret) goto out_thread; } --=20 2.51.0 From nobody Fri Oct 3 14:34:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EE59D33437F for ; Fri, 29 Aug 2025 15:49:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482591; cv=none; b=AeddXrbIx+ABO/PUnnWwcjhVz+rY3KuYt2EcXSm2NU3pROHVU/5kkGxRodhn8l9JFg/Fgl/lu+WVK9Y62OlDlsA54RC1TPG+QX/LoNFJJcDR7Yr2NajALsox7FUjc2JdwtD3JQ4cRLGh7foGafyf5Y9W+RxIR3oxLrJsNKvrqqQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756482591; c=relaxed/simple; bh=Jrc2SP9zuO/q8lVKXeMSZQ2TjjYp5fXU9+UkI2oxxh8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=POT1kIPacOFO2Ya4bHrmh5M9ph39q01gPPdppew9lAJxtW6c8JA5nGE2NnuQG4dTilFwMlHsPENQoDE+py3zKa59Po6qydlTWOdhvCHBDpf3WzpiTLHrgFzczaXz1rcwuztO0bS+WNHDV6s5IlpGWGWSqVUEtS7lskJgBkm95oo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=lkYd/pOm; arc=none 
smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="lkYd/pOm" Received: by smtp.kernel.org (Postfix) with ESMTPSA id B09F0C4CEF6; Fri, 29 Aug 2025 15:49:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1756482590; bh=Jrc2SP9zuO/q8lVKXeMSZQ2TjjYp5fXU9+UkI2oxxh8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=lkYd/pOmb9GnGKg5lfavP7nDSvh/s8W9bL6YX4KotaTa+2qjTPlmnudrn1vD/k9QS oJcz+Akb4MOrivK6cknt4ZqSpiLWVnTD4nHf1gTxtH6XsGXzBlAIGpCEsrblxNP46A iXeQXkXVGjJqkA4MUT46nkS+QCQSW66qGj0UWbaE+COLFUWKPsYPvrdZ8S1nyTm1bx +6e+9+oyapC4lYTFTm8aGzbUPNa4Gd8d+sIL+9GXW08U4YFtVkDO2j9ve46E6uf60J yNqI40EOQ/wPTRJ8J1c9EzRcQdKQgC6KAAF4X4DaWaaqg0SLucqT54i5eojQooCT95 u8kbLlTMAt/jQ== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , Marco Crivellari , Michal Hocko , Peter Zijlstra , Tejun Heo , Thomas Gleixner , Waiman Long Subject: [PATCH 33/33] doc: Add housekeeping documentation Date: Fri, 29 Aug 2025 17:48:14 +0200 Message-ID: <20250829154814.47015-34-frederic@kernel.org> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20250829154814.47015-1-frederic@kernel.org> References: <20250829154814.47015-1-frederic@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Signed-off-by: Frederic Weisbecker --- Documentation/cpu_isolation/housekeeping.rst | 111 +++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 Documentation/cpu_isolation/housekeeping.rst diff --git a/Documentation/cpu_isolation/housekeeping.rst b/Documentation/c= pu_isolation/housekeeping.rst new file mode 100644 index 000000000000..e5417302774c --- /dev/null +++ b/Documentation/cpu_isolation/housekeeping.rst @@ -0,0 +1,111 @@ 
+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D +Housekeeping +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + + +CPU Isolation moves away kernel work that may otherwise run on any CPU. +The purpose of its related features is to reduce the OS jitter that some +extreme workloads can't stand, such as in some DPDK use cases. + +The kernel work moved away by CPU isolation is commonly described as +"housekeeping" because it includes ground work that performs cleanups, +statistics maintenance and actions relying on them, memory release, +various deferrals, etc. + +Sometimes housekeeping is just some unbound work (unbound workqueues, +unbound timers, ...) that gets easily assigned to non-isolated CPUs. +But sometimes housekeeping is tied to a specific CPU and requires +elaborate tricks to be offloaded to non-isolated CPUs (RCU_NOCB, remote +scheduler tick, etc.). + +Thus, a housekeeping CPU can be considered as the reverse of an isolated +CPU. It is simply a CPU that can execute housekeeping work. There must +always be at least one online housekeeping CPU at any time. The CPUs that +are not isolated are automatically assigned as housekeeping. + +Housekeeping is currently divided into four features described +by ``enum hk_type``: + +1. HK_TYPE_DOMAIN matches the work moved away by scheduler domain + isolation performed through the ``isolcpus=3Ddomain`` boot parameter or + isolated cpuset partitions in cgroup v2. This includes scheduler + load balancing, unbound workqueues and timers. + +2. HK_TYPE_KERNEL_NOISE matches the work moved away by tick isolation + performed through the ``nohz_full=3D`` or ``isolcpus=3Dnohz`` boot + parameters. This includes the remote scheduler tick, vmstat and the + lockup watchdog. + +3.
HK_TYPE_MANAGED_IRQ matches the IRQ handlers moved away by managed + IRQ isolation performed through ``isolcpus=3Dmanaged_irq``. + +4. HK_TYPE_DOMAIN_BOOT matches the work moved away by scheduler domain + isolation performed through ``isolcpus=3Ddomain`` only. It is similar + to HK_TYPE_DOMAIN except it ignores the isolation performed by + cpusets. + + +Housekeeping cpumasks +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D + +Housekeeping cpumasks include the CPUs that can execute the work moved +away by the matching isolation feature. These cpumasks are returned by +the following function:: + + const struct cpumask *housekeeping_cpumask(enum hk_type type) + +By default, if neither ``nohz_full=3D``, nor ``isolcpus``, nor cpuset's +isolated partitions are used, which covers most usecases, this function +returns the cpu_possible_mask. + +Otherwise the function returns the cpumask complement of the isolation +feature. For example: + +With isolcpus=3Ddomain,7 the following will return a mask with all possible +CPUs except 7:: + + housekeeping_cpumask(HK_TYPE_DOMAIN) + +Similarly with nohz_full=3D5,6 the following will return a mask with all +possible CPUs except 5,6:: + + housekeeping_cpumask(HK_TYPE_KERNEL_NOISE) + + +Synchronization against cpusets +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D + +Cpuset can modify the HK_TYPE_DOMAIN housekeeping cpumask while creating, +modifying or deleting an isolated partition. + +The users of HK_TYPE_DOMAIN cpumask must then make sure to synchronize +properly against cpuset in order to make sure that: + +1. The cpumask snapshot stays coherent. + +2. No housekeeping work is queued on a newly made isolated CPU. + +3. 
Pending housekeeping work that was queued to a non-isolated + CPU which just turned isolated through cpuset must be flushed + before the related created/modified isolated partition is made + available to userspace. + +This synchronization is maintained by an RCU-based scheme. The cpuset upda= te +side waits for an RCU grace period after updating the HK_TYPE_DOMAIN +cpumask and before flushing pending work. On the read side, care must be +taken to gather the housekeeping target election and the work enqueue with= in +the same RCU read-side critical section. + +A typical layout example would look like this on the update side +(``housekeeping_update()``):: + + rcu_assign_pointer(housekeeping_cpumasks[type], trial); + synchronize_rcu(); + flush_workqueue(example_workqueue); + +And then on the read side:: + + rcu_read_lock(); + cpu =3D housekeeping_any_cpu(HK_TYPE_DOMAIN); + queue_work_on(cpu, example_workqueue, work); + rcu_read_unlock(); --=20 2.51.0
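The complement rule from the "Housekeeping cpumasks" section of the documentation patch above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: a cpumask is modelled as a 64-bit word, and the names model_cpumask_t, model_possible_mask(), model_housekeeping_mask() and model_first_cpu() are hypothetical and do not exist in the kernel. In particular, model_first_cpu() simply picks the lowest set bit and makes no claim about the policy of housekeeping_any_cpu().

```c
/* Userspace model only: none of these helpers exist in the kernel. */
#include <assert.h>
#include <stdint.h>

/* One bit per CPU; this toy model supports at most 64 CPUs. */
typedef uint64_t model_cpumask_t;

/* Mask with CPUs 0..nr_cpus-1 set (stand-in for cpu_possible_mask). */
static model_cpumask_t model_possible_mask(unsigned int nr_cpus)
{
	assert(nr_cpus >= 1 && nr_cpus <= 64);
	return nr_cpus == 64 ? UINT64_MAX : ((UINT64_C(1) << nr_cpus) - 1);
}

/* Housekeeping mask: every possible CPU that is not isolated. */
static model_cpumask_t model_housekeeping_mask(model_cpumask_t possible,
					       model_cpumask_t isolated)
{
	return possible & ~isolated;
}

/* Lowest-numbered CPU in the mask, or -1 if the mask is empty. */
static int model_first_cpu(model_cpumask_t mask)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}
```

With eight possible CPUs, isolating CPU 7 (as with isolcpus=domain,7) leaves CPUs 0-6 in the modelled housekeeping mask, and isolating CPUs 5 and 6 (as with nohz_full=5,6) leaves CPUs 0-4 and 7. With no isolation at all, the modelled mask equals the possible mask, matching the default case described in the documentation.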