From nobody Sat Apr 11 07:08:26 2026
From: Cai Xinchen
Subject: [PATCH 4.19 1/3] cgroup/cpuset: Change cpuset_rwsem and hotplug lock order
Date: Fri, 3 Mar 2023 04:50:48 +0000
Message-ID: <20230303045050.139985-2-caixinchen1@huawei.com>
In-Reply-To: <20230303045050.139985-1-caixinchen1@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Juri Lelli

commit d74b27d63a8bebe2fe634944e4ebdc7b10db7a39 upstream.

Commit 1243dc518c9da ("cgroup/cpuset: Convert cpuset_mutex to
percpu_rwsem") is a performance patch that was not backported to 4.19,
so this backport keeps using cpuset_mutex wherever the original commit
uses cpuset_rwsem.

Original commit message:

cpuset_rwsem is going to be acquired from sched_setscheduler() with a
following patch. There are however paths (e.g., spawn_ksoftirqd) in
which sched_setscheduler() is eventually called while holding hotplug
lock; this creates a dependency between hotplug lock (to be always
acquired first) and cpuset_rwsem (to be always acquired after hotplug
lock).

Fix paths which currently take the two locks in the wrong order (after
a following patch is applied).

Tested-by: Dietmar Eggemann
Signed-off-by: Juri Lelli
Signed-off-by: Peter Zijlstra (Intel)
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: bristot@redhat.com
Cc: claudio@evidence.eu.com
Cc: lizefan@huawei.com
Cc: longman@redhat.com
Cc: luca.abeni@santannapisa.it
Cc: mathieu.poirier@linaro.org
Cc: rostedt@goodmis.org
Cc: tj@kernel.org
Cc: tommaso.cucinotta@santannapisa.it
Link: https://lkml.kernel.org/r/20190719140000.31694-7-juri.lelli@redhat.com
Signed-off-by: Ingo Molnar
Signed-off-by: Cai Xinchen
---
 include/linux/cpuset.h |  8 ++++----
 kernel/cgroup/cpuset.c | 18 ++++++++++++++----
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index 934633a05d20..7f1478c26a33 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -40,14 +40,14 @@ static inline bool cpusets_enabled(void)
 
 static inline void cpuset_inc(void)
 {
-	static_branch_inc(&cpusets_pre_enable_key);
-	static_branch_inc(&cpusets_enabled_key);
+	static_branch_inc_cpuslocked(&cpusets_pre_enable_key);
+	static_branch_inc_cpuslocked(&cpusets_enabled_key);
 }
 
 static inline void cpuset_dec(void)
 {
-	static_branch_dec(&cpusets_enabled_key);
-	static_branch_dec(&cpusets_pre_enable_key);
+	static_branch_dec_cpuslocked(&cpusets_enabled_key);
+	static_branch_dec_cpuslocked(&cpusets_pre_enable_key);
 }
 
 extern int cpuset_init(void);
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index dcd5755b1fe2..2ee0e7a06dd9 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -830,8 +830,8 @@ static void rebuild_sched_domains_locked(void)
 	cpumask_var_t *doms;
 	int ndoms;
 
+	lockdep_assert_cpus_held();
 	lockdep_assert_held(&cpuset_mutex);
-	get_online_cpus();
 
 	/*
 	 * We have raced with CPU hotplug. Don't do anything to avoid
@@ -839,15 +839,13 @@ static void rebuild_sched_domains_locked(void)
 	 * Anyways, hotplug work item will rebuild sched domains.
 	 */
 	if (!cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
-		goto out;
+		return;
 
 	/* Generate domain masks and attrs */
 	ndoms = generate_sched_domains(&doms, &attr);
 
 	/* Have scheduler rebuild the domains */
 	partition_sched_domains(ndoms, doms, attr);
-out:
-	put_online_cpus();
 }
 #else /* !CONFIG_SMP */
 static void rebuild_sched_domains_locked(void)
@@ -857,9 +855,11 @@ static void rebuild_sched_domains_locked(void)
 
 void rebuild_sched_domains(void)
 {
+	get_online_cpus();
 	mutex_lock(&cpuset_mutex);
 	rebuild_sched_domains_locked();
 	mutex_unlock(&cpuset_mutex);
+	put_online_cpus();
 }
 
 /**
@@ -1617,6 +1617,7 @@ static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
 	cpuset_filetype_t type = cft->private;
 	int retval = 0;
 
+	get_online_cpus();
 	mutex_lock(&cpuset_mutex);
 	if (!is_cpuset_online(cs)) {
 		retval = -ENODEV;
@@ -1654,6 +1655,7 @@ static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
 	}
 out_unlock:
 	mutex_unlock(&cpuset_mutex);
+	put_online_cpus();
 	return retval;
 }
 
@@ -1664,6 +1666,7 @@ static int cpuset_write_s64(struct cgroup_subsys_state *css, struct cftype *cft,
 	cpuset_filetype_t type = cft->private;
 	int retval = -ENODEV;
 
+	get_online_cpus();
 	mutex_lock(&cpuset_mutex);
 	if (!is_cpuset_online(cs))
 		goto out_unlock;
@@ -1678,6 +1681,7 @@ static int cpuset_write_s64(struct cgroup_subsys_state *css, struct cftype *cft,
 	}
 out_unlock:
 	mutex_unlock(&cpuset_mutex);
+	put_online_cpus();
 	return retval;
 }
 
@@ -1716,6 +1720,7 @@ static ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
 	kernfs_break_active_protection(of->kn);
 	flush_work(&cpuset_hotplug_work);
 
+	get_online_cpus();
 	mutex_lock(&cpuset_mutex);
 	if (!is_cpuset_online(cs))
 		goto out_unlock;
@@ -1741,6 +1746,7 @@ static ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
 	free_trial_cpuset(trialcs);
 out_unlock:
 	mutex_unlock(&cpuset_mutex);
+	put_online_cpus();
 	kernfs_unbreak_active_protection(of->kn);
 	css_put(&cs->css);
 	flush_workqueue(cpuset_migrate_mm_wq);
@@ -1985,6 +1991,7 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
 	if (!parent)
 		return 0;
 
+	get_online_cpus();
 	mutex_lock(&cpuset_mutex);
 
 	set_bit(CS_ONLINE, &cs->flags);
@@ -2035,6 +2042,7 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
 	spin_unlock_irq(&callback_lock);
 out_unlock:
 	mutex_unlock(&cpuset_mutex);
+	put_online_cpus();
 	return 0;
 }
 
@@ -2048,6 +2056,7 @@ static void cpuset_css_offline(struct cgroup_subsys_state *css)
 {
 	struct cpuset *cs = css_cs(css);
 
+	get_online_cpus();
 	mutex_lock(&cpuset_mutex);
 
 	if (is_sched_load_balance(cs))
@@ -2057,6 +2066,7 @@ static void cpuset_css_offline(struct cgroup_subsys_state *css)
 	clear_bit(CS_ONLINE, &cs->flags);
 
 	mutex_unlock(&cpuset_mutex);
+	put_online_cpus();
 }
 
 static void cpuset_css_free(struct cgroup_subsys_state *css)
-- 
2.17.1

From: Cai Xinchen
Subject: [PATCH 4.19 2/3] cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock
Date: Fri, 3 Mar 2023 04:50:49 +0000
Message-ID: <20230303045050.139985-3-caixinchen1@huawei.com>
In-Reply-To: <20230303045050.139985-1-caixinchen1@huawei.com>

From: Tejun Heo

commit 4f7e7236435ca0abe005c674ebd6892c6e83aeb3 upstream.

Bringing up a CPU may involve creating and destroying tasks which requires
read-locking threadgroup_rwsem, so threadgroup_rwsem nests inside
cpus_read_lock().
However, cpuset's ->attach(), which may be called with threadgroup_rwsem
write-locked, also wants to disable CPU hotplug and acquires
cpus_read_lock(), leading to a deadlock.

Fix it by guaranteeing that ->attach() is always called with CPU hotplug
disabled and removing the cpus_read_lock() call from cpuset_attach().

Signed-off-by: Tejun Heo
Reviewed-and-tested-by: Imran Khan
Reported-and-tested-by: Xuewen Yan
Fixes: 05c7b7a92cc8 ("cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug")
Cc: stable@vger.kernel.org # v5.17+
Signed-off-by: Cai Xinchen
---
 kernel/cgroup/cgroup.c | 49 +++++++++++++++++++++++++++++++++++++-----
 kernel/cgroup/cpuset.c |  7 +-----
 2 files changed, 45 insertions(+), 11 deletions(-)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index a892a99eb4bf..de4c490f9193 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -2209,6 +2209,45 @@ int task_cgroup_path(struct task_struct *task, char *buf, size_t buflen)
 }
 EXPORT_SYMBOL_GPL(task_cgroup_path);
 
+/**
+ * cgroup_attach_lock - Lock for ->attach()
+ * @lock_threadgroup: whether to down_write cgroup_threadgroup_rwsem
+ *
+ * cgroup migration sometimes needs to stabilize threadgroups against forks and
+ * exits by write-locking cgroup_threadgroup_rwsem. However, some ->attach()
+ * implementations (e.g. cpuset), also need to disable CPU hotplug.
+ * Unfortunately, letting ->attach() operations acquire cpus_read_lock() can
+ * lead to deadlocks.
+ *
+ * Bringing up a CPU may involve creating and destroying tasks which requires
+ * read-locking threadgroup_rwsem, so threadgroup_rwsem nests inside
+ * cpus_read_lock(). If we call an ->attach() which acquires the cpus lock while
+ * write-locking threadgroup_rwsem, the locking order is reversed and we end up
+ * waiting for an on-going CPU hotplug operation which in turn is waiting for
+ * the threadgroup_rwsem to be released to create new tasks. For more details:
+ *
+ *   http://lkml.kernel.org/r/20220711174629.uehfmqegcwn2lqzu@wubuntu
+ *
+ * Resolve the situation by always acquiring cpus_read_lock() before optionally
+ * write-locking cgroup_threadgroup_rwsem. This allows ->attach() to assume that
+ * CPU hotplug is disabled on entry.
+ */
+static void cgroup_attach_lock(void)
+{
+	get_online_cpus();
+	percpu_down_write(&cgroup_threadgroup_rwsem);
+}
+
+/**
+ * cgroup_attach_unlock - Undo cgroup_attach_lock()
+ * @lock_threadgroup: whether to up_write cgroup_threadgroup_rwsem
+ */
+static void cgroup_attach_unlock(void)
+{
+	percpu_up_write(&cgroup_threadgroup_rwsem);
+	put_online_cpus();
+}
+
 /**
  * cgroup_migrate_add_task - add a migration target task to a migration context
  * @task: target task
@@ -2694,7 +2733,7 @@ struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup)
 	if (kstrtoint(strstrip(buf), 0, &pid) || pid < 0)
 		return ERR_PTR(-EINVAL);
 
-	percpu_down_write(&cgroup_threadgroup_rwsem);
+	cgroup_attach_lock();
 
 	rcu_read_lock();
 	if (pid) {
@@ -2725,7 +2764,7 @@ struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup)
 	goto out_unlock_rcu;
 
 out_unlock_threadgroup:
-	percpu_up_write(&cgroup_threadgroup_rwsem);
+	cgroup_attach_unlock();
 out_unlock_rcu:
 	rcu_read_unlock();
 	return tsk;
@@ -2740,7 +2779,7 @@ void cgroup_procs_write_finish(struct task_struct *task)
 	/* release reference from cgroup_procs_write_start() */
 	put_task_struct(task);
 
-	percpu_up_write(&cgroup_threadgroup_rwsem);
+	cgroup_attach_unlock();
 	for_each_subsys(ss, ssid)
 		if (ss->post_attach)
 			ss->post_attach();
@@ -2799,7 +2838,7 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp)
 
 	lockdep_assert_held(&cgroup_mutex);
 
-	percpu_down_write(&cgroup_threadgroup_rwsem);
+	cgroup_attach_lock();
 
 	/* look up all csses currently attached to @cgrp's subtree */
 	spin_lock_irq(&css_set_lock);
@@ -2830,7 +2869,7 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp)
 	ret = cgroup_migrate_execute(&mgctx);
 out_finish:
 	cgroup_migrate_finish(&mgctx);
-	percpu_up_write(&cgroup_threadgroup_rwsem);
+	cgroup_attach_unlock();
 	return ret;
 }
 
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 2ee0e7a06dd9..c6d412cebc43 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1528,13 +1528,9 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 	cgroup_taskset_first(tset, &css);
 	cs = css_cs(css);
 
+	lockdep_assert_cpus_held();	/* see cgroup_attach_lock() */
 	mutex_lock(&cpuset_mutex);
 
-	/*
-	 * It should hold cpus lock because a cpu offline event can
-	 * cause set_cpus_allowed_ptr() failed.
-	 */
-	get_online_cpus();
 	/* prepare for attach */
 	if (cs == &top_cpuset)
 		cpumask_copy(cpus_attach, cpu_possible_mask);
@@ -1553,7 +1549,6 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 		cpuset_change_task_nodemask(task, &cpuset_attach_nodemask_to);
 		cpuset_update_task_spread_flag(cs, task);
 	}
-	put_online_cpus();
 
 	/*
 	 * Change mm for all threadgroup leaders. This is expensive and may
-- 
2.17.1

From: Cai Xinchen
Subject: [PATCH 4.19 3/3] cgroup: Add missing cpus_read_lock() to cgroup_attach_task_all()
Date: Fri, 3 Mar 2023 04:50:50 +0000
Message-ID: <20230303045050.139985-4-caixinchen1@huawei.com>
In-Reply-To: <20230303045050.139985-1-caixinchen1@huawei.com>

From: Tetsuo Handa

commit 43626dade36fa74d3329046f4ae2d7fdefe401c6 upstream.

syzbot is hitting percpu_rwsem_assert_held(&cpu_hotplug_lock) warning at
cpuset_attach() [1], because commit 4f7e7236435ca0ab ("cgroup: Fix
threadgroup_rwsem <-> cpus_read_lock() deadlock") missed that
cpuset_attach() is also called from cgroup_attach_task_all().
Add cpus_read_lock() as cgroup_procs_write_start() does.

Link: https://syzkaller.appspot.com/bug?extid=29d3a3b4d86c8136ad9e [1]
Reported-by: syzbot
Signed-off-by: Tetsuo Handa
Fixes: 4f7e7236435ca0ab ("cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock")
Signed-off-by: Tejun Heo
Signed-off-by: Cai Xinchen
---
 kernel/cgroup/cgroup-v1.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index 61644976225a..c0ebb70808b6 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -55,6 +56,7 @@ int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
 	int retval = 0;
 
 	mutex_lock(&cgroup_mutex);
+	get_online_cpus();
 	percpu_down_write(&cgroup_threadgroup_rwsem);
 	for_each_root(root) {
 		struct cgroup *from_cgrp;
@@ -71,6 +73,7 @@ int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
 			break;
 	}
 	percpu_up_write(&cgroup_threadgroup_rwsem);
+	put_online_cpus();
 	mutex_unlock(&cgroup_mutex);
 
 	return retval;
-- 
2.17.1