From nobody Sun Dec 14 20:26:39 2025
From: Waiman Long
To: Tejun Heo, Johannes Weiner, Michal Koutný, Jonathan Corbet,
 Frederic Weisbecker, "Paul E. McKenney", Neeraj Upadhyay, Joel Fernandes,
 Josh Triplett, Boqun Feng, Uladzislau Rezki, Steven Rostedt,
 Mathieu Desnoyers, Lai Jiangshan, Zqiang, Anna-Maria Behnsen, Ingo Molnar,
 Thomas Gleixner, Peter Zijlstra, Juri Lelli, Vincent Guittot,
 Dietmar Eggemann, Ben Segall, Mel Gorman, Valentin Schneider, Shuah Khan
Cc: cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, rcu@vger.kernel.org,
 linux-kselftest@vger.kernel.org, Phil Auld, Costa Shulyupin,
 Gabriele Monaco, Cestmir Kalina, Waiman Long
Subject: [RFC PATCH 01/18] sched/isolation: Enable runtime update of housekeeping cpumasks
Date: Fri, 8 Aug 2025 11:10:45 -0400
Message-ID: <20250808151053.19777-2-longman@redhat.com>
In-Reply-To: <20250808151053.19777-1-longman@redhat.com>
References: <20250808151053.19777-1-longman@redhat.com>

The housekeeping CPU masks, set up by the "isolcpus" and "nohz_full"
boot command line options, are used at boot time to exclude selected
CPUs from running some kernel background processes to minimize
disturbance to latency-sensitive userspace applications. Some of the
housekeeping CPU masks are also checked at run time to avoid using
those isolated CPUs.

The cpuset subsystem is now able to dynamically create a set of
isolated CPUs to be used in isolated cpuset partitions. The long term
goal is to make the degree of isolation as close as possible to what
can be done statically using those boot command line options.

This patch is a step in that direction by providing a new
housekeeping_exclude_cpumask() API to exclude only the given cpumask
from the housekeeping cpumasks. Existing boot time "isolcpus" and
"nohz_full" cpumask setup, if present, can be overwritten.
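
As a rough illustration of the exclusion rule, the userspace sketch below
models a cpumask as a 64-bit word. The names cpu_bits_t and hk_exclude()
are hypothetical and are not part of the patch; the only thing taken from
the patch is the rule itself (housekeeping = possible AND NOT excluded),
which is what cpumask_andnot(dst, cpu_possible_mask, cpumask) computes.

```c
#include <stdint.h>

/* Hypothetical userspace stand-in for a cpumask: bit N set means CPU N. */
typedef uint64_t cpu_bits_t;

/*
 * Housekeeping set = every possible CPU that is not excluded.
 * Passing excluded == 0 models the NULL cpumask case in the patch,
 * where no CPU is isolated and the result is the full possible set.
 */
static cpu_bits_t hk_exclude(cpu_bits_t possible, cpu_bits_t excluded)
{
	return possible & ~excluded;
}
```

With eight possible CPUs (0xFF) and CPUs 2-3 excluded (0x0C), the
housekeeping set under this model is 0xF3, i.e. CPUs 0-1 and 4-7.
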

Two sets of cpumasks are now kept internally. One set is used by the
callers while the other set is updated, after which the updated set is
atomically switched in.

Signed-off-by: Waiman Long
---
 include/linux/sched/isolation.h |  6 +++
 kernel/sched/isolation.c        | 95 +++++++++++++++++++++++++++++----
 2 files changed, 91 insertions(+), 10 deletions(-)

diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index d8501f4709b5..af38d21d0d00 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -32,6 +32,7 @@ extern bool housekeeping_enabled(enum hk_type type);
 extern void housekeeping_affine(struct task_struct *t, enum hk_type type);
 extern bool housekeeping_test_cpu(int cpu, enum hk_type type);
 extern void __init housekeeping_init(void);
+extern int housekeeping_exclude_cpumask(struct cpumask *cpumask, unsigned long flags);
 
 #else
 
@@ -59,6 +60,11 @@ static inline bool housekeeping_test_cpu(int cpu, enum hk_type type)
 }
 
 static inline void housekeeping_init(void) { }
+
+static inline int housekeeping_exclude_cpumask(struct cpumask *cpumask, unsigned long flags)
+{
+	return -EOPNOTSUPP;
+}
 #endif /* CONFIG_CPU_ISOLATION */
 
 static inline bool housekeeping_cpu(int cpu, enum hk_type type)
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index a4cf17b1fab0..3fb0e8ccce26 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -19,8 +19,16 @@ enum hk_flags {
 DEFINE_STATIC_KEY_FALSE(housekeeping_overridden);
 EXPORT_SYMBOL_GPL(housekeeping_overridden);
 
+/*
+ * The housekeeping cpumasks can now be dynamically updated at run time.
+ * Two set of cpumasks are kept. One set can be used while the other set are
+ * being updated concurrently.
+ */
+static DEFINE_RAW_SPINLOCK(cpumask_lock);
 struct housekeeping {
-	cpumask_var_t cpumasks[HK_TYPE_MAX];
+	struct cpumask *cpumask_ptrs[HK_TYPE_MAX];
+	cpumask_var_t cpumasks[HK_TYPE_MAX][2];
+	unsigned int seq_nrs[HK_TYPE_MAX];
 	unsigned long flags;
 };
 
@@ -38,11 +46,13 @@ int housekeeping_any_cpu(enum hk_type type)
 
 	if (static_branch_unlikely(&housekeeping_overridden)) {
 		if (housekeeping.flags & BIT(type)) {
-			cpu = sched_numa_find_closest(housekeeping.cpumasks[type], smp_processor_id());
+			struct cpumask *cpumask = READ_ONCE(housekeeping.cpumask_ptrs[type]);
+
+			cpu = sched_numa_find_closest(cpumask, smp_processor_id());
 			if (cpu < nr_cpu_ids)
 				return cpu;
 
-			cpu = cpumask_any_and_distribute(housekeeping.cpumasks[type], cpu_online_mask);
+			cpu = cpumask_any_and_distribute(cpumask, cpu_online_mask);
 			if (likely(cpu < nr_cpu_ids))
 				return cpu;
 			/*
@@ -62,7 +72,7 @@ const struct cpumask *housekeeping_cpumask(enum hk_type type)
 {
 	if (static_branch_unlikely(&housekeeping_overridden))
 		if (housekeeping.flags & BIT(type))
-			return housekeeping.cpumasks[type];
+			return READ_ONCE(housekeeping.cpumask_ptrs[type]);
 	return cpu_possible_mask;
 }
 EXPORT_SYMBOL_GPL(housekeeping_cpumask);
@@ -71,7 +81,7 @@ void housekeeping_affine(struct task_struct *t, enum hk_type type)
 {
 	if (static_branch_unlikely(&housekeeping_overridden))
 		if (housekeeping.flags & BIT(type))
-			set_cpus_allowed_ptr(t, housekeeping.cpumasks[type]);
+			set_cpus_allowed_ptr(t, READ_ONCE(housekeeping.cpumask_ptrs[type]));
 }
 EXPORT_SYMBOL_GPL(housekeeping_affine);
 
@@ -79,7 +89,7 @@ bool housekeeping_test_cpu(int cpu, enum hk_type type)
 {
 	if (static_branch_unlikely(&housekeeping_overridden))
 		if (housekeeping.flags & BIT(type))
-			return cpumask_test_cpu(cpu, housekeeping.cpumasks[type]);
+			return cpumask_test_cpu(cpu, READ_ONCE(housekeeping.cpumask_ptrs[type]));
 	return true;
 }
 EXPORT_SYMBOL_GPL(housekeeping_test_cpu);
@@ -98,7 +108,7 @@ void __init housekeeping_init(void)
 
 	for_each_set_bit(type, &housekeeping.flags, HK_TYPE_MAX) {
 		/* We need at least one CPU to handle housekeeping work */
-		WARN_ON_ONCE(cpumask_empty(housekeeping.cpumasks[type]));
+		WARN_ON_ONCE(cpumask_empty(housekeeping.cpumask_ptrs[type]));
 	}
 }
 
@@ -106,8 +116,10 @@ static void __init housekeeping_setup_type(enum hk_type type,
 					   cpumask_var_t housekeeping_staging)
 {
 
-	alloc_bootmem_cpumask_var(&housekeeping.cpumasks[type]);
-	cpumask_copy(housekeeping.cpumasks[type],
+	alloc_bootmem_cpumask_var(&housekeeping.cpumasks[type][0]);
+	alloc_bootmem_cpumask_var(&housekeeping.cpumasks[type][1]);
+	housekeeping.cpumask_ptrs[type] = housekeeping.cpumasks[type][0];
+	cpumask_copy(housekeeping.cpumask_ptrs[type],
 		     housekeeping_staging);
 }
 
@@ -161,7 +173,7 @@ static int __init housekeeping_setup(char *str, unsigned long flags)
 
 		for_each_set_bit(type, &iter_flags, HK_TYPE_MAX) {
 			if (!cpumask_equal(housekeeping_staging,
-					   housekeeping.cpumasks[type])) {
+					   housekeeping.cpumask_ptrs[type])) {
 				pr_warn("Housekeeping: nohz_full= must match isolcpus=\n");
 				goto free_housekeeping_staging;
 			}
@@ -251,3 +263,66 @@ static int __init housekeeping_isolcpus_setup(char *str)
 	return housekeeping_setup(str, flags);
 }
 __setup("isolcpus=", housekeeping_isolcpus_setup);
+
+/**
+ * housekeeping_exclude_cpumask - Update housekeeping cpumasks to exclude only the given cpumask
+ * @cpumask: new cpumask to be excluded from housekeeping cpumasks
+ * @hk_flags: bit mask of housekeeping types to be excluded
+ * Return: 0 if successful, error code if an error happens.
+ *
+ * Exclude the given cpumask from the housekeeping cpumasks associated with
+ * the given hk_flags. If the given cpumask is NULL, no CPU will need to be
+ * excluded.
+ */
+int housekeeping_exclude_cpumask(struct cpumask *cpumask, unsigned long hk_flags)
+{
+	unsigned long type;
+
+#ifdef CONFIG_CPUMASK_OFFSTACK
+	/*
+	 * Pre-allocate cpumasks, if needed
+	 */
+	for_each_set_bit(type, &hk_flags, HK_TYPE_MAX) {
+		cpumask_var_t mask0, mask1;
+
+		if (housekeeping.cpumask_ptrs[type])
+			continue;
+		if (!zalloc_cpumask_var(&mask0, GFP_KERNEL) ||
+		    !zalloc_cpumask_var(&mask1, GFP_KERNEL))
+			return -ENOMEM;
+
+		/*
+		 * cpumasks[type][] should be NULL, still do a swap & free
+		 * dance just in case the cpumasks are allocated but
+		 * cpumask_ptrs not setup somehow.
+		 */
+		mask0 = xchg(&housekeeping.cpumasks[type][0], mask0);
+		mask1 = xchg(&housekeeping.cpumasks[type][1], mask1);
+		free_cpumask_var(mask0);
+		free_cpumask_var(mask1);
+	}
+#endif
+
+	raw_spin_lock(&cpumask_lock);
+
+	for_each_set_bit(type, &hk_flags, HK_TYPE_MAX) {
+		int idx = ++housekeeping.seq_nrs[type] & 1;
+		struct cpumask *dst_cpumask = housekeeping.cpumasks[type][idx];
+
+		if (!cpumask) {
+			cpumask_copy(dst_cpumask, cpu_possible_mask);
+			housekeeping.flags &= ~BIT(type);
+		} else {
+			cpumask_andnot(dst_cpumask, cpu_possible_mask, cpumask);
+			housekeeping.flags |= BIT(type);
+		}
+		WRITE_ONCE(housekeeping.cpumask_ptrs[type], dst_cpumask);
+	}
+	raw_spin_unlock(&cpumask_lock);
+
+	if (!housekeeping.flags && static_key_enabled(&housekeeping_overridden))
+		static_key_disable(&housekeeping_overridden.key);
+	else if (housekeeping.flags && !static_key_enabled(&housekeeping_overridden))
+		static_key_enable(&housekeeping_overridden.key);
+	return 0;
+}
-- 
2.50.0

From nobody Sun Dec 14 20:26:39 2025
From: Waiman Long
Subject: [RFC PATCH 02/18] sched/isolation: Call sched_tick_offload_init() when HK_FLAG_KERNEL_NOISE is first set
Date: Fri, 8 Aug 2025 11:10:46 -0400
Message-ID: <20250808151053.19777-3-longman@redhat.com>
In-Reply-To: <20250808151053.19777-1-longman@redhat.com>
References: <20250808151053.19777-1-longman@redhat.com>

The sched_tick_offload_init() function is called at boot time whenever
"nohz_full" is set. Now that housekeeping cpumasks can be updated at run
time without the corresponding "nohz_full" kernel parameter, we have to
be able to call sched_tick_offload_init() at run time to allow tick
offloading.

Remove the __init attribute from sched_tick_offload_init() and call it
when the HK_FLAG_KERNEL_NOISE flag is first set.
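
The "call it once, the first time the flag is set" pattern can be sketched
in userspace as follows. Everything here is hypothetical scaffolding:
HK_NOISE_BIT, tick_offload_init() and hk_flags_updated() are illustrative
stand-ins, not kernel symbols, and the call counter exists only so the
behavior can be observed.

```c
#include <stdbool.h>

/* Hypothetical flag bit standing in for HK_FLAG_KERNEL_NOISE. */
#define HK_NOISE_BIT (1UL << 2)

static bool tick_offload_inited;	/* set once, never cleared */
static int tick_offload_init_calls;	/* observation aid only */

/* Stand-in for the one-time setup work. */
static void tick_offload_init(void)
{
	tick_offload_init_calls++;
}

/* Run the one-time setup the first time the noise flag shows up. */
static void hk_flags_updated(unsigned long flags)
{
	if ((flags & HK_NOISE_BIT) && !tick_offload_inited) {
		tick_offload_init();
		tick_offload_inited = true;
	}
}
```

Calling hk_flags_updated(HK_NOISE_BIT) any number of times runs the setup
exactly once. Note that this sketch, like a plain bool guard in general,
assumes updates are serialized by the caller.
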

Signed-off-by: Waiman Long
---
 kernel/sched/core.c      |  2 +-
 kernel/sched/isolation.c | 10 +++++++++-
 kernel/sched/sched.h     |  2 +-
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index be00629f0ba4..9f02c047e25b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5783,7 +5783,7 @@ static void sched_tick_stop(int cpu)
 }
 #endif /* CONFIG_HOTPLUG_CPU */
 
-int __init sched_tick_offload_init(void)
+int sched_tick_offload_init(void)
 {
 	tick_work_cpu = alloc_percpu(struct tick_work);
 	BUG_ON(!tick_work_cpu);
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 3fb0e8ccce26..ee396ae13719 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -33,6 +33,7 @@ struct housekeeping {
 };
 
 static struct housekeeping housekeeping;
+static bool sched_tick_offload_inited;
 
 bool housekeeping_enabled(enum hk_type type)
 {
@@ -103,8 +104,10 @@ void __init housekeeping_init(void)
 
 	static_branch_enable(&housekeeping_overridden);
 
-	if (housekeeping.flags & HK_FLAG_KERNEL_NOISE)
+	if (housekeeping.flags & HK_FLAG_KERNEL_NOISE) {
 		sched_tick_offload_init();
+		sched_tick_offload_inited = true;
+	}
 
 	for_each_set_bit(type, &housekeeping.flags, HK_TYPE_MAX) {
 		/* We need at least one CPU to handle housekeeping work */
@@ -324,5 +327,10 @@ int housekeeping_exclude_cpumask(struct cpumask *cpumask, unsigned long hk_flags
 		static_key_disable(&housekeeping_overridden.key);
 	else if (housekeeping.flags && !static_key_enabled(&housekeeping_overridden))
 		static_key_enable(&housekeeping_overridden.key);
+
+	if ((housekeeping.flags & HK_FLAG_KERNEL_NOISE) && !sched_tick_offload_inited) {
+		sched_tick_offload_init();
+		sched_tick_offload_inited = true;
+	}
 	return 0;
 }
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index be9745d104f7..d4676305e099 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2671,7 +2671,7 @@ extern void post_init_entity_util_avg(struct task_struct *p);
 
 #ifdef CONFIG_NO_HZ_FULL
 extern bool sched_can_stop_tick(struct rq *rq);
-extern int __init sched_tick_offload_init(void);
+extern int sched_tick_offload_init(void);
 
 /*
  * Tick may be needed by tasks in the runqueue depending on their policy and
-- 
2.50.0

From nobody Sun Dec 14 20:26:39 2025
From: Waiman Long
Subject: [RFC PATCH 03/18] sched/isolation: Use RCU to delay successive housekeeping cpumask updates
Date: Fri, 8 Aug 2025 11:10:47 -0400
Message-ID: <20250808151053.19777-4-longman@redhat.com>
In-Reply-To: <20250808151053.19777-1-longman@redhat.com>
References: <20250808151053.19777-1-longman@redhat.com>

Even though there are two separate sets of housekeeping cpumasks, one
for access and one for update, the set about to be updated may still be
in use by callers of the housekeeping functions. Such a caller can end
up using an intermediate cpumask that is neither the old nor the new
one.

To reduce the chance of this, we need to introduce a delay between
successive housekeeping cpumask updates. One simple way is to make use
of the RCU grace period delay. Callers of the housekeeping APIs can
optionally hold rcu_read_lock() to eliminate the chance of using an
intermediate housekeeping cpumask.
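
The interplay between the two buffers and the per-update delay can be
modelled in userspace roughly as below. This is a single-writer sketch
using C11 atomics, not RCU: struct hk_mask and the hk_*() helpers are
hypothetical names, and the grace period is modelled as an explicit
hk_grace_period_end() call. One deliberate difference from the patch is
that a writer arriving too early gets "false" back here, whereas the patch
waits in synchronize_rcu() and retries.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one housekeeping type with two mask buffers. */
struct hk_mask {
	uint64_t bufs[2];		/* one live copy, one spare copy */
	_Atomic(uint64_t *) live;	/* pointer that readers dereference */
	unsigned int seq;		/* low bit selects the spare buffer */
	atomic_bool gp_pending;		/* set by an update until the grace period ends */
};

static void hk_init(struct hk_mask *m, uint64_t initial)
{
	m->bufs[0] = initial;
	m->bufs[1] = 0;
	m->seq = 0;
	atomic_init(&m->live, &m->bufs[0]);
	atomic_init(&m->gp_pending, false);
}

/* Reader: load the published pointer once, then read through it. */
static uint64_t hk_read(struct hk_mask *m)
{
	return *atomic_load(&m->live);
}

/*
 * Writer: fill the spare buffer, then publish it. Refuse to run while the
 * previous update's grace period is pending, because a reader might still
 * be looking at the buffer that would be overwritten.
 */
static bool hk_update(struct hk_mask *m, uint64_t possible, uint64_t excluded)
{
	if (atomic_load(&m->gp_pending))
		return false;

	uint64_t *spare = &m->bufs[++m->seq & 1];

	*spare = possible & ~excluded;
	atomic_store(&m->live, spare);
	atomic_store(&m->gp_pending, true);
	return true;
}

/* Stand-in for the RCU callback that runs after the grace period. */
static void hk_grace_period_end(struct hk_mask *m)
{
	atomic_store(&m->gp_pending, false);
}
```

In this model a second hk_update() issued before hk_grace_period_end()
leaves the published mask untouched, which is the property the delay is
meant to provide for readers that hold rcu_read_lock().
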

Signed-off-by: Waiman Long
---
 kernel/sched/isolation.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index ee396ae13719..f26708667754 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -23,6 +23,9 @@ EXPORT_SYMBOL_GPL(housekeeping_overridden);
  * The housekeeping cpumasks can now be dynamically updated at run time.
  * Two set of cpumasks are kept. One set can be used while the other set are
  * being updated concurrently.
+ *
+ * rcu_read_lock() can optionally be held by housekeeping API callers to
+ * ensure stability of the cpumasks.
  */
 static DEFINE_RAW_SPINLOCK(cpumask_lock);
 struct housekeeping {
@@ -34,6 +37,8 @@ struct housekeeping {
 
 static struct housekeeping housekeeping;
 static bool sched_tick_offload_inited;
+static struct rcu_head rcu_gp[HK_TYPE_MAX];
+static unsigned long update_flags;
 
 bool housekeeping_enabled(enum hk_type type)
 {
@@ -267,6 +272,18 @@ static int __init housekeeping_isolcpus_setup(char *str)
 }
 __setup("isolcpus=", housekeeping_isolcpus_setup);
 
+/*
+ * Bits in update_flags can only be turned on with cpumask_lock held and
+ * cleared by this RCU callback function.
+ */
+static void rcu_gp_end(struct rcu_head *rcu)
+{
+	int type = rcu - rcu_gp;
+
+	/* Atomically clear the corresponding flag bit */
+	clear_bit(type, &update_flags);
+}
+
 /**
  * housekeeping_exclude_cpumask - Update housekeeping cpumasks to exclude only the given cpumask
  * @cpumask: new cpumask to be excluded from housekeeping cpumasks
@@ -306,8 +323,21 @@ int housekeeping_exclude_cpumask(struct cpumask *cpumask, unsigned long hk_flags
 	}
 #endif
 
+retry:
+	/*
+	 * If the RCU grace period for the previous update with conflicting
+	 * flag bits hasn't been completed yet, we have to wait for it.
+	 */
+	while (READ_ONCE(update_flags) & hk_flags)
+		synchronize_rcu();
+
 	raw_spin_lock(&cpumask_lock);
 
+	if (READ_ONCE(update_flags) & hk_flags) {
+		raw_spin_unlock(&cpumask_lock);
+		goto retry;
+	}
+
 	for_each_set_bit(type, &hk_flags, HK_TYPE_MAX) {
 		int idx = ++housekeeping.seq_nrs[type] & 1;
 		struct cpumask *dst_cpumask = housekeeping.cpumasks[type][idx];
@@ -320,8 +350,11 @@ int housekeeping_exclude_cpumask(struct cpumask *cpumask, unsigned long hk_flags
 			housekeeping.flags |= BIT(type);
 		}
 		WRITE_ONCE(housekeeping.cpumask_ptrs[type], dst_cpumask);
+		set_bit(type, &update_flags);
 	}
 	raw_spin_unlock(&cpumask_lock);
+	for_each_set_bit(type, &hk_flags, HK_TYPE_MAX)
+		call_rcu(&rcu_gp[type], rcu_gp_end);
 
 	if (!housekeeping.flags && static_key_enabled(&housekeeping_overridden))
 		static_key_disable(&housekeeping_overridden.key);
-- 
2.50.0

From nobody Sun Dec 14 20:26:39 2025
From: Waiman Long
Subject: [RFC PATCH 04/18] sched/isolation: Add a debugfs file to dump housekeeping cpumasks
Date: Fri, 8 Aug 2025 11:10:48 -0400
Message-ID: <20250808151053.19777-5-longman@redhat.com>
In-Reply-To: <20250808151053.19777-1-longman@redhat.com>
References: <20250808151053.19777-1-longman@redhat.com>

As housekeeping cpumasks can now be modified at run time, we need a way
to examine their current values to see if they meet our expectations.
Add a new sched debugfs file "housekeeping_cpumasks" to dump out the
current values.
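
For readers unfamiliar with the range-list output that the "%*pbl" printk
format produces, the userspace sketch below formats a 64-bit mask the same
general way ("0-3,8"). format_cpu_list() is a hypothetical helper written
for this illustration; it is not a kernel function and makes no claim to
match the kernel's implementation beyond the output style.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Write the set bits of mask into buf as a comma-separated list of
 * ranges, e.g. bits 0,1,2,3,8 become "0-3,8".
 * Returns the number of characters written (excluding the NUL), or -1
 * if the output did not fit in len bytes.
 */
static int format_cpu_list(uint64_t mask, char *buf, size_t len)
{
	size_t pos = 0;

	if (len == 0)
		return -1;
	buf[0] = '\0';

	for (int cpu = 0; cpu < 64; cpu++) {
		if (!((mask >> cpu) & 1))
			continue;

		/* Extend the run for as long as the next bit is also set. */
		int first = cpu;
		while (cpu + 1 < 64 && ((mask >> (cpu + 1)) & 1))
			cpu++;

		const char *sep = pos ? "," : "";
		int n = (first == cpu)
			? snprintf(buf + pos, len - pos, "%s%d", sep, first)
			: snprintf(buf + pos, len - pos, "%s%d-%d", sep, first, cpu);

		if (n < 0 || (size_t)n >= len - pos)
			return -1;	/* truncated: report failure */
		pos += (size_t)n;
	}
	return (int)pos;
}
```

A line of the debugfs file would then pair a type name with such a list,
for example "domain: 0-3,8" on a machine where those are the housekeeping
CPUs for that type.
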
Signed-off-by: Waiman Long --- kernel/sched/debug.c | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c index 3f06ab84d53f..ba8f0334c15e 100644 --- a/kernel/sched/debug.c +++ b/kernel/sched/debug.c @@ -490,6 +490,35 @@ static void debugfs_fair_server_init(void) } } =20 +#ifdef CONFIG_CPU_ISOLATION +static int hk_cpumasks_show(struct seq_file *m, void *v) +{ + static const char * const hk_type_name[HK_TYPE_MAX] =3D { + [HK_TYPE_DOMAIN] =3D "domain", + [HK_TYPE_MANAGED_IRQ] =3D "managed_irq", + [HK_TYPE_KERNEL_NOISE] =3D "nohz_full" + }; + int type; + + for (type =3D 0; type < HK_TYPE_MAX; type++) + seq_printf(m, "%s: %*pbl\n", hk_type_name[type], + cpumask_pr_args(housekeeping_cpumask(type))); + return 0; +} + +static int hk_cpumasks_open(struct inode *inode, struct file *filp) +{ + return single_open(filp, hk_cpumasks_show, NULL); +} + +static const struct file_operations hk_cpumasks_fops =3D { + .open =3D hk_cpumasks_open, + .read =3D seq_read, + .llseek =3D seq_lseek, + .release =3D single_release, +}; +#endif + static __init int sched_init_debug(void) { struct dentry __maybe_unused *numa; @@ -525,6 +554,9 @@ static __init int sched_init_debug(void) debugfs_create_u32("hot_threshold_ms", 0644, numa, &sysctl_numa_balancing= _hot_threshold); #endif /* CONFIG_NUMA_BALANCING */ =20 +#ifdef CONFIG_CPU_ISOLATION + debugfs_create_file("housekeeping_cpumasks", 0444, debugfs_sched, NULL, &h= k_cpumasks_fops); +#endif debugfs_create_file("debug", 0444, debugfs_sched, NULL, &sched_debug_fops= ); =20 debugfs_fair_server_init(); --=20 2.50.0 From nobody Sun Dec 14 20:26:39 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9007928033C for ; Fri, 8 Aug 2025 15:12:32 +0000 (UTC) 
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754665954; cv=none; b=CiVHibt7jDrTOy00ePEP5xPlBOOrUdaJkXPH0nyd/ZxIOIoV2xHMc+DFMWEb8x04jzr9yw72Yi/DhixKP4tbi8o5bfniESizEsuf/Dr4KjfA/lZn0HDKhzweWhLdcgEP+wAF1LXocTQrQJiEzvA3kllriaYlajEsz6ARqj+4IJs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754665954; c=relaxed/simple; bh=FgjIWfU+qvY0l477cECYpRJobxSzgSfnhfM0BHAM+Bw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=GGVf61bFCWXeYeThFTEXmjsSvHZdWsuVUpvfkmx05Pa455/4Faev4e40C4L2e9PZv0o5tBiTP3DipzHt01Qs8Fd+PFdZzffWH+xlCrecKSY9pmmcG8QElx+lx8mJXeS1P9zKOxQExUKNc59KmIVcUcdJa2ypmL07iFr4PJfW9Mg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=g/BZ1pQ0; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="g/BZ1pQ0" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1754665951; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=A3YmjVs5UGHP5zKmSwFLIwmnWgiM9jmDjv5kWmHyuwI=; b=g/BZ1pQ0i3B+QXex54oZacwY8JDB3XHhAZGMtZeG9m47qPt2kvrspGx2HUVDATnG5jaUPI WXYrCEE3Fi6nkcagrcE8i140hPoHSrZxcQAST0CfjZhgsWfMpZ8OZmOsHxATY2q9iqhZrh 1dwU086moYtivB25ikqe3JfV87Xsz5M= Received: from mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com 
(ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-381-nlGtU5NvMkaxGDR46lIxJw-1; Fri, 08 Aug 2025 11:12:26 -0400 X-MC-Unique: nlGtU5NvMkaxGDR46lIxJw-1 X-Mimecast-MFC-AGG-ID: nlGtU5NvMkaxGDR46lIxJw_1754665942 Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 17C241800371; Fri, 8 Aug 2025 15:12:22 +0000 (UTC) Received: from llong-thinkpadp16vgen1.westford.csb (unknown [10.22.65.37]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id D90541954199; Fri, 8 Aug 2025 15:12:14 +0000 (UTC) From: Waiman Long To: Tejun Heo , Johannes Weiner , =?UTF-8?q?Michal=20Koutn=C3=BD?= , Jonathan Corbet , Frederic Weisbecker , "Paul E. 
McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Uladzislau Rezki , Steven Rostedt , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Anna-Maria Behnsen , Ingo Molnar , Thomas Gleixner , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Ben Segall , Mel Gorman , Valentin Schneider , Shuah Khan Cc: cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org, Phil Auld , Costa Shulyupin , Gabriele Monaco , Cestmir Kalina , Waiman Long Subject: [RFC PATCH 05/18] cpu/hotplug: Add a new cpuhp_offline_cb() API Date: Fri, 8 Aug 2025 11:10:49 -0400 Message-ID: <20250808151053.19777-6-longman@redhat.com> In-Reply-To: <20250808151053.19777-1-longman@redhat.com> References: <20250808151053.19777-1-longman@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 Content-Type: text/plain; charset="utf-8" Add a new cpuhp_offline_cb() API that allows us to offline a set of CPUs one-by-one, run the given callback function and then bring those CPUs back online again while inhibiting any concurrent CPU hotplug operations from happening. This new API can be used to enable runtime adjustment of nohz_full and isolcpus boot command line options. A new cpuhp_offline_cb_mode flag is also added to signal that the system is in this offline callback transient state so that some hotplug operations can be optimized out if we choose to. 
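The intended control flow can be illustrated with a small userspace simulation. This is only a sketch of the offline, callback, online sequence with rollback on failure. It does not model lock_device_hotplug() or the cpuhp_offline_cb_mode flag, and every sim_* name, the fixed 8-CPU bitmask and the injected failure are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_NR_CPUS 8

/* Simulated CPU state; fail_offline injects one offline failure (-1: none) */
struct sim_cpus {
	bool online[SIM_NR_CPUS];
	int fail_offline;
};

typedef int (*sim_cb_t)(void *arg);

static int sim_offline(struct sim_cpus *s, int cpu)
{
	if (cpu == s->fail_offline)
		return -EBUSY;
	s->online[cpu] = false;
	return 0;
}

static void sim_online(struct sim_cpus *s, int cpu)
{
	s->online[cpu] = true;
}

/*
 * Offline every CPU in @mask one by one, run @func, then bring the CPUs
 * back online. If an offline step fails, the CPUs already taken offline
 * are brought back and @func is not called.
 */
int sim_offline_cb(struct sim_cpus *s, unsigned int mask, sim_cb_t func,
		   void *arg)
{
	int cpu, ret;

	assert(s != NULL && func != NULL);
	if (mask == 0 || (mask >> SIM_NR_CPUS))
		return -EINVAL;

	for (cpu = 0; cpu < SIM_NR_CPUS; cpu++) {
		if (!(mask & (1u << cpu)))
			continue;
		ret = sim_offline(s, cpu);
		if (ret) {
			/* Roll back the CPUs that went offline before @cpu */
			for (int cpu2 = 0; cpu2 < cpu; cpu2++)
				if (mask & (1u << cpu2))
					sim_online(s, cpu2);
			return ret;
		}
	}

	ret = func(arg);

	for (cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			sim_online(s, cpu);
	return ret;
}

/* Example callback: record how many CPUs are offline when it runs */
struct sim_probe {
	const struct sim_cpus *cpus;
	int offline_seen;
};

int sim_probe_cb(void *arg)
{
	struct sim_probe *p = arg;

	p->offline_seen = 0;
	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		if (!p->cpus->online[cpu])
			p->offline_seen++;
	return 0;
}
```

The point of the sketch is that the callback only ever observes the state in which all requested CPUs are offline, and that a failed offline attempt leaves the set of online CPUs as it was before the call.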
Signed-off-by: Waiman Long --- include/linux/cpuhplock.h | 9 ++++++++ kernel/cpu.c | 47 +++++++++++++++++++++++++++++++++++++++ 2 files changed, 56 insertions(+) diff --git a/include/linux/cpuhplock.h b/include/linux/cpuhplock.h index f7aa20f62b87..b42b81361abc 100644 --- a/include/linux/cpuhplock.h +++ b/include/linux/cpuhplock.h @@ -9,7 +9,9 @@ =20 #include #include +#include =20 +typedef int (*cpuhp_cb_t)(void *arg); struct device; =20 extern int lockdep_is_cpus_held(void); @@ -28,6 +30,8 @@ void clear_tasks_mm_cpumask(int cpu); int remove_cpu(unsigned int cpu); int cpu_device_down(struct device *dev); void smp_shutdown_nonboot_cpus(unsigned int primary_cpu); +int cpuhp_offline_cb(struct cpumask *mask, cpuhp_cb_t func, void *arg); +extern bool cpuhp_offline_cb_mode; =20 #else /* CONFIG_HOTPLUG_CPU */ =20 @@ -42,6 +46,11 @@ static inline void cpu_hotplug_disable(void) { } static inline void cpu_hotplug_enable(void) { } static inline int remove_cpu(unsigned int cpu) { return -EPERM; } static inline void smp_shutdown_nonboot_cpus(unsigned int primary_cpu) { } +static inline int cpuhp_offline_cb(struct cpumask *mask, cpuhp_cb_t func, = void *arg) +{ + return -EPERM; +} +#define cpuhp_offline_cb_mode false #endif /* !CONFIG_HOTPLUG_CPU */ =20 DEFINE_LOCK_GUARD_0(cpus_read_lock, cpus_read_lock(), cpus_read_unlock()) diff --git a/kernel/cpu.c b/kernel/cpu.c index faf0f23fc5d8..b6364a1950b1 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -1534,6 +1534,53 @@ int remove_cpu(unsigned int cpu) } EXPORT_SYMBOL_GPL(remove_cpu); =20 +bool cpuhp_offline_cb_mode; + +/** + * cpuhp_offline_cb - offline CPUs, invoke callback function & online CPUs= afterward + * @mask: A mask of CPUs to be taken offline and then online + * @func: A callback function to be invoked while the given CPUs are offli= ne + * @arg: Argument to be passed back to the callback function + * Return: 0 if successful, an error code otherwise + */ +int cpuhp_offline_cb(struct cpumask *mask, cpuhp_cb_t func, void 
*arg) +{ + int cpu, ret, ret2 =3D 0; + + if (WARN_ON_ONCE(cpumask_empty(mask))) + return -EINVAL; + + lock_device_hotplug(); + cpuhp_offline_cb_mode =3D true; + for_each_cpu(cpu, mask) { + ret =3D device_offline(get_cpu_device(cpu)); + if (unlikely(ret)) { + int cpu2; + + /* Online the offline CPUs before returning */ + for_each_cpu(cpu2, mask) { + if (cpu2 =3D=3D cpu) + break; + device_online(get_cpu_device(cpu2)); + } + goto out; + } + } + ret =3D func(arg); + + /* Bring CPUs back online */ + for_each_cpu(cpu, mask) { + int ret3 =3D device_online(get_cpu_device(cpu)); + + if (ret3 && !ret2) + ret2 =3D ret3; + } +out: + cpuhp_offline_cb_mode =3D false; + unlock_device_hotplug(); + return ret ? ret : (ret2 ? ret2 : 0); +} + void smp_shutdown_nonboot_cpus(unsigned int primary_cpu) { unsigned int cpu; --=20 2.50.0 From nobody Sun Dec 14 20:26:39 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8FA64283FFA for ; Fri, 8 Aug 2025 15:12:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754665959; cv=none; b=R4YQvXI0yiHWdTvPATPKnBD87PrM/fuWe8T1ypYadnQ+DRYzfQcUdaD3Fq/QWNQ2zTf8y/utnOESXX8RgPH40aE1dCD5uKKVm6JkyrpHE9RfKuquNQ7Nyd/yhhNjGSXiy1uCJ7b1KOwdCv/9jxKEy84vSsUZymhnLUuPhzh/KmM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754665959; c=relaxed/simple; bh=bKGzmCs04LfQYlOkM2mmW14HoIY6BlJGh+5yZVS1gqk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ZF4hNDuP646YvPCDYyFSL9zLjdWx9boknDth456tHr7D7o3ysDk53GOq7qwWuBPjMqxi4cs7vKglyynnedlsNeEWuKpux1VYAuV0DND9hZf5SwdgPCOFyj8TF1XNda3OCNJpVaa+qQewyhS/4LGVutMSyjdSTnFxQQ0Kgo89vDo= ARC-Authentication-Results: i=1; 
smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=fk5V10Oi; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="fk5V10Oi" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1754665956; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=azQU8QVYfgv/z/JeThAegMECgITB7qX+pmu/A49K2aI=; b=fk5V10Oi+413hPZWF6QQ/oKPum3ToNn+sX8dvfRuQndn6Ro/uVpnKvR2sItuv5t2MoRcq1 CTrkuMDRYfOGFadHWblXVhKidphNsjFhVP9rMr5N7DqoDB7orC3L/jWF0kcsmgEN5QsXmG ve/vfz0c7/6XPeuXaUGQM3Qy5eomVXQ= Received: from mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-216-DY2KxiU2MFKqn7LUev5NUA-1; Fri, 08 Aug 2025 11:12:32 -0400 X-MC-Unique: DY2KxiU2MFKqn7LUev5NUA-1 X-Mimecast-MFC-AGG-ID: DY2KxiU2MFKqn7LUev5NUA_1754665949 Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id DED271956048; Fri, 8 Aug 2025 15:12:28 +0000 (UTC) Received: from 
llong-thinkpadp16vgen1.westford.csb (unknown [10.22.65.37]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 5EEF91954196; Fri, 8 Aug 2025 15:12:22 +0000 (UTC) From: Waiman Long To: Tejun Heo , Johannes Weiner , =?UTF-8?q?Michal=20Koutn=C3=BD?= , Jonathan Corbet , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Uladzislau Rezki , Steven Rostedt , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Anna-Maria Behnsen , Ingo Molnar , Thomas Gleixner , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Ben Segall , Mel Gorman , Valentin Schneider , Shuah Khan Cc: cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org, Phil Auld , Costa Shulyupin , Gabriele Monaco , Cestmir Kalina , Waiman Long Subject: [RFC PATCH 06/18] cgroup/cpuset: Introduce a new top level isolcpus_update_mutex Date: Fri, 8 Aug 2025 11:10:50 -0400 Message-ID: <20250808151053.19777-7-longman@redhat.com> In-Reply-To: <20250808151053.19777-1-longman@redhat.com> References: <20250808151053.19777-1-longman@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 Content-Type: text/plain; charset="utf-8" The current cpuset partition code is able to dynamically update the sched domains of a running system to perform what is essentially the "isolcpus=3Ddomain,..." boot command line feature at run time. To enable runtime modification of nohz_full, we will have to make use of the CPU hotplug functionality to facilitate the proper addition or subtraction of nohz_full CPUs. In other words, we can't hold the cpu_hotplug_lock while doing so.
Given the current lock ordering, we will need to introduce a new top level mutex to ensure proper mutual exclusion in case there is a need to update the cpuset states that may require the use of CPU hotplug. This patch introduces a new top level isolcpus_update_mutex for such purpose. This new mutex will be acquired in case the cpuset partition states or the set of isolated CPUs may have to be changed. The update_unbound_workqueue_cpumask() is now renamed to update_isolation_cpumasks() and moved outside of cpu_hotplug_lock critical regions to enable its future extension to invoke CPU hotplug. A new global isolcpus_update_state structure is added to track if update_isolation_cpumasks() will need to be invoked. So the existing partition_xcpus_add/del() functions and their callers can now be simplified. Signed-off-by: Waiman Long --- kernel/cgroup/cpuset.c | 149 ++++++++++++++++++++++++----------------- 1 file changed, 86 insertions(+), 63 deletions(-) diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 27adb04df675..2190efd33efb 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -215,29 +215,39 @@ static struct cpuset top_cpuset =3D { }; =20 /* - * There are two global locks guarding cpuset structures - cpuset_mutex and - * callback_lock. The cpuset code uses only cpuset_mutex. Other kernel - * subsystems can use cpuset_lock()/cpuset_unlock() to prevent change to c= puset - * structures. Note that cpuset_mutex needs to be a mutex as it is used in - * paths that rely on priority inheritance (e.g. scheduler - on RT) for - * correctness. + * CPUSET Locking Convention + * ------------------------- * - * A task must hold both locks to modify cpusets. If a task holds - * cpuset_mutex, it blocks others, ensuring that it is the only task able = to - * also acquire callback_lock and be able to modify cpusets. It can perfo= rm - * various checks on the cpuset structure first, knowing nothing will chan= ge. 
- * It can also allocate memory while just holding cpuset_mutex. While it = is - * performing these checks, various callback routines can briefly acquire - * callback_lock to query cpusets. Once it is ready to make the changes, = it - * takes callback_lock, blocking everyone else. + * Below are the four global locks guarding cpuset structures in lock + * acquisition order: + * - isolcpus_update_mutex + * - cpu_hotplug_lock (cpus_read_lock/cpus_write_lock) + * - cpuset_mutex + * - callback_lock (raw spinlock) * - * Calls to the kernel memory allocator can not be made while holding - * callback_lock, as that would risk double tripping on callback_lock - * from one of the callbacks into the cpuset code from within - * __alloc_pages(). + * The first isolcpus_update_mutex should only be held if the existing set= of + * isolated CPUs (in isolated partition) or any of the partition states ma= y be + * changed. Otherwise, it can be skipped. This is used to prevent concurre= nt + * updates to the set of isolated CPUs. * - * If a task is only holding callback_lock, then it has read-only - * access to cpusets. + * A task must hold all the remaining three locks to modify externally vis= ible + * or used fields of cpusets, though some of the internally used cpuset fi= elds + * can be modified by holding cpu_hotplug_lock and cpuset_mutex only. If o= nly + * reliable read access of the externally used fields is needed, a task c= an + * hold either cpuset_mutex or callback_lock. + * + * If a task holds cpu_hotplug_lock and cpuset_mutex, it blocks others, + * ensuring that it is the only task able to also acquire callback_lock and + * be able to modify cpusets. It can perform various checks on the cpuset + * structure first, knowing nothing will change. It can also allocate memo= ry + * without holding callback_lock. While it is performing these checks, var= ious + * callback routines can briefly acquire callback_lock to query cpusets. 
= Once + * it is ready to make the changes, it takes callback_lock, blocking every= one + * else. + * + * Calls to the kernel memory allocator cannot be made while holding + * callback_lock which is a spinlock, as the memory allocator may sleep or + * call back into cpuset code and acquire callback_lock. * * Now, the task_struct fields mems_allowed and mempolicy may be changed * by other task, we use alloc_lock in the task_struct fields to protect @@ -248,6 +258,7 @@ static struct cpuset top_cpuset =3D { * cpumasks and nodemasks. */ =20 +static DEFINE_MUTEX(isolcpus_update_mutex); static DEFINE_MUTEX(cpuset_mutex); =20 void cpuset_lock(void) @@ -272,6 +283,17 @@ void cpuset_callback_unlock_irq(void) spin_unlock_irq(&callback_lock); } =20 +/* + * Isolcpus update state (protected by isolcpus_update_mutex mutex) + * + * It contains data related to updating the isolated CPUs configuration in + * isolated partitions. + */ +static struct { + bool updating; /* Isolcpus updating in progress */ + cpumask_var_t cpus; /* CPUs to be updated */ +} isolcpus_update_state; + static struct workqueue_struct *cpuset_migrate_mm_wq; =20 static DECLARE_WAIT_QUEUE_HEAD(cpuset_attach_wq); @@ -1273,6 +1295,9 @@ static void isolated_cpus_update(int old_prs, int new= _prs, struct cpumask *xcpus cpumask_or(isolated_cpus, isolated_cpus, xcpus); else cpumask_andnot(isolated_cpus, isolated_cpus, xcpus); + + isolcpus_update_state.updating =3D true; + cpumask_or(isolcpus_update_state.cpus, isolcpus_update_state.cpus, xcpus); } =20 /* @@ -1280,31 +1305,26 @@ static void isolated_cpus_update(int old_prs, int n= ew_prs, struct cpumask *xcpus * @new_prs: new partition_root_state * @parent: parent cpuset * @xcpus: exclusive CPUs to be added - * Return: true if isolated_cpus modified, false otherwise * * Remote partition if parent =3D=3D NULL */ -static bool partition_xcpus_add(int new_prs, struct cpuset *parent, +static void partition_xcpus_add(int new_prs, struct cpuset *parent, struct cpumask *xcpus) 
{ - bool isolcpus_updated; - WARN_ON_ONCE(new_prs < 0); lockdep_assert_held(&callback_lock); if (!parent) parent =3D &top_cpuset; =20 - if (parent =3D=3D &top_cpuset) cpumask_or(subpartitions_cpus, subpartitions_cpus, xcpus); =20 - isolcpus_updated =3D (new_prs !=3D parent->partition_root_state); - if (isolcpus_updated) + if (new_prs !=3D parent->partition_root_state) isolated_cpus_update(parent->partition_root_state, new_prs, xcpus); =20 cpumask_andnot(parent->effective_cpus, parent->effective_cpus, xcpus); - return isolcpus_updated; + return; } =20 /* @@ -1312,15 +1332,12 @@ static bool partition_xcpus_add(int new_prs, struct= cpuset *parent, * @old_prs: old partition_root_state * @parent: parent cpuset * @xcpus: exclusive CPUs to be removed - * Return: true if isolated_cpus modified, false otherwise * * Remote partition if parent =3D=3D NULL */ -static bool partition_xcpus_del(int old_prs, struct cpuset *parent, +static void partition_xcpus_del(int old_prs, struct cpuset *parent, struct cpumask *xcpus) { - bool isolcpus_updated; - WARN_ON_ONCE(old_prs < 0); lockdep_assert_held(&callback_lock); if (!parent) @@ -1329,27 +1346,33 @@ static bool partition_xcpus_del(int old_prs, struct= cpuset *parent, if (parent =3D=3D &top_cpuset) cpumask_andnot(subpartitions_cpus, subpartitions_cpus, xcpus); =20 - isolcpus_updated =3D (old_prs !=3D parent->partition_root_state); - if (isolcpus_updated) + if (old_prs !=3D parent->partition_root_state) isolated_cpus_update(old_prs, parent->partition_root_state, xcpus); =20 cpumask_and(xcpus, xcpus, cpu_active_mask); cpumask_or(parent->effective_cpus, parent->effective_cpus, xcpus); - return isolcpus_updated; + return; } =20 -static void update_unbound_workqueue_cpumask(bool isolcpus_updated) +/** + * update_isolation_cpumasks - Update external isolation CPU masks + * + * The following external CPU masks will be updated if necessary: + * - workqueue unbound cpumask + */ +static void update_isolation_cpumasks(void) { int ret; =20 - 
lockdep_assert_cpus_held(); - - if (!isolcpus_updated) + if (!isolcpus_update_state.updating) return; =20 ret =3D workqueue_unbound_exclude_cpumask(isolated_cpus); WARN_ON_ONCE(ret < 0); + + cpumask_clear(isolcpus_update_state.cpus); + isolcpus_update_state.updating =3D false; } =20 /** @@ -1441,8 +1464,6 @@ static inline bool is_local_partition(struct cpuset *= cs) static int remote_partition_enable(struct cpuset *cs, int new_prs, struct tmpmasks *tmp) { - bool isolcpus_updated; - /* * The user must have sysadmin privilege. */ @@ -1466,11 +1487,10 @@ static int remote_partition_enable(struct cpuset *c= s, int new_prs, return PERR_INVCPUS; =20 spin_lock_irq(&callback_lock); - isolcpus_updated =3D partition_xcpus_add(new_prs, NULL, tmp->new_cpus); + partition_xcpus_add(new_prs, NULL, tmp->new_cpus); list_add(&cs->remote_sibling, &remote_children); cpumask_copy(cs->effective_xcpus, tmp->new_cpus); spin_unlock_irq(&callback_lock); - update_unbound_workqueue_cpumask(isolcpus_updated); cpuset_force_rebuild(); cs->prs_err =3D 0; =20 @@ -1493,15 +1513,12 @@ static int remote_partition_enable(struct cpuset *c= s, int new_prs, */ static void remote_partition_disable(struct cpuset *cs, struct tmpmasks *t= mp) { - bool isolcpus_updated; - WARN_ON_ONCE(!is_remote_partition(cs)); WARN_ON_ONCE(!cpumask_subset(cs->effective_xcpus, subpartitions_cpus)); =20 spin_lock_irq(&callback_lock); list_del_init(&cs->remote_sibling); - isolcpus_updated =3D partition_xcpus_del(cs->partition_root_state, - NULL, cs->effective_xcpus); + partition_xcpus_del(cs->partition_root_state, NULL, cs->effective_xcpus); if (cs->prs_err) cs->partition_root_state =3D -cs->partition_root_state; else @@ -1511,7 +1528,6 @@ static void remote_partition_disable(struct cpuset *c= s, struct tmpmasks *tmp) compute_effective_exclusive_cpumask(cs, NULL, NULL); reset_partition_data(cs); spin_unlock_irq(&callback_lock); - update_unbound_workqueue_cpumask(isolcpus_updated); cpuset_force_rebuild(); =20 /* @@ -1536,7 
+1552,6 @@ static void remote_cpus_update(struct cpuset *cs, str= uct cpumask *xcpus, { bool adding, deleting; int prs =3D cs->partition_root_state; - int isolcpus_updated =3D 0; =20 if (WARN_ON_ONCE(!is_remote_partition(cs))) return; @@ -1569,9 +1584,9 @@ static void remote_cpus_update(struct cpuset *cs, str= uct cpumask *xcpus, =20 spin_lock_irq(&callback_lock); if (adding) - isolcpus_updated +=3D partition_xcpus_add(prs, NULL, tmp->addmask); + partition_xcpus_add(prs, NULL, tmp->addmask); if (deleting) - isolcpus_updated +=3D partition_xcpus_del(prs, NULL, tmp->delmask); + partition_xcpus_del(prs, NULL, tmp->delmask); /* * Need to update effective_xcpus and exclusive_cpus now as * update_sibling_cpumasks() below may iterate back to the same cs. @@ -1580,7 +1595,6 @@ static void remote_cpus_update(struct cpuset *cs, str= uct cpumask *xcpus, if (xcpus) cpumask_copy(cs->exclusive_cpus, xcpus); spin_unlock_irq(&callback_lock); - update_unbound_workqueue_cpumask(isolcpus_updated); if (adding || deleting) cpuset_force_rebuild(); =20 @@ -1662,7 +1676,6 @@ static int update_parent_effective_cpumask(struct cpu= set *cs, int cmd, int old_prs, new_prs; int part_error =3D PERR_NONE; /* Partition error? */ int subparts_delta =3D 0; - int isolcpus_updated =3D 0; struct cpumask *xcpus =3D user_xcpus(cs); bool nocpu; =20 @@ -1932,18 +1945,15 @@ static int update_parent_effective_cpumask(struct c= puset *cs, int cmd, * and vice versa. 
*/ if (adding) - isolcpus_updated +=3D partition_xcpus_del(old_prs, parent, - tmp->addmask); + partition_xcpus_del(old_prs, parent, tmp->addmask); if (deleting) - isolcpus_updated +=3D partition_xcpus_add(new_prs, parent, - tmp->delmask); + partition_xcpus_add(new_prs, parent, tmp->delmask); =20 if (is_partition_valid(parent)) { parent->nr_subparts +=3D subparts_delta; WARN_ON_ONCE(parent->nr_subparts < 0); } spin_unlock_irq(&callback_lock); - update_unbound_workqueue_cpumask(isolcpus_updated); =20 if ((old_prs !=3D new_prs) && (cmd =3D=3D partcmd_update)) update_partition_exclusive_flag(cs, new_prs); @@ -2968,7 +2978,6 @@ static int update_prstate(struct cpuset *cs, int new_= prs) else if (isolcpus_updated) isolated_cpus_update(old_prs, new_prs, cs->effective_xcpus); spin_unlock_irq(&callback_lock); - update_unbound_workqueue_cpumask(isolcpus_updated); =20 /* Force update if switching back to member & update effective_xcpus */ update_cpumasks_hier(cs, &tmpmask, !new_prs); @@ -3224,6 +3233,7 @@ ssize_t cpuset_write_resmask(struct kernfs_open_file = *of, int retval =3D -ENODEV; =20 buf =3D strstrip(buf); + mutex_lock(&isolcpus_update_mutex); cpus_read_lock(); mutex_lock(&cpuset_mutex); if (!is_cpuset_online(cs)) @@ -3256,6 +3266,8 @@ ssize_t cpuset_write_resmask(struct kernfs_open_file = *of, out_unlock: mutex_unlock(&cpuset_mutex); cpus_read_unlock(); + update_isolation_cpumasks(); + mutex_unlock(&isolcpus_update_mutex); flush_workqueue(cpuset_migrate_mm_wq); return retval ?: nbytes; } @@ -3358,12 +3370,15 @@ static ssize_t cpuset_partition_write(struct kernfs= _open_file *of, char *buf, else return -EINVAL; =20 + mutex_lock(&isolcpus_update_mutex); cpus_read_lock(); mutex_lock(&cpuset_mutex); if (is_cpuset_online(cs)) retval =3D update_prstate(cs, val); mutex_unlock(&cpuset_mutex); cpus_read_unlock(); + update_isolation_cpumasks(); + mutex_unlock(&isolcpus_update_mutex); return retval ?: nbytes; } =20 @@ -3586,15 +3601,22 @@ static void cpuset_css_killed(struct 
cgroup_subsys_= state *css) { struct cpuset *cs =3D css_cs(css); =20 + mutex_lock(&isolcpus_update_mutex); + /* + * Here the partition root state can't be changed by user again. + */ + if (!is_partition_valid(cs)) + goto out; + cpus_read_lock(); mutex_lock(&cpuset_mutex); - /* Reset valid partition back to member */ - if (is_partition_valid(cs)) - update_prstate(cs, PRS_MEMBER); - + update_prstate(cs, PRS_MEMBER); mutex_unlock(&cpuset_mutex); cpus_read_unlock(); + update_isolation_cpumasks(); +out: + mutex_unlock(&isolcpus_update_mutex); =20 } =20 @@ -3751,6 +3773,7 @@ int __init cpuset_init(void) BUG_ON(!alloc_cpumask_var(&top_cpuset.exclusive_cpus, GFP_KERNEL)); BUG_ON(!zalloc_cpumask_var(&subpartitions_cpus, GFP_KERNEL)); BUG_ON(!zalloc_cpumask_var(&isolated_cpus, GFP_KERNEL)); + BUG_ON(!zalloc_cpumask_var(&isolcpus_update_state.cpus, GFP_KERNEL)); =20 cpumask_setall(top_cpuset.cpus_allowed); nodes_setall(top_cpuset.mems_allowed); --=20 2.50.0 From nobody Sun Dec 14 20:26:39 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 74133280332 for ; Fri, 8 Aug 2025 15:12:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754665965; cv=none; b=cVIJ3D2DrO8kwIzDds2weTVqrFApR9ZpB0LDGpRVGPZlUAPSn2U8qotFYM1Vc3zHDmSOlCWPOtrICXvvWEXt+f+jOlA6q8st7MjwZRLhMQOBB8BgQh1NmiWqe1aLkkVQyyX3iRJioFv7zj+xWoh2y13wrm5Tg/NmcnUV1VTABeY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754665965; c=relaxed/simple; bh=WxcyJCO0fWF/itVzNgLAUOJpV3YrZhfDRj22jYzwLrY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=Y0Q/hp0itrr8MDvY3M7CGTkNBzvJGY+bXeq40s3LYuuCO32BrS04FK+5oe8gy6VReuzDDFoNPht/S0XhZNQE6ywRoJ3GOdXLISnMHcraGt0r11qGqzAhA3+vCNO/7zbrfZZ2hDb49gqZBrcLe7HlEiPuDcl6rFw3W9lw1sy/Nyk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=iu5/+vun; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="iu5/+vun" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1754665962; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=TXhSdoNCpYJCnFLmB+Mv+/bGCPmxw1lkTCoajFKNs5U=; b=iu5/+vunDfT2UqEvpJOFgP/riOtnHjhV0t5jBfLGl8Y6gFgB/dxvrzTcB6pJEdKPwVzN87 lR5TW7g+d5rfr20KC17/wJ743Q3kuN0wbTOjC3qDOHy3SPttLbvUTkWYNRuXGyF4BADyOR r0DoNIXV0UPFMBjttaCzb3jnfxTx8gY= Received: from mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-257-soRIi8mEPtCYPkmgpol81A-1; Fri, 08 Aug 2025 11:12:39 -0400 X-MC-Unique: soRIi8mEPtCYPkmgpol81A-1 X-Mimecast-MFC-AGG-ID: soRIi8mEPtCYPkmgpol81A_1754665955 Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) 
(No client certificate requested) by mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 60AE018004A7; Fri, 8 Aug 2025 15:12:35 +0000 (UTC) Received: from llong-thinkpadp16vgen1.westford.csb (unknown [10.22.65.37]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 3182A1954196; Fri, 8 Aug 2025 15:12:29 +0000 (UTC) From: Waiman Long To: Tejun Heo , Johannes Weiner , =?UTF-8?q?Michal=20Koutn=C3=BD?= , Jonathan Corbet , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Uladzislau Rezki , Steven Rostedt , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Anna-Maria Behnsen , Ingo Molnar , Thomas Gleixner , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Ben Segall , Mel Gorman , Valentin Schneider , Shuah Khan Cc: cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org, Phil Auld , Costa Shulyupin , Gabriele Monaco , Cestmir Kalina , Waiman Long Subject: [RFC PATCH 07/18] cgroup/cpuset: Allow overwriting HK_TYPE_DOMAIN housekeeping cpumask Date: Fri, 8 Aug 2025 11:10:51 -0400 Message-ID: <20250808151053.19777-8-longman@redhat.com> In-Reply-To: <20250808151053.19777-1-longman@redhat.com> References: <20250808151053.19777-1-longman@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 Content-Type: text/plain; charset="utf-8" As we did not previously modify housekeeping cpumasks when creating a cpuset partition, we had to prevent non-isolated partitions from using any of the HK_TYPE_DOMAIN isolated CPUs.
Now that we are going to modify housekeeping cpumasks at run time, we will allow overwriting of the HK_TYPE_DOMAIN cpumask when an isolated partition is first created or when the creation of a non-isolated partition conflicts with the boot time HK_TYPE_DOMAIN isolated CPUs. The now unnecessary checking code is removed. The doc file will be updated in a later patch. On the other hand, there is still a latency spike problem when CPU hotplug code is used to facilitate the proper functioning of the dynamically modified nohz_full HK_TYPE_KERNEL_NOISE cpumask. So the cpuset code will be modified to maintain the boot-time enabled nohz_full cpumask to avoid using CPU hotplug if all the newly isolated/non-isolated CPUs are already in that cpumask. This code will be removed in the future when the latency spike problem is solved. Signed-off-by: Waiman Long --- kernel/cgroup/cpuset.c | 45 ++++++++---------------------------------- 1 file changed, 8 insertions(+), 37 deletions(-) diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 2190efd33efb..87e9ee7922cd 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -59,7 +59,6 @@ static const char * const perr_strings[] =3D { [PERR_NOCPUS] =3D "Parent unable to distribute cpu downstream", [PERR_HOTPLUG] =3D "No cpu available due to hotplug", [PERR_CPUSEMPTY] =3D "cpuset.cpus and cpuset.cpus.exclusive are empty", - [PERR_HKEEPING] =3D "partition config conflicts with housekeeping setup", [PERR_ACCESS] =3D "Enable partition not permitted", [PERR_REMOTE] =3D "Have remote partition underneath", }; @@ -81,9 +80,10 @@ static cpumask_var_t subpartitions_cpus; static cpumask_var_t isolated_cpus; =20 /* - * Housekeeping (HK_TYPE_DOMAIN) CPUs at boot + * Housekeeping (nohz_full) CPUs at boot */ -static cpumask_var_t boot_hk_cpus; +static cpumask_var_t boot_nohz_full_hk_cpus; +static bool have_boot_nohz_full; static bool have_boot_isolcpus; =20 /* List of remote partition root children */ @@ -1609,26 +1609,6 @@ 
static void remote_cpus_update(struct cpuset *cs, st= ruct cpumask *xcpus, remote_partition_disable(cs, tmp); } =20 -/* - * prstate_housekeeping_conflict - check for partition & housekeeping conf= licts - * @prstate: partition root state to be checked - * @new_cpus: cpu mask - * Return: true if there is conflict, false otherwise - * - * CPUs outside of boot_hk_cpus, if defined, can only be used in an - * isolated partition. - */ -static bool prstate_housekeeping_conflict(int prstate, struct cpumask *new= _cpus) -{ - if (!have_boot_isolcpus) - return false; - - if ((prstate !=3D PRS_ISOLATED) && !cpumask_subset(new_cpus, boot_hk_cpus= )) - return true; - - return false; -} - /** * update_parent_effective_cpumask - update effective_cpus mask of parent = cpuset * @cs: The cpuset that requests change in partition root state @@ -1737,9 +1717,6 @@ static int update_parent_effective_cpumask(struct cpu= set *cs, int cmd, if (cpumask_empty(xcpus)) return PERR_INVCPUS; =20 - if (prstate_housekeeping_conflict(new_prs, xcpus)) - return PERR_HKEEPING; - /* * A parent can be left with no CPU as long as there is no * task directly associated with the parent partition. 
@@ -2356,9 +2333,6 @@ static int update_cpumask(struct cpuset *cs, struct c= puset *trialcs, cpumask_empty(trialcs->effective_xcpus)) { invalidate =3D true; cs->prs_err =3D PERR_INVCPUS; - } else if (prstate_housekeeping_conflict(old_prs, trialcs->effective_xcp= us)) { - invalidate =3D true; - cs->prs_err =3D PERR_HKEEPING; } else if (tasks_nocpu_error(parent, cs, trialcs->effective_xcpus)) { invalidate =3D true; cs->prs_err =3D PERR_NOCPUS; @@ -2499,9 +2473,6 @@ static int update_exclusive_cpumask(struct cpuset *cs= , struct cpuset *trialcs, if (cpumask_empty(trialcs->effective_xcpus)) { invalidate =3D true; cs->prs_err =3D PERR_INVCPUS; - } else if (prstate_housekeeping_conflict(old_prs, trialcs->effective_xcp= us)) { - invalidate =3D true; - cs->prs_err =3D PERR_HKEEPING; } else if (tasks_nocpu_error(parent, cs, trialcs->effective_xcpus)) { invalidate =3D true; cs->prs_err =3D PERR_NOCPUS; @@ -3787,11 +3758,11 @@ int __init cpuset_init(void) =20 BUG_ON(!alloc_cpumask_var(&cpus_attach, GFP_KERNEL)); =20 - have_boot_isolcpus =3D housekeeping_enabled(HK_TYPE_DOMAIN); - if (have_boot_isolcpus) { - BUG_ON(!alloc_cpumask_var(&boot_hk_cpus, GFP_KERNEL)); - cpumask_copy(boot_hk_cpus, housekeeping_cpumask(HK_TYPE_DOMAIN)); - cpumask_andnot(isolated_cpus, cpu_possible_mask, boot_hk_cpus); + have_boot_nohz_full =3D housekeeping_enabled(HK_TYPE_KERNEL_NOISE); + have_boot_isolcpus =3D housekeeping_enabled(HK_TYPE_DOMAIN); + if (have_boot_nohz_full) { + BUG_ON(!alloc_cpumask_var(&boot_nohz_full_hk_cpus, GFP_KERNEL)); + cpumask_copy(boot_nohz_full_hk_cpus, housekeeping_cpumask(HK_TYPE_KERNEL= _NOISE)); } =20 return 0; --=20 2.50.0 From nobody Sun Dec 14 20:26:39 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9A41C2836BF for ; Fri, 8 Aug 2025 15:12:59 +0000 (UTC) 
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754665984; cv=none; b=JbggSxmcR5AfXDoera4DIb4LwLcanBu9TvinekERkDdZEn+jtTLBVe5biwqOEKluHOH385KwWA+0G93SGb0ByR7726wbwUpc8Csdr8tFblKX93Tdh/wNyC6PysulPEwmidyAmh6ie1rCclTumIiYcmOuIDChVnSwqlX5ZxcQJN4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754665984; c=relaxed/simple; bh=cIpLk/5NGgJNNjuGY9UZLAULUVicJJ7g5IXjevBvTvM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=SYmixvb0pZPqoNd/v1kfVbGNPKBm4PyKm/Dj3NXYOUPkAovzwsvsmBhu9oUOpU7kHgtYLbnQXyultkNFSOA2wkME0JW5fy9rVpNdwlTr00Sq7uQb5V4kbGYL41zwCbu9O3Qka9+YmiYC3Ozsj/r/hmwMb2VxXPRhidLe6JnBI68= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=WuuOqxgf; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="WuuOqxgf" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1754665978; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=xcgR+/Mgcw3wYRw08/uxfDNmVczMzzLTybzSdrby6Ro=; b=WuuOqxgfRqF7pUD7U3p3jEp8vvMOio8tE9U2W8RDRpCLIcRJDIelsizXsCEOs7GA3Ba4Gz w//cDZBezeuGkmHEsohfSrBR+dUi3FYeJH0z8LshoIkNKlcPoSbvssjhDTTGonQpiiY2QK Ty8VomtnIZ2YSvd7d6RI/3PAnu8i9QQ= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com 
(ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-114-nwSTaxClN8ahNFuZwPS4cg-1; Fri, 08 Aug 2025 11:12:49 -0400 X-MC-Unique: nwSTaxClN8ahNFuZwPS4cg-1 X-Mimecast-MFC-AGG-ID: nwSTaxClN8ahNFuZwPS4cg_1754665962 Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id E3F8F180029A; Fri, 8 Aug 2025 15:12:41 +0000 (UTC) Received: from llong-thinkpadp16vgen1.westford.csb (unknown [10.22.65.37]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id A6212195419C; Fri, 8 Aug 2025 15:12:35 +0000 (UTC) From: Waiman Long To: Tejun Heo , Johannes Weiner , =?UTF-8?q?Michal=20Koutn=C3=BD?= , Jonathan Corbet , Frederic Weisbecker , "Paul E. 
McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Uladzislau Rezki , Steven Rostedt , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Anna-Maria Behnsen , Ingo Molnar , Thomas Gleixner , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Ben Segall , Mel Gorman , Valentin Schneider , Shuah Khan Cc: cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org, Phil Auld , Costa Shulyupin , Gabriele Monaco , Cestmir Kalina , Waiman Long Subject: [RFC PATCH 08/18] cgroup/cpuset: Use CPU hotplug to enable runtime nohz_full modification Date: Fri, 8 Aug 2025 11:10:52 -0400 Message-ID: <20250808151053.19777-9-longman@redhat.com> In-Reply-To: <20250808151053.19777-1-longman@redhat.com> References: <20250808151053.19777-1-longman@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 Content-Type: text/plain; charset="utf-8" One relatively simple way to allow runtime modification of nohz_full and rcu_nocbs CPUs is to use CPU hotplug to bring the affected CPUs offline first, make changes to the housekeeping cpumasks and then bring them back online. However, doing this will be rather costly in terms of the number of CPU cycles needed. Still, it is the easiest way to achieve the desired result and hopefully we can gradually reduce the overhead over time. Use the newly introduced cpuhp_offline_cb() API to bring the affected CPUs offline, make the necessary housekeeping cpumask changes and then bring those CPUs back online again. As the HK_TYPE_DOMAIN cpumask is going to be updated at run time, we are going to reset any boot time isolcpus domain setting if an isolated partition or a conflicting non-isolated partition is going to be created. 
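For illustration only (this is not part of the patch), the offline/update/online sequence described above can be modeled in user space roughly as follows. The struct and function names are hypothetical, each cpumask is reduced to a single 64-bit word, and the -EPERM case only mirrors the boot CPU restriction that this patch warns about; none of it is kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical user-space model: one bit per CPU, at most 64 CPUs. */
struct cpu_model {
	uint64_t possible;     /* all CPUs that may ever come online */
	uint64_t online;       /* CPUs currently online */
	uint64_t housekeeping; /* stands in for one housekeeping cpumask */
	unsigned int boot_cpu; /* CPU that cannot be taken offline */
};

/*
 * Model of the sequence in the text: take the online CPUs in @affected
 * offline, exclude @isolated from the housekeeping mask, then bring the
 * same CPUs back online.
 *
 * Returns 0 on success, or -EPERM (leaving the model untouched) if the
 * boot CPU would have to be taken offline.
 */
static int model_update_isolation(struct cpu_model *m, uint64_t isolated,
				  uint64_t affected)
{
	uint64_t to_offline = affected & m->online;

	assert(m->boot_cpu < 64);
	if (to_offline & (UINT64_C(1) << m->boot_cpu))
		return -EPERM;

	m->online &= ~to_offline;                  /* step 1: offline */
	m->housekeeping = m->possible & ~isolated; /* step 2: update mask */
	m->online |= to_offline;                   /* step 3: back online */
	return 0;
}
```

With eight CPUs and CPU 0 as the boot CPU, isolating CPUs 2-3 leaves a housekeeping mask of 0xf3 with all CPUs online again, while a request that includes CPU 0 fails with -EPERM and changes nothing.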
Since rebuild_sched_domains() will be called at the end of update_isolation_cpumasks(), earlier rebuild_sched_domains_locked() calls will be suppressed to avoid unneeded work. Signed-off-by: Waiman Long --- kernel/cgroup/cpuset.c | 95 ++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 92 insertions(+), 3 deletions(-) diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 87e9ee7922cd..60f336e50b05 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -1355,11 +1355,57 @@ static void partition_xcpus_del(int old_prs, struct= cpuset *parent, return; } =20 +/* + * We are only updating HK_TYPE_DOMAIN and HK_TYPE_KERNEL_NOISE housekeepi= ng + * cpumask for now. HK_TYPE_MANAGED_IRQ will be handled later. + */ +static int do_housekeeping_exclude_cpumask(void *arg __maybe_unused) +{ + int ret; + struct cpumask *icpus =3D isolated_cpus; + unsigned long flags =3D BIT(HK_TYPE_DOMAIN) | BIT(HK_TYPE_KERNEL_NOISE); + + /* + * The boot time isolcpus setting will be overwritten if set. + */ + have_boot_isolcpus =3D false; + + if (have_boot_nohz_full) { + /* + * Need to separate the handling of HK_TYPE_KERNEL_NOISE and + * HK_TYPE_DOMAIN as different cpumasks will be used for each. + */ + ret =3D housekeeping_exclude_cpumask(icpus, BIT(HK_TYPE_DOMAIN)); + WARN_ON_ONCE((ret < 0) && (ret !=3D -EOPNOTSUPP)); + + if (cpumask_empty(isolcpus_update_state.cpus)) + return ret; + flags =3D BIT(HK_TYPE_KERNEL_NOISE); + icpus =3D kmalloc(cpumask_size(), GFP_KERNEL); + if (WARN_ON_ONCE(!icpus)) + return -ENOMEM; + + /* + * Add boot time nohz_full CPUs into the isolated CPUs list + * for exclusion from HK_TYPE_KERNEL_NOISE CPUs. 
+ */ + cpumask_andnot(icpus, cpu_possible_mask, boot_nohz_full_hk_cpus); + cpumask_or(icpus, icpus, isolated_cpus); + } + ret =3D housekeeping_exclude_cpumask(icpus, flags); + WARN_ON_ONCE((ret < 0) && (ret !=3D -EOPNOTSUPP)); + + if (icpus !=3D isolated_cpus) + kfree(icpus); + return ret; +} + /** * update_isolation_cpumasks - Update external isolation CPU masks * * The following external CPU masks will be updated if necessary: * - workqueue unbound cpumask + * - housekeeping cpumasks */ static void update_isolation_cpumasks(void) { @@ -1371,7 +1417,41 @@ static void update_isolation_cpumasks(void) ret =3D workqueue_unbound_exclude_cpumask(isolated_cpus); WARN_ON_ONCE(ret < 0); =20 + /* + * Mask out offline and boot-time nohz_full non-housekeeping + * CPUs from isolcpus_update_state.cpus to compute the set + * of CPUs that need to be brought offline before calling + * do_housekeeping_exclude_cpumask(). + */ + cpumask_and(isolcpus_update_state.cpus, + isolcpus_update_state.cpus, cpu_active_mask); + if (have_boot_nohz_full) + cpumask_and(isolcpus_update_state.cpus, + isolcpus_update_state.cpus, boot_nohz_full_hk_cpus); + + /* + * Without any change in the set of nohz_full CPUs, we don't really + * need to use CPU hotplug for making change in HK cpumasks. + */ + if (cpumask_empty(isolcpus_update_state.cpus)) + ret =3D do_housekeeping_exclude_cpumask(NULL); + else + ret =3D cpuhp_offline_cb(isolcpus_update_state.cpus, + do_housekeeping_exclude_cpumask, NULL); + /* + * A errno value of -EPERM may be returned from cpuhp_offline_cb() if + * any one of the CPUs in isolcpus_update_state.cpus can't be brought + * offline. This can happen for the boot CPU (normally CPU 0) which + * cannot be shut down. This CPU should not be used for creating + * isolated partition. 
+ */ + if (ret =3D=3D -EPERM) + pr_warn_once("cpuset: The boot CPU shouldn't be used for isolated partit= ion\n"); + else + WARN_ON_ONCE(ret < 0); + cpumask_clear(isolcpus_update_state.cpus); + rebuild_sched_domains(); isolcpus_update_state.updating =3D false; } =20 @@ -2961,7 +3041,16 @@ static int update_prstate(struct cpuset *cs, int new= _prs) update_partition_sd_lb(cs, old_prs); =20 notify_partition_change(cs, old_prs); - if (force_sd_rebuild) + + /* + * If boot time domain isolcpus exists and it conflicts with the CPUs + * in the new partition, we will have to reset HK_TYPE_DOMAIN cpumask. + */ + if (have_boot_isolcpus && (new_prs > PRS_MEMBER) && + !cpumask_subset(cs->effective_xcpus, housekeeping_cpumask(HK_TYPE_DOM= AIN))) + isolcpus_update_state.updating =3D true; + + if (force_sd_rebuild && !isolcpus_update_state.updating) rebuild_sched_domains_locked(); free_cpumasks(NULL, &tmpmask); return 0; @@ -3232,7 +3321,7 @@ ssize_t cpuset_write_resmask(struct kernfs_open_file = *of, } =20 free_cpuset(trialcs); - if (force_sd_rebuild) + if (force_sd_rebuild && !isolcpus_update_state.updating) rebuild_sched_domains_locked(); out_unlock: mutex_unlock(&cpuset_mutex); @@ -3999,7 +4088,7 @@ static void cpuset_handle_hotplug(void) } =20 /* rebuild sched domains if necessary */ - if (force_sd_rebuild) + if (force_sd_rebuild && !isolcpus_update_state.updating) rebuild_sched_domains_cpuslocked(); =20 free_cpumasks(NULL, ptmp); --=20 2.50.0 From nobody Sun Dec 14 20:26:39 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0DA19284B58 for ; Fri, 8 Aug 2025 15:13:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754665985; cv=none; 
b=ZPIgU2BwuvLKC5DB/xWrLEgGFxK7QrTKrLukTySvar1ouKcro1StCzgz5s5dh+HEMZqY0QVGckIHzCM4W5jgvWEh4rHg44sExDb8p+lLCfEXr3wnfWMs4mwT7qqEoIdWukY/7d126VpTUibOz2OfotoOEBpcsIVIme7TeuacZ+0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754665985; c=relaxed/simple; bh=2v0z2UD/Jy6qrCjDearYXZy6ugISuWgSA/5PTA5JQTk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=STZ+OHZkSwDsithMUBJ6kmSNdbchEKSRpAzynXGJ/TRA3tUEzRI7O8+DvlBVdf+MFSLBK1fuMBXazKFXW5N57fn0u8qrzL4TjKtobl6wLBwpYapkO/OInSQSWBAaVKMVvyAA3i4PMC2iaUbeJFJGCdSi+1t6J9FxrDCxvoP96KM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=Rn6Nt5RD; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="Rn6Nt5RD" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1754665983; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=JL1VyN/kLh867ZyHGhu4edVO+k2wi+9swB6ARbL5mBI=; b=Rn6Nt5RDdFWnueoygX0PQKEtUKaQ2k5wkh5GZGiatRmF2bmhsmtox6h8zRJaODO3amvG+W Ay3QkjM3uMDC3YvehsAgNzzRn/s5MVjNx7xtEgV98Omw71E2yuDNGpFJnmsWeh+SsGAbZu O+kCBxCLIvZ8SXu5Ps2/fTjNkOeD1MU= Received: from mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-75-NmJOjs3_OsCCQTGZzcX6RQ-1; 
Fri, 08 Aug 2025 11:12:59 -0400 X-MC-Unique: NmJOjs3_OsCCQTGZzcX6RQ-1 X-Mimecast-MFC-AGG-ID: NmJOjs3_OsCCQTGZzcX6RQ_1754665975 Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 238531944D25; Fri, 8 Aug 2025 15:12:53 +0000 (UTC) Received: from llong-thinkpadp16vgen1.westford.csb (unknown [10.22.65.37]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 3269F196FCAB; Fri, 8 Aug 2025 15:12:42 +0000 (UTC) From: Waiman Long To: Tejun Heo , Johannes Weiner , =?UTF-8?q?Michal=20Koutn=C3=BD?= , Jonathan Corbet , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Uladzislau Rezki , Steven Rostedt , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Anna-Maria Behnsen , Ingo Molnar , Thomas Gleixner , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Ben Segall , Mel Gorman , Valentin Schneider , Shuah Khan Cc: cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org, Phil Auld , Costa Shulyupin , Gabriele Monaco , Cestmir Kalina , Waiman Long Subject: [RFC PATCH 09/18] cgroup/cpuset: Revert "Include isolated cpuset CPUs in cpu_is_isolated() check" Date: Fri, 8 Aug 2025 11:10:53 -0400 Message-ID: <20250808151053.19777-10-longman@redhat.com> In-Reply-To: <20250808151053.19777-1-longman@redhat.com> References: <20250808151053.19777-1-longman@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 
Content-Type: text/plain; charset="utf-8" Now that the HK_TYPE_DOMAIN cpumask is updated at run time to reflect changes made in isolated cpuset partitions, we no longer need a separate cpuset_cpu_is_isolated() function for checking isolated CPUs generated by cpuset. Revert commit 3232e7aad11e ("cgroup/cpuset: Include isolated cpuset CPUs in cpu_is_isolated() check"). Signed-off-by: Waiman Long --- include/linux/cpuset.h | 6 ------ include/linux/sched/isolation.h | 3 +-- kernel/cgroup/cpuset.c | 11 ----------- 3 files changed, 1 insertion(+), 19 deletions(-) diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h index 2ddb256187b5..a2ea8efebf36 100644 --- a/include/linux/cpuset.h +++ b/include/linux/cpuset.h @@ -76,7 +76,6 @@ extern void cpuset_lock(void); extern void cpuset_unlock(void); extern void cpuset_cpus_allowed(struct task_struct *p, struct cpumask *mas= k); extern bool cpuset_cpus_allowed_fallback(struct task_struct *p); -extern bool cpuset_cpu_is_isolated(int cpu); extern nodemask_t cpuset_mems_allowed(struct task_struct *p); #define cpuset_current_mems_allowed (current->mems_allowed) void cpuset_init_current_mems_allowed(void); @@ -206,11 +205,6 @@ static inline bool cpuset_cpus_allowed_fallback(struct= task_struct *p) return false; } =20 -static inline bool cpuset_cpu_is_isolated(int cpu) -{ - return false; -} - static inline nodemask_t cpuset_mems_allowed(struct task_struct *p) { return node_possible_map; diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolatio= n.h index af38d21d0d00..0bc4b3368d39 100644 --- a/include/linux/sched/isolation.h +++ b/include/linux/sched/isolation.h @@ -79,8 +79,7 @@ static inline bool housekeeping_cpu(int cpu, enum hk_type= type) static inline bool cpu_is_isolated(int cpu) { return !housekeeping_test_cpu(cpu, HK_TYPE_DOMAIN) || - !housekeeping_test_cpu(cpu, HK_TYPE_TICK) || - cpuset_cpu_is_isolated(cpu); + !housekeeping_test_cpu(cpu, HK_TYPE_TICK); } =20 #endif /* _LINUX_SCHED_ISOLATION_H */ diff 
--git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 60f336e50b05..6308bb14e018 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -1455,17 +1455,6 @@ static void update_isolation_cpumasks(void) isolcpus_update_state.updating =3D false; } =20 -/** - * cpuset_cpu_is_isolated - Check if the given CPU is isolated - * @cpu: the CPU number to be checked - * Return: true if CPU is used in an isolated partition, false otherwise - */ -bool cpuset_cpu_is_isolated(int cpu) -{ - return cpumask_test_cpu(cpu, isolated_cpus); -} -EXPORT_SYMBOL_GPL(cpuset_cpu_is_isolated); - /* * compute_effective_exclusive_cpumask - compute effective exclusive CPUs * @cs: cpuset --=20 2.50.0 From nobody Sun Dec 14 20:26:39 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 45810283FE7 for ; Fri, 8 Aug 2025 15:20:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754666439; cv=none; b=evGktzru3rCMeRZKXrD9jaCfMMOrO9w1eme+hw2GtT4Y6zqIenawpO9TfUCgP9a0AsVqdaBx61vkwZkluawUH49uS+w6Dz1/WHV5ii1cGsgqbyTyZGLvPtBIcggWnVi7tMJuwEsBkw8C725G6GCFP/ul8wlGUrLmx6YoRe3buys= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754666439; c=relaxed/simple; bh=XH0Af+hJDBkCAuw5puHAq55b47h+GZUhJtnbmXxuG6g=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=dF76OMPBqnIktE6EsooRJWN5NI10VIbi+F3usJzg2vUdr8ed/dS83KLtNTwKYmLoIQ17xQ9SrON5sKRqeU4FfCEhDt4AsJBL43XuDPkrcgJr8rKLdND+bJeID8Yf/0GvnGz4SkrUAhfDA56nSZwCMJIcLRZfq5lVdUIrNgOQKMU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) 
header.d=redhat.com header.i=@redhat.com header.b=iUeydMlR; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="iUeydMlR" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1754666437; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=8WLvzJ5WiuMjAOAmYSD8iakpWVzO3E6xQqHPmy68LCM=; b=iUeydMlR2vN4JZe7V2sieyCsqfIt5O3kN4Cg7rLUUVbmJLUoAVxSx8GqyXmyEjPcOxj02L oTT5U0V4WjmFJkjRuCAflOsfimQ3g1NJ3QK3g8a43ewJlx+uShyUvkYRedeSMF8moXTsw+ AEgXCJ73psWV1WG7Z27P54H1viv0e6I= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-557-vUyb1KdwOIq-OmnUSqoLsA-1; Fri, 08 Aug 2025 11:20:31 -0400 X-MC-Unique: vUyb1KdwOIq-OmnUSqoLsA-1 X-Mimecast-MFC-AGG-ID: vUyb1KdwOIq-OmnUSqoLsA_1754666421 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 65E30180028C; Fri, 8 Aug 2025 15:20:21 +0000 (UTC) Received: from llong-thinkpadp16vgen1.westford.csb (unknown [10.22.65.37]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 5657C1800294; Fri, 8 Aug 2025 
15:20:14 +0000 (UTC) From: Waiman Long To: Tejun Heo , Johannes Weiner , =?UTF-8?q?Michal=20Koutn=C3=BD?= , Jonathan Corbet , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Uladzislau Rezki , Steven Rostedt , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Anna-Maria Behnsen , Ingo Molnar , Thomas Gleixner , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Ben Segall , Mel Gorman , Valentin Schneider , Shuah Khan Cc: cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org, Phil Auld , Costa Shulyupin , Gabriele Monaco , Cestmir Kalina , Waiman Long Subject: [RFC PATCH 10/18] sched/core: Ignore DL BW deactivation error if in cpuhp_offline_cb_mode Date: Fri, 8 Aug 2025 11:19:53 -0400 Message-ID: <20250808152001.20245-1-longman@redhat.com> In-Reply-To: <20250808151053.19777-1-longman@redhat.com> References: <20250808151053.19777-1-longman@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" With the new strategy of using CPU hotplug to improve CPU isolation and the optimization of delaying sched domain rebuild until the whole process completes, we can run into a problem in shutting down the last CPU of a partition and a -EBUSY error may be returned. This -EBUSY error is caused by failing the DL BW check in dl_bw_deactivate(). As the CPU deactivation is only temporary and it will be brought back up again in a short moment, there is no point in failing the operation because of this DL BW error in this transitioning period. Fix this problem by ignoring this error when in CPU hotplug offline callback mode (cpuhp_offline_cb_mode is on). 
Signed-off-by: Waiman Long --- kernel/sched/core.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 9f02c047e25b..78f4ba73a9f2 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -8469,7 +8469,11 @@ int sched_cpu_deactivate(unsigned int cpu) =20 ret =3D dl_bw_deactivate(cpu); =20 - if (ret) + /* + * Ignore DL BW error if in cpuhp offline callback mode as CPU + * deactivation is only temporary. + */ + if (ret && !cpuhp_offline_cb_mode) return ret; =20 /* --=20 2.50.0 From nobody Sun Dec 14 20:26:39 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0C38728467C for ; Fri, 8 Aug 2025 15:20:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754666444; cv=none; b=ElJHYUytNQW7zgPKnqZC/GB2H7GJ1pb7bwlvG0xURLvc6jkun4EJK8dqLdpyxd/69xOcGaENoYtFdDhACezAN9lZ2KlyPPJE29XLku3D4DvMShG5jd2rKpJ3piMvpF7v8xor20Mti45IpGTHxQAHM3ox18MT4wwtkxVMIV4MRmY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754666444; c=relaxed/simple; bh=uZdXodG+xrgE4lj+usSCEwwAw/UOkFtAAv/696uixfw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=oQiR3m5ZFBE0GzLEUO9duTT7gQ+wQubxseubiPtFPJUcvB4ilrgvp7bOw/R9IMVeSCEBhLUQMn2OJUnp9YAhBhA8I/FTbZ2l3pbwjMk3DO08Y9XPef+UZAIUQubok5YwTBia2ElRJp/8zw7SYjgc6yrcABCa7LLLJ2itOUMKLH8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=OJLX6VDY; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; 
dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="OJLX6VDY" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1754666441; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=6b3KPC8rglAUpE8Wovp40kQFjRH7fp7qx30+4/FOiug=; b=OJLX6VDYAM1x02oPP66Ny8H5OtaLlMahPi9n7NMt6mvsNkskKI7L0ub6DsorcE2E7DltOj T2EAndO0hIP/6W9GmtmaCK61hfh0QXcCvzr3LkRVLFp/dMpIKur6AiyE+FOC+zdCVKkW23 e5aWvgJX6OFXKPIRAGhUs+MqfXu3wtI= Received: from mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-149-tDyNSjeoOwSEhqrEZUcvrg-1; Fri, 08 Aug 2025 11:20:36 -0400 X-MC-Unique: tDyNSjeoOwSEhqrEZUcvrg-1 X-Mimecast-MFC-AGG-ID: tDyNSjeoOwSEhqrEZUcvrg_1754666428 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 02B9F18002EA; Fri, 8 Aug 2025 15:20:28 +0000 (UTC) Received: from llong-thinkpadp16vgen1.westford.csb (unknown [10.22.65.37]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id ACC00180029D; Fri, 8 Aug 2025 15:20:21 +0000 (UTC) From: Waiman Long To: Tejun Heo , Johannes Weiner , =?UTF-8?q?Michal=20Koutn=C3=BD?= , Jonathan Corbet , Frederic Weisbecker , 
"Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Uladzislau Rezki , Steven Rostedt , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Anna-Maria Behnsen , Ingo Molnar , Thomas Gleixner , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Ben Segall , Mel Gorman , Valentin Schneider , Shuah Khan Cc: cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org, Phil Auld , Costa Shulyupin , Gabriele Monaco , Cestmir Kalina , Waiman Long Subject: [RFC PATCH 11/18] tick/nohz: Make nohz_full parameter optional Date: Fri, 8 Aug 2025 11:19:54 -0400 Message-ID: <20250808152001.20245-2-longman@redhat.com> In-Reply-To: <20250808151053.19777-1-longman@redhat.com> References: <20250808151053.19777-1-longman@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" To provide nohz_full tick support, there is a set of tick dependency masks that need to be evaluated on every IRQ and context switch. Switching on nohz_full tick support at runtime will be problematic as some of the tick dependency masks may not be properly set, causing problems down the road. Allow the nohz_full boot option to be specified without any parameter to force enable nohz_full tick support without any CPU in the tick_nohz_full_mask yet. The context_tracking_key and tick_nohz_full_running flag will be enabled in this case to make tick_nohz_full_enabled() return true. There is still a small performance overhead from force-enabling nohz_full this way, so it should only be used if there is a chance that some CPUs may become isolated later via the cpuset isolated partition functionality and better CPU isolation close to nohz_full is desired. 
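For illustration only (this is not part of the patch), the intended semantics of the optional argument can be modeled in user space roughly as follows. All names are hypothetical, the CPU list parser is a simplified stand-in limited to CPUs 0-63 rather than the kernel's cpulist_parse(), and forcing the boot CPU out of the list is not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical parser for "a,b-c" style CPU lists (CPUs 0..63 only).
 * Returns 0 and fills *mask on success, -1 on malformed input.
 */
static int model_parse_cpulist(const char *s, uint64_t *mask)
{
	*mask = 0;
	while (*s) {
		char *end;
		unsigned long lo, hi;

		if (*s < '0' || *s > '9')
			return -1;
		lo = strtoul(s, &end, 10);
		hi = lo;
		if (*end == '-') {		/* range "lo-hi" */
			s = end + 1;
			if (*s < '0' || *s > '9')
				return -1;
			hi = strtoul(s, &end, 10);
		}
		if (lo > hi || hi > 63)
			return -1;
		for (unsigned long cpu = lo; cpu <= hi; cpu++)
			*mask |= UINT64_C(1) << cpu;
		if (*end == ',') {
			end++;
			if (!*end)		/* trailing comma */
				return -1;
		} else if (*end) {
			return -1;
		}
		s = end;
	}
	return 0;
}

/*
 * Model of the optional argument: @arg is NULL for a bare "nohz_full"
 * and points at the text after '=' otherwise. A bare option enables
 * nohz_full with an empty CPU mask.
 */
static int model_nohz_full_setup(const char *arg, uint64_t *mask,
				 int *enabled)
{
	assert(mask && enabled);
	*enabled = 0;
	*mask = 0;
	if (arg && model_parse_cpulist(arg, mask) < 0)
		return -1;
	*enabled = 1;
	return 0;
}
```

In this model a bare "nohz_full" yields enabled == 1 with an empty mask, "nohz_full=2-5,7" yields the mask 0xbc, and a malformed list such as "5-2" is rejected with nohz_full left disabled.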
Signed-off-by: Waiman Long --- .../admin-guide/kernel-parameters.txt | 19 ++++++++++++------- include/linux/context_tracking.h | 7 ++++++- kernel/context_tracking.c | 4 +++- kernel/sched/isolation.c | 13 ++++++++++++- kernel/time/tick-sched.c | 11 +++++++++-- 5 files changed, 42 insertions(+), 12 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentatio= n/admin-guide/kernel-parameters.txt index 747a55abf494..89a8161475b5 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -4260,15 +4260,20 @@ Valid arguments: on, off Default: on =20 - nohz_full=3D [KNL,BOOT,SMP,ISOL] - The argument is a cpu list, as described above. + nohz_full[=3Dcpu-list] + [KNL,BOOT,SMP,ISOL] In kernels built with CONFIG_NO_HZ_FULL=3Dy, set - the specified list of CPUs whose tick will be stopped - whenever possible. The boot CPU will be forced outside - the range to maintain the timekeeping. Any CPUs - in this list will have their RCU callbacks offloaded, + the specified list of CPUs whose tick will be + stopped whenever possible. If the argument is + not specified, nohz_full will be forced enabled + without any CPU in the nohz_full list yet. + The boot CPU will be forced outside the range + to maintain the timekeeping. Any CPUs in this + list will have their RCU callbacks offloaded, just as if they had also been called out in the - rcu_nocbs=3D boot parameter. + rcu_nocbs=3D boot parameter. There is no need + to use rcu_nocbs=3D boot parameter if nohz_full + has been set which will override rcu_nocbs. =20 Note that this argument takes precedence over the CONFIG_RCU_NOCB_CPU_DEFAULT_ALL option. 
diff --git a/include/linux/context_tracking.h b/include/linux/context_track= ing.h index af9fe87a0922..a3fea7f9fef6 100644 --- a/include/linux/context_tracking.h +++ b/include/linux/context_tracking.h @@ -9,8 +9,13 @@ =20 #include =20 - #ifdef CONFIG_CONTEXT_TRACKING_USER +/* + * Pass CONTEXT_TRACKING_FORCE_ENABLE to ct_cpu_track_user() to force enab= le + * user context tracking. + */ +#define CONTEXT_TRACKING_FORCE_ENABLE (-1) + extern void ct_cpu_track_user(int cpu); =20 /* Called with interrupts disabled. */ diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c index fb5be6e9b423..734354bbfdbb 100644 --- a/kernel/context_tracking.c +++ b/kernel/context_tracking.c @@ -698,7 +698,9 @@ void __init ct_cpu_track_user(int cpu) { static __initdata bool initialized =3D false; =20 - if (!per_cpu(context_tracking.active, cpu)) { + if (cpu =3D=3D CONTEXT_TRACKING_FORCE_ENABLE) { + static_branch_inc(&context_tracking_key); + } else if (!per_cpu(context_tracking.active, cpu)) { per_cpu(context_tracking.active, cpu) =3D true; static_branch_inc(&context_tracking_key); } diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c index f26708667754..2bed4b2f9ec5 100644 --- a/kernel/sched/isolation.c +++ b/kernel/sched/isolation.c @@ -146,6 +146,7 @@ static int __init housekeeping_setup(char *str, unsigne= d long flags) } =20 alloc_bootmem_cpumask_var(&non_housekeeping_mask); + if (cpulist_parse(str, non_housekeeping_mask) < 0) { pr_warn("Housekeeping: nohz_full=3D or isolcpus=3D incorrect CPU range\n= "); goto free_non_housekeeping_mask; @@ -155,6 +156,13 @@ static int __init housekeeping_setup(char *str, unsign= ed long flags) cpumask_andnot(housekeeping_staging, cpu_possible_mask, non_housekeeping_mask); =20 + /* + * Allow "nohz_full" without parameter to force enable nohz_full + * at boot time without any CPUs in the nohz_full list yet. 
+ */ + if ((flags & HK_FLAG_KERNEL_NOISE) && !*str) + goto setup_housekeeping_staging; + first_cpu =3D cpumask_first_and(cpu_present_mask, housekeeping_staging); if (first_cpu >=3D nr_cpu_ids || first_cpu >=3D setup_max_cpus) { __cpumask_set_cpu(smp_processor_id(), housekeeping_staging); @@ -168,6 +176,7 @@ static int __init housekeeping_setup(char *str, unsigne= d long flags) if (cpumask_empty(non_housekeeping_mask)) goto free_housekeeping_staging; =20 +setup_housekeeping_staging: if (!housekeeping.flags) { /* First setup call ("nohz_full=3D" or "isolcpus=3D") */ enum hk_type type; @@ -212,10 +221,12 @@ static int __init housekeeping_nohz_full_setup(char *= str) unsigned long flags; =20 flags =3D HK_FLAG_KERNEL_NOISE; + if (*str =3D=3D '=3D') + str++; =20 return housekeeping_setup(str, flags); } -__setup("nohz_full=3D", housekeeping_nohz_full_setup); +__setup("nohz_full", housekeeping_nohz_full_setup); =20 static int __init housekeeping_isolcpus_setup(char *str) { diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index c527b421c865..87b26a4471e7 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -651,8 +651,15 @@ void __init tick_nohz_init(void) } } =20 - for_each_cpu(cpu, tick_nohz_full_mask) - ct_cpu_track_user(cpu); + /* + * Force enable context_tracking_key if tick_nohz_full_mask empty + */ + if (cpumask_empty(tick_nohz_full_mask)) { + ct_cpu_track_user(CONTEXT_TRACKING_FORCE_ENABLE); + } else { + for_each_cpu(cpu, tick_nohz_full_mask) + ct_cpu_track_user(cpu); + } =20 ret =3D cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "kernel/nohz:predown", NULL, --=20 2.50.0 From nobody Sun Dec 14 20:26:39 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0C1C4284678 for ; Fri, 8 Aug 2025 15:20:41 +0000 (UTC) 
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754666443; cv=none; b=Jgzm8YvibM2HFZ5QCUiqJM+y1dFKTS02LeGPK0qegho6gRBVjmL7+w2e0VmidqkxteGlz7qAchHqTNom/jqarSp8WQzJXbhAwSnuwKl6HrRdNMbakXKknRAETr26zLa1k7d3RIwFtxjaRGhVX9DqU0dh5inPe71/0Kb50lNAZ0Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754666443; c=relaxed/simple; bh=KhnDnewWmNF5LBNjEEZdjlT1+NiVfMU/hEwiwCzSkY8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=IEOxi3Tzyd4eKgw2bi5yWdcyqgxYRQLSVeD4KfaBTlM8CLdz+sabMqLUTdT1gUoQXIyCwqKpKp5pPRCrE+Nx/JUXvQ9O6v1LUFqT4/3FXYCSju0MKUe3UaDcLIhD0yCob9AvlxqTDnIRZ1eAzUatexL9LlyyU5kANJ1EVNlTU0w= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=N2ycpzOt; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="N2ycpzOt" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1754666440; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=AxJEYuuRH3QK9/Cdhoov2KDfqFceQ8tv5+6IbkX7B5U=; b=N2ycpzOt7Nh9F/bjFzMe2QNkD6FMe2tdIbIaq2FDJTmnjMQSPTcyhvCzZN7+Wz3eJX9QmT wYGTvWKG01n84gtG4a5wlxE06HFDliiE88RM0KBsyKTijKM6u+qmPsaJpSUpLUxx2TAyCb V/aPa+0kTjgWmdkAhdnSayvWBltL9dE= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com 
(ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-261-JK_dUM0TP86Dh_KVi_TrBw-1; Fri, 08 Aug 2025 11:20:39 -0400 X-MC-Unique: JK_dUM0TP86Dh_KVi_TrBw-1 X-Mimecast-MFC-AGG-ID: JK_dUM0TP86Dh_KVi_TrBw_1754666434 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 96322180028C; Fri, 8 Aug 2025 15:20:34 +0000 (UTC) Received: from llong-thinkpadp16vgen1.westford.csb (unknown [10.22.65.37]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 48E23180029B; Fri, 8 Aug 2025 15:20:28 +0000 (UTC) From: Waiman Long To: Tejun Heo , Johannes Weiner , =?UTF-8?q?Michal=20Koutn=C3=BD?= , Jonathan Corbet , Frederic Weisbecker , "Paul E. 
McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Uladzislau Rezki , Steven Rostedt , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Anna-Maria Behnsen , Ingo Molnar , Thomas Gleixner , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Ben Segall , Mel Gorman , Valentin Schneider , Shuah Khan Cc: cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org, Phil Auld , Costa Shulyupin , Gabriele Monaco , Cestmir Kalina , Waiman Long Subject: [RFC PATCH 12/18] tick/nohz: Introduce tick_nohz_full_update_cpus() to update tick_nohz_full_mask Date: Fri, 8 Aug 2025 11:19:55 -0400 Message-ID: <20250808152001.20245-3-longman@redhat.com> In-Reply-To: <20250808151053.19777-1-longman@redhat.com> References: <20250808151053.19777-1-longman@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" When the list of HK_FLAG_KERNEL_NOISE housekeeping CPUs is changed, we will need to update tick_nohz_full_mask so that dynticks can work correctly. Introduce a new tick_nohz_full_update_cpus() function that can be called at run time to update tick_nohz_full_mask. 
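For illustration only (not part of the patch): a minimal userspace sketch of the wholesale replacement that tick_nohz_full_update_cpus() performs with cpumask_copy(). A 64-bit word stands in for a cpumask, all names are hypothetical, and the returned set of changed CPUs is an illustrative extra that this patch does not compute.

```c
#include <stdint.h>

/* Hypothetical stand-in for tick_nohz_full_mask: bit N represents CPU N. */
static uint64_t nohz_full_mask_model;

/*
 * Replace the whole mask, as cpumask_copy() does, and report which CPUs
 * changed state so that a caller could adjust per-CPU tracking for them.
 */
static uint64_t nohz_full_update_model(uint64_t new_mask)
{
	uint64_t changed = nohz_full_mask_model ^ new_mask;

	nohz_full_mask_model = new_mask;
	return changed;
}
```

Because the mask is replaced rather than merged, a CPU that is absent from the new mask simply drops out of the nohz_full set.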
Signed-off-by: Waiman Long --- include/linux/tick.h | 2 ++ kernel/time/tick-sched.c | 6 ++++++ 2 files changed, 8 insertions(+) diff --git a/include/linux/tick.h b/include/linux/tick.h index ac76ae9fa36d..34907c0b632c 100644 --- a/include/linux/tick.h +++ b/include/linux/tick.h @@ -272,6 +272,7 @@ static inline void tick_dep_clear_signal(struct signal_= struct *signal, extern void tick_nohz_full_kick_cpu(int cpu); extern void __tick_nohz_task_switch(void); extern void __init tick_nohz_full_setup(cpumask_var_t cpumask); +extern void tick_nohz_full_update_cpus(cpumask_var_t cpumask); #else static inline bool tick_nohz_full_enabled(void) { return false; } static inline bool tick_nohz_full_cpu(int cpu) { return false; } @@ -297,6 +298,7 @@ static inline void tick_dep_clear_signal(struct signal_= struct *signal, static inline void tick_nohz_full_kick_cpu(int cpu) { } static inline void __tick_nohz_task_switch(void) { } static inline void tick_nohz_full_setup(cpumask_var_t cpumask) { } +static inline void tick_nohz_full_update_cpus(cpumask_var_t cpumask) { } #endif =20 static inline void tick_nohz_task_switch(void) diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 87b26a4471e7..9204808b7a55 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -604,6 +604,12 @@ void __init tick_nohz_full_setup(cpumask_var_t cpumask) tick_nohz_full_running =3D true; } =20 +/* Get the new set of run-time nohz CPU list from cpuset */ +void tick_nohz_full_update_cpus(cpumask_var_t cpumask) +{ + cpumask_copy(tick_nohz_full_mask, cpumask); +} + bool tick_nohz_cpu_hotpluggable(unsigned int cpu) { /* --=20 2.50.0 From nobody Sun Dec 14 20:26:39 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5CF5528505D for ; Fri, 8 Aug 2025 
15:21:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754666491; cv=none; b=OCa1pgQbvuuCU6y4ftXjgEhPgPCEFrNzjXqXjmt9kk4y0lmsS4XHmw2Ma7EKDxKua+Ap/tPLmTtuuRgZsZiDo0Ky45ofy1RnQdk59nju6rAK+SJc5wQ0/HFhV4bWL31DM2F/t+KujBIaww464w3VH1aGI14JQ8UpQitubexIe+Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754666491; c=relaxed/simple; bh=G+SuqHYYBtyqlrzLWONaD/8ACPafYoU76CpsPRv+ZDo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=bSMylC1o+3Us02JmCs9TM3+GAT3nE7zBvnn6FsXkL3XWATDrxazbYAOtd8AxQCD3IDhSWrIyPbPqiGv6QIORq7bmDDHgKtCOzJy3Om3N+qa+Ssa7mMB3nGOj+EZYMxrbxPseGoSdR6P+e13Pr36ztVuAq/z8HtTwH4s4NhygSuw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=CkTL74UG; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="CkTL74UG" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1754666485; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=s3IrWf1FQMKxHSbm6lytMXnYB7oM2ZZ+ln1x5eRMpdA=; b=CkTL74UG0MqqF3Jl/aZdTLIYUF75EHFLhk1jaYH+DWQyIQ/Rb7ZUAN2pTVw2RXCuNhjs3i iJ8ph/Ok7JvwtW/+96hkQWjdiT2JGE90Vu5wGW2IicVcrSuyvBHmRwewpk7ZBMQgWpriVS dG+2NP3OHOnFR9vzACGkn3n1vkukNSo= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com 
(ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-158-sptsYpEoMveUvnkJBZ77Hw-1; Fri, 08 Aug 2025 11:21:20 -0400 X-MC-Unique: sptsYpEoMveUvnkJBZ77Hw-1 X-Mimecast-MFC-AGG-ID: sptsYpEoMveUvnkJBZ77Hw_1754666442 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 91B6F180034A; Fri, 8 Aug 2025 15:20:41 +0000 (UTC) Received: from llong-thinkpadp16vgen1.westford.csb (unknown [10.22.65.37]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id DB7921800294; Fri, 8 Aug 2025 15:20:34 +0000 (UTC) From: Waiman Long To: Tejun Heo , Johannes Weiner , =?UTF-8?q?Michal=20Koutn=C3=BD?= , Jonathan Corbet , Frederic Weisbecker , "Paul E. 
McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Uladzislau Rezki , Steven Rostedt , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Anna-Maria Behnsen , Ingo Molnar , Thomas Gleixner , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Ben Segall , Mel Gorman , Valentin Schneider , Shuah Khan Cc: cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org, Phil Auld , Costa Shulyupin , Gabriele Monaco , Cestmir Kalina , Waiman Long Subject: [RFC PATCH 13/18] tick/nohz: Allow runtime changes in full dynticks CPUs Date: Fri, 8 Aug 2025 11:19:56 -0400 Message-ID: <20250808152001.20245-4-longman@redhat.com> In-Reply-To: <20250808151053.19777-1-longman@redhat.com> References: <20250808151053.19777-1-longman@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" Full dynticks can only be enabled if the "nohz_full" boot option has been specified, with or without a parameter. Any change in the list of nohz_full CPUs has to be reflected in tick_nohz_full_mask. So the newly introduced tick_nohz_full_update_cpus() will be called to update the mask. We also need to enable CPU context tracking for those CPUs that are in tick_nohz_full_mask. So remove __init from tick_nohz_init() and ct_cpu_track_user() so that they can be called later when an isolated cpuset partition is being created. The __ro_after_init attribute is taken away from context_tracking_key as well. Also add a new ct_cpu_untrack_user() function to reverse the action of ct_cpu_track_user() in case we need to disable the nohz_full mode of a CPU. With nohz_full enabled, the boot CPU (typically CPU 0) will be the tick CPU which cannot be shut down easily. 
So the boot CPU should not be used in an isolated cpuset partition. With runtime modification of nohz_full CPUs, tick_do_timer_cpu can become TICK_DO_TIMER_NONE. So remove the two TICK_DO_TIMER_NONE WARN_ON_ONCE() calls in tick-sched.c to avoid unnecessary warnings. Signed-off-by: Waiman Long --- include/linux/context_tracking.h | 1 + kernel/cgroup/cpuset.c | 23 ++++++++++++++++++++++- kernel/context_tracking.c | 17 ++++++++++++++--- kernel/time/tick-sched.c | 7 ------- 4 files changed, 37 insertions(+), 11 deletions(-) diff --git a/include/linux/context_tracking.h b/include/linux/context_track= ing.h index a3fea7f9fef6..1a6b816f1ad6 100644 --- a/include/linux/context_tracking.h +++ b/include/linux/context_tracking.h @@ -17,6 +17,7 @@ #define CONTEXT_TRACKING_FORCE_ENABLE (-1) =20 extern void ct_cpu_track_user(int cpu); +extern void ct_cpu_untrack_user(int cpu); =20 /* Called with interrupts disabled. */ extern void __ct_user_enter(enum ctx_state state); diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 6308bb14e018..45c82c18bec4 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -23,6 +23,7 @@ */ #include "cpuset-internal.h" =20 +#include #include #include #include @@ -1361,7 +1362,7 @@ static void partition_xcpus_del(int old_prs, struct c= puset *parent, */ static int do_housekeeping_exclude_cpumask(void *arg __maybe_unused) { - int ret; + int cpu, ret; struct cpumask *icpus =3D isolated_cpus; unsigned long flags =3D BIT(HK_TYPE_DOMAIN) | BIT(HK_TYPE_KERNEL_NOISE); =20 @@ -1395,6 +1396,26 @@ static int do_housekeeping_exclude_cpumask(void *arg= __maybe_unused) ret =3D housekeeping_exclude_cpumask(icpus, flags); WARN_ON_ONCE((ret < 0) && (ret !=3D -EOPNOTSUPP)); =20 +#ifdef CONFIG_NO_HZ_FULL + /* + * To properly enable/disable nohz_full dynticks for the affected CPUs, + * the new nohz_full CPUs have to be copied to tick_nohz_full_mask and + * ct_cpu_track_user/ct_cpu_untrack_user() will have to be called + * for those CPUs that 
have their states changed. + */ + if (tick_nohz_full_enabled()) { + tick_nohz_full_update_cpus(icpus); + for_each_cpu(cpu, isolcpus_update_state.cpus) { + if (cpumask_test_cpu(cpu, icpus)) + ct_cpu_track_user(cpu); + else + ct_cpu_untrack_user(cpu); + } + } else { + pr_warn_once("Full dynticks cannot be enabled without the nohz_full kern= el boot parameter!\n"); + } +#endif + if (icpus !=3D isolated_cpus) kfree(icpus); return ret; diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c index 734354bbfdbb..ed5653a3d6f7 100644 --- a/kernel/context_tracking.c +++ b/kernel/context_tracking.c @@ -431,7 +431,7 @@ static __always_inline void ct_kernel_enter(bool user, = int offset) { } #define CREATE_TRACE_POINTS #include =20 -DEFINE_STATIC_KEY_FALSE_RO(context_tracking_key); +DEFINE_STATIC_KEY_FALSE(context_tracking_key); EXPORT_SYMBOL_GPL(context_tracking_key); =20 static noinstr bool context_tracking_recursion_enter(void) @@ -694,9 +694,9 @@ void user_exit_callable(void) } NOKPROBE_SYMBOL(user_exit_callable); =20 -void __init ct_cpu_track_user(int cpu) +void ct_cpu_track_user(int cpu) { - static __initdata bool initialized =3D false; + static bool initialized; =20 if (cpu =3D=3D CONTEXT_TRACKING_FORCE_ENABLE) { static_branch_inc(&context_tracking_key); @@ -720,6 +720,17 @@ void __init ct_cpu_track_user(int cpu) initialized =3D true; } =20 +void ct_cpu_untrack_user(int cpu) +{ +#ifndef CONFIG_CONTEXT_TRACKING_USER_FORCE + if (!per_cpu(context_tracking.active, cpu)) + return; + + per_cpu(context_tracking.active, cpu) =3D false; + static_branch_dec(&context_tracking_key); +#endif +} + #ifdef CONFIG_CONTEXT_TRACKING_USER_FORCE void __init context_tracking_init(void) { diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 9204808b7a55..c16250c6a79f 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -220,9 +220,6 @@ static void tick_sched_do_timer(struct tick_sched *ts, = ktime_t now) tick_cpu =3D READ_ONCE(tick_do_timer_cpu); 
=20 if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && unlikely(tick_cpu =3D=3D TICK_DO_T= IMER_NONE)) { -#ifdef CONFIG_NO_HZ_FULL - WARN_ON_ONCE(tick_nohz_full_running); -#endif WRITE_ONCE(tick_do_timer_cpu, cpu); tick_cpu =3D cpu; } @@ -1201,10 +1198,6 @@ static bool can_stop_idle_tick(int cpu, struct tick_= sched *ts) */ if (tick_cpu =3D=3D cpu) return false; - - /* Should not happen for nohz-full */ - if (WARN_ON_ONCE(tick_cpu =3D=3D TICK_DO_TIMER_NONE)) - return false; } =20 return true; --=20 2.50.0 From nobody Sun Dec 14 20:26:39 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3B385284B49 for ; Fri, 8 Aug 2025 15:21:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754666488; cv=none; b=I++fV/VSq9i4ALQWCS4wqeodWy2GlLbqMvIVB3Tl9QPMmx+ZYGxpQTx0rGOvJbyPQgJyIMjyTBRXgmTGifithUE9kq5NVdwNvIY8s9yQa+8VCb9ROGLLdpDvTfYQzmET+v9Psk9/SDQYYxMkUg7s28KLe9I6H83hh6AJZ1Ut1Pc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754666488; c=relaxed/simple; bh=GoVQz85XRSK4EnmDjUywrx9qG5mAsCPsLS5ttvZ+MJM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=EjhZFbJCgVEZCCld+PMUaEx3yJnXHcEw89hFsl8ULU5QAYfe+TnOnkkdU8RbCXR61T5AtQCw9Q4jTPyM+LNPg7DNGGNKWhQvMCG754XKehoKOYfjYJMlQSECFogCr69ET/xbhJJZBB/ouyEMHSYYmmOANMDJ4iGDoMgyAw2mmUA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=EzBd/clK; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) 
header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="EzBd/clK" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1754666484; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=RJvtXRmFLqg3cjTyv7JZL5JKi+l3dywyp+8sWT9E3vU=; b=EzBd/clK+MGmYZYRjHT6cXuFudGJi96G4QPWirYbfvAaB7OIp21J6n/ILNNdFu6v/MGK02 PAFX0UMt4fRon4hXMy1EZhYSzQyS/5uBGUvi17YOOfoGITjNqaZuP5lFDZeNyneegUNwU9 WTkZ5vWAr7GfzHJLJKw954M506+g7Ho= Received: from mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-607-lLulFMOLMk2tYQ8XiMY6nw-1; Fri, 08 Aug 2025 11:21:17 -0400 X-MC-Unique: lLulFMOLMk2tYQ8XiMY6nw-1 X-Mimecast-MFC-AGG-ID: lLulFMOLMk2tYQ8XiMY6nw_1754666448 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id DA0D119560B2; Fri, 8 Aug 2025 15:20:47 +0000 (UTC) Received: from llong-thinkpadp16vgen1.westford.csb (unknown [10.22.65.37]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id D8727180029D; Fri, 8 Aug 2025 15:20:41 +0000 (UTC) From: Waiman Long To: Tejun Heo , Johannes Weiner , =?UTF-8?q?Michal=20Koutn=C3=BD?= , Jonathan Corbet , Frederic Weisbecker , "Paul E. 
McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Uladzislau Rezki , Steven Rostedt , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Anna-Maria Behnsen , Ingo Molnar , Thomas Gleixner , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Ben Segall , Mel Gorman , Valentin Schneider , Shuah Khan Cc: cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org, Phil Auld , Costa Shulyupin , Gabriele Monaco , Cestmir Kalina , Waiman Long Subject: [RFC PATCH 14/18] tick: Pass timer tick job to an online HK CPU in tick_cpu_dying() Date: Fri, 8 Aug 2025 11:19:57 -0400 Message-ID: <20250808152001.20245-5-longman@redhat.com> In-Reply-To: <20250808151053.19777-1-longman@redhat.com> References: <20250808151053.19777-1-longman@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" In tick_cpu_dying(), if the dying CPU is the current timekeeper, it has to pass the job over to another CPU. The current code passes it to another online CPU. However, that CPU may not be a timer tick housekeeping CPU. If that happens, another CPU will have to manually take it over again later. Avoid this unnecessary work by directly assigning an online housekeeping CPU. Use READ_ONCE/WRITE_ONCE() to access tick_do_timer_cpu in case the non-HK CPUs may not be in stop machine in the future. 
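For illustration only (not part of the patch): a minimal userspace sketch of the successor selection described above, with 64-bit words standing in for cpumasks. All names are hypothetical, and the sketch assumes the online mask passed in no longer includes the dying CPU.

```c
#include <stdint.h>

#define MODEL_NR_CPUS 64	/* hypothetical limit: one bit per CPU */

/* Lowest set bit, or MODEL_NR_CPUS if the mask is empty. */
static unsigned int model_first_cpu(uint64_t mask)
{
	unsigned int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return MODEL_NR_CPUS;
}

/*
 * Prefer an online housekeeping CPU as the next timekeeper; fall back to
 * any online CPU if the two masks do not intersect.
 */
static unsigned int model_pick_timekeeper(uint64_t online, uint64_t housekeeping)
{
	unsigned int cpu = model_first_cpu(online & housekeeping);

	if (cpu >= MODEL_NR_CPUS)
		cpu = model_first_cpu(online);
	return cpu;
}
```

The fallback branch corresponds to the WARN_ON_ONCE() path in the patch: it should not normally be taken, but it keeps some online CPU in charge of timekeeping if no housekeeping CPU is online.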
Signed-off-by: Waiman Long --- kernel/time/tick-common.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c index 9a3859443c04..6d5ff85281cc 100644 --- a/kernel/time/tick-common.c +++ b/kernel/time/tick-common.c @@ -17,6 +17,7 @@ #include #include #include +#include #include =20 #include @@ -394,12 +395,18 @@ int tick_cpu_dying(unsigned int dying_cpu) { /* * If the current CPU is the timekeeper, it's the only one that can - * safely hand over its duty. Also all online CPUs are in stop - * machine, guaranteed not to be idle, therefore there is no + * safely hand over its duty. Also all online housekeeping CPUs are + * in stop machine, guaranteed not to be idle, therefore there is no * concurrency and it's safe to pick any online successor. */ - if (tick_do_timer_cpu =3D=3D dying_cpu) - tick_do_timer_cpu =3D cpumask_first(cpu_online_mask); + if (READ_ONCE(tick_do_timer_cpu) =3D=3D dying_cpu) { + unsigned int new_cpu; + + new_cpu =3D cpumask_first_and(cpu_online_mask, housekeeping_cpumask(HK_T= YPE_TICK)); + if (WARN_ON_ONCE(new_cpu >=3D nr_cpu_ids)) + new_cpu =3D cpumask_first(cpu_online_mask); + WRITE_ONCE(tick_do_timer_cpu, new_cpu); + } =20 /* Make sure the CPU won't try to retake the timekeeping duty */ tick_sched_timer_dying(dying_cpu); --=20 2.50.0 From nobody Sun Dec 14 20:26:39 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D234F283FE4 for ; Fri, 8 Aug 2025 15:21:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754666490; cv=none; 
b=gyORDlUSbPqFguEW86QvzxbJkbzt4cWKfLEHHSl9NXc2SlWbAYPZq/Wsae0Dh8zileDCItxoLNOMhkq7JllboKUSO/9nnmFV01Ir3DfZMWutaYki1v32bOAw7+bJHT3AG4fcsY20MTBvQ1930sO3Mw3wcwTNbEOdLjimdbbofEo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754666490; c=relaxed/simple; bh=QFJWiaweEnjtZulZv+TQcP+SkZextOHNVPGHzdr08DY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=MtjEa/cKPXbEX9a1uZRUaT+6EynwmVUmOCiAdBANPh0FZPRdhzj72I1dk2CEAtr2Gow7L92oShh2ji3boTt1m1dX45KLkvKGNgGThJLbKwg9PWyuA0qdrUA2UFo20QIyyRZyQVhB1aYquIZJNFjIKBbyVvovZeYw21L0ivR3/W4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=HzB57zMv; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="HzB57zMv" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1754666486; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=wEWvc3gO1vjmutC6sTx4SpSS9DqW2Vo8SMK8dJwY8nw=; b=HzB57zMvRSnezSzCxiEYp6vSVa7cJ10Q42sg5KP9olEHRqB6xTyNHT4mUUYj1zEDzJdhXp qc2ioprCEeRB7l1QMBCDVzFmlQW0RJiLGmy4Rzn7mBKJyqNB6J/ge8A0Gh66hxNUQQP1vO RpY8nPJldykWLNXMuZ2JlJPOjV0hKDc= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id 
us-mta-631-pKkLszEMMSyqCueLYuEvDg-1; Fri, 08 Aug 2025 11:21:22 -0400 X-MC-Unique: pKkLszEMMSyqCueLYuEvDg-1 X-Mimecast-MFC-AGG-ID: pKkLszEMMSyqCueLYuEvDg_1754666455 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id F319E180029D; Fri, 8 Aug 2025 15:20:54 +0000 (UTC) Received: from llong-thinkpadp16vgen1.westford.csb (unknown [10.22.65.37]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 2AC23180029B; Fri, 8 Aug 2025 15:20:48 +0000 (UTC) From: Waiman Long To: Tejun Heo , Johannes Weiner , =?UTF-8?q?Michal=20Koutn=C3=BD?= , Jonathan Corbet , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Uladzislau Rezki , Steven Rostedt , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Anna-Maria Behnsen , Ingo Molnar , Thomas Gleixner , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Ben Segall , Mel Gorman , Valentin Schneider , Shuah Khan Cc: cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org, Phil Auld , Costa Shulyupin , Gabriele Monaco , Cestmir Kalina , Waiman Long Subject: [RFC PATCH 15/18] cgroup/cpuset: Enable RCU NO-CB CPU offloading of newly isolated CPUs Date: Fri, 8 Aug 2025 11:19:58 -0400 Message-ID: <20250808152001.20245-6-longman@redhat.com> In-Reply-To: <20250808151053.19777-1-longman@redhat.com> References: <20250808151053.19777-1-longman@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: 
MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" Make use of the provided rcu_nocb_cpu_offload()/rcu_nocb_cpu_deoffload() APIs to enable RCU NO-CB CPU offloading of newly isolated CPUs and deoffloading of de-isolated CPUs. Also add a new rcu_nocbs_enabled() helper function to determine if RCU NO-CB CPU offloading can be done. As nohz_full can now be specified without any CPU list, drop the test for cpumask_empty(tick_nohz_full_mask) in rcu_init_nohz(). The RCU NO-CB CPU offloading feature can only be used if either the "rcu_nocbs" or the "nohz_full" kernel boot parameter is specified, so that the RCU NO-CB resources are properly initialized at boot time. Signed-off-by: Waiman Long --- include/linux/rcupdate.h | 2 ++ kernel/cgroup/cpuset.c | 14 ++++++++++++++ kernel/rcu/tree_nocb.h | 7 ++++++- 3 files changed, 22 insertions(+), 1 deletion(-) diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 120536f4c6eb..642b80a4f071 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -140,6 +140,7 @@ void rcu_init_nohz(void); int rcu_nocb_cpu_offload(int cpu); int rcu_nocb_cpu_deoffload(int cpu); void rcu_nocb_flush_deferred_wakeup(void); +bool rcu_nocbs_enabled(void); =20 #define RCU_NOCB_LOCKDEP_WARN(c, s) RCU_LOCKDEP_WARN(c, s) =20 @@ -149,6 +150,7 @@ static inline void rcu_init_nohz(void) { } static inline int rcu_nocb_cpu_offload(int cpu) { return -EINVAL; } static inline int rcu_nocb_cpu_deoffload(int cpu) { return 0; } static inline void rcu_nocb_flush_deferred_wakeup(void) { } +static inline bool rcu_nocbs_enabled(void) { return false; } =20 #define RCU_NOCB_LOCKDEP_WARN(c, s) =20 diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 45c82c18bec4..de9cb92a0fc7 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -1416,6 +1416,20 @@ static int do_housekeeping_exclude_cpumask(void *arg= __maybe_unused) } #endif =20 + if (rcu_nocbs_enabled()) { + /* + * Enable RCU NO-CB CPU
offloading/deoffloading for the affected CPUs + */ + for_each_cpu(cpu, isolcpus_update_state.cpus) { + if (cpumask_test_cpu(cpu, icpus)) + ret =3D rcu_nocb_cpu_offload(cpu); + else + ret =3D rcu_nocb_cpu_deoffload(cpu); + if (WARN_ON_ONCE(ret)) + break; + } + } + if (icpus !=3D isolated_cpus) kfree(icpus); return ret; diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h index e6cd56603cad..4d49a745b871 100644 --- a/kernel/rcu/tree_nocb.h +++ b/kernel/rcu/tree_nocb.h @@ -1293,7 +1293,7 @@ void __init rcu_init_nohz(void) struct shrinker * __maybe_unused lazy_rcu_shrinker; =20 #if defined(CONFIG_NO_HZ_FULL) - if (tick_nohz_full_running && !cpumask_empty(tick_nohz_full_mask)) + if (tick_nohz_full_running) cpumask =3D tick_nohz_full_mask; #endif =20 @@ -1365,6 +1365,11 @@ static void __init rcu_boot_init_nocb_percpu_data(st= ruct rcu_data *rdp) mutex_init(&rdp->nocb_gp_kthread_mutex); } =20 +bool rcu_nocbs_enabled(void) +{ + return !!rcu_state.nocb_is_setup; +} + /* * If the specified CPU is a no-CBs CPU that does not already have its * rcuo CB kthread, spawn it. 
Additionally, if the rcuo GP kthread --=20 2.50.0 From nobody Sun Dec 14 20:26:39 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 67EDD285065 for ; Fri, 8 Aug 2025 15:21:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754666491; cv=none; b=t5j14LbWvXhuahf9+5kgat+T4V/OyfSLmqNTvrslxZRiro9wkzdSCL2t6EWwd4HELsgY5ztnmPFyLGskR0pkQiDIHkNGe3+3q8IHlCkAH3DRJwwJ3g+7IL7Kr5HbD8hzkcUTT4qfz0pd8h8UOT6RJDw//Luf0wmcKRnNesnTxFo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754666491; c=relaxed/simple; bh=1eqN4fGhzGayU80/Ck4ie8EzmZzXnviR7/KkMsLuG4w=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=LRi6uaJBm/QBL5Ukrynmglo3t0FHdzVH1Vjd6VE08TEZ1FK5xDPVNhmRxabXe8Qh64sMEtt/1kDyniY/bUmAQPsg1VlcmCeUrC0JTEFN2XFIuFtBjVQ7DpPehCg/4L0h3tKiyq9jkciNjUZIRvMuD7Bl6YGFL6G1xqVTTAi+Eu0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=dOsDR9IR; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="dOsDR9IR" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1754666485; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: 
content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Trb4QHyh4VHqbF76osaAgQ0EsfsRdUKQSiAhsHdLVCE=; b=dOsDR9IR1flHPLpENIFx5ypC3bwzEVLgOHd7Rlgu+bEQfdxvZxsyasw6Okkj+JLy3LobFC kqEg7s2g4CR5NsqZQHYlM1Rm/y+DPF9ECYLqNdL6HnH3LV6NjviXLFEIAOeDH23x7BqHmN XMiV+EHO7VUPdLWwvmaH7c0KmJR0/yU= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-650-kb5xsjiRPEqGouRI64KuyQ-1; Fri, 08 Aug 2025 11:21:19 -0400 X-MC-Unique: kb5xsjiRPEqGouRI64KuyQ-1 X-Mimecast-MFC-AGG-ID: kb5xsjiRPEqGouRI64KuyQ_1754666462 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id CBC5C1800286; Fri, 8 Aug 2025 15:21:01 +0000 (UTC) Received: from llong-thinkpadp16vgen1.westford.csb (unknown [10.22.65.37]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 26001180029B; Fri, 8 Aug 2025 15:20:55 +0000 (UTC) From: Waiman Long To: Tejun Heo , Johannes Weiner , =?UTF-8?q?Michal=20Koutn=C3=BD?= , Jonathan Corbet , Frederic Weisbecker , "Paul E. 
McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Uladzislau Rezki , Steven Rostedt , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Anna-Maria Behnsen , Ingo Molnar , Thomas Gleixner , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Ben Segall , Mel Gorman , Valentin Schneider , Shuah Khan Cc: cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org, Phil Auld , Costa Shulyupin , Gabriele Monaco , Cestmir Kalina , Waiman Long Subject: [RFC PATCH 16/18] cgroup/cpuset: Don't set have_boot_nohz_full without any boot time nohz_full CPU Date: Fri, 8 Aug 2025 11:19:59 -0400 Message-ID: <20250808152001.20245-7-longman@redhat.com> In-Reply-To: <20250808151053.19777-1-longman@redhat.com> References: <20250808151053.19777-1-longman@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" As the HK_TYPE_KERNEL_NOISE bit can now be set without any nohz_full CPU specified at boot time, don't set have_boot_nohz_full in this case.
Signed-off-by: Waiman Long --- kernel/cgroup/cpuset.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index de9cb92a0fc7..489708f4e096 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -3871,7 +3871,12 @@ int __init cpuset_init(void) =20 BUG_ON(!alloc_cpumask_var(&cpus_attach, GFP_KERNEL)); =20 - have_boot_nohz_full =3D housekeeping_enabled(HK_TYPE_KERNEL_NOISE); + /* + * HK_TYPE_KERNEL_NOISE bit can be set without any nohz_full CPU + */ + have_boot_nohz_full =3D housekeeping_enabled(HK_TYPE_KERNEL_NOISE) && + !cpumask_equal(cpu_possible_mask, + housekeeping_cpumask(HK_TYPE_KERNEL_NOISE)); have_boot_isolcpus =3D housekeeping_enabled(HK_TYPE_DOMAIN); if (have_boot_nohz_full) { BUG_ON(!alloc_cpumask_var(&boot_nohz_full_hk_cpus, GFP_KERNEL)); --=20 2.50.0 From nobody Sun Dec 14 20:26:39 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 237CA28467B for ; Fri, 8 Aug 2025 15:21:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754666491; cv=none; b=UQW1KB1nPkTyJ0ICu0kGG4trXl0AvYtuV0+hOY5qCHUmtxJsxVOWv41yfNl/3BtEIs8qgVYosFbP0oFn7bMBd4PpGJi6e+tRx8Nx238iJ9UVyuRijvXxvJQYKeok3CNRIExWL2k+AWNW5/wpY7Sk9ZxT/EeKRywu99hmXCtowLc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754666491; c=relaxed/simple; bh=MBqCNbw0/4mMEsbRSaYRdm3BtIBo7r05XDeSRZQ1KiA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=LNfgizzVNNvHLWZz7PEHUFo0Msci9AeTKdEY+gMkSvpwjfLjGqbkiCaQwNKwUmX/z0jtsCL0svXUuRmF/ichq42wK0kT23jjZHRH4iOkaPTY+j5NKCzOA6DGX8ZRVssI3DtNPFEKly+apz5/RRxfbP57RoeJQLl9VP6aDZjrYgQ= ARC-Authentication-Results: 
i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=dWGWiJlc; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="dWGWiJlc" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1754666486; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=adgD1KsqgDBawHB48N8oiqlBQgZtCJcFgOO35w2t8QQ=; b=dWGWiJlcPzBw5K3jqivKScXVyUS8LavOt8KtqdBGa9kGeZEa/7rtSlGRtL/TAsPGUUUUaK 4OTEwsksbOK/LhgwJl48lVEd3jY+BCGMgUs5Gu75wRi6b/PuMo+pXZ+lJJA6rKjexpLcDm 3V5NGGQxtpuCtPm6U22m9Cw5KHvx1Qs= Received: from mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-213-bNd6t_mVN-WPjfintDeu9A-1; Fri, 08 Aug 2025 11:21:20 -0400 X-MC-Unique: bNd6t_mVN-WPjfintDeu9A-1 X-Mimecast-MFC-AGG-ID: bNd6t_mVN-WPjfintDeu9A_1754666468 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id EC26F195605F; Fri, 8 Aug 2025 15:21:07 +0000 (UTC) Received: from 
llong-thinkpadp16vgen1.westford.csb (unknown [10.22.65.37]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 1DDF4180029B; Fri, 8 Aug 2025 15:21:01 +0000 (UTC) From: Waiman Long To: Tejun Heo , Johannes Weiner , =?UTF-8?q?Michal=20Koutn=C3=BD?= , Jonathan Corbet , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Uladzislau Rezki , Steven Rostedt , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Anna-Maria Behnsen , Ingo Molnar , Thomas Gleixner , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Ben Segall , Mel Gorman , Valentin Schneider , Shuah Khan Cc: cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org, Phil Auld , Costa Shulyupin , Gabriele Monaco , Cestmir Kalina , Waiman Long Subject: [RFC PATCH 17/18] cgroup/cpuset: Documentation updates & don't use CPU 0 for isolated partition Date: Fri, 8 Aug 2025 11:20:00 -0400 Message-ID: <20250808152001.20245-8-longman@redhat.com> In-Reply-To: <20250808151053.19777-1-longman@redhat.com> References: <20250808151053.19777-1-longman@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" CPU hotplug is now used to improve CPU isolation of CPUs in isolated partitions. However, the boot CPU (typically CPU 0) cannot be put offline, which impacts the amount of CPU isolation available. Users therefore have to be advised that the boot CPU should never be used for isolated partitions. A warning will be printed when the boot CPU is used, and cgroup-v2.rst is updated accordingly. The test_cpuset_prs.sh selftest is also updated to remove CPU 0 when forming isolated partitions.
Also update the cgroup-v2.rst file to show the need to specify the "nohz_full" kernel boot parameter to enable better nohz_full behavior for the CPUs in isolated partitions as well as the latency spike issue with using CPU hotplug. Signed-off-by: Waiman Long --- Documentation/admin-guide/cgroup-v2.rst | 33 +++++++++++++++---- .../selftests/cgroup/test_cpuset_prs.sh | 8 ++--- 2 files changed, 31 insertions(+), 10 deletions(-) diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-= guide/cgroup-v2.rst index d9d3cc7df348..26213383b34b 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -2556,11 +2556,12 @@ Cpuset Interface Files =20 It accepts only the following input values when written to. =20 - =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D "member" Non-root member of a partition "root" Partition root - "isolated" Partition root without load balancing - =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + "isolated" Partition root without load balancing and other + OS noises + =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D =20 A cpuset partition is a collection of cpuset-enabled cgroups with a partition root at the top of the hierarchy and its descendants @@ -2593,9 +2594,29 @@ Cpuset Interface Files =20 When set to "isolated", the CPUs in that partition will be in an isolated state without any load balancing from the scheduler - and excluded from the unbound workqueues. 
Tasks placed in such - a partition with multiple CPUs should be carefully distributed - and bound to each of the individual CPUs for optimal performance. + and excluded from the unbound workqueues as well as without + other OS noises. Tasks placed in such a partition with multiple + CPUs should be carefully distributed and bound to each of the + individual CPUs for optimal performance. + + As CPU hotplug, if supported, is used to improve the degree of + CPU isolation close to the "nohz_full" kernel boot parameter. + The boot CPU (typically CPU 0) cannot be brought offline, so the + boot CPU should not be used for forming isolated partitions. + The "nohz_full" kernel boot parameter needs to be present to + enable full dynticks support and RCU no-callback CPU mode for + CPUs in isolated partitions even if the optional cpu list + isn't provided. Without that, adding the "rcu_nocbs" boot + kernel parameter without the cpu list can be used to enable + RCU no-callback CPU mode without full dynticks. + + Using CPU hotplug for creating or destroying an isolated + partition can cause latency spike in applications running + in other isolated partitions. A reserved list of CPUs can + optionally be put in the "nohz_full" kernel boot parameter to + alleviate this problem. When these reserved CPUs are used for + isolated partitions, CPU hotplug won't need to be invoked and + so there won't be latency spike in other isolated partitions. =20 A partition root ("root" or "isolated") can be in one of the two possible states - valid or invalid. An invalid partition diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/test= ing/selftests/cgroup/test_cpuset_prs.sh index a17256d9f88a..f61369be8bf6 100755 --- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh +++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh @@ -318,8 +318,8 @@ TEST_MATRIX=3D( # Invalid to valid local partition direct transition tests " C1-3:S+:P2 X4:P2 . . . . . . 
0 A1:1-3|XA1= :1-3|A2:1-3:XA2: A1:P2|A2:P-2 1-3" " C1-3:S+:P2 X4:P2 . . . X3:P2 . . 0 A1:1-2|XA1= :1-3|A2:3:XA2:3 A1:P2|A2:P2 1-3" - " C0-3:P2 . . C4-6 C0-4 . . . 0 A1:0-4|B1:= 4-6 A1:P-2|B1:P0" - " C0-3:P2 . . C4-6 C0-4:C0-3 . . . 0 A1:0-3|B1:= 4-6 A1:P2|B1:P0 0-3" + " C1-3:P2 . . C4-6 C1-4 . . . 0 A1:1-4|B1:= 4-6 A1:P-2|B1:P0" + " C1-3:P2 . . C4-6 C1-4:C1-3 . . . 0 A1:1-3|B1:= 4-6 A1:P2|B1:P0 1-3" =20 # Local partition invalidation tests " C0-3:X1-3:S+:P2 C1-3:X2-3:S+:P2 C2-3:X3:P2 \ @@ -329,8 +329,8 @@ TEST_MATRIX=3D( " C0-3:X1-3:S+:P2 C1-3:X2-3:S+:P2 C2-3:X3:P2 \ . . C4:X . . 0 A1:1-3|A2:1-3|A3:2-3|XA2:|XA3: = A1:P2|A2:P-2|A3:P-2 1-3" # Local partition CPU change tests - " C0-5:S+:P2 C4-5:S+:P1 . . . C3-5 . . 0 A1:0-2|A2:= 3-5 A1:P2|A2:P1 0-2" - " C0-5:S+:P2 C4-5:S+:P1 . . C1-5 . . . 0 A1:1-3|A2:= 4-5 A1:P2|A2:P1 1-3" + " C1-5:S+:P2 C4-5:S+:P1 . . . C3-5 . . 0 A1:1-2|A2:= 3-5 A1:P2|A2:P1 1-2" + " C1-5:S+:P2 C4-5:S+:P1 . . C2-5 . . . 0 A1:2-3|A2:= 4-5 A1:P2|A2:P1 2-3" =20 # cpus_allowed/exclusive_cpus update tests " C0-3:X2-3:S+ C1-3:X2-3:S+ C2-3:X2-3 \ --=20 2.50.0 From nobody Sun Dec 14 20:26:39 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A87A4283FE8 for ; Fri, 8 Aug 2025 15:21:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754666491; cv=none; b=h3egvPnfYPq69EXSQBR5Qyqqk4P6vfmV99p3BloUa0OPpaLjUjgajUm+MeweEK0rqvFHISC9OdLcas2LTA/tlpqP8Kwc2PCEHoyBELjLIBs3tQivpuqXgf/UpmQpHQGa80eJb2D4bHMXV+rLeV44ys/5RUBje35Eg4s9FGikeVI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754666491; c=relaxed/simple; bh=ZncYki1KL4ndbu/cEBgE3ithy3n1MacpG1Kddz8qY/A=; 
h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Y/X0sMKhVAzglqS6/DaQOhAGX2FhkKA5G9Qh+mN/TVA37e0kvd+oK6pTYx94Y9whMO2VBawaxpw98K/DHh43xyKDgg2wGXrmUlA+yBY1yzV8NeMX9nIhIZz/JuRzmtcZkvqWBaYjjBwtE/2ivgVfB8T8zjulMybJpuTdCYFx/rc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=PC/H9Say; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="PC/H9Say" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1754666486; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=OkInnKItN1D64SESBQnElXZy3eYPYkSB5pXAhM/lfjA=; b=PC/H9SayD/FfXdiclMJWunqQqsVfRkgkpR/du9L6/CF33MKONaitGHup6HlKYVBPy1n2AZ 52mNbdDLkuIXIa2Rjmw7MPInJ0JZ4gW293ZOVW/Auezolq7pNGI+9eOdDkYavOkuUXgHO8 cGgeHtzMZBa7qEarXpLfrwNCKSbF8UA= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-596-a8ijxvpuNdOkYZYzaCQICw-1; Fri, 08 Aug 2025 11:21:22 -0400 X-MC-Unique: a8ijxvpuNdOkYZYzaCQICw-1 X-Mimecast-MFC-AGG-ID: a8ijxvpuNdOkYZYzaCQICw_1754666474 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) 
key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 17E4919560B1; Fri, 8 Aug 2025 15:21:14 +0000 (UTC) Received: from llong-thinkpadp16vgen1.westford.csb (unknown [10.22.65.37]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 3D1D51800294; Fri, 8 Aug 2025 15:21:08 +0000 (UTC) From: Waiman Long To: Tejun Heo , Johannes Weiner , =?UTF-8?q?Michal=20Koutn=C3=BD?= , Jonathan Corbet , Frederic Weisbecker , "Paul E. McKenney" , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Uladzislau Rezki , Steven Rostedt , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Anna-Maria Behnsen , Ingo Molnar , Thomas Gleixner , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Ben Segall , Mel Gorman , Valentin Schneider , Shuah Khan Cc: cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org, Phil Auld , Costa Shulyupin , Gabriele Monaco , Cestmir Kalina , Waiman Long Subject: [RFC PATCH 18/18] cgroup/cpuset: Add pr_debug() statements for cpuhp_offline_cb() call Date: Fri, 8 Aug 2025 11:20:01 -0400 Message-ID: <20250808152001.20245-9-longman@redhat.com> In-Reply-To: <20250808151053.19777-1-longman@redhat.com> References: <20250808151053.19777-1-longman@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" Add some pr_debug() statements for the actions performed around the cpuhp_offline_cb() call to aid debugging. Since rcu_nocb_cpu_offload() and rcu_nocb_cpu_deoffload() already print out some informational text, there is no need to add pr_debug() statements for them.
Also update test_cpuset_prs.sh test script to enable printing of dynamic debug messages for the kernel/cgroup/cpuset.c file when loglevel is 7 (debug). Signed-off-by: Waiman Long --- kernel/cgroup/cpuset.c | 18 +++++++++++++----- .../selftests/cgroup/test_cpuset_prs.sh | 7 +++++++ 2 files changed, 20 insertions(+), 5 deletions(-) diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 489708f4e096..30632e4b5899 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -21,6 +21,7 @@ * License. See the file COPYING in the main directory of the Linux * distribution for more details. */ +#define pr_fmt(fmt) "cpuset: " fmt #include "cpuset-internal.h" =20 #include @@ -1406,10 +1407,13 @@ static int do_housekeeping_exclude_cpumask(void *ar= g __maybe_unused) if (tick_nohz_full_enabled()) { tick_nohz_full_update_cpus(icpus); for_each_cpu(cpu, isolcpus_update_state.cpus) { - if (cpumask_test_cpu(cpu, icpus)) + if (cpumask_test_cpu(cpu, icpus)) { + pr_debug("Add CPU %d to nohz_full\n", cpu); ct_cpu_track_user(cpu); - else + } else { + pr_debug("Remove CPU %d from nohz_full\n", cpu); ct_cpu_untrack_user(cpu); + } } } else { pr_warn_once("Full dynticks cannot be enabled without the nohz_full kern= el boot parameter!\n"); @@ -1425,6 +1429,7 @@ static int do_housekeeping_exclude_cpumask(void *arg = __maybe_unused) ret =3D rcu_nocb_cpu_offload(cpu); else ret =3D rcu_nocb_cpu_deoffload(cpu); + if (WARN_ON_ONCE(ret)) break; } @@ -1468,11 +1473,14 @@ static void update_isolation_cpumasks(void) * Without any change in the set of nohz_full CPUs, we don't really * need to use CPU hotplug for making change in HK cpumasks. 
*/ - if (cpumask_empty(isolcpus_update_state.cpus)) + if (cpumask_empty(isolcpus_update_state.cpus)) { ret =3D do_housekeeping_exclude_cpumask(NULL); - else + } else { + pr_debug("cpuhp_offline_cb() called for CPUs %*pbl\n", + cpumask_pr_args(isolcpus_update_state.cpus)); ret =3D cpuhp_offline_cb(isolcpus_update_state.cpus, do_housekeeping_exclude_cpumask, NULL); + } /* * A errno value of -EPERM may be returned from cpuhp_offline_cb() if * any one of the CPUs in isolcpus_update_state.cpus can't be brought @@ -1481,7 +1489,7 @@ static void update_isolation_cpumasks(void) * isolated partition. */ if (ret =3D=3D -EPERM) - pr_warn_once("cpuset: The boot CPU shouldn't be used for isolated partit= ion\n"); + pr_warn_once("The boot CPU shouldn't be used for isolated partition\n"); else WARN_ON_ONCE(ret < 0); =20 diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/test= ing/selftests/cgroup/test_cpuset_prs.sh index f61369be8bf6..43a12690775e 100755 --- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh +++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh @@ -67,6 +67,13 @@ then echo Y > /sys/kernel/debug/sched/verbose fi =20 +# Enable dynamic debug messages for cpuset only +DYN_DEBUG=3D/sys/kernel/debug/dynamic_debug/control +[[ -f $DYN_DEBUG ]] && { + echo "-p" > $DYN_DEBUG + echo "file kernel/cgroup/cpuset.c +p" > $DYN_DEBUG +} + cd $CGROUP2 echo +cpuset > cgroup.subtree_control =20 --=20 2.50.0