From nobody Sat Jun 20 16:30:40 2026
From: Jing Wu
Date: Thu, 18 Jun 2026 11:11:12 +0800
Subject: [PATCH v3 01/13] sched/isolation: Replace notifier chain with
 explicit callback interface
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20260618-wujing-dhm-v3-1-28f1a4d83b68@gmail.com>
References: <20260618-wujing-dhm-v3-0-28f1a4d83b68@gmail.com>
In-Reply-To: <20260618-wujing-dhm-v3-0-28f1a4d83b68@gmail.com>
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
 Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
 Valentin Schneider, "Paul E. McKenney", Frederic Weisbecker,
 Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
 Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
 Anna-Maria Behnsen, Tejun Heo, Jonathan Corbet, Shuah Khan,
 Shuah Khan, Thomas Gleixner
Cc: linux-kernel@vger.kernel.org, rcu@vger.kernel.org,
 cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-kselftest@vger.kernel.org, Jing Wu, Qiliang Yuan
X-Mailer: b4 0.13.0

Replace the blocking notifier chain with an explicit per-type callback
table (struct housekeeping_cbs). Each subsystem registers callbacks at
initcall time; pre_validate() runs before the RCU pointer swap to allow
rejecting the update, and apply() runs after synchronize_rcu() when the
new mask is visible to readers.

The table is limited to HK_MAX_CBS (4) slots per type, sufficient for
the kernel-noise subsystems and avoiding unbounded dynamic allocation
in the update path.
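
As a rough illustration of the two-phase scheme described above, here is
a minimal userspace sketch. It is not the kernel code: mask_t, hk_cbs,
hk_register(), hk_update() and the demo_* subscriber are hypothetical
stand-ins, a plain assignment stands in for the RCU pointer swap and
synchronize_rcu(), and only two housekeeping types are modelled.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; not the kernel definitions. */
enum hk_type { HK_TYPE_DOMAIN, HK_TYPE_KERNEL_NOISE, HK_TYPE_MAX };
typedef uint64_t mask_t;		/* one bit per CPU */

#define HK_MAX_CBS 4			/* fixed number of slots per type */

struct hk_cbs {
	const char *name;
	int (*pre_validate)(enum hk_type type, mask_t cur, mask_t new_mask);
	void (*apply)(enum hk_type type);
};

static struct {
	struct hk_cbs *cbs[HK_MAX_CBS];
	int nr;
} table[HK_TYPE_MAX];

mask_t hk_masks[HK_TYPE_MAX];		/* current mask per type */

/* Append @cbs to the per-type table; callbacks run in this order. */
int hk_register(enum hk_type type, struct hk_cbs *cbs)
{
	if (type >= HK_TYPE_MAX || !cbs)
		return -EINVAL;
	if (table[type].nr >= HK_MAX_CBS)
		return -ENOSPC;
	table[type].cbs[table[type].nr++] = cbs;
	return 0;
}

/* Two-phase update: every validator must accept before the mask changes. */
int hk_update(enum hk_type type, mask_t new_mask)
{
	int i, ret;

	for (i = 0; i < table[type].nr; i++) {
		if (!table[type].cbs[i]->pre_validate)
			continue;
		ret = table[type].cbs[i]->pre_validate(type, hk_masks[type],
						       new_mask);
		if (ret < 0)
			return ret;	/* rejected: mask left untouched */
	}

	/* Stand-in for the RCU pointer swap plus synchronize_rcu(). */
	hk_masks[type] = new_mask;

	for (i = 0; i < table[type].nr; i++) {
		if (table[type].cbs[i]->apply)
			table[type].cbs[i]->apply(type);
	}
	return 0;
}

/* Hypothetical subscriber: rejects empty masks, counts apply() calls. */
int demo_applied;

static int demo_pre_validate(enum hk_type type, mask_t cur, mask_t new_mask)
{
	(void)type;
	(void)cur;
	return new_mask ? 0 : -EINVAL;
}

static void demo_apply(enum hk_type type)
{
	(void)type;
	demo_applied++;
}

struct hk_cbs demo_cbs = {
	.name = "demo",
	.pre_validate = demo_pre_validate,
	.apply = demo_apply,
};
```

With demo_cbs registered for HK_TYPE_DOMAIN, hk_update() with an empty
mask is rejected before anything changes, while a non-empty mask is
installed and demo_apply() runs once afterwards.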

The interface provides deterministic callback order and explicit
registration, giving each subsystem maintainer clear visibility into
when and why its callback is invoked, unlike the opaque priority-based
dispatch of notifier chains.

Signed-off-by: Jing Wu
Signed-off-by: Qiliang Yuan
---
 include/linux/sched/isolation.h | 31 +++++++++++++++
 kernel/sched/isolation.c        | 87 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 118 insertions(+)

diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index cf0fd03dd7a24..f362876b3ebdf 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -46,6 +46,33 @@ extern bool housekeeping_test_cpu(int cpu, enum hk_type type);
 extern int housekeeping_update(struct cpumask *isol_mask);
 extern void __init housekeeping_init(void);
 
+/**
+ * struct housekeeping_cbs - Per-subsystem callbacks for housekeeping mask changes
+ * @name: Subsystem name for diagnostic messages
+ * @pre_validate: Run before RCU pointer swap. Return -EINVAL
+ *                to reject the update.
+ * @apply: Run after synchronize_rcu(). Reconfigure subsystem
+ *         state. The new mask is visible to readers.
+ *
+ * Register subsystem callbacks at initcall time.
+ * Invoke callbacks in registration order when the corresponding
+ * housekeeping mask changes. Skip types not present in the update
+ * mask.
+ *
+ * Replace the notifier-chain pattern with deterministic callback
+ * ordering.
+ */
+struct housekeeping_cbs {
+	const char *name;
+	int (*pre_validate)(enum hk_type type,
+			    const struct cpumask *cur_mask,
+			    const struct cpumask *new_mask);
+	void (*apply)(enum hk_type type);
+};
+
+int housekeeping_register_cbs(enum hk_type type, struct housekeeping_cbs *cbs);
+int housekeeping_unregister_cbs(enum hk_type type, struct housekeeping_cbs *cbs);
+
 #else
 
 static inline int housekeeping_any_cpu(enum hk_type type)
@@ -73,6 +100,10 @@ static inline bool housekeeping_test_cpu(int cpu, enum hk_type type)
 
 static inline int housekeeping_update(struct cpumask *isol_mask) { return 0; }
 static inline void housekeeping_init(void) { }
+static inline int housekeeping_register_cbs(enum hk_type type,
+		struct housekeeping_cbs *cbs) { return 0; }
+static inline int housekeeping_unregister_cbs(enum hk_type type,
+		struct housekeeping_cbs *cbs) { return 0; }
 #endif /* CONFIG_CPU_ISOLATION */
 
 static inline bool housekeeping_cpu(int cpu, enum hk_type type)
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index ef152d401fe20..aae4dff7fbfc8 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -28,6 +28,93 @@ struct housekeeping {
 
 static struct housekeeping housekeeping;
 
+/*
+ * Maintain an explicit callback table indexed by housekeeping type.
+ * Invoke callbacks for affected types in deterministic order:
+ * pre_validate() before the RCU pointer swap, apply() after
+ * synchronize_rcu().
+ */
+#define HK_MAX_CBS 4
+
+static struct {
+	struct housekeeping_cbs *cbs[HK_MAX_CBS];
+	int nr;
+} housekeeping_cbs_table[HK_TYPE_MAX];
+
+/**
+ * housekeeping_register_cbs - Register explicit callbacks for a housekeeping type
+ * @type: Housekeeping type to register for
+ * @cbs: Callback structure containing pre_validate() and apply()
+ *
+ * Callbacks run in registration order when the mask for @type changes:
+ * pre_validate() before the RCU swap may reject the update; apply()
+ * after synchronize_rcu() reconfigures subsystem state.
+ *
+ * Return: 0 on success, -EINVAL if @type or @cbs is invalid,
+ * -ENOSPC if the per-type table is full.
+ */
+int housekeeping_register_cbs(enum hk_type type, struct housekeeping_cbs *cbs)
+{
+	if (type >= HK_TYPE_MAX || !cbs)
+		return -EINVAL;
+	if (housekeeping_cbs_table[type].nr >= HK_MAX_CBS)
+		return -ENOSPC;
+	housekeeping_cbs_table[type].cbs[housekeeping_cbs_table[type].nr++] = cbs;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(housekeeping_register_cbs);
+
+/**
+ * housekeeping_unregister_cbs - Remove previously registered callbacks
+ * @type: Housekeeping type
+ * @cbs: Callback structure to remove
+ *
+ * Return: 0 on success, -EINVAL if arguments are invalid,
+ * -ENOENT if @cbs was not registered.
+ */
+int housekeeping_unregister_cbs(enum hk_type type, struct housekeeping_cbs *cbs)
+{
+	int i;
+
+	if (type >= HK_TYPE_MAX || !cbs)
+		return -EINVAL;
+	for (i = 0; i < housekeeping_cbs_table[type].nr; i++) {
+		if (housekeeping_cbs_table[type].cbs[i] == cbs) {
+			housekeeping_cbs_table[type].cbs[i] =
+				housekeeping_cbs_table[type].cbs[--housekeeping_cbs_table[type].nr];
+			return 0;
+		}
+	}
+	return -ENOENT;
+}
+EXPORT_SYMBOL_GPL(housekeeping_unregister_cbs);
+
+static int housekeeping_pre_validate_cbs(enum hk_type type,
+					 const struct cpumask *cur,
+					 const struct cpumask *new)
+{
+	int i, ret;
+
+	for (i = 0; i < housekeeping_cbs_table[type].nr; i++) {
+		if (!housekeeping_cbs_table[type].cbs[i]->pre_validate)
+			continue;
+		ret = housekeeping_cbs_table[type].cbs[i]->pre_validate(type, cur, new);
+		if (ret < 0)
+			return ret;
+	}
+	return 0;
+}
+
+static void housekeeping_apply_cbs(enum hk_type type)
+{
+	int i;
+
+	for (i = 0; i < housekeeping_cbs_table[type].nr; i++) {
+		if (housekeeping_cbs_table[type].cbs[i]->apply)
+			housekeeping_cbs_table[type].cbs[i]->apply(type);
+	}
+}
+
 bool housekeeping_enabled(enum hk_type type)
 {
 	return !!(READ_ONCE(housekeeping.flags) & BIT(type));
-- 
2.43.0

From nobody Sat Jun 20 16:30:40 2026
From: Jing Wu
Date: Thu, 18 Jun 2026 11:11:13 +0800
Subject: [PATCH v3 02/13] sched/isolation: Add housekeeping_update_types()
 for kernel-noise masks
Message-Id: <20260618-wujing-dhm-v3-2-28f1a4d83b68@gmail.com>
References: <20260618-wujing-dhm-v3-0-28f1a4d83b68@gmail.com>
In-Reply-To: <20260618-wujing-dhm-v3-0-28f1a4d83b68@gmail.com>

Introduce housekeeping_update_types(), which updates the cpumask for
each specified housekeeping type atomically using an RCU pointer swap.

For each type in @type_mask the trial mask is computed as
(base & ~isol_mask), where the base depends on the type:

 - Most types use the current housekeeping cpumask as base. For types
   that are only set at boot this is equivalent to the boot mask, so
   trial = (boot_mask & ~isol_mask).

 - HK_TYPE_KERNEL_NOISE always uses cpu_possible_mask as base. Its
   semantics are "all possible CPUs minus the currently-isolated set";
   using the current HK mask instead would leave it stuck at its last
   non-trivial value after de-isolation, breaking subsequent isolation
   cycles.

HK_TYPE_KERNEL_NOISE also supports runtime first-enable: if it was not
registered at boot (no nohz_full= on the kernel command line),
housekeeping_update_types() registers it in housekeeping.flags on the
first call. All other types must already be boot-enabled.

For each type the function validates the trial mask against
cpu_online_mask, runs registered pre_validate() callbacks (which may
reject the update), swaps all RCU cpumask pointers in a single pass,
calls synchronize_rcu(), runs apply() callbacks, and then frees the
old masks.

The existing housekeeping_update() continues to update only
HK_TYPE_DOMAIN and remains the entry point for the cpuset partition
path.
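
The trial-mask rule described earlier in this message can be sketched
in a few lines of userspace C. This is only an illustration under
simplifying assumptions: mask_t, hk_trial_mask() and hk_trial_valid()
are hypothetical names, a 64-bit word stands in for struct cpumask, and
only two types are modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins: one bit per CPU, two illustrative types. */
typedef uint64_t mask_t;
enum hk_type { HK_TYPE_DOMAIN, HK_TYPE_KERNEL_NOISE };

/*
 * trial = base & ~isol. The kernel-noise type always starts from the
 * set of possible CPUs; every other type starts from its current mask.
 */
mask_t hk_trial_mask(enum hk_type type, mask_t cur, mask_t possible,
		     mask_t isol)
{
	mask_t base = (type == HK_TYPE_KERNEL_NOISE) ? possible : cur;

	return base & ~isol;
}

/* A trial mask is rejected when it contains no online CPU. */
bool hk_trial_valid(mask_t trial, mask_t online)
{
	return (trial & online) != 0;
}
```

With four possible CPUs (0xF) and CPUs 2-3 isolated (0xC), the
kernel-noise trial mask is 0x3. De-isolating (isol = 0) brings it back
to 0xF, whereas a type based on its current mask would stay at 0x3,
which is the behaviour the message argues against for
HK_TYPE_KERNEL_NOISE.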

housekeeping_update_types() enables the partition path to also drive
the kernel-noise types (HK_TYPE_KERNEL_NOISE, HK_TYPE_MANAGED_IRQ)
through the explicit callback interface added in the previous patch.

Signed-off-by: Jing Wu
Signed-off-by: Qiliang Yuan
---
 include/linux/sched/isolation.h |   4 ++
 kernel/sched/isolation.c        | 112 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 116 insertions(+)

diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index f362876b3ebdf..eecbcbe802bd0 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -44,6 +44,8 @@ extern bool housekeeping_enabled(enum hk_type type);
 extern void housekeeping_affine(struct task_struct *t, enum hk_type type);
 extern bool housekeeping_test_cpu(int cpu, enum hk_type type);
 extern int housekeeping_update(struct cpumask *isol_mask);
+extern int housekeeping_update_types(unsigned long type_mask,
+				     struct cpumask *isol_mask);
 extern void __init housekeeping_init(void);
 
 /**
@@ -99,6 +101,8 @@ static inline bool housekeeping_test_cpu(int cpu, enum hk_type type)
 }
 
 static inline int housekeeping_update(struct cpumask *isol_mask) { return 0; }
+static inline int housekeeping_update_types(unsigned long type_mask,
+		struct cpumask *isol_mask) { return 0; }
 static inline void housekeeping_init(void) { }
 static inline int housekeeping_register_cbs(enum hk_type type,
 		struct housekeeping_cbs *cbs) { return 0; }
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index aae4dff7fbfc8..4eca18cc5e8ce 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -249,6 +249,118 @@ int housekeeping_update(struct cpumask *isol_mask)
 	return 0;
 }
 
+/**
+ * housekeeping_update_types - Update housekeeping masks for specified types
+ * @type_mask: Bitmask of housekeeping types to update
+ * @isol_mask: CPUs being added to the isolation set
+ *
+ * For each type in @type_mask that was enabled at boot, compute the
+ * trial mask as (boot mask & ~@isol_mask), validate it against
+ * @cpu_online_mask, invoke pre_validate() callbacks, swap the RCU
+ * mask pointer, and run apply() callbacks after synchronize_rcu().
+ *
+ * HK_TYPE_KERNEL_NOISE also supports runtime first-enable: when an
+ * isolated cpuset partition is created without nohz_full= at boot,
+ * cpu_possible_mask is used as the initial base and the type flag is
+ * set in housekeeping.flags on the first call.
+ *
+ * Return: 0 on success, -ENOMEM on allocation failure, -EINVAL if
+ * a trial mask has no online CPUs.
+ */
+int housekeeping_update_types(unsigned long type_mask,
+			      struct cpumask *isol_mask)
+{
+	struct cpumask *trials[HK_TYPE_MAX] = {};
+	struct cpumask *old_masks[HK_TYPE_MAX] = {};
+	enum hk_type type;
+	int ret = 0;
+
+	for_each_set_bit(type, &type_mask, HK_TYPE_MAX) {
+		const struct cpumask *base;
+
+		if (type == HK_TYPE_DOMAIN_BOOT)
+			continue;
+		if (!housekeeping_enabled(type)) {
+			/*
+			 * HK_TYPE_KERNEL_NOISE supports runtime first-enable
+			 * for DHM isolated partitions created without nohz_full=
+			 * at boot. All other types must be boot-enabled.
+			 */
+			if (type != HK_TYPE_KERNEL_NOISE)
+				continue;
+		}
+
+		/*
+		 * HK_TYPE_KERNEL_NOISE always uses cpu_possible_mask as its
+		 * base. Its semantics are exactly "cpu_possible minus the
+		 * currently-isolated set", so the base never shrinks across
+		 * successive isolation/de-isolation cycles. If we used the
+		 * current HK mask instead, de-isolating all partitions would
+		 * leave the mask at its last non-trivial value rather than
+		 * reverting to cpu_possible, breaking subsequent isolations.
+		 */
+		if (type == HK_TYPE_KERNEL_NOISE)
+			base = cpu_possible_mask;
+		else
+			base = housekeeping_cpumask(type);
+		trials[type] = kmalloc(cpumask_size(), GFP_KERNEL);
+		if (!trials[type]) {
+			ret = -ENOMEM;
+			goto err_free;
+		}
+		cpumask_andnot(trials[type], base, isol_mask);
+		if (!cpumask_intersects(trials[type], cpu_online_mask)) {
+			ret = -EINVAL;
+			goto err_free;
+		}
+	}
+
+	if (!housekeeping.flags) {
+		ret = -EINVAL;
+		goto err_free;
+	}
+
+	for_each_set_bit(type, &type_mask, HK_TYPE_MAX) {
+		if (!trials[type])
+			continue;
+		ret = housekeeping_pre_validate_cbs(type,
+						    housekeeping_cpumask(type),
+						    trials[type]);
+		if (ret < 0)
+			goto err_free;
+	}
+
+	for_each_set_bit(type, &type_mask, HK_TYPE_MAX) {
+		if (!trials[type])
+			continue;
+		old_masks[type] = housekeeping_cpumask_dereference(type);
+		/* First-time runtime enable: register the type now. */
+		if (!housekeeping_enabled(type))
+			WRITE_ONCE(housekeeping.flags,
+				   housekeeping.flags | BIT(type));
+		rcu_assign_pointer(housekeeping.cpumasks[type], trials[type]);
+		trials[type] = NULL;
+	}
+
+	synchronize_rcu();
+
+	for_each_set_bit(type, &type_mask, HK_TYPE_MAX) {
+		if (housekeeping_cbs_table[type].nr == 0)
+			continue;
+		housekeeping_apply_cbs(type);
+	}
+
+	for_each_set_bit(type, &type_mask, HK_TYPE_MAX)
+		kfree(old_masks[type]);
+
+	return 0;
+
+err_free:
+	for_each_set_bit(type, &type_mask, HK_TYPE_MAX)
+		kfree(trials[type]);
+	return ret;
+}
+
 void __init housekeeping_init(void)
 {
 	enum hk_type type;
-- 
2.43.0

From nobody Sat Jun 20 16:30:40 2026
From: Jing Wu
Date: Thu, 18 Jun 2026 11:11:14 +0800
Subject: [PATCH v3 03/13] sched/isolation: RCU-protect all housekeeping
 cpumask readers
Message-Id: <20260618-wujing-dhm-v3-3-28f1a4d83b68@gmail.com>
References: <20260618-wujing-dhm-v3-0-28f1a4d83b68@gmail.com>
In-Reply-To: <20260618-wujing-dhm-v3-0-28f1a4d83b68@gmail.com>

Extend housekeeping_dereference_check() to validate all runtime-mutable
types (HK_TYPE_DOMAIN, HK_TYPE_KERNEL_NOISE, HK_TYPE_MANAGED_IRQ), not
only HK_TYPE_DOMAIN. Boot-only types (HK_TYPE_DOMAIN_BOOT) remain
unchecked.

Add housekeeping_cpumask_rcu() for callers that already hold an RCU
read lock. This variant uses rcu_dereference() without the lockdep
annotation, avoiding false-positive lockdep warnings in RCU read-side
critical sections.

Use READ_ONCE() consistently when testing housekeeping.flags in paths
that may race with housekeeping_update_types().
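
The reader-side fallback can be pictured with a small userspace sketch
built on C11 atomics. It is an analogy only, not the kernel
implementation: hk_mask_read(), hk_mask_publish(), mask_t and
possible_mask are hypothetical names, a relaxed atomic load stands in
for READ_ONCE(), an acquire load stands in for rcu_dereference(), and
the grace period and freeing of old masks are not modelled.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; a 64-bit word replaces struct cpumask. */
typedef uint64_t mask_t;
enum hk_type { HK_TYPE_DOMAIN, HK_TYPE_KERNEL_NOISE, HK_TYPE_MAX };

const mask_t possible_mask = 0xF;	/* assumed: four possible CPUs */

static atomic_ulong hk_flags;				/* enabled types */
static _Atomic(const mask_t *) hk_masks[HK_TYPE_MAX];	/* published masks */

/*
 * Reader: use the published mask only when the type is flagged as
 * enabled and a mask pointer is visible; otherwise fall back to the
 * set of all possible CPUs.
 */
const mask_t *hk_mask_read(enum hk_type type)
{
	const mask_t *mask = NULL;

	if (atomic_load_explicit(&hk_flags, memory_order_relaxed) &
	    (1UL << type))
		mask = atomic_load_explicit(&hk_masks[type],
					    memory_order_acquire);
	return mask ? mask : &possible_mask;
}

/* Writer: publish the new mask, then mark the type as enabled. */
void hk_mask_publish(enum hk_type type, const mask_t *new_mask)
{
	atomic_store_explicit(&hk_masks[type], new_mask,
			      memory_order_release);
	atomic_fetch_or_explicit(&hk_flags, 1UL << type,
				 memory_order_release);
}
```

A reader that runs before any mask is published, or for a type that
was never enabled, sees possible_mask; after hk_mask_publish() it sees
the new mask.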

Signed-off-by: Jing Wu
Signed-off-by: Qiliang Yuan
---
 include/linux/sched/isolation.h |  6 +++++
 kernel/sched/isolation.c        | 57 +++++++++++++++++++++++++++++++----------
 2 files changed, 49 insertions(+), 14 deletions(-)

diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index eecbcbe802bd0..ed6e1c6980131 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -40,6 +40,7 @@ enum hk_type {
 DECLARE_STATIC_KEY_FALSE(housekeeping_overridden);
 extern int housekeeping_any_cpu(enum hk_type type);
 extern const struct cpumask *housekeeping_cpumask(enum hk_type type);
+extern const struct cpumask *housekeeping_cpumask_rcu(enum hk_type type);
 extern bool housekeeping_enabled(enum hk_type type);
 extern void housekeeping_affine(struct task_struct *t, enum hk_type type);
 extern bool housekeeping_test_cpu(int cpu, enum hk_type type);
@@ -87,6 +88,11 @@ static inline const struct cpumask *housekeeping_cpumask(enum hk_type type)
 	return cpu_possible_mask;
 }
 
+static inline const struct cpumask *housekeeping_cpumask_rcu(enum hk_type type)
+{
+	return cpu_possible_mask;
+}
+
 static inline bool housekeeping_enabled(enum hk_type type)
 {
 	return false;
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 4eca18cc5e8ce..3d5d3f12853c7 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -121,25 +121,40 @@ bool housekeeping_enabled(enum hk_type type)
 }
 EXPORT_SYMBOL_GPL(housekeeping_enabled);
 
+/*
+ * Types that can change at runtime via cpuset isolated partitions.
+ * Boot-only types (DOMAIN_BOOT) are always safe to read without lockdep.
+ */
+static bool housekeeping_type_can_change(enum hk_type type)
+{
+	switch (type) {
+	case HK_TYPE_DOMAIN:
+	case HK_TYPE_KERNEL_NOISE:
+	case HK_TYPE_MANAGED_IRQ:
+		return true;
+	default:
+		return false;
+	}
+}
+
 static bool housekeeping_dereference_check(enum hk_type type)
 {
-	if (IS_ENABLED(CONFIG_LOCKDEP) && type == HK_TYPE_DOMAIN) {
-		/* Cpuset isn't even writable yet? */
-		if (system_state <= SYSTEM_SCHEDULING)
-			return true;
+	if (!IS_ENABLED(CONFIG_LOCKDEP) || !housekeeping_type_can_change(type))
+		return true;
 
-		/* CPU hotplug write locked, so cpuset partition can't be overwritten */
-		if (IS_ENABLED(CONFIG_HOTPLUG_CPU) && lockdep_is_cpus_write_held())
-			return true;
+	/* Cpuset isn't even writable yet? */
+	if (system_state <= SYSTEM_SCHEDULING)
+		return true;
 
-		/* Cpuset lock held, partitions not writable */
-		if (IS_ENABLED(CONFIG_CPUSETS) && lockdep_is_cpuset_held())
-			return true;
+	/* CPU hotplug write locked, so cpuset partition can't be overwritten */
+	if (IS_ENABLED(CONFIG_HOTPLUG_CPU) && lockdep_is_cpus_write_held())
+		return true;
 
-		return false;
-	}
+	/* Cpuset lock held, partitions not writable */
+	if (IS_ENABLED(CONFIG_CPUSETS) && lockdep_is_cpuset_held())
+		return true;
 
-	return true;
+	return false;
 }
 
 static inline struct cpumask *housekeeping_cpumask_dereference(enum hk_type type)
@@ -162,12 +177,26 @@ const struct cpumask *housekeeping_cpumask(enum hk_type type)
 }
 EXPORT_SYMBOL_GPL(housekeeping_cpumask);
 
+const struct cpumask *housekeeping_cpumask_rcu(enum hk_type type)
+{
+	const struct cpumask *mask = NULL;
+
+	if (static_branch_unlikely(&housekeeping_overridden)) {
+		if (READ_ONCE(housekeeping.flags) & BIT(type))
+			mask = rcu_dereference(housekeeping.cpumasks[type]);
+	}
+	if (!mask)
+		mask = cpu_possible_mask;
+	return mask;
+}
+EXPORT_SYMBOL_GPL(housekeeping_cpumask_rcu);
+
 int housekeeping_any_cpu(enum hk_type type)
 {
 	int cpu;
 
 	if (static_branch_unlikely(&housekeeping_overridden)) {
-		if (housekeeping.flags & BIT(type)) {
+		if (READ_ONCE(housekeeping.flags) & BIT(type)) {
 			cpu = sched_numa_find_closest(housekeeping_cpumask(type), smp_processor_id());
 			if (cpu < nr_cpu_ids)
 				return cpu;
-- 
2.43.0

From nobody Sat Jun 20 16:30:40 2026
From: Jing Wu
Date: Thu, 18 Jun 2026 11:11:15 +0800
Subject: [PATCH v3 04/13] sched/isolation: Fix RCU protection for runtime-mutable cpumask callers
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20260618-wujing-dhm-v3-4-28f1a4d83b68@gmail.com>
References: <20260618-wujing-dhm-v3-0-28f1a4d83b68@gmail.com>
In-Reply-To: <20260618-wujing-dhm-v3-0-28f1a4d83b68@gmail.com>
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, "Paul E. McKenney", Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang, Anna-Maria Behnsen, Tejun Heo, Jonathan Corbet, Shuah Khan, Shuah Khan, Thomas Gleixner
Cc: linux-kernel@vger.kernel.org, rcu@vger.kernel.org, cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Jing Wu, Qiliang Yuan
X-Mailer: b4 0.13.0

housekeeping_update_types() installs new cpumasks via rcu_assign_pointer()
and frees the old ones after synchronize_rcu(); callers that dereference
the old pointer without holding an RCU read lock can access freed memory.

Fix the four call sites:

kernel/sched/core.c (get_nohz_timer_target, HK_TYPE_KERNEL_NOISE):
  The guard(rcu)() was acquired after housekeeping_cpumask(). Move it
  before the call and switch to housekeeping_cpumask_rcu() so hk_mask is
  read inside the RCU read-side critical section. HK_TYPE_KERNEL_NOISE is
  updated at runtime by housekeeping_update_types(); this fix is required
  for correctness.

drivers/hv/channel_mgmt.c (init_vp_index, HK_TYPE_MANAGED_IRQ):
  The function stored the raw pointer in a local variable and used it
  across GFP_KERNEL allocations (which can sleep, so an RCU read lock
  cannot span them). Allocate both cpumask_var_t buffers first, then
  snapshot the housekeeping mask under a brief rcu_read_lock() and use the
  snapshot throughout. HK_TYPE_MANAGED_IRQ is updated at runtime; this fix
  is required for correctness.

kernel/time/hrtimer.c (get_target_base, HK_TYPE_TIMER):
  cpumask_any_and() against housekeeping_cpumask(HK_TYPE_TIMER) was called
  without any lock. Wrap with rcu_read_lock()/rcu_read_unlock() and use
  housekeeping_cpumask_rcu(). HK_TYPE_TIMER is not changed at runtime in
  this series; this is a defensive fix to satisfy the
  housekeeping_dereference_check() lockdep annotation for future-proofing.
  hrtimers_cpu_dying() is already safe: it runs under the cpu_hotplug_lock
  write side, which housekeeping_dereference_check() already permits.

arch/arm64/kernel/topology.c (arch_freq_get_on_cpu, HK_TYPE_TICK):
  cpumask_intersects() against housekeeping_cpumask(HK_TYPE_TICK) was
  called without any lock. Evaluate under rcu_read_lock() and store the
  boolean result before releasing the lock. HK_TYPE_TICK is not changed at
  runtime in this series; this is a defensive fix.
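
For illustration only (this is not part of the patch), the snapshot
pattern described for init_vp_index() can be modelled in plain userspace
C. Every name below is a hypothetical stand-in: struct mask replaces
struct cpumask, the model_read_lock()/model_read_unlock() pair only counts
nesting and provides no real RCU semantics, and a comment marks the spot
where the kernel updater would call synchronize_rcu().

```c
/* Userspace model only; none of these names exist in the kernel. */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct cpumask: one bit per CPU, up to 64. */
struct mask {
	uint64_t bits;
};

/* Fallback used when nothing is published, mirroring cpu_possible_mask. */
static const struct mask all_cpus = { .bits = UINT64_MAX };

/* Stand-in for housekeeping.cpumasks[type]; replaced and freed by updates. */
static struct mask *published;

/* The "lock" only counts nesting so that misuse trips an assertion. */
static int read_depth;

static void model_read_lock(void)
{
	read_depth++;
}

static void model_read_unlock(void)
{
	assert(read_depth > 0);
	read_depth--;
}

/* Reader accessor: the returned pointer is valid only until unlock. */
static const struct mask *model_mask_rcu(void)
{
	assert(read_depth > 0);
	return published ? published : &all_cpus;
}

/* Updater: publish the new mask, wait for readers, free the old one. */
int model_update(uint64_t bits)
{
	struct mask *next = malloc(sizeof(*next));
	struct mask *old = published;

	if (!next)
		return -1;
	next->bits = bits;
	published = next;
	/*
	 * The kernel would call synchronize_rcu() here; the model just
	 * checks that no reader is inside a read-side section.
	 */
	assert(read_depth == 0);
	free(old);
	return 0;
}

/* Snapshot pattern: copy under the lock, use only the copy afterwards. */
struct mask model_snapshot(void)
{
	struct mask snap;

	model_read_lock();
	snap = *model_mask_rcu();
	model_read_unlock();
	return snap;
}
```

A snapshot taken before a later model_update() keeps its old contents,
which is the property that lets a caller keep using its private copy after
the read-side section has ended.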

Signed-off-by: Jing Wu
Signed-off-by: Qiliang Yuan
---
 arch/arm64/kernel/topology.c |  9 ++++++--
 drivers/hv/channel_mgmt.c    | 50 ++++++++++++++++++++++++++++++--------------
 kernel/sched/core.c          |  3 +--
 kernel/time/hrtimer.c        |  5 ++++-
 4 files changed, 46 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index b32f13358fbb1..8f4329b57cea7 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -212,8 +212,13 @@ int arch_freq_get_on_cpu(int cpu)
 	if (!policy)
 		return -EINVAL;
 
-	if (!cpumask_intersects(policy->related_cpus,
-				housekeeping_cpumask(HK_TYPE_TICK))) {
+	bool no_hk_in_policy;
+
+	rcu_read_lock();
+	no_hk_in_policy = !cpumask_intersects(policy->related_cpus,
+				housekeeping_cpumask_rcu(HK_TYPE_TICK));
+	rcu_read_unlock();
+	if (no_hk_in_policy) {
 		cpufreq_cpu_put(policy);
 		return -EOPNOTSUPP;
 	}
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 84eb0a6a0b546..fc5247e92e1b3 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -750,26 +750,43 @@ static void init_vp_index(struct vmbus_channel *channel)
 {
 	bool perf_chn = hv_is_perf_channel(channel);
 	u32 i, ncpu = num_online_cpus();
-	cpumask_var_t available_mask;
+	cpumask_var_t available_mask, hk_snap;
 	struct cpumask *allocated_mask;
-	const struct cpumask *hk_mask = housekeeping_cpumask(HK_TYPE_MANAGED_IRQ);
 	u32 target_cpu;
 	int numa_node;
 
-	if (!perf_chn ||
-	    !alloc_cpumask_var(&available_mask, GFP_KERNEL) ||
-	    cpumask_empty(hk_mask)) {
-		/*
-		 * If the channel is not a performance critical
-		 * channel, bind it to VMBUS_CONNECT_CPU.
-		 * In case alloc_cpumask_var() fails, bind it to
-		 * VMBUS_CONNECT_CPU.
-		 * If all the cpus are isolated, bind it to
-		 * VMBUS_CONNECT_CPU.
-		 */
+	if (!perf_chn) {
+		channel->target_cpu = VMBUS_CONNECT_CPU;
+		return;
+	}
+
+	if (!alloc_cpumask_var(&available_mask, GFP_KERNEL)) {
+		channel->target_cpu = VMBUS_CONNECT_CPU;
+		hv_set_allocated_cpu(VMBUS_CONNECT_CPU);
+		return;
+	}
+
+	/*
+	 * Snapshot HK_TYPE_MANAGED_IRQ cpumask under RCU read lock.
+	 * housekeeping_update_types() frees the old cpumask after
+	 * synchronize_rcu(), so we must not hold the pointer beyond an
+	 * RCU read-side critical section.
+	 */
+	if (!alloc_cpumask_var(&hk_snap, GFP_KERNEL)) {
+		free_cpumask_var(available_mask);
+		channel->target_cpu = VMBUS_CONNECT_CPU;
+		hv_set_allocated_cpu(VMBUS_CONNECT_CPU);
+		return;
+	}
+	rcu_read_lock();
+	cpumask_copy(hk_snap, housekeeping_cpumask_rcu(HK_TYPE_MANAGED_IRQ));
+	rcu_read_unlock();
+
+	if (cpumask_empty(hk_snap)) {
+		free_cpumask_var(hk_snap);
+		free_cpumask_var(available_mask);
 		channel->target_cpu = VMBUS_CONNECT_CPU;
-		if (perf_chn)
-			hv_set_allocated_cpu(VMBUS_CONNECT_CPU);
+		hv_set_allocated_cpu(VMBUS_CONNECT_CPU);
 		return;
 	}
 
@@ -788,7 +805,7 @@ static void init_vp_index(struct vmbus_channel *channel)
 
 retry:
 	cpumask_xor(available_mask, allocated_mask, cpumask_of_node(numa_node));
-	cpumask_and(available_mask, available_mask, hk_mask);
+	cpumask_and(available_mask, available_mask, hk_snap);
 
 	if (cpumask_empty(available_mask)) {
 		/*
@@ -809,6 +826,7 @@ static void init_vp_index(struct vmbus_channel *channel)
 
 	channel->target_cpu = target_cpu;
 
+	free_cpumask_var(hk_snap);
 	free_cpumask_var(available_mask);
 }
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b8871449d3c69..371b509d92164 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1272,9 +1272,8 @@ int get_nohz_timer_target(void)
 		default_cpu = cpu;
 	}
 
-	hk_mask = housekeeping_cpumask(HK_TYPE_KERNEL_NOISE);
-
 	guard(rcu)();
+	hk_mask = housekeeping_cpumask_rcu(HK_TYPE_KERNEL_NOISE);
 
 	for_each_domain(cpu, sd) {
 		for_each_cpu_and(i, sched_domain_span(sd), hk_mask) {
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 5bd6efe598f0f..18e17a9dad67b 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -242,8 +242,11 @@ static bool hrtimer_suitable_target(struct hrtimer *timer, struct hrtimer_clock_
 static inline struct hrtimer_cpu_base *get_target_base(struct hrtimer_cpu_base *base, bool pinned)
 {
 	if (!hrtimer_base_is_online(base)) {
-		int cpu = cpumask_any_and(cpu_online_mask, housekeeping_cpumask(HK_TYPE_TIMER));
+		int cpu;
 
+		rcu_read_lock();
+		cpu = cpumask_any_and(cpu_online_mask, housekeeping_cpumask_rcu(HK_TYPE_TIMER));
+		rcu_read_unlock();
 		return &per_cpu(hrtimer_bases, cpu);
 	}
 
-- 
2.43.0

From nobody Sat Jun 20 16:30:40 2026
From: Jing Wu
Date: Thu, 18 Jun 2026 11:11:16 +0800
Subject: [PATCH v3 05/13] cpu/hotplug: Reserve CPUHP states for nohz_full and managed IRQ down-paths
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20260618-wujing-dhm-v3-5-28f1a4d83b68@gmail.com>
References: <20260618-wujing-dhm-v3-0-28f1a4d83b68@gmail.com>
In-Reply-To: <20260618-wujing-dhm-v3-0-28f1a4d83b68@gmail.com>
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, "Paul E. McKenney", Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang, Anna-Maria Behnsen, Tejun Heo, Jonathan Corbet, Shuah Khan, Shuah Khan, Thomas Gleixner
Cc: linux-kernel@vger.kernel.org, rcu@vger.kernel.org, cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Jing Wu, Qiliang Yuan
X-Mailer: b4 0.13.0

Add CPUHP_AP_NO_HZ_FULL_DYING and CPUHP_AP_IRQ_AFFINITY_DYING to the
cpuhp_state enum. These dying callbacks are invoked during CPU offline
before the tick is stopped, enabling clean tick handover and managed IRQ
migration when a CPU transitions between isolated and housekeeping states.

The existing CPUHP_AP_IRQ_AFFINITY_ONLINE already handles managed IRQ
restoration on CPU online. The new dying callback completes the pair,
migrating managed interrupts away from the CPU before it goes down.

Subsequent patches register handlers for these states.

Signed-off-by: Jing Wu
Signed-off-by: Qiliang Yuan
---
 include/linux/cpuhotplug.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 22ba327ec2278..075cfa8161334 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -186,6 +186,8 @@ enum cpuhp_state {
 	CPUHP_AP_SMPCFD_DYING,
 	CPUHP_AP_HRTIMERS_DYING,
 	CPUHP_AP_TICK_DYING,
+	CPUHP_AP_IRQ_AFFINITY_DYING,
+	CPUHP_AP_NO_HZ_FULL_DYING,
 	CPUHP_AP_X86_TBOOT_DYING,
 	CPUHP_AP_ARM_CACHE_B15_RAC_DYING,
 	CPUHP_AP_ONLINE,
-- 
2.43.0

From nobody Sat Jun 20 16:30:40 2026
From: Jing Wu
Date: Thu, 18 Jun 2026 11:11:17 +0800
Subject: [PATCH v3 06/13] tick/nohz, context_tracking: Prepare for runtime nohz_full updates
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20260618-wujing-dhm-v3-6-28f1a4d83b68@gmail.com>
References: <20260618-wujing-dhm-v3-0-28f1a4d83b68@gmail.com>
In-Reply-To: <20260618-wujing-dhm-v3-0-28f1a4d83b68@gmail.com>
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, "Paul E. McKenney", Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang, Anna-Maria Behnsen, Tejun Heo, Jonathan Corbet, Shuah Khan, Shuah Khan, Thomas Gleixner
Cc: linux-kernel@vger.kernel.org, rcu@vger.kernel.org, cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Jing Wu, Qiliang Yuan
X-Mailer: b4 0.13.0

Remove __init from ct_cpu_track_user() and drop its __initdata
"initialized" flag, along with the boot-only TIF_NOHZ setup the flag
guarded, so context tracking can be activated on CPUs that join nohz_full
at runtime. Drop the __ro_after_init attribute from the
context_tracking_key static key, allowing static_branch_dec() when a CPU
leaves nohz_full.

Add ct_cpu_untrack_user() to reverse ct_cpu_track_user(), decrementing the
static key and clearing the per-CPU tracking state.

Register a housekeeping_cbs for HK_TYPE_KERNEL_NOISE that:

- pre_validate: checks CONFIG_NO_HZ_FULL is available.
- apply: snapshots the new HK_TYPE_KERNEL_NOISE mask under an RCU read
  lock (the lockdep annotation in housekeeping_cpumask() requires this
  even after synchronize_rcu() completes), computes nohz_full as the
  complement of the housekeeping mask, then under tick_nohz_lock:
  - Activates context tracking (ct_cpu_track_user()) on CPUs newly added
    to nohz_full, and deactivates it (ct_cpu_untrack_user()) on CPUs
    returning to the housekeeping set. This activates the
    context_tracking_key static key dynamically, eliminating the need for
    CONFIG_CONTEXT_TRACKING_USER_FORCE.
  - Updates tick_nohz_full_mask in-place (legacy EXPORT_SYMBOL_GPL
    snapshot, eventually consistent).
  - Migrates tick_do_timer_cpu if it moved into the isolated set.
  - Kicks all CPUs to re-evaluate tick behaviour.

When CONFIG_CONTEXT_TRACKING_USER_FORCE is enabled and nohz_full= is given
at boot, tick_nohz_init() now calls context_tracking_init() before
iterating over tick_nohz_full_mask to call ct_cpu_track_user(). This
ensures the per-CPU tracking state is set up before any CPU is tracked,
which is also required for CPUs later added to nohz_full at runtime via
DHM isolated partitions.

Signed-off-by: Jing Wu
Signed-off-by: Qiliang Yuan
---
 include/linux/context_tracking.h |   1 +
 kernel/context_tracking.c        |  23 ++----
 kernel/time/tick-sched.c         | 157 +++++++++++++++++++++++++++++++++++++--
 3 files changed, 161 insertions(+), 20 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index af9fe87a09225..632cfc97b5b22 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -12,6 +12,7 @@
 
 #ifdef CONFIG_CONTEXT_TRACKING_USER
 extern void ct_cpu_track_user(int cpu);
+extern void ct_cpu_untrack_user(int cpu);
 
 /* Called with interrupts disabled. */
 extern void __ct_user_enter(enum ctx_state state);
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index a743e7ffa6c00..e68fb02b25ad4 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -411,7 +411,7 @@ static __always_inline void ct_kernel_enter(bool user, int offset) { }
 #define CREATE_TRACE_POINTS
 #include
 
-DEFINE_STATIC_KEY_FALSE_RO(context_tracking_key);
+DEFINE_STATIC_KEY_FALSE(context_tracking_key);
 EXPORT_SYMBOL_GPL(context_tracking_key);
 
 static noinstr bool context_tracking_recursion_enter(void)
@@ -674,28 +674,21 @@ void user_exit_callable(void)
 }
 NOKPROBE_SYMBOL(user_exit_callable);
 
-void __init ct_cpu_track_user(int cpu)
+void ct_cpu_track_user(int cpu)
 {
-	static __initdata bool initialized = false;
-
 	if (!per_cpu(context_tracking.active, cpu)) {
 		per_cpu(context_tracking.active, cpu) = true;
 		static_branch_inc(&context_tracking_key);
 	}
+}
 
-	if (initialized)
+void ct_cpu_untrack_user(int cpu)
+{
+	if (!per_cpu(context_tracking.active, cpu))
 		return;
 
-#ifdef CONFIG_HAVE_TIF_NOHZ
-	/*
-	 * Set TIF_NOHZ to init/0 and let it propagate to all tasks through fork
-	 * This assumes that init is the only task at this early boot stage.
-	 */
-	set_tsk_thread_flag(&init_task, TIF_NOHZ);
-#endif
-	WARN_ON_ONCE(!tasklist_empty());
-
-	initialized = true;
+	per_cpu(context_tracking.active, cpu) = false;
+	static_branch_dec(&context_tracking_key);
 }
 
 #ifdef CONFIG_CONTEXT_TRACKING_USER_FORCE
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index cbbb87a0c6e7c..a7fe097042f7d 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -26,6 +26,7 @@
 #include
 #include
 #include
+#include
 #include
 
 #include
@@ -653,11 +654,6 @@ void __init tick_nohz_init(void)
 	if (!tick_nohz_full_running)
 		return;
 
-	/*
-	 * Full dynticks uses IRQ work to drive the tick rescheduling on safe
-	 * locking contexts. But then we need IRQ work to raise its own
-	 * interrupts to avoid circular dependency on the tick.
-	 */
 	if (!arch_irq_work_has_interrupt()) {
 		pr_warn("NO_HZ: Can't run full dynticks because arch doesn't support IRQ work self-IPIs\n");
 		cpumask_clear(tick_nohz_full_mask);
@@ -676,6 +672,16 @@ void __init tick_nohz_init(void)
 		}
 	}
 
+	/*
+	 * Pre-initialize context tracking for all possible CPUs so
+	 * ctx tracking is already active when a CPU is later added to
+	 * nohz_full at runtime. The tracking overhead is negligible
+	 * because the static key is not incremented yet — only per-CPU
+	 * tracking state is set up.
+	 */
+	if (IS_ENABLED(CONFIG_CONTEXT_TRACKING_USER_FORCE))
+		context_tracking_init();
+
 	for_each_cpu(cpu, tick_nohz_full_mask)
 		ct_cpu_track_user(cpu);
 
@@ -686,6 +692,147 @@ void __init tick_nohz_init(void)
 	pr_info("NO_HZ: Full dynticks CPUs: %*pbl.\n",
 		cpumask_pr_args(tick_nohz_full_mask));
 }
+
+static int tick_nohz_hk_validate(enum hk_type type,
+				 const struct cpumask *cur_mask,
+				 const struct cpumask *new_mask)
+{
+	if (!IS_ENABLED(CONFIG_NO_HZ_FULL))
+		return -EOPNOTSUPP;
+	return 0;
+}
+
+static void tick_nohz_hk_apply(enum hk_type type)
+{
+	static DEFINE_SPINLOCK(tick_nohz_lock);
+	cpumask_var_t nohz_full, added, removed;
+	bool was_running;
+	int cpu;
+
+	if (!alloc_cpumask_var(&nohz_full, GFP_KERNEL))
+		return;
+	if (!alloc_cpumask_var(&added, GFP_KERNEL)) {
+		free_cpumask_var(nohz_full);
+		return;
+	}
+	if (!alloc_cpumask_var(&removed, GFP_KERNEL)) {
+		free_cpumask_var(added);
+		free_cpumask_var(nohz_full);
+		return;
+	}
+
+	/*
+	 * Snapshot the new HK_TYPE_KERNEL_NOISE mask under an RCU read lock.
+	 * housekeeping_update_types() completes synchronize_rcu() before
+	 * invoking apply(), so the new pointer is stable; however the lockdep
+	 * annotation in housekeeping_cpumask() still requires an RCU read-side
+	 * critical section for runtime-mutable types.
+	 */
+	rcu_read_lock();
+	cpumask_andnot(nohz_full, cpu_possible_mask,
+		       housekeeping_cpumask_rcu(HK_TYPE_KERNEL_NOISE));
+	rcu_read_unlock();
+
+	/*
+	 * When "nohz_full=" was not passed at boot, tick_nohz_full_running is
+	 * false and the full dynticks infrastructure (sched_tick_offload_init,
+	 * RCU nohz quiescent-state reporting, context-tracking bootstrap) was
+	 * never initialised. In that case restrict the update to
+	 * tick_nohz_full_mask so the /sys/devices/system/cpu/nohz_full sysfs
+	 * attribute reflects DHM-isolated CPUs without enabling tick
+	 * suppression, context tracking, or timer migration – all of which
+	 * require boot-time setup and would deadlock on the first
+	 * synchronize_rcu() call after CPUs are offlined.
+	 */
+	was_running = READ_ONCE(tick_nohz_full_running);
+
+	spin_lock(&tick_nohz_lock);
+
+	/*
+	 * When nohz_full= was active at boot, compute the delta and update
+	 * context tracking for CPUs joining or leaving the nohz_full set.
+	 * Skip when !was_running: ct_cpu_track_user() calls
+	 * static_branch_inc() which may sleep (jump_label_update on the
+	 * 0→1 transition) – illegal inside a spinlock.
+	 */
+	if (IS_ENABLED(CONFIG_CONTEXT_TRACKING_USER) &&
+	    was_running &&
+	    cpumask_available(tick_nohz_full_mask)) {
+		cpumask_andnot(added, nohz_full, tick_nohz_full_mask);
+		cpumask_andnot(removed, tick_nohz_full_mask, nohz_full);
+		for_each_cpu(cpu, added)
+			ct_cpu_track_user(cpu);
+		for_each_cpu(cpu, removed)
+			ct_cpu_untrack_user(cpu);
+	}
+
+	/*
+	 * Update tick_nohz_full_mask unconditionally: this is the snapshot
+	 * read by the /sys/devices/system/cpu/nohz_full sysfs attribute and
+	 * must reflect the current isolation set even in the DHM runtime case.
+	 */
+	if (cpumask_available(tick_nohz_full_mask))
+		cpumask_copy(tick_nohz_full_mask, nohz_full);
+
+	/*
+	 * Only modify tick_nohz_full_running and migrate the global tick when
+	 * nohz_full= was set at boot; without boot-time setup, setting
+	 * tick_nohz_full_running would suppress ticks on isolated CPUs and
+	 * prevent RCU quiescent-state reporting, causing synchronize_rcu()
+	 * to stall permanently when a CPU is subsequently offlined.
+	 */
+	if (was_running) {
+		tick_nohz_full_running = !cpumask_empty(nohz_full);
+
+		if (tick_nohz_full_running) {
+			cpu = READ_ONCE(tick_do_timer_cpu);
+			if (cpu < nr_cpu_ids &&
+			    !housekeeping_test_cpu(cpu, HK_TYPE_KERNEL_NOISE)) {
+				int new_cpu;
+
+				new_cpu = housekeeping_any_cpu(HK_TYPE_KERNEL_NOISE);
+				if (new_cpu < nr_cpu_ids)
+					WRITE_ONCE(tick_do_timer_cpu, new_cpu);
+			}
+		}
+	}
+
+	spin_unlock(&tick_nohz_lock);
+
+	if (was_running)
+		tick_nohz_full_kick_all();
+	free_cpumask_var(removed);
+	free_cpumask_var(added);
+	free_cpumask_var(nohz_full);
+}
+
+static struct housekeeping_cbs tick_nohz_hk_cbs = {
+	.name = "tick/nohz",
+	.pre_validate = tick_nohz_hk_validate,
+	.apply = tick_nohz_hk_apply,
+};
+
+static int __init tick_nohz_hk_init_late(void)
+{
+	int ret;
+
+	/*
+	 * Ensure tick_nohz_full_mask is allocated so that tick_nohz_hk_apply()
+	 * can update it (and the /sys/devices/system/cpu/nohz_full sysfs
+	 * attribute) when CPUs are isolated at runtime via DHM. If "nohz_full="
+	 * was passed at boot the mask is already allocated; allocate an empty
+	 * one here for the runtime-only case.
+	 */
+	if (!cpumask_available(tick_nohz_full_mask) &&
+	    !zalloc_cpumask_var(&tick_nohz_full_mask, GFP_KERNEL))
+		pr_warn("tick/nohz: failed to allocate nohz_full_mask for DHM\n");
+
+	ret = housekeeping_register_cbs(HK_TYPE_KERNEL_NOISE, &tick_nohz_hk_cbs);
+	if (ret)
+		pr_warn("tick/nohz: Failed to register hk callback: %d\n", ret);
+	return 0;
+}
+late_initcall(tick_nohz_hk_init_late);
 #endif /* #ifdef CONFIG_NO_HZ_FULL */
 
 /*
-- 
2.43.0

From nobody Sat Jun 20 16:30:40 2026
From: Jing Wu
Date: Thu, 18 Jun 2026 11:11:18 +0800
Subject: [PATCH v3 07/13] rcu/nocb: Add explicit housekeeping callback for runtime NOCB toggling
Message-Id: <20260618-wujing-dhm-v3-7-28f1a4d83b68@gmail.com>
References: <20260618-wujing-dhm-v3-0-28f1a4d83b68@gmail.com>
In-Reply-To: <20260618-wujing-dhm-v3-0-28f1a4d83b68@gmail.com>
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
    Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Valentin Schneider, "Paul E. McKenney", Frederic Weisbecker,
    Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
    Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
    Anna-Maria Behnsen, Tejun Heo, Jonathan Corbet, Shuah Khan,
    Thomas Gleixner
Cc: linux-kernel@vger.kernel.org, rcu@vger.kernel.org,
    cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
    linux-kselftest@vger.kernel.org, Jing Wu, Qiliang Yuan
X-Mailer: b4 0.13.0

Register a housekeeping callback for HK_TYPE_KERNEL_NOISE.
When the mask changes, schedule asynchronous work to iterate all
possible CPUs and toggle NOCB mode for CPUs whose state disagrees with
the new mask. CPUs in the housekeeping set are de-offloaded; isolated
CPUs are offloaded.

Use CPU hotplug (remove_cpu() / add_cpu()) because
rcu_nocb_cpu_offload() and rcu_nocb_cpu_deoffload() require the target
CPU to be offline. The hotplug cycle takes the CPU fully offline to
quiesce its RCU state before toggling the NOCB flag, then brings it
back.

Skip CPUs whose state already matches to avoid unnecessary hotplug
churn. Only bring a CPU back online if it was online before the state
change (was_online guard avoids add_cpu() on a CPU that was already
offline).

This differs from Frederic Weisbecker's suggestion to "assume the CPU
is offline" within the RCU subsystem and toggle NOCB without a full
hotplug cycle. The full hotplug approach was chosen for v3 because
rcu_nocb_cpu_offload() and rcu_nocb_cpu_deoffload() are the existing
stable interfaces and the "assume offline" path would require adding
new internal RCU APIs. This is a known limitation that may be addressed
by RCU maintainers in follow-up work.

Snapshot the current HK_TYPE_KERNEL_NOISE cpumask inside the work
function under an RCU read lock rather than caching the pointer at
apply() time. Caching at apply() time would create a use-after-free
hazard: a subsequent housekeeping_update_types() call frees the old
cpumask after synchronize_rcu() but before the work function runs.

Remove the cpus_read_lock() / cpus_read_unlock() pair that wrapped the
hotplug loop. remove_cpu() and add_cpu() acquire the cpu_hotplug_lock
write side; holding the read side via cpus_read_lock() before calling
them causes a deadlock.
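
The per-CPU decision described above (offload every CPU outside the
housekeeping set, de-offload every CPU inside it, skip CPUs whose state
already matches) can be sketched in userspace C. This is an
illustration only, not code from the patch: 64-bit integers stand in
for cpumasks, and struct nocb_plan and plan_nocb_toggles() are
hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: which CPUs would need a hotplug cycle. */
struct nocb_plan {
	uint64_t to_offload;	/* isolated CPUs that are not yet offloaded */
	uint64_t to_deoffload;	/* housekeeping CPUs that are still offloaded */
	unsigned int cycles;	/* number of CPUs whose state must change */
};

/*
 * Model of the loop body: bit N of hk_mask means "CPU N is a
 * housekeeping CPU", bit N of nocb_mask means "CPU N is offloaded".
 */
static struct nocb_plan plan_nocb_toggles(uint64_t hk_mask, uint64_t nocb_mask,
					  unsigned int ncpus)
{
	struct nocb_plan plan = { 0, 0, 0 };
	unsigned int cpu;

	assert(ncpus <= 64);

	for (cpu = 0; cpu < ncpus; cpu++) {
		uint64_t bit = UINT64_C(1) << cpu;
		int should_offload = !(hk_mask & bit);
		int is_offloaded = !!(nocb_mask & bit);

		/* State already matches: no hotplug cycle needed. */
		if (should_offload == is_offloaded)
			continue;

		if (should_offload)
			plan.to_offload |= bit;
		else
			plan.to_deoffload |= bit;
		plan.cycles++;
	}
	return plan;
}
```

With CPUs 0-1 as housekeeping and only CPU 2 offloaded on a 4-CPU
system, the sketch selects CPU 3 alone, which is the "avoid unnecessary
hotplug churn" case.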

Signed-off-by: Jing Wu
Signed-off-by: Qiliang Yuan
---
 kernel/rcu/tree.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 104 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 55df6d37145e8..214ce940f501b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4929,3 +4929,107 @@ void __init rcu_init(void)
 #include "tree_exp.h"
 #include "tree_nocb.h"
 #include "tree_plugin.h"
+
+#ifdef CONFIG_RCU_NOCB_CPU
+/*
+ * RCU NOCB runtime toggle via housekeeping callback.
+ * Schedule the CPU-hotplug work asynchronously because
+ * remove_cpu() and add_cpu() must not be called while holding
+ * cpuset_top_mutex (the hk callback context).
+ *
+ * Snapshot the current HK_TYPE_KERNEL_NOISE cpumask inside the work
+ * function under an RCU read lock to avoid caching a pointer at
+ * apply() time that could be freed before the work runs.
+ */
+struct rcu_hk_work {
+	struct work_struct work;
+};
+
+static void rcu_hk_workfn(struct work_struct *w)
+{
+	struct rcu_hk_work *hw = container_of(w, struct rcu_hk_work, work);
+	cpumask_var_t hk_mask;
+	int cpu, ret;
+
+	if (!alloc_cpumask_var(&hk_mask, GFP_KERNEL)) {
+		kfree(hw);
+		return;
+	}
+
+	rcu_read_lock();
+	cpumask_copy(hk_mask, housekeeping_cpumask_rcu(HK_TYPE_KERNEL_NOISE));
+	rcu_read_unlock();
+
+	for_each_possible_cpu(cpu) {
+		bool should_offload = !cpumask_test_cpu(cpu, hk_mask);
+		bool is_offloaded;
+		bool was_online;
+
+		if (!cpumask_available(rcu_nocb_mask)) {
+			is_offloaded = false;
+		} else {
+			is_offloaded = cpumask_test_cpu(cpu, rcu_nocb_mask);
+		}
+
+		if (should_offload == is_offloaded)
+			continue;
+
+		was_online = cpu_online(cpu);
+		if (was_online) {
+			ret = remove_cpu(cpu);
+			if (ret)
+				continue;
+		}
+		if (should_offload)
+			rcu_nocb_cpu_offload(cpu);
+		else
+			rcu_nocb_cpu_deoffload(cpu);
+		if (was_online)
+			add_cpu(cpu);
+	}
+
+	free_cpumask_var(hk_mask);
+	kfree(hw);
+}
+
+static void rcu_hk_apply(enum hk_type type)
+{
+	struct rcu_hk_work *hw;
+
+	if (!cpumask_available(rcu_nocb_mask))
+		return;
+
+	hw = kmalloc(sizeof(*hw), GFP_KERNEL);
+	if (!hw)
+		return;
+
+	INIT_WORK(&hw->work, rcu_hk_workfn);
+	schedule_work(&hw->work);
+}
+
+static int rcu_hk_validate(enum hk_type type,
+			   const struct cpumask *cur_mask,
+			   const struct cpumask *new_mask)
+{
+	if (!IS_ENABLED(CONFIG_RCU_NOCB_CPU))
+		return -EOPNOTSUPP;
+	return 0;
+}
+
+static struct housekeeping_cbs rcu_hk_cbs = {
+	.name = "rcu/nocb",
+	.pre_validate = rcu_hk_validate,
+	.apply = rcu_hk_apply,
+};
+
+static int __init rcu_hk_init(void)
+{
+	int ret;
+
+	ret = housekeeping_register_cbs(HK_TYPE_KERNEL_NOISE, &rcu_hk_cbs);
+	if (ret)
+		pr_info("rcu/nocb: runtime NOCB toggle disabled (%d)\n", ret);
+	return 0;
+}
+late_initcall(rcu_hk_init);
+#endif /* CONFIG_RCU_NOCB_CPU */
-- 
2.43.0

From nobody Sat Jun 20 16:30:40 2026
From: Jing Wu
Date: Thu, 18 Jun 2026 11:11:19 +0800
Subject: [PATCH v3 08/13] genirq: Add explicit housekeeping callback for managed IRQ migration
Message-Id: <20260618-wujing-dhm-v3-8-28f1a4d83b68@gmail.com>
References: <20260618-wujing-dhm-v3-0-28f1a4d83b68@gmail.com>
In-Reply-To: <20260618-wujing-dhm-v3-0-28f1a4d83b68@gmail.com>

Register a housekeeping callback for HK_TYPE_MANAGED_IRQ. When the mask
changes, iterate all active managed interrupts, intersect their current
affinity mask with the new housekeeping mask, and re-apply with
irq_do_set_affinity(). Managed interrupts on CPUs removed from the
housekeeping set are migrated to remaining housekeeping CPUs.

Only managed interrupts (IRQD_AFFINITY_MANAGED) are selected because
the kernel owns their affinity; user-controlled IRQ affinities must not
be overridden by the housekeeping layer.

The new HK_TYPE_MANAGED_IRQ cpumask is snapshotted once under an RCU
read lock before the IRQ loop, satisfying the lockdep annotation in
housekeeping_cpumask() for runtime-mutable types.

When the intersection of the IRQ's current affinity and the new
housekeeping mask is non-empty, irq_do_set_affinity() moves the IRQ to
the restricted set. If the intersection is empty (all CPUs that were
serving this IRQ are now isolated), the affinity update is skipped and
the IRQ continues to run on the isolated CPU temporarily. Full support
for the IRQ shutdown / re-startup path (when all serving CPUs become
isolated) is left for follow-up work.

The loop is guarded by irq_lock_sparse() and the per-descriptor
raw_spin_lock to prevent races with concurrent affinity changes.
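
The intersect-then-apply rule above can be sketched in userspace C.
This is an illustration under assumptions, not the patch itself: 64-bit
integers stand in for cpumasks, restrict_irq_affinity() is a
hypothetical name, and the online-CPU check mirrors the
cpumask_intersects(tmp, cpu_online_mask) test in the diff that follows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Restrict *affinity to the housekeeping set. The update is applied
 * only when the restricted mask still contains an online CPU;
 * otherwise the affinity is left untouched and the caller keeps the
 * IRQ where it is.
 */
static bool restrict_irq_affinity(uint64_t *affinity, uint64_t hk_mask,
				  uint64_t online_mask)
{
	uint64_t restricted;

	assert(affinity != NULL);

	restricted = *affinity & hk_mask;
	if (!(restricted & online_mask))
		return false;	/* no usable housekeeping CPU: skip */

	*affinity = restricted;
	return true;
}
```

An IRQ served by CPUs 2-3 with housekeeping CPUs 0-2 ends up on CPU 2
only; an IRQ served by CPU 3 alone is left unchanged, which corresponds
to the "runs on the isolated CPU temporarily" case.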

Signed-off-by: Jing Wu
Signed-off-by: Qiliang Yuan
---
 kernel/irq/manage.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 86 insertions(+)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 2e80724378267..ea97f455eab2a 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -2801,3 +2801,89 @@ bool irq_check_status_bit(unsigned int irq, unsigned int bitmask)
 	return res;
 }
 EXPORT_SYMBOL_GPL(irq_check_status_bit);
+
+/*
+ * Managed IRQ housekeeping callback: iterate all managed IRQs and ask
+ * the chip to move them off CPUs newly removed from HK_TYPE_MANAGED_IRQ.
+ */
+static void irq_hk_apply(enum hk_type type)
+{
+	cpumask_var_t hk_mask;
+	struct irq_desc *desc;
+	unsigned int irq;
+
+	if (!alloc_cpumask_var(&hk_mask, GFP_KERNEL))
+		return;
+
+	/*
+	 * Snapshot the new HK_TYPE_MANAGED_IRQ mask under an RCU read lock
+	 * before iterating IRQ descriptors.  The lockdep annotation in
+	 * housekeeping_cpumask() requires an RCU read-side critical section
+	 * for runtime-mutable types.
+	 */
+	rcu_read_lock();
+	cpumask_copy(hk_mask, housekeeping_cpumask_rcu(HK_TYPE_MANAGED_IRQ));
+	rcu_read_unlock();
+
+	irq_lock_sparse();
+
+	for_each_active_irq(irq) {
+		desc = irq_to_desc(irq);
+		if (!desc || !desc->action)
+			continue;
+
+		/*
+		 * Only managed interrupts are selected: they have
+		 * IRQD_AFFINITY_MANAGED set, meaning the kernel owns their
+		 * affinity.  User-controlled IRQs are intentionally skipped.
+		 *
+		 * When the intersection of the current affinity mask and the
+		 * new housekeeping mask is non-empty, re-apply the restricted
+		 * affinity to migrate the IRQ away from newly isolated CPUs.
+		 * If the intersection is empty (all serving CPUs are now
+		 * isolated), the IRQ is left on its current CPU temporarily;
+		 * handling that case (IRQ shutdown / re-startup) is left for
+		 * a follow-up.
+		 */
+		if (irqd_affinity_is_managed(&desc->irq_data)) {
+			const struct cpumask *mask;
+			struct cpumask *tmp = this_cpu_ptr(&__tmp_mask);
+
+			raw_spin_lock_irq(&desc->lock);
+			mask = irq_data_get_affinity_mask(&desc->irq_data);
+			cpumask_and(tmp, mask, hk_mask);
+			if (cpumask_intersects(tmp, cpu_online_mask))
+				irq_do_set_affinity(&desc->irq_data, tmp, false);
+			raw_spin_unlock_irq(&desc->lock);
+		}
+	}
+
+	irq_unlock_sparse();
+	free_cpumask_var(hk_mask);
+}
+
+static int irq_hk_validate(enum hk_type type,
+			   const struct cpumask *cur_mask,
+			   const struct cpumask *new_mask)
+{
+	if (!IS_ENABLED(CONFIG_SMP))
+		return -EOPNOTSUPP;
+	return 0;
+}
+
+static struct housekeeping_cbs irq_hk_cbs = {
+	.name = "genirq/managed",
+	.pre_validate = irq_hk_validate,
+	.apply = irq_hk_apply,
+};
+
+static int __init irq_hk_init(void)
+{
+	int ret;
+
+	ret = housekeeping_register_cbs(HK_TYPE_MANAGED_IRQ, &irq_hk_cbs);
+	if (ret)
+		pr_info("genirq: managed IRQ runtime migration disabled (%d)\n", ret);
+	return 0;
+}
+late_initcall(irq_hk_init);
-- 
2.43.0

From nobody Sat Jun 20 16:30:40 2026
From: Jing Wu
Date: Thu, 18 Jun 2026 11:11:20 +0800
Subject: [PATCH v3 09/13] watchdog/lockup_detector: Register housekeeping callback for kernel-noise
Message-Id: <20260618-wujing-dhm-v3-9-28f1a4d83b68@gmail.com>
References: <20260618-wujing-dhm-v3-0-28f1a4d83b68@gmail.com>
In-Reply-To: <20260618-wujing-dhm-v3-0-28f1a4d83b68@gmail.com>

Initialize watchdog_cpumask from HK_TYPE_KERNEL_NOISE rather than
HK_TYPE_TIMER at boot, so the initial mask already reflects any CPUs
excluded by nohz_full= on the kernel command line.

Register a housekeeping_cbs so watchdog_cpumask stays in sync with
HK_TYPE_KERNEL_NOISE when isolation boundaries change at runtime via
cpuset isolated partitions. The apply() callback copies the new
housekeeping mask into watchdog_cpumask and triggers
__lockup_detector_reconfigure() to restart watchdog threads on the
updated CPU set.

When nohz_full= is absent at boot, tick_nohz_full_running remains false
and DHM isolated partitions do not activate tick suppression. In that
case watchdog_hk_apply() is a no-op: there is no need to reconfigure
the watchdog CPU set because the full nohz_full infrastructure was
never initialized.
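
The two early exits in the apply() path (nohz_full never enabled at
boot, or the watchdog lock already held) can be modelled in userspace
C. This is a sketch under assumptions rather than the kernel code: a
C11 atomic_flag stands in for mutex_trylock() on watchdog_mutex, and
struct wd_state and wd_apply() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the watchdog state touched by apply(). */
struct wd_state {
	atomic_flag lock;		/* models watchdog_mutex */
	uint64_t cpumask;		/* models watchdog_cpumask */
	bool nohz_full_running;		/* models tick_nohz_full_running */
	unsigned int reconfigs;		/* counts reconfigure calls */
};

/* Returns true only when the mask was copied and a reconfigure ran. */
static bool wd_apply(struct wd_state *wd, uint64_t hk_mask)
{
	assert(wd != NULL);

	/* nohz_full was not set up at boot: nothing to resync. */
	if (!wd->nohz_full_running)
		return false;

	/* Trylock: give up instead of blocking if the lock is held. */
	if (atomic_flag_test_and_set(&wd->lock))
		return false;

	wd->cpumask = hk_mask;
	wd->reconfigs++;
	atomic_flag_clear(&wd->lock);
	return true;
}
```

In this model a failed trylock drops the update, so the stored mask
stays at its previous value until a later apply() succeeds.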

Signed-off-by: Jing Wu
Signed-off-by: Qiliang Yuan
---
 kernel/watchdog.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 87dd5e0f6968d..998ad94da4cb9 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -1389,7 +1389,7 @@ void __init lockup_detector_init(void)
 		pr_info("Disabling watchdog on nohz_full cores by default\n");
 
 	cpumask_copy(&watchdog_cpumask,
-		     housekeeping_cpumask(HK_TYPE_TIMER));
+		     housekeeping_cpumask(HK_TYPE_KERNEL_NOISE));
 
 	if (!watchdog_hardlockup_probe())
 		watchdog_hardlockup_available = true;
@@ -1398,3 +1398,57 @@ void __init lockup_detector_init(void)
 
 	lockup_detector_setup();
 }
+
+/*
+ * Watchdog housekeeping callback: resync watchdog_cpumask with
+ * HK_TYPE_KERNEL_NOISE when isolation boundaries change at runtime.
+ */
+#ifdef CONFIG_CPU_ISOLATION
+static void watchdog_hk_apply(enum hk_type type)
+{
+	const struct cpumask *hk;
+
+	/*
+	 * When nohz_full= was not given at boot, tick_nohz_full_running
+	 * remains false and the full nohz_full infrastructure was never
+	 * initialised.  DHM isolated partitions do not activate tick
+	 * suppression in that case, so there is no need to reconfigure the
+	 * watchdog CPU set.
+	 */
+#ifdef CONFIG_NO_HZ_FULL
+	if (!READ_ONCE(tick_nohz_full_running))
+		return;
+#endif
+
+	hk = housekeeping_cpumask(HK_TYPE_KERNEL_NOISE);
+	if (mutex_trylock(&watchdog_mutex)) {
+		cpumask_copy(&watchdog_cpumask, hk);
+		__lockup_detector_reconfigure(false);
+		mutex_unlock(&watchdog_mutex);
+	}
+}
+
+static int watchdog_hk_validate(enum hk_type type,
+				const struct cpumask *cur_mask,
+				const struct cpumask *new_mask)
+{
+	return 0;
+}
+
+static struct housekeeping_cbs watchdog_hk_cbs = {
+	.name = "watchdog",
+	.pre_validate = watchdog_hk_validate,
+	.apply = watchdog_hk_apply,
+};
+
+static int __init watchdog_hk_init(void)
+{
+	int ret;
+
+	ret = housekeeping_register_cbs(HK_TYPE_KERNEL_NOISE, &watchdog_hk_cbs);
+	if (ret)
+		pr_debug("watchdog: hk callback registration skipped (%d)\n", ret);
+	return 0;
+}
+late_initcall(watchdog_hk_init);
+#endif
-- 
2.43.0

From nobody Sat Jun 20 16:30:40 2026
1re4GjOgbukrcAMn2sgJhxbbVzcX6yl5b1t0DiJSmcW8RM1+2MxDRDa/PmLcBU3XxixU AkrA3O5+1QAkkJvX2s3sEFheyGJfCCsz53GwaLBk0s/kvwinWzRgRtaVY2M2/nKu4jJf 5EkY+cmXMKKJunqSaUFiKkmXuhilrvV85WMT4M/dNZN5TDihWHL6vNIOghIW2eHHvgbh yjgySMM9vIMG9r+dVOjJVVegZP76gJ77AUUlSWar2fNT7rgBqakioYoR2x04BWWvN6Ay gURw== X-Gm-Message-State: AOJu0YwfOhbGXMPeVEZ3R3vxwhgvQrGnmy/V4+LZCx4zE2IDj2eQh1G6 oesqkaSPMh8hzmh64uY764xllofcvgtlnJZ/ZOa4dGE5XnQz5+q/dbiV X-Gm-Gg: AfdE7cnmpvcPDuBAfgd5RWSXOCMxwlvhkG2T+p6MwS7JGRMJOXxHEYOiaaiKtDqXRAd cXd2dA/vHQ3LIqKP5hWtJe2kLDGN7GOMRMZ7mvDWEyVFsKlq19ZjM+2fGpl+Vy9OvcCQuVpUocq xkY2HBPnq9YiWvdy7hUvpSSCEOVB1ppoNC0TNARxWYBefPbNn+VAtqk3EJ1qw+TwyKshAPsMbbx B0e4taHklDmk+dmyVeLAtmor/kVWe7oUe7iuEAQW7t9VNeP4M5/gfgSEwNrljPc2ch/O4NuXiXq 33U2cxWQR59gG6CPPPdE3qPCHu0/kwAEDfNSmYRT3l3l6NH2lhxqP9nmdXWIOYSofX7UK82ghD7 JsfRXtaIBC0nImRq02XC7NuW+4IR/o6lvjXQf8Ba1+JNZG2RXws644RicE142SDiP1YKZ9OrvXH We54MnN751/Hk= X-Received: by 2002:a17:903:19ce:b0:2bf:128d:f7ff with SMTP id d9443c01a7336-2c6f34327c6mr5866785ad.16.1781752348483; Wed, 17 Jun 2026 20:12:28 -0700 (PDT) Received: from [127.0.1.1] ([138.199.21.246]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-2c6a403b242sm60152975ad.31.2026.06.17.20.12.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 17 Jun 2026 20:12:28 -0700 (PDT) From: Jing Wu Date: Thu, 18 Jun 2026 11:11:21 +0800 Subject: [PATCH v3 10/13] sched: Guard sched_tick_start/stop against uninitialized tick_work_cpu Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260618-wujing-dhm-v3-10-28f1a4d83b68@gmail.com> References: <20260618-wujing-dhm-v3-0-28f1a4d83b68@gmail.com> In-Reply-To: <20260618-wujing-dhm-v3-0-28f1a4d83b68@gmail.com> To: Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , "Paul E. 
McKenney" , Frederic Weisbecker , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Uladzislau Rezki , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Anna-Maria Behnsen , Tejun Heo , Jonathan Corbet , Shuah Khan , Shuah Khan , Thomas Gleixner Cc: linux-kernel@vger.kernel.org, rcu@vger.kernel.org, cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Jing Wu , Qiliang Yuan X-Mailer: b4 0.13.0 sched_tick_start() and sched_tick_stop() are called during CPU hotplug for CPUs not in the HK_TYPE_KERNEL_NOISE set. They dereference tick_work_cpu, which is allocated by sched_tick_offload_init(); that function is called only from housekeeping_init() when nohz_full=3D is present at boot. When the DHM subsystem enables HK_TYPE_KERNEL_NOISE for the first time at runtime via housekeeping_update_types(), tick_work_cpu remains NULL because sched_tick_offload_init() is __init-only and cannot be re-invoked. A subsequent CPU offline/online cycle for an isolated CPU then triggers WARN_ON_ONCE(!tick_work_cpu), followed by a NULL-pointer dereference in per_cpu_ptr(tick_work_cpu, cpu) that crashes the kernel. Since nohz_full=3D was not active at boot, tick_nohz_full_running remains false and the tick-offload infrastructure is never activated; isolated CPUs continue to receive their own ticks. Guard both helpers with an additional !tick_work_cpu check so they become no-ops in this case. 
Signed-off-by: Jing Wu Signed-off-by: Qiliang Yuan --- kernel/sched/core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 371b509d92164..df004e3efca70 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5778,7 +5778,7 @@ static void sched_tick_start(int cpu) int os; struct tick_work *twork; =20 - if (housekeeping_cpu(cpu, HK_TYPE_KERNEL_NOISE)) + if (housekeeping_cpu(cpu, HK_TYPE_KERNEL_NOISE) || !tick_work_cpu) return; =20 WARN_ON_ONCE(!tick_work_cpu); @@ -5799,7 +5799,7 @@ static void sched_tick_stop(int cpu) struct tick_work *twork; int os; =20 - if (housekeeping_cpu(cpu, HK_TYPE_KERNEL_NOISE)) + if (housekeeping_cpu(cpu, HK_TYPE_KERNEL_NOISE) || !tick_work_cpu) return; =20 WARN_ON_ONCE(!tick_work_cpu); --=20 2.43.0 From nobody Sat Jun 20 16:30:40 2026 Received: from mail-pl1-f179.google.com (mail-pl1-f179.google.com [209.85.214.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5B2AF33E34B for ; Thu, 18 Jun 2026 03:12:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.179 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781752356; cv=none; b=WgOF2CzfLkjjMrVW1lEV0w9WyPbhsCEdzPOXtIC+d5iASQ+GJSEj3ocRETw/40P9h3kMSkI7mLf0j9zUjDvXCa9hS7n6pFkqgBrZOVgNlZj7lztpe3OyBK9+CTOsmTz4t80b8gQDPP67O2tiOGq+ZurYsNJFlXs97YE0XZASvpg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781752356; c=relaxed/simple; bh=1wFtiEBV/eExa/0fk9lHm27dQyGmUqkvypCV6YhwRzY=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=Q/vopwg6620mImx9p0Zqhwr85CK1KtK36ZNxpyLS17XqFhedjOsfQKGDnmSke+rMOq3pkaMoHvbtG0r8De16XbQSdfg5HGp1YgC0r85tfbzq1Y+hCqda0vYS/N6CtHunztXahPAb/Rs6E22ZIp9Y61a9ifCDYN9u/V9VaGNalzs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass 
(p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=VsaYj/jr; arc=none smtp.client-ip=209.85.214.179 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="VsaYj/jr" Received: by mail-pl1-f179.google.com with SMTP id d9443c01a7336-2c6be9cd7afso2026175ad.1 for ; Wed, 17 Jun 2026 20:12:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1781752355; x=1782357155; darn=vger.kernel.org; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:from:to:cc:subject:date:message-id :reply-to; bh=115mykkTwm3kAcmxwjkRrMY9t8k6+X+EEMpEx7hgUEU=; b=VsaYj/jrf+ZEe5Imp3Ek8cO/T2bMuZNeZTotlq3vYw8xilW7zHAGNKorbVZdo3YJSq aIBB0cD/0HMBDeUaDLkvJ834mUUS33tDonCbToIMcorUcxlLR4V+scQ7bu5nnJbjuz+A L2aVa/LLqsOkG1hoC3GRhgwYxotJl8xbGtv00+WzMjQgrblHsDuXs7QALbjCEA0UxpZp LRSlXzk0ixotFpQzgR7+3g5yW43Q2T9hunXg2yFsPi4XSduYvxGEgTilfe8OPNUtL+jd 3Fvx2D5yh4bv0soZV0neOlkkfSFRb2dF00j3fZ7ogOqBYnvZz0b/o3zldRW4ljdjRq9H M4dg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1781752355; x=1782357155; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-gg:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=115mykkTwm3kAcmxwjkRrMY9t8k6+X+EEMpEx7hgUEU=; b=RXXzOiZ/KXyV9l58isIoSugz/oQOsYtcGYUMZ6+jg7PgZ7PNBamJ044cVO9m9qhV0T zpxNYeR/FA27g5+mUFyhB5r6Cajo+aDTirfd9aOVDJW2h6KdWFDXQw30H9bVqL2PyWEI ncCdulfQRSixS9NrjgI9kHzH9bkBpZdEWXBDt7fHvIefAajWRk/yV3MGNXDZEYPs4iFH E0csn/sL+j9Py/C4YiXDdlBr6QbonUt4VF+u+dAA9BMSF+OWFGvzyKwELBsNUJcDEVJY 
n0ZEEfluKluHf1c256nfNWzSI/23QlqJppYUB5vQUD3KOZSGY3PV+4frp6krkZG4Fb+y Mlcw== X-Gm-Message-State: AOJu0YzHhnrAmQMNMbV8hMpUmog62Vwe13Bj88Zf++alvVmOfPxXFboj 4jS+tR6RQ2XX8G+VGOJtMp6USaYhwU8BXYBLnxA7cBR3jkTbH2jjTymx X-Gm-Gg: Acq92OF/gEY5WC4oPriNKx+gn4nWxehSPwiIb8jf/i/iRVsnHWBVJy56m8uo5CvJFRy Ip5tU+bju/H1/eC/iVGTiA1PqfDSh9YX2yg6rZ5gHWhclhJrgsE61a95Om2GbdBfF6C8qj4BOpG 6Fj1DgMZOLIcvylcbpgTS0VnnmvsQaG8tBxjNEphKbmYzKRXJ4z8vSZEajfcZKRBLPsGczciC16 Wy+VfbcQr1PK6SHtmKdMRW8AJtlpOAhAj8iOOPIIc2WYw3WUjveVvs7zqEZs72x4RLdHR+1ixZ3 w2/RgIGr+ygF3xnG8uWPT71mxJO6/bIfHOrFom6cWFGNrn48Aso8xIM3zXWVn5QN34bemPrylgu k3dr6mCXV8myfPBYO7C4KuN2ysZcRVmJ5csFLl7mSst33d0o4am7/4DoW5KgrpwtffFvnUMkZvb yNVIlfSmjaslL2DUyhqEHSzg== X-Received: by 2002:a17:902:ced0:b0:2c0:c625:400d with SMTP id d9443c01a7336-2c6e52debf1mr15847925ad.37.1781752354656; Wed, 17 Jun 2026 20:12:34 -0700 (PDT) Received: from [127.0.1.1] ([138.199.21.246]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-2c6a403b242sm60152975ad.31.2026.06.17.20.12.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 17 Jun 2026 20:12:34 -0700 (PDT) From: Jing Wu Date: Thu, 18 Jun 2026 11:11:22 +0800 Subject: [PATCH v3 11/13] cgroup/cpuset: Extend isolated partition to trigger kernel-noise isolation Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260618-wujing-dhm-v3-11-28f1a4d83b68@gmail.com> References: <20260618-wujing-dhm-v3-0-28f1a4d83b68@gmail.com> In-Reply-To: <20260618-wujing-dhm-v3-0-28f1a4d83b68@gmail.com> To: Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , "Paul E. 
McKenney" , Frederic Weisbecker , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Uladzislau Rezki , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Anna-Maria Behnsen , Tejun Heo , Jonathan Corbet , Shuah Khan , Shuah Khan , Thomas Gleixner Cc: linux-kernel@vger.kernel.org, rcu@vger.kernel.org, cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Jing Wu , Qiliang Yuan X-Mailer: b4 0.13.0 When a cpuset isolated partition is created or destroyed, also drive kernel-noise housekeeping types (HK_TYPE_KERNEL_NOISE and HK_TYPE_MANAGED_IRQ) through housekeeping_update_types(). The sched domain mask (HK_TYPE_DOMAIN) is updated first via the existing housekeeping_update() call, then the explicit callback chain in housekeeping_update_types() invokes subsystem apply() handlers to toggle nohz_full, managed IRQ migration, and RCU NOCB offloading. The update runs outside cpuset_mutex and cpus_read_lock, protected only by cpuset_top_mutex. Signed-off-by: Jing Wu Signed-off-by: Qiliang Yuan --- kernel/cgroup/cpuset.c | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 5c33ab20cc208..67b93bd4d58f2 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -1347,17 +1347,30 @@ static void cpuset_update_sd_hk_unlock(void) rebuild_sched_domains_locked(); =20 if (update_housekeeping) { + static const unsigned long noise_types =3D + BIT(HK_TYPE_KERNEL_NOISE) | BIT(HK_TYPE_MANAGED_IRQ); + update_housekeeping =3D false; cpumask_copy(isolated_hk_cpus, isolated_cpus); =20 - /* - * housekeeping_update() is now called without holding - * cpus_read_lock and cpuset_mutex. Only cpuset_top_mutex - * is still being held for mutual exclusion. 
- */ mutex_unlock(&cpuset_mutex); cpus_read_unlock(); + + /* + * Update the sched domain mask first; it must succeed + * before the kernel-noise types because workqueue flush + * and timer migration depend on the sched domain mask. + */ WARN_ON_ONCE(housekeeping_update(isolated_hk_cpus)); + + /* + * Drive kernel-noise types through the new explicit + * callback chain. Tick/rcu/genirq subtypes react + * through their registered housekeeping_cbs apply() + * handlers. + */ + WARN_ON_ONCE(housekeeping_update_types(noise_types, + isolated_hk_cpus)); mutex_unlock(&cpuset_top_mutex); } else { cpuset_full_unlock(); --=20 2.43.0 From nobody Sat Jun 20 16:30:40 2026 Received: from mail-pl1-f173.google.com (mail-pl1-f173.google.com [209.85.214.173]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6F0F233B6D9 for ; Thu, 18 Jun 2026 03:12:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.173 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781752362; cv=none; b=GXBuuLw7HaLmrlzzQrm0EoVrGJIbNHtmWKsc/B7en+8biBNvP2RWkAh3DD9klGTS8qCAuvorEHBeJqY9kYQYjRk1B4msusn6n1EJL6BqHg8HH/06zNp+DHC+yc9LgXfyGVM9oFhU5SeUY6vRlL0b3ZWROCigvjuyrMrkRmhGR/0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781752362; c=relaxed/simple; bh=SZSfvX12VlEuwZl+6hAVKIfqxpPnB1VqKOp94X4ryo0=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=PABc0P9BIxRD+NelpEjC83XVkA1kAXl61gIxkWGE602ekYW4rEKm8FkG9xX5mnIb0X7Q7q8+dX5MgK329Hao7MzbM/iVmF+lWbGM/Me9ulCGmG9BF2UzuxUuWl4aFVhUoOy4LuLUQi41K+FRUnnI06RMZ6RSqp2/DQm3dju32ag= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=CZqe4DdH; arc=none 
smtp.client-ip=209.85.214.173 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="CZqe4DdH" Received: by mail-pl1-f173.google.com with SMTP id d9443c01a7336-2bf125989f2so2930365ad.3 for ; Wed, 17 Jun 2026 20:12:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1781752361; x=1782357161; darn=vger.kernel.org; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:from:to:cc:subject:date:message-id :reply-to; bh=v3I6IX1xUYJMTfIXEi0w2UhaS8cMsiIG3uPORXy/Kqg=; b=CZqe4DdHuvA83HXuWHxNF+zbSnA/5STgQ7D8CWrVBB6gmc4KskyUW59tKbRVl1V3vC iEnxeIbNe3Xk0TsPZiLIVbK3M1WDMzKTjnTvV9gA2Y5bT1L1ZRtvGxWDT2KP1sVCWjaO DN04QcsZ6c6SDbsjZakOm/AD3yccCOyNCXMMd6c1EsiqTocuIJ9wUGoy5B6H/OgIey5M G6vWB6sE3qPynStWyu7ckXbk8zpmAUQ7G1rwosSEp8tUxgEfCqd83FBTwaIBr2xpeNe7 kXFDpdpf+DCO9kP2Wb9HfyfFet0xxaYu/Aj2ft5pOLvWHGzAaqcqE8i/AruzjUBA22wT CLWA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1781752361; x=1782357161; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-gg:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=v3I6IX1xUYJMTfIXEi0w2UhaS8cMsiIG3uPORXy/Kqg=; b=FVXkszNkS1cc38Xx2tQhnVZSu8dwu+H2ebN6o7f254PE6Zp7SSPRAtGSHIVhZEPmgz ps/xcLgFQtPzpLzmLA/FGLm1FViiteTEWt1MUo7NUrBfcOmwt/xviQR70cLetRhRAdiM 02XrzeVk2EMs31O0fdKfoamTCsL5a10HkctWCbLF1aDLJ7A+aTejlMJ9Xkr6+EQda7uf dF2m1ElN1v61wU04gVYko4IiPR51azsylFmzB+7NLFzTJYjaQkJ8fta3DY7Vnt4mAYuQ WX4MvQkSPoDgSzsJ/PLPOkstD7qnLlBZNZqCdW/3Mm3D0iDVLmWVea4Aziy1mH4uoGXQ wIAQ== X-Gm-Message-State: AOJu0YxjTAK8Ndsxu7MIOa2q5RuBzZpRhyEOLNTQ73oHKsAnuNJZewjN 5JVYIPIMM7ggRPt8okn2YWl1cuwQV2y2Pj4DCfgJoFVkqwInRqxxL9D0 X-Gm-Gg: 
AfdE7cmETlkpQLmpSgSwXjORWIx5kPqWtFV7EGVyfrM5ijbJLey6/1MryT/HQg0rEVF BiXNZBelgp9f1PAktoe/Py1ymuR5szUps2H7XqnlCc2N+M8Y7rzqtXfaQg8kArk1CqMjdIA4+G2 fCnE12c9Ltsvf8X+IML8+lOhKqRFGe1vgJmwooJJUR+Shjl5AXGHaLsti3km2f2Xsuna/SJocIw xc608yaUGnxihitekbBUr6/oBdTrmTjQAxdPjiAnufhVfctZMCkKZyvr+DW3Zkg/NHkdppiRMl6 EVKNQmm/8mUMDTa6pu+YPIF69TqIF98sbetWuK+JVb+N+pEaVrXCTl6PqERHR+8EdUpHWkqTNx6 +dTwZfVagPhu2+HPdcewrSkEJpkc3v+rGtB4lnvc1/ARwwcWG2MUE1nK4LFl54TcZSPN+t6YWZl XEvxLgd6iwB9Y= X-Received: by 2002:a17:902:ea07:b0:2be:22cc:e227 with SMTP id d9443c01a7336-2c6bbf9dfa0mr65761055ad.4.1781752360871; Wed, 17 Jun 2026 20:12:40 -0700 (PDT) Received: from [127.0.1.1] ([138.199.21.246]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-2c6a403b242sm60152975ad.31.2026.06.17.20.12.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 17 Jun 2026 20:12:40 -0700 (PDT) From: Jing Wu Date: Thu, 18 Jun 2026 11:11:23 +0800 Subject: [PATCH v3 12/13] docs: cgroup-v2: Document kernel-noise isolation via isolated partitions Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260618-wujing-dhm-v3-12-28f1a4d83b68@gmail.com> References: <20260618-wujing-dhm-v3-0-28f1a4d83b68@gmail.com> In-Reply-To: <20260618-wujing-dhm-v3-0-28f1a4d83b68@gmail.com> To: Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , "Paul E. 
McKenney" , Frederic Weisbecker , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Uladzislau Rezki , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Anna-Maria Behnsen , Tejun Heo , Jonathan Corbet , Shuah Khan , Shuah Khan , Thomas Gleixner Cc: linux-kernel@vger.kernel.org, rcu@vger.kernel.org, cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Jing Wu , Qiliang Yuan X-Mailer: b4 0.13.0 Document that cpuset.cpus.partition=3Disolated now drives runtime updates of the housekeeping masks for kernel-noise types: nohz_full (tick suppression), RCU NOCB offloading, and managed IRQ migration. No additional cgroupfs files are required; the partition update path automatically triggers explicit housekeeping callbacks for all affected subsystems. Signed-off-by: Jing Wu Signed-off-by: Qiliang Yuan --- Documentation/admin-guide/cgroup-v2.rst | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-= guide/cgroup-v2.rst index 6efd0095ed995..7c3b048e75cb5 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -2721,6 +2721,14 @@ Cpuset Interface Files kernel boot command line option. If those CPUs are to be put into a partition, they have to be used in an isolated partition. =20 + When an isolated partition is created or destroyed, the kernel + automatically drives runtime updates of the housekeeping masks + for kernel-noise types (nohz_full, RCU NOCB, managed + interrupts). This extends isolation beyond scheduler domains: + the tick is stopped on isolated CPUs, RCU callbacks are + offloaded to housekeeping cores, and managed interrupts are + migrated away. No additional cgroupfs files are required. 
+ =20 Device controller ----------------- --=20 2.43.0 From nobody Sat Jun 20 16:30:40 2026 Received: from mail-pl1-f182.google.com (mail-pl1-f182.google.com [209.85.214.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8F5C733A6E9 for ; Thu, 18 Jun 2026 03:12:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.182 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781752369; cv=none; b=Al7DBOssu4nXTf/mSMJxcnkvGdfuDzHT8LbnB18cLSjPir/6pLg1cKnJt9Ta6zr6jUNTQdYcNEQYQpQEJN5BfLMmKocDlMH71PmS24+aXrts9C9z6GuaDfZCaq1tQLRJGoB8ulf/68vVRr4jNnsiVu5bMKvF51ZS0OVZAa5tpmM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781752369; c=relaxed/simple; bh=sFs7mynJbe12HuAxVNUwUdIyzOYLmiTuiC6AfxKdZXc=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=Nk+KFtqj6HfaYveDrvbpgtQo+WWeSvKNcbFYgaHocFOOnF3Coh2OoP6JzSwiwuazWFh99sHZYsxERTFDt6sker+dTP7/vmdIt09RSd2Y+XKZv5ao45Hu+jc+chaHXu12WnNIPcb05H9msE56PnlsjvxOj/O/kZwc69DJrnDU7w0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=crKDOiIJ; arc=none smtp.client-ip=209.85.214.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="crKDOiIJ" Received: by mail-pl1-f182.google.com with SMTP id d9443c01a7336-2c6bdb8a8bdso2739885ad.1 for ; Wed, 17 Jun 2026 20:12:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1781752367; x=1782357167; 
darn=vger.kernel.org; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:from:to:cc:subject:date:message-id :reply-to; bh=EAMlJUtgJr93wlY4Vrgrp/rfGIjYwybe1wRU3OQLFss=; b=crKDOiIJ+3vKsFMXjsHxDDuyInXKr6ZbM7f5q1ZDmMeA6uRS3Bb2BYhuZtS6wEynNi p1MB8ok+AB3CbvCM9bMLeecS0eJn7UAXcOzltlfGri40RG0E3iwubEoJTpKrsH1HcV6a m4tq3a8sD3czy2PCM4HyXTvbHBRiuLgP+ULmQsllwKBsSC8SVnlfVjX5zvSATwHqECmv bEGMhjQAT+xeerFrgznm3954Yj/Nb7mA9htAztoGDf6j2xOYdAfi8sCBG27zeSpCGrq/ 7F4V87e4yZp+qpHQ57Am52AXBzh03xYxbNAWOXv0Y1xI+FO/0AA+Fjz4S5xvFusCHjOB kxAA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1781752367; x=1782357167; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-gg:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=EAMlJUtgJr93wlY4Vrgrp/rfGIjYwybe1wRU3OQLFss=; b=sZ6lGlR3yzQgnUAbPYc2Qid4S2mrbslSx/SfVpCQ6aPLOvo2lvp1GHfO/wny06uSBd I+OhXqW+Zj717K+AJ43skx7NbeLWyXXBkNCbjm20R6flSGGslQOz2yoCWJ3SxlSOir9d 8zPCZ/CzfDUzS2Xdpl28JHeeJmC4tnZ68qebrVjBOgQq3wipSk1dU88wINyU9l6BnWIc v2zdRrRUrdZUHNtSHqofxauMQQ/fB3advHyvtidPOQNHx3yhTtyzVzkK5nfCHeJzKNhC Zp7K+ctwX9vs9VAkGmHEFcd9EC0Bj7e/zYuxzdAWilsErngKnirhbsEkrfEZaMEowpUy F5cw== X-Gm-Message-State: AOJu0YxennTZRqyRY+oPKLvUxrrjPC+cBkfjIVUbndwoHKHVlnmIdCsR 9wT9+Weey55UStYmvMrmDy5kFMICdF27oXq7yWVQ5mwxFcG5RhEXXVwp X-Gm-Gg: AfdE7cki9anT+SGI4Fheevz+GJh5a21o9MMoA67JcXt+J7t5uMFCFrgOd6gKa+NiC47 /DsCdrkATp3ruK6uT7KvOxYGNm4+LHPxi+OQBGyNJbdOF+QaaMLrLCqwdBe/OA486sAC5VCIaaO LOCIJXjQeph+NfJ3i3O2sR7ws9ACmeYjqBwA9Hht58wDSVn43q0z7OdO1V4nm0REcrMkzkza2nR IjE4qOfHwpLMOj1xYu7AqYDGqGjfzcs5pZKOYCGvn6J+JAlFl+vupj7kMF6XxBFPuTDqqjpY8rs pRBkPFK5TZPcyvEfh6v2HQYIg5Fl6dYwtBl8fmljKaq++qlOKAJq3TbvA00oc4rQc1eRyHHpnXl LLsXbiyGBSwCLVLa1HdqMQfYiPyH7MzKevzscx2lUh20LzoeDKkYZUMpChtbLemPFr5ayy6kJrZ QDJw3DCjnIePI= X-Received: by 2002:a17:902:f547:b0:2c2:bd0d:3cfa with SMTP id d9443c01a7336-2c6bc0c6684mr67014325ad.11.1781752366951; Wed, 17 Jun 
2026 20:12:46 -0700 (PDT) Received: from [127.0.1.1] ([138.199.21.246]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-2c6a403b242sm60152975ad.31.2026.06.17.20.12.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 17 Jun 2026 20:12:46 -0700 (PDT) From: Jing Wu Date: Thu, 18 Jun 2026 11:11:24 +0800 Subject: [PATCH v3 13/13] selftests/cgroup: Add kernel-noise isolation test to cpuset selftest Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260618-wujing-dhm-v3-13-28f1a4d83b68@gmail.com> References: <20260618-wujing-dhm-v3-0-28f1a4d83b68@gmail.com> In-Reply-To: <20260618-wujing-dhm-v3-0-28f1a4d83b68@gmail.com> To: Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , "Paul E. McKenney" , Frederic Weisbecker , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Uladzislau Rezki , Mathieu Desnoyers , Lai Jiangshan , Zqiang , Anna-Maria Behnsen , Tejun Heo , Jonathan Corbet , Shuah Khan , Shuah Khan , Thomas Gleixner Cc: linux-kernel@vger.kernel.org, rcu@vger.kernel.org, cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Jing Wu , Qiliang Yuan X-Mailer: b4 0.13.0 Add test_hk_noise_isolated() to test_cpuset_prs.sh to verify that creating and destroying an isolated cpuset partition updates both the domain isolation state and the kernel-noise (nohz_full) state. For domain isolation, the test checks cpuset.cpus.isolated before and after the partition create/destroy cycle. For kernel-noise isolation, the test reads /sys/devices/system/cpu/nohz_full to confirm that the CPUs placed in an isolated partition appear in the nohz_full mask while the partition is active, and are removed from it once the partition is destroyed. 
This sysfs attribute only exists when CONFIG_NO_HZ_FULL is enabled; the nohz_full checks are skipped when it is absent so the test remains usable on kernels without NO_HZ_FULL. Add cpu_in_cpulist() to correctly determine whether a CPU number falls within a kernel cpulist string (e.g. "4-7"). A plain grep cannot detect membership in the interior of a range; cpu_in_cpulist() walks each comma-separated element and handles both single values and lo-hi ranges explicitly. The test also covers: rejection of all-CPU isolation, the SMT sibling constraint, nested partition inheritance, and a 100-cycle pressure test. nohz_full is verified to be restored to its pre-test value after each create/destroy cycle and after the pressure test. Fix awk invocation to drop the spurious -e flag. Signed-off-by: Jing Wu Signed-off-by: Qiliang Yuan --- tools/testing/selftests/cgroup/test_cpuset_prs.sh | 204 ++++++++++++++++++= +++- 1 file changed, 203 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/test= ing/selftests/cgroup/test_cpuset_prs.sh index a56f4153c64df..047db14953fac 100755 --- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh +++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh @@ -20,7 +20,7 @@ skip_test() { WAIT_INOTIFY=3D$(cd $(dirname $0); pwd)/wait_inotify =20 # Find cgroup v2 mount point -CGROUP2=3D$(mount -t cgroup2 | head -1 | awk -e '{print $3}') +CGROUP2=3D$(mount -t cgroup2 | head -1 | awk '{print $3}') [[ -n "$CGROUP2" ]] || skip_test "Cgroup v2 mount point not found!" SUBPARTS_CPUS=3D$CGROUP2/.__DEBUG__.cpuset.cpus.subpartitions CPULIST=3D$(cat $CGROUP2/cpuset.cpus.effective) @@ -1204,9 +1204,211 @@ test_inotify() echo "" > cpuset.cpus } =20 +# +# cpu_in_cpulist CPU LIST +# +# Return 0 if CPU appears in LIST (a kernel cpumask list such as +# "0-3,8-31"), non-zero otherwise. 
The kernel cpulist format uses ranges +# ("lo-hi") and comma-separated items; a simple grep cannot detect that a +# number falls in the middle of a range, so walk each element explicitly. +# +cpu_in_cpulist() +{ + local cpu=3D$1 list=3D$2 range lo hi + for range in $(echo "$list" | tr ',' ' '); do + if [[ "$range" =3D=3D *-* ]]; then + lo=3D${range%-*} + hi=3D${range#*-} + [[ $cpu -ge $lo && $cpu -le $hi ]] && return 0 + else + [[ $cpu -eq $range ]] && return 0 + fi + done + return 1 +} + +# +# Test that isolated partition creation/destruction drives kernel-noise +# housekeeping mask updates and remains correct under pressure. +# +# Requires: >=3D8 CPUs, no isolcpus=3D boot conflict, root +# +test_hk_noise_isolated() +{ + local ISOL_BEFORE TEST_CPUS i PART ISOL_AFTER ISOL_RESTORE + local NOHZ_FILE NOHZ_BEFORE NOHZ_AFTER NOHZ_RESTORE + local HK_NOHZ_CHECK=3D0 + local LOOPS=3D100 + + [[ $NR_CPUS -ge 8 ]] || { + echo "HK-noise test skipped: need >=3D8 CPUs, have $NR_CPUS" + return 0 + } + + # Detect whether CONFIG_NO_HZ_FULL is active: the sysfs attribute + # /sys/devices/system/cpu/nohz_full exposes the current nohz_full + # cpumask and is only present when NO_HZ_FULL is enabled. + NOHZ_FILE=3D/sys/devices/system/cpu/nohz_full + [[ -r "$NOHZ_FILE" ]] && HK_NOHZ_CHECK=3D1 + + cd $CGROUP2/test + echo member > cpuset.cpus.partition 2>/dev/null + echo "" > cpuset.cpus 2>/dev/null + + ISOL_BEFORE=3D$(cat $CGROUP2/cpuset.cpus.isolated) + [[ $HK_NOHZ_CHECK -eq 1 ]] && NOHZ_BEFORE=3D$(cat $NOHZ_FILE) + TEST_CPUS=3D"4-7" + echo $TEST_CPUS > cpuset.cpus + + # + # Basic create/destroy cycle =E2=80=94 verify domain isolation and + # kernel-noise (nohz_full) changes together. 
+ # + console_msg "HK-noise: basic create/destroy cycle" + echo isolated > cpuset.cpus.partition + + ISOL_AFTER=3D$(cat $CGROUP2/cpuset.cpus.isolated) + [[ $ISOL_AFTER !=3D "$ISOL_BEFORE" ]] || { + echo "FAIL: isolated set unchanged after partition create" + exit 1 + } + + if [[ $HK_NOHZ_CHECK -eq 1 ]]; then + NOHZ_AFTER=3D$(cat $NOHZ_FILE) + # Verify that the newly isolated CPUs (4-7) appear in nohz_full. + # nohz_full =3D inverse of housekeeping, so isolating 4-7 should + # add them to nohz_full. + for cpu in 4 5 6 7; do + if ! cpu_in_cpulist $cpu "$NOHZ_AFTER"; then + echo "FAIL: cpu${cpu} not in nohz_full after isolation" \ + "(got: '$NOHZ_AFTER')" + exit 1 + fi + done + console_msg "HK-noise: nohz_full after isolation: $NOHZ_AFTER" + fi + + echo member > cpuset.cpus.partition + + ISOL_RESTORE=3D$(cat $CGROUP2/cpuset.cpus.isolated) + [[ $ISOL_RESTORE =3D "$ISOL_BEFORE" ]] || { + echo "FAIL: expected '$ISOL_BEFORE' after destroy, got '$ISOL_RESTORE'" + exit 1 + } + + if [[ $HK_NOHZ_CHECK -eq 1 ]]; then + NOHZ_RESTORE=3D$(cat $NOHZ_FILE) + [[ "$NOHZ_RESTORE" =3D "$NOHZ_BEFORE" ]] || { + echo "FAIL: nohz_full not restored: expected '$NOHZ_BEFORE'," \ + "got '$NOHZ_RESTORE'" + exit 1 + } + fi + + # + # Reject all-CPU isolation (must leave at least one housekeeping CPU) + # + console_msg "HK-noise: reject all-CPU isolation" + echo 0-$((NR_CPUS - 1)) > cpuset.cpus + echo isolated > cpuset.cpus.partition + PART=3D$(cat cpuset.cpus.partition) + [[ $PART =3D *invalid* || $PART =3D member ]] || { + echo "FAIL: all-CPU isolation was not rejected, got '$PART'" + exit 1 + } + + # + # SMT safety: partial sibling isolation + # + console_msg "HK-noise: SMT sibling constraint" + echo $TEST_CPUS > cpuset.cpus + echo isolated > cpuset.cpus.partition + PART=3D$(cat cpuset.cpus.partition) + [[ $PART =3D isolated ]] || { + echo "FAIL: could not create isolated partition, got '$PART'" + exit 1 + } + echo member > cpuset.cpus.partition + + # + # Nested partition: parent root =E2=86=92 
child isolated + # + console_msg "HK-noise: nested partition inheritance" + echo $TEST_CPUS > cpuset.cpus + test_partition root + mkdir -p HK_SUB + cd HK_SUB + echo 4-5 > cpuset.cpus + echo isolated > cpuset.cpus.partition + ISOL_AFTER=3D$(cat $CGROUP2/cpuset.cpus.isolated) + [[ -n $ISOL_AFTER ]] || { + echo "FAIL: nested isolated partition not reflected in cpuset.cpus.isola= ted" + exit 1 + } + echo member > cpuset.cpus.partition + cd $CGROUP2/test + echo member > cpuset.cpus.partition + rmdir HK_SUB 2>/dev/null + + # + # Pressure test: 100 create/destroy cycles + # + console_msg "HK-noise: pressure test ($LOOPS cycles)" + echo $TEST_CPUS > cpuset.cpus + for i in $(seq 1 $LOOPS); do + echo isolated > cpuset.cpus.partition + PART=3D$(cat cpuset.cpus.partition) + [[ $PART =3D isolated ]] || { + echo "FAIL: cycle $i create failed, got '$PART'" + exit 1 + } + echo member > cpuset.cpus.partition + PART=3D$(cat cpuset.cpus.partition) + [[ $PART =3D member ]] || { + echo "FAIL: cycle $i destroy failed, got '$PART'" + exit 1 + } + done + + # + # Stability: after pressure test, verify final state + # + console_msg "HK-noise: post-pressure cleanup" + echo isolated > cpuset.cpus.partition + ISOL_AFTER=3D$(cat $CGROUP2/cpuset.cpus.isolated) + [[ -n $ISOL_AFTER ]] || { + echo "FAIL: isolated set empty after pressure test" + exit 1 + } + echo member > cpuset.cpus.partition + echo "" > cpuset.cpus + ISOL_RESTORE=3D$(cat $CGROUP2/cpuset.cpus.isolated) + [[ $ISOL_RESTORE =3D "$ISOL_BEFORE" ]] || { + echo "FAIL: final isolated '$ISOL_RESTORE' !=3D '$ISOL_BEFORE'" + exit 1 + } + + if [[ $HK_NOHZ_CHECK -eq 1 ]]; then + NOHZ_RESTORE=3D$(cat $NOHZ_FILE) + [[ "$NOHZ_RESTORE" =3D "$NOHZ_BEFORE" ]] || { + echo "FAIL: nohz_full not restored after pressure test:" \ + "expected '$NOHZ_BEFORE', got '$NOHZ_RESTORE'" + exit 1 + } + fi + + cd $CGROUP2 + if [[ $HK_NOHZ_CHECK -eq 1 ]]; then + console_msg "HK-noise: PASSED (with nohz_full verification)" + else + console_msg "HK-noise: PASSED 
(nohz_full skipped: CONFIG_NO_HZ_FULL not = active)" + fi +} + trap cleanup 0 2 3 6 run_state_test TEST_MATRIX run_remote_state_test REMOTE_TEST_MATRIX test_isolated test_inotify +test_hk_noise_isolated echo "All tests PASSED." --=20 2.43.0
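The range walk that cpu_in_cpulist() performs in patch 13 can be tried outside the selftest. What follows is a minimal standalone sketch, assuming bash: it follows the same logic as the helper added to test_cpuset_prs.sh but splits the list with parameter expansion instead of tr, and the sample cpulist strings are illustrative values rather than anything read from a real system.

```shell
#!/bin/bash
# Standalone sketch of the cpulist membership check from patch 13.
# Returns 0 if CPU ($1) is covered by the kernel cpulist LIST ($2),
# 1 otherwise. LIST is a comma-separated mix of single CPUs and
# "lo-hi" ranges, e.g. "0-3,8,12-15".
cpu_in_cpulist()
{
	local cpu=$1 list=$2 range lo hi

	# Replace commas with spaces so the for loop sees one item at a time.
	for range in ${list//,/ }; do
		if [[ "$range" == *-* ]]; then
			lo=${range%-*}		# text before the dash
			hi=${range#*-}		# text after the dash
			[[ $cpu -ge $lo && $cpu -le $hi ]] && return 0
		else
			[[ $cpu -eq $range ]] && return 0
		fi
	done
	return 1
}

# Illustrative inputs (assumed values, not read from sysfs).
cpu_in_cpulist 5 "0-3,4-7" && echo "cpu5: in list"
cpu_in_cpulist 9 "0-3,8,12-15" || echo "cpu9: not in list"
```

A plain substring match on "4-7" would miss CPU 5, which is why the helper compares against the numeric bounds of each range. An empty list yields no loop iterations, so the function falls through to the non-zero return.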