From nobody Sat Feb 7 17:48:35 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 88F47363C7E for ; Mon, 12 Jan 2026 16:01:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768233670; cv=none; b=TC5rvohl8FVKFMMlIQUf2PjLt1LhL8tOmylnBKUm6f82+gwlDxKN2A5qiN72wsndGvnONVTUALvrtnCofaHtvns8b4KdwoGclOv0pDwq4ZYTgOs16xjobsmbCmboOP9QqazeMxEyM8skuxMex3QOPNbegOh6OeDtTm8hIl+ZfYc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768233670; c=relaxed/simple; bh=/Hr5td0q2fKUO8L0UFsfzO/yV0xkttzaHGLMqqbjcaA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=c0uoPMZUOwAeU+m0NhlFhlrM7AmDgMaMvLMclj8BbqifhXm3muM2IbHOJfH1aeFbsMygdIcLsKiIeTkPddzojFva3FxO7BJEG/2FSD112SatbGmBElnMwcbTAfQPTn6ISwuWZnHUyBGoWNOkD9BGok27Q9OpkHaIcWIVNGs8DX8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=NqSeYKOw; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="NqSeYKOw" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1768233667; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=B2dX1nduvd1Y2+YZKFwl6uVgXF8yg4bmyS1EUMGzFdk=; b=NqSeYKOwPPVWrJMg8MM4vc1mMN9hecBFs9LYccq0XePADHaA9byJS6UHFgg32w33cLxiup lwcRTZENkDI4UGxVvytYE649pq2ql2yuETSPzU1kq0V58tYj4cGXsBMEUPiiyT6BT9ph7/ RjWJiVhxW4ZUty1wVm3JkOMOGmx1qKU= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-75-adJYnnBANv61u0nF6HBMIw-1; Mon, 12 Jan 2026 11:01:05 -0500 X-MC-Unique: adJYnnBANv61u0nF6HBMIw-1 X-Mimecast-MFC-AGG-ID: adJYnnBANv61u0nF6HBMIw_1768233663 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 398D51955E8C; Mon, 12 Jan 2026 16:01:01 +0000 (UTC) Received: from llong-thinkpadp16vgen1.westford.csb (unknown [10.22.89.95]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 3A13619560B2; Mon, 12 Jan 2026 16:00:58 +0000 (UTC) From: Waiman Long To: Tejun Heo , Johannes Weiner , =?UTF-8?q?Michal=20Koutn=C3=BD?= , Jonathan Corbet , Shuah Khan Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org, Sun Shaojie , Chen Ridong , Waiman Long , Chen Ridong Subject: [PATCH cgroup/for-6.20 v5 1/5] cgroup/cpuset: Streamline rm_siblings_excl_cpus() Date: Mon, 12 Jan 2026 11:00:17 -0500 Message-ID: <20260112160021.483561-2-longman@redhat.com> In-Reply-To: <20260112160021.483561-1-longman@redhat.com> References: <20260112160021.483561-1-longman@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 Content-Type: text/plain; charset="utf-8" If exclusive_cpus is set, effective_xcpus must be a subset of exclusive_cpus. Currently, rm_siblings_excl_cpus() checks both exclusive_cpus and effective_xcpus consecutively. It is simpler to check only exclusive_cpus if non-empty or just effective_xcpus otherwise. No functional change is expected. Signed-off-by: Waiman Long Reviewed-by: Chen Ridong --- kernel/cgroup/cpuset.c | 24 +++++++++++++++--------- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 221da921b4f9..da2b3b51630e 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -1355,23 +1355,29 @@ static int rm_siblings_excl_cpus(struct cpuset *par= ent, struct cpuset *cs, int retval =3D 0; =20 if (cpumask_empty(excpus)) - return retval; + return 0; =20 /* - * Exclude exclusive CPUs from siblings + * Remove exclusive CPUs from siblings */ rcu_read_lock(); cpuset_for_each_child(sibling, css, parent) { + struct cpumask *sibling_xcpus; + if (sibling =3D=3D cs) continue; =20 - if (cpumask_intersects(excpus, sibling->exclusive_cpus)) { - cpumask_andnot(excpus, excpus, sibling->exclusive_cpus); - retval++; - continue; - } - if (cpumask_intersects(excpus, sibling->effective_xcpus)) { - cpumask_andnot(excpus, excpus, sibling->effective_xcpus); + /* + * If exclusive_cpus is defined, effective_xcpus will always + * be a subset. Otherwise, effective_xcpus will only be set + * in a valid partition root. + */ + sibling_xcpus =3D cpumask_empty(sibling->exclusive_cpus) + ? sibling->effective_xcpus + : sibling->exclusive_cpus; + + if (cpumask_intersects(excpus, sibling_xcpus)) { + cpumask_andnot(excpus, excpus, sibling_xcpus); retval++; } } --=20 2.52.0 From nobody Sat Feb 7 17:48:35 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 682033644A5 for ; Mon, 12 Jan 2026 16:01:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768233688; cv=none; b=oIkZydIs4/zJBH51HQ1KycGb3FjmhdsJbayQCOg4kERbkB2f8lpzOvUZ7ax/Ajzbocm93muNeFtTUh/a2uamTUAnBvVz5JzOq1M+L3dv98+xDedL50A/CGhQipHAhfuMYMVgKEmWR73lwdpnbxSSRZYwC0Rf5EaC6oeA2d0cXak= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768233688; c=relaxed/simple; bh=6LulNGCavc+yLdN4iqeRSLWPIkycrl6AeyWZ5TlwY/Y=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=mLVyusy17pCf1QN6ctrfrudIgdH3r7ld4GM60NyuqTii3DnMhrr68OEHBiWh0YBVYsAgXsYfx0ptAk6rpLpab0QWwsNUWY2VVRUZXMBu/a6u3HkXTDY88FpEWnHTvVvEijQBuE8EPKX2/Be3sizt+S+Vz/ymHJ88k8caYB8SZ90= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=hZipLVF2; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="hZipLVF2" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1768233683; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=HwdXpgUBm2FrvWy3bWrjnHVwbNhC3pc6fLx1+G89qs8=; b=hZipLVF2c3qOMmeMnJhWe19E1GgTAZ4aFozxaYJEad8mccFINJwzI7MLB2970Q875p/BTP dgno2eWHfcfPMgudeJTLxB1zECuX24VaeU1bmF8agVHgY7WZdH0WkTsxq83VYdOX5o2RIz I5nxlZ++sfDBPX6XtOuPH3zPwxES+i0= Received: from mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-482-0E7jcu3UOUmshcBvmAQIHg-1; Mon, 12 Jan 2026 11:01:13 -0500 X-MC-Unique: 0E7jcu3UOUmshcBvmAQIHg-1 X-Mimecast-MFC-AGG-ID: 0E7jcu3UOUmshcBvmAQIHg_1768233668 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 8B9F218D95DE; Mon, 12 Jan 2026 16:01:04 +0000 (UTC) Received: from llong-thinkpadp16vgen1.westford.csb (unknown [10.22.89.95]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 732BC19560BA; Mon, 12 Jan 2026 16:01:01 +0000 (UTC) From: Waiman Long To: Tejun Heo , Johannes Weiner , =?UTF-8?q?Michal=20Koutn=C3=BD?= , Jonathan Corbet , Shuah Khan Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org, Sun Shaojie , Chen Ridong , Waiman Long , Chen Ridong Subject: [PATCH cgroup/for-6.20 v5 2/5] cgroup/cpuset: Consistently compute effective_xcpus in update_cpumasks_hier() Date: Mon, 12 Jan 2026 11:00:18 -0500 Message-ID: <20260112160021.483561-3-longman@redhat.com> In-Reply-To: <20260112160021.483561-1-longman@redhat.com> References: <20260112160021.483561-1-longman@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 Content-Type: text/plain; charset="utf-8" Since commit f62a5d39368e ("cgroup/cpuset: Remove remote_partition_check() & make update_cpumasks_hier() handle remote partition"), the compute_effective_exclusive_cpumask() helper was extended to strip exclusive CPUs from siblings when computing effective_xcpus (cpuset.cpus.exclusive.effective). This helper was later renamed to compute_excpus() in commit 86bbbd1f33ab ("cpuset: Refactor exclusive CPU mask computation logic"). This helper is supposed to be used consistently to compute effective_xcpus. However, there is an exception within the callback critical section in update_cpumasks_hier() when exclusive_cpus of a valid partition root is empty. This can cause effective_xcpus value to differ depending on where exactly it is last computed. Fix this by using compute_excpus() in this case to give a consistent result. Signed-off-by: Waiman Long Reviewed-by: Chen Ridong --- kernel/cgroup/cpuset.c | 14 +++++--------- 1 file changed, 5 insertions(+), 9 deletions(-) diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index da2b3b51630e..894131f47f78 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -2168,17 +2168,13 @@ static void update_cpumasks_hier(struct cpuset *cs,= struct tmpmasks *tmp, spin_lock_irq(&callback_lock); cpumask_copy(cp->effective_cpus, tmp->new_cpus); cp->partition_root_state =3D new_prs; - if (!cpumask_empty(cp->exclusive_cpus) && (cp !=3D cs)) - compute_excpus(cp, cp->effective_xcpus); - /* - * Make sure effective_xcpus is properly set for a valid - * partition root. + * Need to compute effective_xcpus if either exclusive_cpus + * is non-empty or it is a valid partition root. */ - if ((new_prs > 0) && cpumask_empty(cp->exclusive_cpus)) - cpumask_and(cp->effective_xcpus, - cp->cpus_allowed, parent->effective_xcpus); - else if (new_prs < 0) + if ((new_prs > 0) || !cpumask_empty(cp->exclusive_cpus)) + compute_excpus(cp, cp->effective_xcpus); + if (new_prs <=3D 0) reset_partition_data(cp); spin_unlock_irq(&callback_lock); =20 --=20 2.52.0 From nobody Sat Feb 7 17:48:35 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 28083364045 for ; Mon, 12 Jan 2026 16:01:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768233686; cv=none; b=PlQ9PMPlRmtFUt0DtxyexYYvisELEEEl9nVXo8PgIV44IkgtsJUurFqc+5iINENqv04tvGA0NCFxYzeSIDJpmjoJlCcH8PDeowyN58qOJoXENjDSRmHpJ/W0U6lTPPbS/Vmtm3DHSKvg4d/J3vB2ZorxUp10YNhwqwXezGlHNp0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768233686; c=relaxed/simple; bh=36mAFxRYaNY06HT5BGbXc/h0ywPNpQUoVyhIHkIpbeE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=sNE9ercOzrByxjpuOCusf2dNKQe+RzT19BbN74rIHMtcQxYvxNMsWOBEgfCShhXJfzJDUKBF9DhRVVj08mskP2lKI72aTTrKsPcma7ihIqtFzc0rFSL7KUVNVZz7OOkSENG1V2Xx/yeSHVykyURQBtYhmLNcxVqc5+m6SZwuc6E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=Clq++tbN; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="Clq++tbN" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1768233682; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3GpY/lMd+KwV4w59jZdkQcFRXk2EASk1qcLjb4w5/n4=; b=Clq++tbNZV/HPxC4g7nRztfX3wCMlvUZXrYvmd1HbuZWX50pa5qwIaGHkYcvHOZF8bygDQ vxF+IKUQEVePxKY+i7R74g1XYoDJSRPeyuUd9wpJuv5NE+lFrdqMihYpYKAcOruBc4dJwi lVlp+H0rzLugtKBQVCfyTGHhkD+FBqs= Received: from mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-612-sShUt-sJP2mNyJ5CYUyRfw-1; Mon, 12 Jan 2026 11:01:17 -0500 X-MC-Unique: sShUt-sJP2mNyJ5CYUyRfw-1 X-Mimecast-MFC-AGG-ID: sShUt-sJP2mNyJ5CYUyRfw_1768233675 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 92A42195605F; Mon, 12 Jan 2026 16:01:07 +0000 (UTC) Received: from llong-thinkpadp16vgen1.westford.csb (unknown [10.22.89.95]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id C133619560B2; Mon, 12 Jan 2026 16:01:04 +0000 (UTC) From: Waiman Long To: Tejun Heo , Johannes Weiner , =?UTF-8?q?Michal=20Koutn=C3=BD?= , Jonathan Corbet , Shuah Khan Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org, Sun Shaojie , Chen Ridong , Waiman Long , Chen Ridong Subject: [PATCH cgroup/for-6.20 v5 3/5] cgroup/cpuset: Don't fail cpuset.cpus change in v2 Date: Mon, 12 Jan 2026 11:00:19 -0500 Message-ID: <20260112160021.483561-4-longman@redhat.com> In-Reply-To: <20260112160021.483561-1-longman@redhat.com> References: <20260112160021.483561-1-longman@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 Content-Type: text/plain; charset="utf-8" Commit fe8cd2736e75 ("cgroup/cpuset: Delay setting of CS_CPU_EXCLUSIVE until valid partition") introduced a new check to disallow the setting of a new cpuset.cpus.exclusive value that is a superset of a sibling's cpuset.cpus value so that there will at least be one CPU left in the sibling in case the cpuset becomes a valid partition root. This new check does have the side effect of failing a cpuset.cpus change that make it a subset of a sibling's cpuset.cpus.exclusive value. With v2, users are supposed to be allowed to set whatever value they want in cpuset.cpus without failure. To maintain this rule, the check is now restricted to only when cpuset.cpus.exclusive is being changed not when cpuset.cpus is changed. The cgroup-v2.rst doc file is also updated to reflect this change. Signed-off-by: Waiman Long Reviewed-by: Chen Ridong --- Documentation/admin-guide/cgroup-v2.rst | 8 +++---- kernel/cgroup/cpuset.c | 30 ++++++++++++------------- 2 files changed, 19 insertions(+), 19 deletions(-) diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-= guide/cgroup-v2.rst index 7f5b59d95fce..510df2461aff 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -2561,10 +2561,10 @@ Cpuset Interface Files Users can manually set it to a value that is different from "cpuset.cpus". One constraint in setting it is that the list of CPUs must be exclusive with respect to "cpuset.cpus.exclusive" - of its sibling. If "cpuset.cpus.exclusive" of a sibling cgroup - isn't set, its "cpuset.cpus" value, if set, cannot be a subset - of it to leave at least one CPU available when the exclusive - CPUs are taken away. + and "cpuset.cpus.exclusive.effective" of its siblings. Another + constraint is that it cannot be a superset of "cpuset.cpus" + of its sibling in order to leave at least one CPU available to + that sibling when the exclusive CPUs are taken away. =20 For a parent cgroup, any one of its exclusive CPUs can only be distributed to at most one of its child cgroups. Having an diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 894131f47f78..4819ab429771 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -609,33 +609,31 @@ static inline bool cpusets_are_exclusive(struct cpuse= t *cs1, struct cpuset *cs2) =20 /** * cpus_excl_conflict - Check if two cpusets have exclusive CPU conflicts - * @cs1: first cpuset to check - * @cs2: second cpuset to check + * @trial: the trial cpuset to be checked + * @sibling: a sibling cpuset to be checked against + * @xcpus_changed: set if exclusive_cpus has been set * * Returns: true if CPU exclusivity conflict exists, false otherwise * * Conflict detection rules: * 1. If either cpuset is CPU exclusive, they must be mutually exclusive * 2. exclusive_cpus masks cannot intersect between cpusets - * 3. The allowed CPUs of one cpuset cannot be a subset of another's exclu= sive CPUs + * 3. The allowed CPUs of a sibling cpuset cannot be a subset of the new e= xclusive CPUs */ -static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *c= s2) +static inline bool cpus_excl_conflict(struct cpuset *trial, struct cpuset = *sibling, + bool xcpus_changed) { /* If either cpuset is exclusive, check if they are mutually exclusive */ - if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2)) - return !cpusets_are_exclusive(cs1, cs2); + if (is_cpu_exclusive(trial) || is_cpu_exclusive(sibling)) + return !cpusets_are_exclusive(trial, sibling); =20 /* Exclusive_cpus cannot intersect */ - if (cpumask_intersects(cs1->exclusive_cpus, cs2->exclusive_cpus)) + if (cpumask_intersects(trial->exclusive_cpus, sibling->exclusive_cpus)) return true; =20 - /* The cpus_allowed of one cpuset cannot be a subset of another cpuset's = exclusive_cpus */ - if (!cpumask_empty(cs1->cpus_allowed) && - cpumask_subset(cs1->cpus_allowed, cs2->exclusive_cpus)) - return true; - - if (!cpumask_empty(cs2->cpus_allowed) && - cpumask_subset(cs2->cpus_allowed, cs1->exclusive_cpus)) + /* The cpus_allowed of a sibling cpuset cannot be a subset of the new exc= lusive_cpus */ + if (xcpus_changed && !cpumask_empty(sibling->cpus_allowed) && + cpumask_subset(sibling->cpus_allowed, trial->exclusive_cpus)) return true; =20 return false; @@ -672,6 +670,7 @@ static int validate_change(struct cpuset *cur, struct c= puset *trial) { struct cgroup_subsys_state *css; struct cpuset *c, *par; + bool xcpus_changed; int ret =3D 0; =20 rcu_read_lock(); @@ -728,10 +727,11 @@ static int validate_change(struct cpuset *cur, struct= cpuset *trial) * overlap. exclusive_cpus cannot overlap with each other if set. */ ret =3D -EINVAL; + xcpus_changed =3D !cpumask_equal(cur->exclusive_cpus, trial->exclusive_cp= us); cpuset_for_each_child(c, css, par) { if (c =3D=3D cur) continue; - if (cpus_excl_conflict(trial, c)) + if (cpus_excl_conflict(trial, c, xcpus_changed)) goto out; if (mems_excl_conflict(trial, c)) goto out; --=20 2.52.0 From nobody Sat Feb 7 17:48:35 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D49663644BB for ; Mon, 12 Jan 2026 16:01:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768233690; cv=none; b=MuiKYoiI4pQZ7ejC2q8ktfaV2BKa2gs7y3SlQcD7KAzMnJdNUaSN0ggPrQZpOaxvxjXoiKu7bAdxQi7AIb2+Pkt9/idHLXVJPjS81WNiejRj6DUDFPWq2nC4YifoPbMO+1ZnMPjZW0nuKTsr+jfKaG//DfNdx6c1UxxKgk6RAQs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768233690; c=relaxed/simple; bh=iEUuNOzkOfdQ3RVGSSmfziTJiiBq7HbU9WAhMUHqAKQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=uYjf4YdaLFhIHYSLYSRTfZjd1kkWANf3qBRQHaxxGlRQJFRYi2+JtC3cXDZH7ACVYgaho6oxj1MG9Vb2m6neR5pYA4BXx7Ot9Sq4ilVgTvlG2Gd5w4QkCwRCM0HMdrifdgpS8HU4Wvak+ut31Dh+xy16+NNdX3d7QlB28q7SWa0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=OijZLVkN; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="OijZLVkN" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1768233684; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1lFg80blegymnftdKAfqJmqmdAeQ1B0ktIJ1IA31Qow=; b=OijZLVkNrQRMJ4B7a42F/gHxWEYltYj81Ibb+iHlXAocDlxzeqaLY46zMcoqmX9gkL1RnQ 7wbrr3MQIIESwPn4KpENe8PYAgHwGlAVN8jP7pkZGfdGWPGBPfVE8Ssh9LOVB9Z8VfH1O3 LO0lp2WiK01UYlTxcuMBfyMp0jCKvm0= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-218-qUliWW9eNj6-F_Yz42d3LA-1; Mon, 12 Jan 2026 11:01:14 -0500 X-MC-Unique: qUliWW9eNj6-F_Yz42d3LA-1 X-Mimecast-MFC-AGG-ID: qUliWW9eNj6-F_Yz42d3LA_1768233672 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 095A418007FA; Mon, 12 Jan 2026 16:01:10 +0000 (UTC) Received: from llong-thinkpadp16vgen1.westford.csb (unknown [10.22.89.95]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id CD7AA1955F1C; Mon, 12 Jan 2026 16:01:07 +0000 (UTC) From: Waiman Long To: Tejun Heo , Johannes Weiner , =?UTF-8?q?Michal=20Koutn=C3=BD?= , Jonathan Corbet , Shuah Khan Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org, Sun Shaojie , Chen Ridong , Waiman Long , Chen Ridong Subject: [PATCH cgroup/for-6.20 v5 4/5] cgroup/cpuset: Don't invalidate sibling partitions on cpuset.cpus conflict Date: Mon, 12 Jan 2026 11:00:20 -0500 Message-ID: <20260112160021.483561-5-longman@redhat.com> In-Reply-To: <20260112160021.483561-1-longman@redhat.com> References: <20260112160021.483561-1-longman@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 Content-Type: text/plain; charset="utf-8" Currently, when setting a cpuset's cpuset.cpus to a value that conflicts with the cpuset.cpus/cpuset.cpus.exclusive of a sibling partition, the sibling's partition state becomes invalid. This is overly harsh and is probably not necessary. The cpuset.cpus.exclusive control file, if set, will override the cpuset.cpus of the same cpuset when creating a cpuset partition. So cpuset.cpus has less priority than cpuset.cpus.exclusive in setting up a partition. However, it cannot override a conflicting cpuset.cpus file in a sibling cpuset and the partition creation process will fail. This is inconsistent. That will also make using cpuset.cpus.exclusive less valuable as a tool to set up cpuset partitions as the users have to check if such a cpuset.cpus conflict exists or not. Fix these problems by making sure that once a cpuset.cpus.exclusive is set without failure, it will always be allowed to form a valid partition as long as at least one CPU can be granted from its parent irrespective of the state of the siblings' cpuset.cpus values. Of course, setting cpuset.cpus.exclusive will fail if it conflicts with the cpuset.cpus.exclusive or the cpuset.cpus.exclusive.effective value of a sibling. Partition can still be created by setting only cpuset.cpus without setting cpuset.cpus.exclusive. However, any conflicting CPUs in sibling's cpuset.cpus.exclusive.effective and cpuset.cpus.exclusive values will be removed from its cpuset.cpus.exclusive.effective as long as there is still one or more CPUs left and can be granted from its parent. This CPU stripping is currently done in rm_siblings_excl_cpus(). The new code will now try its best to enable the creation of new partitions with only cpuset.cpus set without invalidating existing ones. However it is not guaranteed that all the CPUs requested in cpuset.cpus will be used in the new partition even when all these CPUs can be granted from the parent. This is similar to the fact that cpuset.cpus.effective may not be able to include all the CPUs requested in cpuset.cpus. In this case, the parent may not able to grant all the exclusive CPUs requested in cpuset.cpus to cpuset.cpus.exclusive.effective if some of them have already been granted to other partitions earlier. With the creation of multiple sibling partitions by setting only cpuset.cpus, this does have the side effect that their exact cpuset.cpus.exclusive.effective settings will depend on the order of partition creation if there are conflicts. Due to the exclusive nature of the CPUs in a partition, it is not easy to make it fair other than the old behavior of invalidating all the conflicting partitions. For example, # echo "0-2" > A1/cpuset.cpus # echo "root" > A1/cpuset.cpus.partition # cat A1/cpuset.cpus.partition root # cat A1/cpuset.cpus.exclusive.effective 0-2 # echo "2-4" > B1/cpuset.cpus # echo "root" > B1/cpuset.cpus.partition # cat B1/cpuset.cpus.partition root # cat B1/cpuset.cpus.exclusive.effective 3-4 # cat B1/cpuset.cpus.effective 3-4 For users who want to be sure that they can get most of the CPUs they want, cpuset.cpus.exclusive should be used instead if they can set it successfully without failure. Setting cpuset.cpus.exclusive will guarantee that sibling conflicts from then onward is no longer possible. To make this change, we have to separate out the is_cpu_exclusive() check in cpus_excl_conflict() into a cgroup v1 only cpuset1_cpus_excl_conflict() helper. The cpus_allowed_validate_change() helper is now no longer needed and can be removed. Some existing tests in test_cpuset_prs.sh are updated and new ones are added to reflect the new behavior. The cgroup-v2.rst doc file is also updated the clarify what exclusive CPUs will be used when a partition is created. Reported-by: Sun Shaojie Closes: https://lore.kernel.org/lkml/20251117015708.977585-1-sunshaojie@kyl= inos.cn/ Signed-off-by: Waiman Long Reviewed-by: Chen Ridong --- Documentation/admin-guide/cgroup-v2.rst | 33 +++++--- kernel/cgroup/cpuset-internal.h | 3 + kernel/cgroup/cpuset-v1.c | 19 +++++ kernel/cgroup/cpuset.c | 80 ++++++------------- .../selftests/cgroup/test_cpuset_prs.sh | 26 ++++-- 5 files changed, 90 insertions(+), 71 deletions(-) diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-= guide/cgroup-v2.rst index 510df2461aff..28613c0e1c90 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -2584,9 +2584,9 @@ Cpuset Interface Files of this file will always be a subset of its parent's "cpuset.cpus.exclusive.effective" if its parent is not the root cgroup. It will also be a subset of "cpuset.cpus.exclusive" - if it is set. If "cpuset.cpus.exclusive" is not set, it is - treated to have an implicit value of "cpuset.cpus" in the - formation of local partition. + if it is set. This file should only be non-empty if either + "cpuset.cpus.exclusive" is set or when the current cpuset is + a valid partition root. =20 cpuset.cpus.isolated A read-only and root cgroup only multiple values file. @@ -2618,13 +2618,22 @@ Cpuset Interface Files There are two types of partitions - local and remote. A local partition is one whose parent cgroup is also a valid partition root. A remote partition is one whose parent cgroup is not a - valid partition root itself. Writing to "cpuset.cpus.exclusive" - is optional for the creation of a local partition as its - "cpuset.cpus.exclusive" file will assume an implicit value that - is the same as "cpuset.cpus" if it is not set. Writing the - proper "cpuset.cpus.exclusive" values down the cgroup hierarchy - before the target partition root is mandatory for the creation - of a remote partition. + valid partition root itself. + + Writing to "cpuset.cpus.exclusive" is optional for the creation + of a local partition as its "cpuset.cpus.exclusive" file will + assume an implicit value that is the same as "cpuset.cpus" if it + is not set. Writing the proper "cpuset.cpus.exclusive" values + down the cgroup hierarchy before the target partition root is + mandatory for the creation of a remote partition. + + Not all the CPUs requested in "cpuset.cpus.exclusive" can be + used to form a new partition. Only those that were present + in its parent's "cpuset.cpus.exclusive.effective" control + file can be used. For partitions created without setting + "cpuset.cpus.exclusive", exclusive CPUs specified in sibling's + "cpuset.cpus.exclusive" or "cpuset.cpus.exclusive.effective" + also cannot be used. =20 Currently, a remote partition cannot be created under a local partition. All the ancestors of a remote partition root except @@ -2632,6 +2641,10 @@ Cpuset Interface Files =20 The root cgroup is always a partition root and its state cannot be changed. All other non-root cgroups start out as "member". + Even though the "cpuset.cpus.exclusive*" and "cpuset.cpus" + control files are not present in the root cgroup, they are + implicitly the same as the "/sys/devices/system/cpu/possible" + sysfs file. =20 When set to "root", the current cgroup is the root of a new partition or scheduling domain. The set of exclusive CPUs is diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-interna= l.h index e718a4f54360..e8e2683cb067 100644 --- a/kernel/cgroup/cpuset-internal.h +++ b/kernel/cgroup/cpuset-internal.h @@ -312,6 +312,7 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs, struct cpumask *new_cpus, nodemask_t *new_mems, bool cpus_updated, bool mems_updated); int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial); +bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2); void cpuset1_init(struct cpuset *cs); void cpuset1_online_css(struct cgroup_subsys_state *css); int cpuset1_generate_sched_domains(cpumask_var_t **domains, @@ -326,6 +327,8 @@ static inline void cpuset1_hotplug_update_tasks(struct = cpuset *cs, bool cpus_updated, bool mems_updated) {} static inline int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial) { return 0; } +static inline bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, + struct cpuset *cs2) { return false; } static inline void cpuset1_init(struct cpuset *cs) {} static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {} static inline int cpuset1_generate_sched_domains(cpumask_var_t **domains, diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c index ecfea7800f0d..04124c38a774 100644 --- a/kernel/cgroup/cpuset-v1.c +++ b/kernel/cgroup/cpuset-v1.c @@ -373,6 +373,25 @@ int cpuset1_validate_change(struct cpuset *cur, struct= cpuset *trial) return ret; } =20 +/* + * cpuset1_cpus_excl_conflict() - Check if two cpusets have exclusive CPU = conflicts + * to legacy (v1) + * @cs1: first cpuset to check + * @cs2: second cpuset to check + * + * Returns: true if CPU exclusivity conflict exists, false otherwise + * + * If either cpuset is CPU exclusive, their allowed CPUs cannot intersect. + */ +bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2) +{ + if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2)) + return cpumask_intersects(cs1->cpus_allowed, + cs2->cpus_allowed); + + return false; +} + #ifdef CONFIG_PROC_PID_CPUSET /* * proc_cpuset_show() diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 4819ab429771..83fb83a86b4b 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -129,6 +129,17 @@ static bool force_sd_rebuild; * For simplicity, a local partition can be created under a local or remo= te * partition but a remote partition cannot have any partition root in its * ancestor chain except the cgroup root. + * + * A valid partition can be formed by setting exclusive_cpus or cpus_allo= wed + * if exclusive_cpus is not set. In the case of partition with empty + * exclusive_cpus, all the conflicting exclusive CPUs specified in the + * following cpumasks of sibling cpusets will be removed from its + * cpus_allowed in determining its effective_xcpus. + * - effective_xcpus + * - exclusive_cpus + * + * The "cpuset.cpus.exclusive" control file should be used for setting up + * partition if the users want to get as many CPUs as possible. */ #define PRS_MEMBER 0 #define PRS_ROOT 1 @@ -616,27 +627,25 @@ static inline bool cpusets_are_exclusive(struct cpuse= t *cs1, struct cpuset *cs2) * Returns: true if CPU exclusivity conflict exists, false otherwise * * Conflict detection rules: - * 1. If either cpuset is CPU exclusive, they must be mutually exclusive - * 2. exclusive_cpus masks cannot intersect between cpusets - * 3. The allowed CPUs of a sibling cpuset cannot be a subset of the new e= xclusive CPUs + * o cgroup v1 + * See cpuset1_cpus_excl_conflict() + * o cgroup v2 + * - The exclusive_cpus values cannot overlap. + * - New exclusive_cpus cannot be a superset of a sibling's cpus_allowe= d. */ static inline bool cpus_excl_conflict(struct cpuset *trial, struct cpuset = *sibling, bool xcpus_changed) { - /* If either cpuset is exclusive, check if they are mutually exclusive */ - if (is_cpu_exclusive(trial) || is_cpu_exclusive(sibling)) - return !cpusets_are_exclusive(trial, sibling); - - /* Exclusive_cpus cannot intersect */ - if (cpumask_intersects(trial->exclusive_cpus, sibling->exclusive_cpus)) - return true; + if (!cpuset_v2()) + return cpuset1_cpus_excl_conflict(trial, sibling); =20 /* The cpus_allowed of a sibling cpuset cannot be a subset of the new exc= lusive_cpus */ if (xcpus_changed && !cpumask_empty(sibling->cpus_allowed) && cpumask_subset(sibling->cpus_allowed, trial->exclusive_cpus)) return true; =20 - return false; + /* Exclusive_cpus cannot intersect */ + return cpumask_intersects(trial->exclusive_cpus, sibling->exclusive_cpus); } =20 static inline bool mems_excl_conflict(struct cpuset *cs1, struct cpuset *c= s2) @@ -2312,43 +2321,6 @@ static enum prs_errcode validate_partition(struct cp= uset *cs, struct cpuset *tri return PERR_NONE; } =20 -static int cpus_allowed_validate_change(struct cpuset *cs, struct cpuset *= trialcs, - struct tmpmasks *tmp) -{ - int retval; - struct cpuset *parent =3D parent_cs(cs); - - retval =3D validate_change(cs, trialcs); - - if ((retval =3D=3D -EINVAL) && cpuset_v2()) { - struct cgroup_subsys_state *css; - struct cpuset *cp; - - /* - * The -EINVAL error code indicates that partition sibling - * CPU exclusivity rule has been violated. We still allow - * the cpumask change to proceed while invalidating the - * partition. However, any conflicting sibling partitions - * have to be marked as invalid too. - */ - trialcs->prs_err =3D PERR_NOTEXCL; - rcu_read_lock(); - cpuset_for_each_child(cp, css, parent) { - struct cpumask *xcpus =3D user_xcpus(trialcs); - - if (is_partition_valid(cp) && - cpumask_intersects(xcpus, cp->effective_xcpus)) { - rcu_read_unlock(); - update_parent_effective_cpumask(cp, partcmd_invalidate, NULL, tmp); - rcu_read_lock(); - } - } - rcu_read_unlock(); - retval =3D 0; - } - return retval; -} - /** * partition_cpus_change - Handle partition state changes due to CPU mask = updates * @cs: The target cpuset being modified @@ -2408,15 +2380,15 @@ static int update_cpumask(struct cpuset *cs, struct= cpuset *trialcs, if (cpumask_equal(cs->cpus_allowed, trialcs->cpus_allowed)) return 0; =20 - if (alloc_tmpmasks(&tmp)) - return -ENOMEM; - compute_trialcs_excpus(trialcs, cs); trialcs->prs_err =3D PERR_NONE; =20 - retval =3D cpus_allowed_validate_change(cs, trialcs, &tmp); + retval =3D validate_change(cs, trialcs); if (retval < 0) - goto out_free; + return retval; + + if (alloc_tmpmasks(&tmp)) + return -ENOMEM; =20 /* * Check all the descendants in update_cpumasks_hier() if @@ -2439,7 +2411,7 @@ static int update_cpumask(struct cpuset *cs, struct c= puset *trialcs, /* Update CS_SCHED_LOAD_BALANCE and/or sched_domains, if necessary */ if (cs->partition_root_state) update_partition_sd_lb(cs, old_prs); -out_free: + free_tmpmasks(&tmp); return retval; } diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/test= ing/selftests/cgroup/test_cpuset_prs.sh index a17256d9f88a..ff4540b0490e 100755 --- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh +++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh @@ -269,7 +269,7 @@ TEST_MATRIX=3D( " C0-3:S+ C1-3:S+ C2-3 . X2-3 X3:P2 . . 0 A1:0-2|A2:= 3|A3:3 A1:P0|A2:P2 3" " C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2 . 0 A1:0-1|A2:= 1|A3:2-3 A1:P0|A3:P2 2-3" " C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2:C3 . 0 A1:0-1|A2:= 1|A3:2-3 A1:P0|A3:P2 2-3" - " C0-3:S+ C1-3:S+ C2-3 C2-3 . . . P2 0 A1:0-3|A2:= 1-3|A3:2-3|B1:2-3 A1:P0|A3:P0|B1:P-2" + " C0-3:S+ C1-3:S+ C2-3 C2-3 . . . P2 0 A1:0-1|A2:= 1|A3:1|B1:2-3 A1:P0|A3:P0|B1:P2" " C0-3:S+ C1-3:S+ C2-3 C4-5 . . . P2 0 B1:4-5 B1:= P2 4-5" " C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2 P2 0 A3:2-3|B1:= 4 A3:P2|B1:P2 2-4" " C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2:C1-3 P2 0 A3:2-3|B1:= 4 A3:P2|B1:P2 2-4" @@ -318,7 +318,7 @@ TEST_MATRIX=3D( # Invalid to valid local partition direct transition tests " C1-3:S+:P2 X4:P2 . . . . . . 0 A1:1-3|XA1= :1-3|A2:1-3:XA2: A1:P2|A2:P-2 1-3" " C1-3:S+:P2 X4:P2 . . . X3:P2 . . 0 A1:1-2|XA1= :1-3|A2:3:XA2:3 A1:P2|A2:P2 1-3" - " C0-3:P2 . . C4-6 C0-4 . . . 0 A1:0-4|B1:= 4-6 A1:P-2|B1:P0" + " C0-3:P2 . . C4-6 C0-4 . . . 0 A1:0-4|B1:= 5-6 A1:P2|B1:P0" " C0-3:P2 . . C4-6 C0-4:C0-3 . . . 0 A1:0-3|B1:= 4-6 A1:P2|B1:P0 0-3" =20 # Local partition invalidation tests @@ -388,10 +388,10 @@ TEST_MATRIX=3D( " C0-1:S+ C1 . C2-3 . P2 . . 0 A1:0-1|A2:= 1 A1:P0|A2:P-2" " C0-1:S+ C1:P2 . C2-3 P1 . . . 0 A1:0|A2:1 = A1:P1|A2:P2 0-1|1" =20 - # A non-exclusive cpuset.cpus change will invalidate partition and its si= blings - " C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:= 2-3 A1:P-1|B1:P0" - " C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:= 2-3 A1:P-1|B1:P-1" - " C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:= 2-3 A1:P0|B1:P-1" + # A non-exclusive cpuset.cpus change will not invalidate its siblings par= tition. + " C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:= 3 A1:P1|B1:P0" + " C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|XA1= :0-1|B1:2-3 A1:P1|B1:P1" + " C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|B1:= 2-3 A1:P0|B1:P1" =20 # cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subs= umed by it " C0-3 . . C4-5 X5 . . . 0 A1:0-3|B1:= 4-5" @@ -417,6 +417,14 @@ TEST_MATRIX=3D( " CX1-4:S+ CX2-4:P2 . C5-6 . . . P1:C3-6 0 A1:1|A2:2-= 4|B1:5-6 \ A1:P0|A2:P2:B1:P-1 2-4" =20 + # When multiple partitions with conflicting cpuset.cpus are created, the + # latter created ones will only get what are left of the available exclus= ive + # CPUs. + " C1-3:P1 . . . . . . C3-5:P1 0 A1:1-3|B1:= 4-5:XB1:4-5 A1:P1|B1:P1" + + # cpuset.cpus can be set to a subset of sibling's cpuset.cpus.exclusive + " C1-3:X1-3 . . C4-5 . . . C1-2 0 A1:1-3|B1:= 1-2" + # old-A1 old-A2 old-A3 old-B1 new-A1 new-A2 new-A3 new-B1 fail ECPUs Pst= ate ISOLCPUS # ------ ------ ------ ------ ------ ------ ------ ------ ---- ----- ---= --- -------- # Failure cases: @@ -427,7 +435,7 @@ TEST_MATRIX=3D( # Changes to cpuset.cpus.exclusive that violate exclusivity rule is rejec= ted " C0-3 . . C4-5 X0-3 . . X3-5 1 A1:0-3|B1:= 4-5" =20 - # cpuset.cpus cannot be a subset of sibling cpuset.cpus.exclusive + # cpuset.cpus.exclusive cannot be set to a superset of sibling's cpuset.c= pus " C0-3 . . C4-5 X3-5 . . . 1 A1:0-3|B1:= 4-5" ) =20 @@ -477,6 +485,10 @@ REMOTE_TEST_MATRIX=3D( . . X1-2:P2 X4-5:P1 . X1-7:P2 p1:3|c11:1-2|c12:4:c22:5= -6 \ p1:P0|p2:P1|c11:P2|c12:P1|c22:P2 \ 1-2,4-6|1-2,5-6" + # c12 whose cpuset.cpus CPUs are all granted to c11 will become invalid p= artition + " C1-5:P1:S+ . C1-4:P1 C2-3 . . \ + . . . P1 . . p1:5|c11:1-4|c12:5 \ + p1:P1|c11:P1|c12:P-1" ) =20 # --=20 2.52.0 From nobody Sat Feb 7 17:48:35 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1A6782ECE93 for ; Mon, 12 Jan 2026 16:01:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768233685; cv=none; b=YZyrmEZfeUK4Fz7EV1v6iKxfJPgt460OkbLaKOcXhbjOB1KE9+3AxQ2bLCAfEM7IhtsNqv9DqwzkSnv1coOaAfnmHiQqs1mW/MZ52Dn+qsD+I41O+ITUCDMBABUPNSz2aGzOgC27+ZNsbWoBulIKE6Fbnw1YJJw//uJ8qAEtFcY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768233685; c=relaxed/simple; bh=ZkAVkHML8eDQGH5JEQzNWgXaq+XcoQKd1lrbZhLVRuY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Wsn0yk6Jz88lc8rXDoZGIec4pfGsOM5c+n4z+gDuiBZWV+BE7tVNj91z76M+iTHn40KxrfHzukvQ5YM20rXeqsRUuDCJmOYRi2nE4W/PUHkuQ1xpRcGMFBBJicYR/X8xXUFC7e314IKgm/jiwYJpDViCgTfVcrKE0Y8kyhVU/q4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=g2FjlkA+; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="g2FjlkA+" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1768233682; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=HdasYJPCoxVdOiqbOW3eT8JBsi/YaL/aD+daljmfraQ=; b=g2FjlkA+CDWeDY9hfRpAFmjaeTEbSJEW0bCP/dxPQtxD6K5YUUQryxFDbLoXhd2tjvOq5H Iy6Jy9VcVkJqKu8JgG+KlMnsNKJ9B8a/D8ZfiGpGO0ucsARfBzRJq67/zezW+4I2iJW2vz k2AvC48IC4IqO76BiE1PrG+43a38a8s= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-349-m-yLkn5dNTGTuFl73l5P7A-1; Mon, 12 Jan 2026 11:01:19 -0500 X-MC-Unique: m-yLkn5dNTGTuFl73l5P7A-1 X-Mimecast-MFC-AGG-ID: m-yLkn5dNTGTuFl73l5P7A_1768233676 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 649061954204; Mon, 12 Jan 2026 16:01:12 +0000 (UTC) Received: from llong-thinkpadp16vgen1.westford.csb (unknown [10.22.89.95]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 4136919560BA; Mon, 12 Jan 2026 16:01:10 +0000 (UTC) From: Waiman Long To: Tejun Heo , Johannes Weiner , =?UTF-8?q?Michal=20Koutn=C3=BD?= , Jonathan Corbet , Shuah Khan Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org, Sun Shaojie , Chen Ridong , Waiman Long , Chen Ridong Subject: [PATCH cgroup/for-6.20 v5 5/5] cgroup/cpuset: Move the v1 empty cpus/mems check to cpuset1_validate_change() Date: Mon, 12 Jan 2026 11:00:21 -0500 Message-ID: <20260112160021.483561-6-longman@redhat.com> In-Reply-To: <20260112160021.483561-1-longman@redhat.com> References: <20260112160021.483561-1-longman@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 Content-Type: text/plain; charset="utf-8" As stated in commit 1c09b195d37f ("cpuset: fix a regression in validating config change"), it is not allowed to clear masks of a cpuset if there're tasks in it. This is specific to v1 since empty "cpuset.cpus" or "cpuset.mems" will cause the v2 cpuset to inherit the effective CPUs or memory nodes from its parent. So it is OK to have empty cpus or mems even if there are tasks in the cpuset. Move this empty cpus/mems check in validate_change() to cpuset1_validate_change() to allow more flexibility in setting cpus or mems in v2. cpuset_is_populated() needs to be moved into cpuset-internal.h as it is needed by the empty cpus/mems checking code. Also add a test case to test_cpuset_prs.sh to verify that. Reported-by: Chen Ridong Closes: https://lore.kernel.org/lkml/7a3ec392-2e86-4693-aa9f-1e668a668b9c@h= uaweicloud.com/ Signed-off-by: Waiman Long Reviewed-by: Chen Ridong --- kernel/cgroup/cpuset-internal.h | 9 ++++++++ kernel/cgroup/cpuset-v1.c | 14 +++++++++++ kernel/cgroup/cpuset.c | 23 ------------------- .../selftests/cgroup/test_cpuset_prs.sh | 3 +++ 4 files changed, 26 insertions(+), 23 deletions(-) diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-interna= l.h index e8e2683cb067..fd7d19842ded 100644 --- a/kernel/cgroup/cpuset-internal.h +++ b/kernel/cgroup/cpuset-internal.h @@ -260,6 +260,15 @@ static inline int nr_cpusets(void) return static_key_count(&cpusets_enabled_key.key) + 1; } =20 +static inline bool cpuset_is_populated(struct cpuset *cs) +{ + lockdep_assert_cpuset_lock_held(); + + /* Cpusets in the process of attaching should be considered as populated = */ + return cgroup_is_populated(cs->css.cgroup) || + cs->attach_in_progress; +} + /** * cpuset_for_each_child - traverse online children of a cpuset * @child_cs: loop cursor pointing to the current child diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c index 04124c38a774..7a23b9e8778f 100644 --- a/kernel/cgroup/cpuset-v1.c +++ b/kernel/cgroup/cpuset-v1.c @@ -368,6 +368,20 @@ int cpuset1_validate_change(struct cpuset *cur, struct= cpuset *trial) if (par && !is_cpuset_subset(trial, par)) goto out; =20 + /* + * Cpusets with tasks - existing or newly being attached - can't + * be changed to have empty cpus_allowed or mems_allowed. + */ + ret =3D -ENOSPC; + if (cpuset_is_populated(cur)) { + if (!cpumask_empty(cur->cpus_allowed) && + cpumask_empty(trial->cpus_allowed)) + goto out; + if (!nodes_empty(cur->mems_allowed) && + nodes_empty(trial->mems_allowed)) + goto out; + } + ret =3D 0; out: return ret; diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 83fb83a86b4b..a3dbca125588 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -370,15 +370,6 @@ static inline bool is_in_v2_mode(void) (cpuset_cgrp_subsys.root->flags & CGRP_ROOT_CPUSET_V2_MODE); } =20 -static inline bool cpuset_is_populated(struct cpuset *cs) -{ - lockdep_assert_held(&cpuset_mutex); - - /* Cpusets in the process of attaching should be considered as populated = */ - return cgroup_is_populated(cs->css.cgroup) || - cs->attach_in_progress; -} - /** * partition_is_populated - check if partition has tasks * @cs: partition root to be checked @@ -695,20 +686,6 @@ static int validate_change(struct cpuset *cur, struct = cpuset *trial) =20 par =3D parent_cs(cur); =20 - /* - * Cpusets with tasks - existing or newly being attached - can't - * be changed to have empty cpus_allowed or mems_allowed. - */ - ret =3D -ENOSPC; - if (cpuset_is_populated(cur)) { - if (!cpumask_empty(cur->cpus_allowed) && - cpumask_empty(trial->cpus_allowed)) - goto out; - if (!nodes_empty(cur->mems_allowed) && - nodes_empty(trial->mems_allowed)) - goto out; - } - /* * We can't shrink if we won't have enough room for SCHED_DEADLINE * tasks. This check is not done when scheduling is disabled as the diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/test= ing/selftests/cgroup/test_cpuset_prs.sh index ff4540b0490e..5dff3ad53867 100755 --- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh +++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh @@ -425,6 +425,9 @@ TEST_MATRIX=3D( # cpuset.cpus can be set to a subset of sibling's cpuset.cpus.exclusive " C1-3:X1-3 . . C4-5 . . . C1-2 0 A1:1-3|B1:= 1-2" =20 + # cpuset.cpus can become empty with task in it as it inherits parent's ef= fective CPUs + " C1-3:S+ C2 . . . T:C . . 0 A1:1-3|A2:= 1-3" + # old-A1 old-A2 old-A3 old-B1 new-A1 new-A2 new-A3 new-B1 fail ECPUs Pst= ate ISOLCPUS # ------ ------ ------ ------ ------ ------ ------ ------ ---- ----- ---= --- -------- # Failure cases: --=20 2.52.0