From nobody Mon Jun 8 09:48:37 2026
From: Waiman Long
To: Chen Ridong, Tejun Heo, Johannes Weiner, Michal Koutný, Peter Zijlstra
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Aaron Tomlin, Guopeng Zhang, Waiman Long
Subject: [PATCH-next v4 1/6] cgroup/cpuset: Fix node inconsistencies between cpuset_update_tasks_nodemask() and cpuset_attach()
Date: Fri, 29 May 2026 17:21:03 -0400
Message-ID: <20260529212108.120506-2-longman@redhat.com>
In-Reply-To: <20260529212108.120506-1-longman@redhat.com>
References: <20260529212108.120506-1-longman@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Whenever a cpuset's memory node mask is changed, there are 4 places where
the node mask has to be updated or used:

 1) the task's node mask, via cpuset_change_task_nodemask()
 2) the memory policy binding, via mpol_rebind_mm()
 3) if memory migration is enabled, migration from old_mems_allowed to the
    new node mask, via cpuset_migrate_mm()
 4) the setting of old_mems_allowed

These memory actions are done in cpuset_update_tasks_nodemask() and
cpuset_attach(). However, the two functions are inconsistent about which
node masks they use.

In cpuset_update_tasks_nodemask():
 - cpuset_change_task_nodemask(): guarantee_online_mems()
 - mpol_rebind_mm(): mems_allowed
 - cpuset_migrate_mm(): guarantee_online_mems()
 - old_mems_allowed: guarantee_online_mems()

In cpuset_attach():
 - cpuset_change_task_nodemask(): guarantee_online_mems()
 - mpol_rebind_mm(): effective_mems
 - cpuset_migrate_mm(): effective_mems
 - old_mems_allowed: effective_mems

These inconsistencies date back a long time, and it is hard to say which
values are the correct ones.

The guarantee_online_mems() function returns a node mask, taken from the
current cpuset or an ancestor, that is a subset of node_states[N_MEMORY].
All nodes in node_states[N_MEMORY] are online, i.e. in
node_states[N_ONLINE], but a node in node_states[N_ONLINE] may not have
memory. So node_states[N_MEMORY] is a subset of node_states[N_ONLINE].

The guarantee_online_mems() function should only be useful for v1, where
mems_allowed is the same as effective_mems. With v2, the memory nodes in
effective_mems should always be a subset of node_states[N_MEMORY], so
guarantee_online_mems() should just return cs->effective_mems.

Let's use the following setup in both functions to make them consistent:
 - cpuset_change_task_nodemask(): guarantee_online_mems()
 - mpol_rebind_mm(): effective_mems
 - cpuset_migrate_mm(): guarantee_online_mems()
 - old_mems_allowed: guarantee_online_mems()

So for v2, all of them effectively use effective_mems. For v1,
mpol_rebind_mm() uses effective_mems (the same as mems_allowed in v1),
which may differ from what guarantee_online_mems() returns.

Signed-off-by: Waiman Long
---
 kernel/cgroup/cpuset.c | 34 +++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 51327333980a..961427cd83a5 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2615,6 +2615,13 @@ static void *cpuset_being_rebound;
  * Iterate through each task of @cs updating its mems_allowed to the
  * effective cpuset's. As this function is called with cpuset_mutex held,
  * cpuset membership stays stable.
+ *
+ * - cpuset_change_task_nodemask(): guarantee_online_mems()
+ * - mpol_rebind_mm(): effective_mems
+ * - cpuset_migrate_mm(): guarantee_online_mems()
+ * - old_mems_allowed: guarantee_online_mems()
+ *
+ * For v2, guarantee_online_mems() should just return effective_mems.
  */
 void cpuset_update_tasks_nodemask(struct cpuset *cs)
 {
@@ -2624,7 +2631,10 @@ void cpuset_update_tasks_nodemask(struct cpuset *cs)
 
 	cpuset_being_rebound = cs;		/* causes mpol_dup() rebind */
 
-	guarantee_online_mems(cs, &newmems);
+	if (cpuset_v2())
+		newmems = cs->effective_mems;
+	else
+		guarantee_online_mems(cs, &newmems);
 
 	/*
 	 * The mpol_rebind_mm() call takes mmap_lock, which we couldn't
@@ -2649,7 +2659,7 @@ void cpuset_update_tasks_nodemask(struct cpuset *cs)
 
 		migrate = is_memory_migrate(cs);
 
-		mpol_rebind_mm(mm, &cs->mems_allowed);
+		mpol_rebind_mm(mm, &cs->effective_mems);
 		if (migrate)
 			cpuset_migrate_mm(mm, &cs->old_mems_allowed, &newmems);
 		else
@@ -2713,6 +2723,8 @@ static void update_nodemasks_hier(struct cpuset *cs, nodemask_t *new_mems)
 
 		WARN_ON(!is_in_v2_mode() &&
 			!nodes_equal(cp->mems_allowed, cp->effective_mems));
+		WARN_ON(cpuset_v2() &&
+			!nodes_subset(cp->effective_mems, node_states[N_MEMORY]));
 
 		cpuset_update_tasks_nodemask(cp);
 
@@ -3147,17 +3159,18 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 
 	/*
 	 * In the default hierarchy, enabling cpuset in the child cgroups
-	 * will trigger a number of cpuset_attach() calls with no change
-	 * in effective cpus and mems. In that case, we can optimize out
-	 * by skipping the task iteration and update.
+	 * will trigger a cpuset_attach() call with no change in effective cpus
+	 * and mems. In that case, we can optimize out by skipping the task
+	 * iteration and update.
 	 */
-	if (cpuset_v2() && !cpus_updated && !mems_updated) {
+	if (cpuset_v2()) {
 		cpuset_attach_nodemask_to = cs->effective_mems;
-		goto out;
+		if (!cpus_updated && !mems_updated)
+			goto out;
+	} else {
+		guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
 	}
 
-	guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
-
 	cgroup_taskset_for_each(task, css, tset)
 		cpuset_attach_task(cs, task);
 
@@ -3167,7 +3180,6 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 	 * if there is no change in effective_mems and CS_MEMORY_MIGRATE is
 	 * not set.
 	 */
-	cpuset_attach_nodemask_to = cs->effective_mems;
 	if (!is_memory_migrate(cs) && !mems_updated)
 		goto out;
 
@@ -3175,7 +3187,7 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 		struct mm_struct *mm = get_task_mm(leader);
 
 		if (mm) {
-			mpol_rebind_mm(mm, &cpuset_attach_nodemask_to);
+			mpol_rebind_mm(mm, &cs->effective_mems);
 
 			/*
 			 * old_mems_allowed is the same with mems_allowed
-- 
2.54.0

From nobody Mon Jun 8 09:48:37 2026
From: Waiman Long
To: Chen Ridong, Tejun Heo, Johannes Weiner, Michal Koutný, Peter Zijlstra
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Aaron Tomlin, Guopeng Zhang, Waiman Long
Subject: [PATCH-next v4 2/6] cgroup/cpuset: Add a cpuset_reserve_dl_bw() helper
Date: Fri, 29 May 2026 17:21:04 -0400
Message-ID: <20260529212108.120506-3-longman@redhat.com>
In-Reply-To: <20260529212108.120506-1-longman@redhat.com>
References: <20260529212108.120506-1-longman@redhat.com>

Extract the DL bandwidth allocation code in cpuset_can_attach() into a new
cpuset_reserve_dl_bw() helper to simplify the code. No functional change
is expected.

Signed-off-by: Waiman Long
---
 kernel/cgroup/cpuset.c | 53 ++++++++++++++++++++++++------------------
 1 file changed, 30 insertions(+), 23 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 961427cd83a5..a6f191b48529 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2992,6 +2992,25 @@ static int cpuset_can_attach_check(struct cpuset *cs)
 	return 0;
 }
 
+static int cpuset_reserve_dl_bw(struct cpuset *cs)
+{
+	int cpu, ret;
+
+	if (!cs->sum_migrate_dl_bw)
+		return 0;
+
+	cpu = cpumask_any_and(cpu_active_mask, cs->effective_cpus);
+	if (unlikely(cpu >= nr_cpu_ids))
+		return -EINVAL;
+
+	ret = dl_bw_alloc(cpu, cs->sum_migrate_dl_bw);
+	if (ret)
+		return ret;
+
+	cs->dl_bw_cpu = cpu;
+	return 0;
+}
+
 static void reset_migrate_dl_data(struct cpuset *cs)
 {
 	cs->nr_migrate_dl_tasks = 0;
@@ -3006,7 +3025,7 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
 	struct cpuset *cs, *oldcs;
 	struct task_struct *task;
 	bool setsched_check;
-	int cpu, ret;
+	int ret;
 
 	/* used later by cpuset_attach() */
 	cpuset_attach_old_cs = task_cs(cgroup_taskset_first(tset, &css));
@@ -3062,31 +3081,19 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
 		}
 	}
 
-	if (!cs->sum_migrate_dl_bw)
-		goto out_success;
-
-	cpu = cpumask_any_and(cpu_active_mask, cs->effective_cpus);
-	if (unlikely(cpu >= nr_cpu_ids)) {
-		ret = -EINVAL;
-		goto out_unlock;
-	}
-
-	ret = dl_bw_alloc(cpu, cs->sum_migrate_dl_bw);
-	if (ret)
-		goto out_unlock;
-
-	cs->dl_bw_cpu = cpu;
-
-out_success:
-	/*
-	 * Mark attach is in progress. This makes validate_change() fail
-	 * changes which zero cpus/mems_allowed.
-	 */
-	cs->attach_in_progress++;
+	ret = cpuset_reserve_dl_bw(cs);
 
 out_unlock:
-	if (ret)
+	if (ret) {
 		reset_migrate_dl_data(cs);
+	} else {
+		/*
+		 * Mark attach is in progress. This makes validate_change() fail
+		 * changes which zero cpus/mems_allowed.
+		 */
+		cs->attach_in_progress++;
+	}
+
 	mutex_unlock(&cpuset_mutex);
 	return ret;
 }
-- 
2.54.0

From nobody Mon Jun 8 09:48:37 2026
From: Waiman Long
To: Chen Ridong, Tejun Heo, Johannes Weiner, Michal Koutný, Peter Zijlstra
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Aaron Tomlin, Guopeng Zhang, Waiman Long
Subject: [PATCH-next v4 3/6] cgroup/cpuset: Expand the scope of cpuset_can_attach_check()
Date: Fri, 29 May 2026 17:21:05 -0400
Message-ID: <20260529212108.120506-4-longman@redhat.com>
In-Reply-To: <20260529212108.120506-1-longman@redhat.com>
References: <20260529212108.120506-1-longman@redhat.com>

Expand the scope of cpuset_can_attach_check() by moving the setting of
the setsched flag into it, using the new @oldcs and @psetsched arguments.
cpuset_can_attach_check() is also called from cpuset_can_fork(), so that
caller passes NULL for both new arguments.

While at it, record the source and destination cpuset CPU and memory node
comparison results in two new static flags, attach_cpus_updated and
attach_mems_updated. These flags indicate whether the CPUs or memory nodes
will change. They are set in cpuset_can_attach() and used directly in
cpuset_attach() as an optimization, so the same comparisons do not have
to be computed again.

Since cpuset_mutex is released between the 2 calls, an intervening cpuset
operation may change the CPU or node mask of the relevant cpusets. Checks
are therefore added to set these flags whenever the effective_cpus or
effective_mems of a cpuset with an attach in progress is changed.

Signed-off-by: Waiman Long
---
 kernel/cgroup/cpuset.c | 90 ++++++++++++++++++++++++++++--------------
 1 file changed, 60 insertions(+), 30 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index a6f191b48529..0f93f3d84494 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1108,6 +1108,14 @@ enum partition_cmd {
 static void update_sibling_cpumasks(struct cpuset *parent, struct cpuset *cs,
 				    struct tmpmasks *tmp);
 
+/*
+ * cpuset_can_attach() and cpuset_attach() specific internal data
+ * Protected by cpuset_mutex
+ */
+static struct cpuset *cpuset_attach_old_cs;
+static bool attach_cpus_updated;
+static bool attach_mems_updated;
+
 /*
  * Update partition exclusive flag
  *
@@ -1192,6 +1200,8 @@ static void reset_partition_data(struct cpuset *cs)
 	}
 	if (!cpumask_and(cs->effective_cpus, parent->effective_cpus, cs->cpus_allowed))
 		cpumask_copy(cs->effective_cpus, parent->effective_cpus);
+	if (cs->attach_in_progress)
+		attach_cpus_updated = true;
 }
 
 /*
@@ -1242,6 +1252,8 @@ static void partition_xcpus_add(int new_prs, struct cpuset *parent,
 					   xcpus);
 
 	cpumask_andnot(parent->effective_cpus, parent->effective_cpus, xcpus);
+	if (parent->attach_in_progress)
+		attach_cpus_updated = true;
 }
 
 /*
@@ -1269,6 +1281,8 @@ static void partition_xcpus_del(int old_prs, struct cpuset *parent,
 
 	cpumask_or(parent->effective_cpus, parent->effective_cpus, xcpus);
 	cpumask_and(parent->effective_cpus, parent->effective_cpus, cpu_active_mask);
+	if (parent->attach_in_progress)
+		attach_cpus_updated = true;
 }
 
 /*
@@ -2217,6 +2231,8 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
 		if (new_prs <= 0)
 			reset_partition_data(cp);
 		spin_unlock_irq(&callback_lock);
+		if (cp->attach_in_progress)
+			attach_cpus_updated = true;
 
 		notify_partition_change(cp, old_prs);
 
@@ -2720,6 +2736,8 @@ static void update_nodemasks_hier(struct cpuset *cs, nodemask_t *new_mems)
 		spin_lock_irq(&callback_lock);
 		cp->effective_mems = *new_mems;
 		spin_unlock_irq(&callback_lock);
+		if (cp->attach_in_progress)
+			attach_mems_updated = true;
 
 		WARN_ON(!is_in_v2_mode() &&
 			!nodes_equal(cp->mems_allowed, cp->effective_mems));
@@ -2976,19 +2994,48 @@ static int update_prstate(struct cpuset *cs, int new_prs)
 	return 0;
 }
 
-static struct cpuset *cpuset_attach_old_cs;
-
 /*
  * Check to see if a cpuset can accept a new task
  * For v1, cpus_allowed and mems_allowed can't be empty.
  * For v2, effective_cpus can't be empty.
  * Note that in v1, effective_cpus = cpus_allowed.
+ *
+ * Also set the boolean flag passed in by @psetsched depending on if
+ * security_task_setscheduler() call is needed and @oldcs is not NULL.
  */
-static int cpuset_can_attach_check(struct cpuset *cs)
+static int cpuset_can_attach_check(struct cpuset *cs, struct cpuset *oldcs,
+				   bool *psetsched)
 {
 	if (cpumask_empty(cs->effective_cpus) ||
 	   (!is_in_v2_mode() && nodes_empty(cs->mems_allowed)))
 		return -ENOSPC;
+
+	if (!oldcs)
+		return 0;
+
+	/*
+	 * Update attach specific data
+	 */
+	attach_cpus_updated = !cpumask_equal(cs->effective_cpus, oldcs->effective_cpus);
+	attach_mems_updated = !nodes_equal(cs->effective_mems, oldcs->effective_mems);
+
+	/*
+	 * Skip rights over task setsched check in v2 when nothing changes,
+	 * migration permission derives from hierarchy ownership in
+	 * cgroup_procs_write_permission()).
+	 */
+	*psetsched = !cpuset_v2() || attach_cpus_updated || attach_mems_updated;
+
+	/*
+	 * A v1 cpuset with tasks will have no CPU left only when CPU hotplug
+	 * brings the last online CPU offline as users are not allowed to empty
+	 * cpuset.cpus when there are active tasks inside. When that happens,
+	 * we should allow tasks to migrate out without security check to make
+	 * sure they will be able to run after migration.
+	 */
+	if (!is_in_v2_mode() && cpumask_empty(oldcs->effective_cpus))
+		*psetsched = false;
+
 	return 0;
 }
 
@@ -3035,29 +3082,10 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
 	mutex_lock(&cpuset_mutex);
 
 	/* Check to see if task is allowed in the cpuset */
-	ret = cpuset_can_attach_check(cs);
+	ret = cpuset_can_attach_check(cs, oldcs, &setsched_check);
 	if (ret)
 		goto out_unlock;
 
-	/*
-	 * Skip rights over task setsched check in v2 when nothing changes,
-	 * migration permission derives from hierarchy ownership in
-	 * cgroup_procs_write_permission()).
-	 */
-	setsched_check = !cpuset_v2() ||
-		!cpumask_equal(cs->effective_cpus, oldcs->effective_cpus) ||
-		!nodes_equal(cs->effective_mems, oldcs->effective_mems);
-
-	/*
-	 * A v1 cpuset with tasks will have no CPU left only when CPU hotplug
-	 * brings the last online CPU offline as users are not allowed to empty
-	 * cpuset.cpus when there are active tasks inside. When that happens,
-	 * we should allow tasks to migrate out without security check to make
-	 * sure they will be able to run after migration.
-	 */
-	if (!is_in_v2_mode() && cpumask_empty(oldcs->effective_cpus))
-		setsched_check = false;
-
 	cgroup_taskset_for_each(task, css, tset) {
 		ret = task_can_attach(task);
 		if (ret)
@@ -3152,7 +3180,6 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 	struct cgroup_subsys_state *css;
 	struct cpuset *cs;
 	struct cpuset *oldcs = cpuset_attach_old_cs;
-	bool cpus_updated, mems_updated;
 	bool queue_task_work = false;
 
 	cgroup_taskset_first(tset, &css);
@@ -3160,9 +3187,6 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 
 	lockdep_assert_cpus_held();	/* see cgroup_attach_lock() */
 	mutex_lock(&cpuset_mutex);
-	cpus_updated = !cpumask_equal(cs->effective_cpus,
-				      oldcs->effective_cpus);
-	mems_updated = !nodes_equal(cs->effective_mems, oldcs->effective_mems);
 
 	/*
 	 * In the default hierarchy, enabling cpuset in the child cgroups
@@ -3172,7 +3196,7 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 	 */
 	if (cpuset_v2()) {
 		cpuset_attach_nodemask_to = cs->effective_mems;
-		if (!cpus_updated && !mems_updated)
+		if (!attach_cpus_updated && !attach_mems_updated)
 			goto out;
 	} else {
 		guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
@@ -3187,7 +3211,7 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 	 * if there is no change in effective_mems and CS_MEMORY_MIGRATE is
 	 * not set.
 	 */
-	if (!is_memory_migrate(cs) && !mems_updated)
+	if (!is_memory_migrate(cs) && !attach_mems_updated)
 		goto out;
 
 	cgroup_taskset_for_each_leader(leader, css, tset) {
@@ -3602,7 +3626,7 @@ static int cpuset_can_fork(struct task_struct *task, struct css_set *cset)
 	mutex_lock(&cpuset_mutex);
 
 	/* Check to see if task is allowed in the cpuset */
-	ret = cpuset_can_attach_check(cs);
+	ret = cpuset_can_attach_check(cs, NULL, NULL);
 	if (ret)
 		goto out_unlock;
 
@@ -3742,6 +3766,8 @@ hotplug_update_tasks(struct cpuset *cs,
 	cpumask_copy(cs->effective_cpus, new_cpus);
 	cs->effective_mems = *new_mems;
 	spin_unlock_irq(&callback_lock);
+	if (cs->attach_in_progress)
+		attach_cpus_updated = attach_mems_updated = true;
 
 	if (cpus_updated)
 		cpuset_update_tasks_cpumask(cs, new_cpus);
@@ -3927,6 +3953,8 @@ static void cpuset_handle_hotplug(void)
 		}
 		cpumask_copy(top_cpuset.effective_cpus, &new_cpus);
 		spin_unlock_irq(&callback_lock);
+		if (top_cpuset.attach_in_progress)
+			attach_cpus_updated = true;
 		/* we don't mess with cpumasks of tasks in top_cpuset */
 	}
 
@@ -3937,6 +3965,8 @@ static void cpuset_handle_hotplug(void)
 		top_cpuset.mems_allowed = new_mems;
 		top_cpuset.effective_mems = new_mems;
 		spin_unlock_irq(&callback_lock);
+		if (top_cpuset.attach_in_progress)
+			attach_mems_updated = true;
 		cpuset_update_tasks_nodemask(&top_cpuset);
 	}
 
-- 
2.54.0

From nobody Mon Jun 8 09:48:37 2026
From: Waiman Long
To: Chen Ridong, Tejun Heo, Johannes Weiner, Michal Koutný, Peter Zijlstra
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Aaron Tomlin, Guopeng Zhang, Waiman Long, Ridong Chen
Subject: [PATCH-next v4 4/6] cgroup/cpuset: Made cpuset_attach_old_cs track task group leaders
Date: Fri, 29 May 2026 17:21:06 -0400
Message-ID: <20260529212108.120506-5-longman@redhat.com>
In-Reply-To: <20260529212108.120506-1-longman@redhat.com>
References: <20260529212108.120506-1-longman@redhat.com>

Tasks can be migrated from multiple source cpusets to a single target
cpuset in two ways. In the first, a multithreaded application with threads
in different cpusets is wholly moved to a new cpuset. In the second,
disabling the v2 cpuset controller moves all the tasks in child cpusets
to the parent cpuset.

In the former case, it is the mm setting of the group leader that really
matters.
So cpuset_attach_old_cs should track the oldcs of the thread group
leader. In the latter case, the effective_mems of each child cpuset must
always be a subset of its parent's, so no real page migration is
necessary no matter which child cpuset is selected as
cpuset_attach_old_cs.

In other words, cpuset_attach_old_cs should be updated to match the
latest task group leader in cpuset_can_attach().

Suggested-by: Ridong Chen
Signed-off-by: Waiman Long
---
 kernel/cgroup/cpuset.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 0f93f3d84494..0bb63a9cda0b 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1111,6 +1111,20 @@ static void update_sibling_cpumasks(struct cpuset *parent, struct cpuset *cs,
 /*
  * cpuset_can_attach() and cpuset_attach() specific internal data
  * Protected by cpuset_mutex
+ *
+ * The cpuset_attach_old_cs is used mainly by cpuset_migrate_mm() to get the
+ * old_mems_allowed value. There are two ways that many-to-one cpuset migration
+ * can happen:
+ * 1) A multithreaded application with threads in different cpusets is wholly
+ *    moved to a new cpuset.
+ * 2) Disabling v2 cpuset controller will move all the tasks in child cpusets
+ *    to the parent cpuset.
+ *
+ * In the former case, it is the mm setting of the group leader that really
+ * matters. So cpuset_attach_old_cs should track the oldcs of the thread
+ * leader. In the latter case, effective_mems of child cpusets must always
+ * be a subset of the parent. So no real page migration will be necessary no
+ * matter which child cpuset is selected as cpuset_attach_old_cs.
  */
 static struct cpuset *cpuset_attach_old_cs;
 static bool attach_cpus_updated;
@@ -3091,6 +3105,10 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
 		if (ret)
 			goto out_unlock;
 
+		/* Update cpuset_attach_old_cs to the latest group leader */
+		if (task == task->group_leader)
+			cpuset_attach_old_cs = task_cs(task);
+
 		if (setsched_check) {
 			ret = security_task_setscheduler(task);
 			if (ret)
-- 
2.54.0
From: Waiman Long
To: Chen Ridong, Tejun Heo, Johannes Weiner, Michal Koutný, Peter Zijlstra
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Aaron Tomlin, Guopeng Zhang, Waiman Long
Subject: [PATCH-next v4 5/6] cgroup/cpuset: Move mpol_rebind_mm/cpuset_migrate_mm() calls inside cpuset_attach_task()
Date: Fri, 29 May 2026 17:21:07 -0400
Message-ID: <20260529212108.120506-6-longman@redhat.com>
In-Reply-To: <20260529212108.120506-1-longman@redhat.com>
References: <20260529212108.120506-1-longman@redhat.com>

cpuset_attach_task() was introduced in commit 42a11bf5c543
("cgroup/cpuset: Make cpuset_fork() handle CLONE_INTO_CGROUP properly")
to make the CLONE_INTO_CGROUP flag of clone(2) behave more like moving a
task from one cpuset into another. That commit didn't move the
mpol_rebind_mm() and cpuset_migrate_mm() calls for the group leader into
cpuset_attach_task(). When CLONE_INTO_CGROUP is used without
CLONE_THREAD, the new task is its own group leader, so in this case it
is still not equivalent to moving a task between cpusets.

Make CLONE_INTO_CGROUP behave more like cpuset_attach() by moving the
mpol_rebind_mm() and cpuset_migrate_mm() calls inside
cpuset_attach_task(). As a result, the following static variables have
to be updated in cpuset_fork():
 - cpuset_attach_old_cs
 - attach_cpus_updated
 - attach_mems_updated
 - queue_task_work

Signed-off-by: Waiman Long
---
 kernel/cgroup/cpuset.c | 89 ++++++++++++++++++++++++------------------
 1 file changed, 51 insertions(+), 38 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 0bb63a9cda0b..a6506b94e60a 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3171,9 +3171,12 @@ static void cpuset_cancel_attach(struct cgroup_taskset *tset)
  */
 static cpumask_var_t cpus_attach;
 static nodemask_t cpuset_attach_nodemask_to;
+static bool queue_task_work;
 
 static void cpuset_attach_task(struct cpuset *cs, struct task_struct *task)
 {
+	struct mm_struct *mm;
+
 	lockdep_assert_cpuset_lock_held();
 
 	if (cs != &top_cpuset)
@@ -3187,24 +3190,56 @@ static void cpuset_attach_task(struct cpuset *cs, struct task_struct *task)
 	 */
 	WARN_ON_ONCE(set_cpus_allowed_ptr(task, cpus_attach));
 
+	if (cpuset_v2() && !attach_mems_updated)
+		return;
+
 	cpuset_change_task_nodemask(task, &cpuset_attach_nodemask_to);
 	cpuset1_update_task_spread_flags(cs, task);
+
+	if (task != task->group_leader)
+		return;
+
+	/*
+	 * Change mm for threadgroup leader. This is expensive and may
+	 * sleep and should be moved outside migration path proper.
+	 */
+	mm = get_task_mm(task);
+	if (mm) {
+		struct cpuset *oldcs = cpuset_attach_old_cs;
+
+		mpol_rebind_mm(mm, &cs->effective_mems);
+
+		/*
+		 * old_mems_allowed is the same with mems_allowed
+		 * here, except if this task is being moved
+		 * automatically due to hotplug. In that case
+		 * @mems_allowed has been updated and is empty, so
+		 * @old_mems_allowed is the right nodesets that we
+		 * migrate mm from.
+		 */
+		if (is_memory_migrate(cs)) {
+			cpuset_migrate_mm(mm, &oldcs->old_mems_allowed,
+					  &cpuset_attach_nodemask_to);
+			queue_task_work = true;
+		} else {
+			mmput(mm);
+		}
+	}
 }
 
 static void cpuset_attach(struct cgroup_taskset *tset)
 {
 	struct task_struct *task;
-	struct task_struct *leader;
 	struct cgroup_subsys_state *css;
 	struct cpuset *cs;
 	struct cpuset *oldcs = cpuset_attach_old_cs;
-	bool queue_task_work = false;
 
 	cgroup_taskset_first(tset, &css);
 	cs = css_cs(css);
 
 	lockdep_assert_cpus_held();	/* see cgroup_attach_lock() */
 	mutex_lock(&cpuset_mutex);
+	queue_task_work = false;
 
 	/*
 	 * In the default hierarchy, enabling cpuset in the child cgroups
@@ -3223,38 +3258,6 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 	cgroup_taskset_for_each(task, css, tset)
 		cpuset_attach_task(cs, task);
 
-	/*
-	 * Change mm for all threadgroup leaders. This is expensive and may
-	 * sleep and should be moved outside migration path proper. Skip it
-	 * if there is no change in effective_mems and CS_MEMORY_MIGRATE is
-	 * not set.
-	 */
-	if (!is_memory_migrate(cs) && !attach_mems_updated)
-		goto out;
-
-	cgroup_taskset_for_each_leader(leader, css, tset) {
-		struct mm_struct *mm = get_task_mm(leader);
-
-		if (mm) {
-			mpol_rebind_mm(mm, &cs->effective_mems);
-
-			/*
-			 * old_mems_allowed is the same with mems_allowed
-			 * here, except if this task is being moved
-			 * automatically due to hotplug. In that case
-			 * @mems_allowed has been updated and is empty, so
-			 * @old_mems_allowed is the right nodesets that we
-			 * migrate mm from.
-			 */
-			if (is_memory_migrate(cs)) {
-				cpuset_migrate_mm(mm, &oldcs->old_mems_allowed,
-						  &cpuset_attach_nodemask_to);
-				queue_task_work = true;
-			} else
-				mmput(mm);
-		}
-	}
-
 out:
 	if (queue_task_work)
 		schedule_flush_migrate_mm();
@@ -3688,15 +3691,14 @@ static void cpuset_cancel_fork(struct task_struct *task, struct css_set *cset)
  */
 static void cpuset_fork(struct task_struct *task)
 {
-	struct cpuset *cs;
-	bool same_cs;
+	struct cpuset *cs, *oldcs;
 
 	rcu_read_lock();
 	cs = task_cs(task);
-	same_cs = (cs == task_cs(current));
+	oldcs = task_cs(current);
 	rcu_read_unlock();
 
-	if (same_cs) {
+	if (cs == oldcs) {
 		if (cs == &top_cpuset)
 			return;
 
@@ -3708,7 +3710,18 @@ static void cpuset_fork(struct task_struct *task)
 	/* CLONE_INTO_CGROUP */
 	mutex_lock(&cpuset_mutex);
 	guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
+	/*
+	 * Assume CPUs and memory nodes are updated
+	 * A CLONE_INTO_CGROUP operation should have taken the cgroup mutex
+	 * and so there shouldn't be a competing cpuset_attach() operation.
+	 */
+	attach_cpus_updated = attach_mems_updated = true;
+	queue_task_work = false;
+	cpuset_attach_old_cs = oldcs;
 	cpuset_attach_task(cs, task);
+	attach_cpus_updated = attach_mems_updated = false;
+	if (queue_task_work)
+		schedule_flush_migrate_mm();
 
 	dec_attach_in_progress_locked(cs);
 	mutex_unlock(&cpuset_mutex);
-- 
2.54.0
From: Waiman Long
To: Chen Ridong, Tejun Heo, Johannes Weiner, Michal Koutný, Peter Zijlstra
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Aaron Tomlin, Guopeng Zhang, Waiman Long
Subject: [PATCH-next v4 6/6] cgroup/cpuset: Support multiple source/destination cpusets for cpuset_*attach()
Date: Fri, 29 May 2026 17:21:08 -0400
Message-ID: <20260529212108.120506-7-longman@redhat.com>
In-Reply-To: <20260529212108.120506-1-longman@redhat.com>
References: <20260529212108.120506-1-longman@redhat.com>

With cgroup v2, the cgroup_taskset structure passed to the cgroup
can_attach() and attach() methods can contain task migration data with
multiple destination cpusets when the cpuset controller is enabled, or
with multiple source cpusets when it is disabled. Since cpuset is
threaded in both v1 and v2, another way to cause a many-to-one migration
is to move a whole multithreaded process, whose threads are in different
cpuset-enabled threaded cgroups, into another cpuset-enabled cgroup.

The current cpuset_can_attach() and cpuset_attach() functions still
expect task migration to be from one source cpuset to one destination
cpuset. This has been the case since cpuset was enabled for cgroup v2 in
commit 4ec22e9c5a90 ("cpuset: Enable cpuset controller in default
hierarchy").

This is less of an issue when enabling the cpuset controller, as all the
newly created child cpusets have exactly the same set of CPUs and memory
nodes as their parent. The exception is when deadline tasks are involved
in the migration, as the deadline task accounting data can then be off.
It is more problematic when the cpuset controller is disabled, as the
child cpusets' sets of CPUs and memory nodes may differ from their
parent's, or when a multithreaded process is moved out of different
threaded cgroups.

Fix that by tracking the set of source (old) and destination cpusets in
singly linked lists and iterating over all of them to properly update
the internal data. Also keep the current cs and oldcs variables
up-to-date with the css and task iterators.
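To make the bookkeeping concrete, here is a minimal userspace sketch, not the kernel code. `struct cs`, `track()`, `check_pair()`, `count_dl_task()`, `clear_attach()` and `demo()` are hypothetical names; plain `unsigned long` bitmaps stand in for cpumask/nodemask, and a hand-rolled singly linked list stands in for the kernel's llist API. It models three ideas from the patch: each cpuset is queued at most once (the role of `attach_node_in_llist`), the `attach_*_updated` flags become sticky once any src/dst pair differs, and the `nr_migrate_dl_tasks` deltas (+1 at the destination, -1 at the source) are folded into `nr_deadline_tasks` only when the attach succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cpuset; not the kernel layout. */
struct cs {
	unsigned long cpus;	/* models effective_cpus */
	unsigned long mems;	/* models effective_mems */
	bool in_list;		/* models attach_node_in_llist */
	struct cs *next;	/* models attach_node */
	int nr_migrate_dl;	/* models nr_migrate_dl_tasks */
	int nr_dl;		/* models nr_deadline_tasks */
};

static struct cs *src_head, *dst_head;	/* models src_cs_head/dst_cs_head */
static bool cpus_updated, mems_updated;	/* models attach_{cpus,mems}_updated */

/* Push @c onto @head unless it is already queued for this operation. */
static void track(struct cs **head, struct cs *c)
{
	if (c->in_list)
		return;
	c->next = *head;
	*head = c;
	c->in_list = true;
}

/* Called for each distinct (dst, src) pair met while walking the taskset. */
static void check_pair(struct cs *dst, struct cs *src)
{
	track(&dst_head, dst);
	track(&src_head, src);
	/* Sticky: one mismatching pair marks the whole attach as updated. */
	if (!cpus_updated)
		cpus_updated = dst->cpus != src->cpus;
	if (!mems_updated)
		mems_updated = dst->mems != src->mems;
}

/* One SCHED_DEADLINE task moving from @src to @dst. */
static void count_dl_task(struct cs *dst, struct cs *src)
{
	dst->nr_migrate_dl++;
	src->nr_migrate_dl--;
}

static void clear_list(struct cs **head, bool commit)
{
	struct cs *c, *next;

	for (c = *head; c; c = next) {
		next = c->next;
		c->next = NULL;
		c->in_list = false;
		if (commit)	/* fold the pending delta in only on success */
			c->nr_dl += c->nr_migrate_dl;
		c->nr_migrate_dl = 0;
	}
	*head = NULL;
}

/* commit is true after a successful attach, false on cancel or failure. */
static void clear_attach(bool commit)
{
	clear_list(&src_head, commit);
	clear_list(&dst_head, commit);
	cpus_updated = mems_updated = false;
}

/* Disabling the controller: tasks from two children move to the parent. */
static void demo(void)
{
	struct cs parent = { .cpus = 0xf, .mems = 0x3 };
	struct cs child1 = { .cpus = 0x3, .mems = 0x1, .nr_dl = 1 };
	struct cs child2 = { .cpus = 0xf, .mems = 0x3 };

	check_pair(&parent, &child1);
	check_pair(&parent, &child2);	/* matches, but flags stay set */
	assert(cpus_updated && mems_updated);
	count_dl_task(&parent, &child1);
	clear_attach(true);
	assert(parent.nr_dl == 1 && child1.nr_dl == 0);
}
```

In `demo()`, only the (parent, child1) pair differs, yet both flags remain set after the matching (parent, child2) pair is checked, and the single DL task ends up counted in the parent and removed from child1. Like the patch, the sketch gives each cpuset a single list node, so it assumes a cpuset is never both a source and a destination of the same operation.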
To ensure proper DL task accounting, nr_migrate_dl_tasks is decremented
in the source cpusets and incremented in the destination cpusets, and
these values are added to nr_deadline_tasks when the migration succeeds.

Fixes: 4ec22e9c5a90 ("cpuset: Enable cpuset controller in default hierarchy")
Signed-off-by: Waiman Long
---
 kernel/cgroup/cpuset-internal.h |   6 +
 kernel/cgroup/cpuset.c          | 204 ++++++++++++++++++++++++--------
 2 files changed, 158 insertions(+), 52 deletions(-)

diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index f7aaf01f7cd5..4c2772a7fd5e 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -161,6 +161,12 @@ struct cpuset {
 	 */
 	bool remote_partition;
 
+	/*
+	 * cpuset_can_attach() and cpuset_attach() specific data
+	 */
+	bool attach_node_in_llist;
+	struct llist_node attach_node;
+
 	/*
 	 * number of SCHED_DEADLINE tasks attached to this cpuset, so that we
 	 * know when to rebuild associated root domain bandwidth information.
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index a6506b94e60a..2add658eb288 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -37,6 +37,7 @@
 #include
 #include
 #include
+#include
 
 DEFINE_STATIC_KEY_FALSE(cpusets_pre_enable_key);
 DEFINE_STATIC_KEY_FALSE(cpusets_enabled_key);
@@ -1127,6 +1128,8 @@ static void update_sibling_cpumasks(struct cpuset *parent, struct cpuset *cs,
  * matter which child cpuset is selected as cpuset_attach_old_cs.
  */
 static struct cpuset *cpuset_attach_old_cs;
+static LLIST_HEAD(src_cs_head);
+static LLIST_HEAD(dst_cs_head);
 static bool attach_cpus_updated;
 static bool attach_mems_updated;
 
@@ -3017,9 +3020,10 @@ static int update_prstate(struct cpuset *cs, int new_prs)
  * Also set the boolean flag passed in by @psetsched depending on if
  * security_task_setscheduler() call is needed and @oldcs is not NULL.
  */
-static int cpuset_can_attach_check(struct cpuset *cs, struct cpuset *oldcs,
-				    bool *psetsched)
+static int cpuset_can_attach_check(struct cpuset *cs, struct cpuset *oldcs, bool *psetsched)
 {
+	bool cpu_match, mem_match;
+
 	if (cpumask_empty(cs->effective_cpus) ||
 	    (!is_in_v2_mode() && nodes_empty(cs->mems_allowed)))
 		return -ENOSPC;
@@ -3030,15 +3034,34 @@ static int cpuset_can_attach_check(struct cpuset *cs, struct cpuset *oldcs,
 	/*
 	 * Update attach specific data
 	 */
-	attach_cpus_updated = !cpumask_equal(cs->effective_cpus, oldcs->effective_cpus);
-	attach_mems_updated = !nodes_equal(cs->effective_mems, oldcs->effective_mems);
+	if (!cs->attach_node_in_llist) {
+		llist_add(&cs->attach_node, &dst_cs_head);
+		cs->attach_node_in_llist = true;
+	}
+	if (!oldcs->attach_node_in_llist) {
+		llist_add(&oldcs->attach_node, &src_cs_head);
+		oldcs->attach_node_in_llist = true;
+	}
+
+	cpu_match = cpumask_equal(cs->effective_cpus, oldcs->effective_cpus);
+	mem_match = nodes_equal(cs->effective_mems, oldcs->effective_mems);
+
+	/*
+	 * Set the updated flags whenever there is a mismatch in any of the
+	 * src/dst pairs.
+	 */
+	if (!attach_cpus_updated)
+		attach_cpus_updated = !cpu_match;
+
+	if (!attach_mems_updated)
+		attach_mems_updated = !mem_match;
 
 	/*
 	 * Skip rights over task setsched check in v2 when nothing changes,
 	 * migration permission derives from hierarchy ownership in
 	 * cgroup_procs_write_permission()).
 	 */
-	*psetsched = !cpuset_v2() || attach_cpus_updated || attach_mems_updated;
+	*psetsched = !cpuset_v2() || !cpu_match || !mem_match;
 
 	/*
 	 * A v1 cpuset with tasks will have no CPU left only when CPU hotplug
@@ -3053,33 +3076,103 @@ static int cpuset_can_attach_check(struct cpuset *cs, struct cpuset *oldcs,
 	return 0;
 }
 
-static int cpuset_reserve_dl_bw(struct cpuset *cs)
+/*
+ * If reset_dl_bw is set, reset the previous dl_bw_alloc() call. Otherwise,
+ * update nr_deadline_tasks according to nr_migrate_dl_tasks in both source
+ * and destination cpusets.
+ */
+static void clear_attach_data(bool reset_dl_bw)
+{
+	struct cpuset *cs, *next;
+
+	llist_for_each_entry_safe(cs, next, src_cs_head.first, attach_node) {
+		cs->attach_node.next = NULL;
+		cs->attach_node_in_llist = false;
+		if (cs->nr_migrate_dl_tasks && !reset_dl_bw)
+			cs->nr_deadline_tasks += cs->nr_migrate_dl_tasks;
+		cs->nr_migrate_dl_tasks = 0;
+	}
+
+	llist_for_each_entry_safe(cs, next, dst_cs_head.first, attach_node) {
+		cs->attach_node.next = NULL;
+		cs->attach_node_in_llist = false;
+		if (reset_dl_bw && cs->dl_bw_cpu >= 0)
+			dl_bw_free(cs->dl_bw_cpu, cs->sum_migrate_dl_bw);
+		if (cs->nr_migrate_dl_tasks && !reset_dl_bw)
+			cs->nr_deadline_tasks += cs->nr_migrate_dl_tasks;
+		cs->nr_migrate_dl_tasks = 0;
+		cs->sum_migrate_dl_bw = 0;
+		cs->dl_bw_cpu = -1;
+	}
+
+	src_cs_head.first = NULL;
+	dst_cs_head.first = NULL;
+	attach_cpus_updated = false;
+	attach_mems_updated = false;
+}
+
+static int cpuset_reserve_dl_bw(void)
 {
+	struct cpuset *cs;
 	int cpu, ret;
 
-	if (!cs->sum_migrate_dl_bw)
-		return 0;
+	llist_for_each_entry(cs, dst_cs_head.first, attach_node) {
+		if (!cs->sum_migrate_dl_bw)
+			continue;
 
-	cpu = cpumask_any_and(cpu_active_mask, cs->effective_cpus);
-	if (unlikely(cpu >= nr_cpu_ids))
-		return -EINVAL;
+		cpu = cpumask_any_and(cpu_active_mask, cs->effective_cpus);
+		if (unlikely(cpu >= nr_cpu_ids))
+			return -EINVAL;
 
-	ret = dl_bw_alloc(cpu, cs->sum_migrate_dl_bw);
-	if (ret)
-		return ret;
+		ret = dl_bw_alloc(cpu, cs->sum_migrate_dl_bw);
+		if (ret)
+			return ret;
 
-	cs->dl_bw_cpu = cpu;
+		cs->dl_bw_cpu = cpu;
+	}
 	return 0;
 }
 
-static void reset_migrate_dl_data(struct cpuset *cs)
+static void set_attach_in_progress(void)
 {
-	cs->nr_migrate_dl_tasks = 0;
-	cs->sum_migrate_dl_bw = 0;
-	cs->dl_bw_cpu = -1;
+	struct cpuset *cs;
+
+	/*
+	 * Mark attach is in progress. This makes validate_change() fail
+	 * changes which zero cpus/mems_allowed.
+	 */
+	llist_for_each_entry(cs, dst_cs_head.first, attach_node)
+		cs->attach_in_progress++;
+}
+
+static void reset_attach_in_progress(void)
+{
+	struct cpuset *cs;
+
+	llist_for_each_entry(cs, dst_cs_head.first, attach_node)
+		dec_attach_in_progress_locked(cs);
 }
 
-/* Called by cgroups to determine if a cpuset is usable; cpuset_mutex held */
+/*
+ * Called by cgroups to determine if a cpuset is usable; cpuset_mutex held.
+ *
+ * With cgroup v2, enabling of cpuset controller in a cgroup subtree can
+ * cause @tset to contain task migration data from one parent cpuset to multiple
+ * child cpusets. Not much is needed to be done here other than tracking the
+ * number of DL tasks in each cpuset as the CPUs and memory nodes of the child
+ * cpusets are exactly the same as the parent.
+ *
+ * Conversely, disabling of cpuset controller can cause @tset to contain task
+ * migration data from multiple child cpusets to one parent cpuset. Here, the
+ * CPUs and memory nodes of the child cpusets may be different from the parent,
+ * but must be a subset of its parent.
+ *
+ * Another possible many-to-one migration is the moving of the whole
+ * multithreaded process with threads in different cpusets to another cpuset.
+ *
+ * For all other use cases, @tset task migration data should be from one source
+ * cpuset to one destination cpuset.
+ */
 static int cpuset_can_attach(struct cgroup_taskset *tset)
 {
 	struct cgroup_subsys_state *css;
@@ -3101,6 +3194,16 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
 		goto out_unlock;
 
 	cgroup_taskset_for_each(task, css, tset) {
+		struct cpuset *newcs = css_cs(css);
+		struct cpuset *new_oldcs = task_cs(task);
+
+		if ((newcs != cs) || (new_oldcs != oldcs)) {
+			cs = newcs;
+			oldcs = new_oldcs;
+			ret = cpuset_can_attach_check(cs, oldcs, &setsched_check);
+			if (ret)
+				goto out_unlock;
+		}
 		ret = task_can_attach(task);
 		if (ret)
 			goto out_unlock;
@@ -3122,23 +3225,19 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
 			 * contribute to sum_migrate_dl_bw.
 			 */
 			cs->nr_migrate_dl_tasks++;
+			oldcs->nr_migrate_dl_tasks--;
 			if (dl_task_needs_bw_move(task, cs->effective_cpus))
 				cs->sum_migrate_dl_bw += task->dl.dl_bw;
 		}
 	}
 
-	ret = cpuset_reserve_dl_bw(cs);
+	ret = cpuset_reserve_dl_bw();
 
 out_unlock:
-	if (ret) {
-		reset_migrate_dl_data(cs);
-	} else {
-		/*
-		 * Mark attach is in progress. This makes validate_change() fail
- */ - cs->attach_in_progress++; - } + if (ret) + clear_attach_data(true); + else + set_attach_in_progress(); =20 mutex_unlock(&cpuset_mutex); return ret; @@ -3153,14 +3252,8 @@ static void cpuset_cancel_attach(struct cgroup_tasks= et *tset) cs =3D css_cs(css); =20 mutex_lock(&cpuset_mutex); - dec_attach_in_progress_locked(cs); - - if (cs->dl_bw_cpu >=3D 0) - dl_bw_free(cs->dl_bw_cpu, cs->sum_migrate_dl_bw); - - if (cs->nr_migrate_dl_tasks) - reset_migrate_dl_data(cs); - + reset_attach_in_progress(); + clear_attach_data(true); mutex_unlock(&cpuset_mutex); } =20 @@ -3232,7 +3325,6 @@ static void cpuset_attach(struct cgroup_taskset *tset) struct task_struct *task; struct cgroup_subsys_state *css; struct cpuset *cs; - struct cpuset *oldcs =3D cpuset_attach_old_cs; =20 cgroup_taskset_first(tset, &css); cs =3D css_cs(css); @@ -3245,32 +3337,40 @@ static void cpuset_attach(struct cgroup_taskset *ts= et) * In the default hierarchy, enabling cpuset in the child cgroups * will trigger a cpuset_attach() call with no change in effective cpus * and mems. In that case, we can optimize out by skipping the task - * iteration and update. + * iteration and update, but the destination cpuset list is iterated to + * set old_mems_sllowed. 
 	 */
 	if (cpuset_v2()) {
 		cpuset_attach_nodemask_to = cs->effective_mems;
-		if (!attach_cpus_updated && !attach_mems_updated)
+		if (!attach_cpus_updated && !attach_mems_updated) {
+			llist_for_each_entry(cs, dst_cs_head.first, attach_node)
+				cs->old_mems_allowed = cs->effective_mems;
 			goto out;
+		}
 	} else {
 		guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
 	}
 
-	cgroup_taskset_for_each(task, css, tset)
+	cgroup_taskset_for_each(task, css, tset) {
+		struct cpuset *newcs = css_cs(css);
+
+		if (newcs != cs) {
+			cs->old_mems_allowed = cpuset_attach_nodemask_to;
+			cs = newcs;
+			if (cpuset_v2())
+				cpuset_attach_nodemask_to = cs->effective_mems;
+			else
+				guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
+		}
 		cpuset_attach_task(cs, task);
+	}
 
-out:
 	if (queue_task_work)
 		schedule_flush_migrate_mm();
 	cs->old_mems_allowed = cpuset_attach_nodemask_to;
-
-	if (cs->nr_migrate_dl_tasks) {
-		cs->nr_deadline_tasks += cs->nr_migrate_dl_tasks;
-		oldcs->nr_deadline_tasks -= cs->nr_migrate_dl_tasks;
-		reset_migrate_dl_data(cs);
-	}
-
-	dec_attach_in_progress_locked(cs);
-
+out:
+	reset_attach_in_progress();
+	clear_attach_data(false);
 	mutex_unlock(&cpuset_mutex);
 }
 
-- 
2.54.0