From: Waiman Long
To: Tejun Heo, Johannes Weiner, Michal Koutný
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Chen Ridong, Gabriele Monaco, Frederic Weisbecker, Waiman Long
Subject: [cgroup/for-6.19 PATCH v3 2/5] cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping
Date: Tue, 4 Nov 2025 23:38:45 -0500
Message-ID: <20251105043848.382703-3-longman@redhat.com>
In-Reply-To: <20251105043848.382703-1-longman@redhat.com>
References: <20251105043848.382703-1-longman@redhat.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Currently the user can set up isolated cpus via cpuset and nohz_full in
such a way that leaves no housekeeping CPU (i.e. no CPU that is neither
domain isolated nor nohz full). This can be a problem for other
subsystems (e.g. the timer wheel migration).

Prevent this configuration by blocking any assignment that would cause
the union of domain isolated cpus and nohz_full to cover all CPUs.

[longman: Remove isolated_cpus_should_update() and rewrite the checking
 in update_prstate() and update_parent_effective_cpumask()]

Originally-by: Gabriele Monaco
Signed-off-by: Waiman Long
Reviewed-by: Chen Ridong
---
 kernel/cgroup/cpuset.c | 74 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 73 insertions(+), 1 deletion(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index da770dac955e..99622e90991a 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1393,6 +1393,45 @@ static bool partition_xcpus_del(int old_prs, struct cpuset *parent,
 	return isolcpus_updated;
 }
 
+/*
+ * isolated_cpus_can_update - check for isolated & nohz_full conflicts
+ * @add_cpus: cpu mask for cpus that are going to be isolated
+ * @del_cpus: cpu mask for cpus that are no longer isolated, can be NULL
+ * Return: false if there is conflict, true otherwise
+ *
+ * If nohz_full is enabled and we have isolated CPUs, their combination must
+ * still leave housekeeping CPUs.
+ *
+ * TBD: Should consider merging this function into
+ * prstate_housekeeping_conflict().
+ */
+static bool isolated_cpus_can_update(struct cpumask *add_cpus,
+				     struct cpumask *del_cpus)
+{
+	cpumask_var_t full_hk_cpus;
+	int res = true;
+
+	if (!housekeeping_enabled(HK_TYPE_KERNEL_NOISE))
+		return true;
+
+	if (del_cpus && cpumask_weight_and(del_cpus,
+			housekeeping_cpumask(HK_TYPE_KERNEL_NOISE)))
+		return true;
+
+	if (!alloc_cpumask_var(&full_hk_cpus, GFP_KERNEL))
+		return false;
+
+	cpumask_and(full_hk_cpus, housekeeping_cpumask(HK_TYPE_KERNEL_NOISE),
+		    housekeeping_cpumask(HK_TYPE_DOMAIN));
+	cpumask_andnot(full_hk_cpus, full_hk_cpus, isolated_cpus);
+	cpumask_and(full_hk_cpus, full_hk_cpus, cpu_active_mask);
+	if (!cpumask_weight_andnot(full_hk_cpus, add_cpus))
+		res = false;
+
+	free_cpumask_var(full_hk_cpus);
+	return res;
+}
+
 static void update_isolation_cpumasks(bool isolcpus_updated)
 {
 	int ret;
@@ -1551,6 +1590,9 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
 	if (!cpumask_intersects(tmp->new_cpus, cpu_active_mask) ||
 	    cpumask_subset(top_cpuset.effective_cpus, tmp->new_cpus))
 		return PERR_INVCPUS;
+	if ((new_prs == PRS_ISOLATED) &&
+	    !isolated_cpus_can_update(tmp->new_cpus, NULL))
+		return PERR_HKEEPING;
 
 	spin_lock_irq(&callback_lock);
 	isolcpus_updated = partition_xcpus_add(new_prs, NULL, tmp->new_cpus);
@@ -1650,6 +1692,9 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *xcpus,
 		else if (cpumask_intersects(tmp->addmask, subpartitions_cpus) ||
 			 cpumask_subset(top_cpuset.effective_cpus, tmp->addmask))
 			cs->prs_err = PERR_NOCPUS;
+		else if ((prs == PRS_ISOLATED) &&
+			 !isolated_cpus_can_update(tmp->addmask, tmp->delmask))
+			cs->prs_err = PERR_HKEEPING;
 		if (cs->prs_err)
 			goto invalidate;
 	}
@@ -1750,6 +1795,7 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
 	int part_error = PERR_NONE;	/* Partition error? */
 	int isolcpus_updated = 0;
 	struct cpumask *xcpus = user_xcpus(cs);
+	int parent_prs = parent->partition_root_state;
 	bool nocpu;
 
 	lockdep_assert_held(&cpuset_mutex);
@@ -1813,6 +1859,10 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
 		if (prstate_housekeeping_conflict(new_prs, xcpus))
 			return PERR_HKEEPING;
 
+		if ((new_prs == PRS_ISOLATED) && (new_prs != parent_prs) &&
+		    !isolated_cpus_can_update(xcpus, NULL))
+			return PERR_HKEEPING;
+
 		if (tasks_nocpu_error(parent, cs, xcpus))
 			return PERR_NOCPUS;
 
@@ -1866,6 +1916,7 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
 		 *
 		 * For invalid partition:
 		 *   delmask = newmask & parent->effective_xcpus
+		 *   The partition may become valid soon.
 		 */
 		if (is_partition_invalid(cs)) {
 			adding = false;
@@ -1880,6 +1931,23 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
 				deleting = cpumask_and(tmp->delmask,
 						tmp->delmask, parent->effective_xcpus);
 		}
+
+		/*
+		 * TBD: Invalidate a currently valid child root partition may
+		 * still break isolated_cpus_can_update() rule if parent is an
+		 * isolated partition.
+		 */
+		if (is_partition_valid(cs) && (old_prs != parent_prs)) {
+			if ((parent_prs == PRS_ROOT) &&
+			    /* Adding to parent means removing isolated CPUs */
+			    !isolated_cpus_can_update(tmp->delmask, tmp->addmask))
+				part_error = PERR_HKEEPING;
+			if ((parent_prs == PRS_ISOLATED) &&
+			    /* Adding to parent means adding isolated CPUs */
+			    !isolated_cpus_can_update(tmp->addmask, tmp->delmask))
+				part_error = PERR_HKEEPING;
+		}
+
 		/*
 		 * The new CPUs to be removed from parent's effective CPUs
 		 * must be present.
@@ -2994,7 +3062,11 @@ static int update_prstate(struct cpuset *cs, int new_prs)
 		 * A change in load balance state only, no change in cpumasks.
 		 * Need to update isolated_cpus.
 		 */
-		isolcpus_updated = true;
+		if ((new_prs == PRS_ISOLATED) &&
+		    !isolated_cpus_can_update(cs->effective_xcpus, NULL))
+			err = PERR_HKEEPING;
+		else
+			isolcpus_updated = true;
 	} else {
 		/*
 		 * Switching back to member is always allowed even if it
-- 
2.51.1
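
For illustration only, the mask arithmetic in isolated_cpus_can_update()
can be modeled in userspace with one bit per CPU. This is a rough sketch
and not kernel code: cpumask_model_t, struct hk_model and
model_can_update() are hypothetical names that do not exist in the
kernel, the model is limited to 64 CPUs, and it has no counterpart to
the allocation-failure path of the real function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model type: bit N set means CPU N is in the mask. */
typedef uint64_t cpumask_model_t;

/* Hypothetical snapshot of the state the kernel function consults. */
struct hk_model {
	bool nohz_full_enabled;		 /* models housekeeping_enabled(HK_TYPE_KERNEL_NOISE) */
	cpumask_model_t hk_kernel_noise; /* models housekeeping_cpumask(HK_TYPE_KERNEL_NOISE) */
	cpumask_model_t hk_domain;	 /* models housekeeping_cpumask(HK_TYPE_DOMAIN) */
	cpumask_model_t isolated;	 /* models isolated_cpus */
	cpumask_model_t active;		 /* models cpu_active_mask */
};

/*
 * Return false if isolating the CPUs in @add would leave no CPU that is
 * housekeeping for both kernel noise and domain, not already isolated,
 * and active. @del may be NULL, as in the patch.
 */
static bool model_can_update(const struct hk_model *m, cpumask_model_t add,
			     const cpumask_model_t *del)
{
	cpumask_model_t full_hk;

	/* Without nohz_full there is nothing to conflict with. */
	if (!m->nohz_full_enabled)
		return true;

	/* A housekeeping CPU is being released from isolation. */
	if (del && (*del & m->hk_kernel_noise))
		return true;

	full_hk = m->hk_kernel_noise & m->hk_domain & ~m->isolated & m->active;

	/* At least one full housekeeping CPU must survive the addition. */
	return (full_hk & ~add) != 0;
}
```

With CPUs 0-3 as the only non-nohz_full CPUs out of eight, isolating
CPUs 0-3 is rejected by this model, while isolating CPUs 0-2 is accepted
because CPU 3 remains available for housekeeping.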