From: Zicheng Qu
Subject: [PATCH] sched: Re-evaluate scheduling when migrating queued tasks out of throttled cgroups
Date: Fri, 30 Jan 2026 08:34:38 +0000
Message-ID: <20260130083438.1122457-1-quzicheng@huawei.com>
In-Reply-To: <20260120032549.186733-1-quzicheng@huawei.com>
References: <20260120032549.186733-1-quzicheng@huawei.com>
X-Mailer: git-send-email 2.34.1
X-Mailing-List: linux-kernel@vger.kernel.org

Consider the following sequence on a CPU configured with nohz_full:

1) A task P runs in cgroup A, and cgroup A becomes throttled due to CFS
   bandwidth control. The group sched_entity (gse) of cgroup A, to which
   task P is attached, is dequeued and the CPU switches to idle.

2) Before cgroup A is unthrottled, task P is migrated from cgroup A to
   another cgroup B (not throttled).
   During sched_move_task(), task P is observed as queued but not
   running, and therefore no resched_curr() is triggered.

3) Since the CPU is nohz_full, it remains in do_idle() waiting for an
   explicit scheduling event, i.e., resched_curr().

4) For kernel <= 5.10: Later, cgroup A is unthrottled. However, task P
   has already been migrated out of cgroup A, so unthrottle_cfs_rq()
   may observe load_weight == 0 and return early without calling
   resched_curr().

   For kernel >= 6.6: The unthrottling path triggers resched_curr() in
   almost all cases, even when no runnable tasks remain in the
   unthrottled cgroup, which prevents the idle stall described above.
   However, if cgroup A is removed before it gets unthrottled, the
   unthrottling path for cgroup A is never executed. As a result,
   resched_curr() is never called.

5) At this point, task P is runnable in cgroup B (not throttled), but
   the CPU remains in do_idle() with no pending reschedule point. The
   system stays in this state until an unrelated event that triggers
   resched_curr() (e.g. a new task wakeup) breaks the nohz_full idle
   state, and only then does task P finally get scheduled.

The root cause is that sched_move_task() may classify the task as queued
but not running, and therefore fails to trigger resched_curr(), while
the later unthrottling path no longer has visibility of the migrated
task.

Preserve the existing behavior for running tasks by issuing
resched_curr(), and explicitly invoke wakeup_preempt() (named
check_preempt_curr() in older kernels) for tasks that were queued at the
time of migration. This ensures that runnable tasks are reconsidered for
scheduling even when nohz_full suppresses periodic ticks.
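
The sequence above can be modelled in user space as a toy sketch. This
is not kernel code: struct toy_rq, struct toy_task, toy_move_task(),
toy_idle_step() and toy_scenario() are hypothetical names invented for
illustration, and the need_resched flag only stands in for the effect
that resched_curr() or wakeup_preempt() would have on an idle CPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only (assumption): one nohz_full CPU that leaves idle
 * solely when need_resched has been set by someone else. */
struct toy_rq {
	bool need_resched;	/* stands in for the flag set by resched_curr() */
	int nr_runnable;	/* tasks visible to the pick path */
};

struct toy_task {
	bool queued;
	bool running;
	bool group_throttled;	/* task is hidden behind a throttled group */
};

/* Hypothetical stand-in for sched_move_task(): move p into an
 * unthrottled group. 'fixed' selects the behaviour with the patch. */
static void toy_move_task(struct toy_rq *rq, struct toy_task *p, bool fixed)
{
	assert(rq && p);

	if (p->queued && p->group_throttled)
		rq->nr_runnable++;	/* task becomes visible again */
	p->group_throttled = false;

	if (p->running)
		rq->need_resched = true;	/* existing behaviour */
	else if (fixed && p->queued)
		rq->need_resched = true;	/* models wakeup_preempt() vs. idle */
}

/* One pass of the idle loop: the CPU picks a task only if it was
 * asked to reschedule; there is no periodic tick to rescue it. */
static bool toy_idle_step(struct toy_rq *rq)
{
	if (!rq->need_resched)
		return false;
	rq->need_resched = false;
	return rq->nr_runnable > 0;
}

/* Steps 1-5 in miniature: P is queued in a throttled group, the CPU is
 * idle, then P is moved. Returns true if P gets picked without help
 * from an unrelated event. */
bool toy_scenario(bool fixed)
{
	struct toy_rq rq = { .need_resched = false, .nr_runnable = 0 };
	struct toy_task p = {
		.queued = true, .running = false, .group_throttled = true,
	};

	toy_move_task(&rq, &p, fixed);
	return toy_idle_step(&rq);
}
```

In this model toy_scenario(false) returns false (P is runnable but the
CPU never leaves idle) and toy_scenario(true) returns true. The model
deliberately ignores the unthrottle path, which corresponds to the case
where cgroup A is removed before being unthrottled.
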
Fixes: 29f59db3a74b ("sched: group-scheduler core")
Signed-off-by: Zicheng Qu
Reviewed-by: K Prateek Nayak
Reviewed-by: Aaron Lu
Tested-by: Aaron Lu
---
 kernel/sched/core.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 045f83ad261e..04271b77101c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9110,6 +9110,7 @@ static void sched_change_group(struct task_struct *tsk)
 void sched_move_task(struct task_struct *tsk, bool for_autogroup)
 {
 	unsigned int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE;
+	bool queued = false;
 	bool resched = false;
 	struct rq *rq;
 
@@ -9122,10 +9123,13 @@ void sched_move_task(struct task_struct *tsk, bool for_autogroup)
 		scx_cgroup_move_task(tsk);
 		if (scope->running)
 			resched = true;
+		queued = scope->queued;
 	}
 
 	if (resched)
 		resched_curr(rq);
+	else if (queued)
+		wakeup_preempt(rq, tsk, 0);
 
 	__balance_callbacks(rq, &rq_guard.rf);
 }
-- 
2.34.1