From nobody Thu Oct 2 10:53:02 2025
Date: Thu, 18 Sep 2025 06:56:58 -0000
From: "tip-bot2 for Peter Zijlstra"
Sender: tip-bot2@linutronix.de
Reply-to: linux-kernel@vger.kernel.org
To: linux-tip-commits@vger.kernel.org
Subject: [tip: sched/urgent] sched/deadline: Fix dl_server getting stuck
Cc: John Stultz, "Peter Zijlstra (Intel)", x86@kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20250916110155.GH3245006@noisy.programming.kicks-ass.net>
References: <20250916110155.GH3245006@noisy.programming.kicks-ass.net>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Message-ID: <175817861820.709179.10538516755307778527.tip-bot2@tip-bot2>
Content-Type: text/plain; charset="utf-8"

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     077e1e2e0015e5ba6538d1c5299fb299a3a92d60
Gitweb:        https://git.kernel.org/tip/077e1e2e0015e5ba6538d1c5299fb299a3a92d60
Author:        Peter Zijlstra
AuthorDate:    Tue, 16 Sep 2025 23:02:41 +02:00
Committer:     Peter Zijlstra
CommitterDate: Thu, 18 Sep 2025 08:50:05 +02:00

sched/deadline: Fix dl_server getting stuck

John found it was easy to hit lockup warnings when running locktorture
on a 2 CPU VM, which he bisected down to: commit cccb45d7c429
("sched/deadline: Less agressive dl_server handling").

While debugging it seems there is a chance that we end up with the
dl_server dequeued while dl_se->dl_server_active is still set. This
causes dl_server_start() to return without enqueueing the dl_server,
so it fails to run when RT tasks starve the CPU.

When this happens, dl_server_timer() catches the
'!dl_se->server_has_tasks(dl_se)' case, which then calls
replenish_dl_entity() and dl_server_stopped() and finally returns
HRTIMER_NORESTART.

This ends in no new timer and also no enqueue, leaving the dl_server
'dead', allowing starvation.

What should have happened is for the bandwidth timer to start the
zero-laxity timer, which in turn would enqueue the dl_server and cause
dl_se->server_pick_task() to be called -- which will stop the
dl_server if no fair tasks are observed for a whole period.

IOW, it is totally irrelevant if there are fair tasks at the moment of
bandwidth refresh.

This removes all dl_se->server_has_tasks() users, so remove the whole
thing.

Fixes: cccb45d7c4295 ("sched/deadline: Less agressive dl_server handling")
Reported-by: John Stultz
Signed-off-by: Peter Zijlstra (Intel)
Tested-by: John Stultz
---
 include/linux/sched.h   |  1 -
 kernel/sched/deadline.c | 12 +-----------
 kernel/sched/fair.c     |  7 +------
 kernel/sched/sched.h    |  4 ----
 4 files changed, 2 insertions(+), 22 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f8188b8..f89313b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -733,7 +733,6 @@ struct sched_dl_entity {
 	 * runnable task.
 	 */
 	struct rq *rq;
-	dl_server_has_tasks_f server_has_tasks;
 	dl_server_pick_f server_pick_task;
 
 #ifdef CONFIG_RT_MUTEXES
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index f253012..5a5080b 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -875,7 +875,7 @@ static void replenish_dl_entity(struct sched_dl_entity *dl_se)
 	 */
 	if (dl_se->dl_defer && !dl_se->dl_defer_running &&
 	    dl_time_before(rq_clock(dl_se->rq), dl_se->deadline - dl_se->runtime)) {
-		if (!is_dl_boosted(dl_se) && dl_se->server_has_tasks(dl_se)) {
+		if (!is_dl_boosted(dl_se)) {
 
 			/*
 			 * Set dl_se->dl_defer_armed and dl_throttled variables to
@@ -1152,8 +1152,6 @@ static void __push_dl_task(struct rq *rq, struct rq_flags *rf)
 /* a defer timer will not be reset if the runtime consumed was < dl_server_min_res */
 static const u64 dl_server_min_res = 1 * NSEC_PER_MSEC;
 
-static bool dl_server_stopped(struct sched_dl_entity *dl_se);
-
 static enum hrtimer_restart dl_server_timer(struct hrtimer *timer, struct sched_dl_entity *dl_se)
 {
 	struct rq *rq = rq_of_dl_se(dl_se);
@@ -1171,12 +1169,6 @@ static enum hrtimer_restart dl_server_timer(struct hrtimer *timer, struct sched_
 		if (!dl_se->dl_runtime)
 			return HRTIMER_NORESTART;
 
-		if (!dl_se->server_has_tasks(dl_se)) {
-			replenish_dl_entity(dl_se);
-			dl_server_stopped(dl_se);
-			return HRTIMER_NORESTART;
-		}
-
 		if (dl_se->dl_defer_armed) {
 			/*
 			 * First check if the server could consume runtime in background.
@@ -1625,11 +1617,9 @@ static bool dl_server_stopped(struct sched_dl_entity *dl_se)
 }
 
 void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
-		    dl_server_has_tasks_f has_tasks,
 		    dl_server_pick_f pick_task)
 {
 	dl_se->rq = rq;
-	dl_se->server_has_tasks = has_tasks;
 	dl_se->server_pick_task = pick_task;
 }
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c4d91e8..59d7dc9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8859,11 +8859,6 @@ static struct task_struct *__pick_next_task_fair(struct rq *rq, struct task_stru
 	return pick_next_task_fair(rq, prev, NULL);
 }
 
-static bool fair_server_has_tasks(struct sched_dl_entity *dl_se)
-{
-	return !!dl_se->rq->cfs.nr_queued;
-}
-
 static struct task_struct *fair_server_pick_task(struct sched_dl_entity *dl_se)
 {
 	return pick_task_fair(dl_se->rq);
@@ -8875,7 +8870,7 @@ void fair_server_init(struct rq *rq)
 
 	init_dl_entity(dl_se);
 
-	dl_server_init(dl_se, rq, fair_server_has_tasks, fair_server_pick_task);
+	dl_server_init(dl_se, rq, fair_server_pick_task);
 }
 
 /*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index be9745d..f10d627 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -365,9 +365,6 @@ extern s64 dl_scaled_delta_exec(struct rq *rq, struct sched_dl_entity *dl_se, s6
  *
  *   dl_se::rq -- runqueue we belong to.
  *
- *   dl_se::server_has_tasks() -- used on bandwidth enforcement; we 'stop' the
- *                                server when it runs out of tasks to run.
- *
  *   dl_se::server_pick() -- nested pick_next_task(); we yield the period if this
  *                           returns NULL.
  *
@@ -383,7 +380,6 @@ extern void dl_server_update(struct sched_dl_entity *dl_se, s64 delta_exec);
 extern void dl_server_start(struct sched_dl_entity *dl_se);
 extern void dl_server_stop(struct sched_dl_entity *dl_se);
 extern void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
-		    dl_server_has_tasks_f has_tasks,
 		    dl_server_pick_f pick_task);
 extern void sched_init_dl_servers(void);
 
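
Editorial illustration, not part of the commit: the stuck state described in
the changelog can be sketched as a small user-space toy model. Everything
below is hypothetical (struct toy_server, toy_server_start(),
toy_server_timer(), toy_is_stuck() and toy_scenario() are made-up names) and
only mirrors the behaviour as the changelog words it. It collapses the
bandwidth timer and the zero-laxity timer into a single step and does not
model replenishment, throttling or locking.

```c
#include <stdbool.h>

/* Toy model only; none of these names exist in the kernel. */
struct toy_server {
	bool active;      /* stands in for dl_se->dl_server_active */
	bool enqueued;    /* server is queued and can get picked */
	bool timer_armed; /* some timer will still fire for this server */
	int fair_tasks;   /* stands in for the number of queued fair tasks */
};

/* As described in the changelog: start is a no-op while 'active' is set. */
static void toy_server_start(struct toy_server *s)
{
	if (s->active)
		return;
	s->active = true;
	s->enqueued = true;
}

/*
 * The timer fires. With check_has_tasks set (pre-fix behaviour) and no fair
 * tasks queued, it bails out without re-arming a timer and without
 * enqueueing. Otherwise the timer chain is assumed to end with the server
 * enqueued (simplification of "bandwidth timer starts the zero-laxity
 * timer, which enqueues the dl_server").
 */
static void toy_server_timer(struct toy_server *s, bool check_has_tasks)
{
	s->timer_armed = false; /* this timer instance has now expired */

	if (check_has_tasks && s->fair_tasks == 0)
		return; /* no new timer, no enqueue */

	s->enqueued = true;
}

/* 'Dead' server: marked active, but nothing will ever run or wake it. */
static bool toy_is_stuck(const struct toy_server *s)
{
	return s->active && !s->enqueued && !s->timer_armed;
}

/* Replays the sequence from the changelog; returns true if the server is stuck. */
static bool toy_scenario(bool check_has_tasks)
{
	struct toy_server s = {
		.active = true,      /* dequeued, yet still marked active */
		.enqueued = false,
		.timer_armed = true,
		.fair_tasks = 0,
	};

	toy_server_timer(&s, check_has_tasks);
	s.fair_tasks = 1;     /* a fair task becomes runnable later */
	toy_server_start(&s); /* returns early because 'active' is set */

	return toy_is_stuck(&s);
}
```

In this model toy_scenario(true) returns true, the 'dead' server from the
changelog, while toy_scenario(false) returns false, because the timer no
longer consults the task count at refresh time.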