From: Juri Lelli
Date: Tue, 12 May 2026 11:02:37 +0200
Subject: [PATCH] sched/deadline: Make dl-server nohz full aware
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20260512-upstream-fix-dlserver-nohzfull-b4-v1-1-a94844387ae7@redhat.com>
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
 Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
 K Prateek Nayak, Andrea Righi, Frederic Weisbecker
Cc: linux-kernel@vger.kernel.org, David Haufe, Cao Ruichuang, Juri Lelli

The dl_server_timer() causes spurious IPIs on nohz_full cores, breaking
isolation guarantees. The timer executes on a housekeeping core and
eventually calls tick_nohz_dep_set_cpu(), sending IPIs to isolated cores
even when only a single task is running.

The problem is that dl-servers are not coordinated with nohz_full tick
state. Timers can fire and send IPIs to otherwise undisturbed cores.

Fix by managing servers in sched_can_stop_tick():

- When RT tasks run with CFS/SCX tasks, start the appropriate server and
  keep the tick running
- When only RT tasks remain, stop all servers and allow tick to stop
  (except for >1 RR tasks which need the tick for round-robin)
- When only CFS/SCX tasks remain, stop all servers before stopping tick

Introduce dl_servers_stop_all() to reduce duplication and abstract server
management from core.c. Unify RT handling into one block that handles
both RR and FIFO cases.

Fixes: 557a6bfc662c ("sched/fair: Add trivial fair server")
Reported-by: David Haufe
Closes: https://lore.kernel.org/lkml/CAKJHwtOw_G67edzuHVtL1xC5Vyt6StcZzihtDd0yaKudW=rwVw@mail.gmail.com
Signed-off-by: Juri Lelli
---
I had to modify my original attempt at fixing this (please take a look
at the linked report/discussion) to also take SCX into consideration.

FYI, I temporarily pushed the script I'm using to repro and verify the
fix here:

https://github.com/jlelli/sched-deadline-tests/blob/master/test-dlserver-nohz.sh
---
 kernel/sched/core.c     | 43 +++++++++++++++++++++++--------------------
 kernel/sched/deadline.c | 14 ++++++++++++++
 kernel/sched/sched.h    |  1 +
 3 files changed, 38 insertions(+), 20 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b905805bbcbe4..98759255c306b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1414,30 +1414,35 @@ static inline bool __need_bw_check(struct rq *rq, struct task_struct *p)
 
 bool sched_can_stop_tick(struct rq *rq)
 {
-	int fifo_nr_running;
-
 	/* Deadline tasks, even if single, need the tick */
 	if (rq->dl.dl_nr_running)
 		return false;
 
 	/*
-	 * If there are more than one RR tasks, we need the tick to affect the
-	 * actual RR behaviour.
+	 * If there are RT tasks, we may need the tick (for >1 RR tasks),
+	 * but we must also service lower-priority CFS/SCX tasks via dl-servers.
 	 */
-	if (rq->rt.rr_nr_running) {
-		if (rq->rt.rr_nr_running == 1)
-			return true;
-		else
+	if (rq->rt.rt_nr_running) {
+		if (rq->cfs.h_nr_queued) {
+			dl_server_start(&rq->fair_server);
+			return false;
+		}
+#ifdef CONFIG_SCHED_CLASS_EXT
+		if (rq->scx.nr_running) {
+			dl_server_start(&rq->ext_server);
+			return false;
+		}
+#endif
+		/*
+		 * Only RT tasks, no CFS/SCX. Stop servers to prevent spurious
+		 * wakeups. Tick can stop for single RR or any FIFO, but must
+		 * run for multiple RR (round-robin behavior).
+		 */
+		dl_servers_stop_all(rq);
+		if (rq->rt.rr_nr_running > 1)
 			return false;
-	}
-
-	/*
-	 * If there's no RR tasks, but FIFO tasks, we can skip the tick, no
-	 * forced preemption between FIFO tasks.
-	 */
-	fifo_nr_running = rq->rt.rt_nr_running - rq->rt.rr_nr_running;
-	if (fifo_nr_running)
 		return true;
+	}
 
 	/*
 	 * If there are no DL,RR/FIFO tasks, there must only be CFS or SCX tasks
@@ -1462,6 +1467,7 @@ bool sched_can_stop_tick(struct rq *rq)
 			return false;
 	}
 
+	dl_servers_stop_all(rq);
 	return true;
 }
 #endif /* CONFIG_NO_HZ_FULL */
@@ -8810,10 +8816,7 @@ int sched_cpu_dying(unsigned int cpu)
 		WARN(true, "Dying CPU not properly vacated!");
 		dump_rq_tasks(rq, KERN_WARNING);
 	}
-	dl_server_stop(&rq->fair_server);
-#ifdef CONFIG_SCHED_CLASS_EXT
-	dl_server_stop(&rq->ext_server);
-#endif
+	dl_servers_stop_all(rq);
 	rq_unlock_irqrestore(rq, &rf);
 
 	calc_load_migrate(rq);
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index edca7849b165d..c2b3d6bbe4828 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1826,6 +1826,20 @@ void dl_server_stop(struct sched_dl_entity *dl_se)
 	dl_se->dl_server_active = 0;
 }
 
+/*
+ * Stop all dl-servers on this runqueue. Called when transitioning to a state
+ * where the tick can be stopped (e.g., single RR/FIFO task, or no RT tasks).
+ * This ensures server timers are disarmed and won't cause spurious wakeups on
+ * nohz_full isolated cores.
+ */
+void dl_servers_stop_all(struct rq *rq)
+{
+	dl_server_stop(&rq->fair_server);
+#ifdef CONFIG_SCHED_CLASS_EXT
+	dl_server_stop(&rq->ext_server);
+#endif
+}
+
 void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
 		    dl_server_pick_f pick_task)
 {
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 9f63b15d309d1..26cf1d14efde5 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -412,6 +412,7 @@ extern void dl_server_update_idle(struct sched_dl_entity *dl_se, s64 delta_exec)
 extern void dl_server_update(struct sched_dl_entity *dl_se, s64 delta_exec);
 extern void dl_server_start(struct sched_dl_entity *dl_se);
 extern void dl_server_stop(struct sched_dl_entity *dl_se);
+extern void dl_servers_stop_all(struct rq *rq);
 extern void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
 		    dl_server_pick_f pick_task);
 extern void sched_init_dl_servers(void);

---
base-commit: 4ac4d6549a6563878d7c19c154e017f6cb7114d3
change-id: 20260512-upstream-fix-dlserver-nohzfull-b4-b745e2a967ed

Best regards,
-- 
Juri Lelli
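
The three cases listed in the commit message can be checked outside the kernel with a small userspace model. This is only a sketch under stated assumptions, not kernel code: struct rq_model, model_servers_stop_all(), model_can_stop_tick() and model_check_cases() are hypothetical names, the boolean flags merely approximate the effect of dl_server_start() and dl_server_stop(), CONFIG_SCHED_CLASS_EXT is assumed to be enabled, and the CFS/SCX checks that lie outside the hunks of this patch are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the runqueue counters the patch reads. */
struct rq_model {
	unsigned int dl_nr_running;	/* SCHED_DEADLINE tasks */
	unsigned int rt_nr_running;	/* all RT tasks, FIFO and RR */
	unsigned int rr_nr_running;	/* SCHED_RR tasks only */
	unsigned int cfs_nr_queued;	/* fair tasks */
	unsigned int scx_nr_running;	/* sched_ext tasks */
	bool fair_server_active;	/* approximates a started rq->fair_server */
	bool ext_server_active;		/* approximates a started rq->ext_server */
};

/* Models dl_servers_stop_all(): both servers end up stopped. */
static void model_servers_stop_all(struct rq_model *rq)
{
	rq->fair_server_active = false;
	rq->ext_server_active = false;
}

/*
 * Models the patched sched_can_stop_tick(). The CFS/SCX checks between the
 * RT block and the final return are not part of this patch's hunks and are
 * deliberately omitted (an assumption of this sketch).
 */
static bool model_can_stop_tick(struct rq_model *rq)
{
	/* Deadline tasks always need the tick. */
	if (rq->dl_nr_running)
		return false;

	if (rq->rt_nr_running) {
		/* RT plus lower-priority work: start a server, keep the tick. */
		if (rq->cfs_nr_queued) {
			rq->fair_server_active = true;
			return false;
		}
		if (rq->scx_nr_running) {
			rq->ext_server_active = true;
			return false;
		}
		/* Only RT tasks: stop servers; tick is needed only for >1 RR. */
		model_servers_stop_all(rq);
		return rq->rr_nr_running <= 1;
	}

	/* Only CFS/SCX tasks: stop servers before letting the tick stop. */
	model_servers_stop_all(rq);
	return true;
}

/* Walks the three cases from the commit message. */
void model_check_cases(void)
{
	struct rq_model mixed = { .rt_nr_running = 1, .cfs_nr_queued = 1 };
	struct rq_model rr2 = { .rt_nr_running = 2, .rr_nr_running = 2,
				.fair_server_active = true };
	struct rq_model fair = { .cfs_nr_queued = 1, .fair_server_active = true };

	assert(!model_can_stop_tick(&mixed) && mixed.fair_server_active);
	assert(!model_can_stop_tick(&rr2) && !rr2.fair_server_active);
	assert(model_can_stop_tick(&fair) && !fair.fair_server_active);
}
```

Calling model_check_cases() from a test main() exercises the RT plus CFS case, the multiple-RR case and the fair-only case; the model says nothing about timer or IPI behaviour, which still has to be verified on a nohz_full system.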