dl_server_stop() can leave a deadline server in an inconsistent internal
state across stop/start transitions, causing it to bypass its required
deferral phase when restarted. This breaks the scheduler invariant that
a restarted server must re-establish eligibility before being allowed to
execute.

When the server is stopped (e.g., because the associated task blocks),
it's expected to transition back to an inactive, initial state. However,
dl_server_stop() does not fully reset the execution state. As a result,
the server can be logically inactive while its state still marks it as
running (dl_defer_running remains set).

When the server is restarted via dl_server_start(), the following
sequence occurs:
  1. dl_server_start() calls enqueue_dl_entity(ENQUEUE_WAKEUP),
  2. enqueue_dl_entity() calls update_dl_entity(),
  3. update_dl_entity() checks (!dl_se->dl_defer_running) to decide
     whether to arm the deferral mechanism,
  4. because dl_defer_running is stale (still 1 from the previous
     activation), the check fails,
  5. dl_defer_armed and dl_throttled are not set,
  6. enqueue_dl_entity() skips start_dl_timer(), because
     dl_throttled == 0,
  7. the server is enqueued via __enqueue_dl_entity(),
  8. the scheduler picks the server to run,
  9. update_curr_dl_se() detects that the server has exhausted its
     runtime (or has negative runtime), as it wasn't properly
     replenished/deferred,
 10. the server is throttled (dl_throttled set to 1) and dequeued,
 11. the server repeatedly cycles through wakeup and throttling,
     effectively receiving no usable CPU bandwidth.

This results in starvation of the tasks serviced by the deadline server
in the presence of competing RT workloads.
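
Steps 1-6 above can be illustrated with a small user-space model. This
is only a sketch under stated assumptions, not the kernel code: struct
model_dl_se and all model_*() helpers are hypothetical names, and the
model keeps only the flags mentioned in this description while ignoring
deadlines, runtime accounting and timers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sched_dl_entity: only the flags named above. */
struct model_dl_se {
	bool dl_defer;		/* the server uses the deferral mechanism */
	bool dl_defer_running;	/* the server is marked as running */
	bool dl_defer_armed;
	bool dl_throttled;
	bool dl_server_active;
};

/* Steps 3-5: the deferral is armed only if the server is not marked as running. */
static void model_update_dl_entity(struct model_dl_se *se)
{
	if (se->dl_defer && !se->dl_defer_running) {
		se->dl_defer_armed = true;
		se->dl_throttled = true;
	}
}

/* Steps 1-6: returns true if the deferral timer would be started. */
static bool model_dl_server_start(struct model_dl_se *se)
{
	assert(!se->dl_server_active);	/* the model only starts a stopped server */
	se->dl_server_active = true;
	model_update_dl_entity(se);
	return se->dl_throttled;	/* step 6: no timer when dl_throttled == 0 */
}

/* Restart a stopped server whose dl_defer_running flag is either stale or clean. */
static bool model_restart_starts_timer(bool stale_defer_running)
{
	struct model_dl_se se = {
		.dl_defer = true,
		.dl_defer_running = stale_defer_running,
	};

	return model_dl_server_start(&se);
}
```

In this model, model_restart_starts_timer(true) returns false (the
deferral timer is skipped, as in step 6), while
model_restart_starts_timer(false) returns true.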

This issue can be confirmed by adding debugging traces, which show that
the server skips the deferral timer and is immediately throttled upon
execution with negative runtime:

  DEBUG: dl_server_start: dl_defer_running=1 active=0
  DEBUG: enqueue_dl_entity: flags=1 dl_throttled=0 dl_defer=1
  DEBUG: update_dl_entity: dl_defer_running=1
  DEBUG: enqueue_dl_entity: SKIPPING start_dl_timer! dl_throttled=0
  ...
  DEBUG: update_curr_dl_se: THROTTLED runtime=-954758

Fix this by properly resetting dl_defer_running in dl_server_stop(),
ensuring the server correctly enters the defer phase upon restart.
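
The intended post-stop state can be expressed as a simple invariant:
every per-activation flag is clear once the server has been stopped.
The following is a user-space sketch only; struct model_dl_server and
the model_*() helpers are hypothetical names that mirror the flags
touched by dl_server_stop(), not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the flags that dl_server_stop() clears. */
struct model_dl_server {
	unsigned int dl_defer_armed:1;
	unsigned int dl_throttled:1;
	unsigned int dl_defer_running:1;
	unsigned int dl_defer_idle:1;
	unsigned int dl_server_active:1;
};

/* Model of the stop path, including the reset added by this patch. */
static void model_dl_server_stop(struct model_dl_server *s)
{
	s->dl_defer_armed = 0;
	s->dl_throttled = 0;
	s->dl_defer_running = 0;	/* the line added by this patch */
	s->dl_defer_idle = 0;
	s->dl_server_active = 0;
}

/* Invariant: a stopped server carries no state from its last activation. */
static bool model_dl_server_is_reset(const struct model_dl_server *s)
{
	return !s->dl_defer_armed && !s->dl_throttled &&
	       !s->dl_defer_running && !s->dl_defer_idle &&
	       !s->dl_server_active;
}

/* Stop a server that was marked as running and check the invariant. */
static void model_check_stop_resets_state(void)
{
	struct model_dl_server s = {
		.dl_defer_running = 1,
		.dl_server_active = 1,
	};

	model_dl_server_stop(&s);
	assert(model_dl_server_is_reset(&s));
}
```

Without the dl_defer_running reset in model_dl_server_stop(), the
assertion in model_check_stop_resets_state() would fail.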

This issue is quite difficult to observe when only the fair server
is present, as the required stop/start patterns are relatively rare.
However, it becomes easier to trigger with an additional deadline server
that goes through more frequent lifecycle transitions (such as a
sched_ext deadline server).

This change is a prerequisite for introducing a sched_ext deadline
server, as it ensures correct and predictable behavior across server
stop/start cycles.

Link: https://lore.kernel.org/all/aXEMat4IoNnGYgxw@gpd4/
Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
 kernel/sched/deadline.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index c509f2e7d69de..214fe62a59723 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1813,6 +1813,7 @@ void dl_server_stop(struct sched_dl_entity *dl_se)
 	hrtimer_try_to_cancel(&dl_se->dl_timer);
 	dl_se->dl_defer_armed = 0;
 	dl_se->dl_throttled = 0;
+	dl_se->dl_defer_running = 0;
 	dl_se->dl_defer_idle = 0;
 	dl_se->dl_server_active = 0;
 }
--
2.52.0

Hello,

On 22/01/26 15:08, Andrea Righi wrote:
> dl_server_stop() can leave a deadline server in an inconsistent internal
> state across stop/start transitions, causing it to bypass its required
> deferral phase when restarted.

[...]

> @@ -1813,6 +1813,7 @@ void dl_server_stop(struct sched_dl_entity *dl_se)
>  	hrtimer_try_to_cancel(&dl_se->dl_timer);
>  	dl_se->dl_defer_armed = 0;
>  	dl_se->dl_throttled = 0;
> +	dl_se->dl_defer_running = 0;
>  	dl_se->dl_defer_idle = 0;
>  	dl_se->dl_server_active = 0;
>  }

The fix looks good to me, thanks!

State machine above dl_server_start() might need updating, though. Don't
we want to add dl_defer_running = 0 under dl_server_stop() for case [4]
D->A? Also for '[A] - init', dl_defer_running = 0 (remove /1)?

Best,
Juri

Hi Juri,

On Fri, Jan 23, 2026 at 08:11:35AM +0100, Juri Lelli wrote:
> Hello,
>
> On 22/01/26 15:08, Andrea Righi wrote:

[...]

> > @@ -1813,6 +1813,7 @@ void dl_server_stop(struct sched_dl_entity *dl_se)
> >  	hrtimer_try_to_cancel(&dl_se->dl_timer);
> >  	dl_se->dl_defer_armed = 0;
> >  	dl_se->dl_throttled = 0;
> > +	dl_se->dl_defer_running = 0;
> >  	dl_se->dl_defer_idle = 0;
> >  	dl_se->dl_server_active = 0;
> >  }
>
> The fix looks good to me, thanks!
>
> State machine above dl_server_start() might need updating, though. Don't
> we want to add dl_defer_running = 0 under dl_server_stop() for case [4]
> D->A? Also for '[A] - init', dl_defer_running = 0 (remove /1)?

Definitely! I'll send a v2 with the updated state machine documentation.

Thanks,
-Andrea