A global limits change (sched_rt_handler() logic) currently leaves stale
and/or incorrect values in variables related to accounting (e.g.
extra_bw).

Properly clean up per runqueue variables before implementing the change
and rebuild scheduling domains (so that accounting is also properly
restored) after such a change is complete.

Reported-by: Marcel Ziswiler <marcel.ziswiler@codethink.co.uk>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
---
kernel/sched/deadline.c | 4 +++-
kernel/sched/rt.c | 6 ++++++
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 7a3b556d45a99..187f324565f92 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -3166,6 +3166,9 @@ void sched_dl_do_global(void)
 	if (global_rt_runtime() != RUNTIME_INF)
 		new_bw = to_ratio(global_rt_period(), global_rt_runtime());
 
+	for_each_possible_cpu(cpu)
+		init_dl_rq_bw_ratio(&cpu_rq(cpu)->dl);
+
 	for_each_possible_cpu(cpu) {
 		rcu_read_lock_sched();
 
@@ -3181,7 +3184,6 @@ void sched_dl_do_global(void)
 		raw_spin_unlock_irqrestore(&dl_b->lock, flags);
 
 		rcu_read_unlock_sched();
-		init_dl_rq_bw_ratio(&cpu_rq(cpu)->dl);
 	}
 }
 
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 15d5855c542cb..be6e9bcbe82b6 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2886,6 +2886,12 @@ static int sched_rt_handler(const struct ctl_table *table, int write, void *buff
 	sched_domains_mutex_unlock();
 	mutex_unlock(&mutex);
 
+	/*
+	 * After changing maximum available bandwidth for DEADLINE, we need to
+	 * recompute per root domain and per cpus variables accordingly.
+	 */
+	rebuild_sched_domains();
+
 	return ret;
 }
 
--
2.49.0
On Fri, Jun 27, 2025 at 01:51:16PM +0200, Juri Lelli wrote:

> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 15d5855c542cb..be6e9bcbe82b6 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -2886,6 +2886,12 @@ static int sched_rt_handler(const struct ctl_table *table, int write, void *buff
>  	sched_domains_mutex_unlock();
>  	mutex_unlock(&mutex);
>
> +	/*
> +	 * After changing maximum available bandwidth for DEADLINE, we need to
> +	 * recompute per root domain and per cpus variables accordingly.
> +	 */
> +	rebuild_sched_domains();
> +
>  	return ret;
>  }

So I'll merge these patches since correctness first etc. But the above
is quite terrible. It would be really good not to have to rebuild the
sched domains for every rt change. Surely we can iterate the existing
domains and update stuff?

On 14/07/25 10:59, Peter Zijlstra wrote:
> On Fri, Jun 27, 2025 at 01:51:16PM +0200, Juri Lelli wrote:
>
> > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > index 15d5855c542cb..be6e9bcbe82b6 100644
> > --- a/kernel/sched/rt.c
> > +++ b/kernel/sched/rt.c
> > @@ -2886,6 +2886,12 @@ static int sched_rt_handler(const struct ctl_table *table, int write, void *buff
> >  	sched_domains_mutex_unlock();
> >  	mutex_unlock(&mutex);
> >
> > +	/*
> > +	 * After changing maximum available bandwidth for DEADLINE, we need to
> > +	 * recompute per root domain and per cpus variables accordingly.
> > +	 */
> > +	rebuild_sched_domains();
> > +
> >  	return ret;
> >  }
>
> So I'll merge these patches since correctness first etc. But the above

Thanks!

> is quite terrible. It would be really good not to have to rebuild the
> sched domains for every rt change. Surely we can iterate the existing
> domains and update stuff?

Yeah, I agree. Tried doing an update at first, but then the involved
locking and the not so pleasant thing I could come up with made me
decide for the big hammer. Also because it should be a very infrequent
operation anyway. But, I will try again somewhat soon.

Thanks,
Juri

The following commit has been merged into the sched/core branch of tip:
Commit-ID: 440989c10f4e32620e9e2717ca52c3ed7ae11048
Gitweb: https://git.kernel.org/tip/440989c10f4e32620e9e2717ca52c3ed7ae11048
Author: Juri Lelli <juri.lelli@redhat.com>
AuthorDate: Fri, 27 Jun 2025 13:51:16 +02:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 14 Jul 2025 10:59:33 +02:00
sched/deadline: Fix accounting after global limits change

A global limits change (sched_rt_handler() logic) currently leaves stale
and/or incorrect values in variables related to accounting (e.g.
extra_bw).

Properly clean up per runqueue variables before implementing the change
and rebuild scheduling domains (so that accounting is also properly
restored) after such a change is complete.

Reported-by: Marcel Ziswiler <marcel.ziswiler@codethink.co.uk>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Marcel Ziswiler <marcel.ziswiler@codethink.co.uk> # nuc & rock5b
Link: https://lore.kernel.org/r/20250627115118.438797-4-juri.lelli@redhat.com
---
kernel/sched/deadline.c | 4 +++-
kernel/sched/rt.c | 6 ++++++
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 0abffe3..9c7d952 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -3183,6 +3183,9 @@ void sched_dl_do_global(void)
 	if (global_rt_runtime() != RUNTIME_INF)
 		new_bw = to_ratio(global_rt_period(), global_rt_runtime());
 
+	for_each_possible_cpu(cpu)
+		init_dl_rq_bw_ratio(&cpu_rq(cpu)->dl);
+
 	for_each_possible_cpu(cpu) {
 		rcu_read_lock_sched();
 
@@ -3198,7 +3201,6 @@ void sched_dl_do_global(void)
 		raw_spin_unlock_irqrestore(&dl_b->lock, flags);
 
 		rcu_read_unlock_sched();
-		init_dl_rq_bw_ratio(&cpu_rq(cpu)->dl);
 	}
 }
 
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 15d5855..be6e9bc 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2886,6 +2886,12 @@ undo:
 	sched_domains_mutex_unlock();
 	mutex_unlock(&mutex);
 
+	/*
+	 * After changing maximum available bandwidth for DEADLINE, we need to
+	 * recompute per root domain and per cpus variables accordingly.
+	 */
+	rebuild_sched_domains();
+
 	return ret;
 }