With the consolidation of put_prev_task/set_next_task(), we are now
skipping the sched_ext ops.stopping/running() transitions when the
previous and next tasks are the same, see commit 436f3eed5c69 ("sched:
Combine the last put_prev_task() and the first set_next_task()").
While this optimization makes sense in general, it can negatively impact
performance in some user-space schedulers that expect to handle such
transitions when tasks exhaust their timeslice (see SCX_OPS_ENQ_LAST).
For example, scx_rustland suffers a significant performance regression
(e.g., gaming benchmarks drop from ~60fps to ~10fps).
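For reference, such schedulers observe these transitions through the
ops.running()/ops.stopping() callbacks, usually together with the
SCX_OPS_ENQ_LAST flag. A minimal, purely illustrative BPF sketch (the
"sketch_*" names and callback bodies are hypothetical and not part of
this patch; the dispatch path is omitted):

	#include <scx/common.bpf.h>

	char _license[] SEC("license") = "GPL";

	/* Task p starts running on a CPU. */
	void BPF_STRUCT_OPS(sketch_running, struct task_struct *p)
	{
		bpf_printk("running: pid=%d", p->pid);
	}

	/*
	 * Task p stops running. With SCX_OPS_ENQ_LAST the scheduler keeps
	 * seeing slice-exhaustion events even when p is the last runnable
	 * task on the CPU, which is what user-space schedulers rely on.
	 */
	void BPF_STRUCT_OPS(sketch_stopping, struct task_struct *p, bool runnable)
	{
		bpf_printk("stopping: pid=%d runnable=%d", p->pid, runnable);
	}

	SEC(".struct_ops.link")
	struct sched_ext_ops sketch_ops = {
		.running	= (void *)sketch_running,
		.stopping	= (void *)sketch_stopping,
		.flags		= SCX_OPS_ENQ_LAST,
		.name		= "sketch",
	};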
To fix this, ensure that put_prev_task()/set_next_task() are never
skipped when the scx scheduling class is enabled, allowing the scx class
to handle such transitions.
This change restores the previous behavior, fixing the performance
regression in scx_rustland.
Link: https://github.com/sched-ext/scx/issues/788
Fixes: 7c65ae81ea86 ("sched_ext: Don't call put_prev_task_scx() before picking the next task")
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
---
kernel/sched/sched.h | 30 +++++++++++++++---------------
1 file changed, 15 insertions(+), 15 deletions(-)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 6085ef50febf..44d736e49d06 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2470,21 +2470,6 @@ __put_prev_set_next_dl_server(struct rq *rq,
rq->dl_server = NULL;
}
-static inline void put_prev_set_next_task(struct rq *rq,
- struct task_struct *prev,
- struct task_struct *next)
-{
- WARN_ON_ONCE(rq->curr != prev);
-
- __put_prev_set_next_dl_server(rq, prev, next);
-
- if (next == prev)
- return;
-
- prev->sched_class->put_prev_task(rq, prev, next);
- next->sched_class->set_next_task(rq, next, true);
-}
-
/*
* Helper to define a sched_class instance; each one is placed in a separate
* section which is ordered by the linker script:
@@ -2523,6 +2508,21 @@ DECLARE_STATIC_KEY_FALSE(__scx_switched_all); /* all fair class tasks on SCX */
#define scx_switched_all() false
#endif /* !CONFIG_SCHED_CLASS_EXT */
+static inline void put_prev_set_next_task(struct rq *rq,
+ struct task_struct *prev,
+ struct task_struct *next)
+{
+ WARN_ON_ONCE(rq->curr != prev);
+
+ __put_prev_set_next_dl_server(rq, prev, next);
+
+ if (next == prev && !scx_enabled())
+ return;
+
+ prev->sched_class->put_prev_task(rq, prev, next);
+ next->sched_class->set_next_task(rq, next, true);
+}
+
/*
* Iterate only active classes. SCX can take over all fair tasks or be
* completely disabled. If the former, skip fair. If the latter, skip SCX.
--
2.47.0
On Sun, Oct 13, 2024 at 07:39:28PM +0200, Andrea Righi wrote:
> With the consolidation of put_prev_task/set_next_task(), we are now
> skipping the sched_ext ops.stopping/running() transitions when the
> previous and next tasks are the same, see commit 436f3eed5c69 ("sched:
> Combine the last put_prev_task() and the first set_next_task()").
>
> While this optimization makes sense in general, it can negatively impact
> performance in some user-space schedulers that expect to handle such
> transitions when tasks exhaust their timeslice (see SCX_OPS_ENQ_LAST).
>
> For example, scx_rustland suffers a significant performance regression
> (e.g., gaming benchmarks drop from ~60fps to ~10fps).
>
> To fix this, ensure that put_prev_task()/set_next_task() are never
> skipped when the scx scheduling class is enabled, allowing the scx class
> to handle such transitions.
>
> This change restores the previous behavior, fixing the performance
> regression in scx_rustland.
>
> Link: https://github.com/sched-ext/scx/issues/788

How persistent are links like that? In general I strongly discourage
links to things not pointing to kernel.org resources.

> @@ -2523,6 +2508,21 @@ DECLARE_STATIC_KEY_FALSE(__scx_switched_all); /* all fair class tasks on SCX */
> #define scx_switched_all() false
> #endif /* !CONFIG_SCHED_CLASS_EXT */
>
> +static inline void put_prev_set_next_task(struct rq *rq,
> +					  struct task_struct *prev,
> +					  struct task_struct *next)
> +{
> +	WARN_ON_ONCE(rq->curr != prev);
> +
> +	__put_prev_set_next_dl_server(rq, prev, next);
> +
> +	if (next == prev && !scx_enabled())
> +		return;

Does that not also want to include a 'next->sched_class ==
&ext_sched_class' clause? And a comment?

> +
> +	prev->sched_class->put_prev_task(rq, prev, next);
> +	next->sched_class->set_next_task(rq, next, true);
> +}

And is there really no way scx can infer this happened? We just did pick
after all, that can see this coming a mile off.
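For concreteness, the extra clause Peter suggests might look roughly like
this (an untested sketch; the comment wording is illustrative, and the
exact form of the check is an assumption):

	static inline void put_prev_set_next_task(struct rq *rq,
						  struct task_struct *prev,
						  struct task_struct *next)
	{
		WARN_ON_ONCE(rq->curr != prev);

		__put_prev_set_next_dl_server(rq, prev, next);

		/*
		 * sched_ext relies on the ops.stopping()/ops.running()
		 * transitions even when the same task keeps the CPU (e.g.
		 * slice exhaustion with SCX_OPS_ENQ_LAST), so only skip
		 * them for non-scx tasks.
		 */
		if (next == prev &&
		    !(scx_enabled() && next->sched_class == &ext_sched_class))
			return;

		prev->sched_class->put_prev_task(rq, prev, next);
		next->sched_class->set_next_task(rq, next, true);
	}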
On Mon, Oct 14, 2024 at 10:36:08AM +0200, Peter Zijlstra wrote:
...
> > @@ -2523,6 +2508,21 @@ DECLARE_STATIC_KEY_FALSE(__scx_switched_all); /* all fair class tasks on SCX */
> > #define scx_switched_all() false
> > #endif /* !CONFIG_SCHED_CLASS_EXT */
> >
> > +static inline void put_prev_set_next_task(struct rq *rq,
> > +					  struct task_struct *prev,
> > +					  struct task_struct *next)
> > +{
> > +	WARN_ON_ONCE(rq->curr != prev);
> > +
> > +	__put_prev_set_next_dl_server(rq, prev, next);
> > +
> > +	if (next == prev && !scx_enabled())
> > +		return;
>
> Does that not also want to include a 'next->sched_class ==
> &ext_sched_class' clause? And a comment?
>
> > +
> > +	prev->sched_class->put_prev_task(rq, prev, next);
> > +	next->sched_class->set_next_task(rq, next, true);
> > +}
>
> And is there really no way scx can infer this happened? We just did pick
> after all, that can see this coming a mile off.

Ah, I believe I understand better what's happening now.

When prev == next with the idle class we're not calling
->put_prev_task/set_next_task anymore, so we may skip calling
ops.update_idle() in scx.

I think that's the only special case that we need to handle, and we may
be able to solve the regression by calling scx_update_idle() from
pick_task_idle(). Will do some testing with this.

-Andrea
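A rough sketch of that idea in kernel/sched/idle.c (untested; it assumes
scx_update_idle() is safe to call again from set_next_task_idle() when
prev != next):

	struct task_struct *pick_task_idle(struct rq *rq)
	{
		/*
		 * Mark the CPU idle from the pick path so ops.update_idle()
		 * still fires when prev == next and set_next_task_idle()
		 * is skipped.
		 */
		scx_update_idle(rq, true);
		return rq->idle;
	}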
On Mon, Oct 14, 2024 at 10:36:08AM +0200, Peter Zijlstra wrote:
> On Sun, Oct 13, 2024 at 07:39:28PM +0200, Andrea Righi wrote:
> > With the consolidation of put_prev_task/set_next_task(), we are now
> > skipping the sched_ext ops.stopping/running() transitions when the
> > previous and next tasks are the same, see commit 436f3eed5c69 ("sched:
> > Combine the last put_prev_task() and the first set_next_task()").
> >
> > While this optimization makes sense in general, it can negatively impact
> > performance in some user-space schedulers that expect to handle such
> > transitions when tasks exhaust their timeslice (see SCX_OPS_ENQ_LAST).
> >
> > For example, scx_rustland suffers a significant performance regression
> > (e.g., gaming benchmarks drop from ~60fps to ~10fps).
> >
> > To fix this, ensure that put_prev_task()/set_next_task() are never
> > skipped when the scx scheduling class is enabled, allowing the scx class
> > to handle such transitions.
> >
> > This change restores the previous behavior, fixing the performance
> > regression in scx_rustland.
> >
> > Link: https://github.com/sched-ext/scx/issues/788
>
> How persistent are links like that? In general I strongly discourage
> links to things not pointing to kernel.org resources.

This one persists also after the issue is marked as resolved; I only
added it to provide more context about the problem. However, the commit
description already contains all the details, so we can probably get rid
of the link.

> > @@ -2523,6 +2508,21 @@ DECLARE_STATIC_KEY_FALSE(__scx_switched_all); /* all fair class tasks on SCX */
> > #define scx_switched_all() false
> > #endif /* !CONFIG_SCHED_CLASS_EXT */
> >
> > +static inline void put_prev_set_next_task(struct rq *rq,
> > +					  struct task_struct *prev,
> > +					  struct task_struct *next)
> > +{
> > +	WARN_ON_ONCE(rq->curr != prev);
> > +
> > +	__put_prev_set_next_dl_server(rq, prev, next);
> > +
> > +	if (next == prev && !scx_enabled())
> > +		return;
>
> Does that not also want to include a 'next->sched_class ==
> &ext_sched_class' clause? And a comment?

Good point.

> > +
> > +	prev->sched_class->put_prev_task(rq, prev, next);
> > +	next->sched_class->set_next_task(rq, next, true);
> > +}
>
> And is there really no way scx can infer this happened? We just did pick
> after all, that can see this coming a mile off.

I'll do some testing; we can probably infer this in pick_task_scx() and
make adjustments there, or possibly in scx_update_idle() within the idle
class, since that seems to be where the real issue lies.

Thanks for looking at this,
-Andrea