[PATCH] sched/fair: revert force wakeup preemption
Posted by Vincent Guittot 2 weeks ago
This aggressively bypasses run_to_parity and slice protection on the
assumption that this is what the waker wants, but there is no guarantee
that the wakee will be the next task to run. It is a better choice to
use yield_to_task or WF_SYNC in such cases.
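
As an aside, the WF_SYNC route mentioned above is the _sync family of
wake-up helpers, which pass WF_SYNC to the scheduler as a hint that the
waker is about to sleep. A minimal, hypothetical sketch (the struct and
function names below are illustrative and not part of this patch):

  #include <linux/types.h>
  #include <linux/wait.h>

  /* Hypothetical producer/consumer pair, for illustration only. */
  struct frob_dev {
          wait_queue_head_t waitq;
          bool done;
  };

  /* Consumer side: sleep until the producer signals completion. */
  static int frob_wait(struct frob_dev *fd)
  {
          return wait_event_interruptible(fd->waitq, fd->done);
  }

  /* Producer side: it will sleep right after signalling, so it uses
   * the _sync variant, which passes WF_SYNC and lets the scheduler
   * favour the wakee without an unconditional forced preemption. */
  static void frob_complete(struct frob_dev *fd)
  {
          fd->done = true;
          wake_up_interruptible_sync(&fd->waitq);
  }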

The forced preemption also increases the number of reschedules and
preemptions because a task quickly becomes "ineligible" once it runs:
its vruntime is updated periodically, before it has exhausted its slice
or even a minimum quantum.

Example:
Two tasks A and B wake up simultaneously with lag = 0. Both are
eligible. Task A runs first and wakes up task C. The scheduler updates
task A's vruntime, which becomes greater than the queue's average
vruntime because all the other tasks have lag == 0 and haven't run yet.
Task A is now ineligible because it has received more runtime than the
other tasks, even though it has exhausted neither its slice nor a
minimum quantum. We force preemption and disable the protection, but
task B will run first, not task C.

Side note: DELAY_ZERO amplifies this effect by clearing positive lag
at wakeup.
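
To make the example above concrete, here is a minimal userspace sketch
(equal weights, made-up numbers, illustrative names; not the kernel's
entity_eligible()) of the eligibility test involved: a task is eligible
only while its vruntime is at or below the queue's average vruntime.

  #include <stdio.h>
  #include <stdbool.h>

  struct toy_task {
          const char *name;
          double vruntime;        /* runtime received, in ns */
  };

  /* Equal-weight average vruntime of the tasks on the toy queue. */
  static double avg_vruntime(const struct toy_task *t, int n)
  {
          double sum = 0.0;

          for (int i = 0; i < n; i++)
                  sum += t[i].vruntime;
          return sum / n;
  }

  /* Eligible <=> lag = avg - vruntime >= 0. */
  static bool eligible(const struct toy_task *t, double avg)
  {
          return t->vruntime <= avg;
  }

  int main(void)
  {
          /* A and B wake up simultaneously with lag == 0. */
          struct toy_task rq[] = { { "A", 0.0 }, { "B", 0.0 } };

          /* A runs first; by the time it wakes C, its vruntime has
           * already been charged ~0.1ms of a ~3ms slice. */
          rq[0].vruntime += 100000.0;

          double avg = avg_vruntime(rq, 2);

          for (int i = 0; i < 2; i++)
                  printf("%s: vruntime=%.0f eligible=%d\n",
                         rq[i].name, rq[i].vruntime, eligible(&rq[i], avg));
          return 0;
  }

A is already ineligible after a tiny fraction of its slice; the removed
hunk turned that condition alone into a forced resched, and the next
pick is B, not the just-woken C.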

Fixes: e837456fdca8 ("sched/fair: Reimplement NEXT_BUDDY to align with EEVDF goals")
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 04993c763a06..16ecc3475fe2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8822,16 +8822,6 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
 	if ((wake_flags & WF_FORK) || pse->sched_delayed)
 		return;
 
-	/*
-	 * If @p potentially is completing work required by current then
-	 * consider preemption.
-	 *
-	 * Reschedule if waker is no longer eligible. */
-	if (in_task() && !entity_eligible(cfs_rq, se)) {
-		preempt_action = PREEMPT_WAKEUP_RESCHED;
-		goto preempt;
-	}
-
 	/* Prefer picking wakee soon if appropriate. */
 	if (sched_feat(NEXT_BUDDY) &&
 	    set_preempt_buddy(cfs_rq, wake_flags, pse, se)) {
-- 
2.43.0
Re: [PATCH] sched/fair: revert force wakeup preemption
Posted by Ryan Roberts 1 week, 2 days ago
On 23/01/2026 10:28, Vincent Guittot wrote:
> This aggressively bypasses run_to_parity and slice protection on the
> assumption that this is what the waker wants, but there is no guarantee
> that the wakee will be the next task to run. It is a better choice to
> use yield_to_task or WF_SYNC in such cases.
> 
> The forced preemption also increases the number of reschedules and
> preemptions because a task quickly becomes "ineligible" once it runs:
> its vruntime is updated periodically, before it has exhausted its slice
> or even a minimum quantum.
> 
> Example:
> Two tasks A and B wake up simultaneously with lag = 0. Both are
> eligible. Task A runs first and wakes up task C. The scheduler updates
> task A's vruntime, which becomes greater than the queue's average
> vruntime because all the other tasks have lag == 0 and haven't run yet.
> Task A is now ineligible because it has received more runtime than the
> other tasks, even though it has exhausted neither its slice nor a
> minimum quantum. We force preemption and disable the protection, but
> task B will run first, not task C.
> 
> Side note: DELAY_ZERO amplifies this effect by clearing positive lag
> at wakeup.
> 
> Fixes: e837456fdca8 ("sched/fair: Reimplement NEXT_BUDDY to align with EEVDF goals")
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>

I see that this is already merged for -rc7 (which is great - thanks for the fast 
turnaround!). Here are the performance results I promised.

TL;DR: This patch combined with the NEXT_BUDDY disablement patch fixes all the 
regressions I originally reported.


6-18-0 (base)		(baseline)
6-19-0-rc6		(New NEXT_BUDDY implementation enabled)
6-19-0-rc6+p1		(New NEXT_BUDDY implementation disabled)
6-19-0-rc6+p1+p2	(+ this patch)


Multi-node SUT (workload running across 2 machines):

+---------------------------------+----------------------------------------------------+---------------+-------------+---------------+------------------+
| Benchmark                       | Result Class                                       | 6-18-0 (base) |  6-19-0-rc6 | 6-19-0-rc6+p1 | 6-19-0-rc6+p1+p2 |
+=================================+====================================================+===============+=============+===============+==================+
| repro-collection/mysql-workload | db transaction rate (transactions/min)             |     646267.33 |  (R) -0.89% |     (I) 4.01% |        (I) 6.03% |
|                                 | new order rate (orders/min)                        |     213256.50 |  (R) -0.89% |     (I) 3.94% |        (I) 6.05% |
+---------------------------------+----------------------------------------------------+---------------+-------------+---------------+------------------+

Single-node SUT (workload running on single machine):

+---------------------------------+----------------------------------------------------+---------------+-------------+---------------+------------------+
| Benchmark                       | Result Class                                       | 6-18-0 (base) |  6-19-0-rc6 | 6-19-0-rc6+p1 | 6-19-0-rc6+p1+p2 |
+=================================+====================================================+===============+=============+===============+==================+
| specjbb/composite               | critical-jOPS (jOPS)                               |      94700.00 |  (R) -4.12% |     (I) 3.07% |        (I) 1.27% |
|                                 | max-jOPS (jOPS)                                    |     113984.50 |  (R) -2.80% |     (I) 1.94% |        (I) 1.94% |
+---------------------------------+----------------------------------------------------+---------------+-------------+---------------+------------------+
| repro-collection/mysql-workload | db transaction rate (transactions/min)             |     245438.25 |  (R) -3.07% |        -1.34% |            0.23% |
|                                 | new order rate (orders/min)                        |      80985.75 |  (R) -3.06% |        -1.29% |            0.25% |
+---------------------------------+----------------------------------------------------+---------------+-------------+---------------+------------------+
| pts/pgbench                     | Scale: 1 Clients: 1 Read Only (TPS)                |      63124.00 |   (I) 2.67% |         2.58% |        (I) 2.69% |
|                                 | Scale: 1 Clients: 1 Read Only - Latency (ms)       |         0.016 |       4.35% |         4.35% |            4.35% |
|                                 | Scale: 1 Clients: 1 Read Write (TPS)               |        974.92 |       0.03% |         0.11% |           -0.06% |
|                                 | Scale: 1 Clients: 1 Read Write - Latency (ms)      |          1.03 |       0.01% |         0.14% |           -0.04% |
|                                 | Scale: 1 Clients: 250 Read Only (TPS)              |    1915931.58 |  (R) -3.28% |    (R) -3.92% |            1.23% |
|                                 | Scale: 1 Clients: 250 Read Only - Latency (ms)     |          0.13 |  (R) -3.33% |    (R) -3.93% |            1.16% |
|                                 | Scale: 1 Clients: 250 Read Write (TPS)             |        855.67 |       0.27% |        -0.49% |           -1.44% |
|                                 | Scale: 1 Clients: 250 Read Write - Latency (ms)    |        292.39 |       0.32% |        -0.49% |           -1.40% |
|                                 | Scale: 1 Clients: 1000 Read Only (TPS)             |    1534130.08 | (R) -12.20% |   (R) -11.85% |            0.45% |
|                                 | Scale: 1 Clients: 1000 Read Only - Latency (ms)    |          0.65 | (R) -12.19% |   (R) -11.87% |            0.46% |
|                                 | Scale: 1 Clients: 1000 Read Write (TPS)            |        578.75 |       0.85% |         1.60% |           -5.23% |
|                                 | Scale: 1 Clients: 1000 Read Write - Latency (ms)   |       1736.98 |       1.12% |         1.52% |           -4.91% |
|                                 | Scale: 100 Clients: 1 Read Only (TPS)              |      57170.33 |       1.64% |         2.16% |            1.69% |
|                                 | Scale: 100 Clients: 1 Read Only - Latency (ms)     |         0.018 |       1.94% |         1.94% |            2.94% |
|                                 | Scale: 100 Clients: 1 Read Write (TPS)             |        836.58 |       0.27% |         0.07% |            0.13% |
|                                 | Scale: 100 Clients: 1 Read Write - Latency (ms)    |          1.20 |       0.27% |         0.06% |            0.15% |
|                                 | Scale: 100 Clients: 250 Read Only (TPS)            |    1773440.67 |  (R) -2.54% |    (R) -2.94% |            1.00% |
|                                 | Scale: 100 Clients: 250 Read Only - Latency (ms)   |          0.14 |  (R) -2.42% |    (R) -2.87% |            1.08% |
|                                 | Scale: 100 Clients: 250 Read Write (TPS)           |       5505.50 |      -1.51% |         0.17% |           -0.03% |
|                                 | Scale: 100 Clients: 250 Read Write - Latency (ms)  |         45.42 |      -1.52% |         0.17% |           -0.03% |
|                                 | Scale: 100 Clients: 1000 Read Only (TPS)           |    1393037.50 | (R) -10.08% |   (R) -10.36% |            0.60% |
|                                 | Scale: 100 Clients: 1000 Read Only - Latency (ms)  |          0.72 | (R) -10.07% |   (R) -10.35% |            0.60% |
|                                 | Scale: 100 Clients: 1000 Read Write (TPS)          |       5085.92 |       0.70% |        -2.32% |           -0.28% |
|                                 | Scale: 100 Clients: 1000 Read Write - Latency (ms) |        196.79 |       0.72% |        -2.27% |           -0.29% |
+---------------------------------+----------------------------------------------------+---------------+-------------+---------------+------------------+
| mmtests/hackbench               | hackbench-process-pipes-1 (seconds)                |          0.14 |      -1.28% |         0.35% |           -1.85% |
|                                 | hackbench-process-pipes-4 (seconds)                |          0.44 |   (I) 8.20% |     (I) 5.72% |        (I) 7.23% |
|                                 | hackbench-process-pipes-7 (seconds)                |          0.68 | (R) -18.31% |   (R) -24.54% |            1.56% |
|                                 | hackbench-process-pipes-12 (seconds)               |          1.24 | (R) -19.52% |   (R) -24.55% |           -0.25% |
|                                 | hackbench-process-pipes-21 (seconds)               |          1.81 |  (R) -7.33% |   (R) -13.58% |           -1.14% |
|                                 | hackbench-process-pipes-30 (seconds)               |          2.39 |  (R) -7.86% |   (R) -13.21% |           -0.23% |
|                                 | hackbench-process-pipes-48 (seconds)               |          3.18 | (R) -10.72% |   (R) -12.63% |            1.22% |
|                                 | hackbench-process-pipes-79 (seconds)               |          3.84 |  (R) -9.52% |   (R) -10.31% |           -0.07% |
|                                 | hackbench-process-pipes-110 (seconds)              |          4.68 |  (R) -6.78% |    (R) -7.15% |            1.30% |
|                                 | hackbench-process-pipes-141 (seconds)              |          5.75 |  (R) -5.50% |    (R) -5.60% |            1.11% |
|                                 | hackbench-process-pipes-172 (seconds)              |          6.80 |  (R) -4.67% |    (R) -4.79% |            1.61% |
|                                 | hackbench-process-pipes-203 (seconds)              |          7.94 |  (R) -4.01% |    (R) -3.74% |        (I) 2.08% |
|                                 | hackbench-process-pipes-234 (seconds)              |          9.02 |  (R) -3.69% |    (R) -3.63% |            1.67% |
|                                 | hackbench-process-pipes-256 (seconds)              |          9.78 |  (R) -3.80% |    (R) -3.19% |            1.65% |
|                                 | hackbench-process-sockets-1 (seconds)              |          0.29 |      -0.38% |        -0.43% |            0.03% |
|                                 | hackbench-process-sockets-4 (seconds)              |          0.76 |  (I) 17.71% |    (I) 18.69% |       (I) 19.52% |
|                                 | hackbench-process-sockets-7 (seconds)              |          1.16 |  (I) 12.10% |    (I) 11.37% |       (I) 13.52% |
|                                 | hackbench-process-sockets-12 (seconds)             |          1.86 |  (I) 10.19% |     (I) 9.31% |       (I) 12.83% |
|                                 | hackbench-process-sockets-21 (seconds)             |          3.12 |   (I) 9.59% |     (I) 8.99% |       (I) 12.15% |
|                                 | hackbench-process-sockets-30 (seconds)             |          4.30 |   (I) 6.23% |     (I) 6.75% |        (I) 8.88% |
|                                 | hackbench-process-sockets-48 (seconds)             |          6.58 |   (I) 2.39% |     (I) 2.98% |        (I) 4.39% |
|                                 | hackbench-process-sockets-79 (seconds)             |         10.56 |   (I) 3.44% |     (I) 3.10% |        (I) 3.94% |
|                                 | hackbench-process-sockets-110 (seconds)            |         13.85 |      -0.77% |         0.44% |        (I) 2.50% |
|                                 | hackbench-process-sockets-141 (seconds)            |         19.23 |      -0.47% |         1.54% |            2.95% |
|                                 | hackbench-process-sockets-172 (seconds)            |         26.33 |   (I) 3.44% |     (I) 4.25% |        (I) 3.21% |
|                                 | hackbench-process-sockets-203 (seconds)            |         30.27 |       0.36% |         1.67% |            0.90% |
|                                 | hackbench-process-sockets-234 (seconds)            |         35.12 |       2.05% |     (I) 3.11% |        (I) 2.45% |
|                                 | hackbench-process-sockets-256 (seconds)            |         38.74 |      -0.39% |         1.48% |            2.13% |
|                                 | hackbench-thread-pipes-1 (seconds)                 |          0.17 |      -0.38% |        -0.76% |           -1.51% |
|                                 | hackbench-thread-pipes-4 (seconds)                 |          0.45 |   (I) 7.85% |     (I) 6.15% |        (I) 9.93% |
|                                 | hackbench-thread-pipes-7 (seconds)                 |          0.74 |  (R) -7.22% |    (R) -9.98% |        (I) 6.47% |
|                                 | hackbench-thread-pipes-12 (seconds)                |          1.32 |  (R) -7.62% |   (R) -14.42% |            1.27% |
|                                 | hackbench-thread-pipes-21 (seconds)                |          1.95 |  (R) -3.00% |    (R) -7.93% |           -1.67% |
|                                 | hackbench-thread-pipes-30 (seconds)                |          2.50 |  (R) -4.79% |   (R) -11.99% |           -1.72% |
|                                 | hackbench-thread-pipes-48 (seconds)                |          3.32 |  (R) -5.49% |   (R) -11.45% |            1.15% |
|                                 | hackbench-thread-pipes-79 (seconds)                |          4.04 |  (R) -6.16% |    (R) -8.88% |           -0.56% |
|                                 | hackbench-thread-pipes-110 (seconds)               |          4.94 |  (R) -2.62% |    (R) -4.92% |            0.63% |
|                                 | hackbench-thread-pipes-141 (seconds)               |          6.04 |  (R) -2.05% |    (R) -3.56% |            0.51% |
|                                 | hackbench-thread-pipes-172 (seconds)               |          7.15 |      -0.74% |        -1.93% |            0.91% |
|                                 | hackbench-thread-pipes-203 (seconds)               |          8.31 |      -1.20% |        -1.41% |            0.91% |
|                                 | hackbench-thread-pipes-234 (seconds)               |          9.49 |      -0.65% |        -1.21% |            0.92% |
|                                 | hackbench-thread-pipes-256 (seconds)               |         10.30 |      -0.56% |        -0.92% |            0.88% |
|                                 | hackbench-thread-sockets-1 (seconds)               |          0.31 |       0.16% |        -0.05% |           -0.48% |
|                                 | hackbench-thread-sockets-4 (seconds)               |          0.79 |  (I) 18.70% |    (I) 19.30% |       (I) 19.79% |
|                                 | hackbench-thread-sockets-7 (seconds)               |          1.16 |  (I) 12.35% |    (I) 11.90% |       (I) 12.91% |
|                                 | hackbench-thread-sockets-12 (seconds)              |          1.87 |  (I) 12.75% |    (I) 11.66% |       (I) 14.43% |
|                                 | hackbench-thread-sockets-21 (seconds)              |          3.16 |  (I) 11.55% |    (I) 11.06% |       (I) 14.41% |
|                                 | hackbench-thread-sockets-30 (seconds)              |          4.32 |   (I) 7.66% |     (I) 6.58% |       (I) 10.15% |
|                                 | hackbench-thread-sockets-48 (seconds)              |          6.45 |   (I) 2.62% |         1.92% |        (I) 4.10% |
|                                 | hackbench-thread-sockets-79 (seconds)              |         10.15 |       1.85% |        -0.20% |            1.54% |
|                                 | hackbench-thread-sockets-110 (seconds)             |         13.45 |      -0.29% |        -0.41% |            0.08% |
|                                 | hackbench-thread-sockets-141 (seconds)             |         17.87 |      -1.84% |        -1.01% |            1.33% |
|                                 | hackbench-thread-sockets-172 (seconds)             |         24.38 |       0.82% |         1.33% |            3.68% |
|                                 | hackbench-thread-sockets-203 (seconds)             |         28.38 |      -1.29% |         0.72% |            1.58% |
|                                 | hackbench-thread-sockets-234 (seconds)             |         32.75 |      -1.01% |         1.00% |            0.94% |
|                                 | hackbench-thread-sockets-256 (seconds)             |         36.49 |      -0.99% |         1.22% |            1.00% |
+---------------------------------+----------------------------------------------------+---------------+-------------+---------------+------------------+

Thanks,
Ryan


> ---
>  kernel/sched/fair.c | 10 ----------
>  1 file changed, 10 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 04993c763a06..16ecc3475fe2 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8822,16 +8822,6 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
>  	if ((wake_flags & WF_FORK) || pse->sched_delayed)
>  		return;
>  
> -	/*
> -	 * If @p potentially is completing work required by current then
> -	 * consider preemption.
> -	 *
> -	 * Reschedule if waker is no longer eligible. */
> -	if (in_task() && !entity_eligible(cfs_rq, se)) {
> -		preempt_action = PREEMPT_WAKEUP_RESCHED;
> -		goto preempt;
> -	}
> -
>  	/* Prefer picking wakee soon if appropriate. */
>  	if (sched_feat(NEXT_BUDDY) &&
>  	    set_preempt_buddy(cfs_rq, wake_flags, pse, se)) {
Re: [PATCH] sched/fair: revert force wakeup preemption
Posted by Vincent Guittot 1 week, 2 days ago
On Wed, 28 Jan 2026 at 13:24, Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 23/01/2026 10:28, Vincent Guittot wrote:
> > This aggressively bypasses run_to_parity and slice protection on the
> > assumption that this is what the waker wants, but there is no guarantee
> > that the wakee will be the next task to run. It is a better choice to
> > use yield_to_task or WF_SYNC in such cases.
> >
> > The forced preemption also increases the number of reschedules and
> > preemptions because a task quickly becomes "ineligible" once it runs:
> > its vruntime is updated periodically, before it has exhausted its slice
> > or even a minimum quantum.
> >
> > Example:
> > Two tasks A and B wake up simultaneously with lag = 0. Both are
> > eligible. Task A runs first and wakes up task C. The scheduler updates
> > task A's vruntime, which becomes greater than the queue's average
> > vruntime because all the other tasks have lag == 0 and haven't run yet.
> > Task A is now ineligible because it has received more runtime than the
> > other tasks, even though it has exhausted neither its slice nor a
> > minimum quantum. We force preemption and disable the protection, but
> > task B will run first, not task C.
> >
> > Side note: DELAY_ZERO amplifies this effect by clearing positive lag
> > at wakeup.
> >
> > Fixes: e837456fdca8 ("sched/fair: Reimplement NEXT_BUDDY to align with EEVDF goals")
> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>
> I see that this is already merged for -rc7 (which is great - thanks for the fast
> turnaround!). Here are the performance results I promised.
>
> TL;DR: This patch combined with the NEXT_BUDDY disablement patch fixes all the
> regressions I originally reported.

Thanks for the test results and the confirmation that the regressions are fixed.

>
>
> 6-18-0 (base)           (baseline)
> 6-19-0-rc6              (New NEXT_BUDDY implementation enabled)
> 6-19-0-rc6+p1           (New NEXT_BUDDY implementation disabled)
> 6-19-0-rc6+p1+p2        (+ this patch)
>
>
> Multi-node SUT (workload running across 2 machines):
>
> +---------------------------------+----------------------------------------------------+---------------+-------------+---------------+------------------+
> | Benchmark                       | Result Class                                       | 6-18-0 (base) |  6-19-0-rc6 | 6-19-0-rc6+p1 | 6-19-0-rc6+p1+p2 |
> +=================================+====================================================+===============+=============+===============+==================+
> | repro-collection/mysql-workload | db transaction rate (transactions/min)             |     646267.33 |  (R) -0.89% |     (I) 4.01% |        (I) 6.03% |
> |                                 | new order rate (orders/min)                        |     213256.50 |  (R) -0.89% |     (I) 3.94% |        (I) 6.05% |
> +---------------------------------+----------------------------------------------------+---------------+-------------+---------------+------------------+
>
> Single-node SUT (workload running on single machine):
>
> +---------------------------------+----------------------------------------------------+---------------+-------------+---------------+------------------+
> | Benchmark                       | Result Class                                       | 6-18-0 (base) |  6-19-0-rc6 | 6-19-0-rc6+p1 | 6-19-0-rc6+p1+p2 |
> +=================================+====================================================+===============+=============+===============+==================+
> | specjbb/composite               | critical-jOPS (jOPS)                               |      94700.00 |  (R) -4.12% |     (I) 3.07% |        (I) 1.27% |
> |                                 | max-jOPS (jOPS)                                    |     113984.50 |  (R) -2.80% |     (I) 1.94% |        (I) 1.94% |
> +---------------------------------+----------------------------------------------------+---------------+-------------+---------------+------------------+
> | repro-collection/mysql-workload | db transaction rate (transactions/min)             |     245438.25 |  (R) -3.07% |        -1.34% |            0.23% |
> |                                 | new order rate (orders/min)                        |      80985.75 |  (R) -3.06% |        -1.29% |            0.25% |
> +---------------------------------+----------------------------------------------------+---------------+-------------+---------------+------------------+
> | pts/pgbench                     | Scale: 1 Clients: 1 Read Only (TPS)                |      63124.00 |   (I) 2.67% |         2.58% |        (I) 2.69% |
> |                                 | Scale: 1 Clients: 1 Read Only - Latency (ms)       |         0.016 |       4.35% |         4.35% |            4.35% |
> |                                 | Scale: 1 Clients: 1 Read Write (TPS)               |        974.92 |       0.03% |         0.11% |           -0.06% |
> |                                 | Scale: 1 Clients: 1 Read Write - Latency (ms)      |          1.03 |       0.01% |         0.14% |           -0.04% |
> |                                 | Scale: 1 Clients: 250 Read Only (TPS)              |    1915931.58 |  (R) -3.28% |    (R) -3.92% |            1.23% |
> |                                 | Scale: 1 Clients: 250 Read Only - Latency (ms)     |          0.13 |  (R) -3.33% |    (R) -3.93% |            1.16% |
> |                                 | Scale: 1 Clients: 250 Read Write (TPS)             |        855.67 |       0.27% |        -0.49% |           -1.44% |
> |                                 | Scale: 1 Clients: 250 Read Write - Latency (ms)    |        292.39 |       0.32% |        -0.49% |           -1.40% |
> |                                 | Scale: 1 Clients: 1000 Read Only (TPS)             |    1534130.08 | (R) -12.20% |   (R) -11.85% |            0.45% |
> |                                 | Scale: 1 Clients: 1000 Read Only - Latency (ms)    |          0.65 | (R) -12.19% |   (R) -11.87% |            0.46% |
> |                                 | Scale: 1 Clients: 1000 Read Write (TPS)            |        578.75 |       0.85% |         1.60% |           -5.23% |
> |                                 | Scale: 1 Clients: 1000 Read Write - Latency (ms)   |       1736.98 |       1.12% |         1.52% |           -4.91% |
> |                                 | Scale: 100 Clients: 1 Read Only (TPS)              |      57170.33 |       1.64% |         2.16% |            1.69% |
> |                                 | Scale: 100 Clients: 1 Read Only - Latency (ms)     |         0.018 |       1.94% |         1.94% |            2.94% |
> |                                 | Scale: 100 Clients: 1 Read Write (TPS)             |        836.58 |       0.27% |         0.07% |            0.13% |
> |                                 | Scale: 100 Clients: 1 Read Write - Latency (ms)    |          1.20 |       0.27% |         0.06% |            0.15% |
> |                                 | Scale: 100 Clients: 250 Read Only (TPS)            |    1773440.67 |  (R) -2.54% |    (R) -2.94% |            1.00% |
> |                                 | Scale: 100 Clients: 250 Read Only - Latency (ms)   |          0.14 |  (R) -2.42% |    (R) -2.87% |            1.08% |
> |                                 | Scale: 100 Clients: 250 Read Write (TPS)           |       5505.50 |      -1.51% |         0.17% |           -0.03% |
> |                                 | Scale: 100 Clients: 250 Read Write - Latency (ms)  |         45.42 |      -1.52% |         0.17% |           -0.03% |
> |                                 | Scale: 100 Clients: 1000 Read Only (TPS)           |    1393037.50 | (R) -10.08% |   (R) -10.36% |            0.60% |
> |                                 | Scale: 100 Clients: 1000 Read Only - Latency (ms)  |          0.72 | (R) -10.07% |   (R) -10.35% |            0.60% |
> |                                 | Scale: 100 Clients: 1000 Read Write (TPS)          |       5085.92 |       0.70% |        -2.32% |           -0.28% |
> |                                 | Scale: 100 Clients: 1000 Read Write - Latency (ms) |        196.79 |       0.72% |        -2.27% |           -0.29% |
> +---------------------------------+----------------------------------------------------+---------------+-------------+---------------+------------------+
> | mmtests/hackbench               | hackbench-process-pipes-1 (seconds)                |          0.14 |      -1.28% |         0.35% |           -1.85% |
> |                                 | hackbench-process-pipes-4 (seconds)                |          0.44 |   (I) 8.20% |     (I) 5.72% |        (I) 7.23% |
> |                                 | hackbench-process-pipes-7 (seconds)                |          0.68 | (R) -18.31% |   (R) -24.54% |            1.56% |
> |                                 | hackbench-process-pipes-12 (seconds)               |          1.24 | (R) -19.52% |   (R) -24.55% |           -0.25% |
> |                                 | hackbench-process-pipes-21 (seconds)               |          1.81 |  (R) -7.33% |   (R) -13.58% |           -1.14% |
> |                                 | hackbench-process-pipes-30 (seconds)               |          2.39 |  (R) -7.86% |   (R) -13.21% |           -0.23% |
> |                                 | hackbench-process-pipes-48 (seconds)               |          3.18 | (R) -10.72% |   (R) -12.63% |            1.22% |
> |                                 | hackbench-process-pipes-79 (seconds)               |          3.84 |  (R) -9.52% |   (R) -10.31% |           -0.07% |
> |                                 | hackbench-process-pipes-110 (seconds)              |          4.68 |  (R) -6.78% |    (R) -7.15% |            1.30% |
> |                                 | hackbench-process-pipes-141 (seconds)              |          5.75 |  (R) -5.50% |    (R) -5.60% |            1.11% |
> |                                 | hackbench-process-pipes-172 (seconds)              |          6.80 |  (R) -4.67% |    (R) -4.79% |            1.61% |
> |                                 | hackbench-process-pipes-203 (seconds)              |          7.94 |  (R) -4.01% |    (R) -3.74% |        (I) 2.08% |
> |                                 | hackbench-process-pipes-234 (seconds)              |          9.02 |  (R) -3.69% |    (R) -3.63% |            1.67% |
> |                                 | hackbench-process-pipes-256 (seconds)              |          9.78 |  (R) -3.80% |    (R) -3.19% |            1.65% |
> |                                 | hackbench-process-sockets-1 (seconds)              |          0.29 |      -0.38% |        -0.43% |            0.03% |
> |                                 | hackbench-process-sockets-4 (seconds)              |          0.76 |  (I) 17.71% |    (I) 18.69% |       (I) 19.52% |
> |                                 | hackbench-process-sockets-7 (seconds)              |          1.16 |  (I) 12.10% |    (I) 11.37% |       (I) 13.52% |
> |                                 | hackbench-process-sockets-12 (seconds)             |          1.86 |  (I) 10.19% |     (I) 9.31% |       (I) 12.83% |
> |                                 | hackbench-process-sockets-21 (seconds)             |          3.12 |   (I) 9.59% |     (I) 8.99% |       (I) 12.15% |
> |                                 | hackbench-process-sockets-30 (seconds)             |          4.30 |   (I) 6.23% |     (I) 6.75% |        (I) 8.88% |
> |                                 | hackbench-process-sockets-48 (seconds)             |          6.58 |   (I) 2.39% |     (I) 2.98% |        (I) 4.39% |
> |                                 | hackbench-process-sockets-79 (seconds)             |         10.56 |   (I) 3.44% |     (I) 3.10% |        (I) 3.94% |
> |                                 | hackbench-process-sockets-110 (seconds)            |         13.85 |      -0.77% |         0.44% |        (I) 2.50% |
> |                                 | hackbench-process-sockets-141 (seconds)            |         19.23 |      -0.47% |         1.54% |            2.95% |
> |                                 | hackbench-process-sockets-172 (seconds)            |         26.33 |   (I) 3.44% |     (I) 4.25% |        (I) 3.21% |
> |                                 | hackbench-process-sockets-203 (seconds)            |         30.27 |       0.36% |         1.67% |            0.90% |
> |                                 | hackbench-process-sockets-234 (seconds)            |         35.12 |       2.05% |     (I) 3.11% |        (I) 2.45% |
> |                                 | hackbench-process-sockets-256 (seconds)            |         38.74 |      -0.39% |         1.48% |            2.13% |
> |                                 | hackbench-thread-pipes-1 (seconds)                 |          0.17 |      -0.38% |        -0.76% |           -1.51% |
> |                                 | hackbench-thread-pipes-4 (seconds)                 |          0.45 |   (I) 7.85% |     (I) 6.15% |        (I) 9.93% |
> |                                 | hackbench-thread-pipes-7 (seconds)                 |          0.74 |  (R) -7.22% |    (R) -9.98% |        (I) 6.47% |
> |                                 | hackbench-thread-pipes-12 (seconds)                |          1.32 |  (R) -7.62% |   (R) -14.42% |            1.27% |
> |                                 | hackbench-thread-pipes-21 (seconds)                |          1.95 |  (R) -3.00% |    (R) -7.93% |           -1.67% |
> |                                 | hackbench-thread-pipes-30 (seconds)                |          2.50 |  (R) -4.79% |   (R) -11.99% |           -1.72% |
> |                                 | hackbench-thread-pipes-48 (seconds)                |          3.32 |  (R) -5.49% |   (R) -11.45% |            1.15% |
> |                                 | hackbench-thread-pipes-79 (seconds)                |          4.04 |  (R) -6.16% |    (R) -8.88% |           -0.56% |
> |                                 | hackbench-thread-pipes-110 (seconds)               |          4.94 |  (R) -2.62% |    (R) -4.92% |            0.63% |
> |                                 | hackbench-thread-pipes-141 (seconds)               |          6.04 |  (R) -2.05% |    (R) -3.56% |            0.51% |
> |                                 | hackbench-thread-pipes-172 (seconds)               |          7.15 |      -0.74% |        -1.93% |            0.91% |
> |                                 | hackbench-thread-pipes-203 (seconds)               |          8.31 |      -1.20% |        -1.41% |            0.91% |
> |                                 | hackbench-thread-pipes-234 (seconds)               |          9.49 |      -0.65% |        -1.21% |            0.92% |
> |                                 | hackbench-thread-pipes-256 (seconds)               |         10.30 |      -0.56% |        -0.92% |            0.88% |
> |                                 | hackbench-thread-sockets-1 (seconds)               |          0.31 |       0.16% |        -0.05% |           -0.48% |
> |                                 | hackbench-thread-sockets-4 (seconds)               |          0.79 |  (I) 18.70% |    (I) 19.30% |       (I) 19.79% |
> |                                 | hackbench-thread-sockets-7 (seconds)               |          1.16 |  (I) 12.35% |    (I) 11.90% |       (I) 12.91% |
> |                                 | hackbench-thread-sockets-12 (seconds)              |          1.87 |  (I) 12.75% |    (I) 11.66% |       (I) 14.43% |
> |                                 | hackbench-thread-sockets-21 (seconds)              |          3.16 |  (I) 11.55% |    (I) 11.06% |       (I) 14.41% |
> |                                 | hackbench-thread-sockets-30 (seconds)              |          4.32 |   (I) 7.66% |     (I) 6.58% |       (I) 10.15% |
> |                                 | hackbench-thread-sockets-48 (seconds)              |          6.45 |   (I) 2.62% |         1.92% |        (I) 4.10% |
> |                                 | hackbench-thread-sockets-79 (seconds)              |         10.15 |       1.85% |        -0.20% |            1.54% |
> |                                 | hackbench-thread-sockets-110 (seconds)             |         13.45 |      -0.29% |        -0.41% |            0.08% |
> |                                 | hackbench-thread-sockets-141 (seconds)             |         17.87 |      -1.84% |        -1.01% |            1.33% |
> |                                 | hackbench-thread-sockets-172 (seconds)             |         24.38 |       0.82% |         1.33% |            3.68% |
> |                                 | hackbench-thread-sockets-203 (seconds)             |         28.38 |      -1.29% |         0.72% |            1.58% |
> |                                 | hackbench-thread-sockets-234 (seconds)             |         32.75 |      -1.01% |         1.00% |            0.94% |
> |                                 | hackbench-thread-sockets-256 (seconds)             |         36.49 |      -0.99% |         1.22% |            1.00% |
> +---------------------------------+----------------------------------------------------+---------------+-------------+---------------+------------------+
>
> Thanks,
> Ryan
>
>
> > ---
> >  kernel/sched/fair.c | 10 ----------
> >  1 file changed, 10 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 04993c763a06..16ecc3475fe2 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -8822,16 +8822,6 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
> >       if ((wake_flags & WF_FORK) || pse->sched_delayed)
> >               return;
> >
> > -     /*
> > -      * If @p potentially is completing work required by current then
> > -      * consider preemption.
> > -      *
> > -      * Reschedule if waker is no longer eligible. */
> > -     if (in_task() && !entity_eligible(cfs_rq, se)) {
> > -             preempt_action = PREEMPT_WAKEUP_RESCHED;
> > -             goto preempt;
> > -     }
> > -
> >       /* Prefer picking wakee soon if appropriate. */
> >       if (sched_feat(NEXT_BUDDY) &&
> >           set_preempt_buddy(cfs_rq, wake_flags, pse, se)) {
>
[tip: sched/urgent] sched/fair: Revert force wakeup preemption
Posted by tip-bot2 for Vincent Guittot 2 weeks ago
The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     15257cc2f905dbf5813c0bfdd3c15885f28093c4
Gitweb:        https://git.kernel.org/tip/15257cc2f905dbf5813c0bfdd3c15885f28093c4
Author:        Vincent Guittot <vincent.guittot@linaro.org>
AuthorDate:    Fri, 23 Jan 2026 11:28:58 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 23 Jan 2026 11:53:20 +01:00

sched/fair: Revert force wakeup preemption

This aggressively bypasses run_to_parity and slice protection on the
assumption that this is what the waker wants, but there is no guarantee
that the wakee will be the next task to run. It is a better choice to
use yield_to_task or WF_SYNC in such cases.

The forced preemption also increases the number of reschedules and
preemptions because a task quickly becomes "ineligible" once it runs:
its vruntime is updated periodically, before it has exhausted its slice
or even a minimum quantum.

Example:
Two tasks A and B wake up simultaneously with lag = 0. Both are
eligible. Task A runs first and wakes up task C. The scheduler updates
task A's vruntime, which becomes greater than the queue's average
vruntime because all the other tasks have lag == 0 and haven't run yet.
Task A is now ineligible because it has received more runtime than the
other tasks, even though it has exhausted neither its slice nor a
minimum quantum. We force preemption and disable the protection, but
task B will run first, not task C.

Side note: DELAY_ZERO amplifies this effect by clearing positive lag
at wakeup.

Fixes: e837456fdca8 ("sched/fair: Reimplement NEXT_BUDDY to align with EEVDF goals")
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260123102858.52428-1-vincent.guittot@linaro.org
---
 kernel/sched/fair.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a148c61..3eaeced 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8828,16 +8828,6 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int 
 	if ((wake_flags & WF_FORK) || pse->sched_delayed)
 		return;
 
-	/*
-	 * If @p potentially is completing work required by current then
-	 * consider preemption.
-	 *
-	 * Reschedule if waker is no longer eligible. */
-	if (in_task() && !entity_eligible(cfs_rq, se)) {
-		preempt_action = PREEMPT_WAKEUP_RESCHED;
-		goto preempt;
-	}
-
 	/* Prefer picking wakee soon if appropriate. */
 	if (sched_feat(NEXT_BUDDY) &&
 	    set_preempt_buddy(cfs_rq, wake_flags, pse, se)) {