balance_fair() skips newidle balancing if rq->nr_running - there are already
tasks on the rq, so no need to try to pull tasks. However, this doesn't seem
correct when bandwidth throttling is in use. When an entity gets throttled,
rq->nr_running is not decremented, so a CPU could end up in a situation
where rq->nr_running is not zero but there are no runnable tasks.
Theoretically, skipping newidle balance in this condition can lead to
increased latencies although I couldn't come up with a scenario where this
could be demonstrated reliably.
Update balance_fair() to use sched_fair_runnable() which tests
rq->cfs.nr_running which is updated by bandwidth throttling. Note that
pick_next_task_fair() already uses sched_fair_runnable() in its optimized
path for the same purpose.
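For reference, sched_fair_runnable() is a one-line helper; the sketch below
is abridged from kernel/sched/sched.h and may not be verbatim:

	/* True iff the fair class has runnable tasks on this rq. */
	static inline bool sched_fair_runnable(struct rq *rq)
	{
		return rq->cfs.nr_running > 0;
	}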
This also makes put_prev_task_balance() avoid incorrectly skipping lower
priority classes' (such as sched_ext) balance(). When a CPU has only lower
priority class tasks, rq->nr_running would still be positive and
balance_fair() would return 1 even when fair doesn't have any tasks to run.
This makes put_prev_task_balance() skip lower priority classes' balance()
incorrectly which may lead to stalls.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Peter Zijlstra <peterz@infradead.org>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8323,7 +8323,7 @@ static void set_cpus_allowed_fair(struct
 static int
 balance_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
-	if (rq->nr_running)
+	if (sched_fair_runnable(rq))
 		return 1;
 
 	return sched_balance_newidle(rq, rf) != 0;
Hello Tejun,
On 8/3/2024 6:51 AM, Tejun Heo wrote:
> balance_fair() skips newidle balancing if rq->nr_running - there are already
> tasks on the rq, so no need to try to pull tasks. However, this doesn't seem
> correct when bandwidth throttling is in use. When an entity gets throttled,
> rq->nr_running is not decremented, so a CPU could end up in a situation
> where rq->nr_running is not zero but there are no runnable tasks.
When does this happen exactly? throttle_cfs_rq() calls sub_nr_running()
negating the "h_nr_running" of the throttled entity from the
rq->nr_running. Even deadline entities on being throttled dequeue the
entity (or don't enqueue them at all in enqueue_dl_entity()) and would
have the correct "rq->nr_running". Could you please highlight the path
that might hit this scenario?
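(For context, a heavily abridged sketch of the throttle path in
kernel/sched/fair.c, not the verbatim source:

	static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
	{
		struct rq *rq = rq_of(cfs_rq);
		long task_delta = cfs_rq->h_nr_running;

		/* ... dequeue the throttled se hierarchy ... */

		/* at the root level, rq->nr_running is adjusted as well */
		sub_nr_running(rq, task_delta);
		/* ... */
	}

so rq->nr_running does shrink when a cfs_rq is throttled.)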
--
Thanks and Regards,
Prateek
> Theoretically, skipping newidle balance in this condition can lead to
> increased latencies although I couldn't come up with a scenario where this
> could be demonstrated reliably.
>
> Update balance_fair() to use sched_fair_runnable() which tests
> rq->cfs.nr_running which is updated by bandwidth throttling. Note that
> pick_next_task_fair() already uses sched_fair_runnable() in its optimized
> path for the same purpose.
>
> This also makes put_prev_task_balance() avoid incorrectly skipping lower
> priority classes' (such as sched_ext) balance(). When a CPU has only lower
> priority class tasks, rq->nr_running would still be positive and
> balance_fair() would return 1 even when fair doesn't have any tasks to run.
> This makes put_prev_task_balance() skip lower priority classes' balance()
> incorrectly which may lead to stalls.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Peter Zijlstra <peterz@infradead.org>
> ---
> kernel/sched/fair.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8323,7 +8323,7 @@ static void set_cpus_allowed_fair(struct
> static int
> balance_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
> {
> - if (rq->nr_running)
> + if (sched_fair_runnable(rq))
> return 1;
>
> return sched_balance_newidle(rq, rf) != 0;
>
Hello,

On Mon, Aug 05, 2024 at 08:51:05AM +0530, K Prateek Nayak wrote:
...
> When does this happen exactly? throttle_cfs_rq() calls sub_nr_running()
> negating the "h_nr_running" of the throttled entity from the
> rq->nr_running. Even deadline entities on being throttled dequeue the
> entity (or don't enqueue them at all in enqueue_dl_entity()) and would
> have the correct "rq->nr_running". Could you please highlight the path
> that might hit this scenario?

I just missed the sub_nr_running() call. No wonder I couldn't see a
behavior difference. I'll redo the patch description.

Thanks.

--
tejun
balance_fair() skips newidle balancing if rq->nr_running - there are already
tasks on the rq, so no need to try to pull tasks. This tests the total
number of queued tasks on the CPU instead of only the fair class, but is
still correct as the rq can currently only have fair class tasks while
balance_fair() is running.
However, with the addition of sched_ext below the fair class, this will not
hold anymore and make put_prev_task_balance() skip sched_ext's balance()
incorrectly as, when a CPU has only lower priority class tasks,
rq->nr_running would still be positive and balance_fair() would return 1
even when fair doesn't have any tasks to run.
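(put_prev_task_balance() walks the classes downward from prev's and stops at
the first balance() that returns nonzero; abridged sketch from
kernel/sched/core.c:

	const struct sched_class *class;

	/*
	 * A nonzero return means this class (or a higher one) has
	 * runnable tasks, so the pass terminates and lower classes such
	 * as sched_ext never get their balance() called.
	 */
	for_class_range(class, prev->sched_class, &idle_sched_class) {
		if (class->balance(rq, prev, rf))
			break;
	}

hence a spurious 1 from balance_fair() starves everything below fair.)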
Update balance_fair() to use sched_fair_runnable() which tests
rq->cfs.nr_running which is updated by bandwidth throttling. Note that
pick_next_task_fair() already uses sched_fair_runnable() in its optimized
path for the same purpose.
v2: K Prateek Nayak pointed out that the bw control issue described in v1
was incorrect. Patch description updated. As this makes the patch only
relevant for sched_ext, I'll carry this through the sched_ext tree
unless there are objections.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8323,7 +8323,7 @@ static void set_cpus_allowed_fair(struct
 static int
 balance_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
-	if (rq->nr_running)
+	if (sched_fair_runnable(rq))
 		return 1;
 
 	return sched_balance_newidle(rq, rf) != 0;
Hello Tejun,
Thank you for updating the commit message.
On 8/6/2024 4:09 AM, Tejun Heo wrote:
> balance_fair() skips newidle balancing if rq->nr_running - there are already
> tasks on the rq, so no need to try to pull tasks. This tests the total
> number of queued tasks on the CPU instead of only the fair class, but is
> still correct as the rq can currently only have fair class tasks while
> balance_fair() is running.
>
> However, with the addition of sched_ext below the fair class, this will not
> hold anymore and make put_prev_task_balance() skip sched_ext's balance()
> incorrectly as, when a CPU has only lower priority class tasks,
> rq->nr_running would still be positive and balance_fair() would return 1
> even when fair doesn't have any tasks to run.
>
> Update balance_fair() to use sched_fair_runnable() which tests
> rq->cfs.nr_running which is updated by bandwidth throttling. Note that
> pick_next_task_fair() already uses sched_fair_runnable() in its optimized
> path for the same purpose.
>
> v2: K Prateek Nayak pointed out that the bw control issue described in v1
> was incorrect. Patch description updated. As this makes the patch only
> relevant for sched_ext, I'll carry this through the sched_ext tree
> unless there are objections.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Peter Zijlstra <peterz@infradead.org>
> Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Looks good to me!
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
--
Thanks and Regards,
Prateek
> ---
> kernel/sched/fair.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8323,7 +8323,7 @@ static void set_cpus_allowed_fair(struct
> static int
> balance_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
> {
> - if (rq->nr_running)
> + if (sched_fair_runnable(rq))
> return 1;
>
> return sched_balance_newidle(rq, rf) != 0;
On 2024/8/6 06:39, Tejun Heo wrote:
> balance_fair() skips newidle balancing if rq->nr_running - there are already
> tasks on the rq, so no need to try to pull tasks. This tests the total
> number of queued tasks on the CPU instead of only the fair class, but is
> still correct as the rq can currently only have fair class tasks while
> balance_fair() is running.
>
> However, with the addition of sched_ext below the fair class, this will not
> hold anymore and make put_prev_task_balance() skip sched_ext's balance()
> incorrectly as, when a CPU has only lower priority class tasks,
> rq->nr_running would still be positive and balance_fair() would return 1
> even when fair doesn't have any tasks to run.
>
> Update balance_fair() to use sched_fair_runnable() which tests
> rq->cfs.nr_running which is updated by bandwidth throttling. Note that
> pick_next_task_fair() already uses sched_fair_runnable() in its optimized
> path for the same purpose.
>
> v2: K Prateek Nayak pointed out that the bw control issue described in v1
> was incorrect. Patch description updated. As this makes the patch only
> relevant for sched_ext, I'll carry this through the sched_ext tree
> unless there are objections.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Peter Zijlstra <peterz@infradead.org>
> Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Looks good to me.
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Thanks.
> ---
> kernel/sched/fair.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8323,7 +8323,7 @@ static void set_cpus_allowed_fair(struct
> static int
> balance_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
> {
> - if (rq->nr_running)
> + if (sched_fair_runnable(rq))
> return 1;
>
> return sched_balance_newidle(rq, rf) != 0;
The following commit has been merged into the sched/core branch of tip:
Commit-ID: 924e2904da9b5edec61611918b98ab1f7fccc461
Gitweb: https://git.kernel.org/tip/924e2904da9b5edec61611918b98ab1f7fccc461
Author: Tejun Heo <tj@kernel.org>
AuthorDate: Mon, 05 Aug 2024 12:39:10 -10:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 07 Aug 2024 12:44:16 +02:00
sched/fair: Make balance_fair() test sched_fair_runnable() instead of rq->nr_running
balance_fair() skips newidle balancing if rq->nr_running - there are already
tasks on the rq, so no need to try to pull tasks. This tests the total
number of queued tasks on the CPU instead of only the fair class, but is
still correct as the rq can currently only have fair class tasks while
balance_fair() is running.
However, with the addition of sched_ext below the fair class, this will not
hold anymore and make put_prev_task_balance() skip sched_ext's balance()
incorrectly as, when a CPU has only lower priority class tasks,
rq->nr_running would still be positive and balance_fair() would return 1
even when fair doesn't have any tasks to run.
Update balance_fair() to use sched_fair_runnable() which tests
rq->cfs.nr_running which is updated by bandwidth throttling. Note that
pick_next_task_fair() already uses sched_fair_runnable() in its optimized
path for the same purpose.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/ZrFUjlCf7x3TNXB8@slm.duckdns.org
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 795ceef..6d39a82 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8355,7 +8355,7 @@ static void set_cpus_allowed_fair(struct task_struct *p, struct affinity_context
 static int
 balance_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
-	if (rq->nr_running)
+	if (sched_fair_runnable(rq))
 		return 1;
 
 	return sched_balance_newidle(rq, rf) != 0;