[PATCH v6 4/4] sched/fair: Don't double balance_interval for migrate_misfit

Qais Yousef posted 4 patches 1 year, 11 months ago
There is a newer version of this series
Posted by Qais Yousef 1 year, 11 months ago
A misfit migration is not necessarily an indication that the system is
busy and that the load balancer needs to back off. Pushing
balance_interval high, however, could delay other misfit migrations or
the handling of other types of imbalance.

Also don't pollute nr_balance_failed because of misfit failures. The
value is used to enable cache-hot migration and in the migrate_util and
migrate_load types, none of which should be skewed by misfit failures.

Signed-off-by: Qais Yousef <qyousef@layalina.io>
---
 kernel/sched/fair.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 20006fcf7df2..4c1235a5dd60 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11467,8 +11467,12 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 		 * We do not want newidle balance, which can be very
 		 * frequent, pollute the failure counter causing
 		 * excessive cache_hot migrations and active balances.
+		 *
+		 * Similarly for migrate_misfit, which is not related to
+		 * load/util migration, don't pollute nr_balance_failed.
 		 */
-		if (idle != CPU_NEWLY_IDLE)
+		if (idle != CPU_NEWLY_IDLE &&
+		    env.migration_type != migrate_misfit)
 			sd->nr_balance_failed++;
 
 		if (need_active_balance(&env)) {
@@ -11551,8 +11555,13 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 	 * repeatedly reach this code, which would lead to balance_interval
 	 * skyrocketing in a short amount of time. Skip the balance_interval
 	 * increase logic to avoid that.
+	 *
+	 * Similarly for misfit migration, which is not necessarily an
+	 * indication that the system is busy and that lb needs to back off
+	 * to let it settle down.
 	 */
-	if (env.idle == CPU_NEWLY_IDLE)
+	if (env.idle == CPU_NEWLY_IDLE ||
+	    env.migration_type == migrate_misfit)
 		goto out;
 
 	/* tune up the balancing interval */
-- 
2.34.1
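
For readers following the two hunks above, here is a minimal userspace
sketch of the gating they introduce. It is not kernel code: struct
sd_model and all the function names are hypothetical, the two code paths
in load_balance() are collapsed into one function for illustration, and
the interval backoff is reduced to a plain doubling below max_interval.
The cache-hot check assumes the counter is compared against
cache_nice_tries, which is how I understand can_migrate_task() to use it.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel enums named in the patch. */
enum cpu_idle_type { CPU_IDLE, CPU_NOT_IDLE, CPU_NEWLY_IDLE };
enum migration_type { migrate_load, migrate_util, migrate_task, migrate_misfit };

/* Hypothetical model of the sched_domain fields the patch touches. */
struct sd_model {
	unsigned int nr_balance_failed;
	unsigned int cache_nice_tries;
	unsigned long balance_interval;
	unsigned long max_interval;
};

/* First hunk: only non-newidle, non-misfit failures bump the counter. */
static bool counts_as_failure(enum cpu_idle_type idle, enum migration_type type)
{
	return idle != CPU_NEWLY_IDLE && type != migrate_misfit;
}

/* Second hunk: newidle and misfit balances skip the interval backoff. */
static bool skips_backoff(enum cpu_idle_type idle, enum migration_type type)
{
	return idle == CPU_NEWLY_IDLE || type == migrate_misfit;
}

/* Assumption: cache-hot tasks become movable once failures exceed cache_nice_tries. */
static bool cache_hot_migration_allowed(const struct sd_model *sd)
{
	return sd->nr_balance_failed > sd->cache_nice_tries;
}

/* Model one balance attempt that moved nothing (both paths merged). */
static void model_unsuccessful_balance(struct sd_model *sd,
				       enum cpu_idle_type idle,
				       enum migration_type type)
{
	if (counts_as_failure(idle, type))
		sd->nr_balance_failed++;

	if (skips_backoff(idle, type))
		return;

	/* Simplified "tune up the balancing interval" step. */
	if (sd->balance_interval < sd->max_interval)
		sd->balance_interval *= 2;
}
```

In this model, repeated migrate_misfit attempts leave nr_balance_failed
and balance_interval untouched, so they neither unlock cache-hot
migration nor delay the next balance, while migrate_load failures still
do both.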
Re: [PATCH v6 4/4] sched/fair: Don't double balance_interval for migrate_misfit
Posted by Vincent Guittot 1 year, 11 months ago
On Tue, 20 Feb 2024 at 23:56, Qais Yousef <qyousef@layalina.io> wrote:
>
> A misfit migration is not necessarily an indication that the system is
> busy and that the load balancer needs to back off. Pushing
> balance_interval high, however, could delay other misfit migrations or
> the handling of other types of imbalance.
>
> Also don't pollute nr_balance_failed because of misfit failures. The
> value is used to enable cache-hot migration and in the migrate_util and
> migrate_load types, none of which should be skewed by misfit failures.
>
> Signed-off-by: Qais Yousef <qyousef@layalina.io>

Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>

> ---
>  kernel/sched/fair.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 20006fcf7df2..4c1235a5dd60 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -11467,8 +11467,12 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>                  * We do not want newidle balance, which can be very
>                  * frequent, pollute the failure counter causing
>                  * excessive cache_hot migrations and active balances.
> +                *
> +                * Similarly for migrate_misfit, which is not related to
> +                * load/util migration, don't pollute nr_balance_failed.
>                  */
> -               if (idle != CPU_NEWLY_IDLE)
> +               if (idle != CPU_NEWLY_IDLE &&
> +                   env.migration_type != migrate_misfit)
>                         sd->nr_balance_failed++;
>
>                 if (need_active_balance(&env)) {
> @@ -11551,8 +11555,13 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>          * repeatedly reach this code, which would lead to balance_interval
>          * skyrocketing in a short amount of time. Skip the balance_interval
>          * increase logic to avoid that.
> +        *
> +        * Similarly for misfit migration, which is not necessarily an
> +        * indication that the system is busy and that lb needs to back off
> +        * to let it settle down.
>          */
> -       if (env.idle == CPU_NEWLY_IDLE)
> +       if (env.idle == CPU_NEWLY_IDLE ||
> +           env.migration_type == migrate_misfit)
>                 goto out;
>
>         /* tune up the balancing interval */
> --
> 2.34.1
>