[PATCH] sched/fair: Add SMT4 group_smt_balance handling
Posted by Shrikanth Hegde 2 years, 6 months ago
From: Tim Chen <tim.c.chen@linux.intel.com>

For SMT4, any group with more than 2 tasks will be marked as
group_smt_balance. Retain the behaviour of group_has_spare by marking
the busiest group as the group which has the least number of idle_cpus.

Also, handle rounding effect of adding (ncores_local + ncores_busy)
when the local is fully idle and busy group has more than 2 tasks.
Local group should try to pull at least 1 task in this case.

Originally-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
---
 kernel/sched/fair.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 932e7b78894a..9502013abe33 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9532,7 +9532,7 @@ static inline long sibling_imbalance(struct lb_env *env,
 	imbalance /= ncores_local + ncores_busiest;

 	/* Take advantage of resource in an empty sched group */
-	if (imbalance == 0 && local->sum_nr_running == 0 &&
+	if (imbalance <= 1 && local->sum_nr_running == 0 &&
 	    busiest->sum_nr_running > 1)
 		imbalance = 2;

@@ -9720,6 +9720,17 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 		break;

 	case group_smt_balance:
+		/* no idle cpus on both groups handled by group_fully_busy below */
+		if (sgs->idle_cpus != 0 || busiest->idle_cpus != 0) {
+			if (sgs->idle_cpus > busiest->idle_cpus)
+				return false;
+			if (sgs->idle_cpus < busiest->idle_cpus)
+				return true;
+			if (sgs->sum_nr_running <= busiest->sum_nr_running)
+				return false;
+		}
+		break;
+
 	case group_fully_busy:
 		/*
 		 * Select the fully busy group with highest avg_load. In
--
2.31.1
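To see the rounding effect the changelog describes, here is a minimal
standalone sketch of sibling_imbalance()'s normalization (illustrative
values and plain arithmetic; the kernel clamps the subtraction with
lsub_positive(), which this sketch omits):

#include <stdio.h>

/*
 * Normalize the task imbalance between two groups with different core
 * counts, rounding half up by adding (ncores_local + ncores_busiest)
 * before the division -- the same shape as the hunk above.
 */
static long normalized_imbalance(long busiest_running, long local_running,
				 long ncores_busiest, long ncores_local)
{
	/* Balance so that nr_running/ncores matches on both groups */
	long imbalance = ncores_local * busiest_running -
			 ncores_busiest * local_running;

	imbalance = 2 * imbalance + ncores_local + ncores_busiest;
	return imbalance / (ncores_local + ncores_busiest);
}

int main(void)
{
	/* Zero raw imbalance still normalizes to 1, so "== 0" never fires */
	printf("%ld\n", normalized_imbalance(2, 1, 2, 1));	/* -> 1 */

	/*
	 * Local group fully idle, busiest has 3 tasks over 8 cores: the
	 * result is again 1, and the "<= 1" check bumps it to 2 so the
	 * idle local group pulls at least one task.
	 */
	printf("%ld\n", normalized_imbalance(3, 0, 8, 1));	/* -> 1 */
	return 0;
}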
Re: [PATCH] sched/fair: Add SMT4 group_smt_balance handling
Posted by Tim Chen 2 years, 6 months ago
On Mon, 2023-07-17 at 20:28 +0530, Shrikanth Hegde wrote:
> From: Tim Chen <tim.c.chen@linux.intel.com>
> 
> For SMT4, any group with more than 2 tasks will be marked as
> group_smt_balance. Retain the behaviour of group_has_spare by marking
> the busiest group as the group which has the least number of idle_cpus.
> 
> Also, handle rounding effect of adding (ncores_local + ncores_busy)
> when the local is fully idle and busy group has more than 2 tasks.
> Local group should try to pull at least 1 task in this case.
> 
> Originally-by: Tim Chen <tim.c.chen@linux.intel.com>
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> Signed-off-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
> ---
>  kernel/sched/fair.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 932e7b78894a..9502013abe33 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9532,7 +9532,7 @@ static inline long sibling_imbalance(struct lb_env *env,
>  	imbalance /= ncores_local + ncores_busiest;
> 
>  	/* Take advantage of resource in an empty sched group */
> -	if (imbalance == 0 && local->sum_nr_running == 0 &&
> +	if (imbalance <= 1 && local->sum_nr_running == 0 &&
>  	    busiest->sum_nr_running > 1)
>  		imbalance = 2;
> 
> @@ -9720,6 +9720,17 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>  		break;
> 
>  	case group_smt_balance:
> +		/* no idle cpus on both groups handled by group_fully_busy below */
> +		if (sgs->idle_cpus != 0 || busiest->idle_cpus != 0) {
> +			if (sgs->idle_cpus > busiest->idle_cpus)
> +				return false;
> +			if (sgs->idle_cpus < busiest->idle_cpus)
> +				return true;
> +			if (sgs->sum_nr_running <= busiest->sum_nr_running)
> +				return false;
> +		}
> +		break;
> +
>  	case group_fully_busy:
>  		/*
>  		 * Select the fully busy group with highest avg_load. In

Thanks for the fixup patch.

Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Re: [PATCH] sched/fair: Add SMT4 group_smt_balance handling
Posted by Tim Chen 2 years, 6 months ago
On Wed, 2023-07-26 at 20:11 -0700, Tim Chen wrote:
> On Mon, 2023-07-17 at 20:28 +0530, Shrikanth Hegde wrote:
> > From: Tim Chen <tim.c.chen@linux.intel.com>
> > 
> > For SMT4, any group with more than 2 tasks will be marked as
> > group_smt_balance. Retain the behaviour of group_has_spare by marking
> > the busiest group as the group which has the least number of idle_cpus.
> > 
> > Also, handle rounding effect of adding (ncores_local + ncores_busy)
> > when the local is fully idle and busy group has more than 2 tasks.
> > Local group should try to pull at least 1 task in this case.
> > 
> > Originally-by: Tim Chen <tim.c.chen@linux.intel.com>
> > Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> > Signed-off-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
> > ---
> >  kernel/sched/fair.c | 13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 932e7b78894a..9502013abe33 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -9532,7 +9532,7 @@ static inline long sibling_imbalance(struct lb_env *env,
> >  	imbalance /= ncores_local + ncores_busiest;
> > 
> >  	/* Take advantage of resource in an empty sched group */
> > -	if (imbalance == 0 && local->sum_nr_running == 0 &&
> > +	if (imbalance <= 1 && local->sum_nr_running == 0 &&
> >  	    busiest->sum_nr_running > 1)
> >  		imbalance = 2;
> > 
> > @@ -9720,6 +9720,17 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> >  		break;
> > 
> >  	case group_smt_balance:
> > +		/* no idle cpus on both groups handled by group_fully_busy below */
> > +		if (sgs->idle_cpus != 0 || busiest->idle_cpus != 0) {
> > +			if (sgs->idle_cpus > busiest->idle_cpus)
> > +				return false;
> > +			if (sgs->idle_cpus < busiest->idle_cpus)
> > +				return true;
> > +			if (sgs->sum_nr_running <= busiest->sum_nr_running)
> > +				return false;
> > +		}
> > +		break;

Shrikanth and Peter,

Sorry, I acked Shrikanth's fixup patch too quickly, without seeing that Shrikanth added
a "break" in the patch above.  My original code did not have that break statement, as
I intended the code to fall through to the "group_fully_busy" code path when
there are no idle cpus in either group.  To keep the compiler happy while putting
in the correct logic, I have refreshed the patch below.

Thanks.

Tim

From: Tim Chen <tim.c.chen@linux.intel.com>
Date: Fri, 14 Jul 2023 16:09:30 -0700
Subject: [PATCH] sched/fair: Add SMT4 group_smt_balance handling

For SMT4, any group with more than 2 tasks will be marked as
group_smt_balance. Retain the behaviour of group_has_spare by marking
the busiest group as the group which has the least number of idle_cpus.

Also, handle rounding effect of adding (ncores_local + ncores_busy)
when the local is fully idle and busy group has more than 2 tasks.
Local group should try to pull at least 1 task in this case.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 kernel/sched/fair.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a87988327f88..566686c5f2bd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9563,7 +9563,7 @@ static inline long sibling_imbalance(struct lb_env *env,
 	imbalance /= ncores_local + ncores_busiest;
 
 	/* Take advantage of resource in an empty sched group */
-	if (imbalance == 0 && local->sum_nr_running == 0 &&
+	if (imbalance <= 1 && local->sum_nr_running == 0 &&
 	    busiest->sum_nr_running > 1)
 		imbalance = 2;
 
@@ -9751,6 +9751,20 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 		break;
 
 	case group_smt_balance:
+		/* no idle cpus on both groups handled by group_fully_busy below */
+		if (sgs->idle_cpus != 0 || busiest->idle_cpus != 0) {
+			if (sgs->idle_cpus > busiest->idle_cpus)
+				return false;
+			if (sgs->idle_cpus < busiest->idle_cpus)
+				return true;
+			if (sgs->sum_nr_running <= busiest->sum_nr_running)
+				return false;
+			else
+				return true;
+		}
+		goto fully_busy;
+		break;
+
 	case group_fully_busy:
 		/*
 		 * Select the fully busy group with highest avg_load. In
@@ -9763,7 +9777,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 		 * select the 1st one, except if @sg is composed of SMT
 		 * siblings.
 		 */
-
+fully_busy:
 		if (sgs->avg_load < busiest->avg_load)
 			return false;
 
-- 
2.32.0
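As a reference for the group_smt_balance case above, this is a small
standalone sketch of the busiest-group tie-break it introduces,
mirroring group_has_spare; "struct stats" is a simplified stand-in for
the kernel's struct sg_lb_stats:

#include <stdbool.h>
#include <stdio.h>

struct stats {
	int idle_cpus;
	int sum_nr_running;
};

/* Return true if @sgs should replace @busiest as the busiest group */
static bool sg_is_busier(const struct stats *sgs, const struct stats *busiest)
{
	/* Fewer idle CPUs means busier ... */
	if (sgs->idle_cpus != busiest->idle_cpus)
		return sgs->idle_cpus < busiest->idle_cpus;

	/* ... and on a tie, more running tasks means busier */
	return sgs->sum_nr_running > busiest->sum_nr_running;
}

int main(void)
{
	struct stats a = { .idle_cpus = 1, .sum_nr_running = 3 };
	struct stats b = { .idle_cpus = 2, .sum_nr_running = 3 };

	printf("%d\n", sg_is_busier(&a, &b));	/* 1: a has fewer idle CPUs */
	return 0;
}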
Re: [PATCH] sched/fair: Add SMT4 group_smt_balance handling
Posted by Peter Zijlstra 2 years, 5 months ago
On Thu, Jul 27, 2023 at 06:32:44AM -0700, Tim Chen wrote:

>  kernel/sched/fair.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index a87988327f88..566686c5f2bd 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9563,7 +9563,7 @@ static inline long sibling_imbalance(struct lb_env *env,
>  	imbalance /= ncores_local + ncores_busiest;
>  
>  	/* Take advantage of resource in an empty sched group */
> -	if (imbalance == 0 && local->sum_nr_running == 0 &&
> +	if (imbalance <= 1 && local->sum_nr_running == 0 &&
>  	    busiest->sum_nr_running > 1)
>  		imbalance = 2;
>  
> @@ -9751,6 +9751,20 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>  		break;
>  
>  	case group_smt_balance:
> +		/* no idle cpus on both groups handled by group_fully_busy below */
> +		if (sgs->idle_cpus != 0 || busiest->idle_cpus != 0) {
> +			if (sgs->idle_cpus > busiest->idle_cpus)
> +				return false;
> +			if (sgs->idle_cpus < busiest->idle_cpus)
> +				return true;
> +			if (sgs->sum_nr_running <= busiest->sum_nr_running)
> +				return false;
> +			else
> +				return true;
> +		}
> +		goto fully_busy;
> +		break;

This is really daft; why can't this simply be: fallthrough; ? At the
very least that break must go.

> +
>  	case group_fully_busy:
>  		/*
>  		 * Select the fully busy group with highest avg_load. In
> @@ -9763,7 +9777,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>  		 * select the 1st one, except if @sg is composed of SMT
>  		 * siblings.
>  		 */
> -
> +fully_busy:
>  		if (sgs->avg_load < busiest->avg_load)
>  			return false;
>  
> -- 
> 2.32.0
> 
>
Re: [PATCH] sched/fair: Add SMT4 group_smt_balance handling
Posted by Tim Chen 2 years, 5 months ago
On Tue, 2023-09-05 at 12:41 +0200, Peter Zijlstra wrote:
> On Thu, Jul 27, 2023 at 06:32:44AM -0700, Tim Chen wrote:
> 
> >  kernel/sched/fair.c | 18 ++++++++++++++++--
> >  1 file changed, 16 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index a87988327f88..566686c5f2bd 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -9563,7 +9563,7 @@ static inline long sibling_imbalance(struct lb_env *env,
> >  	imbalance /= ncores_local + ncores_busiest;
> >  
> >  	/* Take advantage of resource in an empty sched group */
> > -	if (imbalance == 0 && local->sum_nr_running == 0 &&
> > +	if (imbalance <= 1 && local->sum_nr_running == 0 &&
> >  	    busiest->sum_nr_running > 1)
> >  		imbalance = 2;
> >  
> > @@ -9751,6 +9751,20 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> >  		break;
> >  
> >  	case group_smt_balance:
> > +		/* no idle cpus on both groups handled by group_fully_busy below */
> > +		if (sgs->idle_cpus != 0 || busiest->idle_cpus != 0) {
> > +			if (sgs->idle_cpus > busiest->idle_cpus)
> > +				return false;
> > +			if (sgs->idle_cpus < busiest->idle_cpus)
> > +				return true;
> > +			if (sgs->sum_nr_running <= busiest->sum_nr_running)
> > +				return false;
> > +			else
> > +				return true;
> > +		}
> > +		goto fully_busy;
> > +		break;
> 
> This is really daft; why can't this simply be: fallthrough; ? At the
> very least that break must go.
> 
> 

Yes, the break should go.  I was adding the goto to prevent the compiler
from complaining about fall-through code.  The break is no longer needed.

Tim

From 81971a0b1eb64059756f00d8497b1865af2c0792 Mon Sep 17 00:00:00 2001
From: Tim Chen <tim.c.chen@linux.intel.com>
Date: Fri, 14 Jul 2023 16:09:30 -0700
Subject: [PATCH] sched/fair: Add SMT4 group_smt_balance handling

For SMT4, any group with more than 2 tasks will be marked as
group_smt_balance. Retain the behaviour of group_has_spare by marking
the busiest group as the group which has the least number of idle_cpus.

Also, handle rounding effect of adding (ncores_local + ncores_busy)
when the local is fully idle and busy group has more than 2 tasks.
Local group should try to pull at least 1 task in this case.

Fixes: fee1759e4f04 ("sched/fair: Determine active load balance for SMT sched groups")
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 kernel/sched/fair.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0b7445cd5af9..6e7ee2efc1ba 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9575,7 +9575,7 @@ static inline long sibling_imbalance(struct lb_env *env,
 	imbalance /= ncores_local + ncores_busiest;
 
 	/* Take advantage of resource in an empty sched group */
-	if (imbalance == 0 && local->sum_nr_running == 0 &&
+	if (imbalance <= 1 && local->sum_nr_running == 0 &&
 	    busiest->sum_nr_running > 1)
 		imbalance = 2;
 
@@ -9763,6 +9763,19 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 		break;
 
 	case group_smt_balance:
+		/* no idle cpus on both groups handled by group_fully_busy below */
+		if (sgs->idle_cpus != 0 || busiest->idle_cpus != 0) {
+			if (sgs->idle_cpus > busiest->idle_cpus)
+				return false;
+			if (sgs->idle_cpus < busiest->idle_cpus)
+				return true;
+			if (sgs->sum_nr_running <= busiest->sum_nr_running)
+				return false;
+			else
+				return true;
+		}
+		goto fully_busy;
+
 	case group_fully_busy:
 		/*
 		 * Select the fully busy group with highest avg_load. In
@@ -9775,7 +9788,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 		 * select the 1st one, except if @sg is composed of SMT
 		 * siblings.
 		 */
-
+fully_busy:
 		if (sgs->avg_load < busiest->avg_load)
 			return false;
 
-- 
2.32.0
Re: [PATCH] sched/fair: Add SMT4 group_smt_balance handling
Posted by Peter Zijlstra 2 years, 5 months ago
On Tue, Sep 05, 2023 at 10:54:09AM -0700, Tim Chen wrote:

> > > +		goto fully_busy;
> > > +		break;
> > 
> > This is really daft; why can't this simply be: fallthrough; ? At the
> > very least that break must go.
> > 
> > 
> 
> Yes, the break should go.  I was adding the goto to prevent compiler
> from complaining about fall through code. 

But that's what we have the fallthrough keyword for, no?
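
For context, a self-contained sketch of the annotation Peter means: the
kernel gets "fallthrough" from include/linux/compiler_attributes.h; an
equivalent macro is defined locally here so the example builds on its
own (with -Wimplicit-fallthrough), and the switch shape only loosely
mirrors update_sd_pick_busiest():

#include <stdbool.h>
#include <stdio.h>

/* Statement attribute telling the compiler the fall-through is intended */
#define fallthrough __attribute__((__fallthrough__))

enum group_type { group_smt_balance, group_fully_busy };

static bool pick_busiest(enum group_type type, int idle_cpus)
{
	switch (type) {
	case group_smt_balance:
		if (idle_cpus != 0)
			return true;	/* stand-in for the has-spare handling */
		fallthrough;		/* no idle CPUs: share fully-busy path */
	case group_fully_busy:
		return false;		/* stand-in for the avg_load compare */
	}
	return false;
}

int main(void)
{
	printf("%d\n", pick_busiest(group_smt_balance, 0));	/* 0: fell through */
	printf("%d\n", pick_busiest(group_smt_balance, 4));	/* 1: spare path */
	return 0;
}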
Re: [PATCH] sched/fair: Add SMT4 group_smt_balance handling
Posted by Tim Chen 2 years, 5 months ago
On Wed, 2023-09-06 at 10:23 +0200, Peter Zijlstra wrote:
> On Tue, Sep 05, 2023 at 10:54:09AM -0700, Tim Chen wrote:
> 
> > > > +		goto fully_busy;
> > > > +		break;
> > > 
> > > This is really daft; why can't this simply be: fallthrough; ? At the
> > > very least that break must go.
> > > 
> > > 
> > 
> > Yes, the break should go.  I was adding the goto to prevent compiler
> > from complaining about fall through code. 
> 
> But that's what we have the fallthrough keyword for, no?

Okay.  Will update the patch to use fallthrough once Shrikanth has had
a chance to test the update that uses the has_spare path for SMT4.

Tim
Re: [PATCH] sched/fair: Add SMT4 group_smt_balance handling
Posted by Shrikanth Hegde 2 years, 6 months ago

On 7/27/23 7:02 PM, Tim Chen wrote:
> On Wed, 2023-07-26 at 20:11 -0700, Tim Chen wrote:
>> On Mon, 2023-07-17 at 20:28 +0530, Shrikanth Hegde wrote:
>>> From: Tim Chen <tim.c.chen@linux.intel.com>
>>>
>>> For SMT4, any group with more than 2 tasks will be marked as
>>> group_smt_balance. Retain the behaviour of group_has_spare by marking
>>> the busiest group as the group which has the least number of idle_cpus.
>>>
>>> Also, handle rounding effect of adding (ncores_local + ncores_busy)
>>> when the local is fully idle and busy group has more than 2 tasks.
>>> Local group should try to pull at least 1 task in this case.
>>>
>>> Originally-by: Tim Chen <tim.c.chen@linux.intel.com>
>>> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
>>> Signed-off-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
>>> ---
>>>  kernel/sched/fair.c | 13 ++++++++++++-
>>>  1 file changed, 12 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 932e7b78894a..9502013abe33 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -9532,7 +9532,7 @@ static inline long sibling_imbalance(struct lb_env *env,
>>>  	imbalance /= ncores_local + ncores_busiest;
>>>
>>>  	/* Take advantage of resource in an empty sched group */
>>> -	if (imbalance == 0 && local->sum_nr_running == 0 &&
>>> +	if (imbalance <= 1 && local->sum_nr_running == 0 &&
>>>  	    busiest->sum_nr_running > 1)
>>>  		imbalance = 2;
>>>
>>> @@ -9720,6 +9720,17 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>>>  		break;
>>>
>>>  	case group_smt_balance:
>>> +		/* no idle cpus on both groups handled by group_fully_busy below */
>>> +		if (sgs->idle_cpus != 0 || busiest->idle_cpus != 0) {
>>> +			if (sgs->idle_cpus > busiest->idle_cpus)
>>> +				return false;
>>> +			if (sgs->idle_cpus < busiest->idle_cpus)
>>> +				return true;
>>> +			if (sgs->sum_nr_running <= busiest->sum_nr_running)
>>> +				return false;
>>> +		}
>>> +		break;
> 
> Shrikanth and Peter,
> 
> Sorry, I acked Shrikanth's fixup patch too quickly without seeing that Shrikanth added
> a "break" in the patch above.  My original code did not have that break statement as
> I did intend the code to fall through to the "group_fully_busy" code path when
> there are no idle cpus in both groups.  To make the compiler happy and putting
> in the correct logic, I refresh the patch as below.
> 
> Thanks.
> 
> Tim
> 
> From: Tim Chen <tim.c.chen@linux.intel.com>
> Date: Fri, 14 Jul 2023 16:09:30 -0700
> Subject: [PATCH] sched/fair: Add SMT4 group_smt_balance handling
> 
> For SMT4, any group with more than 2 tasks will be marked as
> group_smt_balance. Retain the behaviour of group_has_spare by marking
> the busiest group as the group which has the least number of idle_cpus.
> 
> Also, handle rounding effect of adding (ncores_local + ncores_busy)
> when the local is fully idle and busy group has more than 2 tasks.
> Local group should try to pull at least 1 task in this case.
> 
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> ---
>  kernel/sched/fair.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index a87988327f88..566686c5f2bd 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9563,7 +9563,7 @@ static inline long sibling_imbalance(struct lb_env *env,
>  	imbalance /= ncores_local + ncores_busiest;
>  
>  	/* Take advantage of resource in an empty sched group */
> -	if (imbalance == 0 && local->sum_nr_running == 0 &&
> +	if (imbalance <= 1 && local->sum_nr_running == 0 &&
>  	    busiest->sum_nr_running > 1)
>  		imbalance = 2;
>  
> @@ -9751,6 +9751,20 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>  		break;
>  
>  	case group_smt_balance:
> +		/* no idle cpus on both groups handled by group_fully_busy below */
> +		if (sgs->idle_cpus != 0 || busiest->idle_cpus != 0) {
> +			if (sgs->idle_cpus > busiest->idle_cpus)
> +				return false;
> +			if (sgs->idle_cpus < busiest->idle_cpus)
> +				return true;
> +			if (sgs->sum_nr_running <= busiest->sum_nr_running)
> +				return false;
> +			else
> +				return true;
> +		}
> +		goto fully_busy;
> +		break;
> +
>  	case group_fully_busy:
>  		/*
>  		 * Select the fully busy group with highest avg_load. In
> @@ -9763,7 +9777,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>  		 * select the 1st one, except if @sg is composed of SMT
>  		 * siblings.
>  		 */
> -
> +fully_busy:
>  		if (sgs->avg_load < busiest->avg_load)
>  			return false;
>  

Hi Tim, Peter. 

The group_smt_balance (cluster scheduling) patches are in tip/sched/core, but I don't
see the above patch there yet. As is, this can cause a functional difference
on SMT4 systems (such as Power10).

Can we please have the above patch in tip/sched/core as well?

Acked-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
Re: [PATCH] sched/fair: Add SMT4 group_smt_balance handling
Posted by Tim Chen 2 years, 5 months ago
On Mon, 2023-08-07 at 15:06 +0530, Shrikanth Hegde wrote:
> > 
> > From: Tim Chen <tim.c.chen@linux.intel.com>
> > Date: Fri, 14 Jul 2023 16:09:30 -0700
> > Subject: [PATCH] sched/fair: Add SMT4 group_smt_balance handling
> > 
> > For SMT4, any group with more than 2 tasks will be marked as
> > group_smt_balance. Retain the behaviour of group_has_spare by marking
> > the busiest group as the group which has the least number of idle_cpus.
> > 
> > Also, handle rounding effect of adding (ncores_local + ncores_busy)
> > when the local is fully idle and busy group has more than 2 tasks.
> > Local group should try to pull at least 1 task in this case.
> > 
> > Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> > ---
> >  kernel/sched/fair.c | 18 ++++++++++++++++--
> >  1 file changed, 16 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index a87988327f88..566686c5f2bd 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -9563,7 +9563,7 @@ static inline long sibling_imbalance(struct lb_env *env,
> >  	imbalance /= ncores_local + ncores_busiest;
> >  
> >  	/* Take advantage of resource in an empty sched group */
> > -	if (imbalance == 0 && local->sum_nr_running == 0 &&
> > +	if (imbalance <= 1 && local->sum_nr_running == 0 &&
> >  	    busiest->sum_nr_running > 1)
> >  		imbalance = 2;
> >  
> > @@ -9751,6 +9751,20 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> >  		break;
> >  
> >  	case group_smt_balance:
> > +		/* no idle cpus on both groups handled by group_fully_busy below */
> > +		if (sgs->idle_cpus != 0 || busiest->idle_cpus != 0) {
> > +			if (sgs->idle_cpus > busiest->idle_cpus)
> > +				return false;
> > +			if (sgs->idle_cpus < busiest->idle_cpus)
> > +				return true;
> > +			if (sgs->sum_nr_running <= busiest->sum_nr_running)
> > +				return false;
> > +			else
> > +				return true;
> > +		}
> > +		goto fully_busy;
> > +		break;
> > +
> >  	case group_fully_busy:
> >  		/*
> >  		 * Select the fully busy group with highest avg_load. In
> > @@ -9763,7 +9777,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> >  		 * select the 1st one, except if @sg is composed of SMT
> >  		 * siblings.
> >  		 */
> > -
> > +fully_busy:
> >  		if (sgs->avg_load < busiest->avg_load)
> >  			return false;
> >  
> 
> Hi Tim, Peter. 
> 
> group_smt_balance(cluster scheduling), patches are in tip/sched/core. I dont 
> see this above patch there yet. Currently as is, this can cause function difference 
> in SMT4 systems( such as Power10). 
> 
> Can we please have the above patch as well in tip/sched/core?
> 
> Acked-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>

Hi Peter,

Just back from my long vacation.  Wonder if you have any comments on the above patch
for fixing the SMT4 case?

Tim
Re: [PATCH] sched/fair: Add SMT4 group_smt_balance handling
Posted by Shrikanth Hegde 2 years, 5 months ago

On 8/22/23 12:49 AM, Tim Chen wrote:
> On Mon, 2023-08-07 at 15:06 +0530, Shrikanth Hegde wrote:
>>>
>>> From: Tim Chen <tim.c.chen@linux.intel.com>
>>> Date: Fri, 14 Jul 2023 16:09:30 -0700
>>> Subject: [PATCH] sched/fair: Add SMT4 group_smt_balance handling
>>>
>>> For SMT4, any group with more than 2 tasks will be marked as
>>> group_smt_balance. Retain the behaviour of group_has_spare by marking
>>> the busiest group as the group which has the least number of idle_cpus.
>>>
>>> Also, handle rounding effect of adding (ncores_local + ncores_busy)
>>> when the local is fully idle and busy group has more than 2 tasks.
>>> Local group should try to pull at least 1 task in this case.
>>>
>>> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
>>> ---
>>>  kernel/sched/fair.c | 18 ++++++++++++++++--
>>>  1 file changed, 16 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index a87988327f88..566686c5f2bd 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -9563,7 +9563,7 @@ static inline long sibling_imbalance(struct lb_env *env,
>>>  	imbalance /= ncores_local + ncores_busiest;
>>>  
>>>  	/* Take advantage of resource in an empty sched group */
>>> -	if (imbalance == 0 && local->sum_nr_running == 0 &&
>>> +	if (imbalance <= 1 && local->sum_nr_running == 0 &&
>>>  	    busiest->sum_nr_running > 1)
>>>  		imbalance = 2;
>>>  
>>> @@ -9751,6 +9751,20 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>>>  		break;
>>>  
>>>  	case group_smt_balance:
>>> +		/* no idle cpus on both groups handled by group_fully_busy below */
>>> +		if (sgs->idle_cpus != 0 || busiest->idle_cpus != 0) {
>>> +			if (sgs->idle_cpus > busiest->idle_cpus)
>>> +				return false;
>>> +			if (sgs->idle_cpus < busiest->idle_cpus)
>>> +				return true;
>>> +			if (sgs->sum_nr_running <= busiest->sum_nr_running)
>>> +				return false;
>>> +			else
>>> +				return true;
>>> +		}
>>> +		goto fully_busy;
>>> +		break;
>>> +
>>>  	case group_fully_busy:
>>>  		/*
>>>  		 * Select the fully busy group with highest avg_load. In
>>> @@ -9763,7 +9777,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>>>  		 * select the 1st one, except if @sg is composed of SMT
>>>  		 * siblings.
>>>  		 */
>>> -
>>> +fully_busy:
>>>  		if (sgs->avg_load < busiest->avg_load)
>>>  			return false;
>>>  
>>
>> Hi Tim, Peter. 
>>
>> group_smt_balance(cluster scheduling), patches are in tip/sched/core. I dont 
>> see this above patch there yet. Currently as is, this can cause function difference 
>> in SMT4 systems( such as Power10). 
>>
>> Can we please have the above patch as well in tip/sched/core?
>>
>> Acked-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
> 
> Hi Peter,
> 
> Just back from my long vacation.  Wonder if you have any comments on the above patch
> for fixing the SMT4 case?
> 
> Tim

Hi Tim, Peter. 

Are there any concerns with the above patch for fixing the SMT4 case?
Currently group_smt_balance is set even for just 2 tasks on an SMT4 core;
ideally it should behave the same as group_has_spare.

The above patch copies that behaviour into group_smt_balance.
>
Re: [PATCH] sched/fair: Add SMT4 group_smt_balance handling
Posted by Tim Chen 2 years, 5 months ago
On Tue, 2023-09-05 at 13:33 +0530, Shrikanth Hegde wrote:
> 
> On 8/22/23 12:49 AM, Tim Chen wrote:
> > On Mon, 2023-08-07 at 15:06 +0530, Shrikanth Hegde wrote:
> > > > 
> > > > From: Tim Chen <tim.c.chen@linux.intel.com>
> > > > Date: Fri, 14 Jul 2023 16:09:30 -0700
> > > > Subject: [PATCH] sched/fair: Add SMT4 group_smt_balance handling
> > > > 
> > > > For SMT4, any group with more than 2 tasks will be marked as
> > > > group_smt_balance. Retain the behaviour of group_has_spare by marking
> > > > the busiest group as the group which has the least number of idle_cpus.
> > > > 
> > > > Also, handle rounding effect of adding (ncores_local + ncores_busy)
> > > > when the local is fully idle and busy group has more than 2 tasks.
> > > > Local group should try to pull at least 1 task in this case.
> > > > 
> > > > Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> > > > ---
> > > >  kernel/sched/fair.c | 18 ++++++++++++++++--
> > > >  1 file changed, 16 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > index a87988327f88..566686c5f2bd 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -9563,7 +9563,7 @@ static inline long sibling_imbalance(struct lb_env *env,
> > > >  	imbalance /= ncores_local + ncores_busiest;
> > > >  
> > > >  	/* Take advantage of resource in an empty sched group */
> > > > -	if (imbalance == 0 && local->sum_nr_running == 0 &&
> > > > +	if (imbalance <= 1 && local->sum_nr_running == 0 &&
> > > >  	    busiest->sum_nr_running > 1)
> > > >  		imbalance = 2;
> > > >  
> > > > @@ -9751,6 +9751,20 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> > > >  		break;
> > > >  
> > > >  	case group_smt_balance:
> > > > +		/* no idle cpus on both groups handled by group_fully_busy below */
> > > > +		if (sgs->idle_cpus != 0 || busiest->idle_cpus != 0) {
> > > > +			if (sgs->idle_cpus > busiest->idle_cpus)
> > > > +				return false;
> > > > +			if (sgs->idle_cpus < busiest->idle_cpus)
> > > > +				return true;
> > > > +			if (sgs->sum_nr_running <= busiest->sum_nr_running)
> > > > +				return false;
> > > > +			else
> > > > +				return true;
> > > > +		}
> > > > +		goto fully_busy;
> > > > +		break;
> > > > +
> > > >  	case group_fully_busy:
> > > >  		/*
> > > >  		 * Select the fully busy group with highest avg_load. In
> > > > @@ -9763,7 +9777,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> > > >  		 * select the 1st one, except if @sg is composed of SMT
> > > >  		 * siblings.
> > > >  		 */
> > > > -
> > > > +fully_busy:
> > > >  		if (sgs->avg_load < busiest->avg_load)
> > > >  			return false;
> > > >  
> > > 
> > > Hi Tim, Peter. 
> > > 
> > > group_smt_balance(cluster scheduling), patches are in tip/sched/core. I dont 
> > > see this above patch there yet. Currently as is, this can cause function difference 
> > > in SMT4 systems( such as Power10). 
> > > 
> > > Can we please have the above patch as well in tip/sched/core?
> > > 
> > > Acked-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
> > 
> > Hi Peter,
> > 
> > Just back from my long vacation.  Wonder if you have any comments on the above patch
> > for fixing the SMT4 case?
> > 
> > Tim
> 
> Hi Tim, Peter. 
> 
> are there any concerns with the above patch for fixing the SMT4 case. 
> Currently the behavior is group_smt_balance is set for having even 2 tasks in 
> SMT4, ideally it should be same as the group_has_spare. 
> 
> The above patch copies the same behavior to group_smt_balance. 
> > 

You mean simplify the patch as below?  I think that should be fine.  Can you
make sure it works for SMT4?  And I can update the patch once you confirm it
works properly.

Tim

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6e7ee2efc1ba..48e9ab7f8a87 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9764,16 +9764,9 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 
        case group_smt_balance:
                /* no idle cpus on both groups handled by group_fully_busy below */
-               if (sgs->idle_cpus != 0 || busiest->idle_cpus != 0) {
-                       if (sgs->idle_cpus > busiest->idle_cpus)
-                               return false;
-                       if (sgs->idle_cpus < busiest->idle_cpus)
-                               return true;
-                       if (sgs->sum_nr_running <= busiest->sum_nr_running)
-                               return false;
-                       else
-                               return true;
-               }
+               if (sgs->idle_cpus != 0 || busiest->idle_cpus != 0)
+                       goto has_spare;
+
                goto fully_busy;
 
        case group_fully_busy:
@@ -9809,6 +9802,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
                 * as we do not want to pull task off SMT core with one task
                 * and make the core idle.
                 */
+has_spare:
                if (smt_vs_nonsmt_groups(sds->busiest, sg)) {
                        if (sg->flags & SD_SHARE_CPUCAPACITY && sgs->sum_h_nr_running <= 1)
                                return false;
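
For readers tracing where has_spare lands: the smt_vs_nonsmt_groups()
check above refuses to pull the last task off an SMT core for the sake
of a non-SMT group. A sketch of what that helper tests, with a
simplified stand-in flag value and struct (not the kernel's
definitions):

#include <stdbool.h>
#include <stdio.h>

#define SD_SHARE_CPUCAPACITY	0x1	/* stand-in value for the kernel flag */

struct group {
	unsigned int flags;
};

/* True when exactly one of the two groups is an SMT core */
static bool smt_vs_nonsmt_groups(const struct group *sg1,
				 const struct group *sg2)
{
	if (!sg1 || !sg2)
		return false;

	return (sg1->flags & SD_SHARE_CPUCAPACITY) !=
	       (sg2->flags & SD_SHARE_CPUCAPACITY);
}

int main(void)
{
	struct group smt = { .flags = SD_SHARE_CPUCAPACITY };
	struct group nonsmt = { .flags = 0 };

	printf("%d\n", smt_vs_nonsmt_groups(&smt, &nonsmt));	/* 1 */
	printf("%d\n", smt_vs_nonsmt_groups(&smt, &smt));	/* 0 */
	return 0;
}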
Re: [PATCH] sched/fair: Add SMT4 group_smt_balance handling
Posted by Shrikanth Hegde 2 years, 5 months ago

On 9/6/23 12:07 AM, Tim Chen wrote:
> On Tue, 2023-09-05 at 13:33 +0530, Shrikanth Hegde wrote:
>>
>> On 8/22/23 12:49 AM, Tim Chen wrote:
>>> On Mon, 2023-08-07 at 15:06 +0530, Shrikanth Hegde wrote:
>>>>>
>>>>> From: Tim Chen <tim.c.chen@linux.intel.com>
>>>>> Date: Fri, 14 Jul 2023 16:09:30 -0700
>>>>> Subject: [PATCH] sched/fair: Add SMT4 group_smt_balance handling
>>>>>
>>>>> For SMT4, any group with more than 2 tasks will be marked as
>>>>> group_smt_balance. Retain the behaviour of group_has_spare by marking
>>>>> the busiest group as the group which has the least number of idle_cpus.
>>>>>
>>>>> Also, handle rounding effect of adding (ncores_local + ncores_busy)
>>>>> when the local is fully idle and busy group has more than 2 tasks.
>>>>> Local group should try to pull at least 1 task in this case.
>>>>>
>>>>> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
>>>>> ---
>>>>>  kernel/sched/fair.c | 18 ++++++++++++++++--
>>>>>  1 file changed, 16 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>>> index a87988327f88..566686c5f2bd 100644
>>>>> --- a/kernel/sched/fair.c
>>>>> +++ b/kernel/sched/fair.c
>>>>> @@ -9563,7 +9563,7 @@ static inline long sibling_imbalance(struct lb_env *env,
>>>>>  	imbalance /= ncores_local + ncores_busiest;
>>>>>  
>>>>>  	/* Take advantage of resource in an empty sched group */
>>>>> -	if (imbalance == 0 && local->sum_nr_running == 0 &&
>>>>> +	if (imbalance <= 1 && local->sum_nr_running == 0 &&
>>>>>  	    busiest->sum_nr_running > 1)
>>>>>  		imbalance = 2;
>>>>>  
>>>>> @@ -9751,6 +9751,20 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>>>>>  		break;
>>>>>  
>>>>>  	case group_smt_balance:
>>>>> +		/* no idle cpus on both groups handled by group_fully_busy below */
>>>>> +		if (sgs->idle_cpus != 0 || busiest->idle_cpus != 0) {
>>>>> +			if (sgs->idle_cpus > busiest->idle_cpus)
>>>>> +				return false;
>>>>> +			if (sgs->idle_cpus < busiest->idle_cpus)
>>>>> +				return true;
>>>>> +			if (sgs->sum_nr_running <= busiest->sum_nr_running)
>>>>> +				return false;
>>>>> +			else
>>>>> +				return true;
>>>>> +		}
>>>>> +		goto fully_busy;
>>>>> +		break;
>>>>> +
>>>>>  	case group_fully_busy:
>>>>>  		/*
>>>>>  		 * Select the fully busy group with highest avg_load. In
>>>>> @@ -9763,7 +9777,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>>>>>  		 * select the 1st one, except if @sg is composed of SMT
>>>>>  		 * siblings.
>>>>>  		 */
>>>>> -
>>>>> +fully_busy:
>>>>>  		if (sgs->avg_load < busiest->avg_load)
>>>>>  			return false;
>>>>>  
>>>>
>>>> Hi Tim, Peter. 
>>>>
>>>> group_smt_balance(cluster scheduling), patches are in tip/sched/core. I dont 
>>>> see this above patch there yet. Currently as is, this can cause function difference 
>>>> in SMT4 systems( such as Power10). 
>>>>
>>>> Can we please have the above patch as well in tip/sched/core?
>>>>
>>>> Acked-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
>>>
>>> Hi Peter,
>>>
>>> Just back from my long vacation.  Wonder if you have any comments on the above patch
>>> for fixing the SMT4 case?
>>>
>>> Tim
>>
>> Hi Tim, Peter. 
>>
>> are there any concerns with the above patch for fixing the SMT4 case. 
>> Currently the behavior is group_smt_balance is set for having even 2 tasks in 
>> SMT4, ideally it should be same as the group_has_spare. 
>>
>> The above patch copies the same behavior to group_smt_balance. 
>>>
> 
> You mean simplify the patch as below?  I think that should be fine.  Can you
> make sure it works for SMT4?  And I can update the patch once you confirm it
> works properly.
> 
> Tim
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 6e7ee2efc1ba..48e9ab7f8a87 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9764,16 +9764,9 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>  
>         case group_smt_balance:
>                 /* no idle cpus on both groups handled by group_fully_busy below */
> -               if (sgs->idle_cpus != 0 || busiest->idle_cpus != 0) {
> -                       if (sgs->idle_cpus > busiest->idle_cpus)
> -                               return false;
> -                       if (sgs->idle_cpus < busiest->idle_cpus)
> -                               return true;
> -                       if (sgs->sum_nr_running <= busiest->sum_nr_running)
> -                               return false;
> -                       else
> -                               return true;
> -               }
> +               if (sgs->idle_cpus != 0 || busiest->idle_cpus != 0)
> +                       goto has_spare;
> +
>                 goto fully_busy;
>  
>         case group_fully_busy:
> @@ -9809,6 +9802,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>                  * as we do not want to pull task off SMT core with one task
>                  * and make the core idle.
>                  */
> +has_spare:
>                 if (smt_vs_nonsmt_groups(sds->busiest, sg)) {
>                         if (sg->flags & SD_SHARE_CPUCAPACITY && sgs->sum_h_nr_running <= 1)
>                                 return false;
> 
> 
> 

Hi Tim,

In case you were waiting for my reply, as inferred from the other email:
the above change looks fine as well. It would avoid duplicating the
code for group_smt_balance.

Acked-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
Re: [PATCH] sched/fair: Add SMT4 group_smt_balance handling
Posted by Tim Chen 2 years, 5 months ago
On Thu, 2023-09-07 at 14:28 +0530, Shrikanth Hegde wrote:
> > 
> > You mean simplify the patch as below?  I think that should be fine.  Can you
> > make sure it works for SMT4?  And I can update the patch once you confirm it
> > works properly.
> > 
> > Tim
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 6e7ee2efc1ba..48e9ab7f8a87 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -9764,16 +9764,9 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> >  
> >         case group_smt_balance:
> >                 /* no idle cpus on both groups handled by group_fully_busy below */
> > -               if (sgs->idle_cpus != 0 || busiest->idle_cpus != 0) {
> > -                       if (sgs->idle_cpus > busiest->idle_cpus)
> > -                               return false;
> > -                       if (sgs->idle_cpus < busiest->idle_cpus)
> > -                               return true;
> > -                       if (sgs->sum_nr_running <= busiest->sum_nr_running)
> > -                               return false;
> > -                       else
> > -                               return true;
> > -               }
> > +               if (sgs->idle_cpus != 0 || busiest->idle_cpus != 0)
> > +                       goto has_spare;
> > +
> >                 goto fully_busy;
> >  
> >         case group_fully_busy:
> > @@ -9809,6 +9802,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> >                  * as we do not want to pull task off SMT core with one task
> >                  * and make the core idle.
> >                  */
> > +has_spare:
> >                 if (smt_vs_nonsmt_groups(sds->busiest, sg)) {
> >                         if (sg->flags & SD_SHARE_CPUCAPACITY && sgs->sum_h_nr_running <= 1)
> >                                 return false;
> > 
> > 
> > 
> 
> Hi Tim,
> 
> In case you were waiting for my reply as inferred from other email. 
> The above change looks fine as well. This would avoid duplication of
> code for group_smt_balance. 
> 
> Acked-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>

Peter,

Here's the updated patch.  Please consider it for inclusion.

Thanks.

Tim

From 979e261fed6e3765316a4de794f595f93c02cef0 Mon Sep 17 00:00:00 2001
From: Tim Chen <tim.c.chen@linux.intel.com>
Subject: [PATCH] sched/fair: Fix SMT4 group_smt_balance handling
To: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>, Vincent Guittot <vincent.guittot@linaro.org>, Ricardo Neri <ricardo.neri@intel.com>, Ravi V. Shankar <ravi.v.shankar@intel.com>, Ben Segall
<bsegall@google.com>, Daniel Bristot de Oliveira <bristot@redhat.com>, Dietmar Eggemann <dietmar.eggemann@arm.com>, Len Brown <len.brown@intel.com>, Mel Gorman <mgorman@suse.de>, Rafael J. Wysocki
<rafael.j.wysocki@intel.com>, Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>, Steven Rostedt <rostedt@goodmis.org>, Tim Chen <tim.c.chen@linux.intel.com>, Valentin Schneider
<vschneid@redhat.com>, Ionela Voinescu <ionela.voinescu@arm.com>, x86@kernel.org, linux-kernel@vger.kernel.org, Shrikanth Hegde <sshegde@linux.vnet.ibm.com>, Srikar Dronamraju
<srikar@linux.vnet.ibm.com>, naveen.n.rao@linux.vnet.ibm.com, Yicong Yang <yangyicong@hisilicon.com>, Barry Song <v-songbaohua@oppo.com>, Chen Yu <yu.c.chen@intel.com>, Hillf Danton <hdanton@sina.com>

For SMT4, any group with more than 2 tasks will be marked as
group_smt_balance. Retain the behaviour of group_has_spare by marking
the busiest group as the group which has the least number of idle_cpus.

Also, handle rounding effect of adding (ncores_local + ncores_busy) when
the local is fully idle and busy group imbalance is less than 2 tasks.
Local group should try to pull at least 1 task in this case so imbalance
should be set to 2 instead.

Fixes: fee1759e4f04 ("sched/fair: Determine active load balance for SMT sched groups")
Acked-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 kernel/sched/fair.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0b7445cd5af9..fd9e594b5623 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9575,7 +9575,7 @@ static inline long sibling_imbalance(struct lb_env *env,
 	imbalance /= ncores_local + ncores_busiest;
 
 	/* Take advantage of resource in an empty sched group */
-	if (imbalance == 0 && local->sum_nr_running == 0 &&
+	if (imbalance <= 1 && local->sum_nr_running == 0 &&
 	    busiest->sum_nr_running > 1)
 		imbalance = 2;
 
@@ -9763,6 +9763,15 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 		break;
 
 	case group_smt_balance:
+		/*
+		 * Check if we have spare CPUs on either SMT group to
+		 * choose has spare or fully busy handling.
+		 */
+		if (sgs->idle_cpus != 0 || busiest->idle_cpus != 0)
+			goto has_spare;
+
+		fallthrough;
+
 	case group_fully_busy:
 		/*
 		 * Select the fully busy group with highest avg_load. In
@@ -9802,6 +9811,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 			else
 				return true;
 		}
+has_spare:
 
 		/*
 		 * Select not overloaded group with lowest number of idle cpus
-- 
2.32.0
[tip: sched/urgent] sched/fair: Fix SMT4 group_smt_balance handling
Posted by tip-bot2 for Tim Chen 2 years, 4 months ago
The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     450e749707bc1755f22b505d9cd942d4869dc535
Gitweb:        https://git.kernel.org/tip/450e749707bc1755f22b505d9cd942d4869dc535
Author:        Tim Chen <tim.c.chen@linux.intel.com>
AuthorDate:    Thu, 07 Sep 2023 10:42:21 -07:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 13 Sep 2023 15:03:06 +02:00

sched/fair: Fix SMT4 group_smt_balance handling

For SMT4, any group with more than 2 tasks will be marked as
group_smt_balance. Retain the behaviour of group_has_spare by marking
the busiest group as the group which has the least number of idle_cpus.

Also, handle rounding effect of adding (ncores_local + ncores_busy) when
the local is fully idle and busy group imbalance is less than 2 tasks.
Local group should try to pull at least 1 task in this case so imbalance
should be set to 2 instead.

Fixes: fee1759e4f04 ("sched/fair: Determine active load balance for SMT sched groups")
Acked-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/6cd1633036bb6b651af575c32c2a9608a106702c.camel@linux.intel.com
---
 kernel/sched/fair.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 33a2b6b..cb22592 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9580,7 +9580,7 @@ static inline long sibling_imbalance(struct lb_env *env,
 	imbalance /= ncores_local + ncores_busiest;
 
 	/* Take advantage of resource in an empty sched group */
-	if (imbalance == 0 && local->sum_nr_running == 0 &&
+	if (imbalance <= 1 && local->sum_nr_running == 0 &&
 	    busiest->sum_nr_running > 1)
 		imbalance = 2;
 
@@ -9768,6 +9768,15 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 		break;
 
 	case group_smt_balance:
+		/*
+		 * Check if we have spare CPUs on either SMT group to
+		 * choose has spare or fully busy handling.
+		 */
+		if (sgs->idle_cpus != 0 || busiest->idle_cpus != 0)
+			goto has_spare;
+
+		fallthrough;
+
 	case group_fully_busy:
 		/*
 		 * Select the fully busy group with highest avg_load. In
@@ -9807,6 +9816,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 			else
 				return true;
 		}
+has_spare:
 
 		/*
 		 * Select not overloaded group with lowest number of idle cpus
[tip: sched/urgent] sched/fair: Fix SMT4 group_smt_balance handling
Posted by tip-bot2 for Tim Chen 2 years, 4 months ago
The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     ad468232c3eb1dab163672f98a1ab2363be7981e
Gitweb:        https://git.kernel.org/tip/ad468232c3eb1dab163672f98a1ab2363be7981e
Author:        Tim Chen <tim.c.chen@linux.intel.com>
AuthorDate:    Thu, 07 Sep 2023 10:42:21 -07:00
Committer:     root <root@noisy.programming.kicks-ass.net>
CommitterDate: Sat, 09 Sep 2023 15:10:10 +02:00

sched/fair: Fix SMT4 group_smt_balance handling

For SMT4, any group with more than 2 tasks will be marked as
group_smt_balance. Retain the behaviour of group_has_spare by marking
the busiest group as the group which has the least number of idle_cpus.

Also, handle rounding effect of adding (ncores_local + ncores_busy) when
the local is fully idle and busy group imbalance is less than 2 tasks.
Local group should try to pull at least 1 task in this case so imbalance
should be set to 2 instead.

Fixes: fee1759e4f04 ("sched/fair: Determine active load balance for SMT sched groups")
Acked-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/6cd1633036bb6b651af575c32c2a9608a106702c.camel@linux.intel.com
---
 kernel/sched/fair.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 33a2b6b..cb22592 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9580,7 +9580,7 @@ static inline long sibling_imbalance(struct lb_env *env,
 	imbalance /= ncores_local + ncores_busiest;
 
 	/* Take advantage of resource in an empty sched group */
-	if (imbalance == 0 && local->sum_nr_running == 0 &&
+	if (imbalance <= 1 && local->sum_nr_running == 0 &&
 	    busiest->sum_nr_running > 1)
 		imbalance = 2;
 
@@ -9768,6 +9768,15 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 		break;
 
 	case group_smt_balance:
+		/*
+		 * Check if we have spare CPUs on either SMT group to
+		 * choose has spare or fully busy handling.
+		 */
+		if (sgs->idle_cpus != 0 || busiest->idle_cpus != 0)
+			goto has_spare;
+
+		fallthrough;
+
 	case group_fully_busy:
 		/*
 		 * Select the fully busy group with highest avg_load. In
@@ -9807,6 +9816,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 			else
 				return true;
 		}
+has_spare:
 
 		/*
 		 * Select not overloaded group with lowest number of idle cpus
Re: [PATCH] sched/fair: Add SMT4 group_smt_balance handling
Posted by Shrikanth Hegde 2 years, 5 months ago

On 9/6/23 12:07 AM, Tim Chen wrote:
> On Tue, 2023-09-05 at 13:33 +0530, Shrikanth Hegde wrote:
>>
>> On 8/22/23 12:49 AM, Tim Chen wrote:
>>> On Mon, 2023-08-07 at 15:06 +0530, Shrikanth Hegde wrote:
>>>>>
>>>>> From: Tim Chen <tim.c.chen@linux.intel.com>
>>>>> Date: Fri, 14 Jul 2023 16:09:30 -0700
>>>>> Subject: [PATCH] sched/fair: Add SMT4 group_smt_balance handling
>>>>>
>>>>> For SMT4, any group with more than 2 tasks will be marked as
>>>>> group_smt_balance. Retain the behaviour of group_has_spare by marking
>>>>> the busiest group as the group which has the least number of idle_cpus.
>>>>>
>>>>> Also, handle rounding effect of adding (ncores_local + ncores_busy)
>>>>> when the local is fully idle and busy group has more than 2 tasks.
>>>>> Local group should try to pull at least 1 task in this case.
>>>>>
>>>>> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
>>>>> ---
>>>>>  kernel/sched/fair.c | 18 ++++++++++++++++--
>>>>>  1 file changed, 16 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>>> index a87988327f88..566686c5f2bd 100644
>>>>> --- a/kernel/sched/fair.c
>>>>> +++ b/kernel/sched/fair.c
>>>>> @@ -9563,7 +9563,7 @@ static inline long sibling_imbalance(struct lb_env *env,
>>>>>  	imbalance /= ncores_local + ncores_busiest;
>>>>>  
>>>>>  	/* Take advantage of resource in an empty sched group */
>>>>> -	if (imbalance == 0 && local->sum_nr_running == 0 &&
>>>>> +	if (imbalance <= 1 && local->sum_nr_running == 0 &&
>>>>>  	    busiest->sum_nr_running > 1)
>>>>>  		imbalance = 2;
>>>>>  
>>>>> @@ -9751,6 +9751,20 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>>>>>  		break;
>>>>>  
>>>>>  	case group_smt_balance:
>>>>> +		/* no idle cpus on both groups handled by group_fully_busy below */
>>>>> +		if (sgs->idle_cpus != 0 || busiest->idle_cpus != 0) {
>>>>> +			if (sgs->idle_cpus > busiest->idle_cpus)
>>>>> +				return false;
>>>>> +			if (sgs->idle_cpus < busiest->idle_cpus)
>>>>> +				return true;
>>>>> +			if (sgs->sum_nr_running <= busiest->sum_nr_running)
>>>>> +				return false;
>>>>> +			else
>>>>> +				return true;
>>>>> +		}
>>>>> +		goto fully_busy;
>>>>> +		break;
>>>>> +
>>>>>  	case group_fully_busy:
>>>>>  		/*
>>>>>  		 * Select the fully busy group with highest avg_load. In
>>>>> @@ -9763,7 +9777,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>>>>>  		 * select the 1st one, except if @sg is composed of SMT
>>>>>  		 * siblings.
>>>>>  		 */
>>>>> -
>>>>> +fully_busy:
>>>>>  		if (sgs->avg_load < busiest->avg_load)
>>>>>  			return false;
>>>>>  
>>>>
>>>> Hi Tim, Peter. 
>>>>
>>>> group_smt_balance(cluster scheduling), patches are in tip/sched/core. I dont 
>>>> see this above patch there yet. Currently as is, this can cause function difference 
>>>> in SMT4 systems( such as Power10). 
>>>>
>>>> Can we please have the above patch as well in tip/sched/core?
>>>>
>>>> Acked-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
>>>
>>> Hi Peter,
>>>
>>> Just back from my long vacation.  Wonder if you have any comments on the above patch
>>> for fixing the SMT4 case?
>>>
>>> Tim
>>
>> Hi Tim, Peter. 
>>
>> are there any concerns with the above patch for fixing the SMT4 case. 
>> Currently the behavior is group_smt_balance is set for having even 2 tasks in 
>> SMT4, ideally it should be same as the group_has_spare. 
>>
>> The above patch copies the same behavior to group_smt_balance. 
>>>
> 
> You mean simplify the patch as below?  I think that should be fine.  Can you
> make sure it works for SMT4?  And I can update the patch once you confirm it
> works properly.
> 

This looks fine, and is likely better as it would avoid duplication. A few nits below.


> Tim
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 6e7ee2efc1ba..48e9ab7f8a87 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9764,16 +9764,9 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>  
>         case group_smt_balance:
>                 /* no idle cpus on both groups handled by group_fully_busy below */

Please add a comment here explaining the fall-through and spare logic.

> -               if (sgs->idle_cpus != 0 || busiest->idle_cpus != 0) {
> -                       if (sgs->idle_cpus > busiest->idle_cpus)
> -                               return false;
> -                       if (sgs->idle_cpus < busiest->idle_cpus)
> -                               return true;
> -                       if (sgs->sum_nr_running <= busiest->sum_nr_running)
> -                               return false;
> -                       else
> -                               return true;
> -               }
> +               if (sgs->idle_cpus != 0 || busiest->idle_cpus != 0)
> +                       goto has_spare;
> +
>                 goto fully_busy;

This can fall through without the additional goto statement, no?

>  
>         case group_fully_busy:
> @@ -9809,6 +9802,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>                  * as we do not want to pull task off SMT core with one task
>                  * and make the core idle.
>                  */
> +has_spare:
>                 if (smt_vs_nonsmt_groups(sds->busiest, sg)) {
>                         if (sg->flags & SD_SHARE_CPUCAPACITY && sgs->sum_h_nr_running <= 1)
>                                 return false;
> 
> 
>
Re: [PATCH] sched/fair: Add SMT4 group_smt_balance handling
Posted by Tim Chen 2 years, 5 months ago
On Wed, 2023-09-06 at 14:59 +0530, Shrikanth Hegde wrote:
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 6e7ee2efc1ba..48e9ab7f8a87 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -9764,16 +9764,9 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> >  
> >         case group_smt_balance:
> >                 /* no idle cpus on both groups handled by group_fully_busy below */
> 
> Please add a comment here explaining the fall-through and spare logic.
> 

Sure.


> > -               if (sgs->idle_cpus != 0 || busiest->idle_cpus != 0) {
> > -                       if (sgs->idle_cpus > busiest->idle_cpus)
> > -                               return false;
> > -                       if (sgs->idle_cpus < busiest->idle_cpus)
> > -                               return true;
> > -                       if (sgs->sum_nr_running <= busiest->sum_nr_running)
> > -                               return false;
> > -                       else
> > -                               return true;
> > -               }
> > +               if (sgs->idle_cpus != 0 || busiest->idle_cpus != 0)
> > +                       goto has_spare;
> > +
> >                 goto fully_busy;
> 
> This can fall through without the additional goto statement no?
> 

There is an unconditional goto fully_busy, so it won't fall through and the
compiler won't complain.

> >  
> >         case group_fully_busy:
> > @@ -9809,6 +9802,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> >                  * as we do not want to pull task off SMT core with one task
> >                  * and make the core idle.
> >                  */
> > +has_spare:
> >                 if (smt_vs_nonsmt_groups(sds->busiest, sg)) {
> >                         if (sg->flags & SD_SHARE_CPUCAPACITY && sgs->sum_h_nr_running <= 1)
> >                                 return false;
> > 
> > 
> > 
Tim
Re: [PATCH] sched/fair: Add SMT4 group_smt_balance handling
Posted by Peter Zijlstra 2 years, 5 months ago
On Tue, Sep 05, 2023 at 01:33:57PM +0530, Shrikanth Hegde wrote:
> Hi Tim, Peter. 

Back from PTO; mailbox is a disaster area, but I'll try and have a look
soon.
Re: [PATCH] sched/fair: Add SMT4 group_smt_balance handling
Posted by Peter Zijlstra 2 years, 5 months ago
On Mon, Aug 21, 2023 at 12:19:40PM -0700, Tim Chen wrote:
> Just back from my long vacation.  Wonder if you have any comments on the above patch
> for fixing the SMT4 case?

This should have:

Fixes: fee1759e4f04 ("sched/fair: Determine active load balance for SMT sched groups")

Right?