[v1] sched/rt: Fix live lock between select_fallback_rq() and RT push

[PATCH] sched/rt: Fix live lock between select_fallback_rq() and RT push

Posted by Joel Fernandes (Google) 2 years, 4 months ago

During RCU-boost testing with the TREE03 rcutorture config, I found that
after a few hours, the machine locks up.

On tracing, I found that there is a live lock happening between 2 CPUs.
One CPU has an RT task running, while another CPU is being offlined
which also has an RT task running.  During this offlining, all threads
are migrated. The migration thread is repeatedly scheduled to migrate
actively running tasks on the CPU being offlined. This results in a live
lock because select_fallback_rq() keeps picking the CPU that an RT task
is already running on only to get pushed back to the CPU being offlined.

It is anyway pointless to pick CPUs for pushing tasks to if they are
being offlined only to get migrated away to somewhere else. This could
also add unwanted latency to this task.

Fix these issues by not selecting CPUs in RT if they are not 'active'
for scheduling, using the cpu_active_mask. Other parts in core.c already
use cpu_active_mask to prevent tasks from being put on CPUs going
offline.

Tested-by: Paul E. McKenney <paulmck@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/sched/cpupri.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/cpupri.c b/kernel/sched/cpupri.c
index a286e726eb4b..42c40cfdf836 100644
--- a/kernel/sched/cpupri.c
+++ b/kernel/sched/cpupri.c
@@ -101,6 +101,7 @@ static inline int __cpupri_find(struct cpupri *cp, struct task_struct *p,
 
 	if (lowest_mask) {
 		cpumask_and(lowest_mask, &p->cpus_mask, vec->mask);
+		cpumask_and(lowest_mask, lowest_mask, cpu_active_mask);
 
 		/*
 		 * We have to ensure that we have at least one bit
-- 
2.42.0.515.g380fc7ccd1-goog

Re: [PATCH] sched/rt: Fix live lock between select_fallback_rq() and RT push

Posted by Qais Yousef 2 years, 4 months ago

Hey Joel

On 09/23/23 01:14, Joel Fernandes (Google) wrote:
> During RCU-boost testing with the TREE03 rcutorture config, I found that
> after a few hours, the machine locks up.
> 
> On tracing, I found that there is a live lock happening between 2 CPUs.
> One CPU has an RT task running, while another CPU is being offlined
> which also has an RT task running.  During this offlining, all threads
> are migrated. The migration thread is repeatedly scheduled to migrate
> actively running tasks on the CPU being offlined. This results in a live
> lock because select_fallback_rq() keeps picking the CPU that an RT task
> is already running on only to get pushed back to the CPU being offlined.
> 
> It is anyway pointless to pick CPUs for pushing tasks to if they are
> being offlined only to get migrated away to somewhere else. This could
> also add unwanted latency to this task.
> 
> Fix these issues by not selecting CPUs in RT if they are not 'active'
> for scheduling, using the cpu_active_mask. Other parts in core.c already
> use cpu_active_mask to prevent tasks from being put on CPUs going
> offline.

I think this is the same report as this one from Xuewen

	https://lore.kernel.org/lkml/20221114120453.3233-1-xuewen.yan@unisoc.com/

You also have only 2 CPUs system, so when one CPU is offline, the other becomes
overloaded.

I think it is worth trying to fix rto_next_cpu() to consider the
cpu_active_mask too (see my reply to Xuewen then). Keep your change as well.

I think Xuewen and I considered disabling the overloaded logic when we devolve
to a UP system too as the whole logic doesn't make sense anymore then. This is
more of an optimization than correctness though. Don't feel strongly about it
now though.


Thanks!

--
Qais Yousef


> 
> Tested-by: Paul E. McKenney <paulmck@kernel.org>
> Cc: stable@vger.kernel.org
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  kernel/sched/cpupri.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/sched/cpupri.c b/kernel/sched/cpupri.c
> index a286e726eb4b..42c40cfdf836 100644
> --- a/kernel/sched/cpupri.c
> +++ b/kernel/sched/cpupri.c
> @@ -101,6 +101,7 @@ static inline int __cpupri_find(struct cpupri *cp, struct task_struct *p,
>  
>  	if (lowest_mask) {
>  		cpumask_and(lowest_mask, &p->cpus_mask, vec->mask);
> +		cpumask_and(lowest_mask, lowest_mask, cpu_active_mask);
>  
>  		/*
>  		 * We have to ensure that we have at least one bit
> -- 
> 2.42.0.515.g380fc7ccd1-goog
>

Re: [PATCH] sched/rt: Fix live lock between select_fallback_rq() and RT push

Posted by Steven Rostedt 2 years, 4 months ago

On Sat, 23 Sep 2023 01:14:08 +0000
"Joel Fernandes (Google)" <joel@joelfernandes.org> wrote:

> During RCU-boost testing with the TREE03 rcutorture config, I found that
> after a few hours, the machine locks up.
> 
> On tracing, I found that there is a live lock happening between 2 CPUs.
> One CPU has an RT task running, while another CPU is being offlined
> which also has an RT task running.  During this offlining, all threads
> are migrated. The migration thread is repeatedly scheduled to migrate
> actively running tasks on the CPU being offlined. This results in a live
> lock because select_fallback_rq() keeps picking the CPU that an RT task
> is already running on only to get pushed back to the CPU being offlined.
> 
> It is anyway pointless to pick CPUs for pushing tasks to if they are
> being offlined only to get migrated away to somewhere else. This could
> also add unwanted latency to this task.
> 
> Fix these issues by not selecting CPUs in RT if they are not 'active'
> for scheduling, using the cpu_active_mask. Other parts in core.c already
> use cpu_active_mask to prevent tasks from being put on CPUs going
> offline.
> 
> Tested-by: Paul E. McKenney <paulmck@kernel.org>
> Cc: stable@vger.kernel.org
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  kernel/sched/cpupri.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/sched/cpupri.c b/kernel/sched/cpupri.c
> index a286e726eb4b..42c40cfdf836 100644
> --- a/kernel/sched/cpupri.c
> +++ b/kernel/sched/cpupri.c
> @@ -101,6 +101,7 @@ static inline int __cpupri_find(struct cpupri *cp, struct task_struct *p,
>  
>  	if (lowest_mask) {
>  		cpumask_and(lowest_mask, &p->cpus_mask, vec->mask);
> +		cpumask_and(lowest_mask, lowest_mask, cpu_active_mask);

What happens if the cpu_active_mask changes right here?

Is this just making the race window smaller?

Something tells me the fix is going to be something a bit more involved.
But as I'm getting ready for Paris, I can't look at it at the moment.

-- Steve

>  
>  		/*
>  		 * We have to ensure that we have at least one bit

Re: [PATCH] sched/rt: Fix live lock between select_fallback_rq() and RT push

Posted by Joel Fernandes 2 years, 4 months ago

Hi Steve,

On Fri, Sep 22, 2023 at 09:45:39PM -0400, Steven Rostedt wrote:
> On Sat, 23 Sep 2023 01:14:08 +0000
> "Joel Fernandes (Google)" <joel@joelfernandes.org> wrote:
> 
> > During RCU-boost testing with the TREE03 rcutorture config, I found that
> > after a few hours, the machine locks up.
> > 
> > On tracing, I found that there is a live lock happening between 2 CPUs.
> > One CPU has an RT task running, while another CPU is being offlined
> > which also has an RT task running.  During this offlining, all threads
> > are migrated. The migration thread is repeatedly scheduled to migrate
> > actively running tasks on the CPU being offlined. This results in a live
> > lock because select_fallback_rq() keeps picking the CPU that an RT task
> > is already running on only to get pushed back to the CPU being offlined.
> > 
> > It is anyway pointless to pick CPUs for pushing tasks to if they are
> > being offlined only to get migrated away to somewhere else. This could
> > also add unwanted latency to this task.
> > 
> > Fix these issues by not selecting CPUs in RT if they are not 'active'
> > for scheduling, using the cpu_active_mask. Other parts in core.c already
> > use cpu_active_mask to prevent tasks from being put on CPUs going
> > offline.
> > 
> > Tested-by: Paul E. McKenney <paulmck@kernel.org>
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  kernel/sched/cpupri.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/kernel/sched/cpupri.c b/kernel/sched/cpupri.c
> > index a286e726eb4b..42c40cfdf836 100644
> > --- a/kernel/sched/cpupri.c
> > +++ b/kernel/sched/cpupri.c
> > @@ -101,6 +101,7 @@ static inline int __cpupri_find(struct cpupri *cp, struct task_struct *p,
> >  
> >  	if (lowest_mask) {
> >  		cpumask_and(lowest_mask, &p->cpus_mask, vec->mask);
> > +		cpumask_and(lowest_mask, lowest_mask, cpu_active_mask);
> 
> What happens if the cpu_active_mask changes right here?
> 
> Is this just making the race window smaller?

It should not be an issue for fixing the live lock because at most that would
cause a few more bounces between the 2 CPUs but eventually once
cpu_active_mask is stable, the CPU being offlined will not be selected for
the push. That's nothing compared to the multi-second live lock that happens
right now.

Also, with this patch I ran the tests for days and could not reproduce the
issue. Without the patch, I hit it in a few hours.

> Something tells me the fix is going to be something a bit more involved.
> But as I'm getting ready for Paris, I can't look at it at the moment.

Thanks for taking a look and safe travels!

 - Joel

[tip: sched/urgent] sched/rt: Fix live lock between select_fallback_rq() and RT push

Posted by tip-bot2 for Joel Fernandes (Google) 2 years, 4 months ago

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     fc09027786c900368de98d03d40af058bcb01ad9
Gitweb:        https://git.kernel.org/tip/fc09027786c900368de98d03d40af058bcb01ad9
Author:        Joel Fernandes (Google) <joel@joelfernandes.org>
AuthorDate:    Sat, 23 Sep 2023 01:14:08 
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Thu, 28 Sep 2023 22:58:13 +02:00

sched/rt: Fix live lock between select_fallback_rq() and RT push

During RCU-boost testing with the TREE03 rcutorture config, I found that
after a few hours, the machine locks up.

On tracing, I found that there is a live lock happening between 2 CPUs.
One CPU has an RT task running, while another CPU is being offlined
which also has an RT task running.  During this offlining, all threads
are migrated. The migration thread is repeatedly scheduled to migrate
actively running tasks on the CPU being offlined. This results in a live
lock because select_fallback_rq() keeps picking the CPU that an RT task
is already running on only to get pushed back to the CPU being offlined.

It is anyway pointless to pick CPUs for pushing tasks to if they are
being offlined only to get migrated away to somewhere else. This could
also add unwanted latency to this task.

Fix these issues by not selecting CPUs in RT if they are not 'active'
for scheduling, using the cpu_active_mask. Other parts in core.c already
use cpu_active_mask to prevent tasks from being put on CPUs going
offline.

With this fix I ran the tests for days and could not reproduce the
hang. Without the patch, I hit it in a few hours.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230923011409.3522762-1-joel@joelfernandes.org
---
 kernel/sched/cpupri.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/cpupri.c b/kernel/sched/cpupri.c
index a286e72..42c40cf 100644
--- a/kernel/sched/cpupri.c
+++ b/kernel/sched/cpupri.c
@@ -101,6 +101,7 @@ static inline int __cpupri_find(struct cpupri *cp, struct task_struct *p,
 
 	if (lowest_mask) {
 		cpumask_and(lowest_mask, &p->cpus_mask, vec->mask);
+		cpumask_and(lowest_mask, lowest_mask, cpu_active_mask);
 
 		/*
 		 * We have to ensure that we have at least one bit