[PATCH] sched: Increase sched_tick_remote timeout

Phil Auld posted 1 patch 4 months, 4 weeks ago
There is a newer version of this series
kernel/sched/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
[PATCH] sched: Increase sched_tick_remote timeout
Posted by Phil Auld 4 months, 4 weeks ago
Increase the sched_tick_remote WARN_ON timeout to remove false
positives due to temporarily busy HK cpus. The suggestion
was 30 seconds to catch really stuck remote tick processing
but not trigger it too easily.

Signed-off-by: Phil Auld <pauld@redhat.com>
Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index be00629f0ba4..ef90d358252d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5724,7 +5724,7 @@ static void sched_tick_remote(struct work_struct *work)
 				 * reasonable amount of time.
 				 */
 				u64 delta = rq_clock_task(rq) - curr->se.exec_start;
-				WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
+				WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 30);
 			}
 			curr->sched_class->task_tick(rq, curr, 0);
 
-- 
2.51.0
Re: [PATCH] sched: Increase sched_tick_remote timeout
Posted by Phil Auld 4 months, 2 weeks ago
Hi,

On Thu, Sep 11, 2025 at 12:13:00PM -0400 Phil Auld wrote:
> Increase the sched_tick_remote WARN_ON timeout to remove false
> positives due to temporarily busy HK cpus. The suggestion
> was 30 seconds to catch really stuck remote tick processing
> but not trigger it too easily.
> 
> Signed-off-by: Phil Auld <pauld@redhat.com>
> Suggested-by: Frederic Weisbecker <frederic@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Frederic Weisbecker <frederic@kernel.org>

Frederic ack'd this. Any other thoughts or opinions on this one
character patch?

Cheers,
Phil



> ---
>  kernel/sched/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index be00629f0ba4..ef90d358252d 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5724,7 +5724,7 @@ static void sched_tick_remote(struct work_struct *work)
>  				 * reasonable amount of time.
>  				 */
>  				u64 delta = rq_clock_task(rq) - curr->se.exec_start;
> -				WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
> +				WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 30);
>  			}
>  			curr->sched_class->task_tick(rq, curr, 0);
>  
> -- 
> 2.51.0
> 

--
Re: [PATCH] sched: Increase sched_tick_remote timeout
Posted by Phil Auld 3 months ago
Hi Peter,

On Tue, Sep 23, 2025 at 06:47:39AM -0400 Phil Auld wrote:
> Hi,
> 
> On Thu, Sep 11, 2025 at 12:13:00PM -0400 Phil Auld wrote:
> > Increase the sched_tick_remote WARN_ON timeout to remove false
> > positives due to temporarily busy HK cpus. The suggestion
> > was 30 seconds to catch really stuck remote tick processing
> > but not trigger it too easily.
> > 
> > Signed-off-by: Phil Auld <pauld@redhat.com>
> > Suggested-by: Frederic Weisbecker <frederic@kernel.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Frederic Weisbecker <frederic@kernel.org>
> 
> Frederic ack'd this. Any other thoughts or opinions on this one
> character patch?

Can we have this timeout increase, please? 


Thanks,
Phil

> 
> Cheers,
> Phil
> 
> 
> 
> > ---
> >  kernel/sched/core.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index be00629f0ba4..ef90d358252d 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -5724,7 +5724,7 @@ static void sched_tick_remote(struct work_struct *work)
> >  				 * reasonable amount of time.
> >  				 */
> >  				u64 delta = rq_clock_task(rq) - curr->se.exec_start;
> > -				WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
> > +				WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 30);
> >  			}
> >  			curr->sched_class->task_tick(rq, curr, 0);
> >  
> > -- 
> > 2.51.0
> > 
> 
> -- 
> 
> 

--
Re: [PATCH] sched: Increase sched_tick_remote timeout
Posted by Phil Auld 4 months ago
On Tue, Sep 23, 2025 at 06:47:39AM -0400 Phil Auld wrote:
> Hi,
> 
> On Thu, Sep 11, 2025 at 12:13:00PM -0400 Phil Auld wrote:
> > Increase the sched_tick_remote WARN_ON timeout to remove false
> > positives due to temporarily busy HK cpus. The suggestion
> > was 30 seconds to catch really stuck remote tick processing
> > but not trigger it too easily.
> > 
> > Signed-off-by: Phil Auld <pauld@redhat.com>
> > Suggested-by: Frederic Weisbecker <frederic@kernel.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Frederic Weisbecker <frederic@kernel.org>
> 
> Frederic ack'd this. Any other thoughts or opinions on this one
> character patch?

Ping...

> 
> Cheers,
> Phil
> 
> 
> 
> > ---
> >  kernel/sched/core.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index be00629f0ba4..ef90d358252d 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -5724,7 +5724,7 @@ static void sched_tick_remote(struct work_struct *work)
> >  				 * reasonable amount of time.
> >  				 */
> >  				u64 delta = rq_clock_task(rq) - curr->se.exec_start;
> > -				WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
> > +				WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 30);
> >  			}
> >  			curr->sched_class->task_tick(rq, curr, 0);
> >  
> > -- 
> > 2.51.0
> > 
> 
> -- 
> 
> 

--
Re: [PATCH] sched: Increase sched_tick_remote timeout
Posted by wangtao (EQ) 4 months, 3 weeks ago
Raising the warning timeout reduces how often the warning fires, and so reduces the probability of a deadlock. However, 'sched_tick_remote' also contains 'WARN_ON_ONCE(rq->curr != rq->donor)', and 'rq_clock_task' calls 'assert_clock_updated'. Whatever the reason these warnings trigger, once they do, 'printk' is called, which still leaves a potential deadlock. Is there a better way to address these problems?

On 2025/9/12 0:13, Phil Auld wrote:
> Increase the sched_tick_remote WARN_ON timeout to remove false
> positives due to temporarily busy HK cpus. The suggestion
> was 30 seconds to catch really stuck remote tick processing
> but not trigger it too easily.
>
> Signed-off-by: Phil Auld <pauld@redhat.com>
> Suggested-by: Frederic Weisbecker <frederic@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> ---
>   kernel/sched/core.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index be00629f0ba4..ef90d358252d 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5724,7 +5724,7 @@ static void sched_tick_remote(struct work_struct *work)
>   				 * reasonable amount of time.
>   				 */
>   				u64 delta = rq_clock_task(rq) - curr->se.exec_start;
> -				WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
> +				WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 30);
>   			}
>   			curr->sched_class->task_tick(rq, curr, 0);
>   
Re: [PATCH] sched: Increase sched_tick_remote timeout
Posted by Phil Auld 4 months, 3 weeks ago
On Tue, Sep 16, 2025 at 04:44:39PM +0800 wangtao (EQ) wrote:
> Raising the warning timeout reduces how often the warning fires, and so reduces the probability of a deadlock. However, 'sched_tick_remote' also contains 'WARN_ON_ONCE(rq->curr != rq->donor)', and 'rq_clock_task' calls 'assert_clock_updated'. Whatever the reason these warnings trigger, once they do, 'printk' is called, which still leaves a potential deadlock. Is there a better way to address these problems?
>

I'm not specifically trying to solve the printk deadlock problem. My patch is
to make this particular warning go away by reducing the false positives.
That's tangential to your original posting.

You can use the new printk mechanism with an atomic console to get around
the printk bug I think.

I think you could also use a serial console instead of a framebuffer based
console.


Cheers,
Phil



> On 2025/9/12 0:13, Phil Auld wrote:
> > Increase the sched_tick_remote WARN_ON timeout to remove false
> > positives due to temporarily busy HK cpus. The suggestion
> > was 30 seconds to catch really stuck remote tick processing
> > but not trigger it too easily.
> > 
> > Signed-off-by: Phil Auld <pauld@redhat.com>
> > Suggested-by: Frederic Weisbecker <frederic@kernel.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Frederic Weisbecker <frederic@kernel.org>
> > ---
> >   kernel/sched/core.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index be00629f0ba4..ef90d358252d 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -5724,7 +5724,7 @@ static void sched_tick_remote(struct work_struct *work)
> >   				 * reasonable amount of time.
> >   				 */
> >   				u64 delta = rq_clock_task(rq) - curr->se.exec_start;
> > -				WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
> > +				WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 30);
> >   			}
> >   			curr->sched_class->task_tick(rq, curr, 0);
> 

-- 

Re: [PATCH] sched: Increase sched_tick_remote timeout
Posted by Frederic Weisbecker 4 months, 4 weeks ago
On Thu, Sep 11, 2025 at 12:13:00PM -0400, Phil Auld wrote:
> Increase the sched_tick_remote WARN_ON timeout to remove false
> positives due to temporarily busy HK cpus. The suggestion
> was 30 seconds to catch really stuck remote tick processing
> but not trigger it too easily.
> 
> Signed-off-by: Phil Auld <pauld@redhat.com>
> Suggested-by: Frederic Weisbecker <frederic@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Frederic Weisbecker <frederic@kernel.org>

Acked-by: Frederic Weisbecker <frederic@kernel.org>

-- 
Frederic Weisbecker
SUSE Labs
Re: [PATCH] sched: Increase sched_tick_remote timeout
Posted by wangtao (EQ) 4 months, 3 weeks ago
Do we have plans to merge this patch into the mainline?

Thanks,

Tao

On 2025/9/12 0:29, Frederic Weisbecker wrote:
> On Thu, Sep 11, 2025 at 12:13:00PM -0400, Phil Auld wrote:
>> Increase the sched_tick_remote WARN_ON timeout to remove false
>> positives due to temporarily busy HK cpus. The suggestion
>> was 30 seconds to catch really stuck remote tick processing
>> but not trigger it too easily.
>>
>> Signed-off-by: Phil Auld <pauld@redhat.com>
>> Suggested-by: Frederic Weisbecker <frederic@kernel.org>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Frederic Weisbecker <frederic@kernel.org>
> Acked-by: Frederic Weisbecker <frederic@kernel.org>
>
[tip: sched/core] sched: Increase sched_tick_remote timeout
Posted by tip-bot2 for Phil Auld 2 months, 3 weeks ago
The following commit has been merged into the sched/core branch of tip:

Commit-ID:     aaab6bb54ab9bc4c37ff33b816031918d2760517
Gitweb:        https://git.kernel.org/tip/aaab6bb54ab9bc4c37ff33b816031918d2760517
Author:        Phil Auld <pauld@redhat.com>
AuthorDate:    Thu, 11 Sep 2025 12:13:00 -04:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 17 Nov 2025 17:13:15 +01:00

sched: Increase sched_tick_remote timeout

Increase the sched_tick_remote WARN_ON timeout to remove false
positives due to temporarily busy HK cpus. The suggestion
was 30 seconds to catch really stuck remote tick processing
but not trigger it too easily.

Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://patch.msgid.link/20250911161300.437944-1-pauld@redhat.com
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 68f19aa..699db3f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5619,7 +5619,7 @@ static void sched_tick_remote(struct work_struct *work)
 				 * reasonable amount of time.
 				 */
 				u64 delta = rq_clock_task(rq) - curr->se.exec_start;
-				WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
+				WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 30);
 			}
 			curr->sched_class->task_tick(rq, curr, 0);
 
[tip: sched/core] sched: Increase sched_tick_remote timeout
Posted by tip-bot2 for Phil Auld 2 months, 3 weeks ago
The following commit has been merged into the sched/core branch of tip:

Commit-ID:     2616d12247639da40339757adc08c822147aa993
Gitweb:        https://git.kernel.org/tip/2616d12247639da40339757adc08c822147aa993
Author:        Phil Auld <pauld@redhat.com>
AuthorDate:    Thu, 11 Sep 2025 12:13:00 -04:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 14 Nov 2025 13:03:06 +01:00

sched: Increase sched_tick_remote timeout

Increase the sched_tick_remote WARN_ON timeout to remove false
positives due to temporarily busy HK cpus. The suggestion
was 30 seconds to catch really stuck remote tick processing
but not trigger it too easily.

Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://patch.msgid.link/20250911161300.437944-1-pauld@redhat.com
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 68f19aa..699db3f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5619,7 +5619,7 @@ static void sched_tick_remote(struct work_struct *work)
 				 * reasonable amount of time.
 				 */
 				u64 delta = rq_clock_task(rq) - curr->se.exec_start;
-				WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
+				WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 30);
 			}
 			curr->sched_class->task_tick(rq, curr, 0);
 
Re: [tip: sched/core] sched: Increase sched_tick_remote timeout
Posted by Phil Auld 2 months, 3 weeks ago
On Fri, Nov 14, 2025 at 12:19:06PM -0000 tip-bot2 for Phil Auld wrote:
> The following commit has been merged into the sched/core branch of tip:
> 
> Commit-ID:     2616d12247639da40339757adc08c822147aa993
> Gitweb:        https://git.kernel.org/tip/2616d12247639da40339757adc08c822147aa993
> Author:        Phil Auld <pauld@redhat.com>
> AuthorDate:    Thu, 11 Sep 2025 12:13:00 -04:00
> Committer:     Peter Zijlstra <peterz@infradead.org>
> CommitterDate: Fri, 14 Nov 2025 13:03:06 +01:00
>

Thanks Peter!  


> sched: Increase sched_tick_remote timeout
> 
> Increase the sched_tick_remote WARN_ON timeout to remove false
> positives due to temporarily busy HK cpus. The suggestion
> was 30 seconds to catch really stuck remote tick processing
> but not trigger it too easily.
> 
> Suggested-by: Frederic Weisbecker <frederic@kernel.org>
> Signed-off-by: Phil Auld <pauld@redhat.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Acked-by: Frederic Weisbecker <frederic@kernel.org>
> Link: https://patch.msgid.link/20250911161300.437944-1-pauld@redhat.com
> ---
>  kernel/sched/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 68f19aa..699db3f 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5619,7 +5619,7 @@ static void sched_tick_remote(struct work_struct *work)
>  				 * reasonable amount of time.
>  				 */
>  				u64 delta = rq_clock_task(rq) - curr->se.exec_start;
> -				WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
> +				WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 30);
>  			}
>  			curr->sched_class->task_tick(rq, curr, 0);
>  
> 

--