I got the following warning while running a stress test:
jump label: negative count!
WARNING: CPU: 3 PID: 38 at kernel/jump_label.c:263 static_key_slow_try_dec+0x9d/0xb0
Call Trace:
<TASK>
__static_key_slow_dec_cpuslocked+0x16/0x70
sched_cpu_deactivate+0x26e/0x2a0
cpuhp_invoke_callback+0x3ad/0x10d0
cpuhp_thread_fun+0x3f5/0x680
smpboot_thread_fn+0x56d/0x8d0
kthread+0x309/0x400
ret_from_fork+0x41/0x70
ret_from_fork_asm+0x1b/0x30
</TASK>
Becaus when cpuset_cpu_inactive() fails in sched_cpu_deactivate(),
the cpu offline failed, but sched_smt_present is decreased before
calling sched_cpu_deactivate, it leads unbalance dec/inc, so fix
it by increasing sched_smt_present in the error path.
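To illustrate, here is a simplified sketch of the ordering in
sched_cpu_deactivate() (paraphrased for illustration, not the exact
upstream code):

int sched_cpu_deactivate(unsigned int cpu)
{
        int ret;
        ...
#ifdef CONFIG_SCHED_SMT
        /*
         * The key is decremented here, before cpuset_cpu_inactive()
         * below has had a chance to fail.
         */
        if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
                static_branch_dec_cpuslocked(&sched_smt_present);
#endif
        ...
        ret = cpuset_cpu_inactive(cpu);
        if (ret) {
                /*
                 * Offline is aborted here: without the fix, the
                 * decrement above is never paired with an increment.
                 */
                balance_push_set(cpu, false);
                set_cpu_active(cpu, true);
                return ret;
        }
        ...
}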
Fixes: c5511d03ec09 ("sched/smt: Make sched_smt_present track topology")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
kernel/sched/core.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2299a5cfbfb9..b7ef2df36b75 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9745,6 +9745,10 @@ int sched_cpu_deactivate(unsigned int cpu)
sched_update_numa(cpu, false);
ret = cpuset_cpu_inactive(cpu);
if (ret) {
+#ifdef CONFIG_SCHED_SMT
+ if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
+ static_branch_inc_cpuslocked(&sched_smt_present);
+#endif
balance_push_set(cpu, false);
set_cpu_active(cpu, true);
sched_update_numa(cpu, true);
--
2.25.1
Hi Yingliang,
On 2023-09-02 at 15:46:09 +0800, Yang Yingliang wrote:
> I got the following warning while running a stress test:
>
May I know if the test runs many deadline tasks while offlining CPUs,
so as to trigger the failing case where removing one CPU gets us below
the total allocated bandwidth?
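For context, the failure I have in mind comes from the deadline
bandwidth check inside cpuset_cpu_inactive(). A simplified paraphrase
(based on kernels of this era; not guaranteed to match the exact
upstream code) looks roughly like:

static int cpuset_cpu_inactive(unsigned int cpu)
{
        if (!cpuhp_tasks_frozen) {
                /*
                 * Refuse the offline if the remaining CPUs cannot
                 * hold the bandwidth already reserved by deadline
                 * tasks; this is what makes sched_cpu_deactivate()
                 * bail out on its error path.
                 */
                int ret = dl_bw_check_overflow(cpu);

                if (ret)
                        return ret;
                cpuset_update_active_cpus();
        } else {
                num_cpus_frozen++;
                partition_sched_domains(1, NULL, NULL);
        }
        return 0;
}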
> jump label: negative count!
> WARNING: CPU: 3 PID: 38 at kernel/jump_label.c:263 static_key_slow_try_dec+0x9d/0xb0
> Call Trace:
> <TASK>
> __static_key_slow_dec_cpuslocked+0x16/0x70
> sched_cpu_deactivate+0x26e/0x2a0
> cpuhp_invoke_callback+0x3ad/0x10d0
> cpuhp_thread_fun+0x3f5/0x680
> smpboot_thread_fn+0x56d/0x8d0
> kthread+0x309/0x400
> ret_from_fork+0x41/0x70
> ret_from_fork_asm+0x1b/0x30
> </TASK>
>
> Becaus when cpuset_cpu_inactive() fails in sched_cpu_deactivate(),
s/Becaus/Because/
> the cpu offline failed, but sched_smt_present is decreased before
> calling sched_cpu_deactivate, it leads unbalance dec/inc, so fix
s/calling sched_cpu_deactivate/calling cpuset_cpu_inactive() ?
> it by increasing sched_smt_present in the error path.
>
> Fixes: c5511d03ec09 ("sched/smt: Make sched_smt_present track topology")
> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
> ---
> kernel/sched/core.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 2299a5cfbfb9..b7ef2df36b75 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -9745,6 +9745,10 @@ int sched_cpu_deactivate(unsigned int cpu)
> sched_update_numa(cpu, false);
> ret = cpuset_cpu_inactive(cpu);
> if (ret) {
> +#ifdef CONFIG_SCHED_SMT
> + if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
> + static_branch_inc_cpuslocked(&sched_smt_present);
> +#endif
While checking the code, it seems that the core scheduling also missed
the error path; maybe we should also invoke sched_core_cpu_starting() to
restore the context. I'll take a look.
Other than the above typo, it looks good to me,
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
thanks,
Chenyu
On Mon, 2023-09-04 at 10:26 +0800, Chen Yu wrote:
> Hi Yingliang,
>
> On 2023-09-02 at 15:46:09 +0800, Yang Yingliang wrote:
> > I got the following warning while running a stress test:
> >
>
> May I know if the test runs many deadline tasks while offlining CPUs,
> so as to trigger the failing case where removing one CPU gets us below
> the total allocated bandwidth?
>
> > jump label: negative count!
> > WARNING: CPU: 3 PID: 38 at kernel/jump_label.c:263 static_key_slow_try_dec+0x9d/0xb0
> > Call Trace:
> > <TASK>
> > __static_key_slow_dec_cpuslocked+0x16/0x70
> > sched_cpu_deactivate+0x26e/0x2a0
> > cpuhp_invoke_callback+0x3ad/0x10d0
> > cpuhp_thread_fun+0x3f5/0x680
> > smpboot_thread_fn+0x56d/0x8d0
> > kthread+0x309/0x400
> > ret_from_fork+0x41/0x70
> > ret_from_fork_asm+0x1b/0x30
> > </TASK>
> >
> > Becaus when cpuset_cpu_inactive() fails in sched_cpu_deactivate(),
>
> s/Becaus/Because/
>
> > the cpu offline failed, but sched_smt_present is decreased before
s/decreased/decremented
to be precise
> > calling sched_cpu_deactivate, it leads unbalance dec/inc, so fix
s|leads unbalance dec/inc|leads to unbalanced dec/inc|
>
> s/calling sched_cpu_deactivate/calling cpuset_cpu_inactive() ?
> > it by increasing sched_smt_present in the error path.
s/increasing/incrementing
> >
> > Fixes: c5511d03ec09 ("sched/smt: Make sched_smt_present track topology")
> > Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
> > ---
> > kernel/sched/core.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 2299a5cfbfb9..b7ef2df36b75 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -9745,6 +9745,10 @@ int sched_cpu_deactivate(unsigned int cpu)
> > sched_update_numa(cpu, false);
> > ret = cpuset_cpu_inactive(cpu);
> > if (ret) {
> > +#ifdef CONFIG_SCHED_SMT
> > + if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
> > + static_branch_inc_cpuslocked(&sched_smt_present);
> > +#endif
>
> While checking the code, it seems that the core scheduling also missed
> the error path; maybe we should also invoke sched_core_cpu_starting() to
> restore the context. I'll take a look.
>
> Other than the above typo, it looks good to me,
>
> Reviewed-by: Chen Yu <yu.c.chen@intel.com>
>
Other than some minor nits on the commit log wording,
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
On 2023-09-04 at 10:26:20 +0800, Chen Yu wrote:
> Hi Yingliang,
>
> On 2023-09-02 at 15:46:09 +0800, Yang Yingliang wrote:
> >
> While checking the code, it seems that the core scheduling also missed
> the error path; maybe we should also invoke sched_core_cpu_starting() to
> restore the context. I'll take a look.

After checking the core scheduling code, there is no need to restore the
context 'broken' by sched_core_cpu_deactivate(). It just tries to find a
new leader if the offline CPU is the old leader, and switching to a new
leader should not have any impact even if the CPU hotplug offline fails.

thanks,
Chenyu
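For reference, a heavily simplified sketch of that leader handoff
(paraphrased; the real sched_core_cpu_deactivate() copies more state
than shown here):

static void sched_core_cpu_deactivate(unsigned int cpu)
{
        const struct cpumask *smt_mask = cpu_smt_mask(cpu);
        struct rq *rq = cpu_rq(cpu), *core_rq = NULL;
        int t;

        /* The outgoing CPU was not the leader: nothing to hand over. */
        if (rq->core != rq)
                return;

        /* Pick any other SMT sibling of this core as the new leader. */
        for_each_cpu(t, smt_mask) {
                if (t != cpu) {
                        core_rq = cpu_rq(t);
                        break;
                }
        }
        if (!core_rq)
                return;

        /* Move the shared core-scheduling state over to the new leader. */
        core_rq->core_task_seq = rq->core_task_seq;
        core_rq->core_pick_seq = rq->core_pick_seq;
        core_rq->core_cookie   = rq->core_cookie;

        for_each_cpu(t, smt_mask)
                cpu_rq(t)->core = core_rq;
}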