[PATCH resend] sched/smt: fix unbalanced sched_smt_present dec/inc

[PATCH resend] sched/smt: fix unbalanced sched_smt_present dec/inc
Posted by Yang Yingliang 1 year, 5 months ago
From: Yang Yingliang <yangyingliang@huawei.com>

I got the following warning while running a stress test:

jump label: negative count!
WARNING: CPU: 3 PID: 38 at kernel/jump_label.c:263 static_key_slow_try_dec+0x9d/0xb0
Call Trace:
 <TASK>
 __static_key_slow_dec_cpuslocked+0x16/0x70
 sched_cpu_deactivate+0x26e/0x2a0
 cpuhp_invoke_callback+0x3ad/0x10d0
 cpuhp_thread_fun+0x3f5/0x680
 smpboot_thread_fn+0x56d/0x8d0
 kthread+0x309/0x400
 ret_from_fork+0x41/0x70
 ret_from_fork_asm+0x1b/0x30
 </TASK>

When cpuset_cpu_inactive() fails in sched_cpu_deactivate(), the CPU
offline is aborted, but sched_smt_present has already been decremented
earlier in the function, before cpuset_cpu_inactive() is called. The
decrement is never undone, so a later deactivate attempt pushes the
count below zero. Fix this by incrementing sched_smt_present again in
the error path.

Fixes: c5511d03ec09 ("sched/smt: Make sched_smt_present track topology")
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 kernel/sched/core.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bcf2c4cc0522..5ab6717b57e0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9756,6 +9756,10 @@ int sched_cpu_deactivate(unsigned int cpu)
 	sched_update_numa(cpu, false);
 	ret = cpuset_cpu_inactive(cpu);
 	if (ret) {
+#ifdef CONFIG_SCHED_SMT
+		if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
+			static_branch_inc_cpuslocked(&sched_smt_present);
+#endif
 		balance_push_set(cpu, false);
 		set_cpu_active(cpu, true);
 		sched_update_numa(cpu, true);
-- 
2.25.1
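For context, the control flow being fixed looks roughly like this. It is
a simplified paraphrase of sched_cpu_deactivate() in kernel/sched/core.c
around that version, with intermediate steps elided, not verbatim kernel
source:

int sched_cpu_deactivate(unsigned int cpu)
{
	int ret;

	set_cpu_active(cpu, false);
	/* ... rq taken offline, RCU synchronized ... */

#ifdef CONFIG_SCHED_SMT
	/* Offlining one of two online siblings ends SMT on this core. */
	if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
		static_branch_dec_cpuslocked(&sched_smt_present);
#endif

	sched_update_numa(cpu, false);
	ret = cpuset_cpu_inactive(cpu);
	if (ret) {
		/*
		 * Offline aborted: without the patch above, the earlier
		 * decrement is never undone, and the next deactivate
		 * attempt pushes the static key below zero, triggering
		 * "jump label: negative count!".
		 */
		balance_push_set(cpu, false);
		set_cpu_active(cpu, true);
		sched_update_numa(cpu, true);
		return ret;
	}
	return 0;
}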
Re: [PATCH resend] sched/smt: fix unbalanced sched_smt_present dec/inc
Posted by Peter Zijlstra 1 year, 5 months ago
On Tue, Jul 02, 2024 at 04:11:28PM +0800, Yang Yingliang wrote:
> From: Yang Yingliang <yangyingliang@huawei.com>
> 
> I got the following warning while running a stress test:
> 
> jump label: negative count!
> WARNING: CPU: 3 PID: 38 at kernel/jump_label.c:263 static_key_slow_try_dec+0x9d/0xb0
> Call Trace:
>  <TASK>
>  __static_key_slow_dec_cpuslocked+0x16/0x70
>  sched_cpu_deactivate+0x26e/0x2a0
>  cpuhp_invoke_callback+0x3ad/0x10d0
>  cpuhp_thread_fun+0x3f5/0x680
>  smpboot_thread_fn+0x56d/0x8d0
>  kthread+0x309/0x400
>  ret_from_fork+0x41/0x70
>  ret_from_fork_asm+0x1b/0x30
>  </TASK>
> 
> When cpuset_cpu_inactive() fails in sched_cpu_deactivate(), the CPU
> offline is aborted, but sched_smt_present has already been decremented
> earlier in the function, before cpuset_cpu_inactive() is called. The
> decrement is never undone, so a later deactivate attempt pushes the
> count below zero. Fix this by incrementing sched_smt_present again in
> the error path.
> 
> Fixes: c5511d03ec09 ("sched/smt: Make sched_smt_present track topology")
> Reviewed-by: Chen Yu <yu.c.chen@intel.com>
> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
> ---
>  kernel/sched/core.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index bcf2c4cc0522..5ab6717b57e0 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -9756,6 +9756,10 @@ int sched_cpu_deactivate(unsigned int cpu)
>  	sched_update_numa(cpu, false);
>  	ret = cpuset_cpu_inactive(cpu);
>  	if (ret) {
> +#ifdef CONFIG_SCHED_SMT
> +		if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
> +			static_branch_inc_cpuslocked(&sched_smt_present);
> +#endif
>  		balance_push_set(cpu, false);
>  		set_cpu_active(cpu, true);
>  		sched_update_numa(cpu, true);

Yes, this does indeed appear needed; however, when I look at
what else goes before this failure, should we not also call
set_rq_online() and things like that?

That is, can we rework things to be less fragile by sharing code between
this error path and sched_cpu_activate()?
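One way to make the pairing harder to break, in the spirit of this
suggestion, is to factor the inc/dec into symmetric helpers that both
sched_cpu_activate() and the deactivate error path call. A sketch under
that assumption follows; the helper names are illustrative, not taken
from a merged series:

#ifdef CONFIG_SCHED_MT
#endif
#ifdef CONFIG_SCHED_SMT
static void sched_smt_present_inc(int cpu)
{
	/* The core gains a second online sibling: it becomes SMT. */
	if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
		static_branch_inc_cpuslocked(&sched_smt_present);
}

static void sched_smt_present_dec(int cpu)
{
	/* The core drops to one online sibling: it stops being SMT. */
	if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
		static_branch_dec_cpuslocked(&sched_smt_present);
}
#else
static inline void sched_smt_present_inc(int cpu) { }
static inline void sched_smt_present_dec(int cpu) { }
#endif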
Re: [PATCH resend] sched/smt: fix unbalanced sched_smt_present dec/inc
Posted by yangyingliang@huaweicloud.com 1 year, 5 months ago


> -----Original Messages-----
> From: "Peter Zijlstra" <peterz@infradead.org>
> Sent Time: 2024-07-02 16:44:18 (Tuesday)
> To: "Yang Yingliang" <yangyingliang@huaweicloud.com>
> Cc: linux-kernel@vger.kernel.org, mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, tglx@linutronix.de, yu.c.chen@intel.com, tim.c.chen@linux.intel.com, yangyingliang@huawei.com, liwei391@huawei.com
> Subject: Re: [PATCH resend] sched/smt: fix unbalanced sched_smt_present dec/inc
> 
> On Tue, Jul 02, 2024 at 04:11:28PM +0800, Yang Yingliang wrote:
> > [...]
> 
> Yes, this does indeed appear needed; however, when I look at
> what else goes before this failure, should we not also call
> set_rq_online() and things like that?
Yes, set_rq_online() is needed in the error path. I will send a new patch to add this.

> 
> That is, can we rework things to be less fragile by sharing code between
> this error path and sched_cpu_activate()?
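For reference, the promised follow-up would plausibly reshape the error
path along these lines, using the symmetric helpers sketched earlier.
This is only a sketch of one possible shape, not the posted follow-up
itself; note that set_rq_online() must be called with the rq lock held:

	ret = cpuset_cpu_inactive(cpu);
	if (ret) {
		struct rq_flags rf;
		struct rq *rq = cpu_rq(cpu);

		sched_smt_present_inc(cpu);	/* undo the earlier dec */

		/* Undo set_rq_offline(), mirroring sched_cpu_activate(). */
		rq_lock_irqsave(rq, &rf);
		if (rq->rd)
			set_rq_online(rq);
		rq_unlock_irqrestore(rq, &rf);

		balance_push_set(cpu, false);
		set_cpu_active(cpu, true);
		sched_update_numa(cpu, true);
		return ret;
	}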