sched_ext: Fix rq lock state in hotplug ops

[PATCH] sched_ext: Fix rq lock state in hotplug ops

Posted by Andrea Righi 9 months, 2 weeks ago

The ops.cpu_online() and ops.cpu_offline() callbacks incorrectly assume
that the rq involved in the operation is locked, which is not the case
during hotplug, triggering the following warning:

  WARNING: CPU: 1 PID: 20 at kernel/sched/sched.h:1504 handle_hotplug+0x280/0x340

Fix by not tracking the target rq as locked in the context of
ops.cpu_online() and ops.cpu_offline().

Fixes: 18853ba782bef ("sched_ext: Track currently locked rq")
Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
 kernel/sched/ext.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index ac79067dc87e6..d0f40fd258752 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -3477,9 +3477,9 @@ static void handle_hotplug(struct rq *rq, bool online)
 		scx_idle_update_selcpu_topology(&scx_ops);
 
 	if (online && SCX_HAS_OP(cpu_online))
-		SCX_CALL_OP(SCX_KF_UNLOCKED, cpu_online, rq, cpu);
+		SCX_CALL_OP(SCX_KF_UNLOCKED, cpu_online, NULL, cpu);
 	else if (!online && SCX_HAS_OP(cpu_offline))
-		SCX_CALL_OP(SCX_KF_UNLOCKED, cpu_offline, rq, cpu);
+		SCX_CALL_OP(SCX_KF_UNLOCKED, cpu_offline, NULL, cpu);
 	else
 		scx_ops_exit(SCX_ECODE_ACT_RESTART | SCX_ECODE_RSN_HOTPLUG,
 			     "cpu %d going %s, exiting scheduler", cpu,
-- 
2.49.0
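
For readers following the thread, the effect of the change can be
illustrated with a small user-space model. This is only a sketch, not
kernel code: struct toy_rq, toy_locked_rq, toy_call_op() and
toy_handle_hotplug() are hypothetical names, and the check inside
toy_call_op() is an assumption about how tracking an rq as locked can
trip a lock-held assertion when the lock is not actually held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the kernel's struct rq; only the lock state matters. */
struct toy_rq {
	bool lock_held;
};

/* Hypothetical model of the "currently locked rq" tracking. */
static struct toy_rq *toy_locked_rq;

/* Records the last CPU the toy callback was invoked for. */
static int toy_last_cpu = -1;

/* Toy stand-in for a scheduler's ops.cpu_online() callback. */
static void toy_cpu_online(int cpu)
{
	toy_last_cpu = cpu;
}

/*
 * Toy model of calling an op while tracking a locked rq (assumption,
 * not the real macro). Returns false where the real kernel would be
 * expected to emit a warning.
 */
static bool toy_call_op(struct toy_rq *rq, void (*op)(int cpu), int cpu)
{
	bool ok = true;

	if (rq) {
		/* The caller claims rq is locked; check that claim. */
		if (!rq->lock_held)
			ok = false;
		toy_locked_rq = rq;
	}

	op(cpu);

	/* Stop tracking once the callback has returned. */
	if (rq)
		toy_locked_rq = NULL;

	return ok;
}

/*
 * Toy model of the hotplug path, where the rq lock is not held.
 * pass_rq == true mimics the old call, pass_rq == false the patched
 * call that passes NULL.
 */
static bool toy_handle_hotplug(struct toy_rq *rq, int cpu, bool pass_rq)
{
	return toy_call_op(pass_rq ? rq : NULL, toy_cpu_online, cpu);
}
```

In this model, toy_handle_hotplug(&rq, 1, true) with rq.lock_held set
to false returns false, standing in for the warning. With pass_rq set
to false (the NULL case from the patch) it returns true and the
callback still runs.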

Re: [PATCH] sched_ext: Fix rq lock state in hotplug ops

Posted by Tejun Heo 9 months, 1 week ago

On Mon, Apr 28, 2025 at 11:43:20PM +0200, Andrea Righi wrote:
> The ops.cpu_online() and ops.cpu_offline() callbacks incorrectly assume
> that the rq involved in the operation is locked, which is not the case
> during hotplug, triggering the following warning:
> 
>   WARNING: CPU: 1 PID: 20 at kernel/sched/sched.h:1504 handle_hotplug+0x280/0x340
> 
> Fix by not tracking the target rq as locked in the context of
> ops.cpu_online() and ops.cpu_offline().
> 
> Fixes: 18853ba782bef ("sched_ext: Track currently locked rq")
> Reported-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Andrea Righi <arighi@nvidia.com>

Applied to sched_ext/for-6.15-fixes.

Thanks.

-- 
tejun

Re: [PATCH] sched_ext: Fix rq lock state in hotplug ops

Posted by Changwoo Min 9 months, 2 weeks ago

Hi Andrea,

Thanks for the fix. Now it cleanly passes the hotplug selftest.

Tested-by: Changwoo Min <changwoo@igalia.com>

On 4/29/25 06:43, Andrea Righi wrote:
> The ops.cpu_online() and ops.cpu_offline() callbacks incorrectly assume
> that the rq involved in the operation is locked, which is not the case
> during hotplug, triggering the following warning:
> 
>    WARNING: CPU: 1 PID: 20 at kernel/sched/sched.h:1504 handle_hotplug+0x280/0x340
> 
> Fix by not tracking the target rq as locked in the context of
> ops.cpu_online() and ops.cpu_offline().
> 
> Fixes: 18853ba782bef ("sched_ext: Track currently locked rq")
> Reported-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Andrea Righi <arighi@nvidia.com>