Recheck whether next_task is still in the runqueue of this_rq after
locking this_rq and lowest_rq via double_lock_balance() in
push_rt_task(). This is necessary because double_lock_balance() first
releases this_rq->lock and then attempts to acquire both this_rq->lock
and lowest_rq->lock, during which next_task may have already been
removed from this_rq's runqueue, leading to a double dequeue issue.
The double dequeue issue can occur in the following scenario:
1. Core0 call stack:
autoremove_wake_function
default_wake_function
try_to_wake_up
ttwu_do_activate
task_woken_rt
push_rt_task
move_queued_task_locked
dequeue_task
__wake_up
2. Execution flow on Core0, Core1 and Core2 (all three are contending
   for Core1's rq->lock):
   - Core1: enqueue next_task on Core1
   - Core0: lock Core1's rq->lock
            next_task = pick_next_pushable_task()
            unlock Core1's rq->lock via double_lock_balance()
   - Core1: lock Core1's rq->lock
            next_task = pick_next_task()
            unlock Core1's rq->lock
   - Core2: lock Core1's rq->lock in migration thread
   - Core1: run next_task
   - Core2: unlock Core1's rq->lock
   - Core1: lock Core1's rq->lock
            switch out and dequeue next_task
            unlock Core1's rq->lock
   - Core0: try to relock Core1's rq->lock from double_lock_balance(),
            but next_task has already been dequeued from Core1, causing
            the issue
Signed-off-by: Tengfei Fan <tengfei.fan@oss.qualcomm.com>
---
kernel/sched/rt.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 7936d4333731..b4e44317a5de 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2037,6 +2037,14 @@ static int push_rt_task(struct rq *rq, bool pull)
 		goto retry;
 	}

+	/* Within find_lock_lowest_rq(), it's possible to first unlock the
+	 * rq->lock of the runqueue containing next_task, and then re-lock
+	 * it. During this window, the state of next_task might have changed.
+	 */
+	if (unlikely(rq != task_rq(next_task) ||
+		     !task_on_rq_queued(next_task)))
+		goto out;
+
 	move_queued_task_locked(rq, lowest_rq, next_task);
 	resched_curr(lowest_rq);
 	ret = 1;
---
base-commit: 7c3ba4249a3604477ea9c077e10089ba7ddcaa03
change-id: 20251008-recheck_rt_task_enqueue_state-e159aa6a2749
Best regards,
--
Tengfei Fan <tengfei.fan@oss.qualcomm.com>
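
The window described in the commit message can be illustrated in user
space. The following is a minimal sketch, assuming only the behaviour
stated above (the held lock is released, then both locks are taken).
toy_lock, toy_lock_acquire(), toy_lock_release() and toy_double_lock()
are hypothetical names, not kernel APIs, and the lowest-address-first
ordering is an assumption made for the illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a runqueue lock; not the kernel's lock type. */
struct toy_lock {
	atomic_flag held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	/* Spin until the previous holder clears the flag. */
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		;
}

static void toy_lock_release(struct toy_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * Take 'other' while already holding 'mine'. 'mine' is dropped first and
 * both locks are then taken in a fixed order (assumed here: lowest
 * address first). Returns 1 to tell the caller that 'mine' was released,
 * so anything read under 'mine' before the call may now be stale.
 */
static int toy_double_lock(struct toy_lock *mine, struct toy_lock *other)
{
	struct toy_lock *first = mine, *second = other;

	assert(mine != other);

	if ((uintptr_t)other < (uintptr_t)mine) {
		first = other;
		second = mine;
	}

	toy_lock_release(mine);		/* race window opens here */
	toy_lock_acquire(first);
	toy_lock_acquire(second);	/* race window closes here */
	return 1;
}
```

A caller that sees a non-zero return value has to re-validate whatever
it looked up before the call, which is the point under discussion for
next_task.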
On 09/10/25 00:23, Tengfei Fan wrote:
> Recheck whether next_task is still in the runqueue of this_rq after
> locking this_rq and lowest_rq via double_lock_balance() in
> push_rt_task(). This is necessary because double_lock_balance() first
> releases this_rq->lock and then attempts to acquire both this_rq->lock
> and lowest_rq->lock, during which next_task may have already been
> removed from this_rq's runqueue, leading to a double dequeue issue.
>
> The double dequeue issue can occur in the following scenario:
> 1. Core0 call stack:
> autoremove_wake_function
> default_wake_function
> try_to_wake_up
> ttwu_do_activate
> task_woken_rt
> push_rt_task
> move_queued_task_locked
> dequeue_task
> __wake_up
>
> 2. Execution flow on Core0, Core1 and Core2 (all three are contending
> for Core1's rq->lock):
> - Core1: enqueue next_task on Core1
> - Core0: lock Core1's rq->lock
> next_task = pick_next_pushable_task()
> unlock Core1's rq->lock via double_lock_balance()
> - Core1: lock Core1's rq->lock
> next_task = pick_next_task()
> unlock Core1's rq->lock
> - Core2: lock Core1's rq->lock in migration thread
> - Core1: running next_task
> - Core2: unlock Core1's rq->lock
> - Core1: lock Core1's rq->lock
> switches out and dequeue next_task
> unlock Core1's rq->lock
> - Core0: try to relock Core1's rq->lock from double_lock_balance(),
> but next_task has already been dequeued from Core1, causing the issue
>
> Signed-off-by: Tengfei Fan <tengfei.fan@oss.qualcomm.com>
> ---
> kernel/sched/rt.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 7936d4333731..b4e44317a5de 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -2037,6 +2037,14 @@ static int push_rt_task(struct rq *rq, bool pull)
> goto retry;
> }
>
> + /* Within find_lock_lowest_rq(), it's possible to first unlock the
> + * rq->lock of the runqueue containing next_task, and then re-lock
> + * it. During this window, the state of next_task might have changed.
> + */
> + if (unlikely(rq != task_rq(next_task) ||
> + !task_on_rq_queued(next_task)))
> + goto out;
> +
Isn't this already covered by find_lock_lowest_rq()?
If @next_task migrates during the double_lock_balance(), we'll see that
it's no longer the next highest priority pushable task of its original rq
(it won't be in that pushable list at all, actually):
static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
{
	[...]
	if (double_lock_balance(rq, lowest_rq)) {
		if (unlikely(is_migration_disabled(task) ||
			     !cpumask_test_cpu(lowest_rq->cpu, &task->cpus_mask) ||
			     task != pick_next_pushable_task(rq))) {
			double_unlock_balance(rq, lowest_rq);
			lowest_rq = NULL;
			break;
		}
	}
}
Plus:
static int push_rt_task(struct rq *rq, bool pull)
{
	[...]
	if (!lowest_rq) {
		struct task_struct *task;
		task = pick_next_pushable_task(rq);
		[...]
		put_task_struct(next_task);
		next_task = task;
		goto retry;
	}
}
AFAICT in the scenario you described, we'd just retry with another next
pushable task.
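
The re-check-and-retry flow above can be modelled in user space. This is
only a sketch under stated assumptions: every toy_* name is hypothetical,
the "lock drop" is simulated by a callback rather than real locking, and
the elided parts of the kernel functions are not reproduced.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_TASKS 4

/* Hypothetical, heavily simplified stand-ins for scheduler structures. */
struct toy_task {
	int prio;	/* lower value means higher priority */
	bool pushable;	/* still on this runqueue's pushable list? */
};

struct toy_rq {
	struct toy_task *tasks[TOY_MAX_TASKS];
	size_t nr;
};

/* Highest-priority task that is still pushable, or NULL if none. */
static struct toy_task *toy_pick_next_pushable(struct toy_rq *rq)
{
	struct toy_task *best = NULL;

	for (size_t i = 0; i < rq->nr; i++) {
		struct toy_task *t = rq->tasks[i];

		if (t->pushable && (!best || t->prio < best->prio))
			best = t;
	}
	return best;
}

/* Stands in for what another CPU may do while the lock is dropped. */
typedef void (*toy_interfere_fn)(struct toy_rq *rq);

/* Example interference: another CPU runs and dequeues the top task. */
static void toy_dequeue_highest(struct toy_rq *rq)
{
	struct toy_task *t = toy_pick_next_pushable(rq);

	if (t)
		t->pushable = false;
}

/*
 * Simulate the lock-drop window, then re-check that 'task' is still the
 * next pushable task. Returns false if the earlier pick went stale.
 */
static bool toy_lock_and_revalidate(struct toy_task *task, struct toy_rq *rq,
				    toy_interfere_fn interfere)
{
	if (interfere)
		interfere(rq);	/* lock released: state may change */
	/* lock re-taken: re-check before trusting 'task' */
	return task == toy_pick_next_pushable(rq);
}

/* Returns the task that was "pushed", or NULL if nothing could be pushed. */
static struct toy_task *toy_push(struct toy_rq *rq, toy_interfere_fn interfere)
{
	struct toy_task *next = toy_pick_next_pushable(rq);

	while (next) {
		if (toy_lock_and_revalidate(next, rq, interfere)) {
			next->pushable = false;	/* treated as migrated */
			return next;
		}
		interfere = NULL;		/* model a single interference */
		next = toy_pick_next_pushable(rq);	/* retry with a fresh pick */
	}
	return NULL;
}
```

With two pushable tasks and toy_dequeue_highest as the interference, the
first pick goes stale, the re-check rejects it, and the retry pushes the
second task instead of touching the one that was already dequeued.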
> move_queued_task_locked(rq, lowest_rq, next_task);
> resched_curr(lowest_rq);
> ret = 1;
>
> ---
> base-commit: 7c3ba4249a3604477ea9c077e10089ba7ddcaa03
> change-id: 20251008-recheck_rt_task_enqueue_state-e159aa6a2749
>
> Best regards,
> --
> Tengfei Fan <tengfei.fan@oss.qualcomm.com>
On 10/20/2025 8:55 PM, 'Valentin Schneider' via kernel wrote:
> On 09/10/25 00:23, Tengfei Fan wrote:
>> Recheck whether next_task is still in the runqueue of this_rq after
>> locking this_rq and lowest_rq via double_lock_balance() in
>> push_rt_task(). This is necessary because double_lock_balance() first
>> releases this_rq->lock and then attempts to acquire both this_rq->lock
>> and lowest_rq->lock, during which next_task may have already been
>> removed from this_rq's runqueue, leading to a double dequeue issue.
>>
>> The double dequeue issue can occur in the following scenario:
>> 1. Core0 call stack:
>> autoremove_wake_function
>> default_wake_function
>> try_to_wake_up
>> ttwu_do_activate
>> task_woken_rt
>> push_rt_task
>> move_queued_task_locked
>> dequeue_task
>> __wake_up
>>
>> 2. Execution flow on Core0, Core1 and Core2 (all three are contending
>> for Core1's rq->lock):
>> - Core1: enqueue next_task on Core1
>> - Core0: lock Core1's rq->lock
>> next_task = pick_next_pushable_task()
>> unlock Core1's rq->lock via double_lock_balance()
>> - Core1: lock Core1's rq->lock
>> next_task = pick_next_task()
>> unlock Core1's rq->lock
>> - Core2: lock Core1's rq->lock in migration thread
>> - Core1: running next_task
>> - Core2: unlock Core1's rq->lock
>> - Core1: lock Core1's rq->lock
>> switches out and dequeue next_task
>> unlock Core1's rq->lock
>> - Core0: try to relock Core1's rq->lock from double_lock_balance(),
>> but next_task has already been dequeued from Core1, causing the issue
>>
>> Signed-off-by: Tengfei Fan <tengfei.fan@oss.qualcomm.com>
>> ---
>> kernel/sched/rt.c | 8 ++++++++
>> 1 file changed, 8 insertions(+)
>>
>> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
>> index 7936d4333731..b4e44317a5de 100644
>> --- a/kernel/sched/rt.c
>> +++ b/kernel/sched/rt.c
>> @@ -2037,6 +2037,14 @@ static int push_rt_task(struct rq *rq, bool pull)
>> goto retry;
>> }
>>
>> + /* Within find_lock_lowest_rq(), it's possible to first unlock the
>> + * rq->lock of the runqueue containing next_task, and then re-lock
>> + * it. During this window, the state of next_task might have changed.
>> + */
>> + if (unlikely(rq != task_rq(next_task) ||
>> + !task_on_rq_queued(next_task)))
>> + goto out;
>> +
> Isn't this already covered by find_lock_lowest_rq()?
Yes, this check is already present in find_lock_lowest_rq().
We were previously missing the following patch:
https://lore.kernel.org/r/20250225180553.167995-1-harshit@nutanix.com
We will recheck whether that patch already resolves our case.
>
> If @next_task migrates during the double_lock_balance(), we'll see that
> it's no longer the next highest priority pushable task of its original rq
> (it won't be in that pushable list at all, actually):
>
> static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
> {
> 	[...]
> 	if (double_lock_balance(rq, lowest_rq)) {
> 		if (unlikely(is_migration_disabled(task) ||
> 			     !cpumask_test_cpu(lowest_rq->cpu, &task->cpus_mask) ||
> 			     task != pick_next_pushable_task(rq))) {
>
> 			double_unlock_balance(rq, lowest_rq);
> 			lowest_rq = NULL;
> 			break;
> 		}
> 	}
> }
>
> Plus:
>
> static int push_rt_task(struct rq *rq, bool pull)
> {
> 	[...]
> 	if (!lowest_rq) {
> 		struct task_struct *task;
> 		task = pick_next_pushable_task(rq);
> 		[...]
> 		put_task_struct(next_task);
> 		next_task = task;
> 		goto retry;
> 	}
> }
>
> AFAICT in the scenario you described, we'd just retry with another next
> pushable task.
I think this is just a different handling approach. At the time, our
concern was that retrying might introduce an infinite loop.
>
>> move_queued_task_locked(rq, lowest_rq, next_task);
>> resched_curr(lowest_rq);
>> ret = 1;
>>
>> ---
>> base-commit: 7c3ba4249a3604477ea9c077e10089ba7ddcaa03
>> change-id: 20251008-recheck_rt_task_enqueue_state-e159aa6a2749
>>
>> Best regards,
>> --
>> Tengfei Fan <tengfei.fan@oss.qualcomm.com>
--
Thx and BRs,
Tengfei Fan