sched: Recheck the rt task's on rq state after double_lock_balance()

[PATCH] sched: Recheck the rt task's on rq state after double_lock_balance()

Posted by Tengfei Fan 4 months ago

Recheck whether next_task is still in the runqueue of this_rq after
locking this_rq and lowest_rq via double_lock_balance() in
push_rt_task(). This is necessary because double_lock_balance() may
release this_rq->lock before acquiring both this_rq->lock and
lowest_rq->lock; during that window, next_task may already have been
removed from this_rq's runqueue, leading to a double dequeue.
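To make the window concrete, here is a single-threaded C model (not kernel code) of the lock dance, loosely following the non-PREEMPTION _double_lock_balance() in kernel/sched/sched.h as we understand it; under CONFIG_PREEMPTION the lock is always dropped. struct model_rq_lock, its "busy" flag (standing in for another CPU owning the lock) and all model_* helpers are hypothetical. A return value of 1 means this_rq's lock was released and retaken, so anything it protected has to be rechecked.

```c
#include <stdbool.h>

/* Hypothetical model of a runqueue lock (not a kernel type). */
struct model_rq_lock {
	int cpu;     /* lock order: lower CPU id is taken first */
	bool held;   /* owned by the CPU doing the push */
	bool busy;   /* simulates another CPU owning the lock right now */
};

static bool model_trylock(struct model_rq_lock *l)
{
	if (l->held || l->busy)
		return false;
	l->held = true;
	return true;
}

/* Blocking acquire: in this model the other owner simply lets go. */
static void model_lock(struct model_rq_lock *l)
{
	l->busy = false;
	l->held = true;
}

static void model_unlock(struct model_rq_lock *l)
{
	l->held = false;
}

/*
 * Caller holds this_rq's lock. Returns 1 if that lock was dropped and
 * retaken, i.e. if state it protects (such as whether next_task is still
 * queued) may have changed and must be rechecked; returns 0 otherwise.
 */
static int model_double_lock_balance(struct model_rq_lock *this_rq,
				     struct model_rq_lock *busiest)
{
	if (model_trylock(busiest))
		return 0;             /* uncontended: no window */

	if (this_rq->cpu < busiest->cpu) {
		model_lock(busiest);  /* already in lock order: nest it */
		return 0;
	}

	model_unlock(this_rq);        /* wrong order: must let go first */
	/* Window: other CPUs may lock this_rq and dequeue tasks here. */
	model_lock(busiest);          /* lower CPU id first... */
	model_lock(this_rq);          /* ...then the higher one */
	return 1;
}
```

With busiest contended and this_rq on the higher CPU id, the call returns 1: exactly the case in which push_rt_task() can no longer trust what it learned about next_task before the call.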

The double dequeue issue can occur in the following scenario:
1. Core0 call stack:
        autoremove_wake_function
        default_wake_function
        try_to_wake_up
        ttwu_do_activate
        task_woken_rt
        push_rt_task
        move_queued_task_locked
        dequeue_task
        __wake_up

2. Execution flow on Core0, Core1 and Core2 (Core0, Core1 and Core2 are
   contending for Core1's rq->lock):
   - Core1: enqueue next_task on Core1
   - Core0: lock Core1's rq->lock
            next_task = pick_next_pushable_task()
            unlock Core1's rq->lock via double_lock_balance()
   - Core1: lock Core1's rq->lock
            next_task = pick_next_task()
            unlock Core1's rq->lock
   - Core2: lock Core1's rq->lock in migration thread
   - Core1: running next_task
   - Core2: unlock Core1's rq->lock
   - Core1: lock Core1's rq->lock
            switches out and dequeues next_task
            unlock Core1's rq->lock
   - Core0: relock Core1's rq->lock in double_lock_balance(), but
            next_task has already been dequeued from Core1, so
            move_queued_task_locked() dequeues it a second time
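The interleaving above can be replayed deterministically with a toy model. This sketch deliberately ignores the checks inside find_lock_lowest_rq() and only shows why touching next_task after the lock was dropped is unsafe without some recheck. struct model_rq, struct model_task and the model_* helpers are hypothetical stand-ins for struct rq, task_rq(), task_on_rq_queued() and dequeue_task().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins; not the kernel's types. */
struct model_rq {
	int cpu;
	int nr_queued;
};

struct model_task {
	struct model_rq *rq;   /* like task_rq() */
	bool queued;           /* like task_on_rq_queued() */
};

static void model_enqueue(struct model_rq *rq, struct model_task *p)
{
	p->rq = rq;
	p->queued = true;
	rq->nr_queued++;
}

/* Returns false instead of corrupting state when @p is not queued on @rq. */
static bool model_dequeue(struct model_rq *rq, struct model_task *p)
{
	if (!p->queued || p->rq != rq)
		return false;
	p->queued = false;
	rq->nr_queued--;
	return true;
}

/*
 * Replays the Core0/Core1 interleaving. @recheck enables the guard this
 * patch proposes. Returns true if Core0's push ends in a second dequeue.
 */
static bool replay_push_race(bool recheck)
{
	struct model_rq core0 = { 0, 0 }, core1 = { 1, 0 };
	struct model_task next_task = { NULL, false };

	model_enqueue(&core1, &next_task);      /* Core1: enqueue next_task */
	struct model_task *picked = &next_task; /* Core0: pick_next_pushable_task() */

	/* Core0 has dropped Core1's lock in double_lock_balance(); Core1 runs
	 * next_task, then switches out and dequeues it. */
	model_dequeue(&core1, &next_task);

	/* Core0 holds both locks again and resumes push_rt_task(). */
	if (recheck && (picked->rq != &core1 || !picked->queued))
		return false;                   /* the patch's "goto out" */

	/* move_queued_task_locked(): dequeue from Core1, enqueue on Core0. */
	if (!model_dequeue(&core1, picked))
		return true;                    /* double dequeue */
	model_enqueue(&core0, picked);
	return false;
}
```

replay_push_race(false) reports the double dequeue, while replay_push_race(true), with the guard applied, bails out instead.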

Signed-off-by: Tengfei Fan <tengfei.fan@oss.qualcomm.com>
---
 kernel/sched/rt.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 7936d4333731..b4e44317a5de 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2037,6 +2037,14 @@ static int push_rt_task(struct rq *rq, bool pull)
 		goto retry;
 	}
 
+	/* Within find_lock_lowest_rq(), it's possible to first unlock the
+	 * rq->lock of the runqueue containing next_task, and then re-lock
+	 * it. During this window, the state of next_task might have changed.
+	 */
+	if (unlikely(rq != task_rq(next_task) ||
+		     !task_on_rq_queued(next_task)))
+		goto out;
+
 	move_queued_task_locked(rq, lowest_rq, next_task);
 	resched_curr(lowest_rq);
 	ret = 1;

---
base-commit: 7c3ba4249a3604477ea9c077e10089ba7ddcaa03
change-id: 20251008-recheck_rt_task_enqueue_state-e159aa6a2749

Best regards,
-- 
Tengfei Fan <tengfei.fan@oss.qualcomm.com>

Re: [PATCH] sched: Recheck the rt task's on rq state after double_lock_balance()

Posted by Valentin Schneider 3 months, 3 weeks ago

On 09/10/25 00:23, Tengfei Fan wrote:
> Recheck whether next_task is still in the runqueue of this_rq after
> locking this_rq and lowest_rq via double_lock_balance() in
> push_rt_task(). This is necessary because double_lock_balance() may
> release this_rq->lock before acquiring both this_rq->lock and
> lowest_rq->lock; during that window, next_task may already have been
> removed from this_rq's runqueue, leading to a double dequeue.
>
> The double dequeue issue can occur in the following scenario:
> 1. Core0 call stack:
>         autoremove_wake_function
>         default_wake_function
>         try_to_wake_up
>         ttwu_do_activate
>         task_woken_rt
>         push_rt_task
>         move_queued_task_locked
>         dequeue_task
>         __wake_up
>
> 2. Execution flow on Core0, Core1 and Core2 (Core0, Core1 and Core2 are
>    contending for Core1's rq->lock):
>    - Core1: enqueue next_task on Core1
>    - Core0: lock Core1's rq->lock
>             next_task = pick_next_pushable_task()
>             unlock Core1's rq->lock via double_lock_balance()
>    - Core1: lock Core1's rq->lock
>             next_task = pick_next_task()
>             unlock Core1's rq->lock
>    - Core2: lock Core1's rq->lock in migration thread
>    - Core1: running next_task
>    - Core2: unlock Core1's rq->lock
>    - Core1: lock Core1's rq->lock
>             switches out and dequeues next_task
>             unlock Core1's rq->lock
>    - Core0: relock Core1's rq->lock in double_lock_balance(), but
>             next_task has already been dequeued from Core1, so
>             move_queued_task_locked() dequeues it a second time
>
> Signed-off-by: Tengfei Fan <tengfei.fan@oss.qualcomm.com>
> ---
>  kernel/sched/rt.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 7936d4333731..b4e44317a5de 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -2037,6 +2037,14 @@ static int push_rt_task(struct rq *rq, bool pull)
>  		goto retry;
>  	}
>  
> +	/* Within find_lock_lowest_rq(), it's possible to first unlock the
> +	 * rq->lock of the runqueue containing next_task, and then re-lock
> +	 * it. During this window, the state of next_task might have changed.
> +	 */
> +	if (unlikely(rq != task_rq(next_task) ||
> +		     !task_on_rq_queued(next_task)))
> +		goto out;
> +

Isn't this already covered by find_lock_lowest_rq()?

If @next_task migrates during the double_lock_balance(), we'll see that
it's no longer the next highest priority pushable task of its original rq
(it won't be in that pushable list at all actually):

  static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
  {
          [...]
          if (double_lock_balance(rq, lowest_rq)) {
                  if (unlikely(is_migration_disabled(task) ||
                               !cpumask_test_cpu(lowest_rq->cpu, &task->cpus_mask) ||
                               task != pick_next_pushable_task(rq))) {

                          double_unlock_balance(rq, lowest_rq);
                          lowest_rq = NULL;
                          break;
                  }
          }
  }

Plus:

  static int push_rt_task(struct rq *rq, bool pull)
  {
          [...]
          if (!lowest_rq) {
                  struct task_struct *task;
                  task = pick_next_pushable_task(rq);
                  [...]
                  put_task_struct(next_task);
                  next_task = task;
                  goto retry;
          }
  }

AFAICT in the scenario you described, we'd just retry with another next
pushable task.
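A toy version of the path described above, assuming the pushable list can be modeled as an array of task ids ordered by priority (struct model_rq, NO_TASK and the model_* helpers are hypothetical, not kernel API). It mirrors only the `task != pick_next_pushable_task(rq)` part of the recheck: once next_task has been taken off the list during the lock drop, the recheck fails and the caller moves on to the new head.

```c
#include <stdbool.h>

#define NO_TASK (-1)
#define MAX_PUSHABLE 8

/* Hypothetical model: pushable task ids, highest priority first. */
struct model_rq {
	int pushable[MAX_PUSHABLE];
	int nr;
};

static int model_pick_next_pushable(const struct model_rq *rq)
{
	return rq->nr > 0 ? rq->pushable[0] : NO_TASK;
}

/* Another CPU runs or migrates the head while our lock is dropped. */
static void model_lose_head(struct model_rq *rq)
{
	if (rq->nr == 0)
		return;
	for (int i = 1; i < rq->nr; i++)
		rq->pushable[i - 1] = rq->pushable[i];
	rq->nr--;
}

/*
 * Recheck once both locks are held again. Returns true if @task may still
 * be pushed; otherwise stores the task to retry with in *retry_with
 * (NO_TASK if there is nothing left to push).
 */
static bool model_recheck(const struct model_rq *rq, int task,
			  int *retry_with)
{
	int head = model_pick_next_pushable(rq);

	if (task == head)
		return true;
	*retry_with = head;
	return false;
}
```

In this model a task that vanished from the list can never pass the recheck, so it is never dequeued a second time; the worst case is a retry with whatever is now at the head, or giving up when the list is empty.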

>  	move_queued_task_locked(rq, lowest_rq, next_task);
>  	resched_curr(lowest_rq);
>  	ret = 1;
>
> ---
> base-commit: 7c3ba4249a3604477ea9c077e10089ba7ddcaa03
> change-id: 20251008-recheck_rt_task_enqueue_state-e159aa6a2749
>
> Best regards,
> -- 
> Tengfei Fan <tengfei.fan@oss.qualcomm.com>

Re: [PATCH] sched: Recheck the rt task's on rq state after double_lock_balance()

Posted by Tengfei Fan 3 months, 2 weeks ago

On 10/20/2025 8:55 PM, 'Valentin Schneider' via kernel wrote:
> On 09/10/25 00:23, Tengfei Fan wrote:
>> Recheck whether next_task is still in the runqueue of this_rq after
>> locking this_rq and lowest_rq via double_lock_balance() in
>> push_rt_task(). This is necessary because double_lock_balance() may
>> release this_rq->lock before acquiring both this_rq->lock and
>> lowest_rq->lock; during that window, next_task may already have been
>> removed from this_rq's runqueue, leading to a double dequeue.
>>
>> The double dequeue issue can occur in the following scenario:
>> 1. Core0 call stack:
>>          autoremove_wake_function
>>          default_wake_function
>>          try_to_wake_up
>>          ttwu_do_activate
>>          task_woken_rt
>>          push_rt_task
>>          move_queued_task_locked
>>          dequeue_task
>>          __wake_up
>>
>> 2. Execution flow on Core0, Core1 and Core2 (Core0, Core1 and Core2 are
>>     contending for Core1's rq->lock):
>>     - Core1: enqueue next_task on Core1
>>     - Core0: lock Core1's rq->lock
>>              next_task = pick_next_pushable_task()
>>              unlock Core1's rq->lock via double_lock_balance()
>>     - Core1: lock Core1's rq->lock
>>              next_task = pick_next_task()
>>              unlock Core1's rq->lock
>>     - Core2: lock Core1's rq->lock in migration thread
>>     - Core1: running next_task
>>     - Core2: unlock Core1's rq->lock
>>     - Core1: lock Core1's rq->lock
>>              switches out and dequeues next_task
>>              unlock Core1's rq->lock
>>     - Core0: relock Core1's rq->lock in double_lock_balance(), but
>>              next_task has already been dequeued from Core1, so
>>              move_queued_task_locked() dequeues it a second time
>>
>> Signed-off-by: Tengfei Fan <tengfei.fan@oss.qualcomm.com>
>> ---
>>   kernel/sched/rt.c | 8 ++++++++
>>   1 file changed, 8 insertions(+)
>>
>> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
>> index 7936d4333731..b4e44317a5de 100644
>> --- a/kernel/sched/rt.c
>> +++ b/kernel/sched/rt.c
>> @@ -2037,6 +2037,14 @@ static int push_rt_task(struct rq *rq, bool pull)
>>   		goto retry;
>>   	}
>>   
>> +	/* Within find_lock_lowest_rq(), it's possible to first unlock the
>> +	 * rq->lock of the runqueue containing next_task, and then re-lock
>> +	 * it. During this window, the state of next_task might have changed.
>> +	 */
>> +	if (unlikely(rq != task_rq(next_task) ||
>> +		     !task_on_rq_queued(next_task)))
>> +		goto out;
>> +
> Isn't this already covered by find_lock_lowest_rq()?

Yes, this logic is already included in find_lock_lowest_rq().
Previously, our tree did not include the following patch:

https://lore.kernel.org/r/20250225180553.167995-1-harshit@nutanix.com

We will check whether this patch already resolves our case.


>
> If @next_task migrates during the double_lock_balance(), we'll see that
> it's no longer the next highest priority pushable task of its original rq
> (it won't be in that pushable list at all actually):
>
>    static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
>    {
>            [...]
>            if (double_lock_balance(rq, lowest_rq)) {
>                    if (unlikely(is_migration_disabled(task) ||
>                                 !cpumask_test_cpu(lowest_rq->cpu, &task->cpus_mask) ||
>                                 task != pick_next_pushable_task(rq))) {
>
>                            double_unlock_balance(rq, lowest_rq);
>                            lowest_rq = NULL;
>                            break;
>                    }
>            }
>    }
>
> Plus:
>
>    static int push_rt_task(struct rq *rq, bool pull)
>    {
>            [...]
>            if (!lowest_rq) {
>                    struct task_struct *task;
>                    task = pick_next_pushable_task(rq);
>                    [...]
>                    put_task_struct(next_task);
>                    next_task = task;
>                    goto retry;
>            }
>    }
>
> AFAICT in the scenario you described, we'd just retry with another next
> pushable task.
I think this is just a different way of handling the same case. At the
time, our concern was that retrying might lead to an infinite loop.
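On the infinite-loop concern: in mainline, the `if (!lowest_rq)` branch of push_rt_task() gives up when pick_next_pushable_task() returns NULL or returns next_task itself, and only retries when the head has changed. A sketch of that exit logic, driven by a hypothetical scripted sequence of heads observed after each failed attempt (count_retries() and NO_TASK are illustrative, not kernel code):

```c
#define NO_TASK (-1)

/*
 * heads[i] is what pick_next_pushable_task() returns after the i-th time
 * find_lock_lowest_rq() came back NULL. Returns the number of retries
 * taken before push_rt_task() would "goto out" (or the script runs out).
 */
static int count_retries(int next_task, const int *heads, int n)
{
	int retries = 0;

	for (int i = 0; i < n; i++) {
		int task = heads[i];

		if (task == next_task || task == NO_TASK)
			break;        /* unchanged head or empty list: goto out */
		next_task = task;     /* "Something has shifted, try again" */
		retries++;
	}
	return retries;
}
```

Each retry therefore needs the pushable list to have changed under us again. This shows what the loop waits on; it is not a proof that the loop cannot keep spinning under constant churn from other CPUs.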
>
>>   	move_queued_task_locked(rq, lowest_rq, next_task);
>>   	resched_curr(lowest_rq);
>>   	ret = 1;
>>
>> ---
>> base-commit: 7c3ba4249a3604477ea9c077e10089ba7ddcaa03
>> change-id: 20251008-recheck_rt_task_enqueue_state-e159aa6a2749
>>
>> Best regards,
>> -- 
>> Tengfei Fan <tengfei.fan@oss.qualcomm.com>

-- 
Thx and BRs,
Tengfei Fan