[PATCH v2] sched/deadline: Fix race in push_dl_task

Harshit Agarwal posted 1 patch 9 months ago
There is a newer version of this series
kernel/sched/deadline.c | 66 ++++++++++++++++++++++++++---------------
1 file changed, 42 insertions(+), 24 deletions(-)
[PATCH v2] sched/deadline: Fix race in push_dl_task
Posted by Harshit Agarwal 9 months ago
This fix is the deadline version of the change made to the rt scheduler
titled: "sched/rt: Fix race in push_rt_task".
Here is the summary of the issue:
When a CPU chooses to call push_dl_task and picks a task to push to
another CPU's runqueue then it will call find_lock_later_rq method
which would take a double lock on both CPUs' runqueues. If one of the
locks aren't readily available, it may lead to dropping the current
runqueue lock and reacquiring both the locks at once. During this window
it is possible that the task is already migrated and is running on some
other CPU. These cases are already handled. However, if the task is
migrated and has already been executed and another CPU is now trying to
wake it up (ttwu) such that it is queued again on the runqeue
(on_rq is 1) and also if the task was run by the same CPU, then the
current checks will pass even though the task was migrated out and is no
longer in the pushable tasks list.
Please go through the original change for more details on the issue.

In this fix, after the lock is obtained inside the find_lock_later_rq we
ensure that the task is still at the head of pushable tasks list. Also
removed some checks that are no longer needed with the addition this new
check.
However, the check of pushable tasks list only applies when
find_lock_later_rq is called by push_dl_task. For the other caller i.e.
dl_task_offline_migration, we use the existing checks.

Signed-off-by: Harshit Agarwal <harshit@nutanix.com>
Cc: stable@vger.kernel.org
---
Changes in v2:
- As per Juri's suggestion, moved the check inside find_lock_later_rq
  similar to rt change. Here we distinguish among the push_dl_task
  caller vs dl_task_offline_migration by checking if the task is
  throttled or not.
- Fixed the commit message to refer to the rt change by title.
- Link to v1:
  https://lore.kernel.org/lkml/20250307204255.60640-1-harshit@nutanix.com/
---
 kernel/sched/deadline.c | 66 ++++++++++++++++++++++++++---------------
 1 file changed, 42 insertions(+), 24 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 38e4537790af..2366801b4557 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2621,6 +2621,25 @@ static int find_later_rq(struct task_struct *task)
 	return -1;
 }
 
+static struct task_struct *pick_next_pushable_dl_task(struct rq *rq)
+{
+	struct task_struct *p;
+
+	if (!has_pushable_dl_tasks(rq))
+		return NULL;
+
+	p = __node_2_pdl(rb_first_cached(&rq->dl.pushable_dl_tasks_root));
+
+	WARN_ON_ONCE(rq->cpu != task_cpu(p));
+	WARN_ON_ONCE(task_current(rq, p));
+	WARN_ON_ONCE(p->nr_cpus_allowed <= 1);
+
+	WARN_ON_ONCE(!task_on_rq_queued(p));
+	WARN_ON_ONCE(!dl_task(p));
+
+	return p;
+}
+
 /* Locks the rq it finds */
 static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq)
 {
@@ -2648,12 +2667,30 @@ static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq)
 
 		/* Retry if something changed. */
 		if (double_lock_balance(rq, later_rq)) {
-			if (unlikely(task_rq(task) != rq ||
+			/*
+			 * We had to unlock the run queue. In the meantime,
+			 * task could have migrated already or had its affinity
+			 * changed.
+			 * It is possible the task was scheduled, set
+			 * "migrate_disabled" and then got preempted, so we must
+			 * check the task migration disable flag here too.
+			 * For throttled task (dl_task_offline_migration), we
+			 * check if the task is migrated to a different rq or
+			 * is not a dl task anymore.
+			 * For the non-throttled task (push_dl_task), the check
+			 * to ensure that this task is still at the head of the
+			 * pushable tasks list is enough.
+			 */
+			if (unlikely(is_migration_disabled(task) ||
 				     !cpumask_test_cpu(later_rq->cpu, &task->cpus_mask) ||
-				     task_on_cpu(rq, task) ||
-				     !dl_task(task) ||
-				     is_migration_disabled(task) ||
-				     !task_on_rq_queued(task))) {
+				     (task->dl.dl_throttled &&
+				      (task_rq(task) != rq ||
+				       task_on_cpu(rq, task) ||
+				       !dl_task(task) ||
+				       !task_on_rq_queued(task))) ||
+				     (!task->dl.dl_throttled &&
+				      task != pick_next_pushable_dl_task(rq)))) {
+
 				double_unlock_balance(rq, later_rq);
 				later_rq = NULL;
 				break;
@@ -2676,25 +2713,6 @@ static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq)
 	return later_rq;
 }
 
-static struct task_struct *pick_next_pushable_dl_task(struct rq *rq)
-{
-	struct task_struct *p;
-
-	if (!has_pushable_dl_tasks(rq))
-		return NULL;
-
-	p = __node_2_pdl(rb_first_cached(&rq->dl.pushable_dl_tasks_root));
-
-	WARN_ON_ONCE(rq->cpu != task_cpu(p));
-	WARN_ON_ONCE(task_current(rq, p));
-	WARN_ON_ONCE(p->nr_cpus_allowed <= 1);
-
-	WARN_ON_ONCE(!task_on_rq_queued(p));
-	WARN_ON_ONCE(!dl_task(p));
-
-	return p;
-}
-
 /*
  * See if the non running -deadline tasks on this rq
  * can be sent to some other CPU where they can preempt
-- 
2.39.3
Re: [PATCH v2] sched/deadline: Fix race in push_dl_task
Posted by Juri Lelli 8 months, 2 weeks ago
Hi,

I think I like this version better, but others feel free to disagree. :)

A few comments inline below.

On 17/03/25 02:23, Harshit Agarwal wrote:
> This fix is the deadline version of the change made to the rt scheduler
> titled: "sched/rt: Fix race in push_rt_task".
> Here is the summary of the issue:

I would probably remove the paragraph above and simply describe what the
issues is like you do below.

> When a CPU chooses to call push_dl_task and picks a task to push to
> another CPU's runqueue then it will call find_lock_later_rq method
> which would take a double lock on both CPUs' runqueues. If one of the
> locks aren't readily available, it may lead to dropping the current
> runqueue lock and reacquiring both the locks at once. During this window
> it is possible that the task is already migrated and is running on some
> other CPU. These cases are already handled. However, if the task is
> migrated and has already been executed and another CPU is now trying to
> wake it up (ttwu) such that it is queued again on the runqeue
> (on_rq is 1) and also if the task was run by the same CPU, then the
> current checks will pass even though the task was migrated out and is no
> longer in the pushable tasks list.
> Please go through the original change for more details on the issue.
> 
> In this fix, after the lock is obtained inside the find_lock_later_rq we

Please use imperative mode

https://elixir.bootlin.com/linux/v6.13.7/source/Documentation/process/submitting-patches.rst#L94

> ensure that the task is still at the head of pushable tasks list. Also
> removed some checks that are no longer needed with the addition this new
> check.
> However, the check of pushable tasks list only applies when
> find_lock_later_rq is called by push_dl_task. For the other caller i.e.
> dl_task_offline_migration, we use the existing checks.
> 
> Signed-off-by: Harshit Agarwal <harshit@nutanix.com>
> Cc: stable@vger.kernel.org
> ---
> Changes in v2:
> - As per Juri's suggestion, moved the check inside find_lock_later_rq
>   similar to rt change. Here we distinguish among the push_dl_task
>   caller vs dl_task_offline_migration by checking if the task is
>   throttled or not.
> - Fixed the commit message to refer to the rt change by title.
> - Link to v1:
>   https://lore.kernel.org/lkml/20250307204255.60640-1-harshit@nutanix.com/
> ---
>  kernel/sched/deadline.c | 66 ++++++++++++++++++++++++++---------------
>  1 file changed, 42 insertions(+), 24 deletions(-)
> 
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 38e4537790af..2366801b4557 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -2621,6 +2621,25 @@ static int find_later_rq(struct task_struct *task)
>  	return -1;
>  }
>  
> +static struct task_struct *pick_next_pushable_dl_task(struct rq *rq)
> +{
> +	struct task_struct *p;
> +
> +	if (!has_pushable_dl_tasks(rq))
> +		return NULL;
> +
> +	p = __node_2_pdl(rb_first_cached(&rq->dl.pushable_dl_tasks_root));
> +
> +	WARN_ON_ONCE(rq->cpu != task_cpu(p));
> +	WARN_ON_ONCE(task_current(rq, p));
> +	WARN_ON_ONCE(p->nr_cpus_allowed <= 1);
> +
> +	WARN_ON_ONCE(!task_on_rq_queued(p));
> +	WARN_ON_ONCE(!dl_task(p));
> +
> +	return p;
> +}
> +
>  /* Locks the rq it finds */
>  static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq)
>  {
> @@ -2648,12 +2667,30 @@ static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq)
>  
>  		/* Retry if something changed. */
>  		if (double_lock_balance(rq, later_rq)) {
> -			if (unlikely(task_rq(task) != rq ||
> +			/*
> +			 * We had to unlock the run queue. In the meantime,

Maybe rephrase as "doble_lock_balance might have released rq->lock ...".

> +			 * task could have migrated already or had its affinity
> +			 * changed.
> +			 * It is possible the task was scheduled, set
> +			 * "migrate_disabled" and then got preempted, so we must
> +			 * check the task migration disable flag here too.
> +			 * For throttled task (dl_task_offline_migration), we
> +			 * check if the task is migrated to a different rq or
> +			 * is not a dl task anymore.
> +			 * For the non-throttled task (push_dl_task), the check
> +			 * to ensure that this task is still at the head of the
> +			 * pushable tasks list is enough.

Maybe you can make a bullet point list out of this comment so that it's
even easier to associate it to the conditions below?

> +			 */
> +			if (unlikely(is_migration_disabled(task) ||
>  				     !cpumask_test_cpu(later_rq->cpu, &task->cpus_mask) ||
> -				     task_on_cpu(rq, task) ||
> -				     !dl_task(task) ||
> -				     is_migration_disabled(task) ||
> -				     !task_on_rq_queued(task))) {
> +				     (task->dl.dl_throttled &&
> +				      (task_rq(task) != rq ||
> +				       task_on_cpu(rq, task) ||
> +				       !dl_task(task) ||
> +				       !task_on_rq_queued(task))) ||
> +				     (!task->dl.dl_throttled &&
> +				      task != pick_next_pushable_dl_task(rq)))) {
> +

Thanks!
Juri
Re: [PATCH v2] sched/deadline: Fix race in push_dl_task
Posted by Harshit Agarwal 8 months, 1 week ago
Thanks Juri. Addressed the comments in the v3 here: 
https://lore.kernel.org/lkml/20250408045021.3283624-1-harshit@nutanix.com/T/#u

Please take a look.

Regards,
Harshit