[PATCH] posix-timers: Fix a race with __exit_signal()

Ilya Leoshkevich posted 1 patch 1 year, 2 months ago
kernel/signal.c | 27 ++++++++++++++++++++++++---
1 file changed, 24 insertions(+), 3 deletions(-)
[PATCH] posix-timers: Fix a race with __exit_signal()
Posted by Ilya Leoshkevich 1 year, 2 months ago
If a thread exit happens simultaneously with a POSIX timer signal, this
POSIX timer may end up being erroneously stopped. This problem may be
reproduced with a small C program that creates a POSIX timer and
constantly respawns threads, e.g. [1].

When posixtimer_send_sigqueue() races against __exit_signal(), the
latter may clear the selected task's sighand, causing
lock_task_sighand() to fail. This is possible because __exit_signal()
clears the task's sighand under the sighand lock, but
lock_task_sighand() needs to first load it without taking this lock.

The it_sigqueue_seq update needs to happen under the sighand lock;
failure to take this lock means that it is not possible to make that
update. And if it_sigqueue_seq is not updated, then
__posixtimer_deliver_signal() does not rearm the timer, effectively
stopping it.

Fix by choosing another thread if the one chosen by
posixtimer_get_target() is exiting. This requires taking tasklist_lock,
which is ordered before the sighand lock, as can be seen in, e.g.,
release_task(). tasklist_lock may be released immediately after the
sighand lock is successfully taken, which will look nicer, but will
create uncertainty w.r.t. restoring the irq flags, so release it
at the end of posixtimer_send_sigqueue().

There is another user of posixtimer_get_target(), which may potentially
be affected as well: posixtimer_sig_unignore(). But it is called with
the sighand lock taken, so the race with __exit_signal() is fortunately
not possible there.

[1] https://gitlab.com/qemu-project/qemu/-/blob/v9.1.1/tests/tcg/multiarch/signals.c?ref_type=tags

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---
 kernel/signal.c | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 989b1cc9116a..ff1608997301 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1974,10 +1974,11 @@ static inline struct task_struct *posixtimer_get_target(struct k_itimer *tmr)
 
 void posixtimer_send_sigqueue(struct k_itimer *tmr)
 {
+	unsigned long flags, tasklist_flags;
 	struct sigqueue *q = &tmr->sigq;
+	bool tasklist_locked = false;
 	int sig = q->info.si_signo;
 	struct task_struct *t;
-	unsigned long flags;
 	int result;
 
 	guard(rcu)();
@@ -1986,8 +1987,25 @@ void posixtimer_send_sigqueue(struct k_itimer *tmr)
 	if (!t)
 		return;
 
-	if (!likely(lock_task_sighand(t, &flags)))
-		return;
+	if (!likely(lock_task_sighand(t, &flags))) {
+		/*
+		 * The target is exiting, pick another one. This ensures that
+		 * it_sigqueue_seq is updated, otherwise
+		 * posixtimer_deliver_signal() will not rearm the timer.
+		 */
+		bool found = false;
+
+		read_lock_irqsave(&tasklist_lock, tasklist_flags);
+		tasklist_locked = true;
+		do_each_pid_task(tmr->it_pid, tmr->it_pid_type, t) {
+			if (likely(lock_task_sighand(t, &flags))) {
+				found = true;
+				break;
+			}
+		} while_each_pid_task(tmr->it_pid, tmr->it_pid_type, t);
+		if (!likely(found))
+			goto out_tasklist;
+	}
 
 	/*
 	 * Update @tmr::sigqueue_seq for posix timer signals with sighand
@@ -2062,6 +2080,9 @@ void posixtimer_send_sigqueue(struct k_itimer *tmr)
 out:
 	trace_signal_generate(sig, &q->info, t, tmr->it_pid_type != PIDTYPE_PID, result);
 	unlock_task_sighand(t, &flags);
+out_tasklist:
+	if (tasklist_locked)
+		read_unlock_irqrestore(&tasklist_lock, tasklist_flags);
 }
 
 static inline void posixtimer_sig_ignore(struct task_struct *tsk, struct sigqueue *q)
-- 
2.47.0
Re: [PATCH] posix-timers: Fix a race with __exit_signal()
Posted by Thomas Gleixner 1 year, 2 months ago
On Mon, Dec 02 2024 at 16:54, Ilya Leoshkevich wrote:
> If a thread exit happens simultaneously with a POSIX timer signal, this
> POSIX timer may end up being erroneously stopped. This problem may be
> reproduced with a small C program that creates a POSIX timer and
> constantly respawns threads, e.g. [1].

AFAICT this has been fixed with commit:

  63dffecfba3e ("posix-timers: Target group sigqueue to current task only if not exiting")

Are you sure, that you have this commit in your testing? It's in rc1.

Thanks,

        tglx
Re: [PATCH] posix-timers: Fix a race with __exit_signal()
Posted by Oleg Nesterov 1 year, 2 months ago
Ilya,

I hope this was already fixed by Frederic, please see
https://lore.kernel.org/all/20241122234811.60455-1-frederic@kernel.org/

Oleg.

On 12/02, Ilya Leoshkevich wrote:
>
> If a thread exit happens simultaneously with a POSIX timer signal, this
> POSIX timer may end up being erroneously stopped. This problem may be
> reproduced with a small C program that creates a POSIX timer and
> constantly respawns threads, e.g. [1].
>
> When posixtimer_send_sigqueue() races against __exit_signal(), the
> latter may clear the selected task's sighand, causing
> lock_task_sighand() to fail. This is possible because __exit_signal()
> clears the task's sighand under the sighand lock, but
> lock_task_sighand() needs to first load it without taking this lock.
>
> The it_sigqueue_seq update needs to happen under the sighand lock;
> failure to take this lock means that it is not possible to make that
> update. And if it_sigqueue_seq is not updated, then
> __posixtimer_deliver_signal() does not rearm the timer, effectively
> stopping it.
>
> Fix by choosing another thread if the one chosen by
> posixtimer_get_target() is exiting. This requires taking tasklist_lock,
> which is ordered before the sighand lock, as can be seen in, e.g.,
> release_task(). tasklist_lock may be released immediately after the
> sighand lock is successfully taken, which will look nicer, but will
> create uncertainty w.r.t. restoring the irq flags, so release it
> at the end of posixtimer_send_sigqueue().
>
> There is another user of posixtimer_get_target(), which may potentially
> be affected as well: posixtimer_sig_unignore(). But it is called with
> the sighand lock taken, so the race with __exit_signal() is fortunately
> not possible there.
>
> [1] https://gitlab.com/qemu-project/qemu/-/blob/v9.1.1/tests/tcg/multiarch/signals.c?ref_type=tags
>
> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> ---
>  kernel/signal.c | 27 ++++++++++++++++++++++++---
>  1 file changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 989b1cc9116a..ff1608997301 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1974,10 +1974,11 @@ static inline struct task_struct *posixtimer_get_target(struct k_itimer *tmr)
>
>  void posixtimer_send_sigqueue(struct k_itimer *tmr)
>  {
> +	unsigned long flags, tasklist_flags;
>  	struct sigqueue *q = &tmr->sigq;
> +	bool tasklist_locked = false;
>  	int sig = q->info.si_signo;
>  	struct task_struct *t;
> -	unsigned long flags;
>  	int result;
>
>  	guard(rcu)();
> @@ -1986,8 +1987,25 @@ void posixtimer_send_sigqueue(struct k_itimer *tmr)
>  	if (!t)
>  		return;
>
> -	if (!likely(lock_task_sighand(t, &flags)))
> -		return;
> +	if (!likely(lock_task_sighand(t, &flags))) {
> +		/*
> +		 * The target is exiting, pick another one. This ensures that
> +		 * it_sigqueue_seq is updated, otherwise
> +		 * posixtimer_deliver_signal() will not rearm the timer.
> +		 */
> +		bool found = false;
> +
> +		read_lock_irqsave(&tasklist_lock, tasklist_flags);
> +		tasklist_locked = true;
> +		do_each_pid_task(tmr->it_pid, tmr->it_pid_type, t) {
> +			if (likely(lock_task_sighand(t, &flags))) {
> +				found = true;
> +				break;
> +			}
> +		} while_each_pid_task(tmr->it_pid, tmr->it_pid_type, t);
> +		if (!likely(found))
> +			goto out_tasklist;
> +	}
>
>  	/*
>  	 * Update @tmr::sigqueue_seq for posix timer signals with sighand
> @@ -2062,6 +2080,9 @@ void posixtimer_send_sigqueue(struct k_itimer *tmr)
>  out:
>  	trace_signal_generate(sig, &q->info, t, tmr->it_pid_type != PIDTYPE_PID, result);
>  	unlock_task_sighand(t, &flags);
> +out_tasklist:
> +	if (tasklist_locked)
> +		read_unlock_irqrestore(&tasklist_lock, tasklist_flags);
>  }
>
>  static inline void posixtimer_sig_ignore(struct task_struct *tsk, struct sigqueue *q)
> --
> 2.47.0
>
Re: [PATCH] posix-timers: Fix a race with __exit_signal()
Posted by Ilya Leoshkevich 1 year, 2 months ago
On Mon, 2024-12-02 at 18:28 +0100, Oleg Nesterov wrote:
> Ilya,
> 
> I hope this was already fixed by Frederic, please see
> https://lore.kernel.org/all/20241122234811.60455-1-frederic@kernel.org/
> 
> Oleg.

I tested the Frederic's fix and it resolved the issue for me.

Thanks!

[...]