A task can block a signal, accumulate up to RLIMIT_SIGPENDING sigqueues,
and exit. In this case __exit_signal()->flush_sigqueue() called with irqs
disabled can trigger a hard lockup, see
https://lore.kernel.org/all/20190322114917.GC28876@redhat.com/
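
For illustration only, the userspace side of this scenario could look roughly
like the sketch below. It is not the reproducer from the linked thread.
queue_blocked_signals() is a hypothetical helper name, and it assumes a
real-time signal such as SIGRTMIN, since only real-time signals queue more
than one pending instance.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): block `signo`, then queue
 * it to our own process up to `max` times. Because the signal is blocked,
 * every successful sigqueue() leaves one more entry pending.
 *
 * Returns the number of signals queued, or -1 if blocking failed.
 */
long queue_blocked_signals(int signo, long max)
{
	sigset_t set;
	union sigval val = { .sival_int = 0 };
	long queued = 0;

	assert(max >= 0);

	sigemptyset(&set);
	sigaddset(&set, signo);
	if (sigprocmask(SIG_BLOCK, &set, NULL) != 0)
		return -1;

	while (queued < max) {
		/* Fails with EAGAIN once RLIMIT_SIGPENDING is reached. */
		if (sigqueue(getpid(), signo, val) != 0)
			break;
		queued++;
	}

	return queued;
}
```

A caller would invoke this with a large `max` and then exit with the signals
still pending, leaving the whole queue to be flushed at exit time. Whether
that is enough to provoke the lockup depends on the RLIMIT_SIGPENDING value
and on the kernel in use.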

Fortunately, after the recent posixtimer changes sys_timer_delete() paths
no longer try to clear SIGQUEUE_PREALLOC and/or free tmr->sigq, and after
the exiting task passes __exit_signal() lock_task_sighand() can't succeed
and pid_task(tmr->it_pid) will return NULL.

This means that after __exit_signal(tsk) nobody can play with tsk->pending
or (if group_dead) with tsk->signal->shared_pending, so release_task() can
safely call flush_sigqueue() after write_unlock_irq(&tasklist_lock).

TODO:
	- we can probably shift posix_cpu_timers_exit() as well
	- do_sigaction() can hit the similar problem

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/exit.c | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)
diff --git a/kernel/exit.c b/kernel/exit.c
index 3485e5fc499e..2d7444da743d 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -200,20 +200,13 @@ static void __exit_signal(struct task_struct *tsk)
 	__unhash_process(tsk, group_dead);
 	write_sequnlock(&sig->stats_lock);
 
-	/*
-	 * Do this under ->siglock, we can race with another thread
-	 * doing sigqueue_free() if we have SIGQUEUE_PREALLOC signals.
-	 */
-	flush_sigqueue(&tsk->pending);
 	tsk->sighand = NULL;
 	spin_unlock(&sighand->siglock);
 
 	__cleanup_sighand(sighand);
 	clear_tsk_thread_flag(tsk, TIF_SIGPENDING);
-	if (group_dead) {
-		flush_sigqueue(&sig->shared_pending);
+	if (group_dead)
 		tty_kref_put(tty);
-	}
 }
 
 static void delayed_put_task_struct(struct rcu_head *rhp)
@@ -279,6 +272,16 @@ void release_task(struct task_struct *p)
 	proc_flush_pid(thread_pid);
 	put_pid(thread_pid);
 	release_thread(p);
+	/*
+	 * This task was already removed from the process/thread/pid lists
+	 * and lock_task_sighand(p) can't succeed. Nobody else can touch
+	 * ->pending or, if group dead, signal->shared_pending. We can call
+	 * flush_sigqueue() lockless.
+	 */
+	flush_sigqueue(&p->pending);
+	if (thread_group_leader(p))
+		flush_sigqueue(&p->signal->shared_pending);
+
 	put_task_struct_rcu_user(p);
 
 	p = leader;
--
2.25.1.362.g51ebf55

On Thu, Feb 06, 2025 at 04:23:14PM +0100, Oleg Nesterov wrote:
> A task can block a signal, accumulate up to RLIMIT_SIGPENDING sigqueues,
> and exit. In this case __exit_signal()->flush_sigqueue() called with irqs
> disabled can trigger a hard lockup, see
> https://lore.kernel.org/all/20190322114917.GC28876@redhat.com/
>
> Fortunately, after the recent posixtimer changes sys_timer_delete() paths
> no longer try to clear SIGQUEUE_PREALLOC and/or free tmr->sigq, and after
> the exiting task passes __exit_signal() lock_task_sighand() can't succeed
> and pid_task(tmr->it_pid) will return NULL.
>
> This means that after __exit_signal(tsk) nobody can play with tsk->pending
> or (if group_dead) with tsk->signal->shared_pending, so release_task() can
> safely call flush_sigqueue() after write_unlock_irq(&tasklist_lock).
>
> TODO:
> 	- we can probably shift posix_cpu_timers_exit() as well

Hmm, can't a timer be concurrently deleted between __exit_signal() set
tsk->sighand = NULL and release sighand lock, and the actual call to
posix_cpu_timer_exit() ? And then posix_cpu_timer_exit() calls timerqueue_del()
on a node that don't exist anymore?

That would even trigger the warning in posix_cpu_timer_del().

> 	- do_sigaction() can hit the similar problem
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

On 02/06, Frederic Weisbecker wrote:
>
> > TODO:
> > 	- we can probably shift posix_cpu_timers_exit() as well
>
> Hmm, can't a timer be concurrently deleted between __exit_signal() set
> tsk->sighand = NULL and release sighand lock, and the actual call to
> posix_cpu_timer_exit() ? And then posix_cpu_timer_exit() calls timerqueue_del()
> on a node that don't exist anymore?

Can't answer right now, I will think about it when/if I will actually try to
make this change ;) This "TODO" note just tries to explain what else we could
try to do, and "probably" means that I am not sure yet. I can remove this spam
from the changelog, but I'd prefer to keep it as a reminder, at least for
myself.

> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

Thanks Frederic!

Oleg.

On Thu, Feb 06, 2025 at 05:55:28PM +0100, Oleg Nesterov wrote:
> On 02/06, Frederic Weisbecker wrote:
> >
> > > TODO:
> > > 	- we can probably shift posix_cpu_timers_exit() as well
> >
> > Hmm, can't a timer be concurrently deleted between __exit_signal() set
> > tsk->sighand = NULL and release sighand lock, and the actual call to
> > posix_cpu_timer_exit() ? And then posix_cpu_timer_exit() calls timerqueue_del()
> > on a node that don't exist anymore?
>
> Can't answer right now, I will think about it when/if I will actually try to
> make this change ;) This "TODO" note just tries to explain what else we could
> try to do, and "probably" means that I am not sure yet. I can remove this spam
> from the changelog, but I'd prefer to keep it as a reminder, at least for
> myself.

Sure! And thanks again for the patch!

>
> > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
>
> Thanks Frederic!
>
> Oleg.
>