kernel/fork.c | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-)
Under PREEMPT_RT, __put_task_struct() indirectly acquires sleeping
locks. Therefore, it can't be called from an non-preemptible context.
One practical example is splat inside inactive_task_timer(), which is
called in a interrupt context:
CPU: 1 PID: 2848 Comm: life Kdump: loaded Tainted: G W ---------
Hardware name: HP ProLiant DL388p Gen8, BIOS P70 07/15/2012
Call Trace:
dump_stack_lvl+0x57/0x7d
mark_lock_irq.cold+0x33/0xba
? stack_trace_save+0x4b/0x70
? save_trace+0x55/0x150
mark_lock+0x1e7/0x400
mark_usage+0x11d/0x140
__lock_acquire+0x30d/0x930
lock_acquire.part.0+0x9c/0x210
? refill_obj_stock+0x3d/0x3a0
? rcu_read_lock_sched_held+0x3f/0x70
? trace_lock_acquire+0x38/0x140
? lock_acquire+0x30/0x80
? refill_obj_stock+0x3d/0x3a0
rt_spin_lock+0x27/0xe0
? refill_obj_stock+0x3d/0x3a0
refill_obj_stock+0x3d/0x3a0
? inactive_task_timer+0x1ad/0x340
kmem_cache_free+0x357/0x560
inactive_task_timer+0x1ad/0x340
? switched_from_dl+0x2d0/0x2d0
__run_hrtimer+0x8a/0x1a0
__hrtimer_run_queues+0x91/0x130
hrtimer_interrupt+0x10f/0x220
__sysvec_apic_timer_interrupt+0x7b/0xd0
sysvec_apic_timer_interrupt+0x4f/0xd0
? asm_sysvec_apic_timer_interrupt+0xa/0x20
asm_sysvec_apic_timer_interrupt+0x12/0x20
RIP: 0033:0x7fff196bf6f5
Instead of calling __put_task_struct() directly, we defer it using
call_rcu(). A more natural approach would use a workqueue, but since
in PREEMPT_RT, we can't allocate dynamic memory from atomic context,
the code would become more complex because we would need to put the
work_struct instance in the task_struct and initialize it when we
allocate a new task_struct.
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Reported-by: Hu Chunyu <chuhu@redhat.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Valentin Schneider <vschneid@redhat.com>
Cc: Paul McKenney <paulmck@kernel.org>
---
kernel/fork.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/kernel/fork.c b/kernel/fork.c
index 9f7fe3541897..b2d0d62c9b9d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -840,7 +840,7 @@ static inline void put_signal_struct(struct signal_struct *sig)
free_signal_struct(sig);
}
-void __put_task_struct(struct task_struct *tsk)
+static void ___put_task_struct(struct task_struct *tsk)
{
WARN_ON(!tsk->exit_state);
WARN_ON(refcount_read(&tsk->usage));
@@ -857,6 +857,26 @@ void __put_task_struct(struct task_struct *tsk)
sched_core_free(tsk);
free_task(tsk);
}
+
+static void __delayed_put_task_struct(struct rcu_head *rhp)
+{
+ struct task_struct *task = container_of(rhp, struct task_struct, rcu);
+
+ ___put_task_struct(task);
+}
+
+void __put_task_struct(struct task_struct *tsk)
+{
+ if (IS_ENABLED(CONFIG_PREEMPT_RT) && (!preemptible() || !in_task()))
+ /*
+ * under PREEMPT_RT, we can't call put_task_struct
+ * in atomic context because it will indirectly
+ * acquire sleeping locks.
+ */
+ call_rcu(&tsk->rcu, __delayed_put_task_struct);
+ else
+ ___put_task_struct(tsk);
+}
EXPORT_SYMBOL_GPL(__put_task_struct);
void __init __weak arch_task_cache_init(void) { }
--
2.39.1
On 02/01, Wander Lairson Costa wrote: > > Instead of calling __put_task_struct() directly, we defer it using > call_rcu(). A more natural approach would use a workqueue, but since > in PREEMPT_RT, we can't allocate dynamic memory from atomic context, > the code would become more complex because we would need to put the > work_struct instance in the task_struct and initialize it when we > allocate a new task_struct. I don't think I can ack the changes in PREEMPT_RT but this version LGTM. just a couple of purely cosmetic nits, feel free to ignore... > +static void __delayed_put_task_struct(struct rcu_head *rhp) > +{ > + struct task_struct *task = container_of(rhp, struct task_struct, rcu); > + > + ___put_task_struct(task); > +} We already have delayed_put_task_struct() which differs very much. Perhaps something like ___put_task_struct() will look less confusing. > +void __put_task_struct(struct task_struct *tsk) > +{ > + if (IS_ENABLED(CONFIG_PREEMPT_RT) && (!preemptible() || !in_task())) > + /* > + * under PREEMPT_RT, we can't call put_task_struct > + * in atomic context because it will indirectly > + * acquire sleeping locks. > + */ > + call_rcu(&tsk->rcu, __delayed_put_task_struct); Perhaps this deserves additional note to explain why is it safe to use tsk->rcu union. May be this is obvious, but I was confused when I looked at the previous version ;) but again, feel free to ignore. Oleg.
On Thu, Feb 2, 2023 at 3:37 PM, Oleg Nesterov <oleg@redhat.com> wrote: > > On 02/01, Wander Lairson Costa wrote: > > > > Instead of calling __put_task_struct() directly, we defer it using > > call_rcu(). A more natural approach would use a workqueue, but since > > in PREEMPT_RT, we can't allocate dynamic memory from atomic context, > > the code would become more complex because we would need to put the > > work_struct instance in the task_struct and initialize it when we > > allocate a new task_struct. > > I don't think I can ack the changes in PREEMPT_RT but this version LGTM. > > > > > just a couple of purely cosmetic nits, feel free to ignore... > > > +static void __delayed_put_task_struct(struct rcu_head *rhp) > > +{ > > + struct task_struct *task = container_of(rhp, struct task_struct, rcu); > > + > > + ___put_task_struct(task); > > +} > > We already have delayed_put_task_struct() which differs very much. > Perhaps something like ___put_task_struct() will look less confusing. > ___put_task_struct()? I already added a function with this name below. > > +void __put_task_struct(struct task_struct *tsk) > > +{ > > + if (IS_ENABLED(CONFIG_PREEMPT_RT) && (!preemptible() || !in_task())) > > + /* > > + * under PREEMPT_RT, we can't call put_task_struct > > + * in atomic context because it will indirectly > > + * acquire sleeping locks. > > + */ > > + call_rcu(&tsk->rcu, __delayed_put_task_struct); > > Perhaps this deserves additional note to explain why is it safe to use tsk->rcu > union. May be this is obvious, but I was confused when I looked at the previous > version ;) > Makes sense, I will add it in the next version. > but again, feel free to ignore. > > Oleg. >
On 02/02, Wander Lairson Costa wrote: > > > We already have delayed_put_task_struct() which differs very much. > > Perhaps something like ___put_task_struct() will look less confusing. > > > > ___put_task_struct()? I already added a function with this name below. Ah, I meant ___put_task_struct_rcu() or something like this. Bug again this is just cosmetic nit, please ignore > > > +void __put_task_struct(struct task_struct *tsk) > > > +{ > > > + if (IS_ENABLED(CONFIG_PREEMPT_RT) && (!preemptible() || !in_task())) > > > + /* > > > + * under PREEMPT_RT, we can't call put_task_struct > > > + * in atomic context because it will indirectly > > > + * acquire sleeping locks. > > > + */ > > > + call_rcu(&tsk->rcu, __delayed_put_task_struct); > > > > Perhaps this deserves additional note to explain why is it safe to use tsk->rcu > > union. May be this is obvious, but I was confused when I looked at the previous > > version ;) > > > > Makes sense, I will add it in the next version. Thanks ;) Oleg.
© 2016 - 2025 Red Hat, Inc.