[PATCH 00/16] ptrace: cleanups and calling do_cldstop with only siglock

Eric W. Biederman posted 16 patches 3 years, 11 months ago
arch/alpha/kernel/asm-offsets.c |   1 -
arch/ia64/kernel/asm-offsets.c  |   1 -
arch/powerpc/xmon/xmon.c        |   2 +-
kernel/debug/kdb/kdb_main.c     |   2 +-
kernel/exit.c                   |  23 +++-
kernel/fork.c                   |  12 +-
kernel/ptrace.c                 | 132 ++++++++----------
kernel/signal.c                 | 296 ++++++++++++++++++++++++++--------------
8 files changed, 279 insertions(+), 190 deletions(-)
[PATCH 00/16] ptrace: cleanups and calling do_cldstop with only siglock
Posted by Eric W. Biederman 3 years, 11 months ago
For ptrace_stop to work on PREEMPT_RT no spinlocks can be taken once
ptrace_freeze_traced has completed successfully.  Which fundamentally
means the lock dance of dropping siglock and grabbing tasklist_lock does
not work on PREEMPT_RT.  So I have worked through what is necessary so
that tasklist_lock does not need to be grabbed in ptrace_stop after
siglock is dropped.
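
A toy userspace model of the two shapes described above, offered only as
a sketch. Nothing here is kernel code: toy_lock, toy_task and the toy_*
functions are hypothetical names, the "lock" is just a flag, and the
frozen field is a crude stand-in for "ptrace_freeze_traced has completed
successfully". The model assumes the worst case, where the freeze lands
right after siglock is dropped.

```c
#include <assert.h>

/* Toy lock: only records whether it is held. Not a real spinlock. */
struct toy_lock { int held; };

struct toy_task {
	struct toy_lock siglock;   /* stands in for sighand->siglock */
	int frozen;                /* stands in for "ptrace_freeze_traced succeeded" */
	int locks_taken_frozen;    /* lock acquisitions made while frozen */
};

/* Stands in for tasklist_lock. */
struct toy_lock toy_tasklist_lock;

void toy_acquire(struct toy_task *t, struct toy_lock *l)
{
	assert(!l->held);
	if (t->frozen)
		t->locks_taken_frozen++;   /* what the cover letter says must not happen */
	l->held = 1;
}

void toy_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/* Old shape: drop siglock, then take tasklist_lock to notify the parent. */
int toy_stop_lock_dance(struct toy_task *t)
{
	toy_acquire(t, &t->siglock);
	toy_release(&t->siglock);
	t->frozen = 1;             /* assumed worst case: frozen right here */
	toy_acquire(t, &toy_tasklist_lock);
	/* parent notification would happen here */
	toy_release(&toy_tasklist_lock);
	return t->locks_taken_frozen;
}

/* Target shape: notify while siglock is held, take nothing afterwards. */
int toy_stop_siglock_only(struct toy_task *t)
{
	toy_acquire(t, &t->siglock);
	/* parent notification would happen here, under siglock */
	toy_release(&t->siglock);
	t->frozen = 1;
	return t->locks_taken_frozen;
}
```

Calling toy_stop_lock_dance() on a zero-initialized toy_task counts one
lock acquisition after the freeze point; toy_stop_siglock_only() counts
none, which is the property the series is after.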

I have explored several alternate ways of getting there and along the
way I found a lot of small bug fixes/cleanups that don't necessarily
contribute to the final result but that are worthwhile on their own.  So
I have included those changes in this set of changes just so they don't
get lost.

In addition I had a conversation with Thomas Gleixner recently that
emphasized for me the need to reduce the hold times of tasklist_lock,
and that made me realize that in principle it is possible.
https://lkml.kernel.org/r/87mtfmhap2.fsf@email.froward.int.ebiederm.org

Which is a long way of saying that not taking tasklist_lock in
ptrace_stop is good not just for PREEMPT_RT but also for improving the
scalability of the kernel in general.

After this set of changes only cgroup_enter_frozen should remain a
stumbling block for PREEMPT_RT in the ptrace_stop path.

Eric W. Biederman (16):
      signal/alpha: Remove unused definition of TASK_REAL_PARENT
      signal/ia64: Remove unused definition of IA64_TASK_REAL_PARENT_OFFSET
      kdb: Use real_parent when displaying a list of processes
      powerpc/xmon:  Use real_parent when displaying a list of processes
      ptrace: Remove dead code from __ptrace_detach
      ptrace: Remove unnecessary locking in ptrace_(get|set)siginfo
      signal: Wake up the designated parent
      ptrace: Only populate last_siginfo from ptrace
      ptrace: In ptrace_setsiginfo deal with invalid si_signo
      ptrace: In ptrace_signal look at what the debugger did with siginfo
      ptrace: Use si_signo as the signal number to resume with
      ptrace: Stop protecting ptrace_set_signr with tasklist_lock
      ptrace: Document why ptrace_setoptions does not need a lock
      signal: Protect parent child relationships by childs siglock
      ptrace: Use siglock instead of tasklist_lock in ptrace_check_attach
      signal: Always call do_notify_parent_cldstop with siglock held

Eric
Re: [PATCH 00/16] ptrace: cleanups and calling do_cldstop with only siglock
Posted by Sebastian Andrzej Siewior 3 years, 11 months ago
On 2022-05-18 17:49:50 [-0500], Eric W. Biederman wrote:
> After this set of changes only cgroup_enter_frozen should remain a
> stumbling block for PREEMPT_RT in the ptrace_stop path.

Yes, I can confirm that. I have no systemd-less system at hand which
means I can't boot a kernel without CGROUP support. But after removing
cgroup_{enter|leave}_frozen() in ptrace_stop() I don't see the problems
I saw earlier.

Sebastian
Re: [PATCH 00/16] ptrace: cleanups and calling do_cldstop with only siglock
Posted by Sebastian Andrzej Siewior 3 years, 11 months ago
On 2022-05-18 17:49:50 [-0500], Eric W. Biederman wrote:
> 
> For ptrace_stop to work on PREEMPT_RT no spinlocks can be taken once
> ptrace_freeze_traced has completed successfully.  Which fundamentally
> means the lock dance of dropping siglock and grabbing tasklist_lock does
> not work on PREEMPT_RT.  So I have worked through what is necessary so
> that tasklist_lock does not need to be grabbed in ptrace_stop after
> siglock is dropped.
…
It took me a while to realise that this is a follow-up; I somehow assumed
that you had added a few patches on top. Might have been yesterday's
heat. b4 also refused to download this series because the v4 in this
thread looked newer… Anyway. Both series applied:

| =============================
| WARNING: suspicious RCU usage
| 5.18.0-rc7+ #16 Not tainted
| -----------------------------
| include/linux/ptrace.h:120 suspicious rcu_dereference_check() usage!
|
| other info that might help us debug this:
|
| rcu_scheduler_active = 2, debug_locks = 1
| 2 locks held by ssdd/1734:
|  #0: ffff88800eaa6918 (&sighand->siglock){....}-{2:2}, at: lock_parents_siglocks+0xf0/0x3b0
|  #1: ffff88800eaa71d8 (&sighand->siglock/2){....}-{2:2}, at: lock_parents_siglocks+0x115/0x3b0
|
| stack backtrace:
| CPU: 2 PID: 1734 Comm: ssdd Not tainted 5.18.0-rc7+ #16
| Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
| Call Trace:
|  <TASK>
|  dump_stack_lvl+0x45/0x5a
|  unlock_parents_siglocks+0xb6/0xc0
|  ptrace_stop+0xb9/0x390
|  get_signal+0x51c/0x8d0
|  arch_do_signal_or_restart+0x31/0x750
|  exit_to_user_mode_prepare+0x157/0x220
|  irqentry_exit_to_user_mode+0x5/0x50
|  asm_sysvec_apic_timer_interrupt+0x12/0x20

That is ptrace_parent() in unlock_parents_siglocks().

Sebastian
Re: [PATCH 00/16] ptrace: cleanups and calling do_cldstop with only siglock
Posted by Eric W. Biederman 3 years, 11 months ago
Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> On 2022-05-18 17:49:50 [-0500], Eric W. Biederman wrote:
>> 
>> For ptrace_stop to work on PREEMPT_RT no spinlocks can be taken once
>> ptrace_freeze_traced has completed successfully.  Which fundamentally
>> means the lock dance of dropping siglock and grabbing tasklist_lock does
>> not work on PREEMPT_RT.  So I have worked through what is necessary so
>> that tasklist_lock does not need to be grabbed in ptrace_stop after
>> siglock is dropped.
> …
> It took me a while to realise that this is a follow-up; I somehow assumed
> that you had added a few patches on top. Might have been yesterday's
> heat. b4 also refused to download this series because the v4 in this
> thread looked newer… Anyway. Both series applied:
>
> | =============================
> | WARNING: suspicious RCU usage
> | 5.18.0-rc7+ #16 Not tainted
> | -----------------------------
> | include/linux/ptrace.h:120 suspicious rcu_dereference_check() usage!
> |
> | other info that might help us debug this:
> |
> | rcu_scheduler_active = 2, debug_locks = 1
> | 2 locks held by ssdd/1734:
> |  #0: ffff88800eaa6918 (&sighand->siglock){....}-{2:2}, at: lock_parents_siglocks+0xf0/0x3b0
> |  #1: ffff88800eaa71d8 (&sighand->siglock/2){....}-{2:2}, at: lock_parents_siglocks+0x115/0x3b0
> |
> | stack backtrace:
> | CPU: 2 PID: 1734 Comm: ssdd Not tainted 5.18.0-rc7+ #16
> | Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
> | Call Trace:
> |  <TASK>
> |  dump_stack_lvl+0x45/0x5a
> |  unlock_parents_siglocks+0xb6/0xc0
> |  ptrace_stop+0xb9/0x390
> |  get_signal+0x51c/0x8d0
> |  arch_do_signal_or_restart+0x31/0x750
> |  exit_to_user_mode_prepare+0x157/0x220
> |  irqentry_exit_to_user_mode+0x5/0x50
> |  asm_sysvec_apic_timer_interrupt+0x12/0x20
>
> That is ptrace_parent() in unlock_parents_siglocks().

How odd.  I thought I had the appropriate lockdep config options enabled
in my test build to catch things like this.  I guess not.

Now I am trying to think how to tell it that holding the appropriate
siglock makes this ok.

Eric
Re: [PATCH 00/16] ptrace: cleanups and calling do_cldstop with only siglock
Posted by Peter Zijlstra 3 years, 11 months ago
On Fri, May 20, 2022 at 02:32:24PM -0500, Eric W. Biederman wrote:
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
> 
> > On 2022-05-18 17:49:50 [-0500], Eric W. Biederman wrote:
> >> 
> >> For ptrace_stop to work on PREEMPT_RT no spinlocks can be taken once
> >> ptrace_freeze_traced has completed successfully.  Which fundamentally
> >> means the lock dance of dropping siglock and grabbing tasklist_lock does
> >> not work on PREEMPT_RT.  So I have worked through what is necessary so
> >> that tasklist_lock does not need to be grabbed in ptrace_stop after
> >> siglock is dropped.
> > …
> > It took me a while to realise that this is a follow-up; I somehow assumed
> > that you had added a few patches on top. Might have been yesterday's
> > heat. b4 also refused to download this series because the v4 in this
> > thread looked newer… Anyway. Both series applied:
> >
> > | =============================
> > | WARNING: suspicious RCU usage
> > | 5.18.0-rc7+ #16 Not tainted
> > | -----------------------------
> > | include/linux/ptrace.h:120 suspicious rcu_dereference_check() usage!
> > |
> > | other info that might help us debug this:
> > |
> > | rcu_scheduler_active = 2, debug_locks = 1
> > | 2 locks held by ssdd/1734:
> > |  #0: ffff88800eaa6918 (&sighand->siglock){....}-{2:2}, at: lock_parents_siglocks+0xf0/0x3b0
> > |  #1: ffff88800eaa71d8 (&sighand->siglock/2){....}-{2:2}, at: lock_parents_siglocks+0x115/0x3b0
> > |
> > | stack backtrace:
> > | CPU: 2 PID: 1734 Comm: ssdd Not tainted 5.18.0-rc7+ #16
> > | Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
> > | Call Trace:
> > |  <TASK>
> > |  dump_stack_lvl+0x45/0x5a
> > |  unlock_parents_siglocks+0xb6/0xc0
> > |  ptrace_stop+0xb9/0x390
> > |  get_signal+0x51c/0x8d0
> > |  arch_do_signal_or_restart+0x31/0x750
> > |  exit_to_user_mode_prepare+0x157/0x220
> > |  irqentry_exit_to_user_mode+0x5/0x50
> > |  asm_sysvec_apic_timer_interrupt+0x12/0x20
> >
> > That is ptrace_parent() in unlock_parents_siglocks().
> 
> How odd.  I thought I had the appropriate lockdep config options enabled
> in my test build to catch things like this.  I guess not.
> 
> Now I am trying to think how to tell it that holding the appropriate
> siglock makes this ok.

The typical annotation is something like:

	rcu_dereference_protected(foo, lockdep_is_held(&bar))

Except in this case I think the problem is that bar depends on foo in
non-trivial ways. That is, foo is 'task->parent' and bar is
'task->parent->sighand->siglock' or something.

The other option is to use rcu_dereference_raw() in this one instance
and have a comment that explains the situation.
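
A userspace sketch of the two options above, under loud assumptions:
model_deref_protected() and model_is_held() are hypothetical stand-ins
for rcu_dereference_protected() and lockdep_is_held(), and the structs
only mimic the shape 'task->parent->sighand->siglock'. It is not kernel
code and makes no claim about what the series should end up doing; it
only shows why the annotation is awkward when the lock is reached
through the pointer being checked.

```c
#include <assert.h>
#include <stddef.h>

/* Toy lock and task structures; all names are illustrative only. */
struct model_lock { int held; };
struct model_sighand { struct model_lock siglock; };
struct model_task {
	struct model_task *parent;
	struct model_sighand *sighand;
};

/* Stand-in for lockdep_is_held(): reports whether the toy lock is held. */
int model_is_held(const struct model_lock *l)
{
	return l->held;
}

/* Stand-in for rcu_dereference_protected(p, c): check c, then yield p. */
#define model_deref_protected(p, c) (assert(c), (p))

/*
 * Easy case: the protecting lock can be named without reading the
 * protected pointer, so the typical annotation fits on one line.
 */
struct model_task *parent_checked_by_own_lock(struct model_task *t)
{
	return model_deref_protected(t->parent,
				     model_is_held(&t->sighand->siglock));
}

/*
 * Circular case: the lock lives behind t->parent, so the pointer has to
 * be read unchecked first (the role rcu_dereference_raw() would play)
 * and the lock can only be verified afterwards.
 */
struct model_task *parent_checked_by_parent_lock(struct model_task *t)
{
	struct model_task *p = t->parent;   /* unchecked read */

	assert(p != NULL && model_is_held(&p->sighand->siglock));
	return p;
}
```

In the second function the check comes after the read, which is the
situation a comment next to a raw dereference would need to explain.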