On PREEMPT_RT, keeping preemption disabled during the invocation of
cgroup_enter_frozen() is a problem because the function acquires css_set_lock,
which is a sleeping lock on PREEMPT_RT and must not be acquired with
preemption disabled.
The preempt-disabled section is only a performance optimisation and can be
avoided.
Extend the comment and don't disable preemption before scheduling on
PREEMPT_RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
kernel/signal.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/kernel/signal.c b/kernel/signal.c
index da017a5461163..9e07b3075c72e 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2328,11 +2328,16 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
* The preempt-disable section ensures that there will be no preemption
* between unlock and schedule() and so improving the performance since
* the ptracer has no reason to sleep.
+ *
+ * This optimisation is not doable on PREEMPT_RT due to the spinlock_t
+ * within the preempt-disable section.
*/
- preempt_disable();
+ if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+ preempt_disable();
read_unlock(&tasklist_lock);
cgroup_enter_frozen();
- preempt_enable_no_resched();
+ if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+ preempt_enable_no_resched();
schedule();
cgroup_leave_frozen(true);
--
2.40.1
The patch LGTM, but I am a bit confused by the changelog/comments,
I guess I missed something...

On 06/06, Sebastian Andrzej Siewior wrote:
>
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2328,11 +2328,16 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
>  * The preempt-disable section ensures that there will be no preemption
>  * between unlock and schedule() and so improving the performance since
>  * the ptracer has no reason to sleep.
> + *
> + * This optimisation is not doable on PREEMPT_RT due to the spinlock_t
> + * within the preempt-disable section.
>  */
> - preempt_disable();
> + if (!IS_ENABLED(CONFIG_PREEMPT_RT))
> + preempt_disable();

Not only do we have the problems with cgroup_enter_frozen(), afaics (please
correct me) this optimisation doesn't work on RT anyway?

IIUC, read_lock() on RT disables migration but not preemption, so it is simply
too late to do preempt_disable() before unlock/schedule. The tracer can preempt
the tracee right after do_notify_parent_cldstop().

Oleg.
On Tue, Jun 06, 2023 at 01:04:48PM +0200, Oleg Nesterov wrote:
> The patch LGTM, but I am a bit confused by the changelog/comments,
> I guess I missed something...
>
> On 06/06, Sebastian Andrzej Siewior wrote:
> >
> > --- a/kernel/signal.c
> > +++ b/kernel/signal.c
> > @@ -2328,11 +2328,16 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
> >  * The preempt-disable section ensures that there will be no preemption
> >  * between unlock and schedule() and so improving the performance since
> >  * the ptracer has no reason to sleep.
> > + *
> > + * This optimisation is not doable on PREEMPT_RT due to the spinlock_t
> > + * within the preempt-disable section.
> >  */
> > - preempt_disable();
> > + if (!IS_ENABLED(CONFIG_PREEMPT_RT))
> > + preempt_disable();
>
> Not only do we have the problems with cgroup_enter_frozen(), afaics (please
> correct me) this optimisation doesn't work on RT anyway?
>
> IIUC, read_lock() on RT disables migration but not preemption, so it is simply
> too late to do preempt_disable() before unlock/schedule. The tracer can preempt
> the tracee right after do_notify_parent_cldstop().

Correct -- but I think you can disable preemption over what is
effectively rwsem_up_read(), but you can't over the effective
rtmutex_lock() that cgroup_enter_frozen() will then attempt.

(iow, unlock() doesn't tend to sleep, while lock() does)

But you're correct to point out that the whole preempt_disable() thing
is entirely pointless due to the whole task_lock region being
preemptible before it.
On 06/06, Peter Zijlstra wrote:
>
> On Tue, Jun 06, 2023 at 01:04:48PM +0200, Oleg Nesterov wrote:
> > The patch LGTM, but I am a bit confused by the changelog/comments,
> > I guess I missed something...
> >
> > On 06/06, Sebastian Andrzej Siewior wrote:
> > >
> > > --- a/kernel/signal.c
> > > +++ b/kernel/signal.c
> > > @@ -2328,11 +2328,16 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
> > >  * The preempt-disable section ensures that there will be no preemption
> > >  * between unlock and schedule() and so improving the performance since
> > >  * the ptracer has no reason to sleep.
> > > + *
> > > + * This optimisation is not doable on PREEMPT_RT due to the spinlock_t
> > > + * within the preempt-disable section.
> > >  */
> > > - preempt_disable();
> > > + if (!IS_ENABLED(CONFIG_PREEMPT_RT))
> > > + preempt_disable();
> >
> > Not only do we have the problems with cgroup_enter_frozen(), afaics (please
> > correct me) this optimisation doesn't work on RT anyway?
> >
> > IIUC, read_lock() on RT disables migration but not preemption, so it is simply
> > too late to do preempt_disable() before unlock/schedule. The tracer can preempt
> > the tracee right after do_notify_parent_cldstop().
>
> Correct -- but I think you can disable preemption over what is
> effectively rwsem_up_read(), but you can't over the effective
> rtmutex_lock() that cgroup_enter_frozen() will then attempt.
>
> (iow, unlock() doesn't tend to sleep, while lock() does)
>
> But you're correct to point out that the whole preempt_disable() thing
> is entirely pointless due to the whole task_lock region being
> preemptible before it.

Thanks Peter. So I think the comment should be updated. Otherwise it looks
as if it makes sense to try to move cgroup_enter_frozen() up before
preempt_disable().

Oleg.
On PREEMPT_RT, keeping preemption disabled during the invocation of
cgroup_enter_frozen() is a problem because the function acquires css_set_lock,
which is a sleeping lock on PREEMPT_RT and must not be acquired with
preemption disabled.
The preempt-disabled section is only a performance optimisation and can be
avoided.
Extend the comment and don't disable preemption before scheduling on
PREEMPT_RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
Is this better?
v1…v2:
- Extend the comment to note that preemption isn't disabled due to
the lock to make it obvious that the optimisation isn't just
harmful but also pointless.
kernel/signal.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/kernel/signal.c b/kernel/signal.c
index da017a5461163..dcb0b1fbcb3a8 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2328,11 +2328,20 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
* The preempt-disable section ensures that there will be no preemption
* between unlock and schedule() and so improving the performance since
* the ptracer has no reason to sleep.
+ *
+ * On PREEMPT_RT locking tasklist_lock does not disable preemption.
+ * Therefore the task can be preempted (after
+ * do_notify_parent_cldstop()) before unlocking tasklist_lock so there
+ * is no benefit in doing this. The optimisation is harmful on
+ * PREEMPT_RT because the spinlock_t (in cgroup_enter_frozen()) must not
+ * be acquired with disabled preemption.
*/
- preempt_disable();
+ if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+ preempt_disable();
read_unlock(&tasklist_lock);
cgroup_enter_frozen();
- preempt_enable_no_resched();
+ if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+ preempt_enable_no_resched();
schedule();
cgroup_leave_frozen(true);
--
2.40.1
On 06/06, Sebastian Andrzej Siewior wrote:
>
> v1…v2:
>   - Extend the comment to note that preemption isn't disabled due to
>     the lock to make it obvious that the optimisation isn't just
>     harmful but also pointless.

Thanks,

Acked-by: Oleg Nesterov <oleg@redhat.com>