perf_swevent_get_recursion_context() is supposed to avoid recursion.
This requires the task to remain on the same CPU so that it increments
and decrements the same per-CPU counter, which is ensured by using
preempt_disable(). Having preemption disabled while sending a signal
leads to locking problems on PREEMPT_RT because sighand, a spinlock_t,
becomes a sleeping lock.

This callback runs in task context and currently only delivers a signal
to "itself". No recursion protection is required in this context.

Remove the recursion protection in perf_pending_task().
Tested-by: Marco Elver <elver@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
kernel/events/core.c | 12 ------------
1 file changed, 12 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index e0b2da8de485f..5400f7ed2f98b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6785,14 +6785,6 @@ static void perf_pending_irq(struct irq_work *entry)
 static void perf_pending_task(struct callback_head *head)
 {
 	struct perf_event *event = container_of(head, struct perf_event, pending_task);
-	int rctx;
-
-	/*
-	 * If we 'fail' here, that's OK, it means recursion is already disabled
-	 * and we won't recurse 'further'.
-	 */
-	preempt_disable_notrace();
-	rctx = perf_swevent_get_recursion_context();
 
 	if (event->pending_work) {
 		event->pending_work = 0;
@@ -6800,10 +6792,6 @@ static void perf_pending_task(struct callback_head *head)
 		local_dec(&event->ctx->nr_pending);
 	}
 
-	if (rctx >= 0)
-		perf_swevent_put_recursion_context(rctx);
-	preempt_enable_notrace();
-
 	put_event(event);
 }
--
2.43.0
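For context, the recursion protection removed above relies on per-CPU
counters, one per interrupt context level. A simplified sketch of the
guard, modeled on get_recursion_context()/put_recursion_context() in
kernel/events/internal.h (exact details vary by kernel version):

/*
 * Simplified sketch of the swevent recursion guard. The real counters
 * live in the per-CPU swevent_htable (perf_swevent_get_recursion_context()
 * passes this_cpu_ptr(&swevent_htable)->recursion), which is why the
 * caller must stay on one CPU between get and put.
 */
static inline int get_recursion_context(int *recursion)
{
	/* 0 = task, 1 = softirq, 2 = hardirq, 3 = NMI */
	unsigned char rctx = interrupt_context_level();

	if (recursion[rctx])
		return -1;	/* already inside a swevent at this level: drop */

	recursion[rctx]++;
	barrier();

	return rctx;
}

static inline void put_recursion_context(int *recursion, int rctx)
{
	barrier();
	recursion[rctx]--;	/* must hit the same CPU's counter as the get */
}

If the task migrated between get and put, it would decrement another
CPU's counter and leave its old CPU's counter stuck at a non-zero value,
silently dropping all further software events at that level on that CPU.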
On Fri, Mar 22, 2024 at 07:48:23AM +0100, Sebastian Andrzej Siewior wrote:
> perf_swevent_get_recursion_context() is supposed to avoid recursion.
> This requires the task to remain on the same CPU so that it increments
> and decrements the same per-CPU counter, which is ensured by using
> preempt_disable(). Having preemption disabled while sending a signal
> leads to locking problems on PREEMPT_RT because sighand, a spinlock_t,
> becomes a sleeping lock.
> 
> This callback runs in task context and currently only delivers a signal
> to "itself". No recursion protection is required in this context.
> 
> Remove the recursion protection in perf_pending_task().
>
> Tested-by: Marco Elver <elver@google.com>
> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> kernel/events/core.c | 12 ------------
> 1 file changed, 12 deletions(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index e0b2da8de485f..5400f7ed2f98b 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -6785,14 +6785,6 @@ static void perf_pending_irq(struct irq_work *entry)
>  static void perf_pending_task(struct callback_head *head)
>  {
>  	struct perf_event *event = container_of(head, struct perf_event, pending_task);
> -	int rctx;
> -
> -	/*
> -	 * If we 'fail' here, that's OK, it means recursion is already disabled
> -	 * and we won't recurse 'further'.
> -	 */
> -	preempt_disable_notrace();
> -	rctx = perf_swevent_get_recursion_context();
> 
>  	if (event->pending_work) {
>  		event->pending_work = 0;
> @@ -6800,10 +6792,6 @@ static void perf_pending_task(struct callback_head *head)
>  		local_dec(&event->ctx->nr_pending);
>  	}
> 
> -	if (rctx >= 0)
> -		perf_swevent_put_recursion_context(rctx);
> -	preempt_enable_notrace();
Well, if a software event happens during perf_sigtrap(), the task work
may be requeued endlessly and the task may get stuck in task_work_run()...
> -
>  	put_event(event);
>  }
>
> --
> 2.43.0
>
>
On 2024-04-09 00:06:00 [+0200], Frederic Weisbecker wrote:
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index e0b2da8de485f..5400f7ed2f98b 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -6785,14 +6785,6 @@ static void perf_pending_irq(struct irq_work *entry)
> >  static void perf_pending_task(struct callback_head *head)
> >  {
> >  	struct perf_event *event = container_of(head, struct perf_event, pending_task);
> > -	int rctx;
> > -
> > -	/*
> > -	 * If we 'fail' here, that's OK, it means recursion is already disabled
> > -	 * and we won't recurse 'further'.
> > -	 */
> > -	preempt_disable_notrace();
> > -	rctx = perf_swevent_get_recursion_context();
> > 
> >  	if (event->pending_work) {
> >  		event->pending_work = 0;
> > @@ -6800,10 +6792,6 @@ static void perf_pending_task(struct callback_head *head)
> >  		local_dec(&event->ctx->nr_pending);
> >  	}
> > 
> > -	if (rctx >= 0)
> > -		perf_swevent_put_recursion_context(rctx);
> > -	preempt_enable_notrace();
>
> Well, if a software event happens during perf_sigtrap(), the task work
> may be requeued endlessly and the task may get stuck in task_work_run()...
The last time I checked it had no users in the task context. How would
that happen?
Sebastian
On Tue, Apr 09, 2024 at 08:25:01AM +0200, Sebastian Andrzej Siewior wrote:
> On 2024-04-09 00:06:00 [+0200], Frederic Weisbecker wrote:
> > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > index e0b2da8de485f..5400f7ed2f98b 100644
> > > --- a/kernel/events/core.c
> > > +++ b/kernel/events/core.c
> > > @@ -6785,14 +6785,6 @@ static void perf_pending_irq(struct irq_work *entry)
> > >  static void perf_pending_task(struct callback_head *head)
> > >  {
> > >  	struct perf_event *event = container_of(head, struct perf_event, pending_task);
> > > -	int rctx;
> > > -
> > > -	/*
> > > -	 * If we 'fail' here, that's OK, it means recursion is already disabled
> > > -	 * and we won't recurse 'further'.
> > > -	 */
> > > -	preempt_disable_notrace();
> > > -	rctx = perf_swevent_get_recursion_context();
> > > 
> > >  	if (event->pending_work) {
> > >  		event->pending_work = 0;
> > > @@ -6800,10 +6792,6 @@ static void perf_pending_task(struct callback_head *head)
> > >  		local_dec(&event->ctx->nr_pending);
> > >  	}
> > > 
> > > -	if (rctx >= 0)
> > > -		perf_swevent_put_recursion_context(rctx);
> > > -	preempt_enable_notrace();
> >
> > Well, if a software event happens during perf_sigtrap(), the task work
> > may be requeued endlessly and the task may get stuck in task_work_run()...
>
> The last time I checked it had no users in the task context. How would
> that happen?
I guess many tracepoint events would do the trick. Such as trace_lock_acquire()
for example.
Thanks.
On 2024-04-09 12:35:46 [+0200], Frederic Weisbecker wrote:
> > > > @@ -6800,10 +6792,6 @@ static void perf_pending_task(struct callback_head *head)
> > > >  		local_dec(&event->ctx->nr_pending);
> > > >  	}
> > > > 
> > > > -	if (rctx >= 0)
> > > > -		perf_swevent_put_recursion_context(rctx);
> > > > -	preempt_enable_notrace();
> > > 
> > > Well, if a software event happens during perf_sigtrap(), the task work
> > > may be requeued endlessly and the task may get stuck in task_work_run()...
> > 
> > The last time I checked it had no users in the task context. How would
> > that happen?
> 
> I guess many tracepoint events would do the trick. Such as trace_lock_acquire()
> for example.

So the perf_trace_buf_alloc() is invoked from that trace point and
avoids the recursion. And any trace event from within perf_sigtrap()
would trigger the endless loop?

> Thanks.

Sebastian
On Tue, Apr 09, 2024 at 12:54:05PM +0200, Sebastian Andrzej Siewior wrote:
> On 2024-04-09 12:35:46 [+0200], Frederic Weisbecker wrote:
> > > > > @@ -6800,10 +6792,6 @@ static void perf_pending_task(struct callback_head *head)
> > > > >  		local_dec(&event->ctx->nr_pending);
> > > > >  	}
> > > > > 
> > > > > -	if (rctx >= 0)
> > > > > -		perf_swevent_put_recursion_context(rctx);
> > > > > -	preempt_enable_notrace();
> > > > 
> > > > Well, if a software event happens during perf_sigtrap(), the task work
> > > > may be requeued endlessly and the task may get stuck in task_work_run()...
> > > 
> > > The last time I checked it had no users in the task context. How would
> > > that happen?
> > 
> > I guess many tracepoint events would do the trick. Such as trace_lock_acquire()
> > for example.
> 
> So the perf_trace_buf_alloc() is invoked from that trace point and
> avoids the recursion. And any trace event from within perf_sigtrap()
> would trigger the endless loop?

Not sure I'm following:

1) event->perf_event_overflow() -> task_work_add()
   //return to userspace
2) task_work_run() -> perf_pending_task() -> perf_sigtrap() -> tracepoint event
   -> perf_event_overflow() -> task_work_add()
3) task_work_run() -> perf_pending_task() -> etc...

What am I missing?

Thanks.
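Concretely, the requeue point in that loop sits inside perf_pending_task()
itself. A rough sketch of the function as patched above, with the loop
marked in comments (perf_sigtrap() is in the unshown context lines between
the two hunks):

static void perf_pending_task(struct callback_head *head)
{
	struct perf_event *event = container_of(head, struct perf_event, pending_task);

	if (event->pending_work) {
		event->pending_work = 0;
		/*
		 * A software/tracepoint event firing inside perf_sigtrap()
		 * may reach perf_event_overflow() -> task_work_add() again,
		 * so task_work_run() re-runs this callback right away:
		 * steps 2) and 3) above, repeated endlessly.
		 */
		perf_sigtrap(event);
		local_dec(&event->ctx->nr_pending);
	}

	put_event(event);
}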
On 2024-04-09 14:00:49 [+0200], Frederic Weisbecker wrote:
> On Tue, Apr 09, 2024 at 12:54:05PM +0200, Sebastian Andrzej Siewior wrote:
> > On 2024-04-09 12:35:46 [+0200], Frederic Weisbecker wrote:
> > > > > > @@ -6800,10 +6792,6 @@ static void perf_pending_task(struct callback_head *head)
> > > > > >  		local_dec(&event->ctx->nr_pending);
> > > > > >  	}
> > > > > > 
> > > > > > -	if (rctx >= 0)
> > > > > > -		perf_swevent_put_recursion_context(rctx);
> > > > > > -	preempt_enable_notrace();
> > > > > 
> > > > > Well, if a software event happens during perf_sigtrap(), the task work
> > > > > may be requeued endlessly and the task may get stuck in task_work_run()...
> > > > 
> > > > The last time I checked it had no users in the task context. How would
> > > > that happen?
> > > 
> > > I guess many tracepoint events would do the trick. Such as trace_lock_acquire()
> > > for example.
> > 
> > So the perf_trace_buf_alloc() is invoked from that trace point and
> > avoids the recursion. And any trace event from within perf_sigtrap()
> > would trigger the endless loop?
> 
> Not sure I'm following:
> 
> 1) event->perf_event_overflow() -> task_work_add()
>    //return to userspace
> 2) task_work_run() -> perf_pending_task() -> perf_sigtrap() -> tracepoint event
>    -> perf_event_overflow() -> task_work_add()
> 3) task_work_run() -> perf_pending_task() -> etc...
> 
> What am I missing?

Yes, that is what I tried to say. Anyway, I misunderstood the concept
before. That means we need to keep that counter here and a
migrate_disable() is needed to avoid CPU migration, which is sad.

> Thanks.

Sebastian
On Tue, Apr 09, 2024 at 03:33:36PM +0200, Sebastian Andrzej Siewior wrote:
> On 2024-04-09 14:00:49 [+0200], Frederic Weisbecker wrote:
> > On Tue, Apr 09, 2024 at 12:54:05PM +0200, Sebastian Andrzej Siewior wrote:
> > > On 2024-04-09 12:35:46 [+0200], Frederic Weisbecker wrote:
> > > > > > > @@ -6800,10 +6792,6 @@ static void perf_pending_task(struct callback_head *head)
> > > > > > >  		local_dec(&event->ctx->nr_pending);
> > > > > > >  	}
> > > > > > > 
> > > > > > > -	if (rctx >= 0)
> > > > > > > -		perf_swevent_put_recursion_context(rctx);
> > > > > > > -	preempt_enable_notrace();
> > > > > > 
> > > > > > Well, if a software event happens during perf_sigtrap(), the task work
> > > > > > may be requeued endlessly and the task may get stuck in task_work_run()...
> > > > > 
> > > > > The last time I checked it had no users in the task context. How would
> > > > > that happen?
> > > > 
> > > > I guess many tracepoint events would do the trick. Such as trace_lock_acquire()
> > > > for example.
> > > 
> > > So the perf_trace_buf_alloc() is invoked from that trace point and
> > > avoids the recursion. And any trace event from within perf_sigtrap()
> > > would trigger the endless loop?
> > 
> > Not sure I'm following:
> > 
> > 1) event->perf_event_overflow() -> task_work_add()
> >    //return to userspace
> > 2) task_work_run() -> perf_pending_task() -> perf_sigtrap() -> tracepoint event
> >    -> perf_event_overflow() -> task_work_add()
> > 3) task_work_run() -> perf_pending_task() -> etc...
> > 
> > What am I missing?
> 
> Yes, that is what I tried to say.

Oh ok.. :o)

> Anyway, I misunderstood the concept
> before. That means we need to keep that counter here and a
> migrate_disable() is needed to avoid CPU migration, which is sad.

I fear that won't work either. The task is then pinned, but another
task can come up on that CPU and its software events will be ignored...

Some alternatives:

_ Clear event->pending_work = 0 after perf_sigtrap(), preventing an
  event in there from adding a new task work. We may miss a signal though...

_ Make the recursion context per task on -RT...

> > Thanks.
> 
> Sebastian
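A hypothetical sketch of the first alternative, assuming (as stated
above) that the overflow path will not add a new task work while
pending_work is still set:

static void perf_pending_task(struct callback_head *head)
{
	struct perf_event *event = container_of(head, struct perf_event, pending_task);

	if (event->pending_work) {
		/*
		 * pending_work stays set across perf_sigtrap(), so a software
		 * event raised in there cannot queue a new task work...
		 */
		perf_sigtrap(event);
		/*
		 * ...but an overflow landing between perf_sigtrap() and this
		 * store is dropped: the "may miss a signal" caveat above.
		 */
		event->pending_work = 0;
		local_dec(&event->ctx->nr_pending);
	}

	put_event(event);
}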
On 2024-04-10 12:38:50 [+0200], Frederic Weisbecker wrote:
> > Anyway, I misunderstood the concept
> > before. That means we need to keep that counter here and a
> > migrate_disable() is needed to avoid CPU migration, which is sad.
> 
> I fear that won't work either. The task is then pinned, but another
> task can come up on that CPU and its software events will be ignored...

oh, right.

> Some alternatives:
> 
> _ Clear event->pending_work = 0 after perf_sigtrap(), preventing an
>   event in there from adding a new task work. We may miss a signal though...
> 
> _ Make the recursion context per task on -RT...

The per-task counter would be indeed the easiest thing to do. But then
only for task context, right?

But why would we miss a signal if we clear event->pending_work late?
Isn't clearing late the same as clearing in
perf_swevent_put_recursion_context()?

If clearing pending_work late works, I would prefer to avoid yet another
per-task counter if possible.

> > > Thanks.

Sebastian
On Wed, Apr 10, 2024 at 02:51:26PM +0200, Sebastian Andrzej Siewior wrote:
> On 2024-04-10 12:38:50 [+0200], Frederic Weisbecker wrote:
> > Some alternatives:
> > 
> > _ Clear event->pending_work = 0 after perf_sigtrap(), preventing an
> >   event in there from adding a new task work. We may miss a signal though...
> > 
> > _ Make the recursion context per task on -RT...
> 
> The per-task counter would be indeed the easiest thing to do. But then
> only for task context, right?

It should work for CPU context as well. A context switch shouldn't be
considered as recursion. Hopefully...

> But why would we miss a signal if we clear event->pending_work late?
> Isn't clearing late the same as clearing in
> perf_swevent_put_recursion_context()?

Not exactly. perf_swevent_get_recursion_context() prevents a new
software event from being recorded altogether. It is completely ignored.
Whereas clearing event->pending_work late records new software events
but ignores the signal.

> If clearing pending_work late works, I would prefer to avoid yet another
> per-task counter if possible.

Yeah I know :-/

Thanks.

> 
> > > > Thanks.
> 
> Sebastian
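For reference, the per-task variant floated above could take roughly the
following shape. The task_struct field and helper names are invented for
this sketch; this is not code from any tree:

/*
 * Hypothetical per-task recursion guard: the counters travel with the
 * task, so no preempt_disable()/migrate_disable() is needed, and another
 * task scheduled onto the same CPU keeps its own, independent state.
 * One counter per interrupt context level, so a context switch is not
 * mistaken for recursion.
 */
static int perf_swevent_get_task_recursion_context(void)
{
	unsigned char rctx = interrupt_context_level();

	if (current->perf_recursion[rctx])	/* invented task_struct field */
		return -1;			/* already inside a swevent on this task */

	current->perf_recursion[rctx] = 1;
	return rctx;
}

static void perf_swevent_put_task_recursion_context(int rctx)
{
	current->perf_recursion[rctx] = 0;
}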