sched/clock: Make local_clock() notrace

[PATCH] sched/clock: Make local_clock() notrace

Posted by Steven Rostedt 1 year, 11 months ago

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

The "perf" clock in /sys/kernel/tracing/trace_clock enables local_clock(),
where on machines that have CONFIG_HAVE_UNSTABLE_SCHED_CLOCK set is a
normal function. This function can be traced.

I found that enabling the "perf" clock on some debug configs and running
function tracer can live lock the machine. That is, it goes so slow that
nothing moves forward.

Cc: stable@vger.kernel.org
Fixes: fb7d4948c4da2 ("sched/clock: Provide local_clock_noinstr()")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/sched/clock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index 3c6193de9cde..af8b698dee8c 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -308,7 +308,7 @@ noinstr u64 local_clock_noinstr(void)
 	return clock;
 }
 
-u64 local_clock(void)
+notrace u64 local_clock(void)
 {
 	u64 now;
 	preempt_disable_notrace();
-- 
2.43.0

Re: [PATCH] sched/clock: Make local_clock() notrace

Posted by Mathieu Desnoyers 1 year, 11 months ago

On 2024-02-20 20:20, Steven Rostedt wrote:
> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> 
> The "perf" clock in /sys/kernel/tracing/trace_clock enables local_clock(),
> where on machines that have CONFIG_HAVE_UNSTABLE_SCHED_CLOCK set is a
> normal function. This function can be traced.
> 
> I found that enabling the "perf" clock on some debug configs and running
> function tracer can live lock the machine. That is, it goes so slow that
> nothing moves forward.

And I bet this is why the try_cmpxchg for reservation was
looping endlessly. ;)

Thanks,

Mathieu

> 
> Cc: stable@vger.kernel.org
> Fixes: fb7d4948c4da2 ("sched/clock: Provide local_clock_noinstr()")
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
>   kernel/sched/clock.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
> index 3c6193de9cde..af8b698dee8c 100644
> --- a/kernel/sched/clock.c
> +++ b/kernel/sched/clock.c
> @@ -308,7 +308,7 @@ noinstr u64 local_clock_noinstr(void)
>   	return clock;
>   }
>   
> -u64 local_clock(void)
> +notrace u64 local_clock(void)
>   {
>   	u64 now;
>   	preempt_disable_notrace();

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

Re: [PATCH] sched/clock: Make local_clock() notrace

Posted by Steven Rostedt 1 year, 11 months ago

On Tue, 20 Feb 2024 20:19:32 -0500
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> On 2024-02-20 20:20, Steven Rostedt wrote:
> > From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> > 
> > The "perf" clock in /sys/kernel/tracing/trace_clock enables local_clock(),
> > where on machines that have CONFIG_HAVE_UNSTABLE_SCHED_CLOCK set is a
> > normal function. This function can be traced.
> > 
> > I found that enabling the "perf" clock on some debug configs and running
> > function tracer can live lock the machine. That is, it goes so slow that
> > nothing moves forward.  
> 
> And I bet this is why the try_cmpxchg for reservation was
> looping endlessly. ;)
>

Yes. Debugging that was how I found it ;-) sort of.

I went back to another machine which triggered the cmpxchg issue as well,
but when removing that code and going back to the old code, it then locked
up completely. That was because the other config had more debugging enabled.
That debugging lead to finding this.

I'm now going back to see if I can trigger that again with this update.

-- Steve

Re: [PATCH] sched/clock: Make local_clock() notrace

Posted by Steven Rostedt 1 year, 11 months ago

On Tue, 20 Feb 2024 20:25:24 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> Yes. Debugging that was how I found it ;-) sort of.
> 
> I went back to another machine which triggered the cmpxchg issue as well,
> but when removing that code and going back to the old code, it then locked
> up completely. That was because the other config had more debugging enabled.
> That debugging lead to finding this.
> 
> I'm now going back to see if I can trigger that again with this update.

Actually, I take that back. I had reverted the patches, but the lockups
happened when I put them back in. The lock ups do not happen when I don't
have the cmpxchg code.

I see now that it goes into an infinite loop if the clock gets traced
(found another clock that has the same issue):

	w = local_read(&tail_page->write);
	[..]
 again:
	info->ts = rb_time_stamp(cpu_buffer->buffer);
	[..]
	if (!local_try_cmpxchg(&tail_page->write, &w, w + info->length))
		goto again;

The rb_time_stamp() causes a trace to happen which will move 'w' and the
try_cmpxchg() is guaranteed to fail. Each time! So the above turns into an
infinite loop.

I finally got the recursion logic to not lock up the machine when a timer
gets traced. And because we still trace interrupt code (specifically
irq_enter_rcu(), which I do still want to trace!) we need the "transition"
bit in the recursion test.

That is, because irq_enter_rcu() is called before the preempt_count gets
set to being an IRQ, it fails the recursion test. To handle this, the
recursion test allows a single iteration (a transition bit) otherwise it
considers it a recursion and drops the event.

But in this case, a single recursion will still cause it to fall into an
infinite loop.

-- Steve