Recently, we resolved a latency spike issue caused by concurrently running
bpftrace processes. The root cause was high contention on the ftrace_lock
due to optimistic spinning. We can optimize this by disabling optimistic
spinning for ftrace_lock.

While semaphores may present similar challenges, I'm not currently aware of
specific instances that exhibit this exact issue. Should we encounter
problematic semaphores in production workloads, we can address them at that
time.

PATCH #1: introduce slow_mutex_[un]lock to disable optimistic spinning
PATCH #2: add variant for rtmutex
PATCH #3: disable optimistic spinning for ftrace_lock

v1->v2:
- add slow_mutex_[un]lock (Steven)
- add variant for rtmutex (Waiman)
- revise commit log for clarity and accuracy (Waiman, Peter)
- note that semaphores may present similar challenges (David)

RFC v1: https://lore.kernel.org/bpf/20260304074650.58165-1-laoar.shao@gmail.com/

Yafang Shao (3):
  locking/mutex: Add slow path variants for lock/unlock
  locking/rtmutex: Add slow path variants for lock/unlock
  ftrace: Disable optimistic spinning for ftrace_lock

 include/linux/mutex.h        |   4 ++
 include/linux/rtmutex.h      |   3 +
 kernel/locking/mutex.c       |  41 +++++++++++---
 kernel/locking/rtmutex.c     |  37 +++++++-----
 kernel/locking/rtmutex_api.c |  47 +++++++++++++---
 kernel/trace/ftrace.c        | 106 +++++++++++++++++------------------
 6 files changed, 157 insertions(+), 81 deletions(-)

--
2.47.3
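A rough sketch of what the PATCH #1 interface could look like, going only by
the names in the cover letter; the no-spin slowpath helper here is an
assumption, not the actual patch:

	/*
	 * Sketch: acquire a mutex but always sleep on contention instead
	 * of optimistically spinning on a running owner.
	 * __mutex_trylock_fast() is the existing uncontended cmpxchg fast
	 * path; __mutex_lock_slowpath_no_spin() is a hypothetical slowpath
	 * that skips mutex_optimistic_spin().
	 */
	void slow_mutex_lock(struct mutex *lock)
	{
		might_sleep();

		if (!__mutex_trylock_fast(lock))
			__mutex_lock_slowpath_no_spin(lock);
	}

	void slow_mutex_unlock(struct mutex *lock)
	{
		/* In this sketch unlock is unchanged; spinning only affects
		 * how waiters acquire the lock, not how it is released. */
		mutex_unlock(lock);
	}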
On Wed, Mar 11, 2026 at 07:52:47PM +0800, Yafang Shao wrote:
> Recently, we resolved a latency spike issue caused by concurrently running
> bpftrace processes. The root cause was high contention on the ftrace_lock
> due to optimistic spinning. We can optimize this by disabling optimistic
> spinning for ftrace_lock.
>
> While semaphores may present similar challenges, I'm not currently aware of
> specific instances that exhibit this exact issue. Should we encounter
> problematic semaphores in production workloads, we can address them at that
> time.
>
> PATCH #1: introduce slow_mutex_[un]lock to disable optimistic spinning
> PATCH #2: add variant for rtmutex
> PATCH #3: disable optimistic spinning for ftrace_lock
>

So I really utterly hate this.
On Wed, 11 Mar 2026 12:54:26 +0100
Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Mar 11, 2026 at 07:52:47PM +0800, Yafang Shao wrote:
> > Recently, we resolved a latency spike issue caused by concurrently running
> > bpftrace processes. The root cause was high contention on the ftrace_lock
> > due to optimistic spinning. We can optimize this by disabling optimistic
> > spinning for ftrace_lock.
> >
> > While semaphores may present similar challenges, I'm not currently aware of
> > specific instances that exhibit this exact issue. Should we encounter
> > problematic semaphores in production workloads, we can address them at that
> > time.
> >
> > PATCH #1: introduce slow_mutex_[un]lock to disable optimistic spinning
> > PATCH #2: add variant for rtmutex
> > PATCH #3: disable optimistic spinning for ftrace_lock
> >
>
> So I really utterly hate this.
Yep...
Adding the extra parameter is likely to have a measurable impact
on everything else.
The problematic path is obvious:

	find_kallsyms_symbol+142
	module_address_lookup+104
	kallsyms_lookup_buildid+203
	kallsyms_lookup+20
	print_rec+64
	t_show+67
	seq_read_iter+709
	seq_read+165
	vfs_read+165
	ksys_read+103
	__x64_sys_read+25
	do_syscall_64+56
	entry_SYSCALL_64_after_hwframe+100
The code needs to drop the ftrace_lock across t_show.
Although there is a bigger issue of why on earth the code is reading the
list of filter functions at all - never mind all the time.
I'll do it by hand when debugging, but I'd have thought anything using bpf
will know exactly where to add its hooks.
David
On Wed, Mar 11, 2026 at 8:53 PM David Laight <david.laight.linux@gmail.com> wrote:
>
> On Wed, 11 Mar 2026 12:54:26 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
>
> > On Wed, Mar 11, 2026 at 07:52:47PM +0800, Yafang Shao wrote:
> > > Recently, we resolved a latency spike issue caused by concurrently running
> > > bpftrace processes. The root cause was high contention on the ftrace_lock
> > > due to optimistic spinning. We can optimize this by disabling optimistic
> > > spinning for ftrace_lock.
> > >
> > > While semaphores may present similar challenges, I'm not currently aware of
> > > specific instances that exhibit this exact issue. Should we encounter
> > > problematic semaphores in production workloads, we can address them at that
> > > time.
> > >
> > > PATCH #1: introduce slow_mutex_[un]lock to disable optimistic spinning
> > > PATCH #2: add variant for rtmutex
> > > PATCH #3: disable optimistic spinning for ftrace_lock
> > >
> >
> > So I really utterly hate this.
>
> Yep...
> Adding the extra parameter is likely to have a measurable impact
> on everything else.
>
> The problematic path is obvious:
>
> 	find_kallsyms_symbol+142
> 	module_address_lookup+104
> 	kallsyms_lookup_buildid+203
> 	kallsyms_lookup+20
> 	print_rec+64
> 	t_show+67
> 	seq_read_iter+709
> 	seq_read+165
> 	vfs_read+165
> 	ksys_read+103
> 	__x64_sys_read+25
> 	do_syscall_64+56
> 	entry_SYSCALL_64_after_hwframe+100
>
> The code needs to drop the ftrace_lock across t_show.

It's unclear whether we can safely release ftrace_lock within t_show;
doing so would probably necessitate a major redesign of the current
implementation.

> Although there is a bigger issue of why on earth the code is reading the
> list of filter functions at all - never mind all the time.

bpftrace reads the complete list of available functions into
userspace, then performs matching against the target function to
determine if it is traceable.

> I'll do it by hand when debugging, but I'd have thought anything using bpf
> will know exactly where to add its hooks.

--
Regards
Yafang
Hi,

On Wed, 11 Mar 2026 21:40:32 +0800
Yafang Shao <laoar.shao@gmail.com> wrote:

> > Although there is a bigger issue of why on earth the code is reading the
> > list of filter functions at all - never mind all the time.
>
> bpftrace reads the complete list of available functions into
> userspace, then performs matching against the target function to
> determine if it is traceable.

What about changing the bpftrace userspace tool to cache the list of
available functions? (or just add an option to pass an available function
list?) Then you can just copy the function list somewhere and use it.

Of course we can do the same thing in the kernel, but I don't think
there is any reason to do it in the kernel instead of user space.

Thank you,

> > I'll do it by hand when debugging, but I'd have thought anything using bpf
> > will know exactly where to add its hooks.
>
> --
> Regards
> Yafang

--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
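A minimal sketch of the cache Masami suggests, keyed on the kernel boot id
so a list from a previous boot is never reused. The cache location is an
assumption, and the list can still go stale within a boot if modules are
loaded or unloaded:

	#include <stdio.h>
	#include <string.h>

	#define AVAIL "/sys/kernel/tracing/available_filter_functions"
	#define CACHE "/var/cache/bpftrace/functions"	/* assumed path */
	#define STAMP CACHE ".bootid"

	/* Read the first line of @path into @buf; 0 on success. */
	static int read_line(const char *path, char *buf, size_t len)
	{
		FILE *f = fopen(path, "r");

		if (!f)
			return -1;
		if (!fgets(buf, len, f)) {
			fclose(f);
			return -1;
		}
		fclose(f);
		return 0;
	}

	/* Copy @src to @dst; 0 on success. */
	static int copy_file(const char *src, const char *dst)
	{
		char buf[4096];
		size_t n;
		FILE *in = fopen(src, "r");
		FILE *out = in ? fopen(dst, "w") : NULL;

		if (!out) {
			if (in)
				fclose(in);
			return -1;
		}
		while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
			fwrite(buf, 1, n, out);
		fclose(in);
		fclose(out);
		return 0;
	}

	/* Returns the path callers should read the function list from. */
	const char *function_list(void)
	{
		char boot[64], cached[64] = "";
		FILE *f;

		if (read_line("/proc/sys/kernel/random/boot_id",
			      boot, sizeof(boot)))
			return AVAIL;	/* no boot id: don't cache */
		if (!read_line(STAMP, cached, sizeof(cached)) &&
		    !strcmp(boot, cached))
			return CACHE;	/* cache is from this boot */

		if (copy_file(AVAIL, CACHE))	/* the one expensive read */
			return AVAIL;	/* e.g. cache dir missing */
		f = fopen(STAMP, "w");
		if (f) {
			fputs(boot, f);
			fclose(f);
		}
		return CACHE;
	}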
On Thu, Mar 12, 2026 at 4:06 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> Hi,
>
> On Wed, 11 Mar 2026 21:40:32 +0800
> Yafang Shao <laoar.shao@gmail.com> wrote:
>
> > > Although there is a bigger issue of why on earth the code is reading the
> > > list of filter functions at all - never mind all the time.
> >
> > bpftrace reads the complete list of available functions into
> > userspace, then performs matching against the target function to
> > determine if it is traceable.
>
> What about changing the bpftrace userspace tool to cache the list of
> available functions? (or just add an option to pass an available function
> list?) Then you can just copy the function list somewhere and use it.

Thanks for the suggestion. Steven also mentioned this optimization, and
I believe it could indeed help address the performance issue.

> Of course we can do the same thing in the kernel, but I don't think
> there is any reason to do it in the kernel instead of user space.

Implementing this in the kernel provides a more generic solution,
instead of patching each tool individually (bpftrace, perf, trace-cmd,
etc.).

--
Regards
Yafang
On Wed, 11 Mar 2026 21:40:32 +0800
Yafang Shao <laoar.shao@gmail.com> wrote:

> > The code needs to drop the ftrace_lock across t_show.
>
> It's unclear whether we can safely release ftrace_lock within t_show;
> doing so would probably necessitate a major redesign of the current
> implementation.

The issue isn't t_show, it's the calls between t_start and t_next and
subsequent t_next calls, which need to keep a consistent state. t_show
just happens to be called in between them.

> > Although there is a bigger issue of why on earth the code is reading the
> > list of filter functions at all - never mind all the time.
>
> bpftrace reads the complete list of available functions into
> userspace, then performs matching against the target function to
> determine if it is traceable.

Could it parse it in smaller bits? That is, the lock is held only during an
individual read system call. If it reads the available_filter_functions
file via smaller buffers, it would not hold the lock for as long.

-- Steve
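The constraint Steven describes comes from the shape of the seq_file
iterator: ftrace_lock is taken in t_start() and only dropped in t_stop(),
so every t_next() and t_show() within one read() runs under it. A
simplified sketch of that shape; t_find(), t_next_rec() and the print_rec()
signature stand in for the real record walking in kernel/trace/ftrace.c:

	static void *t_start(struct seq_file *m, loff_t *pos)
	{
		struct ftrace_iterator *iter = m->private;

		mutex_lock(&ftrace_lock);	/* held for this entire read() */
		return t_find(iter, *pos);	/* walk to the record at *pos */
	}

	static void *t_next(struct seq_file *m, void *v, loff_t *pos)
	{
		(*pos)++;
		/* Advancing assumes the record list hasn't changed since
		 * t_start(), which is why the lock must stay held. */
		return t_next_rec(m->private);
	}

	static int t_show(struct seq_file *m, void *v)
	{
		/* The expensive part: a kallsyms lookup per record,
		 * performed while ftrace_lock is still held from
		 * t_start() above. */
		return print_rec(m, v);
	}

	static void t_stop(struct seq_file *m, void *p)
	{
		mutex_unlock(&ftrace_lock);	/* end of this read() */
	}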
On Wed, 11 Mar 2026 13:07:43 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Wed, 11 Mar 2026 21:40:32 +0800
> Yafang Shao <laoar.shao@gmail.com> wrote:
>
> > > The code needs to drop the ftrace_lock across t_show.
> >
> > It's unclear whether we can safely release ftrace_lock within t_show;
> > doing so would probably necessitate a major redesign of the current
> > implementation.
>
> The issue isn't t_show, it's the calls between t_start and t_next and
> subsequent t_next calls, which need to keep a consistent state. t_show
> just happens to be called in between them.
>
> > > Although there is a bigger issue of why on earth the code is reading the
> > > list of filter functions at all - never mind all the time.
> >
> > bpftrace reads the complete list of available functions into
> > userspace, then performs matching against the target function to
> > determine if it is traceable.
>
> Could it parse it in smaller bits? That is, the lock is held only during an
> individual read system call. If it reads the available_filter_functions
> file via smaller buffers, it would not hold the lock for as long.

But the expensive part is probably looking up the symbol name.
Shorter reads would hand the lock off to the other process, but overall
the lock would still be held for the same length of time.

How does the code work out where to start from for each read system call?
Couldn't you (effectively) do one symbol at a time the same way?

Another option would be to put a 'generation number' on the list of
functions (after all it doesn't change that often).
Then you can release the lock, generate the data, re-acquire the lock and
check the generation number hasn't changed. If it hasn't changed you can
carry on processing the list using the same pointer.
If the generation number has changed terminate the read and worry about
locating the correct start position for the next read.

	David

> -- Steve
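A sketch of the generation-number scheme David describes; ftrace_gen is a
hypothetical counter that every writer would bump under ftrace_lock when
the record list changes, and show_rec_unlocked() is an assumed helper, not
existing kernel code:

	/* Hypothetical: incremented under ftrace_lock whenever the
	 * dyn_ftrace records are modified (module load/unload, etc.). */
	static unsigned long ftrace_gen;

	static int show_rec_unlocked(struct seq_file *m, struct dyn_ftrace *rec)
	{
		unsigned long gen = ftrace_gen;
		unsigned long ip = rec->ip;	/* copy out; rec may go away */
		char sym[KSYM_SYMBOL_LEN];

		mutex_unlock(&ftrace_lock);
		/* The expensive symbol lookup now runs without the lock. */
		if (!kallsyms_lookup(ip, NULL, NULL, NULL, sym))
			snprintf(sym, sizeof(sym), "0x%lx", ip);
		seq_printf(m, "%s\n", sym);
		mutex_lock(&ftrace_lock);

		if (gen != ftrace_gen)
			return -EAGAIN;	/* list changed while unlocked:
					 * end this read; the next read()
					 * must re-locate its position */
		return 0;
	}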
On Wed, 11 Mar 2026 13:07:43 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:
> > bpftrace reads the complete list of available functions into
> > userspace, then performs matching against the target function to
> > determine if it is traceable.
>
> Could it parse it in smaller bits? That is, the lock is held only during an
> individual read system call. If it reads the available_filter_functions
> file via smaller buffers, it would not hold the lock for as long.
Hmm, I guess this wouldn't help much. I ran:
trace-cmd record -e sys_enter_read -e sys_exit_read -F cat /sys/kernel/tracing/available_filter_functions > /dev/null
And trace-cmd report shows:
[..]
cat-1208 [001] ..... 142.025582: sys_enter_read: fd: 0x00000003, buf: 0x7fa9daf21000, count: 0x00040000
cat-1208 [001] ..... 142.025995: sys_exit_read: 0xfee
cat-1208 [001] ..... 142.026000: sys_enter_read: fd: 0x00000003, buf: 0x7fa9daf21000, count: 0x00040000
cat-1208 [001] ..... 142.026392: sys_exit_read: 0xff8
cat-1208 [001] ..... 142.026396: sys_enter_read: fd: 0x00000003, buf: 0x7fa9daf21000, count: 0x00040000
cat-1208 [001] ..... 142.026766: sys_exit_read: 0xfed
cat-1208 [001] ..... 142.026770: sys_enter_read: fd: 0x00000003, buf: 0x7fa9daf21000, count: 0x00040000
cat-1208 [001] ..... 142.027113: sys_exit_read: 0xfe0
cat-1208 [001] ..... 142.027117: sys_enter_read: fd: 0x00000003, buf: 0x7fa9daf21000, count: 0x00040000
cat-1208 [001] ..... 142.027502: sys_exit_read: 0xfec
cat-1208 [001] ..... 142.027506: sys_enter_read: fd: 0x00000003, buf: 0x7fa9daf21000, count: 0x00040000
[..]
which shows that even though the read buffer size is 0x40000, the size read
is just 0xff8. So the buffer read and returned is never more than a page.
Unless you are running on powerpc, where the page is likely 64K.
-- Steve
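The page-sized ceiling Steve observes matches how seq_file buffers its
output: seq_read_iter() starts with a one-page buffer and each read()
returns at most one buffer-full, no matter how large the user buffer is.
Roughly, from fs/seq_file.c (simplified from memory):

	/* first call: the output buffer is sized at a single page */
	if (!m->buf) {
		m->buf = seq_buf_alloc(m->size = PAGE_SIZE);
		if (!m->buf)
			goto Enomem;
	}

The buffer only grows when a single record cannot fit, which is why a
0x40000-byte read still comes back with just under 0x1000 bytes.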
On Wed, Mar 11, 2026 at 7:54 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Mar 11, 2026 at 07:52:47PM +0800, Yafang Shao wrote:
> > Recently, we resolved a latency spike issue caused by concurrently running
> > bpftrace processes. The root cause was high contention on the ftrace_lock
> > due to optimistic spinning. We can optimize this by disabling optimistic
> > spinning for ftrace_lock.
> >
> > While semaphores may present similar challenges, I'm not currently aware of
> > specific instances that exhibit this exact issue. Should we encounter
> > problematic semaphores in production workloads, we can address them at that
> > time.
> >
> > PATCH #1: introduce slow_mutex_[un]lock to disable optimistic spinning
> > PATCH #2: add variant for rtmutex
> > PATCH #3: disable optimistic spinning for ftrace_lock
> >
>
> So I really utterly hate this.

Do you have any other suggestions for optimizing this?

--
Regards
Yafang