drivers/accel/habanalabs/common/device.c | 12 ++++++------ drivers/accel/habanalabs/common/mmu/mmu.c | 3 ++- drivers/accel/habanalabs/common/pci/pci.c | 4 ++-- drivers/cpufreq/amd-pstate.c | 10 +++++----- drivers/cpufreq/cpufreq.c | 2 +- drivers/cpufreq/intel_pstate.c | 2 +- drivers/devfreq/devfreq.c | 2 +- drivers/dma-buf/dma-fence.c | 4 ++-- drivers/fsi/fsi-master-aspeed.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++-- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 +- drivers/gpu/drm/scheduler/sched_entity.c | 4 ++-- drivers/hid/intel-ish-hid/ipc/pci-ish.c | 2 +- drivers/i2c/i2c-core-slave.c | 2 +- drivers/spi/spi-axi-spi-engine.c | 4 ++-- drivers/ufs/core/ufshcd.c | 12 ++++++------ fs/btrfs/extent_map.c | 4 ++-- fs/btrfs/raid56.c | 4 ++-- include/linux/tracepoint.h | 11 +++++++++++ io_uring/io_uring.h | 2 +- kernel/irq_work.c | 2 +- kernel/sched/ext.c | 2 +- kernel/smp.c | 2 +- net/core/dev.c | 2 +- net/core/xdp.c | 2 +- net/openvswitch/actions.c | 2 +- net/openvswitch/datapath.c | 2 +- net/sctp/outqueue.c | 2 +- net/tipc/node.c | 2 +- 30 files changed, 62 insertions(+), 50 deletions(-)
When a caller already guards a tracepoint with an explicit enabled check:
if (trace_foo_enabled() && cond)
trace_foo(args);
trace_foo() internally re-evaluates the static_branch_unlikely() key.
Since static branches are patched binary instructions the compiler cannot
fold the two evaluations, so every such site pays the cost twice.
This series introduces trace_invoke_##name() as a companion to
trace_##name(). It calls __do_trace_##name() directly, bypassing the
redundant static-branch re-check, while preserving all other correctness
properties of the normal path (RCU-watching assertion, might_fault() for
syscall tracepoints). The internal __do_trace_##name() symbol is not
leaked to call sites; trace_invoke_##name() is the only new public API.
if (trace_foo_enabled() && cond)
trace_invoke_foo(args); /* calls __do_trace_foo() directly */
The first patch adds the three-location change to
include/linux/tracepoint.h (__DECLARE_TRACE, __DECLARE_TRACE_SYSCALL,
and the !TRACEPOINTS_ENABLED stub). The remaining 14 patches
mechanically convert all guarded call sites found in the tree:
kernel/, io_uring/, net/, accel/habanalabs, cpufreq/, devfreq/,
dma-buf/, fsi/, drm/, HID, i2c/, spi/, scsi/ufs/, and btrfs/.
This series is motivated by Peter Zijlstra's observation in the discussion
around Dmitry Ilvokhin's locking tracepoint instrumentation series, where
he noted that compilers cannot optimize static branches and that guarded
call sites end up evaluating the static branch twice for no reason, and
by Steven Rostedt's suggestion to add a proper API instead of exposing
internal implementation details like __do_trace_##name() directly to
call sites:
https://lore.kernel.org/linux-trace-kernel/8298e098d3418cb446ef396f119edac58a3414e9.1772642407.git.d@ilvokhin.com
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Vineeth Pillai (Google) (15):
tracepoint: Add trace_invoke_##name() API
kernel: Use trace_invoke_##name() at guarded tracepoint call sites
io_uring: Use trace_invoke_##name() at guarded tracepoint call sites
net: Use trace_invoke_##name() at guarded tracepoint call sites
accel/habanalabs: Use trace_invoke_##name() at guarded tracepoint call
sites
cpufreq: Use trace_invoke_##name() at guarded tracepoint call sites
devfreq: Use trace_invoke_##name() at guarded tracepoint call sites
dma-buf: Use trace_invoke_##name() at guarded tracepoint call sites
fsi: Use trace_invoke_##name() at guarded tracepoint call sites
drm: Use trace_invoke_##name() at guarded tracepoint call sites
HID: Use trace_invoke_##name() at guarded tracepoint call sites
i2c: Use trace_invoke_##name() at guarded tracepoint call sites
spi: Use trace_invoke_##name() at guarded tracepoint call sites
scsi: ufs: Use trace_invoke_##name() at guarded tracepoint call sites
btrfs: Use trace_invoke_##name() at guarded tracepoint call sites
drivers/accel/habanalabs/common/device.c | 12 ++++++------
drivers/accel/habanalabs/common/mmu/mmu.c | 3 ++-
drivers/accel/habanalabs/common/pci/pci.c | 4 ++--
drivers/cpufreq/amd-pstate.c | 10 +++++-----
drivers/cpufreq/cpufreq.c | 2 +-
drivers/cpufreq/intel_pstate.c | 2 +-
drivers/devfreq/devfreq.c | 2 +-
drivers/dma-buf/dma-fence.c | 4 ++--
drivers/fsi/fsi-master-aspeed.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++--
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 +-
drivers/gpu/drm/scheduler/sched_entity.c | 4 ++--
drivers/hid/intel-ish-hid/ipc/pci-ish.c | 2 +-
drivers/i2c/i2c-core-slave.c | 2 +-
drivers/spi/spi-axi-spi-engine.c | 4 ++--
drivers/ufs/core/ufshcd.c | 12 ++++++------
fs/btrfs/extent_map.c | 4 ++--
fs/btrfs/raid56.c | 4 ++--
include/linux/tracepoint.h | 11 +++++++++++
io_uring/io_uring.h | 2 +-
kernel/irq_work.c | 2 +-
kernel/sched/ext.c | 2 +-
kernel/smp.c | 2 +-
net/core/dev.c | 2 +-
net/core/xdp.c | 2 +-
net/openvswitch/actions.c | 2 +-
net/openvswitch/datapath.c | 2 +-
net/sctp/outqueue.c | 2 +-
net/tipc/node.c | 2 +-
30 files changed, 62 insertions(+), 50 deletions(-)
--
2.53.0
On 2026-03-12 11:04, Vineeth Pillai (Google) wrote:
> When a caller already guards a tracepoint with an explicit enabled check:
>
> if (trace_foo_enabled() && cond)
> trace_foo(args);
>
> trace_foo() internally re-evaluates the static_branch_unlikely() key.
> Since static branches are patched binary instructions the compiler cannot
> fold the two evaluations, so every such site pays the cost twice.
>
> This series introduces trace_invoke_##name() as a companion to
> trace_##name(). It calls __do_trace_##name() directly, bypassing the
> redundant static-branch re-check, while preserving all other correctness
> properties of the normal path (RCU-watching assertion, might_fault() for
> syscall tracepoints). The internal __do_trace_##name() symbol is not
> leaked to call sites; trace_invoke_##name() is the only new public API.
>
> if (trace_foo_enabled() && cond)
> trace_invoke_foo(args); /* calls __do_trace_foo() directly */
FYI, we have a similar concept in LTTng-UST for userspace
instrumentation already:
if (lttng_ust_tracepoint_enabled(provider, name))
lttng_ust_do_tracepoint(provider, name, ...);
Perhaps it can provide some ideas about API naming.
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
On Thu, 12 Mar 2026 11:12:41 -0400 Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > > if (trace_foo_enabled() && cond) > > trace_invoke_foo(args); /* calls __do_trace_foo() directly */ > > FYI, we have a similar concept in LTTng-UST for userspace > instrumentation already: > > if (lttng_ust_tracepoint_enabled(provider, name)) > lttng_ust_do_tracepoint(provider, name, ...); > > Perhaps it can provide some ideas about API naming. I find the word "invoke" sounding more official than "do" ;-) Note, Vineeth came up with the naming. I would have done "do" but when I saw "invoke" I thought it sounded better. -- Steve
On 2026-03-12 11:23, Steven Rostedt wrote: > On Thu, 12 Mar 2026 11:12:41 -0400 > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > >>> if (trace_foo_enabled() && cond) >>> trace_invoke_foo(args); /* calls __do_trace_foo() directly */ >> >> FYI, we have a similar concept in LTTng-UST for userspace >> instrumentation already: >> >> if (lttng_ust_tracepoint_enabled(provider, name)) >> lttng_ust_do_tracepoint(provider, name, ...); >> >> Perhaps it can provide some ideas about API naming. > > I find the word "invoke" sounding more official than "do" ;-) > > Note, Vineeth came up with the naming. I would have done "do" but when I > saw "invoke" I thought it sounded better. It works as long as you don't have a tracing subsystem called "invoke", then you get into identifier clash territory. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
On Thu, 12 Mar 2026 11:28:07 -0400 Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > > Note, Vineeth came up with the naming. I would have done "do" but when I > > saw "invoke" I thought it sounded better. > > It works as long as you don't have a tracing subsystem called > "invoke", then you get into identifier clash territory. True. Perhaps we should do the double underscore trick. Instead of: trace_invoke_foo() use: trace_invoke__foo() Which will make it more visible to what the trace event is. Hmm, we probably should have used: trace__foo() for all tracepoints, as there's still functions that are called trace_foo() that are not tracepoints :-p -- Steve
On 2026-03-12 11:40, Steven Rostedt wrote: > On Thu, 12 Mar 2026 11:28:07 -0400 > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > >>> Note, Vineeth came up with the naming. I would have done "do" but when I >>> saw "invoke" I thought it sounded better. >> >> It works as long as you don't have a tracing subsystem called >> "invoke", then you get into identifier clash territory. > > True. Perhaps we should do the double underscore trick. > > Instead of: trace_invoke_foo() > > use: trace_invoke__foo() > > > Which will make it more visible to what the trace event is. > > Hmm, we probably should have used: trace__foo() for all tracepoints, as > there's still functions that are called trace_foo() that are not > tracepoints :-p One certain way to eliminate identifier clash would be to go for a prefix to "trace_", e.g. do_trace_foo() call_trace_foo() emit_trace_foo() __trace_foo() invoke_trace_foo() dispatch_trace_foo() Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
On Thu, Mar 12, 2026 at 11:49 AM Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > > On 2026-03-12 11:40, Steven Rostedt wrote: > > On Thu, 12 Mar 2026 11:28:07 -0400 > > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > > > >>> Note, Vineeth came up with the naming. I would have done "do" but when I > >>> saw "invoke" I thought it sounded better. > >> > >> It works as long as you don't have a tracing subsystem called > >> "invoke", then you get into identifier clash territory. > > > > True. Perhaps we should do the double underscore trick. > > > > Instead of: trace_invoke_foo() > > > > use: trace_invoke__foo() > > > > > > Which will make it more visible to what the trace event is. > > > > Hmm, we probably should have used: trace__foo() for all tracepoints, as > > there's still functions that are called trace_foo() that are not > > tracepoints :-p > > One certain way to eliminate identifier clash would be to go for a > prefix to "trace_", e.g. > > do_trace_foo() > call_trace_foo() This was the initial idea, but it had conflict in the existing source: call_trace_sched_update_nr_running. do_trace_##name also had collisions when I checked. So, went with trace_invoke_##name. Did not check rest of the suggestions here though. Thanks, Vineeth > emit_trace_foo() > __trace_foo() > invoke_trace_foo() > dispatch_trace_foo() > > Thanks, > > Mathieu > > > > -- > Mathieu Desnoyers > EfficiOS Inc. > https://www.efficios.com
On Thu, Mar 12, 2026 at 9:15 AM Vineeth Remanan Pillai <vineeth@bitbyteword.org> wrote: > > On Thu, Mar 12, 2026 at 11:49 AM Mathieu Desnoyers > <mathieu.desnoyers@efficios.com> wrote: > > > > On 2026-03-12 11:40, Steven Rostedt wrote: > > > On Thu, 12 Mar 2026 11:28:07 -0400 > > > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > > > > > >>> Note, Vineeth came up with the naming. I would have done "do" but when I > > >>> saw "invoke" I thought it sounded better. > > >> > > >> It works as long as you don't have a tracing subsystem called > > >> "invoke", then you get into identifier clash territory. > > > > > > True. Perhaps we should do the double underscore trick. > > > > > > Instead of: trace_invoke_foo() > > > > > > use: trace_invoke__foo() > > > > > > > > > Which will make it more visible to what the trace event is. > > > > > > Hmm, we probably should have used: trace__foo() for all tracepoints, as > > > there's still functions that are called trace_foo() that are not > > > tracepoints :-p > > > > One certain way to eliminate identifier clash would be to go for a > > prefix to "trace_", e.g. > > > > do_trace_foo() > > call_trace_foo() > > This was the initial idea, but it had conflict in the existing source: > call_trace_sched_update_nr_running. do_trace_##name also had > collisions when I checked. So, went with trace_invoke_##name. Did not > check rest of the suggestions here though. > > Thanks, > Vineeth > > > emit_trace_foo() > > __trace_foo() this seems like the best approach, IMO. double-underscored variants are usually used for some specialized/internal version of a function when we know that some conditions are correct (e.g., lock is already taken, or something like that). Which fits here: trace_xxx() will check if tracepoint is enabled, while __trace_xxx() will not check and just invoke the tracepoint? It's short, it's distinct, and it says "I know what I am doing". > > invoke_trace_foo() > > dispatch_trace_foo() > > > > Thanks, > > > > Mathieu > > > > > > > > -- > > Mathieu Desnoyers > > EfficiOS Inc. > > https://www.efficios.com >
On Thu, 12 Mar 2026 09:54:29 -0700 Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > > emit_trace_foo() > > > __trace_foo() > > this seems like the best approach, IMO. double-underscored variants > are usually used for some specialized/internal version of a function > when we know that some conditions are correct (e.g., lock is already > taken, or something like that). Which fits here: trace_xxx() will > check if tracepoint is enabled, while __trace_xxx() will not check and > just invoke the tracepoint? It's short, it's distinct, and it says "I > know what I am doing". Honestly, I consider double underscore as internal only and not something anyone but the subsystem maintainers use. This, is a normal function where it's just saying: If you have it already enabled, then you can use this. Thus, I don't think it qualifies as a "you know what you are doing". Perhaps: call_trace_foo() ? -- Steve
On Thu, Mar 12, 2026 at 1:03 PM Steven Rostedt <rostedt@goodmis.org> wrote: > > On Thu, 12 Mar 2026 09:54:29 -0700 > Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > > > > emit_trace_foo() > > > > __trace_foo() > > > > this seems like the best approach, IMO. double-underscored variants > > are usually used for some specialized/internal version of a function > > when we know that some conditions are correct (e.g., lock is already > > taken, or something like that). Which fits here: trace_xxx() will > > check if tracepoint is enabled, while __trace_xxx() will not check and > > just invoke the tracepoint? It's short, it's distinct, and it says "I > > know what I am doing". > > Honestly, I consider double underscore as internal only and not something > anyone but the subsystem maintainers use. > > This, is a normal function where it's just saying: If you have it already > enabled, then you can use this. Thus, I don't think it qualifies as a "you > know what you are doing". > > Perhaps: call_trace_foo() ? > call_trace_foo has one collision with the tracepoint sched_update_nr_running and a function call_trace_sched_update_nr_running. I had considered this and later moved to trace_invoke_foo() because of the collision. But I can rename call_trace_sched_update_nr_running to something else if call_trace_foo is the general consensus. Thanks, Vineeth
On Fri, 13 Mar 2026 10:02:32 -0400 Vineeth Remanan Pillai <vineeth@bitbyteword.org> wrote: > > > > Perhaps: call_trace_foo() ? > > > call_trace_foo has one collision with the tracepoint > sched_update_nr_running and a function > call_trace_sched_update_nr_running. I had considered this and later > moved to trace_invoke_foo() because of the collision. But I can rename > call_trace_sched_update_nr_running to something else if call_trace_foo > is the general consensus. OK, then lets go with: trace_call__foo() The double underscore should prevent any name collisions. Does anyone have an objections? -- Steve
On 2026-03-17 12:00, Steven Rostedt wrote: > On Fri, 13 Mar 2026 10:02:32 -0400 > Vineeth Remanan Pillai <vineeth@bitbyteword.org> wrote: > >>> >>> Perhaps: call_trace_foo() ? >>> >> call_trace_foo has one collision with the tracepoint >> sched_update_nr_running and a function >> call_trace_sched_update_nr_running. I had considered this and later >> moved to trace_invoke_foo() because of the collision. But I can rename >> call_trace_sched_update_nr_running to something else if call_trace_foo >> is the general consensus. > > OK, then lets go with: trace_call__foo() > > The double underscore should prevent any name collisions. > > Does anyone have an objections? I'm OK with it. Thanks! Mathieu -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
On Tue, Mar 17, 2026 at 12:02 PM Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > > On 2026-03-17 12:00, Steven Rostedt wrote: > > On Fri, 13 Mar 2026 10:02:32 -0400 > > Vineeth Remanan Pillai <vineeth@bitbyteword.org> wrote: > > > >>> > >>> Perhaps: call_trace_foo() ? > >>> > >> call_trace_foo has one collision with the tracepoint > >> sched_update_nr_running and a function > >> call_trace_sched_update_nr_running. I had considered this and later > >> moved to trace_invoke_foo() because of the collision. But I can rename > >> call_trace_sched_update_nr_running to something else if call_trace_foo > >> is the general consensus. > > > > OK, then lets go with: trace_call__foo() > > > > The double underscore should prevent any name collisions. > > > > Does anyone have an objections? > I'm OK with it. > Great thanks! I shall send a v2 with s/trace_invoke_foo/trace_call__foo/ soon. Thanks, Vineeth
On Thu, Mar 12, 2026 at 11:49:23AM -0400, Mathieu Desnoyers wrote: > On 2026-03-12 11:40, Steven Rostedt wrote: > > On Thu, 12 Mar 2026 11:28:07 -0400 > > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > > > > > > Note, Vineeth came up with the naming. I would have done "do" but when I > > > > saw "invoke" I thought it sounded better. > > > > > > It works as long as you don't have a tracing subsystem called > > > "invoke", then you get into identifier clash territory. > > > > True. Perhaps we should do the double underscore trick. > > > > Instead of: trace_invoke_foo() > > > > use: trace_invoke__foo() > > > > > > Which will make it more visible to what the trace event is. > > > > Hmm, we probably should have used: trace__foo() for all tracepoints, as > > there's still functions that are called trace_foo() that are not > > tracepoints :-p > > One certain way to eliminate identifier clash would be to go for a > prefix to "trace_", e.g. Oh, I know!, call them __do_trace_##foo(). /me runs like hell
On 2026-03-12 11:54, Peter Zijlstra wrote: > On Thu, Mar 12, 2026 at 11:49:23AM -0400, Mathieu Desnoyers wrote: >> On 2026-03-12 11:40, Steven Rostedt wrote: >>> On Thu, 12 Mar 2026 11:28:07 -0400 >>> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: >>> >>>>> Note, Vineeth came up with the naming. I would have done "do" but when I >>>>> saw "invoke" I thought it sounded better. >>>> >>>> It works as long as you don't have a tracing subsystem called >>>> "invoke", then you get into identifier clash territory. >>> >>> True. Perhaps we should do the double underscore trick. >>> >>> Instead of: trace_invoke_foo() >>> >>> use: trace_invoke__foo() >>> >>> >>> Which will make it more visible to what the trace event is. >>> >>> Hmm, we probably should have used: trace__foo() for all tracepoints, as >>> there's still functions that are called trace_foo() that are not >>> tracepoints :-p >> >> One certain way to eliminate identifier clash would be to go for a >> prefix to "trace_", e.g. > > Oh, I know!, call them __do_trace_##foo(). > > /me runs like hell So s/__do_trace_/do_trace_/g and call it a day ? Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
© 2016 - 2026 Red Hat, Inc.