[PATCH 0/8] rtla/timerlat: Support actions on threshold and on end

Tomas Glozar posted 8 patches 4 months ago
There is a newer version of this series
[PATCH 0/8] rtla/timerlat: Support actions on threshold and on end
Posted by Tomas Glozar 4 months ago
This series adds a feature that allows the user to specify certain
kinds of "actions" to be executed at one of two places in the rtla
measurement: when tracing is stopped on latency threshold, and at the
end of tracing.

Two new options are added: -A/--on-threshold, and -N/--on-end, taking
the action as an argument. For example:

$ rtla timerlat hist -T 10 -A shell,command="echo Threshold" \
-N shell,command="echo Tracing stopped"

will print "Threshold" if a thread latency higher than 10 microseconds
is reached, and "Tracing stopped" always at the end.

The list of possible actions is extensible and is covered in
the commit messages. Later, a documentation patch series will be sent
with clear explanation of every action and its syntax.

Notably, a special action "continue" resumes tracing. For example:

$ rtla timerlat hist -T 100 -A shell,command="echo Threshold" \
-A continue -d 10s

will print "Threshold" as many times as tracing is stopped after
thread latency reaches 100us.
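
As a rough illustration of these semantics, the following shell sketch
models how a list of on-threshold actions could be processed: actions run
in the order given, and the presence of "continue" decides whether tracing
resumes. The function name run_actions and its return convention are
hypothetical and are not taken from the rtla sources; only the "shell" and
"continue" actions from the examples are modeled.

```shell
# Hypothetical model of an on-threshold action list; this is not rtla code.
# Each argument is one action, as it would be passed to a separate -A option.
# Returns 0 if tracing should resume ("continue" was given),
# 1 if tracing should stay stopped, 2 on an action this sketch does not model.
run_actions() {
    resume=1
    for action in "$@"; do
        case "$action" in
            shell,command=*)
                # Strip the "shell,command=" prefix and run the rest.
                sh -c "${action#shell,command=}"
                ;;
            continue)
                resume=0
                ;;
            *)
                echo "unsupported action in this sketch: $action" >&2
                return 2
                ;;
        esac
    done
    return "$resume"
}

# Simulate the latency threshold being hit three times in one run.
hits=0
while [ "$hits" -lt 3 ]; do
    hits=$((hits + 1))
    run_actions 'shell,command=echo Threshold' continue || break
done
```

Dropping "continue" from the argument list makes run_actions return 1, so
the loop stops after the first hit, which mirrors tracing staying stopped.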

The feature was inspired by a case where collecting perf data on rtla
latency overflow was required, which can be done by sending a signal
to the perf process.

Example of this with Intel PT aux buffer:

$ perf record --cpu 0 -e intel_pt// -S -- rtla timerlat top -q -T 100 \
-c 0 -A signal,pid=parent,num=12 -A continue
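
For readers unfamiliar with the mechanism, here is a minimal shell sketch
of the parent/child signalling that the command above relies on. It
assumes a Linux architecture such as x86, where signal number 12 is
SIGUSR2, the signal perf record uses to take an AUX area snapshot in -S
mode. The parent shell stands in for perf and the child for rtla; the
counter and the trap handler are purely illustrative.

```shell
# Parent plays the role of "perf record -S": it counts snapshot requests.
snapshots=0
trap 'snapshots=$((snapshots + 1))' USR2

# Child plays the role of rtla with "signal,pid=parent,num=12":
# it signals its parent process once, as if the threshold had been hit.
sh -c 'kill -s USR2 "$PPID"' &
child=$!

# wait may return early when the trapped signal arrives; that is fine here.
wait "$child" || true

echo "snapshot requests received: $snapshots"
```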

In general, the feature aims to allow integration with external
tooling. For even more flexibility, passing context to the shell
through environment variables, or even an entire scripting language
with access to the rtla internals, can be implemented if needed.

Tomas Glozar (8):
  rtla/timerlat: Introduce enum timerlat_tracing_mode
  rtla/timerlat: Add action on threshold feature
  rtla/timerlat_bpf: Allow resuming tracing
  rtla/timerlat: Add continue action
  rtla/timerlat: Add action on end feature
  rtla/tests: Check rtla output with grep
  rtla/tests: Add tests for actions
  rtla/tests: Limit duration to maximum of 10s

 tools/tracing/rtla/src/Build           |   1 +
 tools/tracing/rtla/src/actions.c       | 260 +++++++++++++++++++++++++
 tools/tracing/rtla/src/actions.h       |  52 +++++
 tools/tracing/rtla/src/timerlat.bpf.c  |  13 +-
 tools/tracing/rtla/src/timerlat.c      |  24 ++-
 tools/tracing/rtla/src/timerlat.h      |  24 ++-
 tools/tracing/rtla/src/timerlat_bpf.c  |  13 ++
 tools/tracing/rtla/src/timerlat_bpf.h  |   3 +
 tools/tracing/rtla/src/timerlat_hist.c | 145 ++++++++++----
 tools/tracing/rtla/src/timerlat_top.c  | 167 ++++++++++------
 tools/tracing/rtla/tests/engine.sh     |  21 +-
 tools/tracing/rtla/tests/hwnoise.t     |   8 +-
 tools/tracing/rtla/tests/osnoise.t     |   4 +-
 tools/tracing/rtla/tests/timerlat.t    |  36 +++-
 14 files changed, 652 insertions(+), 119 deletions(-)
 create mode 100644 tools/tracing/rtla/src/actions.c
 create mode 100644 tools/tracing/rtla/src/actions.h

-- 
2.49.0
Re: [PATCH 0/8] rtla/timerlat: Support actions on threshold and on end
Posted by Arnaldo Carvalho de Melo 4 months ago
On Wed, Jun 11, 2025 at 03:56:36PM +0200, Tomas Glozar wrote:
> This series adds a feature that allows the user to specify certain
> kinds of "actions" to be executed at one of two places in the rtla
> measurement: when tracing is stopped on latency threshold, and at the
> end of tracing.
 
> Two new options are added: -A/--on-threshold, and -N/--on-end, taking
> the action as an argument. For example:

I wouldn't add -A and -N, leaving just the long options, as it documents
scripts (and we should have autocomplete as well), leaving the one
letter options for things that are used super frequently, which could be
these new options, after a while, time will tell :-)

But I see the "A"ction connection, and since you show it being used
multiple times in a single command line, maybe the one-letter option
is warranted.
 
> $ rtla timerlat hist -T 10 -A shell,command="echo Threshold" \
> -N shell,command="echo Tracing stopped"
 
> will print "Threshold" if a thread latency higher than 10 microseconds
> is reached, and "Tracing stopped" always at the end.
 
> The list of possible actions is extensible and is covered in
> the commit messages. Later, a documentation patch series will be sent
> with clear explanation of every action and its syntax.

I think having the documentation together with the new options is
desirable.
 
> Notably, a special action "continue" resumes tracing. For example:
 
> $ rtla timerlat hist -T 100 -A shell,command="echo Threshold" \
> -A continue -d 10s

so --on-threshold ends up being a list of things to do when the
threshold is hit?
 
> will print "Threshold" as many times as tracing is stopped after
> thread latency reaches 100us.

> The feature was inspired by a case where collecting perf data on rtla
> latency overflow was required, which can be done by sending a signal
> to the perf process.
 
> Example of this with Intel PT aux buffer:
 
> $ perf record --cpu 0 -e intel_pt// -S -- rtla timerlat top -q -T 100 \
> -c 0 -A signal,pid=parent,num=12 -A continue
 
> In general, the feature is aiming to allow integration with external
> tooling. To implement even more flexibility, passing context to the
> shell through environmental variables, or even an entire scripting
> language with access to the rtla internals can be implemented if
> needed.

That is an interesting example of cross-tool integration using existing
mechanisms for detecting special events and asking for hardware tracing
snapshots, good stuff!

At some point we need to have this signalling to not involve userspace,
shortcircuiting the snapshot request closer to the event of interest,
inside the kernel.

- Arnaldo
 
> Tomas Glozar (8):
>   rtla/timerlat: Introduce enum timerlat_tracing_mode
>   rtla/timerlat: Add action on threshold feature
>   rtla/timerlat_bpf: Allow resuming tracing
>   rtla/timerlat: Add continue action
>   rtla/timerlat: Add action on end feature
>   rtla/tests: Check rtla output with grep
>   rtla/tests: Add tests for actions
>   rtla/tests: Limit duration to maximum of 10s
> 
>  tools/tracing/rtla/src/Build           |   1 +
>  tools/tracing/rtla/src/actions.c       | 260 +++++++++++++++++++++++++
>  tools/tracing/rtla/src/actions.h       |  52 +++++
>  tools/tracing/rtla/src/timerlat.bpf.c  |  13 +-
>  tools/tracing/rtla/src/timerlat.c      |  24 ++-
>  tools/tracing/rtla/src/timerlat.h      |  24 ++-
>  tools/tracing/rtla/src/timerlat_bpf.c  |  13 ++
>  tools/tracing/rtla/src/timerlat_bpf.h  |   3 +
>  tools/tracing/rtla/src/timerlat_hist.c | 145 ++++++++++----
>  tools/tracing/rtla/src/timerlat_top.c  | 167 ++++++++++------
>  tools/tracing/rtla/tests/engine.sh     |  21 +-
>  tools/tracing/rtla/tests/hwnoise.t     |   8 +-
>  tools/tracing/rtla/tests/osnoise.t     |   4 +-
>  tools/tracing/rtla/tests/timerlat.t    |  36 +++-
>  14 files changed, 652 insertions(+), 119 deletions(-)
>  create mode 100644 tools/tracing/rtla/src/actions.c
>  create mode 100644 tools/tracing/rtla/src/actions.h
> 
> -- 
> 2.49.0
Re: [PATCH 0/8] rtla/timerlat: Support actions on threshold and on end
Posted by Tomas Glozar 3 months, 2 weeks ago
On Wed, Jun 11, 2025 at 16:46, Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> I wouldn't add -A and -N, leaving just the long options, as it documents
> scripts (and we should have autocomplete as well), leaving the one
> letter options for things that are used super frequently, which could be
> these new options, after a while, time will tell :-)
>

Hmm, my reasoning for those is that one might have multiple actions,
and the action argument itself is long, so one would get a very long
command, e.g.:

$ rtla timerlat hist -T 10 --on-threshold shell,command="echo Threshold" \
--on-end shell,command="echo Tracing stopped"

for the first example from the cover letter. But it's true that this is an
experimental feature, and I don't even precisely know the direction in
which I'm going (which is to be determined based on the use of this in
practice). So your suggestion makes a lot of sense.

>
> I think having the documentation together with the new options is
> desirable.
>

Right, this is a user-facing change. I did the documentation
separately before, but that was for a change in implementation (BPF
sample collection). Also, the documentation was not ready yet when I
sent the patchset :) I'll add it in v2.

>
> so --on-threshold ends up being a list of things to do when the
> threshold is hit?
>

Yes, the list is executed in order. Looking at the cover letter now,
this is not clear: there I'm only talking about the "list" of
supported actions (which I should perhaps more accurately call a
"set").

>
> That is an interesting example of cross-tool integration using existing
> mechanisms for detecting special events and asking for hardware tracing
> snapshots, good stuff!
>

Thanks!

> At some point we need to have this signalling to not involve userspace,
> shortcircuiting the snapshot request closer to the event of interest,
> inside the kernel.
>

I have a feature in mind for that. We already use a BPF program to
process the samples [note1], which means that BPF tail call [1] can be
used to implement in-kernel actions next to userspace ones. Those can
be built-in BPF programs, or custom BPF programs supplied by the user.

[1] https://docs.ebpf.io/linux/helper-function/bpf_tail_call
[note1] In BPF mode. However, outside BPF mode, actions are not that
useful in the first place, as they are only executed when rtla wakes
up to process samples, incurring up to 1s latency.

Tomas