[v2] rtla/timerlat: Support actions on threshold and on end

[PATCH v2 9/9] Documentation/rtla: Add actions feature

Posted by Tomas Glozar 3 months, 2 weeks ago

Document both --on-threshold and --on-end, with examples.

Signed-off-by: Tomas Glozar <tglozar@redhat.com>
---
 .../tools/rtla/common_timerlat_options.rst    | 64 +++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/Documentation/tools/rtla/common_timerlat_options.rst b/Documentation/tools/rtla/common_timerlat_options.rst
index 10dc802f8d65..7854368f1827 100644
--- a/Documentation/tools/rtla/common_timerlat_options.rst
+++ b/Documentation/tools/rtla/common_timerlat_options.rst
@@ -55,3 +55,67 @@
         Set timerlat to run without workload, waiting for the user to dispatch a per-cpu
         task that waits for a new period on the tracing/osnoise/per_cpu/cpu$ID/timerlat_fd.
         See linux/tools/rtla/sample/timerlat_load.py for an example of user-load code.
+
+**--on-threshold** *action*
+
+        Defines an action to be executed when tracing is stopped on a latency threshold
+        specified by **-i/--irq** or **-T/--thread**.
+
+        Multiple --on-threshold actions may be specified, and they will be executed in
+        the order they are provided. If any action fails, subsequent actions in the list
+        will not be executed.
+
+        Supported actions are:
+
+        - *trace[,file=<filename>]*
+
+          Saves trace output, optionally taking a filename. Alternative to -t/--trace.
+          Note that unlike -t/--trace, specifying this multiple times will result in
+          the trace being saved multiple times.
+
+        - *signal,num=<sig>,pid=<pid>*
+
+          Sends signal to process. "parent" might be specified in place of pid to target
+          the parent process of rtla.
+
+        - *shell,command=<command>*
+
+          Execute shell command.
+
+        - *continue*
+
+          Continue tracing after actions are executed instead of stopping.
+
+        Example:
+
+        $ rtla timerlat -T 20 --on-threshold trace
+        --on-threshold shell,command="grep ipi_send timerlat_trace.txt"
+        --on-threshold signal,num=2,pid=parent
+
+        This will save a trace with the default filename "timerlat_trace.txt", print its
+        lines that contain the text "ipi_send" on standard output, and send signal 2
+        (SIGINT) to the parent process.
+
+        Performance Considerations:
+
+        For time-sensitive actions, it is recommended to run **rtla timerlat** with BPF
+        support and RT priority. Note that due to implementational limitations, actions
+        might be delayed up to one second after tracing is stopped if BPF mode is not
+        available or disabled.
+
+**--on-end** *action*
+
+        Defines an action to be executed at the end of **rtla timerlat** tracing.
+
+        Multiple --on-end actions can be specified, and they will be executed in the order
+        they are provided. If any action fails, subsequent actions in the list will not be
+        executed.
+
+        See the documentation for **--on-threshold** for the list of supported actions, with
+        the exception that *continue* has no effect.
+
+        Example:
+
+        $ rtla timerlat -d 5s --on-end trace
+
+        This runs rtla timerlat with default options and saves trace output at the end.
-- 
2.49.0

Re: [PATCH v2 9/9] Documentation/rtla: Add actions feature

Posted by Steven Rostedt 2 months, 2 weeks ago

On Thu, 26 Jun 2025 14:34:05 +0200
Tomas Glozar <tglozar@redhat.com> wrote:

> +        For time-sensitive actions, it is recommended to run **rtla timerlat** with BPF
> +        support and RT priority. Note that due to implementational limitations, actions
> +        might be delayed up to one second after tracing is stopped if BPF mode is not
> +        available or disabled.
> +

I'm curious what is checked to trigger an action. We can poll on
events and get woken when they are triggered. It may be possible to add
even more ways to wake a task waiting for something to happen.

-- Steve

Re: [PATCH v2 9/9] Documentation/rtla: Add actions feature

Posted by Tomas Glozar 2 months, 2 weeks ago

On Tue, Jul 22, 2025 at 0:35, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> I'm curious what is checked to trigger an action. We can poll on
> events and get woken when they are triggered. It may be possible to add
> even more ways to wake a task waiting for something to happen.
>

Threshold actions are triggered immediately after a sample over the
set threshold is detected by rtla. For BPF mode, this happens almost
right after the sample is processed in the BPF program and the
scheduler gets to waking up rtla following a BPF ringbuffer write.
There is only a short delay (up to tens of microseconds) because the
BPF helper defers the wake-up into irq_work.

For non-BPF mode, rtla periodically pulls samples from tracefs. When
it does that, it also checks whether tracing has been turned off. If
so, that means there was a threshold overflow, and actions are
triggered. Since the period for that is currently set to 1 second, the
action might be delayed up to one second from the threshold occurring.
from a ringbuffer in the action, e.g. global Intel PT data collection
for precise troubleshooting of difficult latencies.
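
As a rough illustration of the non-BPF behaviour described above (this is
not rtla's actual C implementation), the following Python sketch checks a
tracing_on-style value once per period and returns when tracing has been
turned off. The function name, the injectable read_tracing_on and sleep
callbacks, and max_polls are all hypothetical; they exist only so the loop
can be exercised without tracefs.

```python
import time
from typing import Callable


def wait_for_tracing_stop(
    read_tracing_on: Callable[[], str],
    period_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    max_polls: int = 60,
) -> int:
    """Poll until tracing is reported off; return how many polls it took.

    Hypothetical helper for illustration only. read_tracing_on should return
    the contents of a tracing_on file ("1" while tracing, "0" once stopped).
    """
    for polls in range(1, max_polls + 1):
        # Tracing being off is taken to mean that a threshold was hit.
        if read_tracing_on().strip() == "0":
            return polls
        # Nothing wakes us earlier, so a stop that happens right after this
        # check is only noticed one full period later.
        sleep(period_s)
    raise TimeoutError("tracing was still on after max_polls checks")


# Simulated run: tracing is on for two checks, then off at the third.
states = iter(["1\n", "1\n", "0\n"])
slept = []
polls = wait_for_tracing_stop(lambda: next(states), sleep=slept.append)
print(polls, slept)
```

With a real tracefs, read_tracing_on could read the tracing_on file of the
relevant tracing instance. Which instance rtla uses is not stated in this
thread, so no path is assumed here.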

Of course, this is just an implementational limitation of the timerlat
tracer. If timerlat had an event (like osnoise's "sample_threshold")
triggered on threshold overflow and if it is possible to wait on it
even without BPF, rtla could wait on that for both BPF and non-BPF
mode instead of what it is currently doing.

Tomas

Re: [PATCH v2 9/9] Documentation/rtla: Add actions feature

Posted by Steven Rostedt 2 months, 2 weeks ago

On Tue, 22 Jul 2025 09:03:24 +0200
Tomas Glozar <tglozar@redhat.com> wrote:

> Of course, this is just an implementational limitation of the timerlat
> tracer. If timerlat had an event (like osnoise's "sample_threshold")
> triggered on threshold overflow and if it is possible to wait on it
> even without BPF, rtla could wait on that for both BPF and non-BPF
> mode instead of what it is currently doing.

Right. Is this something you may want?

-- Steve

Re: [PATCH v2 9/9] Documentation/rtla: Add actions feature

Posted by Tomas Glozar 2 months, 2 weeks ago

On Tue, Jul 22, 2025 at 17:30, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Tue, 22 Jul 2025 09:03:24 +0200
> Tomas Glozar <tglozar@redhat.com> wrote:
>
> > Of course, this is just an implementational limitation of the timerlat
> > tracer. If timerlat had an event (like osnoise's "sample_threshold")
> > triggered on threshold overflow and if it is possible to wait on it
> > even without BPF, rtla could wait on that for both BPF and non-BPF
> > mode instead of what it is currently doing.
>
> Right. Is this something you may want?
>

I don't think it is that important. Non-BPF mode is mostly a
fallback for users of rtla on older kernels which don't have the
osnoise:timerlat_sample trace event. Those are (I assume) mostly users
of LTS distributions who run newer rtla from a container. Adding a new
event wouldn't help in their case.

The only users who would benefit from that are those who don't have
BPF or libbpf. If there is interest in using low-latency actions on
threshold in such settings, I'm not against implementing a threshold
overflow tracepoint and supporting it in rtla for triggering actions
on threshold.

Tomas