We have been seeing an issue where, if rtla is run on machines with a high number of CPUs (100+), timerlat can generate more samples than rtla is able to process via tracefs_iterate_raw_events. This is especially common when the interval is set to 100us (the rteval and cyclictest default) as opposed to the rtla default of 1000us, but it also happens with the rtla default.

Currently, this leads to rtla hanging and having to be terminated with SIGTERM. SIGINT setting stop_tracing is not enough, since more and more events keep coming and tracefs_iterate_raw_events never exits.

This patchset contains two changes:

- Stop the timerlat tracer on SIGINT/SIGALRM to ensure no more events are
  generated when rtla is supposed to exit. This fixes rtla hanging and
  should go to stable.

- On receiving SIGINT/SIGALRM twice, abort iteration immediately with
  tracefs_iterate_stop, making rtla exit right away instead of waiting
  for all events to be processed. This is more of a usability feature:
  if the user is in a hurry, they can Ctrl-C twice (or once after the
  duration has expired) and exit immediately, discarding any events
  pending processing.

Note: I am sending these together only because the second one depends on the first. Also, this should be fixed in osnoise, too.

In the future, two more patchsets will be sent: one to display how many events/samples were dropped (either left in the tracefs buffer or lost to buffer overflow), and one to improve sample processing performance to be on par with cyclictest (ideally) so that samples are not dropped in the cases mentioned at the beginning of the email.
Tomas Glozar (5):
  rtla: Add trace_instance_stop
  rtla/timerlat_hist: Stop timerlat tracer on signal
  rtla/timerlat_top: Stop timerlat tracer on signal
  rtla/timerlat_hist: Abort event processing on second signal
  rtla/timerlat_top: Abort event processing on second signal

 tools/tracing/rtla/src/timerlat_hist.c | 19 ++++++++++++++++++-
 tools/tracing/rtla/src/timerlat_top.c  | 20 +++++++++++++++++++-
 tools/tracing/rtla/src/trace.c         |  8 ++++++++
 tools/tracing/rtla/src/trace.h         |  1 +
 4 files changed, 46 insertions(+), 2 deletions(-)

-- 
2.47.1
On Thu, 16 Jan 2025 15:49:26 +0100
Tomas Glozar <tglozar@redhat.com> wrote:
> In the future, two more patchsets will be sent: one to display how many
> events/samples were dropped (either left in tracefs buffer or by buffer
> overflow), one to improve sample processing performance to be on par with
> cyclictest (ideally) so that samples are not dropped in the cases mentioned
> in the beginning of the email.
Hmm, I wonder if timerlat can handle per cpu data, then you could kick off
a thread per CPU (or a set of CPUs) where the thread is responsible for
handling the data.
	CPU_ZERO_S(cpu_size, cpusetp);
	CPU_SET_S(cpu, cpu_size, cpusetp);

	retval = tracefs_iterate_raw_events(trace->tep,
					    trace->inst,
					    cpusetp,
					    cpu_size,
					    collect_registered_events,
					    trace);
And then that iteration will only read over a subset of CPUs. Each thread
can do a different subset and then it should be able to keep up.
-- Steve
On Fri, Jan 17, 2025 at 1:46 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Hmm, I wonder if timerlat can handle per cpu data, then you could kick off
> a thread per CPU (or a set of CPUs) where the thread is responsible for
> handling the data.
>
>	CPU_ZERO_S(cpu_size, cpusetp);
>	CPU_SET_S(cpu, cpu_size, cpusetp);
>
>	retval = tracefs_iterate_raw_events(trace->tep,
>					    trace->inst,
>					    cpusetp,
>					    cpu_size,
>					    collect_registered_events,
>					    trace);
>
> And then that iteration will only read over a subset of CPUs. Each thread
> can do a different subset and then it should be able to keep up.

That's a good idea, I didn't think of that. But it doesn't help much in a scenario where rtla is pinned to a few housekeeping CPUs with -H, which is used for testing isolated-CPU-based setups.

I was thinking of turning timerlat_hist_handler/timerlat_top_handler into a BPF program and having it executed right after the sample is created, e.g. by using the BPF perf interface to hook it to a tracepoint event. The histogram/counter would be stored in BPF maps, which would merely be copied over in the main loop. This is essentially how cyclictest does it, except in userspace. I expect this solution to have good performance, but the obvious downside is that it requires BPF. This is not a problem for us, but it might be for other rtla users, and we would likely have to keep both implementations of sample processing in the code.

Also, before even starting with that, it would likely be necessary to remove the duplicate code throughout timerlat/osnoise and test it properly, so we don't have to do the same code changes twice or four times.

Tomas
On Fri, 17 Jan 2025 13:04:07 +0100
Tomas Glozar <tglozar@redhat.com> wrote:

> I was thinking of turning timerlat_hist_handler/timerlat_top_handler
> into a BPF program and having it executed right after the sample is
> created, e.g. by using the BPF perf interface to hook it to a
> tracepoint event. The histogram/counter would be stored in BPF maps,
> which would be merely copied over in the main loop. This is
> essentially how cyclictest does it, except in userspace. I expect this
> solution to have good performance, but the obvious downside is that it
> requires BPF. This is not a problem for us, but might be for other
> rtla users and we'd likely have to keep both implementations of sample
> processing in the code.
>
> Also, before even starting with that, it would be likely necessary to
> remove the duplicate code throughout timerlat/osnoise and test it
> properly, so we don't have to do the same code changes twice or four
> times.

We could also add kernel helpers to the code if it would help.

Hmm, the timerlat event could probably get access to a trigger to allow it to do the work in the kernel like what the 'hist' triggers do. We can extend on that.

The reason I haven't written a BPF program yet, is because when I feel there's a useful operation that can be done, I just extend ftrace to do it ;-)

-- Steve
On Fri, Jan 17, 2025 at 4:29 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Hmm, the timerlat event could probably get access to a trigger to allow it
> to do the work in the kernel like what the 'hist' triggers do. We can
> extend on that.

If we only need to count the numbers, that would be a perfect solution: use the hist trigger in place of the BPF program (no need to reinvent the wheel!). I'll have to look at exactly what we need.

> The reason I haven't written a BPF program yet, is because when I feel
> there's a useful operation that can be done, I just extend ftrace to do it
> ;-)

I used to work with BPF, so I usually think of a BPF solution first, then I think about whether I can replace the BPF part :D

Tomas
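For reference, a hist trigger on the timerlat tracepoint might look roughly like the fragment below. This is a hedged sketch, not something from the thread: it assumes the osnoise:timerlat event exposes a timer_latency field and that the kernel supports the .buckets key modifier; both should be verified against the running kernel (e.g. via the event's format file) before use, and it needs root and a mounted tracefs.

```shell
# Check the available fields first (names assumed, verify on your kernel):
cat /sys/kernel/tracing/events/osnoise/timerlat/format

# Histogram of timer latency per CPU, bucketed into 1000ns-wide bins:
echo 'hist:keys=common_cpu,timer_latency.buckets=1000:sort=timer_latency' \
	> /sys/kernel/tracing/events/osnoise/timerlat/trigger

# Read the in-kernel histogram without any userspace sample processing:
cat /sys/kernel/tracing/events/osnoise/timerlat/hist
```

Counting this way keeps the per-sample work in the kernel, which is exactly the property the thread is after.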