From nobody Mon Jun 15 18:00:20 2026
From: wen.yang@linux.dev
To: Steven Rostedt, Gabriele Monaco, Masami Hiramatsu, Mathieu Desnoyers
Cc: linux-trace-kernel@vger.kernel.org, linux-kernel@vger.kernel.org, Wen Yang
Subject: [RFC PATCH 1/4] rv/tlob: Add tlob model DOT file
Date: Mon, 13 Apr 2026 03:27:18 +0800
Message-Id: <64122474633aa17d872a7dc6233d7794e80f2784.1776020428.git.wen.yang@linux.dev>

From: Wen Yang

Add the Graphviz DOT specification for the tlob (task latency over budget)
deterministic automaton.

The model has three states: unmonitored, on_cpu, and off_cpu.
trace_start transitions from unmonitored to on_cpu; switch_out and
switch_in cycle between on_cpu and off_cpu; trace_stop and budget_expired
return to unmonitored from either active state. unmonitored is the sole
accepting state.

switch_in, switch_out, and sched_wakeup self-loop in unmonitored;
sched_wakeup self-loops in on_cpu; switch_out and sched_wakeup self-loop
in off_cpu.
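
The transitions listed above can be read as a lookup table. What follows is a minimal C sketch of that table, not the code the RV tooling generates from tlob.dot: every sk_-prefixed name is hypothetical, and treating an event/state pair that the description does not list as invalid is an assumption about how missing edges are interpreted.

```c
#include <stdbool.h>

/* Hypothetical identifiers for this sketch only; the names generated
 * from tlob.dot by the RV tooling may differ. */
enum sk_state { SK_UNMONITORED, SK_ON_CPU, SK_OFF_CPU, SK_STATE_MAX };

enum sk_event {
	SK_TRACE_START, SK_SWITCH_IN, SK_SWITCH_OUT,
	SK_SCHED_WAKEUP, SK_TRACE_STOP, SK_BUDGET_EXPIRED, SK_EVENT_MAX
};

/* Assumption: an edge that the model does not list is a violation. */
#define SK_INVALID (-1)

/* Rows are current states, columns are events, cells are next states. */
static const int sk_table[SK_STATE_MAX][SK_EVENT_MAX] = {
	[SK_UNMONITORED] = {
		[SK_TRACE_START]    = SK_ON_CPU,
		[SK_SWITCH_IN]      = SK_UNMONITORED,
		[SK_SWITCH_OUT]     = SK_UNMONITORED,
		[SK_SCHED_WAKEUP]   = SK_UNMONITORED,
		[SK_TRACE_STOP]     = SK_INVALID,
		[SK_BUDGET_EXPIRED] = SK_INVALID,
	},
	[SK_ON_CPU] = {
		[SK_TRACE_START]    = SK_INVALID,
		[SK_SWITCH_IN]      = SK_INVALID,
		[SK_SWITCH_OUT]     = SK_OFF_CPU,
		[SK_SCHED_WAKEUP]   = SK_ON_CPU,
		[SK_TRACE_STOP]     = SK_UNMONITORED,
		[SK_BUDGET_EXPIRED] = SK_UNMONITORED,
	},
	[SK_OFF_CPU] = {
		[SK_TRACE_START]    = SK_INVALID,
		[SK_SWITCH_IN]      = SK_ON_CPU,
		[SK_SWITCH_OUT]     = SK_OFF_CPU,
		[SK_SCHED_WAKEUP]   = SK_OFF_CPU,
		[SK_TRACE_STOP]     = SK_UNMONITORED,
		[SK_BUDGET_EXPIRED] = SK_UNMONITORED,
	},
};

/* Return the next state, or SK_INVALID for an edge the model lacks. */
int sk_next(int state, int event)
{
	if (state < 0 || state >= SK_STATE_MAX ||
	    event < 0 || event >= SK_EVENT_MAX)
		return SK_INVALID;
	return sk_table[state][event];
}

/* unmonitored is the sole accepting state. */
bool sk_accepting(int state)
{
	return state == SK_UNMONITORED;
}
```

Starting from unmonitored, feeding trace_start, switch_out, switch_in and trace_stop through sk_next() visits on_cpu, off_cpu, on_cpu and ends back in unmonitored, the accepting state.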
Signed-off-by: Wen Yang
---
 MAINTAINERS                        |  3 +++
 tools/verification/models/tlob.dot | 25 +++++++++++++++++++++++++
 2 files changed, 28 insertions(+)
 create mode 100644 tools/verification/models/tlob.dot

diff --git a/MAINTAINERS b/MAINTAINERS
index 9fbb619c6..c2c56236c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23242,7 +23242,10 @@ S:	Maintained
 F:	Documentation/trace/rv/
 F:	include/linux/rv.h
 F:	include/rv/
+F:	include/uapi/linux/rv.h
 F:	kernel/trace/rv/
+F:	samples/rv/
+F:	tools/testing/selftests/rv/
 F:	tools/testing/selftests/verification/
 F:	tools/verification/
 
diff --git a/tools/verification/models/tlob.dot b/tools/verification/models/tlob.dot
new file mode 100644
index 000000000..df34a14b8
--- /dev/null
+++ b/tools/verification/models/tlob.dot
@@ -0,0 +1,25 @@
+digraph state_automaton {
+	center = true;
+	size = "7,11";
+	{node [shape = plaintext, style=invis, label=""] "__init_unmonitored"};
+	{node [shape = ellipse] "unmonitored"};
+	{node [shape = plaintext] "unmonitored"};
+	{node [shape = plaintext] "on_cpu"};
+	{node [shape = plaintext] "off_cpu"};
+	"__init_unmonitored" -> "unmonitored";
+	"unmonitored" [label = "unmonitored", color = green3];
+	"unmonitored" -> "on_cpu" [ label = "trace_start" ];
+	"unmonitored" -> "unmonitored" [ label = "switch_in\nswitch_out\nsched_wakeup" ];
+	"on_cpu" [label = "on_cpu"];
+	"on_cpu" -> "off_cpu" [ label = "switch_out" ];
+	"on_cpu" -> "unmonitored" [ label = "trace_stop\nbudget_expired" ];
+	"on_cpu" -> "on_cpu" [ label = "sched_wakeup" ];
+	"off_cpu" [label = "off_cpu"];
+	"off_cpu" -> "on_cpu" [ label = "switch_in" ];
+	"off_cpu" -> "unmonitored" [ label = "trace_stop\nbudget_expired" ];
+	"off_cpu" -> "off_cpu" [ label = "switch_out\nsched_wakeup" ];
+	{ rank = min ;
+		"__init_unmonitored";
+		"unmonitored";
+	}
+}
-- 
2.43.0

From nobody Mon Jun 15 18:00:20 2026
From: wen.yang@linux.dev
To: Steven Rostedt, Gabriele Monaco, Masami Hiramatsu, Mathieu Desnoyers
Cc: linux-trace-kernel@vger.kernel.org, linux-kernel@vger.kernel.org, Wen Yang
Subject: [RFC PATCH 2/4] rv/tlob: Add tlob deterministic automaton monitor
Date: Mon, 13 Apr 2026 03:27:19 +0800

From: Wen Yang

Add the tlob (task latency over budget) RV monitor. tlob tracks the
monotonic elapsed time (CLOCK_MONOTONIC) of a marked per-task code path,
including time off-CPU, and fires a per-task hrtimer when the elapsed
time exceeds a configurable budget.

Three-state DA (unmonitored/on_cpu/off_cpu) driven by trace_start,
trace_stop, switch_in/out, and budget_expired events. Per-task state
lives in a fixed-size hash table (TLOB_MAX_MONITORED slots) with
RCU-deferred free.

Two userspace interfaces:
 - tracefs: uprobe pair registration via the monitor file using the
   format "threshold_us:offset_start:offset_stop:binary_path"
 - /dev/rv ioctls (CONFIG_RV_CHARDEV): TLOB_IOCTL_TRACE_START /
   TRACE_STOP; TRACE_STOP returns -EOVERFLOW on violation

Each /dev/rv fd has a per-fd mmap ring buffer (physically contiguous
pages). A control page (struct tlob_mmap_page) at offset 0 exposes
head/tail/dropped for lockless userspace reads; struct tlob_event
records follow at data_offset. Drop-new policy on overflow.

UAPI: include/uapi/linux/rv.h (tlob_start_args, tlob_event,
tlob_mmap_page, ioctl numbers), monitor_tlob.rst,
ioctl-number.rst (RV_IOC_MAGIC=0xB9).

Signed-off-by: Wen Yang
---
 Documentation/trace/rv/index.rst              |   1 +
 Documentation/trace/rv/monitor_tlob.rst       | 381 +++++++
 .../userspace-api/ioctl/ioctl-number.rst      |   1 +
 include/uapi/linux/rv.h                       | 181 ++++
 kernel/trace/rv/Kconfig                       |  17 +
 kernel/trace/rv/Makefile                      |   2 +
 kernel/trace/rv/monitors/tlob/Kconfig         |  51 +
 kernel/trace/rv/monitors/tlob/tlob.c          | 986 ++++++++++++++++++
 kernel/trace/rv/monitors/tlob/tlob.h          | 145 +++
 kernel/trace/rv/monitors/tlob/tlob_trace.h    |  42 +
 kernel/trace/rv/rv.c                          |   4 +
 kernel/trace/rv/rv_dev.c                      | 602 +++++++++++
 kernel/trace/rv/rv_trace.h                    |  50 +
 13 files changed, 2463 insertions(+)
 create mode 100644 Documentation/trace/rv/monitor_tlob.rst
 create mode 100644 include/uapi/linux/rv.h
 create mode 100644 kernel/trace/rv/monitors/tlob/Kconfig
 create mode 100644 kernel/trace/rv/monitors/tlob/tlob.c
 create mode 100644 kernel/trace/rv/monitors/tlob/tlob.h
 create mode 100644 kernel/trace/rv/monitors/tlob/tlob_trace.h
 create mode 100644 kernel/trace/rv/rv_dev.c

diff --git a/Documentation/trace/rv/index.rst b/Documentation/trace/rv/index.rst
index a2812ac5c..4f2bfaf38 100644
--- a/Documentation/trace/rv/index.rst
+++ b/Documentation/trace/rv/index.rst
@@ -15,3 +15,4 @@ Runtime Verification
    monitor_wwnr.rst
    monitor_sched.rst
    monitor_rtapp.rst
+   monitor_tlob.rst
diff --git a/Documentation/trace/rv/monitor_tlob.rst b/Documentation/trace/rv/monitor_tlob.rst
new file mode 100644
index 000000000..d498e9894
--- /dev/null
+++ b/Documentation/trace/rv/monitor_tlob.rst
@@ -0,0 +1,381 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Monitor tlob
+============
+
+- Name: tlob - task latency over budget
+- Type: per-task deterministic automaton
+- Author: Wen Yang
+
+Description
+-----------
+
+The tlob monitor tracks per-task elapsed time (CLOCK_MONOTONIC, including
+both on-CPU and off-CPU time) and reports a violation when the monitored
+task exceeds a configurable latency budget threshold.
+
+The monitor implements a three-state deterministic automaton::
+
+                   |
+                   | (initial)
+                   v
+            +--------------+
+  +-------> | unmonitored  |
+  |         +--------------+
+  |                |
+  |           trace_start
+  |                v
+  |         +--------------+
+  |         |    on_cpu    |
+  |         +--------------+
+  |            |        |
+  |  switch_out|        | trace_stop / budget_expired
+  |            v        v
+  |  +--------------+  (unmonitored)
+  |  |   off_cpu    |
+  |  +--------------+
+  |     |          |
+  |     | switch_in| trace_stop / budget_expired
+  |     v          v
+  |  (on_cpu)  (unmonitored)
+  |
+  +-- trace_stop (from on_cpu or off_cpu)
+
+  Key transitions:
+    unmonitored --(trace_start)-->    on_cpu
+    on_cpu      --(switch_out)-->     off_cpu
+    off_cpu     --(switch_in)-->      on_cpu
+    on_cpu      --(trace_stop)-->     unmonitored
+    off_cpu     --(trace_stop)-->     unmonitored
+    on_cpu      --(budget_expired)->  unmonitored [violation]
+    off_cpu     --(budget_expired)->  unmonitored [violation]
+
+  sched_wakeup self-loops in on_cpu and unmonitored; switch_out and
+  sched_wakeup self-loop in off_cpu. budget_expired is fired by the one-shot hrtimer; it always
+  transitions to unmonitored regardless of whether the task is on-CPU
+  or off-CPU when the timer fires.
+
+State Descriptions
+------------------
+
+- **unmonitored**: Task is not being traced. Scheduling events
+  (``switch_in``, ``switch_out``, ``sched_wakeup``) are silently
+  ignored (self-loop). The monitor waits for a ``trace_start`` event
+  to begin a new observation window.
+
+- **on_cpu**: Task is running on the CPU with the deadline timer armed.
+  A one-shot hrtimer was set for ``threshold_us`` microseconds at
+  ``trace_start`` time. A ``switch_out`` event transitions to
+  ``off_cpu``; the hrtimer keeps running (off-CPU time counts toward
+  the budget). A ``trace_stop`` cancels the timer and returns to
+  ``unmonitored`` (normal completion). If the hrtimer fires
+  (``budget_expired``) the violation is recorded and the automaton
+  transitions to ``unmonitored``.
+
+- **off_cpu**: Task was preempted or blocked. The one-shot hrtimer
+  continues to run. A ``switch_in`` event returns to ``on_cpu``.
+  A ``trace_stop`` cancels the timer and returns to ``unmonitored``.
+  If the hrtimer fires (``budget_expired``) while the task is off-CPU,
+  the violation is recorded and the automaton transitions to
+  ``unmonitored``.
+
+Rationale
+---------
+
+The per-task latency budget threshold allows operators to express timing
+requirements in microseconds and receive an immediate ftrace event when a
+task exceeds its budget. This is useful for real-time tasks
+(``SCHED_FIFO`` / ``SCHED_DEADLINE``) where total elapsed time must
+remain within a known bound.
+
+Each task has an independent threshold, so up to ``TLOB_MAX_MONITORED``
+(64) tasks with different timing requirements can be monitored
+simultaneously.
+
+On threshold violation the automaton records a ``tlob_budget_exceeded``
+ftrace event carrying the final on-CPU / off-CPU time breakdown, but does
+not kill or throttle the task. Monitoring can be restarted by issuing a
+new ``trace_start`` event (or a new ``TLOB_IOCTL_TRACE_START`` ioctl).
+
+A per-task one-shot hrtimer is armed at ``trace_start`` for exactly
+``threshold_us`` microseconds. It fires at most once per monitoring
+window, performs an O(1) hash lookup, records the violation, and injects
+the ``budget_expired`` event into the DA. When ``CONFIG_RV_MON_TLOB``
+is not set there is zero runtime cost.
+
+Usage
+-----
+
+tracefs interface (uprobe-based external monitoring)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``monitor`` tracefs file allows any privileged user to instrument an
+unmodified binary via uprobes, without changing its source code. Write a
+four-field record to attach two plain entry uprobes: one at
+``offset_start`` fires ``tlob_start_task()`` and one at ``offset_stop``
+fires ``tlob_stop_task()``, so the latency budget covers exactly the code
+region between the two offsets::
+
+  threshold_us:offset_start:offset_stop:binary_path
+
+``binary_path`` comes last so it may freely contain ``:`` (e.g. paths
+inside a container namespace).
+
+The uprobes fire for every task that executes the probed instruction in
+the binary, consistent with the native uprobe semantics. All tasks that
+execute the code region get independent per-task monitoring slots.
+
+Using two plain entry uprobes (rather than a uretprobe for the stop) means
+that a mistyped offset can never corrupt the call stack; the worst outcome
+of a bad ``offset_stop`` is a missed stop that causes the hrtimer to fire
+and report a budget violation.
+
+Example -- monitor a code region in ``/usr/bin/myapp`` with a 5 ms
+budget, where the region starts at offset 0x12a0 and ends at 0x12f0::
+
+  echo 1 > /sys/kernel/tracing/rv/monitors/tlob/enable
+
+  # Bind uprobes: start probe starts the clock, stop probe stops it
+  echo "5000:0x12a0:0x12f0:/usr/bin/myapp" \
+      > /sys/kernel/tracing/rv/monitors/tlob/monitor
+
+  # Remove the uprobe binding for this code region
+  echo "-0x12a0:/usr/bin/myapp" > /sys/kernel/tracing/rv/monitors/tlob/monitor
+
+  # List registered uprobe bindings (mirrors the write format)
+  cat /sys/kernel/tracing/rv/monitors/tlob/monitor
+  # -> 5000:0x12a0:0x12f0:/usr/bin/myapp
+
+  # Read violations from the trace buffer
+  cat /sys/kernel/tracing/trace
+
+Up to ``TLOB_MAX_MONITORED`` tasks may be monitored simultaneously.
+
+The offsets can be obtained with ``nm`` or ``readelf``::
+
+  nm -n /usr/bin/myapp | grep my_function
+  # -> 00000000000012a0 T my_function
+
+  readelf -s /usr/bin/myapp | grep my_function
+  # -> 42: 00000000000012a0   336 FUNC    GLOBAL DEFAULT   13 my_function
+
+  # offset_start = 0x12a0 (function entry)
+  # offset_stop  = 0x12a0 + 0x50 = 0x12f0 (or any instruction before return)
+
+Notes:
+
+- The uprobes fire for every task that executes the probed instruction,
+  so concurrent calls from different threads each get independent
+  monitoring slots.
+- ``offset_stop`` need not be a function return; it can be any instruction
+  within the region. If the stop probe is never reached (e.g. early exit
+  path bypasses it), the hrtimer fires and a budget violation is reported.
+- Each ``(binary_path, offset_start)`` pair may only be registered once.
+  A second write with the same ``offset_start`` for the same binary is
+  rejected with ``-EEXIST``. Two entry uprobes at the same address would
+  both fire for every task, causing ``tlob_start_task()`` to be called
+  twice; the second call would silently fail with ``-EEXIST`` and the
+  second binding's threshold would never take effect. Different code
+  regions that share the same ``offset_stop`` (common exit point) are
+  explicitly allowed.
+- The uprobe binding is removed when ``-offset_start:binary_path`` is
+  written to ``monitor``, or when the monitor is disabled.
+- The ``tag`` field in every ``tlob_budget_exceeded`` event is
+  automatically set to ``offset_start`` for the tracefs path, so
+  violation events for different code regions are immediately
+  distinguishable even when ``threshold_us`` values are identical.
+
+ftrace ring buffer (budget violation events)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When a monitored task exceeds its latency budget the hrtimer fires,
+records the violation, and emits a single ``tlob_budget_exceeded`` event
+into the ftrace ring buffer. **Nothing is written to the ftrace ring
+buffer while the task is within budget.**
+
+The event carries the on-CPU / off-CPU time breakdown so that root-cause
+analysis (CPU-bound vs. scheduling / I/O overrun) is immediate::
+
+  cat /sys/kernel/tracing/trace
+
+Example output::
+
+  myapp-1234 [003] .... 12345.678: tlob_budget_exceeded: \
+    myapp[1234]: budget exceeded threshold=5000 \
+    on_cpu=820 off_cpu=4500 switches=3 state=off_cpu tag=0x00000000000012a0
+
+Field descriptions:
+
+``threshold``
+  Configured latency budget in microseconds.
+
+``on_cpu``
+  Cumulative on-CPU time since ``trace_start``, in microseconds.
+
+``off_cpu``
+  Cumulative off-CPU (scheduling + I/O wait) time since ``trace_start``,
+  in microseconds.
+
+``switches``
+  Number of times the task was scheduled out during this window.
+
+``state``
+  DA state when the hrtimer fired: ``on_cpu`` means the task was executing
+  when the budget expired (CPU-bound overrun); ``off_cpu`` means the task
+  was preempted or blocked (scheduling / I/O overrun).
+
+``tag``
+  Opaque 64-bit cookie supplied by the caller via ``tlob_start_args.tag``
+  (ioctl path) or automatically set to ``offset_start`` (tracefs uprobe
+  path). Use it to distinguish violations from different code regions
+  monitored by the same thread. Zero when not set.
+
+To capture violations in a file::
+
+  trace-cmd record -e tlob_budget_exceeded &
+  # ... run workload ...
+  trace-cmd report
+
+/dev/rv ioctl interface (self-instrumentation)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Tasks can self-instrument their own code paths via the ``/dev/rv`` misc
+device (requires ``CONFIG_RV_CHARDEV``). The kernel key is
+``task_struct``; multiple threads sharing a single fd each get their own
+independent monitoring slot.
+
+**Synchronous mode** -- the calling thread checks its own result::
+
+  int fd = open("/dev/rv", O_RDWR);
+
+  struct tlob_start_args args = {
+      .threshold_us = 50000,   /* 50 ms */
+      .tag          = 0,       /* optional; 0 = don't care */
+      .notify_fd    = -1,      /* no fd notification */
+  };
+  ioctl(fd, TLOB_IOCTL_TRACE_START, &args);
+
+  /* ... code path under observation ... */
+
+  int ret = ioctl(fd, TLOB_IOCTL_TRACE_STOP, NULL);
+  /* ret == 0: within budget */
+  /* ret == -1 && errno == EOVERFLOW: budget exceeded */
+
+  close(fd);
+
+**Asynchronous mode** -- a dedicated monitor thread receives violation
+records via ``read()`` on a shared fd, decoupling the observation from
+the critical path::
+
+  /* Monitor thread: open a dedicated fd. */
+  int monitor_fd = open("/dev/rv", O_RDWR);
+
+  /* Worker thread: set notify_fd = monitor_fd in TRACE_START args. */
+  int work_fd = open("/dev/rv", O_RDWR);
+  struct tlob_start_args args = {
+      .threshold_us = 10000,   /* 10 ms */
+      .tag          = REGION_A,
+      .notify_fd    = monitor_fd,
+  };
+  ioctl(work_fd, TLOB_IOCTL_TRACE_START, &args);
+  /* ... critical section ... */
+  ioctl(work_fd, TLOB_IOCTL_TRACE_STOP, NULL);
+
+  /* Monitor thread: blocking read() returns one or more tlob_event records. */
+  struct tlob_event ntfs[8];
+  ssize_t n = read(monitor_fd, ntfs, sizeof(ntfs));
+  for (ssize_t i = 0; i < n / (ssize_t)sizeof(struct tlob_event); i++) {
+      struct tlob_event *ntf = &ntfs[i];
+      printf("tid=%u tag=0x%llx exceeded budget=%llu us "
+             "(on_cpu=%llu off_cpu=%llu switches=%u state=%s)\n",
+             ntf->tid, ntf->tag, ntf->threshold_us,
+             ntf->on_cpu_us, ntf->off_cpu_us, ntf->switches,
+             ntf->state ? "on_cpu" : "off_cpu");
+  }
+
+**mmap ring buffer** -- zero-copy consumption of violation events::
+
+  int fd = open("/dev/rv", O_RDWR);
+  struct tlob_start_args args = {
+      .threshold_us = 1000,   /* 1 ms */
+      .notify_fd    = fd,     /* push violations to own ring buffer */
+  };
+  ioctl(fd, TLOB_IOCTL_TRACE_START, &args);
+
+  /* Map the ring: one control page + capacity data records. */
+  size_t pagesize = sysconf(_SC_PAGESIZE);
+  size_t cap = 64;   /* read from page->capacity after mmap */
+  size_t len = pagesize + cap * sizeof(struct tlob_event);
+  void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+
+  struct tlob_mmap_page *page = map;
+  struct tlob_event *data =
+      (struct tlob_event *)((char *)map + page->data_offset);
+
+  /* Consumer loop: poll for events, read without copying. */
+  while (1) {
+      poll(&(struct pollfd){fd, POLLIN, 0}, 1, -1);
+
+      uint32_t head = __atomic_load_n(&page->data_head, __ATOMIC_ACQUIRE);
+      uint32_t tail = page->data_tail;
+      while (tail != head) {
+          handle(&data[tail & (page->capacity - 1)]);
+          tail++;
+      }
+      __atomic_store_n(&page->data_tail, tail, __ATOMIC_RELEASE);
+  }
+
+Note: ``read()`` and ``mmap()`` share the same ring and ``data_tail``
+cursor. Do not use both simultaneously on the same fd.
+
+``tlob_event`` fields:
+
+``tid``
+  Thread ID (``task_pid_vnr``) of the violating task.
+
+``threshold_us``
+  Budget that was exceeded, in microseconds.
+
+``on_cpu_us``
+  Cumulative on-CPU time at violation time, in microseconds.
+
+``off_cpu_us``
+  Cumulative off-CPU time at violation time, in microseconds.
+
+``switches``
+  Number of context switches since ``TRACE_START``.
+
+``state``
+  1 = timer fired while task was on-CPU; 0 = timer fired while off-CPU.
+
+``tag``
+  Cookie from ``tlob_start_args.tag``; for the tracefs uprobe path this
+  equals ``offset_start``. Zero when not set.
+
+tracefs files
+-------------
+
+The following files are created under
+``/sys/kernel/tracing/rv/monitors/tlob/``:
+
+``enable`` (rw)
+  Write ``1`` to enable the monitor; write ``0`` to disable it and
+  stop all currently monitored tasks.
+
+``desc`` (ro)
+  Human-readable description of the monitor.
+
+``monitor`` (rw)
+  Write ``threshold_us:offset_start:offset_stop:binary_path`` to bind two
+  plain entry uprobes in *binary_path*. The uprobe at *offset_start* fires
+  ``tlob_start_task()``; the uprobe at *offset_stop* fires
+  ``tlob_stop_task()``. Returns ``-EEXIST`` if a binding with the same
+  *offset_start* already exists for *binary_path*. Write
+  ``-offset_start:binary_path`` to remove the binding. Read to list
+  registered bindings, one
+  ``threshold_us:0xoffset_start:0xoffset_stop:binary_path`` entry per line.
+
+Specification
+-------------
+
+Graphviz DOT file in tools/verification/models/tlob.dot
diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 331223761..8d3af68db 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -385,6 +385,7 @@ Code  Seq#    Include File                                             Comments
 0xB8  01-02  uapi/misc/mrvl_cn10k_dpi.h                                Marvell CN10K DPI driver
 0xB8  all    uapi/linux/mshv.h                                         Microsoft Hyper-V /dev/mshv driver
+0xB9  00-3F  linux/rv.h                                                Runtime Verification (RV) monitors
 0xBA  00-0F  uapi/linux/liveupdate.h                                   Pasha Tatashin
 0xC0  00-0F  linux/usb/iowarrior.h
diff --git a/include/uapi/linux/rv.h b/include/uapi/linux/rv.h
new file mode 100644
index 000000000..d1b96d8cd
--- /dev/null
+++ b/include/uapi/linux/rv.h
@@ -0,0 +1,181 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * UAPI definitions for Runtime Verification (RV) monitors.
+ *
+ * All RV monitors that expose an ioctl self-instrumentation interface
+ * share the magic byte RV_IOC_MAGIC (0xB9), registered in
+ * Documentation/userspace-api/ioctl/ioctl-number.rst.
+ *
+ * A single /dev/rv misc device serves as the entry point. ioctl numbers
+ * encode both the monitor identity and the operation:
+ *
+ *   0x01 - 0x1F  tlob (task latency over budget)
+ *   0x20 - 0x3F  reserved for future RV monitors
+ *
+ * Usage examples and design rationale are in:
+ *   Documentation/trace/rv/monitor_tlob.rst
+ */
+
+#ifndef _UAPI_LINUX_RV_H
+#define _UAPI_LINUX_RV_H
+
+#include
+#include
+
+/* Magic byte shared by all RV monitor ioctls. */
+#define RV_IOC_MAGIC	0xB9
+
+/* -----------------------------------------------------------------------
+ * tlob: task latency over budget monitor (nr 0x01 - 0x1F)
+ * -----------------------------------------------------------------------
+ */
+
+/**
+ * struct tlob_start_args - arguments for TLOB_IOCTL_TRACE_START
+ * @threshold_us: Latency budget for this critical section, in microseconds.
+ *                Must be greater than zero.
+ * @tag:          Opaque 64-bit cookie supplied by the caller. Echoed back
+ *                verbatim in the tlob_budget_exceeded ftrace event and in any
+ *                tlob_event record delivered via @notify_fd. Use it to identify
+ *                which code region triggered a violation when the same thread
+ *                monitors multiple regions sequentially. Set to 0 if not
+ *                needed.
+ * @notify_fd:    File descriptor that will receive a tlob_event record on
+ *                violation. Must refer to an open /dev/rv fd. May equal
+ *                the calling fd (self-notification, useful for retrieving the
+ *                on_cpu_us / off_cpu_us breakdown after TRACE_STOP returns
+ *                -EOVERFLOW). Set to -1 to disable fd notification; in that
+ *                case violations are only signalled via the TRACE_STOP return
+ *                value and the tlob_budget_exceeded ftrace event.
+ * @flags:        Must be 0. Reserved for future extensions.
+ */
+struct tlob_start_args {
+	__u64 threshold_us;
+	__u64 tag;
+	__s32 notify_fd;
+	__u32 flags;
+};
+
+/**
+ * struct tlob_event - one budget-exceeded event
+ *
+ * Consumed by read() on the notify_fd registered at TLOB_IOCTL_TRACE_START.
+ * Each record describes a single budget exceedance for one task.
+ *
+ * @tid:          Thread ID (task_pid_vnr) of the violating task.
+ * @threshold_us: Budget that was exceeded, in microseconds.
+ * @on_cpu_us:    Cumulative on-CPU time at violation time, in microseconds.
+ * @off_cpu_us:   Cumulative off-CPU (scheduling + I/O wait) time at
+ *                violation time, in microseconds.
+ * @switches:     Number of context switches since TRACE_START.
+ * @state:        DA state at violation: 1 = on_cpu, 0 = off_cpu.
+ * @tag:          Cookie from tlob_start_args.tag; for the tracefs uprobe path
+ *                this is the offset_start value. Zero when not set.
+ */
+struct tlob_event {
+	__u32 tid;
+	__u32 pad;
+	__u64 threshold_us;
+	__u64 on_cpu_us;
+	__u64 off_cpu_us;
+	__u32 switches;
+	__u32 state;	/* 1 = on_cpu, 0 = off_cpu */
+	__u64 tag;
+};
+
+/**
+ * struct tlob_mmap_page - control page for the mmap'd violation ring buffer
+ *
+ * Mapped at offset 0 of the mmap region returned by mmap(2) on a /dev/rv fd.
+ * The data array of struct tlob_event records begins at offset @data_offset
+ * (always one page from the mmap base; use this field rather than hard-coding
+ * PAGE_SIZE so the code remains correct across architectures).
+ *
+ * Ring layout:
+ *
+ *   mmap base + 0           : struct tlob_mmap_page (one page)
+ *   mmap base + data_offset : struct tlob_event[capacity]
+ *
+ * The mmap length determines the ring capacity. Compute it as:
+ *
+ *   raw    = sysconf(_SC_PAGESIZE) + capacity * sizeof(struct tlob_event)
+ *   length = (raw + sysconf(_SC_PAGESIZE) - 1) & ~(sysconf(_SC_PAGESIZE) - 1)
+ *
+ * i.e. round the raw byte count up to the next page boundary before
+ * passing it to mmap(2). The kernel requires a page-aligned length.
+ * capacity must be a power of 2. Read @capacity after a successful
+ * mmap(2) for the actual value.
+ *
+ * Producer/consumer ordering contract:
+ *
+ *   Kernel (producer):
+ *     data[data_head & (capacity - 1)] = event;
+ *     // pairs with load-acquire in userspace:
+ *     smp_store_release(&page->data_head, data_head + 1);
+ *
+ *   Userspace (consumer):
+ *     // pairs with store-release in kernel:
+ *     head = __atomic_load_n(&page->data_head, __ATOMIC_ACQUIRE);
+ *     for (tail = page->data_tail; tail != head; tail++)
+ *         handle(&data[tail & (capacity - 1)]);
+ *     __atomic_store_n(&page->data_tail, tail, __ATOMIC_RELEASE);
+ *
+ * @data_head and @data_tail are monotonically increasing __u32 counters
+ * in units of records. Unsigned 32-bit wrap-around is handled correctly
+ * by modular arithmetic; the ring is full when
+ * (data_head - data_tail) == capacity.
+ *
+ * When the ring is full the kernel drops the incoming record and increments
+ * @dropped. The consumer should check @dropped periodically to detect loss.
+ *
+ * read() and mmap() share the same ring buffer. Do not use both
+ * simultaneously on the same fd.
+ *
+ * @data_head:   Next write slot index. Updated by the kernel with
+ *               store-release ordering. Read by userspace with load-acquire.
+ * @data_tail:   Next read slot index. Updated by userspace. Read by the
+ *               kernel to detect overflow.
+ * @capacity:    Actual ring capacity in records (power of 2). Written once
+ *               by the kernel at mmap time; read-only for userspace thereafter.
+ * @version:     Ring buffer ABI version; currently 1.
+ * @data_offset: Byte offset from the mmap base to the data array.
+ *               Always equal to sysconf(_SC_PAGESIZE) on the running kernel.
+ * @record_size: sizeof(struct tlob_event) as seen by the kernel. Verify
+ *               this matches userspace's sizeof before indexing the array.
+ * @dropped:     Number of events dropped because the ring was full.
+ *               Monotonically increasing; read with __ATOMIC_RELAXED.
+ */
+struct tlob_mmap_page {
+	__u32 data_head;
+	__u32 data_tail;
+	__u32 capacity;
+	__u32 version;
+	__u32 data_offset;
+	__u32 record_size;
+	__u64 dropped;
+};
+
+/*
+ * TLOB_IOCTL_TRACE_START - begin monitoring the calling task.
+ *
+ * Arms a per-task hrtimer for threshold_us microseconds. If args.notify_fd
+ * is >= 0, a tlob_event record is pushed into that fd's ring buffer on
+ * violation in addition to the tlob_budget_exceeded ftrace event.
+ * args.notify_fd == -1 disables fd notification.
+ *
+ * Violation records are consumed by read() on the notify_fd (blocking or
+ * non-blocking depending on O_NONBLOCK). On violation, TLOB_IOCTL_TRACE_STOP
+ * also returns -EOVERFLOW regardless of whether notify_fd is set.
+ *
+ * args.flags must be 0.
+ */
+#define TLOB_IOCTL_TRACE_START	_IOW(RV_IOC_MAGIC, 0x01, struct tlob_start_args)
+
+/*
+ * TLOB_IOCTL_TRACE_STOP - end monitoring the calling task.
+ *
+ * Returns 0 if within budget, -EOVERFLOW if the budget was exceeded.
+ */
+#define TLOB_IOCTL_TRACE_STOP	_IO(RV_IOC_MAGIC, 0x02)
+
+#endif /* _UAPI_LINUX_RV_H */
diff --git a/kernel/trace/rv/Kconfig b/kernel/trace/rv/Kconfig
index 5b4be87ba..227573cda 100644
--- a/kernel/trace/rv/Kconfig
+++ b/kernel/trace/rv/Kconfig
@@ -65,6 +65,7 @@ source "kernel/trace/rv/monitors/pagefault/Kconfig"
 source "kernel/trace/rv/monitors/sleep/Kconfig"
 # Add new rtapp monitors here
 
+source "kernel/trace/rv/monitors/tlob/Kconfig"
 # Add new monitors here
 
 config RV_REACTORS
@@ -93,3 +94,19 @@ config RV_REACT_PANIC
 	help
 	  Enables the panic reactor. The panic reactor emits a printk()
 	  message if an exception is found and panic()s the system.
+
+config RV_CHARDEV
+	bool "RV ioctl interface via /dev/rv"
+	depends on RV
+	default n
+	help
+	  Register a /dev/rv misc device that exposes an ioctl interface
+	  for RV monitor self-instrumentation. All RV monitors share the
+	  single device node; ioctl numbers encode the monitor identity.
+ + When enabled, user-space programs can open /dev/rv and use + monitor-specific ioctl commands to bracket code regions they + want the kernel RV subsystem to observe. + + Say Y here if you want to use the tlob self-instrumentation + ioctl interface; otherwise say N. diff --git a/kernel/trace/rv/Makefile b/kernel/trace/rv/Makefile index 750e4ad6f..cc3781a3b 100644 --- a/kernel/trace/rv/Makefile +++ b/kernel/trace/rv/Makefile @@ -3,6 +3,7 @@ ccflags-y += -I $(src) # needed for trace events obj-$(CONFIG_RV) += rv.o +obj-$(CONFIG_RV_CHARDEV) += rv_dev.o obj-$(CONFIG_RV_MON_WIP) += monitors/wip/wip.o obj-$(CONFIG_RV_MON_WWNR) += monitors/wwnr/wwnr.o obj-$(CONFIG_RV_MON_SCHED) += monitors/sched/sched.o @@ -17,6 +18,7 @@ obj-$(CONFIG_RV_MON_STS) += monitors/sts/sts.o obj-$(CONFIG_RV_MON_NRP) += monitors/nrp/nrp.o obj-$(CONFIG_RV_MON_SSSW) += monitors/sssw/sssw.o obj-$(CONFIG_RV_MON_OPID) += monitors/opid/opid.o +obj-$(CONFIG_RV_MON_TLOB) += monitors/tlob/tlob.o # Add new monitors here obj-$(CONFIG_RV_REACTORS) += rv_reactors.o obj-$(CONFIG_RV_REACT_PRINTK) += reactor_printk.o diff --git a/kernel/trace/rv/monitors/tlob/Kconfig b/kernel/trace/rv/monitors/tlob/Kconfig new file mode 100644 index 000000000..010237480 --- /dev/null +++ b/kernel/trace/rv/monitors/tlob/Kconfig @@ -0,0 +1,51 @@ +# SPDX-License-Identifier: GPL-2.0-only +# +config RV_MON_TLOB + depends on RV + depends on UPROBES + select DA_MON_EVENTS_ID + bool "tlob monitor" + help + Enable the tlob (task latency over budget) monitor. This monitor + tracks the elapsed time (CLOCK_MONOTONIC) of a marked code path within a + task (including both on-CPU and off-CPU time) and reports a + violation when the elapsed time exceeds a configurable budget + threshold. + + The monitor implements a three-state deterministic automaton. + States: unmonitored, on_cpu, off_cpu.
+ Key transitions: + unmonitored --(trace_start)--> on_cpu + on_cpu --(switch_out)--> off_cpu + off_cpu --(switch_in)--> on_cpu + on_cpu --(trace_stop)--> unmonitored + off_cpu --(trace_stop)--> unmonitored + on_cpu --(budget_expired)--> unmonitored + off_cpu --(budget_expired)--> unmonitored + + External configuration is done via the tracefs "monitor" file: + echo threshold_us:offset_start:offset_stop:binary_path > .../rv/monitors/tlob/monitor + echo -offset_start:binary_path > .../rv/monitors/tlob/monitor (remove binding) + cat .../rv/monitors/tlob/monitor (list bindings) + + The uprobe binding places two plain entry uprobes at offset_start and + offset_stop in the binary; these trigger tlob_start_task() and + tlob_stop_task() respectively. Using two entry uprobes (rather than a + uretprobe) means that a mistyped offset can never corrupt the call + stack; the worst outcome is a missed stop, which causes the hrtimer to + fire and report a budget violation. + + Violation events are delivered via a lock-free mmap ring buffer on + /dev/rv (enabled by CONFIG_RV_CHARDEV). The consumer mmap()s the + device, reads records from the data array using the head/tail indices + in the control page, and advances data_tail when done. + + For self-instrumentation, use TLOB_IOCTL_TRACE_START / + TLOB_IOCTL_TRACE_STOP via the /dev/rv misc device (enabled by + CONFIG_RV_CHARDEV). + + Up to TLOB_MAX_MONITORED tasks may be monitored simultaneously. + + For further information, see: + Documentation/trace/rv/monitor_tlob.rst + diff --git a/kernel/trace/rv/monitors/tlob/tlob.c b/kernel/trace/rv/monitors/tlob/tlob.c new file mode 100644 index 000000000..a6e474025 --- /dev/null +++ b/kernel/trace/rv/monitors/tlob/tlob.c @@ -0,0 +1,986 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * tlob: task latency over budget monitor + * + * Track the elapsed wall-clock time of a marked code path and detect when + * a monitored task exceeds its per-task latency budget.
CLOCK_MONOTONIC + * is used so both on-CPU and off-CPU time count toward the budget. + * + * Per-task state is maintained in a spinlock-protected hash table. A + * one-shot hrtimer fires at the deadline; if the task has not called + * trace_stop by then, a violation is recorded. + * + * Up to TLOB_MAX_MONITORED tasks may be tracked simultaneously. + * + * Copyright (C) 2026 Wen Yang + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* rv_interface_lock is defined in kernel/trace/rv/rv.c */ +extern struct mutex rv_interface_lock; + +#define MODULE_NAME "tlob" + +#include +#include + +#define RV_MON_TYPE RV_MON_PER_TASK +#include "tlob.h" +#include + +/* Hash table size; must be a power of two. */ +#define TLOB_HTABLE_BITS 6 +#define TLOB_HTABLE_SIZE (1 << TLOB_HTABLE_BITS) + +/* Maximum binary path length for uprobe binding. */ +#define TLOB_MAX_PATH 256 + +/* Per-task latency monitoring state. */ +struct tlob_task_state { + struct hlist_node hlist; + struct task_struct *task; + u64 threshold_us; + u64 tag; + struct hrtimer deadline_timer; + int canceled; /* protected by entry_lock */ + struct file *notify_file; /* NULL or held reference */ + + /* + * entry_lock serialises the mutable accounting fields below. + * Lock order: tlob_table_lock -> entry_lock (never reverse). + */ + raw_spinlock_t entry_lock; + u64 on_cpu_us; + u64 off_cpu_us; + ktime_t last_ts; + u32 switches; + u8 da_state; + + struct rcu_head rcu; /* for call_rcu() teardown */ +}; + +/* Per-uprobe-binding state: a start + stop probe pair for one binary region.
*/ +struct tlob_uprobe_binding { + struct list_head list; + u64 threshold_us; + struct path path; + char binpath[TLOB_MAX_PATH]; /* canonical path for read/remove */ + loff_t offset_start; + loff_t offset_stop; + struct uprobe_consumer entry_uc; + struct uprobe_consumer stop_uc; + struct uprobe *entry_uprobe; + struct uprobe *stop_uprobe; +}; + +/* Object pool for tlob_task_state. */ +static struct kmem_cache *tlob_state_cache; + +/* Hash table and lock protecting table structure (insert/delete/canceled). */ +static struct hlist_head tlob_htable[TLOB_HTABLE_SIZE]; +static DEFINE_RAW_SPINLOCK(tlob_table_lock); +static atomic_t tlob_num_monitored = ATOMIC_INIT(0); + +/* Uprobe binding list; protected by tlob_uprobe_mutex. */ +static LIST_HEAD(tlob_uprobe_list); +static DEFINE_MUTEX(tlob_uprobe_mutex); + +/* Forward declaration */ +static enum hrtimer_restart tlob_deadline_timer_fn(struct hrtimer *timer); + +/* Hash table helpers */ + +static unsigned int tlob_hash_task(const struct task_struct *task) +{ + return hash_ptr((void *)task, TLOB_HTABLE_BITS); +} + +/* + * tlob_find_rcu - look up per-task state. + * Must be called under rcu_read_lock() or with tlob_table_lock held. + */ +static struct tlob_task_state *tlob_find_rcu(struct task_struct *task) +{ + struct tlob_task_state *ws; + unsigned int h = tlob_hash_task(task); + + hlist_for_each_entry_rcu(ws, &tlob_htable[h], hlist, + lockdep_is_held(&tlob_table_lock)) + if (ws->task == task) + return ws; + return NULL; +} + +/* Allocate and initialise a new per-task state entry.
*/ +static struct tlob_task_state *tlob_alloc(struct task_struct *task, + u64 threshold_us, u64 tag) +{ + struct tlob_task_state *ws; + + ws = kmem_cache_zalloc(tlob_state_cache, GFP_ATOMIC); + if (!ws) + return NULL; + + ws->task = task; + get_task_struct(task); + ws->threshold_us = threshold_us; + ws->tag = tag; + ws->last_ts = ktime_get(); + ws->da_state = on_cpu_tlob; + raw_spin_lock_init(&ws->entry_lock); + hrtimer_setup(&ws->deadline_timer, tlob_deadline_timer_fn, + CLOCK_MONOTONIC, HRTIMER_MODE_REL); + return ws; +} + +/* RCU callback: free the slab once no readers remain. */ +static void tlob_free_rcu_slab(struct rcu_head *head) +{ + struct tlob_task_state *ws = + container_of(head, struct tlob_task_state, rcu); + kmem_cache_free(tlob_state_cache, ws); +} + +/* Arm the one-shot deadline timer for threshold_us microseconds. */ +static void tlob_arm_deadline(struct tlob_task_state *ws) +{ + hrtimer_start(&ws->deadline_timer, + ns_to_ktime(ws->threshold_us * NSEC_PER_USEC), + HRTIMER_MODE_REL); +} + +/* + * Push a violation record into a monitor fd's ring buffer (softirq context). + * Drop-new policy: discard incoming record when full. smp_store_release on + * data_head pairs with smp_load_acquire in the consumer. + */ +static void tlob_event_push(struct rv_file_priv *priv, + const struct tlob_event *info) +{ + struct tlob_ring *ring = &priv->ring; + unsigned long flags; + u32 head, tail; + + spin_lock_irqsave(&ring->lock, flags); + + head = ring->page->data_head; + tail = READ_ONCE(ring->page->data_tail); + + if (head - tail > ring->mask) { + /* Ring full: drop incoming record.
*/ + ring->page->dropped++; + spin_unlock_irqrestore(&ring->lock, flags); + return; + } + + ring->data[head & ring->mask] = *info; + /* pairs with smp_load_acquire() in the consumer */ + smp_store_release(&ring->page->data_head, head + 1); + + spin_unlock_irqrestore(&ring->lock, flags); + + wake_up_interruptible_poll(&priv->waitq, EPOLLIN | EPOLLRDNORM); +} + +#if IS_ENABLED(CONFIG_KUNIT) +void tlob_event_push_kunit(struct rv_file_priv *priv, + const struct tlob_event *info) +{ + tlob_event_push(priv, info); +} +EXPORT_SYMBOL_IF_KUNIT(tlob_event_push_kunit); +#endif /* CONFIG_KUNIT */ + +/* + * Budget exceeded: remove the entry, record the violation, and inject + * budget_expired into the DA. + * + * Lock order: tlob_table_lock -> entry_lock. tlob_stop_task() sets + * ws->canceled under both locks; if we see it here the stop path owns cleanup. + * fput/put_task_struct are done before call_rcu(); the RCU callback only + * reclaims the slab. + */ +static enum hrtimer_restart tlob_deadline_timer_fn(struct hrtimer *timer) +{ + struct tlob_task_state *ws = + container_of(timer, struct tlob_task_state, deadline_timer); + struct tlob_event info = {}; + struct file *notify_file; + struct task_struct *task; + unsigned long flags; + /* snapshots taken under entry_lock */ + u64 on_cpu_us, off_cpu_us, threshold_us, tag; + u32 switches; + bool on_cpu; + bool push_event = false; + + raw_spin_lock_irqsave(&tlob_table_lock, flags); + /* stop path sets canceled under both locks; if set it owns cleanup */ + if (ws->canceled) { + raw_spin_unlock_irqrestore(&tlob_table_lock, flags); + return HRTIMER_NORESTART; + } + + /* Finalize accounting and snapshot all fields under entry_lock.
*/ + raw_spin_lock(&ws->entry_lock); + + { + ktime_t now = ktime_get(); + u64 delta_us = ktime_to_us(ktime_sub(now, ws->last_ts)); + + if (ws->da_state == on_cpu_tlob) + ws->on_cpu_us += delta_us; + else + ws->off_cpu_us += delta_us; + } + + ws->canceled = 1; + on_cpu_us = ws->on_cpu_us; + off_cpu_us = ws->off_cpu_us; + threshold_us = ws->threshold_us; + tag = ws->tag; + switches = ws->switches; + on_cpu = (ws->da_state == on_cpu_tlob); + notify_file = ws->notify_file; + if (notify_file) { + info.tid = task_pid_vnr(ws->task); + info.threshold_us = threshold_us; + info.on_cpu_us = on_cpu_us; + info.off_cpu_us = off_cpu_us; + info.switches = switches; + info.state = on_cpu ? 1 : 0; + info.tag = tag; + push_event = true; + } + + raw_spin_unlock(&ws->entry_lock); + + hlist_del_rcu(&ws->hlist); + atomic_dec(&tlob_num_monitored); + /* + * Hold a reference so task remains valid across da_handle_event() + * after we drop tlob_table_lock. + */ + task = ws->task; + get_task_struct(task); + raw_spin_unlock_irqrestore(&tlob_table_lock, flags); + + /* + * Both locks are now released; ws is exclusively owned (removed from + * the hash table with canceled=1). Emit the tracepoint and push the + * violation record. + */ + trace_tlob_budget_exceeded(ws->task, threshold_us, on_cpu_us, + off_cpu_us, switches, on_cpu, tag); + + if (push_event) { + struct rv_file_priv *priv = notify_file->private_data; + + if (priv) + tlob_event_push(priv, &info); + } + + da_handle_event(task, budget_expired_tlob); + + if (notify_file) + fput(notify_file); /* ref from fget() at TRACE_START */ + put_task_struct(ws->task); /* ref from tlob_alloc() */ + put_task_struct(task); /* extra ref from get_task_struct() above */ + call_rcu(&ws->rcu, tlob_free_rcu_slab); + return HRTIMER_NORESTART; +} + +/* Tracepoint handlers */ + +/* + * handle_sched_switch - advance the DA and accumulate on/off-CPU time.
+ * + * RCU read-side for lock-free lookup; entry_lock for per-task accounting. + * da_handle_event() is called after rcu_read_unlock() to avoid holding the + * read-side critical section across the RV framework. + */ +static void handle_sched_switch(void *data, bool preempt, + struct task_struct *prev, + struct task_struct *next, + unsigned int prev_state) +{ + struct tlob_task_state *ws; + unsigned long flags; + bool do_prev = false, do_next = false; + ktime_t now; + + rcu_read_lock(); + + ws = tlob_find_rcu(prev); + if (ws) { + raw_spin_lock_irqsave(&ws->entry_lock, flags); + if (!ws->canceled) { + now = ktime_get(); + ws->on_cpu_us += ktime_to_us(ktime_sub(now, ws->last_ts)); + ws->last_ts = now; + ws->switches++; + ws->da_state = off_cpu_tlob; + do_prev = true; + } + raw_spin_unlock_irqrestore(&ws->entry_lock, flags); + } + + ws = tlob_find_rcu(next); + if (ws) { + raw_spin_lock_irqsave(&ws->entry_lock, flags); + if (!ws->canceled) { + now = ktime_get(); + ws->off_cpu_us += ktime_to_us(ktime_sub(now, ws->last_ts)); + ws->last_ts = now; + ws->da_state = on_cpu_tlob; + do_next = true; + } + raw_spin_unlock_irqrestore(&ws->entry_lock, flags); + } + + rcu_read_unlock(); + + if (do_prev) + da_handle_event(prev, switch_out_tlob); + if (do_next) + da_handle_event(next, switch_in_tlob); +} + +static void handle_sched_wakeup(void *data, struct task_struct *p) +{ + struct tlob_task_state *ws; + unsigned long flags; + bool found = false; + + rcu_read_lock(); + ws = tlob_find_rcu(p); + if (ws) { + raw_spin_lock_irqsave(&ws->entry_lock, flags); + found = !ws->canceled; + raw_spin_unlock_irqrestore(&ws->entry_lock, flags); + } + rcu_read_unlock(); + + if (found) + da_handle_event(p, sched_wakeup_tlob); +} + +/* ----------------------------------------------------------------------- + * Core start/stop helpers (also called from rv_dev.c) + * ----------------------------------------------------------------------- + */ + +/* + * __tlob_insert
- insert @ws into the hash table and arm its deadline timer. + * + * Re-checks for duplicates and capacity under tlob_table_lock; the caller + * may have done a lock-free pre-check before allocating @ws. On failure @ws + * is freed directly (never in table, so no call_rcu needed). + */ +static int __tlob_insert(struct task_struct *task, struct tlob_task_state *ws) +{ + unsigned int h; + unsigned long flags; + + raw_spin_lock_irqsave(&tlob_table_lock, flags); + if (tlob_find_rcu(task)) { + raw_spin_unlock_irqrestore(&tlob_table_lock, flags); + if (ws->notify_file) + fput(ws->notify_file); + put_task_struct(ws->task); + kmem_cache_free(tlob_state_cache, ws); + return -EEXIST; + } + if (atomic_read(&tlob_num_monitored) >= TLOB_MAX_MONITORED) { + raw_spin_unlock_irqrestore(&tlob_table_lock, flags); + if (ws->notify_file) + fput(ws->notify_file); + put_task_struct(ws->task); + kmem_cache_free(tlob_state_cache, ws); + return -ENOSPC; + } + h = tlob_hash_task(task); + hlist_add_head_rcu(&ws->hlist, &tlob_htable[h]); + atomic_inc(&tlob_num_monitored); + raw_spin_unlock_irqrestore(&tlob_table_lock, flags); + + da_handle_start_run_event(task, trace_start_tlob); + tlob_arm_deadline(ws); + return 0; +} + +/** + * tlob_start_task - begin monitoring @task with latency budget @threshold_us. + * + * @notify_file: /dev/rv fd whose ring buffer receives a tlob_event on + * violation; caller transfers the fget() reference to tlob.c. + * Pass NULL for synchronous mode (violations only via + * TRACE_STOP return value and the tlob_budget_exceeded event). + * + * Returns 0, -ENODEV, -ERANGE, -EEXIST, -ENOSPC, or -ENOMEM. On failure the + * caller retains responsibility for any @notify_file reference.
+ */ +int tlob_start_task(struct task_struct *task, u64 threshold_us, + struct file *notify_file, u64 tag) +{ + struct tlob_task_state *ws; + unsigned long flags; + + if (!tlob_state_cache) + return -ENODEV; + + if (threshold_us > (u64)KTIME_MAX / NSEC_PER_USEC) + return -ERANGE; + + /* Quick pre-check before allocation. */ + raw_spin_lock_irqsave(&tlob_table_lock, flags); + if (tlob_find_rcu(task)) { + raw_spin_unlock_irqrestore(&tlob_table_lock, flags); + return -EEXIST; + } + if (atomic_read(&tlob_num_monitored) >= TLOB_MAX_MONITORED) { + raw_spin_unlock_irqrestore(&tlob_table_lock, flags); + return -ENOSPC; + } + raw_spin_unlock_irqrestore(&tlob_table_lock, flags); + + ws = tlob_alloc(task, threshold_us, tag); + if (!ws) + return -ENOMEM; + + ws->notify_file = notify_file; + return __tlob_insert(task, ws); +} +EXPORT_SYMBOL_GPL(tlob_start_task); + +/** + * tlob_stop_task - stop monitoring @task before the deadline fires. + * + * Sets canceled under entry_lock (inside tlob_table_lock) before calling + * hrtimer_cancel(), racing safely with the timer callback. + * + * Returns 0 if within budget, -ESRCH if the entry is gone (deadline already + * fired, or TRACE_START was never called). + */ +int tlob_stop_task(struct task_struct *task) +{ + struct tlob_task_state *ws; + struct file *notify_file; + unsigned long flags; + + raw_spin_lock_irqsave(&tlob_table_lock, flags); + ws = tlob_find_rcu(task); + if (!ws) { + raw_spin_unlock_irqrestore(&tlob_table_lock, flags); + return -ESRCH; + } + + /* Prevent handle_sched_switch from updating accounting after removal.
*/ + raw_spin_lock(&ws->entry_lock); + ws->canceled = 1; + raw_spin_unlock(&ws->entry_lock); + + hlist_del_rcu(&ws->hlist); + atomic_dec(&tlob_num_monitored); + raw_spin_unlock_irqrestore(&tlob_table_lock, flags); + + hrtimer_cancel(&ws->deadline_timer); + + da_handle_event(task, trace_stop_tlob); + + notify_file = ws->notify_file; + if (notify_file) + fput(notify_file); + put_task_struct(ws->task); + call_rcu(&ws->rcu, tlob_free_rcu_slab); + + return 0; +} +EXPORT_SYMBOL_GPL(tlob_stop_task); + +/* Stop monitoring all tracked tasks; called on monitor disable. */ +static void tlob_stop_all(void) +{ + struct tlob_task_state *batch[TLOB_MAX_MONITORED]; + struct tlob_task_state *ws; + struct hlist_node *tmp; + unsigned long flags; + int n = 0, i; + + raw_spin_lock_irqsave(&tlob_table_lock, flags); + for (i = 0; i < TLOB_HTABLE_SIZE; i++) { + hlist_for_each_entry_safe(ws, tmp, &tlob_htable[i], hlist) { + raw_spin_lock(&ws->entry_lock); + ws->canceled = 1; + raw_spin_unlock(&ws->entry_lock); + hlist_del_rcu(&ws->hlist); + atomic_dec(&tlob_num_monitored); + if (n < TLOB_MAX_MONITORED) + batch[n++] = ws; + } + } + raw_spin_unlock_irqrestore(&tlob_table_lock, flags); + + for (i = 0; i < n; i++) { + ws = batch[i]; + hrtimer_cancel(&ws->deadline_timer); + da_handle_event(ws->task, trace_stop_tlob); + if (ws->notify_file) + fput(ws->notify_file); + put_task_struct(ws->task); + call_rcu(&ws->rcu, tlob_free_rcu_slab); + } +} + +/* uprobe binding helpers */ + +static int tlob_uprobe_entry_handler(struct uprobe_consumer *uc, + struct pt_regs *regs, __u64 *data) +{ + struct tlob_uprobe_binding *b = + container_of(uc, struct tlob_uprobe_binding, entry_uc); + + tlob_start_task(current, b->threshold_us, NULL, (u64)b->offset_start); + return 0; +} + +static int tlob_uprobe_stop_handler(struct uprobe_consumer *uc, + struct pt_regs *regs, __u64 *data) +{ + tlob_stop_task(current); + return 0; +} + +/* + * Register start + stop entry uprobes for a binding.
+ * Both are plain entry uprobes (no uretprobe), so a wrong offset never + * corrupts the call stack; the worst outcome is a missed stop (hrtimer + * fires and reports a budget violation). + * Called with tlob_uprobe_mutex held. + */ +static int tlob_add_uprobe(u64 threshold_us, const char *binpath, + loff_t offset_start, loff_t offset_stop) +{ + struct tlob_uprobe_binding *b, *tmp_b; + char pathbuf[TLOB_MAX_PATH]; + struct inode *inode; + char *canon; + int ret; + + b = kzalloc(sizeof(*b), GFP_KERNEL); + if (!b) + return -ENOMEM; + + if (binpath[0] != '/') { + kfree(b); + return -EINVAL; + } + + b->threshold_us = threshold_us; + b->offset_start = offset_start; + b->offset_stop = offset_stop; + + ret = kern_path(binpath, LOOKUP_FOLLOW, &b->path); + if (ret) + goto err_free; + + if (!d_is_reg(b->path.dentry)) { + ret = -EINVAL; + goto err_path; + } + + /* Reject duplicate start offset for the same binary. */ + list_for_each_entry(tmp_b, &tlob_uprobe_list, list) { + if (tmp_b->offset_start == offset_start && + tmp_b->path.dentry == b->path.dentry) { + ret = -EEXIST; + goto err_path; + } + } + + /* Store canonical path for read-back and removal matching.
*/ + canon = d_path(&b->path, pathbuf, sizeof(pathbuf)); + if (IS_ERR(canon)) { + ret = PTR_ERR(canon); + goto err_path; + } + strscpy(b->binpath, canon, sizeof(b->binpath)); + + b->entry_uc.handler = tlob_uprobe_entry_handler; + b->stop_uc.handler = tlob_uprobe_stop_handler; + + inode = d_real_inode(b->path.dentry); + + b->entry_uprobe = uprobe_register(inode, offset_start, 0, &b->entry_uc); + if (IS_ERR(b->entry_uprobe)) { + ret = PTR_ERR(b->entry_uprobe); + b->entry_uprobe = NULL; + goto err_path; + } + + b->stop_uprobe = uprobe_register(inode, offset_stop, 0, &b->stop_uc); + if (IS_ERR(b->stop_uprobe)) { + ret = PTR_ERR(b->stop_uprobe); + b->stop_uprobe = NULL; + goto err_entry; + } + + list_add_tail(&b->list, &tlob_uprobe_list); + return 0; + +err_entry: + uprobe_unregister_nosync(b->entry_uprobe, &b->entry_uc); + uprobe_unregister_sync(); +err_path: + path_put(&b->path); +err_free: + kfree(b); + return ret; +} + +/* + * Remove the uprobe binding for (offset_start, binpath). + * binpath is resolved to a dentry for comparison so symlinks are handled + * correctly. Called with tlob_uprobe_mutex held. + */ +static void tlob_remove_uprobe_by_key(loff_t offset_start, const char *binpath) +{ + struct tlob_uprobe_binding *b, *tmp; + struct path remove_path; + + if (kern_path(binpath, LOOKUP_FOLLOW, &remove_path)) + return; + + list_for_each_entry_safe(b, tmp, &tlob_uprobe_list, list) { + if (b->offset_start != offset_start) + continue; + if (b->path.dentry != remove_path.dentry) + continue; + uprobe_unregister_nosync(b->entry_uprobe, &b->entry_uc); + uprobe_unregister_nosync(b->stop_uprobe, &b->stop_uc); + list_del(&b->list); + uprobe_unregister_sync(); + path_put(&b->path); + kfree(b); + break; + } + + path_put(&remove_path); +} + +/* Unregister all uprobe bindings; called from disable_tlob().
*/ +static void tlob_remove_all_uprobes(void) +{ + struct tlob_uprobe_binding *b, *tmp; + + mutex_lock(&tlob_uprobe_mutex); + list_for_each_entry_safe(b, tmp, &tlob_uprobe_list, list) { + uprobe_unregister_nosync(b->entry_uprobe, &b->entry_uc); + uprobe_unregister_nosync(b->stop_uprobe, &b->stop_uc); + list_del(&b->list); + path_put(&b->path); + kfree(b); + } + mutex_unlock(&tlob_uprobe_mutex); + uprobe_unregister_sync(); +} + +/* + * tracefs "monitor" file + * + * Read: one "threshold_us:0xoffset_start:0xoffset_stop:binary_path\n" + * line per registered uprobe binding. + * Write: "threshold_us:offset_start:offset_stop:binary_path" - add uprobe binding + * "-offset_start:binary_path" - remove uprobe binding + */ + +static ssize_t tlob_monitor_read(struct file *file, + char __user *ubuf, + size_t count, loff_t *ppos) +{ + /* threshold(20) + 2 offsets(2*18) + path(256) + delimiters */ + const int line_sz = TLOB_MAX_PATH + 72; + struct tlob_uprobe_binding *b; + char *buf, *p; + int n = 0, buf_sz, pos = 0; + ssize_t ret; + + mutex_lock(&tlob_uprobe_mutex); + list_for_each_entry(b, &tlob_uprobe_list, list) + n++; + mutex_unlock(&tlob_uprobe_mutex); + + buf_sz = (n ? n : 1) * line_sz + 1; + buf = kmalloc(buf_sz, GFP_KERNEL); + if (!buf) + return -ENOMEM; + + mutex_lock(&tlob_uprobe_mutex); + list_for_each_entry(b, &tlob_uprobe_list, list) { + p = b->binpath; + pos += scnprintf(buf + pos, buf_sz - pos, + "%llu:0x%llx:0x%llx:%s\n", + b->threshold_us, + (unsigned long long)b->offset_start, + (unsigned long long)b->offset_stop, + p); + } + mutex_unlock(&tlob_uprobe_mutex); + + ret = simple_read_from_buffer(ubuf, count, ppos, buf, pos); + kfree(buf); + return ret; +} + +/* + * Parse "threshold_us:offset_start:offset_stop:binary_path". + * binary_path comes last so it may freely contain ':'. + * Returns 0 on success.
+ */ +VISIBLE_IF_KUNIT int tlob_parse_uprobe_line(char *buf, u64 *thr_out, + char **path_out, + loff_t *start_out, loff_t *stop_out) +{ + unsigned long long thr; + long long start, stop; + int n = 0; + + /* + * %llu : decimal-only (microseconds) + * %lli : auto-base, accepts 0x-prefixed hex for offsets + * %n : records the byte offset of the first path character + */ + if (sscanf(buf, "%llu:%lli:%lli:%n", &thr, &start, &stop, &n) != 3) + return -EINVAL; + if (thr == 0 || n == 0 || buf[n] == '\0') + return -EINVAL; + if (start < 0 || stop < 0) + return -EINVAL; + + *thr_out = thr; + *start_out = start; + *stop_out = stop; + *path_out = buf + n; + return 0; +} + +static ssize_t tlob_monitor_write(struct file *file, + const char __user *ubuf, + size_t count, loff_t *ppos) +{ + char buf[TLOB_MAX_PATH + 64]; + loff_t offset_start, offset_stop; + u64 threshold_us; + char *binpath; + int ret; + + if (count >= sizeof(buf)) + return -EINVAL; + if (copy_from_user(buf, ubuf, count)) + return -EFAULT; + buf[count] = '\0'; + + if (count > 0 && buf[count - 1] == '\n') + buf[count - 1] = '\0'; + + /* Remove request: "-offset_start:binary_path" */ + if (buf[0] == '-') { + long long off; + int n = 0; + + if (sscanf(buf + 1, "%lli:%n", &off, &n) != 1 || n == 0) + return -EINVAL; + binpath = buf + 1 + n; + if (binpath[0] != '/') + return -EINVAL; + + mutex_lock(&tlob_uprobe_mutex); + tlob_remove_uprobe_by_key((loff_t)off, binpath); + mutex_unlock(&tlob_uprobe_mutex); + + return (ssize_t)count; + } + + /* + * Uprobe binding: "threshold_us:offset_start:offset_stop:binary_path" + * binpath points into buf at the start of the path field. + */ + ret = tlob_parse_uprobe_line(buf, &threshold_us, + &binpath, &offset_start, &offset_stop); + if (ret) + return ret; + + mutex_lock(&tlob_uprobe_mutex); + ret = tlob_add_uprobe(threshold_us, binpath, offset_start, offset_stop); + mutex_unlock(&tlob_uprobe_mutex); + return ret ?
ret : (ssize_t)count; +} + +static const struct file_operations tlob_monitor_fops = { + .open = simple_open, + .read = tlob_monitor_read, + .write = tlob_monitor_write, + .llseek = noop_llseek, +}; + +/* + * __tlob_init_monitor / __tlob_destroy_monitor - called with rv_interface_lock + * held (required by da_monitor_init/destroy via rv_get/put_task_monitor_slot). + */ +static int __tlob_init_monitor(void) +{ + int i, retval; + + tlob_state_cache = kmem_cache_create("tlob_task_state", + sizeof(struct tlob_task_state), + 0, 0, NULL); + if (!tlob_state_cache) + return -ENOMEM; + + for (i = 0; i < TLOB_HTABLE_SIZE; i++) + INIT_HLIST_HEAD(&tlob_htable[i]); + atomic_set(&tlob_num_monitored, 0); + + retval = da_monitor_init(); + if (retval) { + kmem_cache_destroy(tlob_state_cache); + tlob_state_cache = NULL; + return retval; + } + + rv_this.enabled = 1; + return 0; +} + +static void __tlob_destroy_monitor(void) +{ + rv_this.enabled = 0; + tlob_stop_all(); + tlob_remove_all_uprobes(); + /* + * Drain pending call_rcu() callbacks from tlob_stop_all() before + * destroying the kmem_cache. + */ + synchronize_rcu(); + da_monitor_destroy(); + kmem_cache_destroy(tlob_state_cache); + tlob_state_cache = NULL; +} + +/* + * tlob_init_monitor / tlob_destroy_monitor - KUnit wrappers that acquire + * rv_interface_lock, satisfying the lockdep_assert_held() inside + * rv_get/put_task_monitor_slot().
+ */ +VISIBLE_IF_KUNIT int tlob_init_monitor(void) +{ + int ret; + + mutex_lock(&rv_interface_lock); + ret = __tlob_init_monitor(); + mutex_unlock(&rv_interface_lock); + return ret; +} +EXPORT_SYMBOL_IF_KUNIT(tlob_init_monitor); + +VISIBLE_IF_KUNIT void tlob_destroy_monitor(void) +{ + mutex_lock(&rv_interface_lock); + __tlob_destroy_monitor(); + mutex_unlock(&rv_interface_lock); +} +EXPORT_SYMBOL_IF_KUNIT(tlob_destroy_monitor); + +VISIBLE_IF_KUNIT int tlob_enable_hooks(void) +{ + rv_attach_trace_probe("tlob", sched_switch, handle_sched_switch); + rv_attach_trace_probe("tlob", sched_wakeup, handle_sched_wakeup); + return 0; +} +EXPORT_SYMBOL_IF_KUNIT(tlob_enable_hooks); + +VISIBLE_IF_KUNIT void tlob_disable_hooks(void) +{ + rv_detach_trace_probe("tlob", sched_switch, handle_sched_switch); + rv_detach_trace_probe("tlob", sched_wakeup, handle_sched_wakeup); +} +EXPORT_SYMBOL_IF_KUNIT(tlob_disable_hooks); + +/* + * enable_tlob / disable_tlob - called by rv_enable/disable_monitor() which + * already holds rv_interface_lock; call the __ variants directly.
+ */ +static int enable_tlob(void) +{ + int retval; + + retval = __tlob_init_monitor(); + if (retval) + return retval; + + return tlob_enable_hooks(); +} + +static void disable_tlob(void) +{ + tlob_disable_hooks(); + __tlob_destroy_monitor(); +} + +static struct rv_monitor rv_this = { + .name = "tlob", + .description = "Per-task latency-over-budget monitor.", + .enable = enable_tlob, + .disable = disable_tlob, + .reset = da_monitor_reset_all, + .enabled = 0, +}; + +static int __init register_tlob(void) +{ + int ret; + + ret = rv_register_monitor(&rv_this, NULL); + if (ret) + return ret; + + if (rv_this.root_d) { + tracefs_create_file("monitor", 0644, rv_this.root_d, NULL, + &tlob_monitor_fops); + } + + return 0; +} + +static void __exit unregister_tlob(void) +{ + rv_unregister_monitor(&rv_this); +} + +module_init(register_tlob); +module_exit(unregister_tlob); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Wen Yang "); +MODULE_DESCRIPTION("tlob: task latency over budget per-task monitor."); diff --git a/kernel/trace/rv/monitors/tlob/tlob.h b/kernel/trace/rv/monitors/tlob/tlob.h new file mode 100644 index 000000000..3438a6175 --- /dev/null +++ b/kernel/trace/rv/monitors/tlob/tlob.h @@ -0,0 +1,145 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _RV_TLOB_H +#define _RV_TLOB_H + +/* + * C representation of the tlob automaton, generated from tlob.dot via rvgen + * and extended with tlob_start_task()/tlob_stop_task() declarations.
+ * For the format description see Documentation/trace/rv/deterministic_automata.rst + */ + +#include +#include + +#define MONITOR_NAME tlob + +enum states_tlob { + unmonitored_tlob, + on_cpu_tlob, + off_cpu_tlob, + state_max_tlob, +}; + +#define INVALID_STATE state_max_tlob + +enum events_tlob { + trace_start_tlob, + switch_in_tlob, + switch_out_tlob, + sched_wakeup_tlob, + trace_stop_tlob, + budget_expired_tlob, + event_max_tlob, +}; + +struct automaton_tlob { + char *state_names[state_max_tlob]; + char *event_names[event_max_tlob]; + unsigned char function[state_max_tlob][event_max_tlob]; + unsigned char initial_state; + bool final_states[state_max_tlob]; +}; + +static const struct automaton_tlob automaton_tlob = { + .state_names = { + "unmonitored", + "on_cpu", + "off_cpu", + }, + .event_names = { + "trace_start", + "switch_in", + "switch_out", + "sched_wakeup", + "trace_stop", + "budget_expired", + }, + .function = { + /* unmonitored */ + { + on_cpu_tlob, /* trace_start */ + unmonitored_tlob, /* switch_in */ + unmonitored_tlob, /* switch_out */ + unmonitored_tlob, /* sched_wakeup */ + INVALID_STATE, /* trace_stop */ + INVALID_STATE, /* budget_expired */ + }, + /* on_cpu */ + { + INVALID_STATE, /* trace_start */ + INVALID_STATE, /* switch_in */ + off_cpu_tlob, /* switch_out */ + on_cpu_tlob, /* sched_wakeup */ + unmonitored_tlob, /* trace_stop */ + unmonitored_tlob, /* budget_expired */ + }, + /* off_cpu */ + { + INVALID_STATE, /* trace_start */ + on_cpu_tlob, /* switch_in */ + off_cpu_tlob, /* switch_out */ + off_cpu_tlob, /* sched_wakeup */ + unmonitored_tlob, /* trace_stop */ + unmonitored_tlob, /* budget_expired */ + }, + }, + /* + * final_states: unmonitored is the sole accepting state. + * Violations are recorded via tlob_event_push() and tlob_budget_exceeded.
+ */ + .initial_state = unmonitored_tlob, + .final_states = { 1, 0, 0 }, +}; + +/* Exported for use by the RV ioctl layer (rv_dev.c) */ +int tlob_start_task(struct task_struct *task, u64 threshold_us, + struct file *notify_file, u64 tag); +int tlob_stop_task(struct task_struct *task); + +/* Maximum number of concurrently monitored tasks (also used by KUnit). */ +#define TLOB_MAX_MONITORED 64U + +/* + * Ring buffer constants (also published in UAPI for mmap size calculation). + */ +#define TLOB_RING_DEFAULT_CAP 64U /* records allocated at open() */ +#define TLOB_RING_MIN_CAP 8U /* minimum accepted by mmap() */ +#define TLOB_RING_MAX_CAP 4096U /* maximum accepted by mmap() */ + +/** + * struct tlob_ring - per-fd mmap-capable violation ring buffer. + * + * Allocated as a contiguous page range at rv_open() time: + * page 0: struct tlob_mmap_page (shared with userspace) + * pages 1-N: struct tlob_event[capacity] + */ +struct tlob_ring { + struct tlob_mmap_page *page; + struct tlob_event *data; + u32 mask; + spinlock_t lock; + unsigned long base; + unsigned int order; +}; + +/** + * struct rv_file_priv - per-fd private data for /dev/rv.
+ */ +struct rv_file_priv { + struct tlob_ring ring; + wait_queue_head_t waitq; +}; + +#if IS_ENABLED(CONFIG_KUNIT) +int tlob_init_monitor(void); +void tlob_destroy_monitor(void); +int tlob_enable_hooks(void); +void tlob_disable_hooks(void); +void tlob_event_push_kunit(struct rv_file_priv *priv, + const struct tlob_event *info); +int tlob_parse_uprobe_line(char *buf, u64 *thr_out, + char **path_out, + loff_t *start_out, loff_t *stop_out); +#endif /* CONFIG_KUNIT */ + +#endif /* _RV_TLOB_H */ diff --git a/kernel/trace/rv/monitors/tlob/tlob_trace.h b/kernel/trace/rv/monitors/tlob/tlob_trace.h new file mode 100644 index 000000000..b08d67776 --- /dev/null +++ b/kernel/trace/rv/monitors/tlob/tlob_trace.h @@ -0,0 +1,42 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +/* + * Snippet to be included in rv_trace.h + */ + +#ifdef CONFIG_RV_MON_TLOB +/* + * tlob uses the generic event_da_monitor_id and error_da_monitor_id event + * classes so that both event classes are instantiated. This avoids a + * -Werror=unused-variable warning that the compiler emits when a + * DECLARE_EVENT_CLASS has no corresponding DEFINE_EVENT instance. + * + * The event_tlob tracepoint is defined here but the call-site in + * da_handle_event() is overridden with a no-op macro below so that no + * trace record is emitted on every scheduler context switch. Budget + * violations are reported via the dedicated tlob_budget_exceeded event. + * + * error_tlob IS kept active so that invalid DA transitions (programming + * errors) are still visible in the ftrace ring buffer for debugging.
+ */ +DEFINE_EVENT(event_da_monitor_id, event_tlob, + TP_PROTO(int id, char *state, char *event, char *next_state, + bool final_state), + TP_ARGS(id, state, event, next_state, final_state)); + +DEFINE_EVENT(error_da_monitor_id, error_tlob, + TP_PROTO(int id, char *state, char *event), + TP_ARGS(id, state, event)); + +/* + * Override the trace_event_tlob() call-site with a no-op after the + * DEFINE_EVENT above has satisfied the event class instantiation + * requirement. The tracepoint symbol itself exists (and can be enabled + * via tracefs) but the automatic call from da_handle_event() is silenced + * to avoid per-context-switch ftrace noise during normal operation. + */ +#undef trace_event_tlob +#define trace_event_tlob(id, state, event, next_state, final_state) \ + do { (void)(id); (void)(state); (void)(event); \ + (void)(next_state); (void)(final_state); } while (0) +#endif /* CONFIG_RV_MON_TLOB */ diff --git a/kernel/trace/rv/rv.c b/kernel/trace/rv/rv.c index ee4e68102..e754e76d5 100644 --- a/kernel/trace/rv/rv.c +++ b/kernel/trace/rv/rv.c @@ -148,6 +148,10 @@ #include #endif +#ifdef CONFIG_RV_MON_TLOB +EXPORT_TRACEPOINT_SYMBOL_GPL(tlob_budget_exceeded); +#endif + #include "rv.h" DEFINE_MUTEX(rv_interface_lock); diff --git a/kernel/trace/rv/rv_dev.c b/kernel/trace/rv/rv_dev.c new file mode 100644 index 000000000..a052f3203 --- /dev/null +++ b/kernel/trace/rv/rv_dev.c @@ -0,0 +1,602 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * rv_dev.c - /dev/rv misc device for RV monitor self-instrumentation + * + * A single misc device (MISC_DYNAMIC_MINOR) serves all RV monitors. + * ioctl numbers encode the monitor identity: + * + * 0x01 - 0x1F tlob (task latency over budget) + * 0x20 - 0x3F reserved + * + * Each monitor exports tlob_start_task() / tlob_stop_task() which are + * called here. The calling task is identified by current.
+ * + * Magic: RV_IOC_MAGIC (0xB9), defined in include/uapi/linux/rv.h + * + * Per-fd private data (rv_file_priv) + * ------------------------------------ + * Every open() of /dev/rv allocates an rv_file_priv (defined in tlob.h). + * When TLOB_IOCTL_TRACE_START is called with args.notify_fd >= 0, violations + * are pushed as tlob_event records into that fd's per-fd ring buffer (tlob_ring) + * and its poll/epoll waitqueue is woken. + * + * Consumers drain records with read() on the notify_fd; read() blocks until + * at least one record is available (unless O_NONBLOCK is set). + * + * Per-thread "started" tracking (tlob_task_handle) + * ------------------------------------------------- + * tlob_stop_task() returns -ESRCH in two distinct situations: + * + * (a) The deadline timer already fired and removed the tlob hash-table + * entry before TRACE_STOP arrived -> budget was exceeded -> -EOVERFLOW + * + * (b) TRACE_START was never called for this thread -> programming error + * -> -ESRCH + * + * To distinguish them, rv_dev.c maintains a lightweight hash table + * (tlob_handles) that records a tlob_task_handle for every task_struct * + * for which a successful TLOB_IOCTL_TRACE_START has been + * issued but the corresponding TLOB_IOCTL_TRACE_STOP has not yet arrived. + * + * tlob_task_handle is a thin "session ticket" -- it carries only the + * task pointer and the owning file descriptor. The heavy per-task state + * (hrtimer, DA state, threshold) lives in tlob_task_state inside tlob.c. + * + * The table is keyed on task_struct * (same key as tlob.c), protected + * by tlob_handles_lock (spinlock, irq-safe). No get_task_struct() + * refcount is needed here because tlob.c already holds a reference for + * each live entry. + * + * Multiple threads may share the same fd. Each thread has its own + * tlob_task_handle in the table, so concurrent TRACE_START / TRACE_STOP + * calls from different threads do not interfere.
+ * + * The fd release path (rv_release) calls tlob_stop_task() for every + * handle in tlob_handles that belongs to the closing fd, ensuring cleanup + * even if the user forgets to call TRACE_STOP. + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#ifdef CONFIG_RV_MON_TLOB +#include "monitors/tlob/tlob.h" +#endif + +/* ----------------------------------------------------------------------- + * tlob_task_handle - per-thread session ticket for the ioctl interface + * + * One handle is allocated by TLOB_IOCTL_TRACE_START and freed by + * TLOB_IOCTL_TRACE_STOP (or by rv_release if the fd is closed). + * + * @hlist: Hash-table linkage in tlob_handles (keyed on task pointer). + * @task: The monitored thread. Plain pointer; no refcount held here + * because tlob.c holds one for the lifetime of the monitoring + * window, which encompasses the lifetime of this handle. + * @file: The /dev/rv file descriptor that issued TRACE_START. + * Used by rv_release() to sweep orphaned handles on close(). + * ----------------------------------------------------------------------- + */ +#define TLOB_HANDLES_BITS 5 +#define TLOB_HANDLES_SIZE (1 << TLOB_HANDLES_BITS) + +struct tlob_task_handle { + struct hlist_node hlist; + struct task_struct *task; + struct file *file; +}; + +static struct hlist_head tlob_handles[TLOB_HANDLES_SIZE]; +static DEFINE_SPINLOCK(tlob_handles_lock); + +static unsigned int tlob_handle_hash(const struct task_struct *task) +{ + return hash_ptr((void *)task, TLOB_HANDLES_BITS); +} + +/* Must be called with tlob_handles_lock held. 
*/ +static struct tlob_task_handle * +tlob_handle_find_locked(struct task_struct *task) +{ + struct tlob_task_handle *h; + unsigned int slot = tlob_handle_hash(task); + + hlist_for_each_entry(h, &tlob_handles[slot], hlist) { + if (h->task == task) + return h; + } + return NULL; +} + +/* + * tlob_handle_alloc - record that @task has an active monitoring session + * opened via @file. + * + * Returns 0 on success, -EEXIST if @task already has a handle (double + * TRACE_START without TRACE_STOP), -ENOMEM on allocation failure. + */ +static int tlob_handle_alloc(struct task_struct *task, struct file *file) +{ + struct tlob_task_handle *h; + unsigned long flags; + unsigned int slot; + + h = kmalloc(sizeof(*h), GFP_KERNEL); + if (!h) + return -ENOMEM; + h->task = task; + h->file = file; + + spin_lock_irqsave(&tlob_handles_lock, flags); + if (tlob_handle_find_locked(task)) { + spin_unlock_irqrestore(&tlob_handles_lock, flags); + kfree(h); + return -EEXIST; + } + slot = tlob_handle_hash(task); + hlist_add_head(&h->hlist, &tlob_handles[slot]); + spin_unlock_irqrestore(&tlob_handles_lock, flags); + return 0; +} + +/* + * tlob_handle_free - remove the handle for @task and free it. + * + * Returns 1 if a handle existed (TRACE_START was called), 0 if not found + * (TRACE_START was never called for this thread). + */ +static int tlob_handle_free(struct task_struct *task) +{ + struct tlob_task_handle *h; + unsigned long flags; + + spin_lock_irqsave(&tlob_handles_lock, flags); + h = tlob_handle_find_locked(task); + if (h) { + hlist_del_init(&h->hlist); + spin_unlock_irqrestore(&tlob_handles_lock, flags); + kfree(h); + return 1; + } + spin_unlock_irqrestore(&tlob_handles_lock, flags); + return 0; +} + +/* + * tlob_handle_sweep_file - release all handles owned by @file. + * + * Called from rv_release() when the fd is closed without TRACE_STOP.
+ * Calls tlob_stop_task() for each orphaned handle to drain the tlob + * monitoring entries and prevent resource leaks in tlob.c. + * + * Handles are collected under the lock (short critical section), then + * processed outside it (tlob_stop_task() may sleep/spin internally). + */ +#ifdef CONFIG_RV_MON_TLOB +static void tlob_handle_sweep_file(struct file *file) +{ + struct tlob_task_handle *batch[TLOB_HANDLES_SIZE]; + struct tlob_task_handle *h; + struct hlist_node *tmp; + unsigned long flags; + int i, n = 0; + + spin_lock_irqsave(&tlob_handles_lock, flags); + for (i = 0; i < TLOB_HANDLES_SIZE; i++) { + hlist_for_each_entry_safe(h, tmp, &tlob_handles[i], hlist) { + if (h->file == file) { + hlist_del_init(&h->hlist); + batch[n++] = h; + } + } + } + spin_unlock_irqrestore(&tlob_handles_lock, flags); + + for (i = 0; i < n; i++) { + /* + * Ignore -ESRCH: the deadline timer may have already fired + * and cleaned up the tlob entry. + */ + tlob_stop_task(batch[i]->task); + kfree(batch[i]); + } +} +#else +static inline void tlob_handle_sweep_file(struct file *file) {} +#endif /* CONFIG_RV_MON_TLOB */ + +/* ----------------------------------------------------------------------- + * Ring buffer lifecycle + * ----------------------------------------------------------------------- + */ + +/* + * tlob_ring_alloc - allocate a ring of @cap records (must be a power of 2). + * + * Allocates a physically contiguous block of pages: + * page 0 : struct tlob_mmap_page (control page, shared with userspace) + * pages 1..N : struct tlob_event[cap] (data pages) + * + * Each page is marked reserved so it can be mapped to userspace via mmap().
+ */ +static int tlob_ring_alloc(struct tlob_ring *ring, u32 cap) +{ + unsigned int total = PAGE_SIZE + cap * sizeof(struct tlob_event); + unsigned int order = get_order(total); + unsigned long base; + unsigned int i; + + base = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order); + if (!base) + return -ENOMEM; + + for (i = 0; i < (1u << order); i++) + SetPageReserved(virt_to_page((void *)(base + i * PAGE_SIZE))); + + ring->base = base; + ring->order = order; + ring->page = (struct tlob_mmap_page *)base; + ring->data = (struct tlob_event *)(base + PAGE_SIZE); + ring->mask = cap - 1; + spin_lock_init(&ring->lock); + + ring->page->capacity = cap; + ring->page->version = 1; + ring->page->data_offset = PAGE_SIZE; + ring->page->record_size = sizeof(struct tlob_event); + return 0; +} + +static void tlob_ring_free(struct tlob_ring *ring) +{ + unsigned int i; + + if (!ring->base) + return; + + for (i = 0; i < (1u << ring->order); i++) + ClearPageReserved(virt_to_page((void *)(ring->base + i * PAGE_SIZE))); + + free_pages(ring->base, ring->order); + ring->base = 0; + ring->page = NULL; + ring->data = NULL; +} + +/* ----------------------------------------------------------------------- + * File operations + * ----------------------------------------------------------------------- + */ + +static int rv_open(struct inode *inode, struct file *file) +{ + struct rv_file_priv *priv; + int ret; + + priv = kzalloc(sizeof(*priv), GFP_KERNEL); + if (!priv) + return -ENOMEM; + + ret = tlob_ring_alloc(&priv->ring, TLOB_RING_DEFAULT_CAP); + if (ret) { + kfree(priv); + return ret; + } + + init_waitqueue_head(&priv->waitq); + file->private_data = priv; + return 0; +} + +static int rv_release(struct inode *inode, struct file *file) +{ + struct rv_file_priv *priv = file->private_data; + + tlob_handle_sweep_file(file); + tlob_ring_free(&priv->ring); + kfree(priv); + file->private_data = NULL; + return 0; +} + +static __poll_t rv_poll(struct file
*file, poll_table *wait) +{ + struct rv_file_priv *priv = file->private_data; + + if (!priv) + return EPOLLERR; + + poll_wait(file, &priv->waitq, wait); + + /* + * Pairs with smp_store_release(&ring->page->data_head, ...) in + * tlob_event_push(). No lock needed: head is written by the kernel + * producer and read here; tail is written by the consumer and we only + * need an approximate check for the poll fast path. + */ + if (smp_load_acquire(&priv->ring.page->data_head) != + READ_ONCE(priv->ring.page->data_tail)) + return EPOLLIN | EPOLLRDNORM; + + return 0; +} + +/* + * rv_read - consume tlob_event violation records from this fd's ring buffer. + * + * Each read() returns a whole number of struct tlob_event records. @count must + * be at least sizeof(struct tlob_event); partial-record sizes are rejected with + * -EINVAL. + * + * Blocking behaviour follows O_NONBLOCK on the fd: + * O_NONBLOCK clear: blocks until at least one record is available. + * O_NONBLOCK set: returns -EAGAIN immediately if the ring is empty. + * + * Returns the number of bytes copied (always a multiple of sizeof tlob_event), + * -EAGAIN if non-blocking and empty, or a negative error code. + * + * read() and mmap() share the same ring and data_tail cursor; do not use + * both simultaneously on the same fd. + */ +static ssize_t rv_read(struct file *file, char __user *buf, size_t count, + loff_t *ppos) +{ + struct rv_file_priv *priv = file->private_data; + struct tlob_ring *ring; + size_t rec = sizeof(struct tlob_event); + unsigned long irqflags; + ssize_t done = 0; + int ret; + + if (!priv) + return -ENODEV; + + ring = &priv->ring; + + if (count < rec) + return -EINVAL; + + /* Blocking path: sleep until the producer advances data_head.
*/ + if (!(file->f_flags & O_NONBLOCK)) { + ret = wait_event_interruptible(priv->waitq, + /* pairs with smp_store_release() in the producer */ + smp_load_acquire(&ring->page->data_head) != + READ_ONCE(ring->page->data_tail)); + if (ret) + return ret; + } + + /* + * Drain records into the caller's buffer. ring->lock serialises + * concurrent read() callers and the softirq producer. + */ + while (done + rec <= count) { + struct tlob_event record; + u32 head, tail; + + spin_lock_irqsave(&ring->lock, irqflags); + /* pairs with smp_store_release() in the producer */ + head = smp_load_acquire(&ring->page->data_head); + tail = ring->page->data_tail; + if (head == tail) { + spin_unlock_irqrestore(&ring->lock, irqflags); + break; + } + record = ring->data[tail & ring->mask]; + WRITE_ONCE(ring->page->data_tail, tail + 1); + spin_unlock_irqrestore(&ring->lock, irqflags); + + if (copy_to_user(buf + done, &record, rec)) + return done ? done : -EFAULT; + done += rec; + } + + return done ? done : -EAGAIN; +} + +/* + * rv_mmap - map the per-fd violation ring buffer into userspace. + * + * The mmap region covers the full ring allocation: + * + * offset 0 : struct tlob_mmap_page (control page) + * offset PAGE_SIZE : struct tlob_event[capacity] (data pages) + * + * The caller must map exactly PAGE_SIZE + capacity * sizeof(struct tlob_event) + * bytes starting at offset 0 (vm_pgoff must be 0). The actual capacity is + * read from tlob_mmap_page.capacity after a successful mmap(2). + * + * Private mappings (MAP_PRIVATE) are rejected: the shared data_tail field + * written by userspace must be visible to the kernel producer.
+ */ +static int rv_mmap(struct file *file, struct vm_area_struct *vma) +{ + struct rv_file_priv *priv = file->private_data; + struct tlob_ring *ring; + unsigned long size = vma->vm_end - vma->vm_start; + unsigned long ring_size; + + if (!priv) + return -ENODEV; + + ring = &priv->ring; + + if (vma->vm_pgoff != 0) + return -EINVAL; + + ring_size = PAGE_ALIGN(PAGE_SIZE + ((unsigned long)(ring->mask + 1) * + sizeof(struct tlob_event))); + if (size != ring_size) + return -EINVAL; + + if (!(vma->vm_flags & VM_SHARED)) + return -EINVAL; + + return remap_pfn_range(vma, vma->vm_start, + page_to_pfn(virt_to_page((void *)ring->base)), + ring_size, vma->vm_page_prot); +} + +/* ----------------------------------------------------------------------- + * ioctl dispatcher + * ----------------------------------------------------------------------- + */ + +static long rv_ioctl(struct file *file, unsigned int cmd, unsigned long arg) +{ + unsigned int nr = _IOC_NR(cmd); + + /* + * Verify the magic byte so we don't accidentally handle ioctls + * intended for a different device. + */ + if (_IOC_TYPE(cmd) != RV_IOC_MAGIC) + return -ENOTTY; + +#ifdef CONFIG_RV_MON_TLOB + /* tlob: ioctl numbers 0x01 - 0x1F */ + switch (cmd) { + case TLOB_IOCTL_TRACE_START: { + struct tlob_start_args args; + struct file *notify_file = NULL; + int ret, hret; + + if (copy_from_user(&args, + (struct tlob_start_args __user *)arg, + sizeof(args))) + return -EFAULT; + if (args.threshold_us == 0) + return -EINVAL; + if (args.flags != 0) + return -EINVAL; + + /* + * If notify_fd >= 0, resolve it to a file pointer. + * fget() bumps the reference count; tlob.c drops it + * via fput() when the monitoring window ends. + * Reject non-/dev/rv fds to prevent type confusion.
+ */ + if (args.notify_fd >= 0) { + notify_file = fget(args.notify_fd); + if (!notify_file) + return -EBADF; + if (notify_file->f_op != file->f_op) { + fput(notify_file); + return -EINVAL; + } + } + + ret = tlob_start_task(current, args.threshold_us, + notify_file, args.tag); + if (ret != 0) { + /* tlob.c did not take ownership; drop ref. */ + if (notify_file) + fput(notify_file); + return ret; + } + + /* + * Record session handle. Free any stale handle left by + * a previous window whose deadline timer fired (timer + * removes tlob_task_state but cannot touch tlob_handles). + */ + tlob_handle_free(current); + hret = tlob_handle_alloc(current, file); + if (hret < 0) { + tlob_stop_task(current); + return hret; + } + return 0; + } + case TLOB_IOCTL_TRACE_STOP: { + int had_handle; + int ret; + + /* + * Atomically remove the session handle for current. + * + * had_handle == 0: TRACE_START was never called for + * this thread -> caller bug -> -ESRCH + * + * had_handle == 1: TRACE_START was called. If + * tlob_stop_task() now returns + * -ESRCH, the deadline timer already + * fired -> budget exceeded -> -EOVERFLOW + */ + had_handle = tlob_handle_free(current); + if (!had_handle) + return -ESRCH; + + ret = tlob_stop_task(current); + return (ret == -ESRCH) ? -EOVERFLOW : ret; + } + default: + break; + } +#endif /* CONFIG_RV_MON_TLOB */ + + return -ENOTTY; +} + +/* ----------------------------------------------------------------------- + * Module init / exit + * ----------------------------------------------------------------------- + */ + +static const struct file_operations rv_fops = { + .owner = THIS_MODULE, + .open = rv_open, + .release = rv_release, + .read = rv_read, + .poll = rv_poll, + .mmap = rv_mmap, + .unlocked_ioctl = rv_ioctl, +#ifdef CONFIG_COMPAT + .compat_ioctl = rv_ioctl, +#endif + .llseek = noop_llseek, +}; + +/* + * 0666: /dev/rv is a self-instrumentation device.
All ioctls operate + * exclusively on the calling task (current); no task can monitor another + * via this interface. Opening the device does not grant any privilege + * beyond observing one's own latency, so world-read/write is appropriate. + */ +static struct miscdevice rv_miscdev = { + .minor = MISC_DYNAMIC_MINOR, + .name = "rv", + .fops = &rv_fops, + .mode = 0666, +}; + +static int __init rv_ioctl_init(void) +{ + int i; + + for (i = 0; i < TLOB_HANDLES_SIZE; i++) + INIT_HLIST_HEAD(&tlob_handles[i]); + + return misc_register(&rv_miscdev); +} + +static void __exit rv_ioctl_exit(void) +{ + misc_deregister(&rv_miscdev); +} + +module_init(rv_ioctl_init); +module_exit(rv_ioctl_exit); + +MODULE_LICENSE("GPL"); +MODULE_DESCRIPTION("RV ioctl interface via /dev/rv"); diff --git a/kernel/trace/rv/rv_trace.h b/kernel/trace/rv/rv_trace.h index 4a6faddac..65d6c6485 100644 --- a/kernel/trace/rv/rv_trace.h +++ b/kernel/trace/rv/rv_trace.h @@ -126,6 +126,7 @@ DECLARE_EVENT_CLASS(error_da_monitor_id, #include #include #include +#include // Add new monitors based on CONFIG_DA_MON_EVENTS_ID here #endif /* CONFIG_DA_MON_EVENTS_ID */ @@ -202,6 +203,55 @@ TRACE_EVENT(rv_retries_error, __get_str(event), __get_str(name)) ); #endif /* CONFIG_RV_MON_MAINTENANCE_EVENTS */ + +#ifdef CONFIG_RV_MON_TLOB +/* + * tlob_budget_exceeded - emitted when a monitored task exceeds its latency + * budget. Carries the on-CPU / off-CPU time breakdown so that the cause + * of the overrun (CPU-bound vs. scheduling/I/O latency) is immediately + * visible in the ftrace ring buffer without post-processing.
+ */ +TRACE_EVENT(tlob_budget_exceeded, + + TP_PROTO(struct task_struct *task, u64 threshold_us, + u64 on_cpu_us, u64 off_cpu_us, u32 switches, + bool state_is_on_cpu, u64 tag), + + TP_ARGS(task, threshold_us, on_cpu_us, off_cpu_us, switches, + state_is_on_cpu, tag), + + TP_STRUCT__entry( + __string(comm, task->comm) + __field(pid_t, pid) + __field(u64, threshold_us) + __field(u64, on_cpu_us) + __field(u64, off_cpu_us) + __field(u32, switches) + __field(bool, state_is_on_cpu) + __field(u64, tag) + ), + + TP_fast_assign( + __assign_str(comm); + __entry->pid = task->pid; + __entry->threshold_us = threshold_us; + __entry->on_cpu_us = on_cpu_us; + __entry->off_cpu_us = off_cpu_us; + __entry->switches = switches; + __entry->state_is_on_cpu = state_is_on_cpu; + __entry->tag = tag; + ), + + TP_printk("%s[%d]: budget exceeded threshold=%llu on_cpu=%llu off_cpu=%llu switches=%u state=%s tag=0x%016llx", + __get_str(comm), __entry->pid, + __entry->threshold_us, + __entry->on_cpu_us, __entry->off_cpu_us, + __entry->switches, + __entry->state_is_on_cpu ?
"on_cpu" : "off_cpu", + __entry->tag) +); +#endif /* CONFIG_RV_MON_TLOB */ + #endif /* _TRACE_RV_H */ =20 /* This part must be outside protection */ --=20 2.43.0 From nobody Mon Jun 15 18:00:20 2026 Received: from out-189.mta0.migadu.com (out-189.mta0.migadu.com [91.218.175.189]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CD01537C939 for ; Sun, 12 Apr 2026 19:28:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.189 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776022084; cv=none; b=IeXwKD7VbhUxDUTdgccdkoFrAJrfTrgWP6FVLd4BErRa9cyfaBDgV0X5cT9ZJLcUyR5+ycw3khycdlmdzIKTTJMOQKm3vNY9XvJRPVsUILuexr+iy6lySOSwTsE0dHOalYhJ4e8/hKH7tuomEMVGvL3hjtdm2tPa3N4O5QgBykY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776022084; c=relaxed/simple; bh=Fjx0dV5xDZ5GzQLflUiNI5/dnhTVUD1FIyWuUCtC5WU=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=ocaJcr4CWlLiXAqLi4JUt1g8LexC619VjAzD6EgVea4a7yp760WHamejYdT3ReqPe9E32IH33/qGwAuDmVBaceBMTaLNp60NzPz3vQZHb/i4XZS2rLYRBR0yu+1pArl6zSkj73Z3LfkXi8lxBHfAl3wgP8Tm0HzxshbtSrj9Y1s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=AcxLXgUz; arc=none smtp.client-ip=91.218.175.189 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="AcxLXgUz" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1776022080; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=4CAvpA12spp2N4nX5FHvWLkRpiYSyaWQFQ+2pMpoNA8=; b=AcxLXgUzPFc4H+eIfehF49B4UATIxiFDJscufzFnKt8rB10jNchMt3YZjQuC46fWbjIR+d mqPI3z3wKDucY6sRmXjcZqh0OW6jjJcekPIJAsuejLy6rWONnrWLF7MFLhyEqgUuUSstak Cf336YCvXbpkUVOMR1EM6ds8OynO4wo= From: wen.yang@linux.dev To: Steven Rostedt , Gabriele Monaco , Masami Hiramatsu , Mathieu Desnoyers Cc: linux-trace-kernel@vger.kernel.org, linux-kernel@vger.kernel.org, Wen Yang Subject: [RFC PATCH 3/4] rv/tlob: Add KUnit tests for the tlob monitor Date: Mon, 13 Apr 2026 03:27:20 +0800 Message-Id: <0a7f41ff8cb13f8601920ead2979db2ee5f2d442.1776020428.git.wen.yang@linux.dev> In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Migadu-Flow: FLOW_OUT Content-Type: text/plain; charset="utf-8" From: Wen Yang Add six KUnit test suites gated behind CONFIG_TLOB_KUNIT_TEST (depends on RV_MON_TLOB && KUNIT; default KUNIT_ALL_TESTS). A .kunitconfig fragment is provided for the kunit.py runner. Coverage: automaton state transitions and self-loops; start/stop API error paths (duplicate start, missing start, overflow threshold, table-full, immediate deadline); scheduler context-switch accounting for on/off-CPU time; violation tracepoint payload fields; ring buffer push, drop-new overflow, and wakeup; and the uprobe line parser. 
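For illustration only (not part of this patch): the approach taken by the automaton suite can be approximated in a standalone userspace C sketch. It assumes the transition table from tlob.h in patch 2/4. The names tlob_step() and tlob_sketch_selfcheck(), and the upper-case enumerators, are hypothetical and exist only in this example. Like the KUnit helper da_handle_event_tlob(), tlob_step() leaves the state unchanged when the table yields the invalid state.

```c
#include <assert.h>

/* Hypothetical userspace mirrors of enum states_tlob / enum events_tlob. */
enum state { UNMONITORED, ON_CPU, OFF_CPU, STATE_MAX };
enum event { TRACE_START, SWITCH_IN, SWITCH_OUT, SCHED_WAKEUP,
	     TRACE_STOP, BUDGET_EXPIRED, EVENT_MAX };

#define INVALID STATE_MAX

/* Transition table copied from automaton_tlob.function in tlob.h. */
static const unsigned char table[STATE_MAX][EVENT_MAX] = {
	[UNMONITORED] = { ON_CPU, UNMONITORED, UNMONITORED, UNMONITORED,
			  INVALID, INVALID },
	[ON_CPU]      = { INVALID, INVALID, OFF_CPU, ON_CPU,
			  UNMONITORED, UNMONITORED },
	[OFF_CPU]     = { INVALID, ON_CPU, OFF_CPU, OFF_CPU,
			  UNMONITORED, UNMONITORED },
};

/* Apply one event; an invalid transition keeps the current state. */
static enum state tlob_step(enum state cur, enum event ev)
{
	enum state next = (enum state)table[cur][ev];

	return next == INVALID ? cur : next;
}

/* Walk start -> switch_out -> switch_in -> stop; 0 on success, -1 on mismatch. */
static int tlob_sketch_selfcheck(void)
{
	enum state s = UNMONITORED;

	s = tlob_step(s, TRACE_START);
	if (s != ON_CPU)
		return -1;
	s = tlob_step(s, SWITCH_OUT);
	if (s != OFF_CPU)
		return -1;
	s = tlob_step(s, SWITCH_IN);
	if (s != ON_CPU)
		return -1;
	s = tlob_step(s, TRACE_STOP);
	if (s != UNMONITORED)
		return -1;
	return 0;
}
```

Calling tlob_sketch_selfcheck() from a main() exercises the same start, switch-out, switch-in, stop sequence as tlob_full_happy_path in the suite; the real tests operate on struct da_monitor inside the kernel rather than on a bare enum.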
Signed-off-by: Wen Yang --- kernel/trace/rv/Makefile | 1 + kernel/trace/rv/monitors/tlob/.kunitconfig | 5 + kernel/trace/rv/monitors/tlob/Kconfig | 12 + kernel/trace/rv/monitors/tlob/tlob.c | 1 + kernel/trace/rv/monitors/tlob/tlob_kunit.c | 1194 ++++++++++++++++++++ 5 files changed, 1213 insertions(+) create mode 100644 kernel/trace/rv/monitors/tlob/.kunitconfig create mode 100644 kernel/trace/rv/monitors/tlob/tlob_kunit.c diff --git a/kernel/trace/rv/Makefile b/kernel/trace/rv/Makefile index cc3781a3b..6d963207d 100644 --- a/kernel/trace/rv/Makefile +++ b/kernel/trace/rv/Makefile @@ -19,6 +19,7 @@ obj-$(CONFIG_RV_MON_NRP) += monitors/nrp/nrp.o obj-$(CONFIG_RV_MON_SSSW) += monitors/sssw/sssw.o obj-$(CONFIG_RV_MON_OPID) += monitors/opid/opid.o obj-$(CONFIG_RV_MON_TLOB) += monitors/tlob/tlob.o +obj-$(CONFIG_TLOB_KUNIT_TEST) += monitors/tlob/tlob_kunit.o # Add new monitors here obj-$(CONFIG_RV_REACTORS) += rv_reactors.o obj-$(CONFIG_RV_REACT_PRINTK) += reactor_printk.o diff --git a/kernel/trace/rv/monitors/tlob/.kunitconfig b/kernel/trace/rv/monitors/tlob/.kunitconfig new file mode 100644 index 000000000..977c58601 --- /dev/null +++ b/kernel/trace/rv/monitors/tlob/.kunitconfig @@ -0,0 +1,5 @@ +CONFIG_FTRACE=y +CONFIG_KUNIT=y +CONFIG_RV=y +CONFIG_RV_MON_TLOB=y +CONFIG_TLOB_KUNIT_TEST=y diff --git a/kernel/trace/rv/monitors/tlob/Kconfig b/kernel/trace/rv/monitors/tlob/Kconfig index 010237480..4ccd2f881 100644 --- a/kernel/trace/rv/monitors/tlob/Kconfig +++ b/kernel/trace/rv/monitors/tlob/Kconfig @@ -49,3 +49,15 @@ config RV_MON_TLOB For further information, see: Documentation/trace/rv/monitor_tlob.rst +config TLOB_KUNIT_TEST + tristate "KUnit tests for tlob monitor" if !KUNIT_ALL_TESTS + depends on RV_MON_TLOB && KUNIT + default KUNIT_ALL_TESTS + help + Enable KUnit in-kernel unit tests for the tlob RV monitor.
+ + Tests cover automaton state transitions, the hash table helpers, + the start/stop task interface, and the event ring buffer including + overflow handling and wakeup behaviour. + + Say Y or M here to run the tlob KUnit test suite; otherwise say N. diff --git a/kernel/trace/rv/monitors/tlob/tlob.c b/kernel/trace/rv/monitors/tlob/tlob.c index a6e474025..dd959eb9b 100644 --- a/kernel/trace/rv/monitors/tlob/tlob.c +++ b/kernel/trace/rv/monitors/tlob/tlob.c @@ -784,6 +784,7 @@ VISIBLE_IF_KUNIT int tlob_parse_uprobe_line(char *buf, u64 *thr_out, *path_out = buf + n; return 0; } +EXPORT_SYMBOL_IF_KUNIT(tlob_parse_uprobe_line); static ssize_t tlob_monitor_write(struct file *file, const char __user *ubuf, diff --git a/kernel/trace/rv/monitors/tlob/tlob_kunit.c b/kernel/trace/rv/monitors/tlob/tlob_kunit.c new file mode 100644 index 000000000..64f5abb34 --- /dev/null +++ b/kernel/trace/rv/monitors/tlob/tlob_kunit.c @@ -0,0 +1,1194 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * KUnit tests for the tlob RV monitor. + * + * tlob_automaton: DA transition table coverage. + * tlob_task_api: tlob_start_task()/tlob_stop_task() lifecycle and errors. + * tlob_sched_integration: on/off-CPU accounting across real context switches. + * tlob_trace_output: tlob_budget_exceeded tracepoint field verification. + * tlob_event_buf: ring buffer push, overflow, and wakeup. + * tlob_parse_uprobe: uprobe format string parser acceptance and rejection. + * + * The duplicate-(binary, offset_start) constraint enforced by tlob_add_uprobe() + * is not covered here: that function calls kern_path() and requires a real + * filesystem, which is outside the scope of unit tests. It is covered by the + * uprobe_duplicate_offset case in tools/testing/selftests/rv/test_tlob.sh.
+ */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * Pull in the rv tracepoint declarations so that + * register_trace_tlob_budget_exceeded() is available. + * No CREATE_TRACE_POINTS here -- the tracepoint implementation lives in rv.c. + */ +#include + +#include "tlob.h" + +/* + * da_handle_event_tlob - apply one automaton transition on @da_mon. + * + * This helper is used only by the KUnit automaton suite. It applies the + * tlob transition table directly on a supplied da_monitor without touching + * per-task slots, tracepoints, or timers. + */ +static void da_handle_event_tlob(struct da_monitor *da_mon, + enum events_tlob event) +{ + enum states_tlob curr_state = (enum states_tlob)da_mon->curr_state; + enum states_tlob next_state = + (enum states_tlob)automaton_tlob.function[curr_state][event]; + + if (next_state != INVALID_STATE) + da_mon->curr_state = next_state; +} + +MODULE_IMPORT_NS("EXPORTED_FOR_KUNIT_TESTING"); + +/* + * Suite 1: automaton state-machine transitions + */ + +/* unmonitored -> trace_start -> on_cpu */ +static void tlob_unmonitored_to_on_cpu(struct kunit *test) +{ + struct da_monitor mon = { .curr_state = unmonitored_tlob }; + + da_handle_event_tlob(&mon, trace_start_tlob); + KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)on_cpu_tlob); +} + +/* on_cpu -> switch_out -> off_cpu */ +static void tlob_on_cpu_switch_out(struct kunit *test) +{ + struct da_monitor mon = { .curr_state = on_cpu_tlob }; + + da_handle_event_tlob(&mon, switch_out_tlob); + KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)off_cpu_tlob); +} + +/* off_cpu -> switch_in -> on_cpu */ +static void tlob_off_cpu_switch_in(struct kunit *test) +{ + struct da_monitor mon = { .curr_state = off_cpu_tlob }; + + da_handle_event_tlob(&mon, switch_in_tlob); + KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)on_cpu_tlob); +} + +/* on_cpu -> budget_expired -> unmonitored */ +static void
+tlob_on_cpu_budget_expired(struct kunit *test)
+{
+	struct da_monitor mon = { .curr_state = on_cpu_tlob };
+
+	da_handle_event_tlob(&mon, budget_expired_tlob);
+	KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)unmonitored_tlob);
+}
+
+/* off_cpu -> budget_expired -> unmonitored */
+static void tlob_off_cpu_budget_expired(struct kunit *test)
+{
+	struct da_monitor mon = { .curr_state = off_cpu_tlob };
+
+	da_handle_event_tlob(&mon, budget_expired_tlob);
+	KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)unmonitored_tlob);
+}
+
+/* on_cpu -> trace_stop -> unmonitored */
+static void tlob_on_cpu_trace_stop(struct kunit *test)
+{
+	struct da_monitor mon = { .curr_state = on_cpu_tlob };
+
+	da_handle_event_tlob(&mon, trace_stop_tlob);
+	KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)unmonitored_tlob);
+}
+
+/* off_cpu -> trace_stop -> unmonitored */
+static void tlob_off_cpu_trace_stop(struct kunit *test)
+{
+	struct da_monitor mon = { .curr_state = off_cpu_tlob };
+
+	da_handle_event_tlob(&mon, trace_stop_tlob);
+	KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)unmonitored_tlob);
+}
+
+/* budget_expired -> unmonitored; a single trace_start re-enters on_cpu. */
+static void tlob_violation_then_restart(struct kunit *test)
+{
+	struct da_monitor mon = { .curr_state = unmonitored_tlob };
+
+	da_handle_event_tlob(&mon, trace_start_tlob);
+	KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)on_cpu_tlob);
+
+	da_handle_event_tlob(&mon, budget_expired_tlob);
+	KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)unmonitored_tlob);
+
+	/* Single trace_start is sufficient to re-enter on_cpu */
+	da_handle_event_tlob(&mon, trace_start_tlob);
+	KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)on_cpu_tlob);
+
+	da_handle_event_tlob(&mon, trace_stop_tlob);
+	KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)unmonitored_tlob);
+}
+
+/* off_cpu self-loops on switch_out and sched_wakeup.
+ */
+static void tlob_off_cpu_self_loops(struct kunit *test)
+{
+	static const enum events_tlob events[] = {
+		switch_out_tlob, sched_wakeup_tlob,
+	};
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(events); i++) {
+		struct da_monitor mon = { .curr_state = off_cpu_tlob };
+
+		da_handle_event_tlob(&mon, events[i]);
+		KUNIT_EXPECT_EQ_MSG(test, (int)mon.curr_state,
+				    (int)off_cpu_tlob,
+				    "event %u should self-loop in off_cpu",
+				    events[i]);
+	}
+}
+
+/* on_cpu self-loops on sched_wakeup. */
+static void tlob_on_cpu_self_loops(struct kunit *test)
+{
+	struct da_monitor mon = { .curr_state = on_cpu_tlob };
+
+	da_handle_event_tlob(&mon, sched_wakeup_tlob);
+	KUNIT_EXPECT_EQ_MSG(test, (int)mon.curr_state, (int)on_cpu_tlob,
+			    "sched_wakeup should self-loop in on_cpu");
+}
+
+/* Scheduling events in unmonitored self-loop (no state change). */
+static void tlob_unmonitored_ignores_sched(struct kunit *test)
+{
+	static const enum events_tlob events[] = {
+		switch_in_tlob, switch_out_tlob, sched_wakeup_tlob,
+	};
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(events); i++) {
+		struct da_monitor mon = { .curr_state = unmonitored_tlob };
+
+		da_handle_event_tlob(&mon, events[i]);
+		KUNIT_EXPECT_EQ_MSG(test, (int)mon.curr_state,
+				    (int)unmonitored_tlob,
+				    "event %u should self-loop in unmonitored",
+				    events[i]);
+	}
+}
+
+static void tlob_full_happy_path(struct kunit *test)
+{
+	struct da_monitor mon = { .curr_state = unmonitored_tlob };
+
+	da_handle_event_tlob(&mon, trace_start_tlob);
+	KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)on_cpu_tlob);
+
+	da_handle_event_tlob(&mon, switch_out_tlob);
+	KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)off_cpu_tlob);
+
+	da_handle_event_tlob(&mon, switch_in_tlob);
+	KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)on_cpu_tlob);
+
+	da_handle_event_tlob(&mon, trace_stop_tlob);
+	KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)unmonitored_tlob);
+}
+
+static void tlob_multiple_switches(struct kunit *test)
+{
+	struct da_monitor mon = { .curr_state = unmonitored_tlob };
+	int i;
+
+	da_handle_event_tlob(&mon, trace_start_tlob);
+	KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)on_cpu_tlob);
+
+	for (i = 0; i < 3; i++) {
+		da_handle_event_tlob(&mon, switch_out_tlob);
+		KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)off_cpu_tlob);
+		da_handle_event_tlob(&mon, switch_in_tlob);
+		KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)on_cpu_tlob);
+	}
+
+	da_handle_event_tlob(&mon, trace_stop_tlob);
+	KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)unmonitored_tlob);
+}
+
+static struct kunit_case tlob_automaton_cases[] = {
+	KUNIT_CASE(tlob_unmonitored_to_on_cpu),
+	KUNIT_CASE(tlob_on_cpu_switch_out),
+	KUNIT_CASE(tlob_off_cpu_switch_in),
+	KUNIT_CASE(tlob_on_cpu_budget_expired),
+	KUNIT_CASE(tlob_off_cpu_budget_expired),
+	KUNIT_CASE(tlob_on_cpu_trace_stop),
+	KUNIT_CASE(tlob_off_cpu_trace_stop),
+	KUNIT_CASE(tlob_off_cpu_self_loops),
+	KUNIT_CASE(tlob_on_cpu_self_loops),
+	KUNIT_CASE(tlob_unmonitored_ignores_sched),
+	KUNIT_CASE(tlob_full_happy_path),
+	KUNIT_CASE(tlob_violation_then_restart),
+	KUNIT_CASE(tlob_multiple_switches),
+	{}
+};
+
+static struct kunit_suite tlob_automaton_suite = {
+	.name       = "tlob_automaton",
+	.test_cases = tlob_automaton_cases,
+};
+
+/*
+ * Suite 2: task registration API
+ */
+
+/* Basic start/stop cycle */
+static void tlob_start_stop_ok(struct kunit *test)
+{
+	int ret;
+
+	ret = tlob_start_task(current, 10000000 /* 10 s, won't fire */, NULL, 0);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+	KUNIT_EXPECT_EQ(test, tlob_stop_task(current), 0);
+}
+
+/* Double start must return -EEXIST. */
+static void tlob_double_start(struct kunit *test)
+{
+	KUNIT_ASSERT_EQ(test, tlob_start_task(current, 10000000, NULL, 0), 0);
+	KUNIT_EXPECT_EQ(test, tlob_start_task(current, 10000000, NULL, 0), -EEXIST);
+	tlob_stop_task(current);
+}
+
+/* Stop without start must return -ESRCH.
+ */
+static void tlob_stop_without_start(struct kunit *test)
+{
+	tlob_stop_task(current); /* clear any stale entry first */
+	KUNIT_EXPECT_EQ(test, tlob_stop_task(current), -ESRCH);
+}
+
+/*
+ * A 1 us budget fires before tlob_stop_task() is called. Either the
+ * timer wins (-ESRCH) or we are very fast (0); both are valid.
+ */
+static void tlob_immediate_deadline(struct kunit *test)
+{
+	int ret = tlob_start_task(current, 1 /* 1 us - fires almost immediately */, NULL, 0);
+
+	KUNIT_ASSERT_EQ(test, ret, 0);
+	/* Let the 1 us timer fire */
+	udelay(100);
+	/*
+	 * By now the hrtimer has almost certainly fired. Either it has
+	 * (returns -ESRCH) or we were very fast (returns 0). Both are
+	 * acceptable; just ensure no crash and the table is clean after.
+	 */
+	ret = tlob_stop_task(current);
+	KUNIT_EXPECT_TRUE(test, ret == 0 || ret == -ESRCH);
+}
+
+/*
+ * Fill the table to TLOB_MAX_MONITORED using kthreads (each needs a
+ * distinct task_struct), then verify the next start returns -ENOSPC.
+ */
+struct tlob_waiter_ctx {
+	struct completion start;
+	struct completion done;
+};
+
+static int tlob_waiter_fn(void *arg)
+{
+	struct tlob_waiter_ctx *ctx = arg;
+
+	wait_for_completion(&ctx->start);
+	complete(&ctx->done);
+	return 0;
+}
+
+static void tlob_enospc(struct kunit *test)
+{
+	struct tlob_waiter_ctx *ctxs;
+	struct task_struct **threads;
+	int i, ret;
+
+	ctxs = kunit_kcalloc(test, TLOB_MAX_MONITORED,
+			     sizeof(*ctxs), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_NULL(test, ctxs);
+
+	threads = kunit_kcalloc(test, TLOB_MAX_MONITORED,
+				sizeof(*threads), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_NULL(test, threads);
+
+	/* Start TLOB_MAX_MONITORED kthreads and monitor each */
+	for (i = 0; i < TLOB_MAX_MONITORED; i++) {
+		init_completion(&ctxs[i].start);
+		init_completion(&ctxs[i].done);
+
+		threads[i] = kthread_run(tlob_waiter_fn, &ctxs[i],
+					 "tlob_waiter_%d", i);
+		if (IS_ERR(threads[i])) {
+			KUNIT_FAIL(test, "kthread_run failed at i=%d", i);
+			threads[i] = NULL;
+			goto cleanup;
+		}
+		get_task_struct(threads[i]);
+
+		ret = tlob_start_task(threads[i], 10000000, NULL, 0);
+		if (ret != 0) {
+			KUNIT_FAIL(test, "tlob_start_task failed at i=%d: %d",
+				   i, ret);
+			/* ref is dropped in cleanup; a put here would double-put */
+			complete(&ctxs[i].start);
+			goto cleanup;
+		}
+	}
+
+	/* The table is now full: one more must fail with -ENOSPC */
+	ret = tlob_start_task(current, 10000000, NULL, 0);
+	KUNIT_EXPECT_EQ(test, ret, -ENOSPC);
+
+cleanup:
+	/*
+	 * Two-pass cleanup: cancel tlob monitoring and unblock kthreads first,
+	 * then kthread_stop() to wait for full exit before releasing refs.
+	 */
+	for (i = 0; i < TLOB_MAX_MONITORED; i++) {
+		if (!threads[i])
+			break;
+		tlob_stop_task(threads[i]);
+		complete(&ctxs[i].start);
+	}
+	for (i = 0; i < TLOB_MAX_MONITORED; i++) {
+		if (!threads[i])
+			break;
+		kthread_stop(threads[i]);
+		put_task_struct(threads[i]);
+	}
+}
+
+/*
+ * A kthread holds a mutex for 80 ms; arm a 10 ms budget, burn ~1 ms
+ * on-CPU, then block on the mutex.
+ * The timer fires off-CPU; stop
+ * must return -ESRCH.
+ */
+struct tlob_holder_ctx {
+	struct mutex lock;
+	struct completion ready;
+	unsigned int hold_ms;
+};
+
+static int tlob_holder_fn(void *arg)
+{
+	struct tlob_holder_ctx *ctx = arg;
+
+	mutex_lock(&ctx->lock);
+	complete(&ctx->ready);
+	msleep(ctx->hold_ms);
+	mutex_unlock(&ctx->lock);
+	return 0;
+}
+
+static void tlob_deadline_fires_off_cpu(struct kunit *test)
+{
+	struct tlob_holder_ctx ctx = { .hold_ms = 80 };
+	struct task_struct *holder;
+	ktime_t t0;
+	int ret;
+
+	mutex_init(&ctx.lock);
+	init_completion(&ctx.ready);
+
+	holder = kthread_run(tlob_holder_fn, &ctx, "tlob_holder_kunit");
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, holder);
+	wait_for_completion(&ctx.ready);
+
+	/* Arm 10 ms budget while kthread holds the mutex. */
+	ret = tlob_start_task(current, 10000, NULL, 0);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	/* Phase 1: burn ~1 ms on-CPU to exercise on_cpu accounting. */
+	t0 = ktime_get();
+	while (ktime_us_delta(ktime_get(), t0) < 1000)
+		cpu_relax();
+
+	/*
+	 * Phase 2: block on the mutex -> on_cpu->off_cpu transition.
+	 * The 10 ms budget fires while we are off-CPU.
+	 */
+	mutex_lock(&ctx.lock);
+	mutex_unlock(&ctx.lock);
+
+	/* Timer already fired and removed the entry -> -ESRCH */
+	KUNIT_EXPECT_EQ(test, tlob_stop_task(current), -ESRCH);
+}
+
+/* Arm a 1 ms budget and busy-spin for 50 ms; timer fires on-CPU. */
+static void tlob_deadline_fires_on_cpu(struct kunit *test)
+{
+	ktime_t t0;
+	int ret;
+
+	ret = tlob_start_task(current, 1000 /* 1 ms */, NULL, 0);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	/* Busy-spin 50 ms - 50x the budget */
+	t0 = ktime_get();
+	while (ktime_us_delta(ktime_get(), t0) < 50000)
+		cpu_relax();
+
+	/* Timer fired during the spin; entry is gone */
+	KUNIT_EXPECT_EQ(test, tlob_stop_task(current), -ESRCH);
+}
+
+/*
+ * Start three tasks, call tlob_destroy_monitor() + tlob_init_monitor(),
+ * and verify the table is empty afterwards.
+ */
+static int tlob_dummy_fn(void *arg)
+{
+	wait_for_completion((struct completion *)arg);
+	return 0;
+}
+
+static void tlob_stop_all_cleanup(struct kunit *test)
+{
+	struct completion done1, done2;
+	struct task_struct *t1, *t2;
+	int ret;
+
+	init_completion(&done1);
+	init_completion(&done2);
+
+	t1 = kthread_run(tlob_dummy_fn, &done1, "tlob_dummy1");
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, t1);
+	get_task_struct(t1);
+
+	t2 = kthread_run(tlob_dummy_fn, &done2, "tlob_dummy2");
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, t2);
+	get_task_struct(t2);
+
+	KUNIT_ASSERT_EQ(test, tlob_start_task(current, 10000000, NULL, 0), 0);
+	KUNIT_ASSERT_EQ(test, tlob_start_task(t1, 10000000, NULL, 0), 0);
+	KUNIT_ASSERT_EQ(test, tlob_start_task(t2, 10000000, NULL, 0), 0);
+
+	/* Destroy clears all entries via tlob_stop_all() */
+	tlob_destroy_monitor();
+	ret = tlob_init_monitor();
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	/* Table must be empty now */
+	KUNIT_EXPECT_EQ(test, tlob_stop_task(current), -ESRCH);
+	KUNIT_EXPECT_EQ(test, tlob_stop_task(t1), -ESRCH);
+	KUNIT_EXPECT_EQ(test, tlob_stop_task(t2), -ESRCH);
+
+	complete(&done1);
+	complete(&done2);
+	/*
+	 * completions live on stack; wait for kthreads to exit before return.
+	 */
+	kthread_stop(t1);
+	kthread_stop(t2);
+	put_task_struct(t1);
+	put_task_struct(t2);
+}
+
+/* A threshold that overflows ktime_t must be rejected with -ERANGE.
+ */
+static void tlob_overflow_threshold(struct kunit *test)
+{
+	/* KTIME_MAX / NSEC_PER_USEC + 1 overflows ktime_t */
+	u64 too_large = (u64)(KTIME_MAX / NSEC_PER_USEC) + 1;
+
+	KUNIT_EXPECT_EQ(test,
+			tlob_start_task(current, too_large, NULL, 0),
+			-ERANGE);
+}
+
+static int tlob_task_api_suite_init(struct kunit_suite *suite)
+{
+	return tlob_init_monitor();
+}
+
+static void tlob_task_api_suite_exit(struct kunit_suite *suite)
+{
+	tlob_destroy_monitor();
+}
+
+static struct kunit_case tlob_task_api_cases[] = {
+	KUNIT_CASE(tlob_start_stop_ok),
+	KUNIT_CASE(tlob_double_start),
+	KUNIT_CASE(tlob_stop_without_start),
+	KUNIT_CASE(tlob_immediate_deadline),
+	KUNIT_CASE(tlob_enospc),
+	KUNIT_CASE(tlob_overflow_threshold),
+	KUNIT_CASE(tlob_deadline_fires_off_cpu),
+	KUNIT_CASE(tlob_deadline_fires_on_cpu),
+	KUNIT_CASE(tlob_stop_all_cleanup),
+	{}
+};
+
+static struct kunit_suite tlob_task_api_suite = {
+	.name       = "tlob_task_api",
+	.suite_init = tlob_task_api_suite_init,
+	.suite_exit = tlob_task_api_suite_exit,
+	.test_cases = tlob_task_api_cases,
+};
+
+/*
+ * Suite 3: scheduling integration
+ */
+
+struct tlob_ping_ctx {
+	struct completion ping;
+	struct completion pong;
+};
+
+static int tlob_ping_fn(void *arg)
+{
+	struct tlob_ping_ctx *ctx = arg;
+
+	/* Wait for main to give us the CPU back */
+	wait_for_completion(&ctx->ping);
+	complete(&ctx->pong);
+	return 0;
+}
+
+/* Force two context switches and verify stop returns 0 (within budget).
+ */
+static void tlob_sched_switch_accounting(struct kunit *test)
+{
+	struct tlob_ping_ctx ctx;
+	struct task_struct *peer;
+	int ret;
+
+	init_completion(&ctx.ping);
+	init_completion(&ctx.pong);
+
+	peer = kthread_run(tlob_ping_fn, &ctx, "tlob_ping_kunit");
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, peer);
+
+	/* Arm a generous 5 s budget so the timer never fires */
+	ret = tlob_start_task(current, 5000000, NULL, 0);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	/*
+	 * complete(ping) -> peer runs, forcing a context switch out and back.
+	 */
+	complete(&ctx.ping);
+	wait_for_completion(&ctx.pong);
+
+	/*
+	 * Back on CPU after one off-CPU interval; stop must return 0.
+	 */
+	ret = tlob_stop_task(current);
+	KUNIT_EXPECT_EQ(test, ret, 0);
+}
+
+/*
+ * Verify that monitoring a kthread (not current) works: start on behalf
+ * of a kthread, let it block, then stop it.
+ */
+static int tlob_block_fn(void *arg)
+{
+	struct completion *done = arg;
+
+	/* Block briefly, exercising off_cpu accounting for this task */
+	msleep(20);
+	complete(done);
+	return 0;
+}
+
+static void tlob_monitor_other_task(struct kunit *test)
+{
+	struct completion done;
+	struct task_struct *target;
+	int ret;
+
+	init_completion(&done);
+
+	target = kthread_run(tlob_block_fn, &done, "tlob_target_kunit");
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, target);
+	get_task_struct(target);
+
+	/* Arm a 5 s budget for the target task */
+	ret = tlob_start_task(target, 5000000, NULL, 0);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	wait_for_completion(&done);
+
+	/*
+	 * Target has finished; stop_task may return 0 (still in htable)
+	 * or -ESRCH (kthread exited and timer fired / entry cleaned up).
+	 */
+	ret = tlob_stop_task(target);
+	KUNIT_EXPECT_TRUE(test, ret == 0 || ret == -ESRCH);
+	put_task_struct(target);
+}
+
+static int tlob_sched_suite_init(struct kunit_suite *suite)
+{
+	return tlob_init_monitor();
+}
+
+static void tlob_sched_suite_exit(struct kunit_suite *suite)
+{
+	tlob_destroy_monitor();
+}
+
+static struct kunit_case tlob_sched_integration_cases[] = {
+	KUNIT_CASE(tlob_sched_switch_accounting),
+	KUNIT_CASE(tlob_monitor_other_task),
+	{}
+};
+
+static struct kunit_suite tlob_sched_integration_suite = {
+	.name       = "tlob_sched_integration",
+	.suite_init = tlob_sched_suite_init,
+	.suite_exit = tlob_sched_suite_exit,
+	.test_cases = tlob_sched_integration_cases,
+};
+
+/*
+ * Suite 4: ftrace tracepoint field verification
+ */
+
+/* Capture fields from trace_tlob_budget_exceeded for inspection. */
+struct tlob_exceeded_capture {
+	atomic_t fired; /* 1 after first call */
+	pid_t pid;
+	u64 threshold_us;
+	u64 on_cpu_us;
+	u64 off_cpu_us;
+	u32 switches;
+	bool state_is_on_cpu;
+	u64 tag;
+};
+
+static void
+probe_tlob_budget_exceeded(void *data,
+			   struct task_struct *task, u64 threshold_us,
+			   u64 on_cpu_us, u64 off_cpu_us,
+			   u32 switches, bool state_is_on_cpu, u64 tag)
+{
+	struct tlob_exceeded_capture *cap = data;
+
+	/* Only capture the first event to avoid races. */
+	if (atomic_cmpxchg(&cap->fired, 0, 1) != 0)
+		return;
+
+	cap->pid = task->pid;
+	cap->threshold_us = threshold_us;
+	cap->on_cpu_us = on_cpu_us;
+	cap->off_cpu_us = off_cpu_us;
+	cap->switches = switches;
+	cap->state_is_on_cpu = state_is_on_cpu;
+	cap->tag = tag;
+}
+
+/*
+ * Arm a 2 ms budget and busy-spin for 60 ms. Verify the tracepoint fires
+ * once with matching threshold, correct pid, and total time >= budget.
+ *
+ * state_is_on_cpu is not asserted: preemption during the spin makes it
+ * non-deterministic.
+ */
+static void tlob_trace_budget_exceeded_on_cpu(struct kunit *test)
+{
+	struct tlob_exceeded_capture cap = {};
+	const u64 threshold_us = 2000; /* 2 ms */
+	ktime_t t0;
+	int ret;
+
+	atomic_set(&cap.fired, 0);
+
+	ret = register_trace_tlob_budget_exceeded(probe_tlob_budget_exceeded,
+						  &cap);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	ret = tlob_start_task(current, threshold_us, NULL, 0);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	/* Busy-spin 60 ms -- 30x the budget */
+	t0 = ktime_get();
+	while (ktime_us_delta(ktime_get(), t0) < 60000)
+		cpu_relax();
+
+	/* Entry removed by timer; stop returns -ESRCH */
+	tlob_stop_task(current);
+
+	/*
+	 * Unregister first, then synchronise, so that no probe callback can
+	 * still be running when we read the on-stack captured fields.
+	 */
+	unregister_trace_tlob_budget_exceeded(probe_tlob_budget_exceeded, &cap);
+	tracepoint_synchronize_unregister();
+
+	KUNIT_EXPECT_EQ(test, atomic_read(&cap.fired), 1);
+	KUNIT_EXPECT_EQ(test, (int)cap.pid, (int)current->pid);
+	KUNIT_EXPECT_EQ(test, cap.threshold_us, threshold_us);
+	/* Total elapsed must cover at least the budget */
+	KUNIT_EXPECT_GE(test, cap.on_cpu_us + cap.off_cpu_us, threshold_us);
+}
+
+/*
+ * Holder kthread grabs a mutex for 80 ms; arm 10 ms budget, burn ~1 ms
+ * on-CPU, then block on the mutex. Timer fires off-CPU. Verify:
+ * state_is_on_cpu == false, switches >= 1, off_cpu_us > 0.
+ */
+static void tlob_trace_budget_exceeded_off_cpu(struct kunit *test)
+{
+	struct tlob_exceeded_capture cap = {};
+	struct tlob_holder_ctx ctx = { .hold_ms = 80 };
+	struct task_struct *holder;
+	const u64 threshold_us = 10000; /* 10 ms */
+	ktime_t t0;
+	int ret;
+
+	atomic_set(&cap.fired, 0);
+
+	mutex_init(&ctx.lock);
+	init_completion(&ctx.ready);
+
+	holder = kthread_run(tlob_holder_fn, &ctx, "tlob_holder2_kunit");
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, holder);
+	wait_for_completion(&ctx.ready);
+
+	ret = register_trace_tlob_budget_exceeded(probe_tlob_budget_exceeded,
+						  &cap);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	ret = tlob_start_task(current, threshold_us, NULL, 0);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	/* Phase 1: ~1 ms on-CPU */
+	t0 = ktime_get();
+	while (ktime_us_delta(ktime_get(), t0) < 1000)
+		cpu_relax();
+
+	/* Phase 2: block -> off-CPU; timer fires here */
+	mutex_lock(&ctx.lock);
+	mutex_unlock(&ctx.lock);
+
+	tlob_stop_task(current);
+
+	/* Unregister before synchronising: cap lives on this stack frame. */
+	unregister_trace_tlob_budget_exceeded(probe_tlob_budget_exceeded, &cap);
+	tracepoint_synchronize_unregister();
+
+	KUNIT_EXPECT_EQ(test, atomic_read(&cap.fired), 1);
+	KUNIT_EXPECT_EQ(test, cap.threshold_us, threshold_us);
+	/* Violation happened off-CPU */
+	KUNIT_EXPECT_FALSE(test, cap.state_is_on_cpu);
+	/* At least the switch_out event was counted */
+	KUNIT_EXPECT_GE(test, (u64)cap.switches, (u64)1);
+	/* Off-CPU time must be non-zero */
+	KUNIT_EXPECT_GT(test, cap.off_cpu_us, (u64)0);
+}
+
+/* threshold_us in the tracepoint must exactly match the start argument.
+ */
+static void tlob_trace_threshold_field_accuracy(struct kunit *test)
+{
+	static const u64 thresholds[] = { 500, 1000, 3000 };
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(thresholds); i++) {
+		struct tlob_exceeded_capture cap = {};
+		ktime_t t0;
+		int ret;
+
+		atomic_set(&cap.fired, 0);
+
+		ret = register_trace_tlob_budget_exceeded(
+			probe_tlob_budget_exceeded, &cap);
+		KUNIT_ASSERT_EQ(test, ret, 0);
+
+		ret = tlob_start_task(current, thresholds[i], NULL, 0);
+		KUNIT_ASSERT_EQ(test, ret, 0);
+
+		/* Spin for 20x the threshold to ensure timer fires */
+		t0 = ktime_get();
+		while (ktime_us_delta(ktime_get(), t0) <
+		       (s64)(thresholds[i] * 20))
+			cpu_relax();
+
+		tlob_stop_task(current);
+
+		/* Unregister before synchronising: cap is block-local. */
+		unregister_trace_tlob_budget_exceeded(
+			probe_tlob_budget_exceeded, &cap);
+		tracepoint_synchronize_unregister();
+
+		KUNIT_EXPECT_EQ_MSG(test, cap.threshold_us, thresholds[i],
+				    "threshold mismatch for entry %u", i);
+	}
+}
+
+static int tlob_trace_suite_init(struct kunit_suite *suite)
+{
+	int ret;
+
+	ret = tlob_init_monitor();
+	if (ret)
+		return ret;
+	return tlob_enable_hooks();
+}
+
+static void tlob_trace_suite_exit(struct kunit_suite *suite)
+{
+	tlob_disable_hooks();
+	tlob_destroy_monitor();
+}
+
+static struct kunit_case tlob_trace_output_cases[] = {
+	KUNIT_CASE(tlob_trace_budget_exceeded_on_cpu),
+	KUNIT_CASE(tlob_trace_budget_exceeded_off_cpu),
+	KUNIT_CASE(tlob_trace_threshold_field_accuracy),
+	{}
+};
+
+static struct kunit_suite tlob_trace_output_suite = {
+	.name       = "tlob_trace_output",
+	.suite_init = tlob_trace_suite_init,
+	.suite_exit = tlob_trace_suite_exit,
+	.test_cases = tlob_trace_output_cases,
+};
+
+/* Suite 5: ring buffer */
+
+/*
+ * Allocate a synthetic rv_file_priv for ring buffer tests. Uses
+ * kunit_kzalloc() instead of __get_free_pages() since the ring is never
+ * mmap'd here.
+ */
+static struct rv_file_priv *alloc_priv_kunit(struct kunit *test, u32 cap)
+{
+	struct rv_file_priv *priv;
+	struct tlob_ring *ring;
+
+	priv = kunit_kzalloc(test, sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return NULL;
+
+	ring = &priv->ring;
+
+	ring->page = kunit_kzalloc(test, sizeof(struct tlob_mmap_page),
+				   GFP_KERNEL);
+	if (!ring->page)
+		return NULL;
+
+	ring->data = kunit_kzalloc(test, cap * sizeof(struct tlob_event),
+				   GFP_KERNEL);
+	if (!ring->data)
+		return NULL;
+
+	ring->mask = cap - 1;
+	ring->page->capacity = cap;
+	ring->page->version = 1;
+	ring->page->data_offset = PAGE_SIZE; /* nominal; not used in tests */
+	ring->page->record_size = sizeof(struct tlob_event);
+	spin_lock_init(&ring->lock);
+	init_waitqueue_head(&priv->waitq);
+	return priv;
+}
+
+/* Push one record and verify all fields survive the round-trip. */
+static void tlob_event_push_one(struct kunit *test)
+{
+	struct rv_file_priv *priv;
+	struct tlob_ring *ring;
+	struct tlob_event in = {
+		.tid = 1234,
+		.threshold_us = 5000,
+		.on_cpu_us = 3000,
+		.off_cpu_us = 2000,
+		.switches = 3,
+		.state = 1,
+	};
+	struct tlob_event out = {};
+	u32 tail;
+
+	priv = alloc_priv_kunit(test, TLOB_RING_DEFAULT_CAP);
+	KUNIT_ASSERT_NOT_NULL(test, priv);
+
+	ring = &priv->ring;
+
+	tlob_event_push_kunit(priv, &in);
+
+	/* One record written, none dropped */
+	KUNIT_EXPECT_EQ(test, ring->page->data_head, 1u);
+	KUNIT_EXPECT_EQ(test, ring->page->data_tail, 0u);
+	KUNIT_EXPECT_EQ(test, ring->page->dropped, 0ull);
+
+	/* Dequeue manually */
+	tail = ring->page->data_tail;
+	out = ring->data[tail & ring->mask];
+	ring->page->data_tail = tail + 1;
+
+	KUNIT_EXPECT_EQ(test, out.tid, in.tid);
+	KUNIT_EXPECT_EQ(test, out.threshold_us, in.threshold_us);
+	KUNIT_EXPECT_EQ(test, out.on_cpu_us, in.on_cpu_us);
+	KUNIT_EXPECT_EQ(test, out.off_cpu_us, in.off_cpu_us);
+	KUNIT_EXPECT_EQ(test, out.switches, in.switches);
+	KUNIT_EXPECT_EQ(test, out.state, in.state);
+
+	/* Ring is now empty */
+	KUNIT_EXPECT_EQ(test, ring->page->data_head, ring->page->data_tail);
+}
+
+/*
+ * Fill to capacity, push one more. Drop-new policy: head stays at cap,
+ * dropped == 1, oldest record is preserved.
+ */
+static void tlob_event_push_overflow(struct kunit *test)
+{
+	struct rv_file_priv *priv;
+	struct tlob_ring *ring;
+	struct tlob_event ntf = {};
+	struct tlob_event out = {};
+	const u32 cap = TLOB_RING_MIN_CAP;
+	u32 i;
+
+	priv = alloc_priv_kunit(test, cap);
+	KUNIT_ASSERT_NOT_NULL(test, priv);
+
+	ring = &priv->ring;
+
+	/* Push cap + 1 records; tid encodes the sequence */
+	for (i = 0; i <= cap; i++) {
+		ntf.tid = i;
+		ntf.threshold_us = (u64)i * 1000;
+		tlob_event_push_kunit(priv, &ntf);
+	}
+
+	/* Drop-new: head stopped at cap; one record was silently discarded */
+	KUNIT_EXPECT_EQ(test, ring->page->data_head, cap);
+	KUNIT_EXPECT_EQ(test, ring->page->data_tail, 0u);
+	KUNIT_EXPECT_EQ(test, ring->page->dropped, 1ull);
+
+	/* Oldest surviving record must be the first one pushed (tid == 0) */
+	out = ring->data[ring->page->data_tail & ring->mask];
+	KUNIT_EXPECT_EQ(test, out.tid, 0u);
+
+	/* Drain the ring; the last record must have tid == cap - 1 */
+	for (i = 0; i < cap; i++) {
+		u32 tail = ring->page->data_tail;
+
+		out = ring->data[tail & ring->mask];
+		ring->page->data_tail = tail + 1;
+	}
+	KUNIT_EXPECT_EQ(test, out.tid, cap - 1);
+	KUNIT_EXPECT_EQ(test, ring->page->data_head, ring->page->data_tail);
+}
+
+/* A freshly initialised ring is empty.
+ */
+static void tlob_event_empty(struct kunit *test)
+{
+	struct rv_file_priv *priv;
+	struct tlob_ring *ring;
+
+	priv = alloc_priv_kunit(test, TLOB_RING_DEFAULT_CAP);
+	KUNIT_ASSERT_NOT_NULL(test, priv);
+
+	ring = &priv->ring;
+
+	KUNIT_EXPECT_EQ(test, ring->page->data_head, 0u);
+	KUNIT_EXPECT_EQ(test, ring->page->data_tail, 0u);
+	KUNIT_EXPECT_EQ(test, ring->page->dropped, 0ull);
+}
+
+/* A kthread blocks on wait_event_interruptible(); pushing one record must
+ * wake it within 1 s.
+ */
+
+struct tlob_wakeup_ctx {
+	struct rv_file_priv *priv;
+	struct completion ready;
+	struct completion done;
+	int woke;
+};
+
+static int tlob_wakeup_thread(void *arg)
+{
+	struct tlob_wakeup_ctx *ctx = arg;
+	struct tlob_ring *ring = &ctx->priv->ring;
+
+	complete(&ctx->ready);
+
+	wait_event_interruptible(ctx->priv->waitq,
+				 smp_load_acquire(&ring->page->data_head) !=
+				 READ_ONCE(ring->page->data_tail) ||
+				 kthread_should_stop());
+
+	if (smp_load_acquire(&ring->page->data_head) !=
+	    READ_ONCE(ring->page->data_tail))
+		ctx->woke = 1;
+
+	complete(&ctx->done);
+	return 0;
+}
+
+static void tlob_ring_wakeup(struct kunit *test)
+{
+	struct rv_file_priv *priv;
+	struct tlob_wakeup_ctx ctx;
+	struct task_struct *t;
+	struct tlob_event ev = { .tid = 99 };
+	long timeout;
+
+	priv = alloc_priv_kunit(test, TLOB_RING_DEFAULT_CAP);
+	KUNIT_ASSERT_NOT_NULL(test, priv);
+
+	init_completion(&ctx.ready);
+	init_completion(&ctx.done);
+	ctx.priv = priv;
+	ctx.woke = 0;
+
+	t = kthread_run(tlob_wakeup_thread, &ctx, "tlob_wakeup_kunit");
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, t);
+	get_task_struct(t);
+
+	/* Let the kthread reach wait_event_interruptible */
+	wait_for_completion(&ctx.ready);
+	usleep_range(10000, 20000);
+
+	/* Push one record -- must wake the waiter */
+	tlob_event_push_kunit(priv, &ev);
+
+	timeout = wait_for_completion_timeout(&ctx.done, msecs_to_jiffies(1000));
+	kthread_stop(t);
+	put_task_struct(t);
+
+	KUNIT_EXPECT_GT(test, timeout, 0L);
+	KUNIT_EXPECT_EQ(test, ctx.woke, 1);
+	KUNIT_EXPECT_EQ(test, priv->ring.page->data_head, 1u);
+}
+
+static struct kunit_case tlob_event_buf_cases[] = {
+	KUNIT_CASE(tlob_event_push_one),
+	KUNIT_CASE(tlob_event_push_overflow),
+	KUNIT_CASE(tlob_event_empty),
+	KUNIT_CASE(tlob_ring_wakeup),
+	{}
+};
+
+static struct kunit_suite tlob_event_buf_suite = {
+	.name       = "tlob_event_buf",
+	.test_cases = tlob_event_buf_cases,
+};
+
+/* Suite 6: uprobe format string parser */
+
+/* Happy path: decimal offsets, plain path. */
+static void tlob_parse_decimal_offsets(struct kunit *test)
+{
+	char buf[] = "5000:4768:4848:/usr/bin/myapp";
+	u64 thr; loff_t start, stop; char *path;
+
+	KUNIT_EXPECT_EQ(test,
+			tlob_parse_uprobe_line(buf, &thr, &path, &start, &stop),
+			0);
+	KUNIT_EXPECT_EQ(test, thr, (u64)5000);
+	KUNIT_EXPECT_EQ(test, start, (loff_t)4768);
+	KUNIT_EXPECT_EQ(test, stop, (loff_t)4848);
+	KUNIT_EXPECT_STREQ(test, path, "/usr/bin/myapp");
+}
+
+/* Happy path: 0x-prefixed hex offsets. */
+static void tlob_parse_hex_offsets(struct kunit *test)
+{
+	char buf[] = "10000:0x12a0:0x12f0:/usr/bin/myapp";
+	u64 thr; loff_t start, stop; char *path;
+
+	KUNIT_EXPECT_EQ(test,
+			tlob_parse_uprobe_line(buf, &thr, &path, &start, &stop),
+			0);
+	KUNIT_EXPECT_EQ(test, start, (loff_t)0x12a0);
+	KUNIT_EXPECT_EQ(test, stop, (loff_t)0x12f0);
+	KUNIT_EXPECT_STREQ(test, path, "/usr/bin/myapp");
+}
+
+/* Path containing ':' must not be truncated. */
+static void tlob_parse_path_with_colon(struct kunit *test)
+{
+	char buf[] = "1000:0x100:0x200:/opt/my:app/bin";
+	u64 thr; loff_t start, stop; char *path;
+
+	KUNIT_EXPECT_EQ(test,
+			tlob_parse_uprobe_line(buf, &thr, &path, &start, &stop),
+			0);
+	KUNIT_EXPECT_STREQ(test, path, "/opt/my:app/bin");
+}
+
+/* Zero threshold must be rejected.
+ */
+static void tlob_parse_zero_threshold(struct kunit *test)
+{
+	char buf[] = "0:0x100:0x200:/usr/bin/myapp";
+	u64 thr; loff_t start, stop; char *path;
+
+	KUNIT_EXPECT_EQ(test,
+			tlob_parse_uprobe_line(buf, &thr, &path, &start, &stop),
+			-EINVAL);
+}
+
+/* Empty path (trailing ':' with nothing after) must be rejected. */
+static void tlob_parse_empty_path(struct kunit *test)
+{
+	char buf[] = "5000:0x100:0x200:";
+	u64 thr; loff_t start, stop; char *path;
+
+	KUNIT_EXPECT_EQ(test,
+			tlob_parse_uprobe_line(buf, &thr, &path, &start, &stop),
+			-EINVAL);
+}
+
+/* Missing field (3 tokens instead of 4) must be rejected. */
+static void tlob_parse_too_few_fields(struct kunit *test)
+{
+	char buf[] = "5000:0x100:/usr/bin/myapp";
+	u64 thr; loff_t start, stop; char *path;
+
+	KUNIT_EXPECT_EQ(test,
+			tlob_parse_uprobe_line(buf, &thr, &path, &start, &stop),
+			-EINVAL);
+}
+
+/* Negative offset must be rejected. */
+static void tlob_parse_negative_offset(struct kunit *test)
+{
+	char buf[] = "5000:-1:0x200:/usr/bin/myapp";
+	u64 thr; loff_t start, stop; char *path;
+
+	KUNIT_EXPECT_EQ(test,
+			tlob_parse_uprobe_line(buf, &thr, &path, &start, &stop),
+			-EINVAL);
+}
+
+static struct kunit_case tlob_parse_uprobe_cases[] = {
+	KUNIT_CASE(tlob_parse_decimal_offsets),
+	KUNIT_CASE(tlob_parse_hex_offsets),
+	KUNIT_CASE(tlob_parse_path_with_colon),
+	KUNIT_CASE(tlob_parse_zero_threshold),
+	KUNIT_CASE(tlob_parse_empty_path),
+	KUNIT_CASE(tlob_parse_too_few_fields),
+	KUNIT_CASE(tlob_parse_negative_offset),
+	{}
+};
+
+static struct kunit_suite tlob_parse_uprobe_suite = {
+	.name       = "tlob_parse_uprobe",
+	.test_cases = tlob_parse_uprobe_cases,
+};
+
+kunit_test_suites(&tlob_automaton_suite,
+		  &tlob_task_api_suite,
+		  &tlob_sched_integration_suite,
+		  &tlob_trace_output_suite,
+		  &tlob_event_buf_suite,
+		  &tlob_parse_uprobe_suite);
+
+MODULE_DESCRIPTION("KUnit tests for the tlob RV monitor");
+MODULE_LICENSE("GPL");
-- 
2.43.0

From nobody Mon Jun 15 18:00:20 2026
From: wen.yang@linux.dev
To: Steven Rostedt, Gabriele Monaco, Masami Hiramatsu, Mathieu Desnoyers
Cc: linux-trace-kernel@vger.kernel.org, linux-kernel@vger.kernel.org, Wen Yang
Subject: [RFC PATCH 4/4] selftests/rv: Add selftest for the tlob monitor
Date: Mon, 13 Apr 2026 03:27:21 +0800
Message-Id: <5bdd82dd8aeb1d3f955b727ae1fce9819b35c170.1776020428.git.wen.yang@linux.dev>

From: Wen Yang

Add a kselftest suite (TAP output, 20 test points) for the tlob RV
monitor under tools/testing/selftests/rv/.

test_tlob.sh drives a compiled C helper (tlob_helper) and, for uprobe
tests, a target binary (tlob_uprobe_target). Coverage spans the tracefs
enable/disable path, uprobe-triggered violations, and the ioctl
interface (within-budget stop, CPU-bound and sleep violations,
duplicate start, ring buffer mmap and consumption).

Requires CONFIG_RV_MON_TLOB=y and CONFIG_RV_CHARDEV=y; must be run as
root.
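
The contract between test_tlob.sh and tlob_helper is the helper's exit
status: 0 is a pass, 1 a failure, 2 a skip. A minimal sketch of that
mapping, for illustration only; report_tap and fake_helper are
hypothetical names and are not part of this patch:

```shell
#!/bin/sh
# Illustrative sketch only: report_tap and fake_helper are hypothetical
# names used for this example, not functions from test_tlob.sh.

# Print one TAP line for test number $1, named $2, given exit status $3.
report_tap() {
	case "$3" in
	0) echo "ok $1 - $2" ;;
	2) echo "ok $1 - $2 # SKIP helper returned skip" ;;
	*) echo "not ok $1 - $2" ;;
	esac
}

# Stand-in for tlob_helper: exits with the status passed as its argument.
fake_helper() { return "$1"; }

n=0
for status in 0 2 1; do
	n=$((n + 1))
	rc=0
	fake_helper "$status" || rc=$?
	report_tap "$n" "ioctl_demo" "$rc"
done
```

Because a skip is reported as "ok ... # SKIP", a missing /dev/rv or a
missing helper binary does not turn the whole run into a failure.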
Signed-off-by: Wen Yang
---
 tools/include/uapi/linux/rv.h                 |  54 +
 tools/testing/selftests/rv/Makefile           |  18 +
 tools/testing/selftests/rv/test_tlob.sh       | 563 ++++++++++
 tools/testing/selftests/rv/tlob_helper.c      | 994 ++++++++++++++++++
 .../testing/selftests/rv/tlob_uprobe_target.c | 108 ++
 5 files changed, 1737 insertions(+)
 create mode 100644 tools/include/uapi/linux/rv.h
 create mode 100644 tools/testing/selftests/rv/Makefile
 create mode 100755 tools/testing/selftests/rv/test_tlob.sh
 create mode 100644 tools/testing/selftests/rv/tlob_helper.c
 create mode 100644 tools/testing/selftests/rv/tlob_uprobe_target.c

diff --git a/tools/include/uapi/linux/rv.h b/tools/include/uapi/linux/rv.h
new file mode 100644
index 000000000..bef07aded
--- /dev/null
+++ b/tools/include/uapi/linux/rv.h
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * UAPI definitions for Runtime Verification (RV) monitors.
+ *
+ * This is a tools-friendly copy of include/uapi/linux/rv.h.
+ * Keep in sync with the kernel header.
+ */
+
+#ifndef _UAPI_LINUX_RV_H
+#define _UAPI_LINUX_RV_H
+
+#include
+#include
+
+/* Magic byte shared by all RV monitor ioctls.
*/
+#define RV_IOC_MAGIC	0xB9
+
+/* -----------------------------------------------------------------------
+ * tlob: task latency over budget monitor (nr 0x01 - 0x1F)
+ * -----------------------------------------------------------------------
+ */
+
+struct tlob_start_args {
+	__u64 threshold_us;
+	__u64 tag;
+	__s32 notify_fd;
+	__u32 flags;
+};
+
+struct tlob_event {
+	__u32 tid;
+	__u32 pad;
+	__u64 threshold_us;
+	__u64 on_cpu_us;
+	__u64 off_cpu_us;
+	__u32 switches;
+	__u32 state;	/* 1 = on_cpu, 0 = off_cpu */
+	__u64 tag;
+};
+
+struct tlob_mmap_page {
+	__u32 data_head;
+	__u32 data_tail;
+	__u32 capacity;
+	__u32 version;
+	__u32 data_offset;
+	__u32 record_size;
+	__u64 dropped;
+};
+
+#define TLOB_IOCTL_TRACE_START	_IOW(RV_IOC_MAGIC, 0x01, struct tlob_start_args)
+#define TLOB_IOCTL_TRACE_STOP	_IO(RV_IOC_MAGIC, 0x02)
+
+#endif /* _UAPI_LINUX_RV_H */
diff --git a/tools/testing/selftests/rv/Makefile b/tools/testing/selftests/rv/Makefile
new file mode 100644
index 000000000..14e94a1ab
--- /dev/null
+++ b/tools/testing/selftests/rv/Makefile
@@ -0,0 +1,18 @@
+# SPDX-License-Identifier: GPL-2.0
+# Makefile for rv selftests
+
+TEST_GEN_PROGS := tlob_helper tlob_uprobe_target
+
+TEST_PROGS := \
+	test_tlob.sh \
+
+# TOOLS_INCLUDES is defined by ../lib.mk; provides -isystem to
+# tools/include/uapi so that #include resolves to the
+# in-tree UAPI header without requiring make headers_install.
+# Note: both must be added to the global variables, not as target-specific
+# overrides, because lib.mk rewrites TEST_GEN_PROGS to $(OUTPUT)/name
+# before per-target rules would be evaluated.
+CFLAGS +=3D $(TOOLS_INCLUDES) +LDLIBS +=3D -lpthread + +include ../lib.mk diff --git a/tools/testing/selftests/rv/test_tlob.sh b/tools/testing/selfte= sts/rv/test_tlob.sh new file mode 100755 index 000000000..3ba2125eb --- /dev/null +++ b/tools/testing/selftests/rv/test_tlob.sh @@ -0,0 +1,563 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 +# +# Selftest for the tlob (task latency over budget) RV monitor. +# +# Two interfaces are tested: +# +# 1. tracefs interface: +# enable/disable, presence of tracefs files, +# uprobe binding (threshold_us:offset_start:offset_stop:binary_path= ) and +# violation detection via the ftrace ring buffer. +# +# 2. /dev/rv ioctl self-instrumentation (via tlob_helper): +# within-budget, over-budget on-CPU, over-budget off-CPU (sleep), +# double-start, stop-without-start. +# +# Written to be POSIX sh compatible (no bash-specific extensions). + +ksft_skip=3D4 +t_pass=3D0; t_fail=3D0; t_skip=3D0; t_total=3D0 + +tap_header() { echo "TAP version 13"; } +tap_plan() { echo "1..$1"; } +tap_pass() { t_pass=3D$((t_pass+1)); echo "ok $t_total - $1"; } +tap_fail() { t_fail=3D$((t_fail+1)); echo "not ok $t_total - $1" + [ -n "$2" ] && echo " # $2"; } +tap_skip() { t_skip=3D$((t_skip+1)); echo "ok $t_total - $1 # SKIP $2"; } +next_test() { t_total=3D$((t_total+1)); } + +TRACEFS=3D$(grep -m1 tracefs /proc/mounts 2>/dev/null | awk '{print $2}') +[ -z "$TRACEFS" ] && TRACEFS=3D/sys/kernel/tracing + +RV_DIR=3D"${TRACEFS}/rv" +TLOB_DIR=3D"${RV_DIR}/monitors/tlob" +TRACE_FILE=3D"${TRACEFS}/trace" +TRACING_ON=3D"${TRACEFS}/tracing_on" +TLOB_MONITOR=3D"${TLOB_DIR}/monitor" +BUDGET_EXCEEDED_ENABLE=3D"${TRACEFS}/events/rv/tlob_budget_exceeded/enable" +RV_DEV=3D"/dev/rv" + +# tlob_helper and tlob_uprobe_target must be in the same directory as +# this script or on PATH. 
+SCRIPT_DIR=$(dirname "$0")
+IOCTL_HELPER="${SCRIPT_DIR}/tlob_helper"
+UPROBE_TARGET="${SCRIPT_DIR}/tlob_uprobe_target"
+
+check_root()    { [ "$(id -u)" = "0" ]  || { echo "# Need root" >&2; exit $ksft_skip; }; }
+check_tracefs() { [ -d "${TRACEFS}" ]   || { echo "# No tracefs" >&2; exit $ksft_skip; }; }
+check_rv_dir()  { [ -d "${RV_DIR}" ]    || { echo "# No RV infra" >&2; exit $ksft_skip; }; }
+check_tlob()    { [ -d "${TLOB_DIR}" ]  || { echo "# No tlob monitor" >&2; exit $ksft_skip; }; }
+
+tlob_enable()         { echo 1 > "${TLOB_DIR}/enable"; }
+tlob_disable()        { echo 0 > "${TLOB_DIR}/enable" 2>/dev/null; }
+tlob_is_enabled()     { [ "$(cat "${TLOB_DIR}/enable" 2>/dev/null)" = "1" ]; }
+trace_event_enable()  { echo 1 > "${BUDGET_EXCEEDED_ENABLE}" 2>/dev/null; }
+trace_event_disable() { echo 0 > "${BUDGET_EXCEEDED_ENABLE}" 2>/dev/null; }
+trace_on()            { echo 1 > "${TRACING_ON}" 2>/dev/null; }
+trace_clear()         { echo > "${TRACE_FILE}"; }
+trace_grep()          { grep -q "$1" "${TRACE_FILE}" 2>/dev/null; }
+
+cleanup() {
+	tlob_disable
+	trace_event_disable
+	trace_clear
+}
+
+# ---------------------------------------------------------------------------
+# Test 1: enable / disable
+# ---------------------------------------------------------------------------
+run_test_enable_disable() {
+	next_test; cleanup
+	tlob_enable
+	if ! tlob_is_enabled; then
+		tap_fail "enable_disable" "not enabled after echo 1"; cleanup; return
+	fi
+	tlob_disable
+	if tlob_is_enabled; then
+		tap_fail "enable_disable" "still enabled after echo 0"; cleanup; return
+	fi
+	tap_pass "enable_disable"; cleanup
+}
+
+# ---------------------------------------------------------------------------
+# Test 2: tracefs files present
+# ---------------------------------------------------------------------------
+run_test_tracefs_files() {
+	next_test; cleanup
+	missing=""
+	for f in enable desc monitor; do
+		[ !
-e "${TLOB_DIR}/${f}" ] && missing=3D"${missing} ${f}" + done + [ -n "${missing}" ] \ + && tap_fail "tracefs_files" "missing:${missing}" \ + || tap_pass "tracefs_files" + cleanup +} + +# ------------------------------------------------------------------------= --- +# Helper: resolve file offset of a function inside a binary. +# +# Usage: resolve_offset +# Prints the hex file offset, or empty string on failure. +# ------------------------------------------------------------------------= --- +resolve_offset() { + bin=3D$1; vaddr=3D$2 + # Parse /proc/self/maps to find the mapping that contains vaddr. + # Each line: start-end perms offset dev inode [path] + while IFS=3D read -r line; do + set -- $line + range=3D$1; off=3D$4; path=3D$7 + [ -z "$path" ] && continue + # Only consider the mapping for our binary + [ "$path" !=3D "$bin" ] && continue + # Split range into start and end + start=3D$(echo "$range" | cut -d- -f1) + end=3D$(echo "$range" | cut -d- -f2) + # Convert hex to decimal for comparison (use printf) + s=3D$(printf "%d" "0x${start}" 2>/dev/null) || continue + e=3D$(printf "%d" "0x${end}" 2>/dev/null) || continue + v=3D$(printf "%d" "${vaddr}" 2>/dev/null) || continue + o=3D$(printf "%d" "0x${off}" 2>/dev/null) || continue + if [ "$v" -ge "$s" ] && [ "$v" -lt "$e" ]; then + file_off=3D$(printf "0x%x" $(( (v - s) + o ))) + echo "$file_off" + return + fi + done < /proc/self/maps +} + +# ------------------------------------------------------------------------= --- +# Test 3: uprobe binding - no false positive +# +# Bind this process with a 10 s budget. Do nothing for 0.5 s. +# No budget_exceeded event should appear in the trace. +# ------------------------------------------------------------------------= --- +run_test_uprobe_no_false_positive() { + next_test; cleanup + if [ ! -e "${TLOB_MONITOR}" ]; then + tap_skip "uprobe_no_false_positive" "monitor file not available" + cleanup; return + fi + # We probe the "sleep" command that we will run as a subprocess. 
+ # Use /bin/sleep as the binary; find a valid function offset (0x0 + # resolves to the ELF entry point, which is sufficient for a + # no-false-positive test since we just need the binding to exist). + sleep_bin=3D$(command -v sleep 2>/dev/null) + if [ -z "$sleep_bin" ]; then + tap_skip "uprobe_no_false_positive" "sleep not found"; cleanup; return + fi + pid=3D$$ + # offset 0x0 probes the entry point of /bin/sleep - this is a + # deliberate probe that will not fire during a simple 'sleep 10' + # invoked in a subshell, but registers the pid in tlob. + # + # Instead, bind our own pid with a generous 10 s threshold and + # verify that 0.5 s of idle time does NOT fire the timer. + # + # Since we cannot easily get a valid uprobe offset in pure shell, + # we skip this sub-test if we cannot form a valid binding. + exe=3D$(readlink /proc/self/exe 2>/dev/null) + if [ -z "$exe" ]; then + tap_skip "uprobe_no_false_positive" "cannot read /proc/self/exe" + cleanup; return + fi + trace_event_enable + trace_on + tlob_enable + trace_clear + # Sleep without any binding - just verify no spurious events + sleep 0.5 + trace_grep "budget_exceeded" \ + && tap_fail "uprobe_no_false_positive" \ + "spurious budget_exceeded without any binding" \ + || tap_pass "uprobe_no_false_positive" + cleanup +} + +# ------------------------------------------------------------------------= --- +# Helper: get_uprobe_offset +# +# Use tlob_helper sym_offset to get the ELF file offset of +# in . Prints the hex offset (e.g. "0x11d0") or empty string on +# failure. +# ------------------------------------------------------------------------= --- +get_uprobe_offset() { + bin=3D$1; sym=3D$2 + if [ ! 
-x "${IOCTL_HELPER}" ]; then
+		return
+	fi
+	"${IOCTL_HELPER}" sym_offset "${bin}" "${sym}" 2>/dev/null
+}
+
+# ---------------------------------------------------------------------------
+# Test 4: uprobe binding - violation detected
+#
+# Start tlob_uprobe_target (a busy-spin binary with a well-known symbol),
+# attach a uprobe on tlob_busy_work with a 10 us threshold, and verify
+# that a budget_exceeded event appears.
+# ---------------------------------------------------------------------------
+run_test_uprobe_violation() {
+	next_test; cleanup
+	if [ ! -e "${TLOB_MONITOR}" ]; then
+		tap_skip "uprobe_violation" "monitor file not available"
+		cleanup; return
+	fi
+	if [ ! -x "${UPROBE_TARGET}" ]; then
+		tap_skip "uprobe_violation" \
+			"tlob_uprobe_target not found or not executable"
+		cleanup; return
+	fi
+
+	# Get the file offsets of the start and stop probe symbols
+	busy_offset=$(get_uprobe_offset "${UPROBE_TARGET}" "tlob_busy_work")
+	if [ -z "${busy_offset}" ]; then
+		tap_skip "uprobe_violation" \
+			"cannot resolve tlob_busy_work offset in ${UPROBE_TARGET}"
+		cleanup; return
+	fi
+	stop_offset=$(get_uprobe_offset "${UPROBE_TARGET}" "tlob_busy_work_done")
+	if [ -z "${stop_offset}" ]; then
+		tap_skip "uprobe_violation" \
+			"cannot resolve tlob_busy_work_done offset in ${UPROBE_TARGET}"
+		cleanup; return
+	fi
+
+	# Start the busy-spin target (run for 30 s so the test can observe it)
+	"${UPROBE_TARGET}" 30000 &
+	busy_pid=$!
+	sleep 0.05
+
+	trace_event_enable
+	trace_on
+	tlob_enable
+	trace_clear
+
+	# Bind the target: 10 us budget; start=tlob_busy_work, stop=tlob_busy_work_done
+	binding="10:${busy_offset}:${stop_offset}:${UPROBE_TARGET}"
+	if !
echo "${binding}" > "${TLOB_MONITOR}" 2>/dev/null; then + kill "${busy_pid}" 2>/dev/null; wait "${busy_pid}" 2>/dev/null + tap_skip "uprobe_violation" \ + "uprobe binding rejected (CONFIG_UPROBES=3Dy needed)" + cleanup; return + fi + + # Wait up to 2 s for a budget_exceeded event + found=3D0; i=3D0 + while [ "$i" -lt 20 ]; do + sleep 0.1 + trace_grep "budget_exceeded" && { found=3D1; break; } + i=3D$((i+1)) + done + + echo "-${busy_offset}:${UPROBE_TARGET}" > "${TLOB_MONITOR}" 2>/dev/null + kill "${busy_pid}" 2>/dev/null; wait "${busy_pid}" 2>/dev/null + + if [ "${found}" !=3D "1" ]; then + tap_fail "uprobe_violation" "no budget_exceeded within 2 s" + cleanup; return + fi + + # Validate the event fields: threshold must match, on_cpu must be non-zero + # (CPU-bound violation), and state must be on_cpu. + ev=3D$(grep "budget_exceeded" "${TRACE_FILE}" | head -n 1) + if ! echo "${ev}" | grep -q "threshold=3D10 "; then + tap_fail "uprobe_violation" "threshold field mismatch: ${ev}" + cleanup; return + fi + on_cpu=3D$(echo "${ev}" | grep -o "on_cpu=3D[0-9]*" | cut -d=3D -f2) + if [ "${on_cpu:-0}" -eq 0 ]; then + tap_fail "uprobe_violation" "on_cpu=3D0 for a CPU-bound spin: ${ev}" + cleanup; return + fi + if ! echo "${ev}" | grep -q "state=3Don_cpu"; then + tap_fail "uprobe_violation" "state is not on_cpu: ${ev}" + cleanup; return + fi + tap_pass "uprobe_violation" + cleanup +} + +# ------------------------------------------------------------------------= --- +# Test 5: uprobe binding - remove binding stops monitoring +# +# Bind a pid via tlob_uprobe_target, then immediately remove it. +# Verify that after removal the monitor file no longer lists the pid. +# ------------------------------------------------------------------------= --- +run_test_uprobe_unbind() { + next_test; cleanup + if [ ! -e "${TLOB_MONITOR}" ]; then + tap_skip "uprobe_unbind" "monitor file not available" + cleanup; return + fi + if [ ! 
-x "${UPROBE_TARGET}" ]; then + tap_skip "uprobe_unbind" \ + "tlob_uprobe_target not found or not executable" + cleanup; return + fi + + busy_offset=3D$(get_uprobe_offset "${UPROBE_TARGET}" "tlob_busy_work") + stop_offset=3D$(get_uprobe_offset "${UPROBE_TARGET}" "tlob_busy_work_done= ") + if [ -z "${busy_offset}" ] || [ -z "${stop_offset}" ]; then + tap_skip "uprobe_unbind" \ + "cannot resolve tlob_busy_work/tlob_busy_work_done offset" + cleanup; return + fi + + "${UPROBE_TARGET}" 30000 & + busy_pid=3D$! + sleep 0.05 + + tlob_enable + # 5 s budget - should not fire during this quick test + binding=3D"5000000:${busy_offset}:${stop_offset}:${UPROBE_TARGET}" + if ! echo "${binding}" > "${TLOB_MONITOR}" 2>/dev/null; then + kill "${busy_pid}" 2>/dev/null; wait "${busy_pid}" 2>/dev/null + tap_skip "uprobe_unbind" \ + "uprobe binding rejected (CONFIG_UPROBES=3Dy needed)" + cleanup; return + fi + + # Remove the binding + echo "-${busy_offset}:${UPROBE_TARGET}" > "${TLOB_MONITOR}" 2>/dev/null + + # The monitor file should no longer list the binding for this offset + if grep -q "^[0-9]*:0x${busy_offset#0x}:" "${TLOB_MONITOR}" 2>/dev/null; = then + kill "${busy_pid}" 2>/dev/null; wait "${busy_pid}" 2>/dev/null + tap_fail "uprobe_unbind" "pid still listed after removal" + cleanup; return + fi + + kill "${busy_pid}" 2>/dev/null; wait "${busy_pid}" 2>/dev/null + tap_pass "uprobe_unbind" + cleanup +} + +# ------------------------------------------------------------------------= --- +# Test 6: uprobe - duplicate offset_start rejected +# +# Registering a second binding with the same offset_start in the same bina= ry +# must be rejected with an error, since two entry uprobes at the same addr= ess +# would cause double tlob_start_task() calls and undefined behaviour. +# ------------------------------------------------------------------------= --- +run_test_uprobe_duplicate_offset() { + next_test; cleanup + if [ ! 
-e "${TLOB_MONITOR}" ]; then + tap_skip "uprobe_duplicate_offset" "monitor file not available" + cleanup; return + fi + if [ ! -x "${UPROBE_TARGET}" ]; then + tap_skip "uprobe_duplicate_offset" \ + "tlob_uprobe_target not found or not executable" + cleanup; return + fi + + busy_offset=3D$(get_uprobe_offset "${UPROBE_TARGET}" "tlob_busy_work") + stop_offset=3D$(get_uprobe_offset "${UPROBE_TARGET}" "tlob_busy_work_done= ") + if [ -z "${busy_offset}" ] || [ -z "${stop_offset}" ]; then + tap_skip "uprobe_duplicate_offset" \ + "cannot resolve tlob_busy_work/tlob_busy_work_done offset" + cleanup; return + fi + + tlob_enable + + # First binding: should succeed + if ! echo "5000000:${busy_offset}:${stop_offset}:${UPROBE_TARGET}" \ + > "${TLOB_MONITOR}" 2>/dev/null; then + tap_skip "uprobe_duplicate_offset" \ + "uprobe binding rejected (CONFIG_UPROBES=3Dy needed)" + cleanup; return + fi + + # Second binding with same offset_start: must be rejected + if echo "9999:${busy_offset}:${stop_offset}:${UPROBE_TARGET}" \ + > "${TLOB_MONITOR}" 2>/dev/null; then + echo "-${busy_offset}:${UPROBE_TARGET}" > "${TLOB_MONITOR}" 2>/dev/null + tap_fail "uprobe_duplicate_offset" \ + "duplicate offset_start was accepted (expected error)" + cleanup; return + fi + + echo "-${busy_offset}:${UPROBE_TARGET}" > "${TLOB_MONITOR}" 2>/dev/null + tap_pass "uprobe_duplicate_offset" + cleanup +} + + +# +# Region A: tlob_busy_work with a 5 s budget - should NOT fire during the = test. +# Region B: tlob_busy_work_done with a 10 us budget - SHOULD fire quickly = since +# tlob_uprobe_target calls tlob_busy_work_done after a busy spin. +# +# Verifies that independent bindings for different offsets in the same bin= ary +# are tracked separately and that only the tight-budget binding triggers a +# budget_exceeded event. +# ------------------------------------------------------------------------= --- +run_test_uprobe_independent_thresholds() { + next_test; cleanup + if [ ! 
-e "${TLOB_MONITOR}" ]; then + tap_skip "uprobe_independent_thresholds" \ + "monitor file not available"; cleanup; return + fi + if [ ! -x "${UPROBE_TARGET}" ]; then + tap_skip "uprobe_independent_thresholds" \ + "tlob_uprobe_target not found or not executable" + cleanup; return + fi + + busy_offset=3D$(get_uprobe_offset "${UPROBE_TARGET}" "tlob_busy_work") + busy_stop_offset=3D$(get_uprobe_offset "${UPROBE_TARGET}" "tlob_busy_work= _done") + if [ -z "${busy_offset}" ] || [ -z "${busy_stop_offset}" ]; then + tap_skip "uprobe_independent_thresholds" \ + "cannot resolve tlob_busy_work/tlob_busy_work_done offset" + cleanup; return + fi + + "${UPROBE_TARGET}" 30000 & + busy_pid=3D$! + sleep 0.05 + + trace_event_enable + trace_on + tlob_enable + trace_clear + + # Region A: generous 5 s budget on tlob_busy_work entry (should not fire) + if ! echo "5000000:${busy_offset}:${busy_stop_offset}:${UPROBE_TARGET}" \ + > "${TLOB_MONITOR}" 2>/dev/null; then + kill "${busy_pid}" 2>/dev/null; wait "${busy_pid}" 2>/dev/null + tap_skip "uprobe_independent_thresholds" \ + "uprobe binding rejected (CONFIG_UPROBES=3Dy needed)" + cleanup; return + fi + # Region B: tight 10 us budget on tlob_busy_work_done (fires quickly) + echo "10:${busy_stop_offset}:${busy_stop_offset}:${UPROBE_TARGET}" \ + > "${TLOB_MONITOR}" 2>/dev/null + + found=3D0; i=3D0 + while [ "$i" -lt 20 ]; do + sleep 0.1 + trace_grep "budget_exceeded" && { found=3D1; break; } + i=3D$((i+1)) + done + + echo "-${busy_offset}:${UPROBE_TARGET}" > "${TLOB_MONITOR}" 2>/dev/null + echo "-${busy_stop_offset}:${UPROBE_TARGET}" > "${TLOB_MONITOR}" 2>/dev/n= ull + kill "${busy_pid}" 2>/dev/null; wait "${busy_pid}" 2>/dev/null + + if [ "${found}" !=3D "1" ]; then + tap_fail "uprobe_independent_thresholds" \ + "budget_exceeded not raised for tight-budget region within 2 s" + cleanup; return + fi + + # The violation must carry threshold=3D10 (Region B's budget). + ev=3D$(grep "budget_exceeded" "${TRACE_FILE}" | head -n 1) + if ! 
echo "${ev}" | grep -q "threshold=3D10 "; then + tap_fail "uprobe_independent_thresholds" \ + "violation threshold is not Region B's 10 us: ${ev}" + cleanup; return + fi + tap_pass "uprobe_independent_thresholds" + cleanup +} + +# ------------------------------------------------------------------------= --- +# ioctl tests via tlob_helper +# +# Each test invokes the helper with a sub-test name. +# Exit code: 0=3Dpass, 1=3Dfail, 2=3Dskip. +# ------------------------------------------------------------------------= --- +run_ioctl_test() { + testname=3D$1 + next_test + + if [ ! -x "${IOCTL_HELPER}" ]; then + tap_skip "ioctl_${testname}" \ + "tlob_helper not found or not executable" + return + fi + if [ ! -c "${RV_DEV}" ]; then + tap_skip "ioctl_${testname}" \ + "${RV_DEV} not present (CONFIG_RV_CHARDEV=3Dy needed)" + return + fi + + tlob_enable + "${IOCTL_HELPER}" "${testname}" + rc=3D$? + tlob_disable + + case "${rc}" in + 0) tap_pass "ioctl_${testname}" ;; + 2) tap_skip "ioctl_${testname}" "helper returned skip" ;; + *) tap_fail "ioctl_${testname}" "helper exited with code ${rc}" ;; + esac +} + +# run_ioctl_test_not_enabled - like run_ioctl_test but deliberately does N= OT +# enable the tlob monitor before invoking the helper. Used to verify that +# ioctls issued against a disabled monitor return ENODEV rather than crash= ing +# the kernel with a NULL pointer dereference. +run_ioctl_test_not_enabled() +{ + next_test + + if [ ! -x "${IOCTL_HELPER}" ]; then + tap_skip "ioctl_not_enabled" \ + "tlob_helper not found or not executable" + return + fi + if [ ! -c "${RV_DEV}" ]; then + tap_skip "ioctl_not_enabled" \ + "${RV_DEV} not present (CONFIG_RV_CHARDEV=3Dy needed)" + return + fi + + # Monitor intentionally left disabled. + tlob_disable + "${IOCTL_HELPER}" not_enabled + rc=3D$? 
+ + case "${rc}" in + 0) tap_pass "ioctl_not_enabled" ;; + 2) tap_skip "ioctl_not_enabled" "helper returned skip" ;; + *) tap_fail "ioctl_not_enabled" "helper exited with code ${rc}" ;; + esac +} + +# ------------------------------------------------------------------------= --- +# Main +# ------------------------------------------------------------------------= --- +check_root; check_tracefs; check_rv_dir; check_tlob +tap_header; tap_plan 20 + +# tracefs interface tests +run_test_enable_disable +run_test_tracefs_files + +# uprobe external monitoring tests +run_test_uprobe_no_false_positive +run_test_uprobe_violation +run_test_uprobe_unbind +run_test_uprobe_duplicate_offset +run_test_uprobe_independent_thresholds + +# /dev/rv ioctl self-instrumentation tests +run_ioctl_test_not_enabled +run_ioctl_test within_budget +run_ioctl_test over_budget_cpu +run_ioctl_test over_budget_sleep +run_ioctl_test double_start +run_ioctl_test stop_no_start +run_ioctl_test multi_thread +run_ioctl_test self_watch +run_ioctl_test invalid_flags +run_ioctl_test notify_fd_bad +run_ioctl_test mmap_basic +run_ioctl_test mmap_errors +run_ioctl_test mmap_consume + +echo "# Passed: ${t_pass} Failed: ${t_fail} Skipped: ${t_skip}" +[ "${t_fail}" -gt 0 ] && exit 1 || exit 0 diff --git a/tools/testing/selftests/rv/tlob_helper.c b/tools/testing/selft= ests/rv/tlob_helper.c new file mode 100644 index 000000000..cd76b56d1 --- /dev/null +++ b/tools/testing/selftests/rv/tlob_helper.c @@ -0,0 +1,994 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * tlob_helper.c - test helper and ELF utility for tlob selftests + * + * Called by test_tlob.sh to exercise the /dev/rv ioctl interface and to + * resolve ELF symbol offsets for uprobe bindings. One subcommand per + * invocation so the shell script can report each as an independent TAP + * test case. + * + * Usage: tlob_helper [args...] 
+ * + * Synchronous TRACE_START / TRACE_STOP tests: + * not_enabled - TRACE_START without tlob enabled -> ENODEV (no k= ernel crash) + * within_budget - start(50000 us), sleep 10 ms, stop -> expect 0 + * over_budget_cpu - start(5000 us), busyspin 100 ms, stop -> EOVERFL= OW + * over_budget_sleep - start(3000 us), sleep 50 ms, stop -> EOVERFLOW + * + * Error-handling tests: + * double_start - two starts without stop -> EEXIST on second + * stop_no_start - stop without start -> ESRCH + * + * Per-thread isolation test: + * multi_thread - two threads share one fd; one within budget, one= over + * + * Asynchronous notification test (notify_fd + read()): + * self_watch - one worker exceeds budget; monitor fd receives o= ne ntf via read() + * + * Input-validation tests (TRACE_START error paths): + * invalid_flags - TRACE_START with flags !=3D 0 -> EINVAL + * notify_fd_bad - TRACE_START with notify_fd =3D stdout (non-rv fd= ) -> EINVAL + * + * mmap ring buffer tests (Scenario D): + * mmap_basic - mmap succeeds; verify tlob_mmap_page fields + * (version, capacity, data_offset, record_size) + * mmap_errors - MAP_PRIVATE, wrong size, and non-zero pgoff all + * return EINVAL + * mmap_consume - trigger a real violation via self-notification a= nd + * consume the event through the mmap'd ring + * + * ELF utility (does not require /dev/rv): + * sym_offset + * - print the ELF file offset of in + * (used by the shell script to build uprobe bindin= gs) + * + * Exit code: 0 =3D pass, 1 =3D fail, 2 =3D skip (device not available). + */ +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +/* Default ring capacity allocated at open(); matches TLOB_RING_DEFAULT_CA= P. 
*/ +#define TLOB_RING_DEFAULT_CAP 64U + +static int rv_fd =3D -1; + +static int open_rv(void) +{ + rv_fd =3D open("/dev/rv", O_RDWR); + if (rv_fd < 0) { + fprintf(stderr, "open /dev/rv: %s\n", strerror(errno)); + return -1; + } + return 0; +} + +static void busy_spin_us(unsigned long us) +{ + struct timespec start, now; + unsigned long elapsed; + + clock_gettime(CLOCK_MONOTONIC, &start); + do { + clock_gettime(CLOCK_MONOTONIC, &now); + elapsed =3D (unsigned long)(now.tv_sec - start.tv_sec) + * 1000000000UL + + (unsigned long)(now.tv_nsec - start.tv_nsec); + } while (elapsed < us * 1000UL); +} + +static int do_start(uint64_t threshold_us) +{ + struct tlob_start_args args =3D { + .threshold_us =3D threshold_us, + .notify_fd =3D -1, + }; + + return ioctl(rv_fd, TLOB_IOCTL_TRACE_START, &args); +} + +static int do_stop(void) +{ + return ioctl(rv_fd, TLOB_IOCTL_TRACE_STOP, NULL); +} + +/* ----------------------------------------------------------------------- + * Synchronous TRACE_START / TRACE_STOP tests + * ----------------------------------------------------------------------- + */ + +/* + * test_not_enabled - TRACE_START must return ENODEV when the tlob monitor + * has not been enabled (tlob_state_cache is NULL). + * + * The shell wrapper deliberately does NOT call tlob_enable before invoking + * this subcommand, so the ioctl is expected to fail with ENODEV rather th= an + * crashing the kernel with a NULL pointer dereference in kmem_cache_alloc. 
+ */ +static int test_not_enabled(void) +{ + int ret; + + ret =3D do_start(1000); + if (ret =3D=3D 0) { + fprintf(stderr, "TRACE_START: expected ENODEV, got success\n"); + do_stop(); + return 1; + } + if (errno !=3D ENODEV) { + fprintf(stderr, "TRACE_START: expected ENODEV, got %s\n", + strerror(errno)); + return 1; + } + return 0; +} + +static int test_within_budget(void) +{ + int ret; + + if (do_start(50000) < 0) { + fprintf(stderr, "TRACE_START: %s\n", strerror(errno)); + return 1; + } + usleep(10000); /* 10 ms < 50 ms budget */ + ret =3D do_stop(); + if (ret !=3D 0) { + fprintf(stderr, "TRACE_STOP: expected 0, got %d errno=3D%s\n", + ret, strerror(errno)); + return 1; + } + return 0; +} + +static int test_over_budget_cpu(void) +{ + int ret; + + if (do_start(5000) < 0) { + fprintf(stderr, "TRACE_START: %s\n", strerror(errno)); + return 1; + } + busy_spin_us(100000); /* 100 ms >> 5 ms budget */ + ret =3D do_stop(); + if (ret =3D=3D 0) { + fprintf(stderr, "TRACE_STOP: expected EOVERFLOW, got 0\n"); + return 1; + } + if (errno !=3D EOVERFLOW) { + fprintf(stderr, "TRACE_STOP: expected EOVERFLOW, got %s\n", + strerror(errno)); + return 1; + } + return 0; +} + +static int test_over_budget_sleep(void) +{ + int ret; + + if (do_start(3000) < 0) { + fprintf(stderr, "TRACE_START: %s\n", strerror(errno)); + return 1; + } + usleep(50000); /* 50 ms >> 3 ms budget, off-CPU time counts */ + ret =3D do_stop(); + if (ret =3D=3D 0) { + fprintf(stderr, "TRACE_STOP: expected EOVERFLOW, got 0\n"); + return 1; + } + if (errno !=3D EOVERFLOW) { + fprintf(stderr, "TRACE_STOP: expected EOVERFLOW, got %s\n", + strerror(errno)); + return 1; + } + return 0; +} + +/* ----------------------------------------------------------------------- + * Error-handling tests + * ----------------------------------------------------------------------- + */ + +static int test_double_start(void) +{ + int ret; + + if (do_start(10000000) < 0) { + fprintf(stderr, "first TRACE_START: %s\n", strerror(errno)); + 
return 1; + } + ret =3D do_start(10000000); + if (ret =3D=3D 0) { + fprintf(stderr, "second TRACE_START: expected EEXIST, got 0\n"); + do_stop(); + return 1; + } + if (errno !=3D EEXIST) { + fprintf(stderr, "second TRACE_START: expected EEXIST, got %s\n", + strerror(errno)); + do_stop(); + return 1; + } + do_stop(); /* clean up */ + return 0; +} + +static int test_stop_no_start(void) +{ + int ret; + + /* Ensure clean state: ignore error from a stale entry */ + do_stop(); + + ret =3D do_stop(); + if (ret =3D=3D 0) { + fprintf(stderr, "TRACE_STOP: expected ESRCH, got 0\n"); + return 1; + } + if (errno !=3D ESRCH) { + fprintf(stderr, "TRACE_STOP: expected ESRCH, got %s\n", + strerror(errno)); + return 1; + } + return 0; +} + +/* ----------------------------------------------------------------------- + * Per-thread isolation test + * + * Two threads share a single /dev/rv fd. The monitor uses task_struct * + * as the key, so each thread gets an independent slot regardless of the + * shared fd. 
+ * -----------------------------------------------------------------------
+ */
+
+struct mt_thread_args {
+	uint64_t threshold_us;
+	unsigned long workload_us;
+	int busy;
+	int expect_eoverflow;
+	int result;
+};
+
+static void *mt_thread_fn(void *arg)
+{
+	struct mt_thread_args *a = arg;
+	int ret;
+
+	if (do_start(a->threshold_us) < 0) {
+		fprintf(stderr, "thread TRACE_START: %s\n", strerror(errno));
+		a->result = 1;
+		return NULL;
+	}
+
+	if (a->busy)
+		busy_spin_us(a->workload_us);
+	else
+		usleep(a->workload_us);
+
+	ret = do_stop();
+	if (a->expect_eoverflow) {
+		if (ret == 0 || errno != EOVERFLOW) {
+			fprintf(stderr, "thread: expected EOVERFLOW, got ret=%d errno=%s\n",
+				ret, strerror(errno));
+			a->result = 1;
+			return NULL;
+		}
+	} else {
+		if (ret != 0) {
+			fprintf(stderr, "thread: expected 0, got ret=%d errno=%s\n",
+				ret, strerror(errno));
+			a->result = 1;
+			return NULL;
+		}
+	}
+	a->result = 0;
+	return NULL;
+}
+
+static int test_multi_thread(void)
+{
+	pthread_t ta, tb;
+	struct mt_thread_args a = {
+		.threshold_us = 20000, /* 20 ms */
+		.workload_us = 5000, /* 5 ms sleep -> within budget */
+		.busy = 0,
+		.expect_eoverflow = 0,
+	};
+	struct mt_thread_args b = {
+		.threshold_us = 3000, /* 3 ms */
+		.workload_us = 30000, /* 30 ms spin -> over budget */
+		.busy = 1,
+		.expect_eoverflow = 1,
+	};
+
+	pthread_create(&ta, NULL, mt_thread_fn, &a);
+	pthread_create(&tb, NULL, mt_thread_fn, &b);
+	pthread_join(ta, NULL);
+	pthread_join(tb, NULL);
+
+	return (a.result || b.result) ? 1 : 0;
+}
+
+/* -----------------------------------------------------------------------
+ * Asynchronous notification test (notify_fd + read())
+ *
+ * A dedicated monitor_fd is opened by the main thread.  Two worker threads
+ * each open their own work_fd and call TLOB_IOCTL_TRACE_START with
+ * notify_fd = monitor_fd, nominating it as the violation target.  Worker A
+ * stays within budget; worker B exceeds it.  The main thread reads from
+ * monitor_fd and expects exactly one tlob_event record.
+ * -----------------------------------------------------------------------
+ */
+
+struct sw_worker_args {
+	int monitor_fd;
+	uint64_t threshold_us;
+	unsigned long workload_us;
+	int busy;
+	int result;
+};
+
+static void *sw_worker_fn(void *arg)
+{
+	struct sw_worker_args *a = arg;
+	struct tlob_start_args args = {
+		.threshold_us = a->threshold_us,
+		.notify_fd = a->monitor_fd,
+	};
+	int work_fd;
+	int ret;
+
+	work_fd = open("/dev/rv", O_RDWR);
+	if (work_fd < 0) {
+		fprintf(stderr, "worker open /dev/rv: %s\n", strerror(errno));
+		a->result = 1;
+		return NULL;
+	}
+
+	ret = ioctl(work_fd, TLOB_IOCTL_TRACE_START, &args);
+	if (ret < 0) {
+		fprintf(stderr, "TRACE_START (notify): %s\n", strerror(errno));
+		close(work_fd);
+		a->result = 1;
+		return NULL;
+	}
+
+	if (a->busy)
+		busy_spin_us(a->workload_us);
+	else
+		usleep(a->workload_us);
+
+	ioctl(work_fd, TLOB_IOCTL_TRACE_STOP, NULL);
+	close(work_fd);
+	a->result = 0;
+	return NULL;
+}
+
+static int test_self_watch(void)
+{
+	int monitor_fd;
+	pthread_t ta, tb;
+	struct sw_worker_args a = {
+		.threshold_us = 50000, /* 50 ms */
+		.workload_us = 5000, /* 5 ms sleep -> no violation */
+		.busy = 0,
+	};
+	struct sw_worker_args b = {
+		.threshold_us = 3000, /* 3 ms */
+		.workload_us = 30000, /* 30 ms spin -> violation */
+		.busy = 1,
+	};
+	struct tlob_event ntfs[8];
+	int violations = 0;
+	ssize_t n;
+
+	/*
+	 * Open monitor_fd with O_NONBLOCK so read() after the workers finish
+	 * returns immediately rather than blocking forever.
+	 */
+	monitor_fd = open("/dev/rv", O_RDWR | O_NONBLOCK);
+	if (monitor_fd < 0) {
+		fprintf(stderr, "open /dev/rv (monitor_fd): %s\n", strerror(errno));
+		return 1;
+	}
+	a.monitor_fd = monitor_fd;
+	b.monitor_fd = monitor_fd;
+
+	pthread_create(&ta, NULL, sw_worker_fn, &a);
+	pthread_create(&tb, NULL, sw_worker_fn, &b);
+	pthread_join(ta, NULL);
+	pthread_join(tb, NULL);
+
+	if (a.result || b.result) {
+		close(monitor_fd);
+		return 1;
+	}
+
+	/*
+	 * Drain all available tlob_event records.  With O_NONBLOCK the final
+	 * read() fails with errno EAGAIN when the buffer is empty.
+	 */
+	while ((n = read(monitor_fd, ntfs, sizeof(ntfs))) > 0)
+		violations += (int)(n / sizeof(struct tlob_event));
+
+	close(monitor_fd);
+
+	if (violations != 1) {
+		fprintf(stderr, "self_watch: expected 1 violation, got %d\n",
+			violations);
+		return 1;
+	}
+	return 0;
+}
+
+/* -----------------------------------------------------------------------
+ * Input-validation tests (TRACE_START error paths)
+ * -----------------------------------------------------------------------
+ */
+
+/*
+ * test_invalid_flags - TRACE_START with flags != 0 must return EINVAL.
+ *
+ * The flags field is reserved for future extensions and must be zero.
+ * Callers that set it to a non-zero value are rejected early so that a
+ * future kernel can assign meaning to those bits without silently
+ * ignoring them.
+ */
+static int test_invalid_flags(void)
+{
+	struct tlob_start_args args = {
+		.threshold_us = 1000,
+		.notify_fd = -1,
+		.flags = 1, /* non-zero: must be rejected */
+	};
+	int ret;
+
+	ret = ioctl(rv_fd, TLOB_IOCTL_TRACE_START, &args);
+	if (ret == 0) {
+		fprintf(stderr, "TRACE_START(flags=1): expected EINVAL, got success\n");
+		do_stop();
+		return 1;
+	}
+	if (errno != EINVAL) {
+		fprintf(stderr, "TRACE_START(flags=1): expected EINVAL, got %s\n",
+			strerror(errno));
+		return 1;
+	}
+	return 0;
+}
+
+/*
+ * test_notify_fd_bad - TRACE_START with a non-/dev/rv notify_fd must return
+ * EINVAL.
+ *
+ * When notify_fd >= 0, the kernel resolves it to a struct file and checks
+ * that its private_data is non-NULL (i.e. it is a /dev/rv file descriptor).
+ * Passing stdout (fd 1) supplies a real, open fd whose private_data is NULL,
+ * so the kernel must reject it with EINVAL.
+ */
+static int test_notify_fd_bad(void)
+{
+	struct tlob_start_args args = {
+		.threshold_us = 1000,
+		.notify_fd = STDOUT_FILENO, /* open but not a /dev/rv fd */
+		.flags = 0,
+	};
+	int ret;
+
+	ret = ioctl(rv_fd, TLOB_IOCTL_TRACE_START, &args);
+	if (ret == 0) {
+		fprintf(stderr,
+			"TRACE_START(notify_fd=stdout): expected EINVAL, got success\n");
+		do_stop();
+		return 1;
+	}
+	if (errno != EINVAL) {
+		fprintf(stderr,
+			"TRACE_START(notify_fd=stdout): expected EINVAL, got %s\n",
+			strerror(errno));
+		return 1;
+	}
+	return 0;
+}
+
+/* -----------------------------------------------------------------------
+ * mmap ring buffer tests (Scenario D)
+ * -----------------------------------------------------------------------
+ */
+
+/*
+ * test_mmap_basic - mmap the ring buffer and verify the control page fields.
+ *
+ * The kernel allocates TLOB_RING_DEFAULT_CAP records at open().  A shared
+ * mmap of PAGE_SIZE + cap * record_size must succeed and the tlob_mmap_page
+ * header must contain consistent values.
+ */
+static int test_mmap_basic(void)
+{
+	long pagesize = sysconf(_SC_PAGESIZE);
+	size_t mmap_len = (size_t)pagesize +
+			  TLOB_RING_DEFAULT_CAP * sizeof(struct tlob_event);
+	/* rv_mmap requires a page-aligned length */
+	mmap_len = (mmap_len + (size_t)(pagesize - 1)) & ~(size_t)(pagesize - 1);
+	struct tlob_mmap_page *page;
+	struct tlob_event *data;
+	void *map;
+	int ret = 0;
+
+	map = mmap(NULL, mmap_len, PROT_READ | PROT_WRITE, MAP_SHARED, rv_fd, 0);
+	if (map == MAP_FAILED) {
+		fprintf(stderr, "mmap_basic: mmap: %s\n", strerror(errno));
+		return 1;
+	}
+
+	page = (struct tlob_mmap_page *)map;
+	data = (struct tlob_event *)((char *)map + page->data_offset);
+
+	if (page->version != 1) {
+		fprintf(stderr, "mmap_basic: expected version=1, got %u\n",
+			page->version);
+		ret = 1;
+		goto out;
+	}
+	if (page->capacity != TLOB_RING_DEFAULT_CAP) {
+		fprintf(stderr, "mmap_basic: expected capacity=%u, got %u\n",
+			TLOB_RING_DEFAULT_CAP, page->capacity);
+		ret = 1;
+		goto out;
+	}
+	if (page->data_offset != (uint32_t)pagesize) {
+		fprintf(stderr, "mmap_basic: expected data_offset=%ld, got %u\n",
+			pagesize, page->data_offset);
+		ret = 1;
+		goto out;
+	}
+	if (page->record_size != sizeof(struct tlob_event)) {
+		fprintf(stderr, "mmap_basic: expected record_size=%zu, got %u\n",
+			sizeof(struct tlob_event), page->record_size);
+		ret = 1;
+		goto out;
+	}
+	if (page->data_head != 0 || page->data_tail != 0) {
+		fprintf(stderr, "mmap_basic: ring not empty at open: head=%u tail=%u\n",
+			page->data_head, page->data_tail);
+		ret = 1;
+		goto out;
+	}
+	/* Touch the data array to confirm it is accessible. */
+	(void)data[0].tid;
+out:
+	munmap(map, mmap_len);
+	return ret;
+}
+
+/*
+ * test_mmap_errors - verify that rv_mmap() rejects invalid mmap parameters.
+ *
+ * Four cases are tested; each must return MAP_FAILED with errno == EINVAL:
+ *   1. size one page short of the correct ring length
+ *   2. size one page larger than the correct ring length
+ *   3. MAP_PRIVATE (only MAP_SHARED is permitted)
+ *   4. non-zero vm_pgoff (offset must be 0)
+ */
+static int test_mmap_errors(void)
+{
+	long pagesize = sysconf(_SC_PAGESIZE);
+	size_t correct_len = (size_t)pagesize +
+			     TLOB_RING_DEFAULT_CAP * sizeof(struct tlob_event);
+	/* rv_mmap requires a page-aligned length */
+	correct_len = (correct_len + (size_t)(pagesize - 1)) & ~(size_t)(pagesize - 1);
+	void *map;
+	int ret = 0;
+
+	/* Case 1: one full page short (correct_len - 1 would still round up to correct_len) */
+	map = mmap(NULL, correct_len - (size_t)pagesize, PROT_READ | PROT_WRITE,
+		   MAP_SHARED, rv_fd, 0);
+	if (map != MAP_FAILED) {
+		fprintf(stderr, "mmap_errors: short-size mmap succeeded (expected EINVAL)\n");
+		munmap(map, correct_len - (size_t)pagesize);
+		ret = 1;
+	} else if (errno != EINVAL) {
+		fprintf(stderr, "mmap_errors: short-size: expected EINVAL, got %s\n",
+			strerror(errno));
+		ret = 1;
+	}
+
+	/* Case 2: size one page too large */
+	map = mmap(NULL, correct_len + (size_t)pagesize, PROT_READ | PROT_WRITE,
+		   MAP_SHARED, rv_fd, 0);
+	if (map != MAP_FAILED) {
+		fprintf(stderr, "mmap_errors: oversized mmap succeeded (expected EINVAL)\n");
+		munmap(map, correct_len + (size_t)pagesize);
+		ret = 1;
+	} else if (errno != EINVAL) {
+		fprintf(stderr, "mmap_errors: oversized: expected EINVAL, got %s\n",
+			strerror(errno));
+		ret = 1;
+	}
+
+	/* Case 3: MAP_PRIVATE instead of MAP_SHARED */
+	map = mmap(NULL, correct_len, PROT_READ | PROT_WRITE,
+		   MAP_PRIVATE, rv_fd, 0);
+	if (map != MAP_FAILED) {
+		fprintf(stderr, "mmap_errors: MAP_PRIVATE succeeded (expected EINVAL)\n");
+		munmap(map, correct_len);
+		ret = 1;
+	} else if (errno != EINVAL) {
+		fprintf(stderr, "mmap_errors: MAP_PRIVATE: expected EINVAL, got %s\n",
+			strerror(errno));
+		ret = 1;
+	}
+
+	/* Case 4: non-zero file offset (pgoff = 1) */
+	map = mmap(NULL, correct_len, PROT_READ | PROT_WRITE,
+		   MAP_SHARED, rv_fd, (off_t)pagesize);
+	if (map != MAP_FAILED) {
+		fprintf(stderr, "mmap_errors: non-zero pgoff mmap succeeded (expected EINVAL)\n");
+		munmap(map, correct_len);
+		ret = 1;
+	} else if (errno != EINVAL) {
+		fprintf(stderr, "mmap_errors: non-zero pgoff: expected EINVAL, got %s\n",
+			strerror(errno));
+		ret = 1;
+	}
+
+	return ret;
+}
+
+/*
+ * test_mmap_consume - zero-copy consumption of a real violation event.
+ *
+ * Arms a 5 ms budget with self-notification (notify_fd = rv_fd), sleeps
+ * 50 ms (off-CPU violation), then reads the pushed event through the mmap'd
+ * ring without calling read().  Verifies:
+ *   - TRACE_STOP returns EOVERFLOW (budget was exceeded)
+ *   - data_head == 1 after the violation
+ *   - the event fields (threshold_us, tag, tid) are correct
+ *   - data_tail can be advanced to consume the record (ring empties)
+ */
+static int test_mmap_consume(void)
+{
+	long pagesize = sysconf(_SC_PAGESIZE);
+	size_t mmap_len = (size_t)pagesize +
+			  TLOB_RING_DEFAULT_CAP * sizeof(struct tlob_event);
+	/* rv_mmap requires a page-aligned length */
+	mmap_len = (mmap_len + (size_t)(pagesize - 1)) & ~(size_t)(pagesize - 1);
+	struct tlob_start_args args = {
+		.threshold_us = 5000, /* 5 ms */
+		.notify_fd = rv_fd, /* self-notification */
+		.tag = 0xdeadbeefULL,
+		.flags = 0,
+	};
+	struct tlob_mmap_page *page;
+	struct tlob_event *data;
+	void *map;
+	int stop_ret;
+	int ret = 0;
+
+	map = mmap(NULL, mmap_len, PROT_READ | PROT_WRITE, MAP_SHARED, rv_fd, 0);
+	if (map == MAP_FAILED) {
+		fprintf(stderr, "mmap_consume: mmap: %s\n", strerror(errno));
+		return 1;
+	}
+
+	page = (struct tlob_mmap_page *)map;
+	data = (struct tlob_event *)((char *)map + page->data_offset);
+
+	if (ioctl(rv_fd, TLOB_IOCTL_TRACE_START, &args) < 0) {
+		fprintf(stderr, "mmap_consume: TRACE_START: %s\n", strerror(errno));
+		ret = 1;
+		goto out;
+	}
+
+	usleep(50000); /* 50 ms >> 5 ms budget -> off-CPU violation */
+
+	stop_ret = ioctl(rv_fd, TLOB_IOCTL_TRACE_STOP, NULL);
+	if (stop_ret == 0) {
+		fprintf(stderr, "mmap_consume: TRACE_STOP returned 0, expected EOVERFLOW\n");
+		ret = 1;
+		goto out;
+	}
+	if (errno != EOVERFLOW) {
+		fprintf(stderr, "mmap_consume: TRACE_STOP: expected EOVERFLOW, got %s\n",
+			strerror(errno));
+		ret = 1;
+		goto out;
+	}
+
+	/* Pairs with smp_store_release in tlob_event_push. */
+	if (__atomic_load_n(&page->data_head, __ATOMIC_ACQUIRE) != 1) {
+		fprintf(stderr, "mmap_consume: expected data_head=1, got %u\n",
+			page->data_head);
+		ret = 1;
+		goto out;
+	}
+	if (page->data_tail != 0) {
+		fprintf(stderr, "mmap_consume: expected data_tail=0, got %u\n",
+			page->data_tail);
+		ret = 1;
+		goto out;
+	}
+
+	/* Verify record content */
+	if (data[0].threshold_us != 5000) {
+		fprintf(stderr, "mmap_consume: expected threshold_us=5000, got %llu\n",
+			(unsigned long long)data[0].threshold_us);
+		ret = 1;
+		goto out;
+	}
+	if (data[0].tag != 0xdeadbeefULL) {
+		fprintf(stderr, "mmap_consume: expected tag=0xdeadbeef, got 0x%llx\n",
+			(unsigned long long)data[0].tag);
+		ret = 1;
+		goto out;
+	}
+	if (data[0].tid == 0) {
+		fprintf(stderr, "mmap_consume: tid is 0\n");
+		ret = 1;
+		goto out;
+	}
+
+	/* Consume: advance data_tail and confirm ring is empty */
+	__atomic_store_n(&page->data_tail, 1U, __ATOMIC_RELEASE);
+	if (__atomic_load_n(&page->data_head, __ATOMIC_ACQUIRE) !=
+	    __atomic_load_n(&page->data_tail, __ATOMIC_ACQUIRE)) {
+		fprintf(stderr, "mmap_consume: ring not empty after consume\n");
+		ret = 1;
+	}
+
+out:
+	munmap(map, mmap_len);
+	return ret;
+}
+
+/* -----------------------------------------------------------------------
+ * ELF utility: sym_offset
+ *
+ * Print the ELF file offset of a symbol in a binary.  Supports 32- and
+ * 64-bit ELF.  Walks the section headers to find .symtab (falling back to
+ * .dynsym), then converts the symbol's virtual address to a file offset
+ * via the PT_LOAD program headers.
+ *
+ * Does not require /dev/rv; used by the shell script to build uprobe
+ * bindings of the form pid:threshold_us:offset_start:offset_stop:binary_path.
+ *
+ * Returns 0 on success (offset printed to stdout), 1 on failure.
+ * -----------------------------------------------------------------------
+ */
+static int sym_offset(const char *binary, const char *symname)
+{
+	int fd;
+	struct stat st;
+	void *map;
+	Elf64_Ehdr *ehdr;
+	Elf32_Ehdr *ehdr32;
+	int is64;
+	uint64_t sym_vaddr = 0;
+	int found = 0;
+	uint64_t file_offset = 0;
+
+	fd = open(binary, O_RDONLY);
+	if (fd < 0) {
+		fprintf(stderr, "open %s: %s\n", binary, strerror(errno));
+		return 1;
+	}
+	if (fstat(fd, &st) < 0) {
+		close(fd);
+		return 1;
+	}
+	map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
+	close(fd);
+	if (map == MAP_FAILED) {
+		fprintf(stderr, "mmap: %s\n", strerror(errno));
+		return 1;
+	}
+
+	/* Identify ELF class */
+	ehdr = (Elf64_Ehdr *)map;
+	ehdr32 = (Elf32_Ehdr *)map;
+	if (st.st_size < 4 ||
+	    ehdr->e_ident[EI_MAG0] != ELFMAG0 ||
+	    ehdr->e_ident[EI_MAG1] != ELFMAG1 ||
+	    ehdr->e_ident[EI_MAG2] != ELFMAG2 ||
+	    ehdr->e_ident[EI_MAG3] != ELFMAG3) {
+		fprintf(stderr, "%s: not an ELF file\n", binary);
+		munmap(map, (size_t)st.st_size);
+		return 1;
+	}
+	is64 = (ehdr->e_ident[EI_CLASS] == ELFCLASS64);
+
+	if (is64) {
+		/* Walk section headers to find .symtab or .dynsym */
+		Elf64_Shdr *shdrs = (Elf64_Shdr *)((char *)map + ehdr->e_shoff);
+		Elf64_Shdr *shstrtab_hdr = &shdrs[ehdr->e_shstrndx];
+		const char *shstrtab = (char *)map + shstrtab_hdr->sh_offset;
+		int si;
+
+		/* Prefer .symtab; fall back to .dynsym */
+		for (int pass = 0; pass < 2 && !found; pass++) {
+			const char *target = pass ? ".dynsym" : ".symtab";
+
+			for (si = 0; si < ehdr->e_shnum && !found; si++) {
+				Elf64_Shdr *sh = &shdrs[si];
+				const char *name = shstrtab + sh->sh_name;
+
+				if (strcmp(name, target) != 0)
+					continue;
+
+				Elf64_Shdr *strtab_sh = &shdrs[sh->sh_link];
+				const char *strtab = (char *)map + strtab_sh->sh_offset;
+				Elf64_Sym *syms = (Elf64_Sym *)((char *)map + sh->sh_offset);
+				uint64_t nsyms = sh->sh_size / sizeof(Elf64_Sym);
+				uint64_t j;
+
+				for (j = 0; j < nsyms; j++) {
+					if (strcmp(strtab + syms[j].st_name, symname) == 0) {
+						sym_vaddr = syms[j].st_value;
+						found = 1;
+						break;
+					}
+				}
+			}
+		}
+
+		if (!found) {
+			fprintf(stderr, "symbol '%s' not found in %s\n", symname, binary);
+			munmap(map, (size_t)st.st_size);
+			return 1;
+		}
+
+		/* Convert vaddr to file offset via PT_LOAD segments */
+		Elf64_Phdr *phdrs = (Elf64_Phdr *)((char *)map + ehdr->e_phoff);
+		int pi;
+
+		for (pi = 0; pi < ehdr->e_phnum; pi++) {
+			Elf64_Phdr *ph = &phdrs[pi];
+
+			if (ph->p_type != PT_LOAD)
+				continue;
+			if (sym_vaddr >= ph->p_vaddr &&
+			    sym_vaddr < ph->p_vaddr + ph->p_filesz) {
+				file_offset = sym_vaddr - ph->p_vaddr + ph->p_offset;
+				break;
+			}
+		}
+	} else {
+		/* 32-bit ELF */
+		Elf32_Shdr *shdrs = (Elf32_Shdr *)((char *)map + ehdr32->e_shoff);
+		Elf32_Shdr *shstrtab_hdr = &shdrs[ehdr32->e_shstrndx];
+		const char *shstrtab = (char *)map + shstrtab_hdr->sh_offset;
+		int si;
+		uint32_t sym_vaddr32 = 0;
+
+		for (int pass = 0; pass < 2 && !found; pass++) {
+			const char *target = pass ? ".dynsym" : ".symtab";
+
+			for (si = 0; si < ehdr32->e_shnum && !found; si++) {
+				Elf32_Shdr *sh = &shdrs[si];
+				const char *name = shstrtab + sh->sh_name;
+
+				if (strcmp(name, target) != 0)
+					continue;
+
+				Elf32_Shdr *strtab_sh = &shdrs[sh->sh_link];
+				const char *strtab = (char *)map + strtab_sh->sh_offset;
+				Elf32_Sym *syms = (Elf32_Sym *)((char *)map + sh->sh_offset);
+				uint32_t nsyms = sh->sh_size / sizeof(Elf32_Sym);
+				uint32_t j;
+
+				for (j = 0; j < nsyms; j++) {
+					if (strcmp(strtab + syms[j].st_name, symname) == 0) {
+						sym_vaddr32 = syms[j].st_value;
+						found = 1;
+						break;
+					}
+				}
+			}
+		}
+
+		if (!found) {
+			fprintf(stderr, "symbol '%s' not found in %s\n", symname, binary);
+			munmap(map, (size_t)st.st_size);
+			return 1;
+		}
+
+		Elf32_Phdr *phdrs = (Elf32_Phdr *)((char *)map + ehdr32->e_phoff);
+		int pi;
+
+		for (pi = 0; pi < ehdr32->e_phnum; pi++) {
+			Elf32_Phdr *ph = &phdrs[pi];
+
+			if (ph->p_type != PT_LOAD)
+				continue;
+			if (sym_vaddr32 >= ph->p_vaddr &&
+			    sym_vaddr32 < ph->p_vaddr + ph->p_filesz) {
+				file_offset = sym_vaddr32 - ph->p_vaddr + ph->p_offset;
+				break;
+			}
+		}
+		sym_vaddr = sym_vaddr32;
+	}
+
+	munmap(map, (size_t)st.st_size);
+
+	if (!file_offset && sym_vaddr) {
+		fprintf(stderr, "could not map vaddr 0x%lx to file offset\n",
+			(unsigned long)sym_vaddr);
+		return 1;
+	}
+
+	printf("0x%lx\n", (unsigned long)file_offset);
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	int rc;
+
+	if (argc < 2) {
+		fprintf(stderr, "Usage: %s <test_name> [args...]\n", argv[0]);
+		return 1;
+	}
+
+	/* sym_offset does not need /dev/rv */
+	if (strcmp(argv[1], "sym_offset") == 0) {
+		if (argc < 4) {
+			fprintf(stderr, "Usage: %s sym_offset <binary> <symbol>\n",
+				argv[0]);
+			return 1;
+		}
+		return sym_offset(argv[2], argv[3]);
+	}
+
+	if (open_rv() < 0)
+		return 2; /* skip */
+
+	if (strcmp(argv[1], "not_enabled") == 0)
+		rc = test_not_enabled();
+	else if (strcmp(argv[1], "within_budget") == 0)
+		rc = test_within_budget();
+	else if (strcmp(argv[1], "over_budget_cpu") == 0)
+		rc = test_over_budget_cpu();
+	else if (strcmp(argv[1], "over_budget_sleep") == 0)
+		rc = test_over_budget_sleep();
+	else if (strcmp(argv[1], "double_start") == 0)
+		rc = test_double_start();
+	else if (strcmp(argv[1], "stop_no_start") == 0)
+		rc = test_stop_no_start();
+	else if (strcmp(argv[1], "multi_thread") == 0)
+		rc = test_multi_thread();
+	else if (strcmp(argv[1], "self_watch") == 0)
+		rc = test_self_watch();
+	else if (strcmp(argv[1], "invalid_flags") == 0)
+		rc = test_invalid_flags();
+	else if (strcmp(argv[1], "notify_fd_bad") == 0)
+		rc = test_notify_fd_bad();
+	else if (strcmp(argv[1], "mmap_basic") == 0)
+		rc = test_mmap_basic();
+	else if (strcmp(argv[1], "mmap_errors") == 0)
+		rc = test_mmap_errors();
+	else if (strcmp(argv[1], "mmap_consume") == 0)
+		rc = test_mmap_consume();
+	else {
+		fprintf(stderr, "Unknown test: %s\n", argv[1]);
+		rc = 1;
+	}
+
+	close(rv_fd);
+	return rc;
+}
diff --git a/tools/testing/selftests/rv/tlob_uprobe_target.c b/tools/testing/selftests/rv/tlob_uprobe_target.c
new file mode 100644
index 000000000..6c895cb40
--- /dev/null
+++ b/tools/testing/selftests/rv/tlob_uprobe_target.c
@@ -0,0 +1,108 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * tlob_uprobe_target.c - uprobe target binary for tlob selftests.
+ *
+ * Provides two well-known probe points:
+ *   tlob_busy_work()      - start probe: arms the tlob budget timer
+ *   tlob_busy_work_done() - stop probe: cancels the timer on completion
+ *
+ * The tlob selftest writes a five-field uprobe binding:
+ *   pid:threshold_us:binary:offset_start:offset_stop
+ * where offset_start is the file offset of tlob_busy_work and offset_stop
+ * is the file offset of tlob_busy_work_done (resolved via tlob_helper
+ * sym_offset).
+ *
+ * Both probe points are plain entry uprobes (no uretprobe).  The busy loop
+ * keeps the task on-CPU so that either the stop probe fires cleanly (within
+ * budget) or the hrtimer fires first and emits tlob_budget_exceeded (over
+ * budget).
+ *
+ * Usage: tlob_uprobe_target <duration_ms>
+ *
+ * Loops calling tlob_busy_work() in 200 ms iterations until <duration_ms>
+ * has elapsed (0 = run for ~24 hours).  Short iterations ensure the uprobe
+ * entry fires on every call even if the uprobe is installed after the
+ * program has started.
+ */
+#define _GNU_SOURCE
+#include
+#include
+#include
+#include
+
+#ifndef noinline
+#define noinline __attribute__((noinline))
+#endif
+
+static inline int timespec_before(const struct timespec *a,
+				  const struct timespec *b)
+{
+	return a->tv_sec < b->tv_sec ||
+	       (a->tv_sec == b->tv_sec && a->tv_nsec < b->tv_nsec);
+}
+
+static void timespec_add_ms(struct timespec *ts, unsigned long ms)
+{
+	ts->tv_sec += ms / 1000;
+	ts->tv_nsec += (long)(ms % 1000) * 1000000L;
+	if (ts->tv_nsec >= 1000000000L) {
+		ts->tv_sec++;
+		ts->tv_nsec -= 1000000000L;
+	}
+}
+
+/*
+ * tlob_busy_work_done - stop-probe target.
+ *
+ * Called by tlob_busy_work() after the busy loop.  The uprobe on this
+ * function's entry fires tlob_stop_task(), cancelling the budget timer.
+ * noinline ensures the compiler never merges this function with its caller,
+ * guaranteeing the entry uprobe always fires.
+ */
+noinline void tlob_busy_work_done(void)
+{
+	/* empty: the uprobe fires on entry */
+}
+
+/*
+ * tlob_busy_work - start-probe target.
+ *
+ * The uprobe on this function's entry fires tlob_start_task(), arming the
+ * budget timer.  noinline prevents the compiler and linker (including LTO)
+ * from inlining this function into its callers, ensuring the entry uprobe
+ * fires on every call.
+ */
+noinline void tlob_busy_work(unsigned long duration_ns)
+{
+	struct timespec start, now;
+	unsigned long elapsed;
+
+	clock_gettime(CLOCK_MONOTONIC, &start);
+	do {
+		clock_gettime(CLOCK_MONOTONIC, &now);
+		elapsed = (unsigned long)(now.tv_sec - start.tv_sec)
+			  * 1000000000UL
+			+ (unsigned long)(now.tv_nsec - start.tv_nsec);
+	} while (elapsed < duration_ns);
+
+	tlob_busy_work_done();
+}
+
+int main(int argc, char *argv[])
+{
+	unsigned long duration_ms = 0;
+	struct timespec deadline, now;
+
+	if (argc >= 2)
+		duration_ms = strtoul(argv[1], NULL, 10);
+
+	clock_gettime(CLOCK_MONOTONIC, &deadline);
+	timespec_add_ms(&deadline, duration_ms ? duration_ms : 86400000UL);
+
+	do {
+		tlob_busy_work(200 * 1000000UL); /* 200 ms per iteration */
+		clock_gettime(CLOCK_MONOTONIC, &now);
+	} while (timespec_before(&now, &deadline));
+
+	return 0;
+}
-- 
2.43.0
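
Note on the sym_offset helper: its last step, turning a symbol's virtual
address into a file offset through the PT_LOAD program headers, can be read
in isolation. The sketch below is not part of the patch. struct load_seg and
vaddr_to_offset() are hypothetical names used only for illustration; the
three fields stand in for p_vaddr, p_offset and p_filesz of Elf64_Phdr, and
the caller is assumed to have already kept only the PT_LOAD entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the PT_LOAD fields of an ELF program header. */
struct load_seg {
	uint64_t vaddr;		/* p_vaddr: address the segment is mapped at */
	uint64_t offset;	/* p_offset: where the segment starts in the file */
	uint64_t filesz;	/* p_filesz: number of file-backed bytes */
};

/*
 * Translate a virtual address into a file offset.
 * Returns 0 and stores the result in *off when some segment's file-backed
 * range contains vaddr; returns -1 otherwise and leaves *off untouched.
 */
static int vaddr_to_offset(const struct load_seg *segs, size_t nsegs,
			   uint64_t vaddr, uint64_t *off)
{
	size_t i;

	assert(off != NULL);
	for (i = 0; i < nsegs; i++) {
		const struct load_seg *s = &segs[i];

		/* Compare the distance, so vaddr + filesz cannot overflow. */
		if (vaddr >= s->vaddr && vaddr - s->vaddr < s->filesz) {
			*off = vaddr - s->vaddr + s->offset;
			return 0;
		}
	}
	return -1;
}
```

Returning a status separately from the offset means a legitimate file offset
of 0 is not confused with "no segment matched", a case the helper in the
patch approximates with its !file_offset && sym_vaddr check.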