From nobody Sun May 24 18:42:57 2026
From: Li Pengfei
To: linux-trace-kernel@vger.kernel.org
Cc: rostedt@goodmis.org, mhiramat@kernel.org, linux-kernel@vger.kernel.org,
 cmllamas@google.com, zhangbo56@xiaomi.com, lipengfei28@xiaomi.com,
 lkp@intel.com
Subject: [PATCH v2 1/3] trace: add lock-free stackmap for stack trace deduplication
Date: Fri, 22 May 2026 18:40:15 +0800
Message-Id: <20260522104017.1668638-2-lipengfei28@xiaomi.com>
In-Reply-To: <20260522104017.1668638-1-lipengfei28@xiaomi.com>
References: <20260514034916.2162517-1-lipengfei28@xiaomi.com>
 <20260522104017.1668638-1-lipengfei28@xiaomi.com>

From: Pengfei Li

Add a lock-free hash map (ftrace_stackmap) that deduplicates kernel
stack traces for the ftrace ring buffer. Instead of storing full stack
traces (80-160 bytes each) in the ring buffer for every event, ftrace
can store a 4-byte stack_id when the stackmap option is enabled.

The implementation is modeled after tracing_map.c (used by hist
triggers), using the same lock-free design based on Dr.
Cliff Click's non-blocking hash table algorithm:

  - Lock-free insert via cmpxchg, safe in NMI/IRQ/any context
  - Pre-allocated element pool (zero allocation on hot path)
  - Linear probing with 2x over-provisioned table; probe length is
    bounded by FTRACE_STACKMAP_MAX_PROBE so worst-case insert/lookup
    is O(1) even when the table is heavily loaded with
    claimed-but-empty slots from pool exhaustion
  - Single global instance (initialized for the global trace array)

The stackmap is exported via three tracefs nodes:

  - stack_map:      text export with symbol resolution (mode 0640)
  - stack_map_stat: counters (entries, successes, drops, success_rate)
  - stack_map_bin:  binary export (all fields native-endian)

Counter naming:

  - 'successes' counts events that were successfully assigned a
    stack_id (covers both first-time inserts and dedup hits).
  - 'drops' counts events that fell back to recording the full stack
    (pool exhausted, probe limit reached, or reset in progress).
  - 'success_rate' is successes / (successes + drops).

Reset semantics:

  - Reset is a control-path operation only allowed when tracing is
    stopped on the owning trace_array. Online reset (with tracing
    active) is intentionally not supported to keep the proof
    obligations small.
  - Reset uses atomic_cmpxchg() to claim the resetting flag, then
    verifies tracer_tracing_is_on() returns false. The resetting flag
    itself blocks subsequent get_id() callers; userspace re-enabling
    tracing after our check still cannot let new insertions through.
  - synchronize_rcu() drains in-flight get_id() callers from the
    ftrace callback path, which runs preempt-disabled.
  - Reset clears the resetting flag with atomic_set_release() so a
    subsequent get_id() observes a fully cleared map.
  - Concurrent reset returns -EBUSY; reset while tracing is active
    returns -EBUSY.

Concurrency notes:

  - entry->val publication uses smp_store_release() paired with
    smp_load_acquire() in all dereferencing readers (lookup, seq_show,
    bin_open).
    seq_start/seq_next only check val for NULL and use READ_ONCE().
  - elt->nr is read with READ_ONCE() and clamped to MAX_DEPTH before
    use in seq_show and bin_open.
  - Pool exhaustion: stackmap_get_elt() short-circuits via
    atomic_read() before the contended atomic RMW, avoiding cacheline
    contention once the pool is full. Slots that win cmpxchg but
    cannot get an elt are left 'claimed but empty'; subsequent lookups
    treat val==NULL as a miss and probe past them. The bounded probe
    length keeps per-event cost O(1).

Hash key:

  - Per-instance random seed stored in the stackmap struct (no global
    state), seeded at create time.
  - 32-bit jhash is forced to 1 if it lands on 0 (which is the
    free-slot sentinel). Full memcmp confirms matches.

Memory:

  - Single flat vmalloc for the element pool (no per-elt kzalloc).
  - bits parameter clamped to [10, 18]: at the maximum bits=18, the
    element pool is ~130 MB and a stack_map_bin snapshot may briefly
    allocate another ~130 MB.
  - struct stackmap_bin_snapshot uses u64 (not size_t) for its size
    field so data[] is 8-byte aligned on both 32-bit and 64-bit
    architectures, avoiding alignment faults when writing u64 IPs on
    strict-alignment architectures.

Kernel command line parameter:

  - ftrace_stackmap.bits=N: set map capacity (2^N unique stacks,
    range 10-18, default 14)

Signed-off-by: Pengfei Li
---
 kernel/trace/Kconfig          |  21 ++
 kernel/trace/Makefile         |   1 +
 kernel/trace/trace_stackmap.c | 643 ++++++++++++++++++++++++++++++++++
 kernel/trace/trace_stackmap.h |  56 +++
 4 files changed, 721 insertions(+)
 create mode 100644 kernel/trace/trace_stackmap.c
 create mode 100644 kernel/trace/trace_stackmap.h

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index e130da35808f..2a63fd2c9a96 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -412,6 +412,27 @@ config STACK_TRACER
 
 	  Say N if unsure.
 
+config FTRACE_STACKMAP
+	bool "Ftrace stack map deduplication"
+	depends on TRACING
+	depends on STACKTRACE
+	select KALLSYMS
+	help
+	  This enables a global stack trace hash table for ftrace, inspired
+	  by eBPF's BPF_MAP_TYPE_STACK_TRACE. When enabled, ftrace can store
+	  only a stack_id in the ring buffer instead of the full stack trace,
+	  significantly reducing trace buffer usage when the same call stacks
+	  appear repeatedly.
+
+	  The deduplicated stacks are exported via:
+	    /sys/kernel/debug/tracing/stack_map
+
+	  Writing to this file resets the stack map. Reading shows all unique
+	  stacks with their stack_id and reference count.
+
+	  Say Y if you want to reduce ftrace buffer usage for stack traces.
+	  Say N if unsure.
+
 config TRACE_PREEMPT_TOGGLE
 	bool
 	help
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index 1decdce8cbef..f1b6175099cc 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -85,6 +85,7 @@ obj-$(CONFIG_HWLAT_TRACER) += trace_hwlat.o
 obj-$(CONFIG_OSNOISE_TRACER) += trace_osnoise.o
 obj-$(CONFIG_NOP_TRACER) += trace_nop.o
 obj-$(CONFIG_STACK_TRACER) += trace_stack.o
+obj-$(CONFIG_FTRACE_STACKMAP) += trace_stackmap.o
 obj-$(CONFIG_MMIOTRACE) += trace_mmiotrace.o
 obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += trace_functions_graph.o
 obj-$(CONFIG_TRACE_BRANCH_PROFILING) += trace_branch.o
diff --git a/kernel/trace/trace_stackmap.c b/kernel/trace/trace_stackmap.c
new file mode 100644
index 000000000000..b23a60e9286c
--- /dev/null
+++ b/kernel/trace/trace_stackmap.c
@@ -0,0 +1,643 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Ftrace Stack Map - Lock-free stack trace deduplication for ftrace
+ *
+ * Modeled after tracing_map.c (used by hist triggers), this provides
+ * a lock-free hash map optimized for the ftrace hot path. The design
+ * is based on Dr. Cliff Click's non-blocking hash table algorithm.
+ *
+ * Key properties:
+ *   - Lock-free insert via cmpxchg, safe in NMI/IRQ/any context
+ *   - Pre-allocated element pool (zero allocation on hot path)
+ *   - Linear probing with 2x over-provisioned table; probe length
+ *     bounded by FTRACE_STACKMAP_MAX_PROBE to keep worst-case lookup
+ *     cost constant even when the table is heavily loaded
+ *   - Single global instance (initialized for the global trace array)
+ *
+ * Reset is a control-path operation, only allowed when tracing is
+ * stopped on the owning trace_array. The protocol is:
+ *
+ *   - atomic_cmpxchg(&resetting, 0, 1) atomically claims reset rights
+ *     and blocks new get_id() callers (they observe resetting=1 and
+ *     return -EINVAL).
+ *   - tracer_tracing_is_on() is checked AFTER the cmpxchg, so the
+ *     resetting flag itself prevents new insertions even if userspace
+ *     re-enables tracing immediately after the check.
+ *   - synchronize_rcu() drains in-flight get_id() callers from the
+ *     ftrace callback path, which runs with preemption disabled.
+ *
+ * Online reset (with tracing active) is intentionally not supported
+ * to keep the design simple and the proof obligations small.
+ *
+ * The 32-bit jhash of the stack IPs is the hash table key. On hash
+ * collision, linear probing finds the next slot and full memcmp
+ * confirms the match.
+ *
+ * Concurrent userspace readers (cat stack_map / stack_map_bin) get
+ * a best-effort snapshot. They are coherent with the hot path
+ * (smp_load_acquire on entry->val), but they are not coherent with
+ * a concurrent reset; since reset requires tracing to be stopped,
+ * mid-iteration reset can produce truncated or partial output but
+ * never crashes.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "trace.h"
+#include "trace_stackmap.h"
+
+/*
+ * Bound the linear-probe scan length. With a 2x over-provisioned table,
+ * a well-distributed hash gives very short probe chains. Capping at 64
+ * keeps worst-case lookup O(1) even when the table is heavily loaded
+ * with claimed-but-empty slots from pool exhaustion.
+ */
+#define FTRACE_STACKMAP_MAX_PROBE	64
+
+/*
+ * Each pre-allocated element holds one unique stack trace.
+ * Fixed size: MAX_DEPTH entries regardless of actual depth.
+ */
+struct stackmap_elt {
+	u32		nr;		/* actual number of IPs */
+	atomic_t	ref_count;
+	unsigned long	ips[FTRACE_STACKMAP_MAX_DEPTH];
+};
+
+/*
+ * Hash table entry: a 32-bit key (jhash of stack) + pointer to elt.
+ * key == 0 means the slot is free.
+ */
+struct stackmap_entry {
+	u32			key;	/* 0 = free, non-zero = jhash */
+	struct stackmap_elt	*val;	/* NULL until fully published */
+};
+
+struct ftrace_stackmap {
+	struct trace_array	*tr;		/* owning trace_array */
+	unsigned int		map_bits;
+	unsigned int		map_size;	/* 1 << (map_bits + 1) */
+	unsigned int		max_elts;	/* 1 << map_bits */
+	u32			hash_seed;	/* per-instance jhash seed */
+	atomic_t		next_elt;	/* index into elts pool */
+	struct stackmap_entry	*entries;	/* hash table */
+	struct stackmap_elt	*elts;		/* flat element pool */
+	atomic_t		resetting;
+	atomic64_t		successes;	/* events served (hits + new inserts) */
+	atomic64_t		drops;
+};
+
+/*
+ * Cap the bits parameter to keep worst-case allocations bounded:
+ *   bits=18 → 256K elts, 512K slots, ~130 MB elt pool, ~130 MB bin
+ *   export.
+ * Smaller workloads should use the default (14) which gives 16K elts
+ * (~8 MB pool); bump bits via the ftrace_stackmap.bits= kernel
+ * parameter for higher unique-stack capacity.
+ */
+#define FTRACE_STACKMAP_BITS_MIN	10
+#define FTRACE_STACKMAP_BITS_MAX	18
+#define FTRACE_STACKMAP_BITS_DEFAULT	14
+
+static unsigned int stackmap_map_bits = FTRACE_STACKMAP_BITS_DEFAULT;
+static int __init stackmap_bits_setup(char *str)
+{
+	unsigned long val;
+
+	if (kstrtoul(str, 0, &val))
+		return -EINVAL;
+	val = clamp_val(val, FTRACE_STACKMAP_BITS_MIN, FTRACE_STACKMAP_BITS_MAX);
+	stackmap_map_bits = val;
+	return 0;
+}
+early_param("ftrace_stackmap.bits", stackmap_bits_setup);
+
+/* --- Element pool --- */
+
+static struct stackmap_elt *stackmap_get_elt(struct ftrace_stackmap *smap)
+{
+	int idx;
+
+	/*
+	 * Fast-path early-out once the pool is fully consumed. Avoids
+	 * the contended atomic RMW on next_elt for every traced event
+	 * after the pool is exhausted.
+	 */
+	if (atomic_read(&smap->next_elt) >= smap->max_elts)
+		return NULL;
+
+	idx = atomic_fetch_add_unless(&smap->next_elt, 1, smap->max_elts);
+	if (idx < smap->max_elts)
+		return &smap->elts[idx];
+	return NULL;
+}
+
+/* --- Create / Destroy / Reset --- */
+
+struct ftrace_stackmap *ftrace_stackmap_create(struct trace_array *tr)
+{
+	struct ftrace_stackmap *smap;
+	unsigned int bits;
+
+	smap = kzalloc(sizeof(*smap), GFP_KERNEL);
+	if (!smap)
+		return ERR_PTR(-ENOMEM);
+
+	/* Defensive clamp: reject bogus bits even if early_param is bypassed. */
+	bits = clamp_val(stackmap_map_bits,
+			 FTRACE_STACKMAP_BITS_MIN,
+			 FTRACE_STACKMAP_BITS_MAX);
+
+	smap->tr = tr;
+	smap->map_bits = bits;
+	smap->max_elts = 1U << bits;
+	smap->map_size = 1U << (bits + 1);	/* 2x over-provision */
+	BUG_ON(!is_power_of_2(smap->map_size));
+
+	smap->entries = vzalloc(sizeof(*smap->entries) * smap->map_size);
+	if (!smap->entries) {
+		kfree(smap);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	/*
+	 * Single large vmalloc of the element pool, indexed flat.
+	 * At bits=16 this is 64K * sizeof(struct stackmap_elt). The
+	 * struct is ~520 B (4 + 4 + 64*8), so total ~33 MB.
+	 */
+	smap->elts = vzalloc(sizeof(*smap->elts) * (size_t)smap->max_elts);
+	if (!smap->elts) {
+		vfree(smap->entries);
+		kfree(smap);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	smap->hash_seed = get_random_u32();
+	atomic_set(&smap->next_elt, 0);
+	atomic_set(&smap->resetting, 0);
+	atomic64_set(&smap->successes, 0);
+	atomic64_set(&smap->drops, 0);
+
+	return smap;
+}
+
+void ftrace_stackmap_destroy(struct ftrace_stackmap *smap)
+{
+	if (!smap || IS_ERR(smap))
+		return;
+	vfree(smap->elts);
+	vfree(smap->entries);
+	kfree(smap);
+}
+
+/**
+ * ftrace_stackmap_reset - clear all entries in the stackmap
+ * @smap: the stackmap to reset
+ *
+ * Returns 0 on success, -EBUSY if another reset is already in
+ * progress, or if tracing is currently active on the owning
+ * trace_array.
+ *
+ * Online reset (with tracing active) is not supported. Caller must
+ * stop tracing first (echo 0 > tracing_on).
+ *
+ * Caller is process context (typically sysfs write handler).
+ *
+ * Protocol:
+ *   1. Atomically claim reset rights via cmpxchg on @resetting.
+ *   2. Verify tracing is stopped on @smap->tr; if not, release the
+ *      claim and return -EBUSY. The resetting flag itself blocks
+ *      any subsequent get_id() callers.
+ *   3. synchronize_rcu() drains in-flight get_id() callers from the
+ *      ftrace callback path (which runs preempt-disabled).
+ *   4. memset entries, elts, and counters.
+ *   5. Release the resetting flag with release semantics so any new
+ *      get_id() observes a fully cleared map.
+ */
+int ftrace_stackmap_reset(struct ftrace_stackmap *smap)
+{
+	if (!smap)
+		return 0;
+
+	if (atomic_cmpxchg(&smap->resetting, 0, 1) != 0)
+		return -EBUSY;
+
+	if (smap->tr && tracer_tracing_is_on(smap->tr)) {
+		atomic_set(&smap->resetting, 0);
+		return -EBUSY;
+	}
+
+	/*
+	 * synchronize_rcu() itself is a full barrier; no extra smp_mb()
+	 * is needed before it. It drains in-flight ftrace callbacks that
+	 * may have already passed the resetting check with the old value.
+	 */
+	synchronize_rcu();
+
+	memset(smap->entries, 0, sizeof(*smap->entries) * smap->map_size);
+	memset(smap->elts, 0, sizeof(*smap->elts) * (size_t)smap->max_elts);
+
+	atomic_set(&smap->next_elt, 0);
+	atomic64_set(&smap->successes, 0);
+	atomic64_set(&smap->drops, 0);
+
+	/* Release resetting=0 so new get_id() observes a cleared map. */
+	atomic_set_release(&smap->resetting, 0);
+	return 0;
+}
+
+/* --- Core: get_id (lock-free, NMI-safe) --- */
+
+int ftrace_stackmap_get_id(struct ftrace_stackmap *smap,
+			   unsigned long *ips, unsigned int nr_entries)
+{
+	u32 key_hash, idx, test_key, trace_len;
+	struct stackmap_entry *entry;
+	struct stackmap_elt *val;
+	int probes = 0;
+
+	if (!smap || !nr_entries || atomic_read(&smap->resetting))
+		return -EINVAL;
+	if (nr_entries > FTRACE_STACKMAP_MAX_DEPTH)
+		nr_entries = FTRACE_STACKMAP_MAX_DEPTH;
+
+	trace_len = nr_entries * sizeof(unsigned long);
+	/*
+	 * jhash2() requires the length in u32 units and the data to be
+	 * u32-aligned. On 64-bit kernels sizeof(unsigned long)==8, so
+	 * trace_len is always a multiple of 8 (hence of 4). Use jhash2
+	 * directly; the cast to u32* is safe because ips[] is naturally
+	 * aligned to sizeof(unsigned long) >= 4.
+	 */
+	key_hash = jhash2((const u32 *)ips, trace_len / sizeof(u32),
+			  smap->hash_seed);
+	if (key_hash == 0)
+		key_hash = 1;	/* 0 means free slot */
+
+	idx = key_hash >> (32 - (smap->map_bits + 1));
+
+	while (probes < FTRACE_STACKMAP_MAX_PROBE) {
+		idx &= (smap->map_size - 1);
+		entry = &smap->entries[idx];
+		test_key = entry->key;
+
+		if (test_key == key_hash) {
+			/*
+			 * smp_load_acquire pairs with smp_store_release in
+			 * the publisher below; ensures we see fully-formed
+			 * elt fields (nr, ips, ref_count) before dereference.
+			 */
+			val = smp_load_acquire(&entry->val);
+			if (val && val->nr == nr_entries &&
+			    memcmp(val->ips, ips, trace_len) == 0) {
+				atomic_inc(&val->ref_count);
+				atomic64_inc(&smap->successes);
+				return (int)idx;
+			}
+			/*
+			 * val == NULL: another CPU is mid-insert, or this
+			 * slot is "claimed but empty" (pool exhausted).
+			 * val != NULL but mismatch: 32-bit hash collision
+			 * with a different stack. In both cases, advance.
+			 */
+		} else if (!test_key) {
+			/* Free slot: try to claim it */
+			if (cmpxchg(&entry->key, 0, key_hash) == 0) {
+				struct stackmap_elt *elt;
+
+				elt = stackmap_get_elt(smap);
+				if (!elt) {
+					/*
+					 * Pool exhausted. We claimed this
+					 * slot with cmpxchg but cannot fill
+					 * it. Leave key set so the slot
+					 * stays "claimed but empty"; future
+					 * lookups treat val==NULL as a miss
+					 * and probe past it. Cannot revert
+					 * key=0 without racing other CPUs.
+					 */
+					atomic64_inc(&smap->drops);
+					return -ENOSPC;
+				}
+
+				elt->nr = nr_entries;
+				atomic_set(&elt->ref_count, 1);
+				memcpy(elt->ips, ips, trace_len);
+
+				/*
+				 * Publish elt with release semantics so the
+				 * reader's smp_load_acquire can safely
+				 * dereference val->nr / val->ips.
+				 */
+				smp_store_release(&entry->val, elt);
+				atomic64_inc(&smap->successes);
+				return (int)idx;
+			}
+			/* cmpxchg failed; another CPU claimed this slot. */
+		}
+
+		idx++;
+		probes++;
+	}
+
+	atomic64_inc(&smap->drops);
+	return -ENOSPC;
+}
+
+/* --- Text export: /sys/kernel/debug/tracing/stack_map --- */
+
+struct stackmap_seq_private {
+	struct ftrace_stackmap *smap;
+};
+
+static void *stackmap_seq_start(struct seq_file *m, loff_t *pos)
+{
+	struct stackmap_seq_private *priv = m->private;
+	struct ftrace_stackmap *smap = priv->smap;
+	u32 i;
+
+	if (!smap)
+		return NULL;
+	for (i = *pos; i < smap->map_size; i++) {
+		if (smap->entries[i].key && READ_ONCE(smap->entries[i].val)) {
+			*pos = i;
+			return &smap->entries[i];
+		}
+	}
+	return NULL;
+}
+
+static void *stackmap_seq_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	struct stackmap_seq_private *priv = m->private;
+	struct ftrace_stackmap *smap = priv->smap;
+	u32 i;
+
+	if (!smap)
+		return NULL;
+	for (i = *pos + 1; i < smap->map_size; i++) {
+		if (smap->entries[i].key && READ_ONCE(smap->entries[i].val)) {
+			*pos = i;
+			return &smap->entries[i];
+		}
+	}
+	return NULL;
+}
+
+static void stackmap_seq_stop(struct seq_file *m, void *v) { }
+
+static int stackmap_seq_show(struct seq_file *m, void *v)
+{
+	struct stackmap_entry *entry = v;
+	struct stackmap_elt *elt = smp_load_acquire(&entry->val);
+	struct stackmap_seq_private *priv = m->private;
+	u32 idx = entry - priv->smap->entries;
+	u32 i, nr;
+
+	if (!elt)
+		return 0;
+
+	nr = READ_ONCE(elt->nr);
+	if (nr > FTRACE_STACKMAP_MAX_DEPTH)
+		nr = FTRACE_STACKMAP_MAX_DEPTH;
+
+	seq_printf(m, "stack_id %u [ref %u, depth %u]\n",
+		   idx, atomic_read(&elt->ref_count), nr);
+	for (i = 0; i < nr; i++)
+		seq_printf(m, " [%u] %pS\n", i, (void *)elt->ips[i]);
+	seq_putc(m, '\n');
+	return 0;
+}
+
+static const struct seq_operations stackmap_seq_ops = {
+	.start	= stackmap_seq_start,
+	.next	= stackmap_seq_next,
+	.stop	= stackmap_seq_stop,
+	.show	= stackmap_seq_show,
+};
+
+static int stackmap_open(struct inode *inode, struct file *file)
+{
+	struct stackmap_seq_private *priv;
+	struct seq_file *m;
+	int ret;
+
+	ret = seq_open_private(file, &stackmap_seq_ops,
+			       sizeof(struct stackmap_seq_private));
+	if (ret)
+		return ret;
+	m = file->private_data;
+	priv = m->private;
+	priv->smap = inode->i_private;
+	return 0;
+}
+
+/*
+ * Accept exactly "0" or "reset" (optionally followed by a single newline).
+ */
+static bool stackmap_write_is_reset(const char *buf, size_t n)
+{
+	if (n > 0 && buf[n - 1] == '\n')
+		n--;
+	return (n == 1 && buf[0] == '0') ||
+	       (n == 5 && memcmp(buf, "reset", 5) == 0);
+}
+
+static ssize_t stackmap_write(struct file *file, const char __user *ubuf,
+			      size_t count, loff_t *ppos)
+{
+	struct seq_file *m = file->private_data;
+	struct stackmap_seq_private *priv = m->private;
+	char buf[8];
+	size_t n = min(count, sizeof(buf) - 1);
+	int ret;
+
+	if (n == 0)
+		return -EINVAL;
+	if (copy_from_user(buf, ubuf, n))
+		return -EFAULT;
+	buf[n] = '\0';
+
+	if (!stackmap_write_is_reset(buf, n))
+		return -EINVAL;
+
+	/*
+	 * ftrace_stackmap_reset() atomically claims reset rights via
+	 * cmpxchg and returns -EBUSY if another reset is in progress
+	 * or if tracing is active.
+	 */
+	ret = ftrace_stackmap_reset(priv->smap);
+	if (ret)
+		return ret;
+	return count;
+}
+
+const struct file_operations ftrace_stackmap_fops = {
+	.open		= stackmap_open,
+	.read		= seq_read,
+	.write		= stackmap_write,
+	.llseek		= seq_lseek,
+	.release	= seq_release_private,
+};
+
+/* --- Stats --- */
+
+static int stackmap_stat_show(struct seq_file *m, void *v)
+{
+	struct ftrace_stackmap *smap = m->private;
+	u32 entries;
+	u64 successes, drops;
+
+	if (!smap) {
+		seq_puts(m, "stackmap not initialized\n");
+		return 0;
+	}
+
+	entries = atomic_read(&smap->next_elt);
+	successes = atomic64_read(&smap->successes);
+	drops = atomic64_read(&smap->drops);
+
+	seq_printf(m, "entries: %u / %u\n", entries, smap->max_elts);
+	seq_printf(m, "table_size: %u\n", smap->map_size);
+	seq_printf(m, "successes: %llu\n", successes);
+	seq_printf(m, "drops: %llu\n", drops);
+	if (successes + drops > 0)
+		seq_printf(m, "success_rate: %llu%%\n",
+			   successes * 100 / (successes + drops));
+	return 0;
+}
+
+static int stackmap_stat_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, stackmap_stat_show, inode->i_private);
+}
+
+const struct file_operations ftrace_stackmap_stat_fops = {
+	.open		= stackmap_stat_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+/* --- Binary export --- */
+
+struct stackmap_bin_snapshot {
+	/*
+	 * Use u64 (not size_t) so data[] is 8-byte aligned on both
+	 * 32-bit and 64-bit architectures. The IP array within data[]
+	 * is accessed as u64*, which would alignment-fault on strict
+	 * architectures (e.g. older ARM, SPARC) if data[] started at
+	 * a 4-byte boundary.
+	 */
+	u64	size;
+	char	data[];
+};
+
+static int stackmap_bin_open(struct inode *inode, struct file *file)
+{
+	struct ftrace_stackmap *smap = inode->i_private;
+	struct stackmap_bin_snapshot *snap;
+	struct ftrace_stackmap_bin_header *hdr;
+	size_t alloc_size, off;
+	u32 nr_entries, i, nr_stacks;
+
+	if (!smap)
+		return -ENODEV;
+
+	/*
+	 * Worst-case allocation size: every populated entry uses a
+	 * full-depth stack. The (+1) gives one slack slot in case a
+	 * concurrent insert lands between this snapshot and iteration.
+	 * The loop below performs an explicit bounds check anyway.
+	 *
+	 * At bits=16 this caps at ~33 MB. The file is mode 0440
+	 * (TRACE_MODE_READ), so only privileged users can open it.
+	 */
+	nr_entries = atomic_read(&smap->next_elt);
+	alloc_size = sizeof(*hdr) + (nr_entries + 1) *
+		     (sizeof(struct ftrace_stackmap_bin_entry) +
+		      FTRACE_STACKMAP_MAX_DEPTH * sizeof(u64));
+
+	snap = vmalloc(sizeof(*snap) + alloc_size);
+	if (!snap)
+		return -ENOMEM;
+
+	hdr = (struct ftrace_stackmap_bin_header *)snap->data;
+	hdr->magic = FTRACE_STACKMAP_BIN_MAGIC;
+	hdr->version = FTRACE_STACKMAP_BIN_VERSION;
+	hdr->reserved = 0;
+	off = sizeof(*hdr);
+	nr_stacks = 0;
+
+	for (i = 0; i < smap->map_size; i++) {
+		struct stackmap_entry *entry = &smap->entries[i];
+		struct stackmap_elt *elt;
+		struct ftrace_stackmap_bin_entry *e;
+		u64 *ips_out;
+		u32 k, nr;
+
+		if (!entry->key)
+			continue;
+		elt = smp_load_acquire(&entry->val);
+		if (!elt)
+			continue;
+
+		nr = READ_ONCE(elt->nr);
+		if (nr > FTRACE_STACKMAP_MAX_DEPTH)
+			nr = FTRACE_STACKMAP_MAX_DEPTH;
+
+		/* Bounds check: stop if we would overflow the allocation. */
+		if (off + sizeof(*e) + nr * sizeof(u64) > alloc_size)
+			break;
+
+		e = (struct ftrace_stackmap_bin_entry *)(snap->data + off);
+		e->stack_id = i;
+		e->nr = nr;
+		e->ref_count = atomic_read(&elt->ref_count);
+		e->reserved = 0;
+		off += sizeof(*e);
+
+		ips_out = (u64 *)(snap->data + off);
+		for (k = 0; k < nr; k++)
+			ips_out[k] = (u64)elt->ips[k];
+		off += nr * sizeof(u64);
+		nr_stacks++;
+	}
+
+	hdr->nr_stacks = nr_stacks;
+	snap->size = off;
+	file->private_data = snap;
+	return 0;
+}
+
+static ssize_t stackmap_bin_read(struct file *file, char __user *ubuf,
+				 size_t count, loff_t *ppos)
+{
+	struct stackmap_bin_snapshot *snap = file->private_data;
+
+	if (!snap)
+		return -EINVAL;
+	return simple_read_from_buffer(ubuf, count, ppos, snap->data, snap->size);
+}
+
+static int stackmap_bin_release(struct inode *inode, struct file *file)
+{
+	vfree(file->private_data);
+	return 0;
+}
+
+const struct file_operations ftrace_stackmap_bin_fops = {
+	.open		= stackmap_bin_open,
+	.read		= stackmap_bin_read,
+	.llseek		= default_llseek,
+	.release	= stackmap_bin_release,
+};
diff --git a/kernel/trace/trace_stackmap.h b/kernel/trace/trace_stackmap.h
new file mode 100644
index 000000000000..da51ed919e2c
--- /dev/null
+++ b/kernel/trace/trace_stackmap.h
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _TRACE_STACKMAP_H
+#define _TRACE_STACKMAP_H
+
+#include
+#include
+
+#define FTRACE_STACKMAP_MAX_DEPTH	64
+
+/* Binary export format */
+#define FTRACE_STACKMAP_BIN_MAGIC	0x464D5342	/* 'FSMB' */
+#define FTRACE_STACKMAP_BIN_VERSION	2
+
+struct ftrace_stackmap_bin_header {
+	u32 magic;
+	u32 version;
+	u32 nr_stacks;
+	u32 reserved;
+};
+
+struct ftrace_stackmap_bin_entry {
+	u32 stack_id;
+	u32 nr;
+	u32 ref_count;
+	u32 reserved;
+	/* followed by u64 ips[nr] */
+};
+
+struct trace_array;
+
+#ifdef CONFIG_FTRACE_STACKMAP
+
+struct ftrace_stackmap;
+
+struct ftrace_stackmap *ftrace_stackmap_create(struct trace_array *tr);
+void ftrace_stackmap_destroy(struct ftrace_stackmap *smap);
+int ftrace_stackmap_get_id(struct ftrace_stackmap *smap,
+			   unsigned long *ips, unsigned int nr_entries);
+int ftrace_stackmap_reset(struct ftrace_stackmap *smap);
+
+extern const struct file_operations ftrace_stackmap_fops;
+extern const struct file_operations ftrace_stackmap_stat_fops;
+extern const struct file_operations ftrace_stackmap_bin_fops;
+
+#else
+
+struct ftrace_stackmap;
+static inline struct ftrace_stackmap *ftrace_stackmap_create(struct trace_array *tr) { return NULL; }
+static inline void ftrace_stackmap_destroy(struct ftrace_stackmap *s) { }
+static inline int ftrace_stackmap_get_id(struct ftrace_stackmap *s,
+					 unsigned long *ips, unsigned int n)
+{ return -ENOSYS; }
+static inline int ftrace_stackmap_reset(struct ftrace_stackmap *s) { return 0; }
+
+#endif
+#endif /* _TRACE_STACKMAP_H */
-- 
2.34.1
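
Reader's note (not part of the patch): the stack_map_bin layout defined in trace_stackmap.h above is a 16-byte header followed by nr_stacks records, each a 16-byte entry plus nr u64 IPs, all native-endian. A userspace consumer could walk it roughly as sketched below. This is a minimal sketch under stated assumptions: parse_stackmap_bin() and the bin_header/bin_entry mirror structs are hypothetical names introduced here, and the reader is assumed to run with the same endianness as the kernel that produced the dump.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Userspace mirrors of the on-disk structs; values copied from the patch. */
#define STACKMAP_BIN_MAGIC	0x464D5342u
#define STACKMAP_BIN_VERSION	2u
#define STACKMAP_MAX_DEPTH	64u

struct bin_header { uint32_t magic, version, nr_stacks, reserved; };
struct bin_entry  { uint32_t stack_id, nr, ref_count, reserved; };

/*
 * Walk a stack_map_bin image held in memory (hypothetical helper).
 * Returns the number of stacks parsed, or -1 if the image is malformed
 * or truncated. If out is non-NULL, each stack is printed to it.
 */
static int parse_stackmap_bin(const uint8_t *buf, size_t len, FILE *out)
{
	struct bin_header hdr;
	size_t off = sizeof(hdr);
	uint32_t s, k;

	if (len < sizeof(hdr))
		return -1;
	memcpy(&hdr, buf, sizeof(hdr));	/* memcpy avoids unaligned loads */
	if (hdr.magic != STACKMAP_BIN_MAGIC ||
	    hdr.version != STACKMAP_BIN_VERSION)
		return -1;

	for (s = 0; s < hdr.nr_stacks; s++) {
		struct bin_entry e;

		/* Invariant: off <= len, so len - off cannot underflow. */
		if (len - off < sizeof(e))
			return -1;
		memcpy(&e, buf + off, sizeof(e));
		off += sizeof(e);

		/* Reject depths beyond MAX_DEPTH or past the buffer end. */
		if (e.nr > STACKMAP_MAX_DEPTH ||
		    (len - off) / sizeof(uint64_t) < e.nr)
			return -1;

		if (out)
			fprintf(out, "stack_id %u [ref %u, depth %u]\n",
				e.stack_id, e.ref_count, e.nr);
		for (k = 0; k < e.nr; k++) {
			uint64_t ip;

			memcpy(&ip, buf + off + k * sizeof(ip), sizeof(ip));
			if (out)
				fprintf(out, " [%u] 0x%llx\n", k,
					(unsigned long long)ip);
		}
		off += (size_t)e.nr * sizeof(uint64_t);
	}
	return (int)hdr.nr_stacks;
}
```

In practice the buffer would come from reading the whole stack_map_bin file into memory while tracing is stopped; the tracefs mount point, and therefore the file's path, is left to the caller. The raw IPs are not symbolized here, unlike the text stack_map node.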
b=lbzki2cyPGAcR8eMDO2RaqA/sYgFMJvwDivHP/NWsIvwXy2kSHzOq5W8ald78Kk6KeajIuItKWtYrM83OIJAQk818kGrkthMfQG/rHseiuncwIzoZf4u2DxoD1jmNRsmty9qwugI8ZDgB2d7bWn36HoeklUCPJEfyh3WMQ/W1gk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=lH6TG+dw; arc=none smtp.client-ip=209.85.215.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="lH6TG+dw" Received: by mail-pg1-f172.google.com with SMTP id 41be03b00d2f7-c798fc1a28cso3196363a12.3 for ; Fri, 22 May 2026 03:41:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1779446483; x=1780051283; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=ykLygwenxt26NzmMqgCAh8HlNsTLLzKbEYER4Sev7EY=; b=lH6TG+dwfYa6jUlAvbee3jlXY3DNAPSd7wjkQ+oAIpcjvFIhw5M4A04cUUDBlDkjjg 9KwgqNkmYw19zHROyKyb1OnDVjpPce3fNEZAXj0gO4IRexjguKNf+rEyJH8ANw+f1zuT loB/RWfLeD94ZPXDzgiUSffMRG4QagRs2/oOrpzQBsAwjRFZYFjhA5pZ29qaptrGX2FH Cmq1tAooMZPv1yJaysCFqARKhvNBiqOO7VBnw0IK3+xBJmpJ6V9mJup/Hy7BQpX04Rye ErtxWllk9YwgCAmvT010f4tSCxNF3U/o1ifsPbKJDRqpdhKAntGE1LyKpjNgW0dZ9eSk p38g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779446483; x=1780051283; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=ykLygwenxt26NzmMqgCAh8HlNsTLLzKbEYER4Sev7EY=; b=cw0VV4UOlwgKymdhAnw0IlGlzgpeX17jCzVtIKOmVsGb1R4ZY8Bk/pnMa0z01HVYok 
s2fRpDJaujRvFzXjqIJNtyVDMMCHDPo3YflqfNEyuQolCq/wJRj6+yZIZNMeJtsEDcF1 5Dkvw00NuQqa1M/Dt65ZCyQlaVASgXsYLupAo9xEKP+PWoMY6I/O5dr7/tS8CEhny8Ig JlV/77ondVfmxmSKPFJZpsC6JLov9bfySdka1Sh7UVcrzLWDMuOFlqUaLySKG3CFkj7S U3LHwvwEkpYqdxP0tj+mJCM8ePx80tT+2kvuP1uH7/hFgOznghKE6fUybO+w6CwH7fWh WbJA== X-Forwarded-Encrypted: i=1; AFNElJ9d5m6OEr7EVJA+QX2/AdmR9Byv16WhNjO1hmNR40059rC9Ysf/UX0yt8+xjZMQ6xREjz97iBsS/p26IQw=@vger.kernel.org X-Gm-Message-State: AOJu0YyTE+53Gs2GEN0sPU1NsxQwoRl4uu/16L6n59vP96HsfaSLDACg xyxYhVmrefmIpv1Po8ARZBT+FsdAa3a2wznefIqBGmPWe5CztuT5aI1f X-Gm-Gg: Acq92OF+sVyVFcS0nA7BvpO7uWCcD3uCpNkX2ndbAySjOmEskkfJNSIFW181f9JUa9+ JgB+oiSFzqFgAoGCrZmDuz8b5ssdGp1AeJgaem4lH9Pp9TjTmgL+ALVsZOA4/uJ9nUx8I/7/KlV 2WgIs8RzACFraqOzOKBU27YvpiBHRHZAAoybl8qkv/IiNlkRsqAJQRPFn+33lpnK2N3FmDcs1bn SCJw1Zf4DMGh7BmSE1b+NfXlkKwogR6JidDepXO0Hv6cT0W0MUcImgaAaTJTd9urqEBFXhhVqmH 9EBEdWc4ypC7+IkfzqZNvNWJJFYSxCkwt/27iyMPIkg9tWX9X3a1PlnmJN71lBh3HfZ3HEcAAJU Wv903bua4l/cRMHlGtG4GlsghLE5pHeEmiKYWwXbUw8+pBEaexTNxvNB1aaKIuCKHBbUbvts5dg Bn+gkpcyI1ByOwVZRHtbrPkCoV4czaTJf9Z/Vrk0IzrLsYL9dV X-Received: by 2002:a17:902:e745:b0:2bd:158d:299a with SMTP id d9443c01a7336-2beb05cead8mr32987275ad.25.1779446483146; Fri, 22 May 2026 03:41:23 -0700 (PDT) Received: from localhost.localdomain ([2408:8607:1b00:8:9a21:4fdc:c1a5:7a8e]) by smtp.gmail.com with ESMTPSA id 41be03b00d2f7-c85200c3feesm1282942a12.0.2026.05.22.03.41.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 22 May 2026 03:41:22 -0700 (PDT) From: Li Pengfei X-Google-Original-From: Li Pengfei To: linux-trace-kernel@vger.kernel.org Cc: rostedt@goodmis.org, mhiramat@kernel.org, linux-kernel@vger.kernel.org, cmllamas@google.com, zhangbo56@xiaomi.com, lipengfei28@xiaomi.com, lkp@intel.com Subject: [PATCH v2 2/3] trace: integrate stackmap into ftrace stack recording path Date: Fri, 22 May 2026 18:40:16 +0800 Message-Id: <20260522104017.1668638-3-lipengfei28@xiaomi.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: 
<20260522104017.1668638-1-lipengfei28@xiaomi.com> References: <20260514034916.2162517-1-lipengfei28@xiaomi.com> <20260522104017.1668638-1-lipengfei28@xiaomi.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable From: Pengfei Li Add TRACE_STACK_ID event type and integrate ftrace_stackmap into __ftrace_trace_stack(). When the 'stackmap' trace option is enabled, the stack recording path stores a 4-byte stack_id in the ring buffer instead of the full stack trace. Changes: - New TRACE_STACK_ID in trace_type enum - New stack_id_entry in trace_entries.h - New TRACE_ITER(STACKMAP) trace option flag; when CONFIG_FTRACE_STACKMAP is disabled, TRACE_ITER_STACKMAP_BIT is defined as -1 so that TRACE_ITER(STACKMAP) evaluates to 0 (following the existing pattern used by TRACE_ITER_PROF_TEXT_OFFSET) - Modified __ftrace_trace_stack() to call ftrace_stackmap_get_id() when the stackmap option is active - Stackmap pointer read with smp_load_acquire(), published with smp_store_release() to ensure proper initialization ordering - NULL check on tr->stackmap prevents dereference if creation failed or if used on a secondary trace instance (graceful fallback) - ftrace_stackmap_create() takes the owning trace_array so the stackmap can later check tracing state during reset - Added stack_id print handler in trace_output.c Fallback behavior: if stackmap returns an error (pool exhausted, resetting, or NULL pointer), the full stack trace is recorded as before =E2=80=94 no new failure modes introduced. Note: stackmap is currently initialized only for the global trace instance. Secondary instances fall back to full stack recording. 
Usage: echo 1 > /sys/kernel/debug/tracing/options/stackmap echo 1 > /sys/kernel/debug/tracing/options/stacktrace Signed-off-by: Pengfei Li --- kernel/trace/trace.c | 66 ++++++++++++++++++++++++++++++++++++ kernel/trace/trace.h | 16 +++++++++ kernel/trace/trace_entries.h | 15 ++++++++ kernel/trace/trace_output.c | 23 +++++++++++++ 4 files changed, 120 insertions(+) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 6eb4d3097a4d..49a675dffad5 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -57,6 +57,7 @@ =20 #include "trace.h" #include "trace_output.h" +#include "trace_stackmap.h" =20 #ifdef CONFIG_FTRACE_STARTUP_TEST /* @@ -2184,6 +2185,43 @@ void __ftrace_trace_stack(struct trace_array *tr, } #endif =20 +#ifdef CONFIG_FTRACE_STACKMAP + /* + * If stackmap dedup is enabled, try to store only the stack_id + * in the ring buffer instead of the full stack trace. + */ + if (tr->trace_flags & TRACE_ITER(STACKMAP)) { + struct ftrace_stackmap *smap; + struct stack_id_entry *sid_entry; + int sid; + + smap =3D smp_load_acquire(&tr->stackmap); + if (!smap) + goto full_stack; + + sid =3D ftrace_stackmap_get_id(smap, fstack->calls, nr_entries); + if (sid >=3D 0) { + event =3D __trace_buffer_lock_reserve(buffer, + TRACE_STACK_ID, + sizeof(*sid_entry), trace_ctx); + if (!event) + goto out; + sid_entry =3D ring_buffer_event_data(event); + sid_entry->stack_id =3D sid; + /* + * stack_id is a synthetic side-event attached to a + * primary trace event that was already subject to + * filtering. No per-event filter is defined for + * TRACE_STACK_ID, so commit unconditionally. 
+ */ + __buffer_unlock_commit(buffer, event); + goto out; + } + /* Fall through to full stack on stackmap failure */ + } +full_stack: +#endif + event =3D __trace_buffer_lock_reserve(buffer, TRACE_STACK, struct_size(entry, caller, nr_entries), trace_ctx); @@ -9222,6 +9260,34 @@ static __init void tracer_init_tracefs_work_func(str= uct work_struct *work) NULL, &tracing_dyn_info_fops); #endif =20 +#ifdef CONFIG_FTRACE_STACKMAP + { + struct ftrace_stackmap *smap; + + smap =3D ftrace_stackmap_create(&global_trace); + if (!IS_ERR(smap)) { + /* + * Use smp_store_release to ensure the stackmap + * structure is fully initialized before publishing + * the pointer to concurrent trace event readers. + */ + smp_store_release(&global_trace.stackmap, smap); + trace_create_file("stack_map", TRACE_MODE_WRITE, NULL, + smap, &ftrace_stackmap_fops); + trace_create_file("stack_map_stat", TRACE_MODE_READ, NULL, + smap, &ftrace_stackmap_stat_fops); + trace_create_file("stack_map_bin", TRACE_MODE_READ, NULL, + smap, &ftrace_stackmap_bin_fops); + } else { + pr_warn("ftrace stackmap init failed, dedup disabled\n"); + /* + * global_trace.stackmap is already NULL from kzalloc; + * leaving it NULL ensures the load-acquire in + * __ftrace_trace_stack falls back to full stack. + */ + } + } +#endif create_trace_instances(NULL); =20 update_tracer_options(); diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 80fe152af1dd..7e7d5e5a35ff 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -57,6 +57,7 @@ enum trace_type { TRACE_TIMERLAT, TRACE_RAW_DATA, TRACE_FUNC_REPEATS, + TRACE_STACK_ID, =20 __TRACE_LAST_TYPE, }; @@ -453,6 +454,9 @@ struct trace_array { struct cond_snapshot *cond_snapshot; #endif struct trace_func_repeats __percpu *last_func_repeats; +#ifdef CONFIG_FTRACE_STACKMAP + struct ftrace_stackmap *stackmap; +#endif /* * On boot up, the ring buffer is set to the minimum size, so that * we do not waste memory on systems that are not using tracing. 
@@ -579,6 +583,8 @@ extern void __ftrace_bad_type(void); TRACE_GRAPH_RET); \ IF_ASSIGN(var, ent, struct func_repeats_entry, \ TRACE_FUNC_REPEATS); \ + IF_ASSIGN(var, ent, struct stack_id_entry, \ + TRACE_STACK_ID); \ __ftrace_bad_type(); \ } while (0) =20 @@ -1449,7 +1455,16 @@ extern int trace_get_user(struct trace_parser *parse= r, const char __user *ubuf, # define STACK_FLAGS #endif =20 +#ifdef CONFIG_FTRACE_STACKMAP +# define STACKMAP_FLAGS \ + C(STACKMAP, "stackmap"), +#else +# define STACKMAP_FLAGS +# define TRACE_ITER_STACKMAP_BIT -1 +#endif + #ifdef CONFIG_FUNCTION_PROFILER + # define PROFILER_FLAGS \ C(PROF_TEXT_OFFSET, "prof-text-offset"), # ifdef CONFIG_FUNCTION_GRAPH_TRACER @@ -1506,6 +1521,7 @@ extern int trace_get_user(struct trace_parser *parser= , const char __user *ubuf, FUNCTION_FLAGS \ FGRAPH_FLAGS \ STACK_FLAGS \ + STACKMAP_FLAGS \ BRANCH_FLAGS \ PROFILER_FLAGS \ FPROFILE_FLAGS diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h index 54417468fdeb..89ed14b7e5fd 100644 --- a/kernel/trace/trace_entries.h +++ b/kernel/trace/trace_entries.h @@ -250,6 +250,21 @@ FTRACE_ENTRY(user_stack, userstack_entry, (void *)__entry->caller[6], (void *)__entry->caller[7]) ); =20 +/* + * Stack ID entry - stores only a stack_id referencing the stackmap. + * Used when CONFIG_FTRACE_STACKMAP is enabled to deduplicate stacks. 
+ */ +FTRACE_ENTRY(stack_id, stack_id_entry, + + TRACE_STACK_ID, + + F_STRUCT( + __field( int, stack_id ) + ), + + F_printk("", __entry->stack_id) +); + /* * trace_printk entry: */ diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c index a5ad76175d10..68678ea88159 100644 --- a/kernel/trace/trace_output.c +++ b/kernel/trace/trace_output.c @@ -1517,6 +1517,28 @@ static struct trace_event trace_user_stack_event =3D= { .funcs =3D &trace_user_stack_funcs, }; =20 +/* TRACE_STACK_ID */ +static enum print_line_t trace_stack_id_print(struct trace_iterator *iter, + int flags, struct trace_event *event) +{ + struct stack_id_entry *field; + struct trace_seq *s =3D &iter->seq; + + trace_assign_type(field, iter->ent); + trace_seq_printf(s, "\n", field->stack_id); + + return trace_handle_return(s); +} + +static struct trace_event_functions trace_stack_id_funcs =3D { + .trace =3D trace_stack_id_print, +}; + +static struct trace_event trace_stack_id_event =3D { + .type =3D TRACE_STACK_ID, + .funcs =3D &trace_stack_id_funcs, +}; + /* TRACE_HWLAT */ static enum print_line_t trace_hwlat_print(struct trace_iterator *iter, int flags, @@ -1908,6 +1930,7 @@ static struct trace_event *events[] __initdata =3D { &trace_wake_event, &trace_stack_event, &trace_user_stack_event, + &trace_stack_id_event, &trace_bputs_event, &trace_bprint_event, &trace_print_event, --=20 2.34.1 From nobody Sun May 24 18:42:57 2026 Received: from mail-pg1-f172.google.com (mail-pg1-f172.google.com [209.85.215.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0599D3CD8CA for ; Fri, 22 May 2026 10:41:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779446493; cv=none; 
b=Sb0nuS7WWteoUu+vlFx/d8cKuPbwxPUtPwUdYGZrr8WspTsgl2oRR9WmvGsA7vklhnNYISydpoXj/YPovP/RC+s0fNEzAaHHXHt6gbdk0G6HAa59B4g/LuqvRBZ8cqn8hQ8/+FqDSWhRyTioW98UNUOSfcGwpjmK5+ytBs5khV8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779446493; c=relaxed/simple; bh=YnhARjoR2achW0qVPc//wJnFnjplveFY3wti7KhnGJQ=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version:Content-Type; b=d2ciauJ/SXGBZxgcWr1DnRVIACjJTGNF+2nWQdF5WibCrH01Yl7L1NkfmdAQZUtIAVkT5kYVHeUdC41qkVMWKAvVvIOCnYUue+MCo1szzLtHEChbgdOmWoF5bJUQBnPjx+812AFOBD1muK5YkFV0mGd2kVkL8ieiEZplAkOEWyI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=LOjmmWOD; arc=none smtp.client-ip=209.85.215.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="LOjmmWOD" Received: by mail-pg1-f172.google.com with SMTP id 41be03b00d2f7-c80167f5716so3223750a12.2 for ; Fri, 22 May 2026 03:41:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1779446490; x=1780051290; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=WB8E6xUBrMkQP8ebx8+gCpC7rvCJlL/tAjTiX9u35Zc=; b=LOjmmWOD6nc/LS6C9M6fsmlndL/i++9JMbE4kUAD+lTaydkTvbwe6VHjgrpI3UJtaT ph/Zbz+q6OzrkUoMR2CgpOwEWXECu/+Bxq+6UBVROQ3Lm/a8kgYsTxUHVw70sB8njAh0 YEqQ3ixP/Fk3S/hb/FpU8nTd8imB20ob/SmIrDSU2d7xWPJuy533qhvRPO2c8H5/C9qj gv0rMrmrzEpW5i+vBnuTHn+nTuntxIHraJ9TVfdlfoQBGhDznLrtQnqh9ajIqfZG2dcD ZJnhPrDLCBAfDuEPhJBzqjIG9cG3OjUTeY4YSJscNAT+nZ0g5+IUhi0C72o44fts6rFu MMFQ== 
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779446490; x=1780051290; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=WB8E6xUBrMkQP8ebx8+gCpC7rvCJlL/tAjTiX9u35Zc=; b=fZMmI8f3waZSVze61I7ZRBAnrvkuOJAhg1e+Etlu/RPro6yEMh5PiMMe0ap+YM5a36 64Z/i7Jv1T7MlQ6gxXTZ42TDgDV683T6xiX95Kq661nXa4JG8hlmXllgY3pEcwxNOn9H ciiJCEbN6TT2JeEH6+AVBu20QjCY47MauKxTLSxKTDI4Gk1JcY+j0IjJW1LaLmTqgzwc QKwxq9xM81MAHyj8quTTSLwU9vb46dLYyao6FC+vuASkrDKI8SwVTtaNAg1UKaRl9fYa HGivR33tENZoN7sC+SoltBW6wy6QeQDfnTuac20BhWEZwGkHqHIx5Ll4T/zTNlFWsHb/ YfdQ== X-Forwarded-Encrypted: i=1; AFNElJ9Su5W1p75N1iXHExxIUDErWt9tQB+2a1Pz8hWSoVM/SMBOH50ZSOkaSRnP4nn/4+1k3NHe4g7SAGwFJCU=@vger.kernel.org X-Gm-Message-State: AOJu0YyxbgvjmaSA2yEMrxXmT8q4xAyqRmcql03IrvFmJ1Ffv7uJ658V VyC9cemh4g87EUn1Ej+/qXcSh8w79GHyFARmKMpmIAKvKDUt+nU9Oi0e X-Gm-Gg: Acq92OEce7R2QceIy58iU5YhHZm0J1OWqHdP9zozPSDh157+6YjdrY/+eiRs72ltd6O sVDPTY71JFNiOtQvE2JcqayXr9o/XGKVg7EdpzPrXmUhnLhKybUSokRA5nlpek0v0DefTHAPgOi 9VKCpObtI+f2nJQq1TWrtHpNoFRkGgVLkXrS0suxpjNDxhP0xxZOMvESeOynyrDmN+kzygeU/SM rtddT5qb0hiejxt+DRIfiqVLyV9TMqnU0YrUvxp+4enx/CU1dL2mizpo+Z674SSL0IyoOmaOldo ekCoWn2yobL39EuAjMt9HbBMccpCHstACHGm5RhaWC3KNYqhYhJY0jlf5jcGiMV2uqp33dsuuet 9cm5pLOmwh33QaVze0oXnH4JROEw+t+vNoFG4kQDbX0j4J91JYwmLNqn1bjBmaccWVWyFUI7uOy xCA9S5eF0ZYNlW47Qcq4ApK86a625UQ+5KaZXnwA== X-Received: by 2002:a05:6a20:7350:b0:398:6bb5:54c4 with SMTP id adf61e73a8af0-3b328c7077amr3337167637.5.1779446489981; Fri, 22 May 2026 03:41:29 -0700 (PDT) Received: from localhost.localdomain ([2408:8607:1b00:8:9a21:4fdc:c1a5:7a8e]) by smtp.gmail.com with ESMTPSA id 41be03b00d2f7-c85200c3feesm1282942a12.0.2026.05.22.03.41.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 22 May 2026 03:41:29 -0700 (PDT) From: Li Pengfei X-Google-Original-From: Li Pengfei To: linux-trace-kernel@vger.kernel.org Cc: 
rostedt@goodmis.org, mhiramat@kernel.org, linux-kernel@vger.kernel.org, cmllamas@google.com, zhangbo56@xiaomi.com, lipengfei28@xiaomi.com, lkp@intel.com Subject: [PATCH v2 3/3] trace: add documentation, selftest and tooling for stackmap Date: Fri, 22 May 2026 18:40:17 +0800 Message-Id: <20260522104017.1668638-4-lipengfei28@xiaomi.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260522104017.1668638-1-lipengfei28@xiaomi.com> References: <20260514034916.2162517-1-lipengfei28@xiaomi.com> <20260522104017.1668638-1-lipengfei28@xiaomi.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable From: Pengfei Li Add supporting files for the ftrace stackmap feature: Documentation/trace/ftrace-stackmap.rst: Documentation covering design, usage, tracefs interface, binary format, and performance characteristics. Added to the 'Core Tracing Frameworks' toctree in Documentation/trace/index.rst. Documents: - Reset requires tracing to be stopped first - Boot-time activation via trace_options=3Dstackmap - bits parameter range [10, 18] and worst-case memory usage - tracefs file modes (0640 / 0440) - Best-effort snapshot semantics for stack_map_bin - Counter naming: successes (events served), drops, success_rate tools/testing/selftests/ftrace/test.d/ftrace/stackmap-basic.tc: Functional selftest verifying: - stackmap tracefs nodes exist - enabling stackmap + stacktrace produces stack_id events - stack_map_stat shows non-zero successes and zero drops - reset clears entries when tracing is stopped - reset is rejected (-EBUSY) while tracing is active Uses an EXIT trap to restore options/stackmap and options/stacktrace on any exit path. tools/tracing/stackmap_dump.py: Python script to parse the binary stack_map_bin export. 
Features: - Automatic endianness detection via magic number - Batched addr2line via stdin (avoids ARG_MAX with large stacks) - JSON output mode - Top-N filtering by ref_count Binary format: all fields are native-endian. The parser detects byte order by reading the magic value (0x464D5342 =3D 'FSMB'). Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202605160010.fakzGVVq-lkp@int= el.com/ Signed-off-by: Pengfei Li --- Documentation/trace/ftrace-stackmap.rst | 145 +++++++++++++++++ Documentation/trace/index.rst | 1 + .../ftrace/test.d/ftrace/stackmap-basic.tc | 100 ++++++++++++ tools/tracing/stackmap_dump.py | 150 ++++++++++++++++++ 4 files changed, 396 insertions(+) create mode 100644 Documentation/trace/ftrace-stackmap.rst create mode 100755 tools/testing/selftests/ftrace/test.d/ftrace/stackmap-b= asic.tc create mode 100755 tools/tracing/stackmap_dump.py diff --git a/Documentation/trace/ftrace-stackmap.rst b/Documentation/trace/= ftrace-stackmap.rst new file mode 100644 index 000000000000..1230d44d1d23 --- /dev/null +++ b/Documentation/trace/ftrace-stackmap.rst @@ -0,0 +1,145 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D +Ftrace Stack Map +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +:Author: Pengfei Li + +Overview +=3D=3D=3D=3D=3D=3D=3D=3D + +The ftrace stack map provides stack trace deduplication for the ftrace +ring buffer. When enabled, instead of storing full kernel stack traces +(typically 80-160 bytes each) in the ring buffer for every event, ftrace +stores only a 4-byte ``stack_id``. The full stacks are maintained in a +separate hash table and exported via tracefs for userspace to resolve. + +This is inspired by eBPF's ``BPF_MAP_TYPE_STACK_TRACE`` but integrated +into ftrace's infrastructure, requiring no userspace daemon. + +Configuration +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +Enable ``CONFIG_FTRACE_STACKMAP=3Dy`` in the kernel config. 
+ +Kernel command line parameters: + +- ``ftrace_stackmap.bits=3DN`` - Set map capacity to 2^N unique stacks + (default: 14 =E2=86=92 16384 stacks; valid range: 10-18). + + At ``bits=3D18`` the kernel reserves roughly 130 MB of vmalloc memory + for the element pool. Each ``open()`` of ``stack_map_bin`` may + briefly allocate a similar amount for a snapshot. The cap is set + intentionally to bound memory usage. + +Usage +=3D=3D=3D=3D=3D + +Enable stack deduplication:: + + echo 1 > /sys/kernel/debug/tracing/options/stackmap + echo 1 > /sys/kernel/debug/tracing/options/stacktrace + echo function > /sys/kernel/debug/tracing/current_tracer + +The trace output will show ```` instead of full stack traces:: + + sh-1234 [006] d.h.. 123.456789: + +To view the actual stacks:: + + cat /sys/kernel/debug/tracing/stack_map + +Output format:: + + stack_id 42 [ref 1337, depth 8] + [0] schedule+0x48/0xc0 + [1] schedule_timeout+0x1c/0x30 + ... + +To view statistics:: + + cat /sys/kernel/debug/tracing/stack_map_stat + +Output:: + + entries: 2500 / 16384 + table_size: 32768 + successes: 148923 + drops: 0 + success_rate: 100% + +To reset the stack map (tracing must be stopped first):: + + echo 0 > /sys/kernel/debug/tracing/tracing_on + echo 0 > /sys/kernel/debug/tracing/stack_map + +Reset returns ``-EBUSY`` if tracing is currently active, or if another +reset is already in progress. + +Boot-time activation +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +The stackmap option can be enabled from the kernel command line:: + + trace_options=3Dstackmap,stacktrace + +Trace events that fire before the tracefs filesystem is initialized +(``fs_initcall`` time) fall back to recording full stack traces; once +``ftrace_stackmap_create()`` runs, subsequent events are deduplicated. +The crossover is automatic and lossless =E2=80=94 no events are dropped, b= ut +early-boot stacks recorded before the crossover are not deduplicated. 
+ +Tracefs Nodes +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +The stack_map files are owned by root and not world-readable +(``stack_map``: 0640; ``stack_map_stat`` and ``stack_map_bin``: 0440). + +``stack_map`` + Text export of all deduplicated stacks with symbol resolution. + Writing ``0`` or ``reset`` clears all entries (only when tracing + is stopped). + +``stack_map_stat`` + Statistics: entry count, successes, drops, and success rate. + +``stack_map_bin`` + Binary export for efficient userspace consumption. Format: + + - Header (16 bytes): magic(u32) + version(u32) + nr_stacks(u32) + rese= rved(u32) + - Per stack: stack_id(u32) + nr(u32) + ref_count(u32) + reserved(u32) = + ips(u64 =C3=97 nr) + + All fields are written in the kernel's native byte order. + Userspace tools detect endianness by reading the magic value. + Magic: ``0x464D5342`` ('FSMB'), Version: 2. + + The export is a best-effort snapshot allocated at ``open()``; + concurrent inserts during the snapshot may be truncated. A + bounds check ensures no overflow. + +Design +=3D=3D=3D=3D=3D=3D + +The stack map is modeled after ``tracing_map.c`` (used by hist triggers), +using a lock-free design based on Dr. 
Cliff Click's non-blocking hash table +algorithm: + +- **Lookup/Insert**: Lock-free via ``cmpxchg``, safe in NMI/IRQ/any context +- **Memory**: Pre-allocated element pool, zero allocation on the hot path + (no GFP_ATOMIC failures under memory pressure) +- **Collision**: Linear probing with a 2x over-provisioned table; probe + length is bounded so worst-case insert/lookup is O(1) +- **Scope**: Currently supports the global trace instance +- **Hash**: 32-bit jhash with a per-instance random seed; full ``memcmp`` + confirms matches + +Performance +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +Typical results on ARM64 Android device (function tracer, 2 seconds): + +- Unique stacks: ~3000 +- Hit rate: 84-98% (depends on workload diversity) +- Ring buffer savings: ~80% for stack data +- Overhead per event: ~50ns (one jhash + hash table lookup) diff --git a/Documentation/trace/index.rst b/Documentation/trace/index.rst index 5d9bf4694d5d..ac8b1141c23a 100644 --- a/Documentation/trace/index.rst +++ b/Documentation/trace/index.rst @@ -33,6 +33,7 @@ the Linux kernel. ftrace ftrace-design ftrace-uses + ftrace-stackmap kprobes kprobetrace fprobetrace diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/stackmap-basic.tc= b/tools/testing/selftests/ftrace/test.d/ftrace/stackmap-basic.tc new file mode 100755 index 000000000000..34e4e31ff7a1 --- /dev/null +++ b/tools/testing/selftests/ftrace/test.d/ftrace/stackmap-basic.tc @@ -0,0 +1,100 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 +# description: ftrace - stackmap basic functionality +# requires: stack_map options/stackmap + +# Test that ftrace stackmap deduplication works: +# 1. Enable stackmap + stacktrace options +# 2. Run function tracer briefly +# 3. Verify stack_map has entries +# 4. Verify stack_map_stat shows successes and zero drops +# 5. Verify trace contains events +# 6. Verify reset works when tracing is stopped +# 7. 
Verify reset is rejected (-EBUSY) while tracing is active + +fail() { + echo "FAIL: $1" + exit_fail +} + +# Restore state on any exit (success, fail, or interrupt) so a +# half-finished test does not leave stacktrace/stackmap enabled. +cleanup() { + disable_tracing 2>/dev/null + echo nop > current_tracer 2>/dev/null + echo 0 > options/stackmap 2>/dev/null + echo 0 > options/stacktrace 2>/dev/null +} +trap cleanup EXIT + +disable_tracing +clear_trace + +# Verify stackmap files exist +test -f stack_map || fail "stack_map file missing" +test -f stack_map_stat || fail "stack_map_stat file missing" +test -f stack_map_bin || fail "stack_map_bin file missing" + +# Enable stackmap dedup +echo 1 > options/stackmap +echo 1 > options/stacktrace + +# Run function tracer briefly +echo function > current_tracer +enable_tracing +sleep 1 +disable_tracing +echo nop > current_tracer +echo 0 > options/stackmap + +# Check stack_map_stat has entries (default empty to avoid [: too many arg= s) +entries=3D$(cat stack_map_stat | grep "^entries:" | awk '{print $2}') +: "${entries:=3D0}" +if [ "$entries" -eq 0 ]; then + fail "stackmap has zero entries after tracing" +fi + +# Check successes > 0 +successes=3D$(cat stack_map_stat | grep "^successes:" | awk '{print $2}') +: "${successes:=3D0}" +if [ "$successes" -eq 0 ]; then + fail "stackmap has zero successes" +fi + +# Check drops =3D=3D 0 (pool should be large enough for 1s trace) +drops=3D$(cat stack_map_stat | grep "^drops:" | awk '{print $2}') +: "${drops:=3D0}" +if [ "$drops" -ne 0 ]; then + fail "stackmap had $drops drops (pool exhausted?)" +fi + +# Check stack_map text output is parseable +first_id=3D$(cat stack_map | grep "^stack_id" | head -1 | awk '{print $2}') +if [ -z "$first_id" ]; then + fail "stack_map output has no stack_id entries" +fi + +# Check trace has stack_id events +count=3D$(grep -c "stack_id" trace || true) +if [ "$count" -eq 0 ]; then + fail "trace has no events" +fi + +# Test reset (tracing must be stopped 
=E2=80=94 disable_tracing was called= above) +echo 0 > stack_map +entries_after=3D$(cat stack_map_stat | grep "^entries:" | awk '{print $2}') +: "${entries_after:=3D-1}" +if [ "$entries_after" -ne 0 ]; then + fail "stackmap reset did not clear entries (got $entries_after)" +fi + +# Test that reset is rejected while tracing is active +enable_tracing +if echo 0 > stack_map 2>/dev/null; then + disable_tracing + fail "stackmap reset should fail while tracing is active" +fi +disable_tracing + +echo "stackmap basic test passed: $entries unique stacks, $successes succe= sses, $drops drops" +exit 0 diff --git a/tools/tracing/stackmap_dump.py b/tools/tracing/stackmap_dump.py new file mode 100755 index 000000000000..fc5d0c9cf0af --- /dev/null +++ b/tools/tracing/stackmap_dump.py @@ -0,0 +1,150 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: GPL-2.0 +""" +stackmap_dump.py - Parse and display ftrace stack_map_bin binary export. + +Usage: + # Pull from device and parse + adb pull /sys/kernel/debug/tracing/stack_map_bin /tmp/stack_map.bin + python3 stackmap_dump.py /tmp/stack_map.bin + + # With vmlinux for offline symbol resolution + python3 stackmap_dump.py /tmp/stack_map.bin --vmlinux vmlinux + + # JSON output for tooling + python3 stackmap_dump.py /tmp/stack_map.bin --json +""" + +import struct +import sys +import argparse +import json +import subprocess + +MAGIC =3D 0x464D5342 # 'FSMB' +HEADER_SIZE =3D 16 # 4 x u32 +ENTRY_SIZE =3D 16 # 4 x u32 + + +def detect_endianness(data): + """Detect byte order from magic number in header.""" + if len(data) < 4: + raise ValueError("File too small") + magic_le =3D struct.unpack_from('<I', data, 0)[0] + if magic_le =3D=3D MAGIC: + return '<' + magic_be =3D struct.unpack_from('>I', data, 0)[0] + if magic_be =3D=3D MAGIC: + return '>' + raise ValueError(f"Bad magic: 0x{magic_le:08x} (neither LE nor BE)") + + +def batch_addr2line(vmlinux, addrs): + """Resolve multiple addresses in one addr2line invocation.""" + if not addrs: + return {} + try: + # Feed addresses on stdin to avoid ARG_MAX limits with large + # numbers of 
addresses (one stack can have 30+ frames; a + # snapshot can have thousands of unique stacks). + stdin =3D '\n'.join(hex(a) for a in addrs) + '\n' + result =3D subprocess.run( + ['addr2line', '-f', '-e', vmlinux], + input=3Dstdin, capture_output=3DTrue, text=3DTrue, timeout=3D60 + ) + lines =3D result.stdout.split('\n') + # addr2line outputs 2 lines per address: function name + source lo= cation + symbols =3D {} + for i, addr in enumerate(addrs): + idx =3D i * 2 + if idx < len(lines) and lines[idx] and lines[idx] !=3D '??': + symbols[addr] =3D lines[idx] + return symbols + except (subprocess.TimeoutExpired, FileNotFoundError) as e: + print(f"warning: addr2line failed: {e}", file=3Dsys.stderr) + return {} + + +def parse_stackmap_bin(data): + """Parse binary stackmap data, yield (stack_id, ref_count, [ips]).""" + if len(data) < HEADER_SIZE: + raise ValueError("File too small for header") + + endian =3D detect_endianness(data) + header_fmt =3D f'{endian}IIII' + entry_fmt =3D f'{endian}IIII' + + magic, version, nr_stacks, _ =3D struct.unpack_from(header_fmt, data, = 0) + if version not in (1, 2): + raise ValueError(f"Unsupported version: {version}") + + offset =3D HEADER_SIZE + for _ in range(nr_stacks): + if offset + ENTRY_SIZE > len(data): + break + stack_id, nr, ref_count, _ =3D struct.unpack_from(entry_fmt, data,= offset) + offset +=3D ENTRY_SIZE + + ips_size =3D nr * 8 + if offset + ips_size > len(data): + break + ips =3D struct.unpack_from(f'{endian}{nr}Q', data, offset) + offset +=3D ips_size + + yield stack_id, ref_count, list(ips) + + +def main(): + parser =3D argparse.ArgumentParser(description=3D'Parse ftrace stack_m= ap_bin') + parser.add_argument('file', help=3D'Path to stack_map_bin file') + parser.add_argument('--vmlinux', help=3D'Path to vmlinux for symbol re= solution') + parser.add_argument('--json', action=3D'store_true', help=3D'JSON outp= ut') + parser.add_argument('--top', type=3Dint, default=3D0, + help=3D'Show only top N stacks by ref_count') + 
args =3D parser.parse_args() + + with open(args.file, 'rb') as f: + data =3D f.read() + + stacks =3D list(parse_stackmap_bin(data)) + + if args.top > 0: + stacks.sort(key=3Dlambda x: x[1], reverse=3DTrue) + stacks =3D stacks[:args.top] + + # Batch symbol resolution + symbols =3D {} + if args.vmlinux: + all_addrs =3D set() + for _, _, ips in stacks: + all_addrs.update(ips) + symbols =3D batch_addr2line(args.vmlinux, list(all_addrs)) + + if args.json: + output =3D [] + for stack_id, ref_count, ips in stacks: + entry =3D { + 'stack_id': stack_id, + 'ref_count': ref_count, + 'ips': [f'0x{ip:x}' for ip in ips] + } + if args.vmlinux: + entry['symbols'] =3D [symbols.get(ip, f'0x{ip:x}') + for ip in ips] + output.append(entry) + print(json.dumps(output, indent=3D2)) + else: + for stack_id, ref_count, ips in stacks: + print(f"stack_id {stack_id} [ref {ref_count}, depth {len(ips)}= ]") + for i, ip in enumerate(ips): + sym =3D symbols.get(ip, '') + if sym: + sym =3D f' {sym}' + print(f" [{i}] 0x{ip:x}{sym}") + print() + + print(f"Total: {len(stacks)} unique stacks", file=3Dsys.stderr) + + +if __name__ =3D=3D '__main__': + main() --=20 2.34.1
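
For readers who want to check the stack_map_bin layout described in this series without a running kernel, the following is a minimal round-trip sketch. It assumes only what the header and documentation above state: a 16-byte header (magic, version, nr_stacks, reserved), then per stack a 16-byte entry (stack_id, nr, ref_count, reserved) followed by nr u64 addresses, all in native byte order. The names build_blob and parse_blob are hypothetical helpers for illustration and are not part of the patch.

```python
import struct

MAGIC = 0x464D5342   # 'FSMB', FTRACE_STACKMAP_BIN_MAGIC in trace_stackmap.h
VERSION = 2          # FTRACE_STACKMAP_BIN_VERSION
HEADER_SIZE = 16     # 4 x u32
ENTRY_SIZE = 16      # 4 x u32


def build_blob(stacks, endian='<'):
    """Pack (stack_id, ref_count, ips) tuples into the documented layout.

    Hypothetical test helper: it imitates what the kernel writes so the
    parser can be exercised on synthetic data.
    """
    out = struct.pack(f'{endian}IIII', MAGIC, VERSION, len(stacks), 0)
    for stack_id, ref_count, ips in stacks:
        out += struct.pack(f'{endian}IIII', stack_id, len(ips), ref_count, 0)
        out += struct.pack(f'{endian}{len(ips)}Q', *ips)
    return out


def detect_endianness(data):
    """Return '<' or '>' depending on which byte order yields the magic."""
    for endian in ('<', '>'):
        if struct.unpack_from(f'{endian}I', data, 0)[0] == MAGIC:
            return endian
    raise ValueError('bad magic')


def parse_blob(data):
    """Return (version, [(stack_id, ref_count, ips), ...]).

    Stops at the first entry that does not fit in the buffer, since the
    export is described as a best-effort snapshot.
    """
    endian = detect_endianness(data)
    _magic, version, nr_stacks, _rsvd = struct.unpack_from(
        f'{endian}IIII', data, 0)
    off = HEADER_SIZE
    stacks = []
    for _ in range(nr_stacks):
        if off + ENTRY_SIZE > len(data):
            break
        stack_id, nr, ref_count, _rsvd = struct.unpack_from(
            f'{endian}IIII', data, off)
        off += ENTRY_SIZE
        if off + nr * 8 > len(data):
            break
        ips = list(struct.unpack_from(f'{endian}{nr}Q', data, off))
        off += nr * 8
        stacks.append((stack_id, ref_count, ips))
    return version, stacks


if __name__ == '__main__':
    sample = [(0, 5, [0xffffffc010001000, 0xffffffc010002000]),
              (7, 1, [0x1234])]
    for endian in ('<', '>'):
        print(endian, parse_blob(build_blob(sample, endian)))
```

Building the blob in both byte orders exercises the magic-based detection, and truncating the blob exercises the bounds checks that a best-effort snapshot calls for.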