From: Li Pengfei
To: linux-trace-kernel@vger.kernel.org
Cc: rostedt@goodmis.org, mhiramat@kernel.org, linux-kernel@vger.kernel.org, cmllamas@google.com, zhangbo56@xiaomi.com, lipengfei28@xiaomi.com
Subject: [RFC PATCH 1/3] trace: add lock-free stackmap for stack trace deduplication
Date: Thu, 14 May 2026 11:49:14 +0800
Message-Id: <20260514034916.2162517-2-lipengfei28@xiaomi.com>
In-Reply-To: <20260514034916.2162517-1-lipengfei28@xiaomi.com>
References: <20260514034916.2162517-1-lipengfei28@xiaomi.com>

From: Pengfei Li

Add a lock-free hash map (ftrace_stackmap) that deduplicates kernel stack
traces for the ftrace ring buffer. Instead of storing full stack traces
(80-160 bytes each) in the ring buffer for every event, ftrace can store
a 4-byte stack_id when the stackmap option is enabled.

The implementation is modeled after tracing_map.c (used by hist triggers),
using the same lock-free design based on Dr.
Cliff Click's non-blocking hash table algorithm:

- Lock-free insert via cmpxchg (safe in NMI/IRQ/any context)
- Pre-allocated element pool (zero allocation on hot path)
- Linear probing with 2x over-provisioned table
- Per-trace_array instance support

The stackmap is exported via three tracefs nodes:

- stack_map: text export with symbol resolution
- stack_map_stat: statistics (entries, hits, drops, hit_rate)
- stack_map_bin: binary export for efficient userspace consumption

Kernel command line parameter:

- ftrace_stackmap.bits=N: set map capacity (2^N unique stacks)

Test results on ARM64 (SM8850, Android 16, kernel 6.12):

- 774 unique stacks from kmem_cache_alloc in 1 second
- 100% hit rate, 0 drops
- 92% hit rate under heavy load (all kmem events)

Signed-off-by: Pengfei Li
---
 kernel/trace/Kconfig          |  21 ++
 kernel/trace/Makefile         |   1 +
 kernel/trace/trace_stackmap.c | 569 ++++++++++++++++++++++++++++++++++
 kernel/trace/trace_stackmap.h |  54 ++++
 4 files changed, 645 insertions(+)
 create mode 100644 kernel/trace/trace_stackmap.c
 create mode 100644 kernel/trace/trace_stackmap.h

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index e130da35808f..2a63fd2c9a96 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -412,6 +412,27 @@ config STACK_TRACER
 
 	  Say N if unsure.
 
+config FTRACE_STACKMAP
+	bool "Ftrace stack map deduplication"
+	depends on TRACING
+	depends on STACKTRACE
+	select KALLSYMS
+	help
+	  This enables a global stack trace hash table for ftrace, inspired
+	  by eBPF's BPF_MAP_TYPE_STACK_TRACE. When enabled, ftrace can store
+	  only a stack_id in the ring buffer instead of the full stack trace,
+	  significantly reducing trace buffer usage when the same call stacks
+	  appear repeatedly.
+
+	  The deduplicated stacks are exported via:
+	    /sys/kernel/debug/tracing/stack_map
+
+	  Writing to this file resets the stack map. Reading shows all unique
+	  stacks with their stack_id and reference count.
+
+	  Say Y if you want to reduce ftrace buffer usage for stack traces.
+	  Say N if unsure.
+
 config TRACE_PREEMPT_TOGGLE
 	bool
 	help
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index 1decdce8cbef..f1b6175099cc 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -85,6 +85,7 @@ obj-$(CONFIG_HWLAT_TRACER) += trace_hwlat.o
 obj-$(CONFIG_OSNOISE_TRACER) += trace_osnoise.o
 obj-$(CONFIG_NOP_TRACER) += trace_nop.o
 obj-$(CONFIG_STACK_TRACER) += trace_stack.o
+obj-$(CONFIG_FTRACE_STACKMAP) += trace_stackmap.o
 obj-$(CONFIG_MMIOTRACE) += trace_mmiotrace.o
 obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += trace_functions_graph.o
 obj-$(CONFIG_TRACE_BRANCH_PROFILING) += trace_branch.o
diff --git a/kernel/trace/trace_stackmap.c b/kernel/trace/trace_stackmap.c
new file mode 100644
index 000000000000..c402e7e7f902
--- /dev/null
+++ b/kernel/trace/trace_stackmap.c
@@ -0,0 +1,569 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Ftrace Stack Map - Lock-free stack trace deduplication for ftrace
+ *
+ * Modeled after tracing_map.c (used by hist triggers), this provides
+ * a lock-free hash map optimized for the ftrace hot path. The design
+ * is based on Dr. Cliff Click's non-blocking hash table algorithm.
+ *
+ * Key properties:
+ *  - Lock-free insert via cmpxchg (safe in NMI/IRQ/any context)
+ *  - Pre-allocated element pool (zero allocation on hot path)
+ *  - Linear probing with 2x over-provisioned table
+ *  - Per-trace_array instance support
+ *
+ * The 32-bit jhash of the stack IPs is used as the hash table key.
+ * On hash collision (different stacks, same 32-bit hash), linear
+ * probing finds the next slot. Full stack comparison (memcmp) is
+ * used to confirm matches.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "trace.h"
+#include "trace_stackmap.h"
+
+/*
+ * Each pre-allocated element holds one unique stack trace.
+ * Fixed size: MAX_DEPTH entries regardless of actual depth.
+ */
+struct stackmap_elt {
+	u32 nr;				/* actual number of IPs */
+	atomic_t ref_count;
+	unsigned long ips[FTRACE_STACKMAP_MAX_DEPTH];
+};
+
+/*
+ * Hash table entry: a 32-bit key (jhash of stack) + pointer to elt.
+ * key == 0 means the slot is free.
+ */
+struct stackmap_entry {
+	u32 key;			/* 0 = free, non-zero = jhash */
+	struct stackmap_elt *val;	/* NULL until fully published */
+};
+
+struct ftrace_stackmap {
+	unsigned int map_bits;
+	unsigned int map_size;		/* 1 << (map_bits + 1) */
+	unsigned int max_elts;		/* 1 << map_bits */
+	atomic_t next_elt;		/* index into elts pool */
+	struct stackmap_entry *entries;	/* hash table */
+	struct stackmap_elt **elts;	/* pre-allocated pool */
+	atomic_t resetting;
+	atomic64_t hits;
+	atomic64_t drops;
+};
+
+static u32 stackmap_hash_seed;
+
+static unsigned int stackmap_map_bits = 14;	/* 16384 elts, 32768 slots */
+static int __init stackmap_bits_setup(char *str)
+{
+	unsigned long val;
+
+	if (kstrtoul(str, 0, &val))
+		return -EINVAL;
+	val = clamp_val(val, 10, 20);	/* 1K - 1M elts */
+	stackmap_map_bits = val;
+	return 0;
+}
+early_param("ftrace_stackmap.bits", stackmap_bits_setup);
+
+/* --- Element pool --- */
+
+static struct stackmap_elt *stackmap_get_elt(struct ftrace_stackmap *smap)
+{
+	int idx;
+
+	idx = atomic_fetch_add_unless(&smap->next_elt, 1, smap->max_elts);
+	if (idx < smap->max_elts)
+		return smap->elts[idx];
+	return NULL;
+}
+
+static int stackmap_alloc_elts(struct ftrace_stackmap *smap)
+{
+	unsigned int i;
+
+	smap->elts = vzalloc(sizeof(*smap->elts) * smap->max_elts);
+	if (!smap->elts)
+		return -ENOMEM;
+
+	for (i = 0; i < smap->max_elts; i++) {
+		smap->elts[i] = kzalloc(sizeof(struct stackmap_elt), GFP_KERNEL);
+		if (!smap->elts[i])
+			goto fail;
+	}
+	return 0;
+fail:
+	while (i--)
+		kfree(smap->elts[i]);
+	vfree(smap->elts);
+	smap->elts = NULL;
+	return -ENOMEM;
+}
+
+static void stackmap_free_elts(struct ftrace_stackmap *smap)
+{
+	unsigned int i;
+
+	if (!smap->elts)
+		return;
+	for (i = 0; i < smap->max_elts; i++)
+		kfree(smap->elts[i]);
+	vfree(smap->elts);
+	smap->elts = NULL;
+}
+
+/* --- Create / Destroy / Reset --- */
+
+struct ftrace_stackmap *ftrace_stackmap_create(void)
+{
+	struct ftrace_stackmap *smap;
+	static bool seed_initialized;
+	int err;
+
+	smap = kzalloc(sizeof(*smap), GFP_KERNEL);
+	if (!smap)
+		return ERR_PTR(-ENOMEM);
+
+	smap->map_bits = stackmap_map_bits;
+	smap->max_elts = 1 << smap->map_bits;
+	smap->map_size = smap->max_elts * 2;	/* 2x over-provision */
+
+	smap->entries = vzalloc(sizeof(*smap->entries) * smap->map_size);
+	if (!smap->entries) {
+		kfree(smap);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	err = stackmap_alloc_elts(smap);
+	if (err) {
+		vfree(smap->entries);
+		kfree(smap);
+		return ERR_PTR(err);
+	}
+
+	atomic_set(&smap->next_elt, 0);
+	atomic_set(&smap->resetting, 0);
+	atomic64_set(&smap->hits, 0);
+	atomic64_set(&smap->drops, 0);
+
+	if (!seed_initialized) {
+		stackmap_hash_seed = get_random_u32();
+		seed_initialized = true;
+	}
+
+	return smap;
+}
+
+void ftrace_stackmap_destroy(struct ftrace_stackmap *smap)
+{
+	if (!smap || IS_ERR(smap))
+		return;
+	stackmap_free_elts(smap);
+	vfree(smap->entries);
+	kfree(smap);
+}
+
+void ftrace_stackmap_reset(struct ftrace_stackmap *smap)
+{
+	unsigned int i;
+
+	if (!smap)
+		return;
+
+	/*
+	 * Reset protocol:
+	 *
+	 * 1. Set resetting=1 so get_id() returns -EINVAL immediately.
+	 *    get_id() callers in NMI/IRQ context will see this and bail
+	 *    out before touching entries or elts.
+	 *
+	 * 2. smp_mb() ensures the resetting store is visible to all CPUs
+	 *    before we start clearing entries.
+	 *    Any get_id() that already
+	 *    passed the resetting check will complete its cmpxchg and
+	 *    WRITE_ONCE(entry->val) before we memset, because:
+	 *      - the cmpxchg claims the slot atomically
+	 *      - WRITE_ONCE(entry->val) happens before we clear entries
+	 *    We accept that a handful of in-flight inserts may write into
+	 *    entries that we are about to clear; those entries will simply
+	 *    be wiped by the memset below, which is safe.
+	 *
+	 * 3. Clear entries table, then reset elt pool.
+	 *
+	 * 4. Clear resetting=0 with another smp_mb() so new get_id()
+	 *    calls see a fully reset map.
+	 */
+	atomic_set(&smap->resetting, 1);
+	smp_mb();
+
+	/* Clear hash table */
+	memset(smap->entries, 0, sizeof(*smap->entries) * smap->map_size);
+
+	/* Reset elt pool */
+	for (i = 0; i < smap->max_elts; i++)
+		memset(smap->elts[i], 0, sizeof(struct stackmap_elt));
+
+	atomic_set(&smap->next_elt, 0);
+	atomic64_set(&smap->hits, 0);
+	atomic64_set(&smap->drops, 0);
+
+	smp_mb();
+	atomic_set(&smap->resetting, 0);
+}
+
+/* --- Core: get_id (lock-free, NMI-safe) --- */
+
+int ftrace_stackmap_get_id(struct ftrace_stackmap *smap,
+			   unsigned long *ips, unsigned int nr_entries)
+{
+	u32 key_hash, idx, test_key, trace_len;
+	struct stackmap_entry *entry;
+	struct stackmap_elt *val;
+	int dup_try = 0;
+
+	if (!smap || !nr_entries || atomic_read(&smap->resetting))
+		return -EINVAL;
+	if (nr_entries > FTRACE_STACKMAP_MAX_DEPTH)
+		nr_entries = FTRACE_STACKMAP_MAX_DEPTH;
+
+	trace_len = nr_entries * sizeof(unsigned long);
+	/*
+	 * jhash2() requires the length in u32 units and the data to be
+	 * u32-aligned. On 64-bit kernels sizeof(unsigned long)==8, so
+	 * trace_len is always a multiple of 8 (hence of 4). Use jhash2
+	 * directly; the cast to u32* is safe because ips[] is naturally
+	 * aligned to sizeof(unsigned long) >= 4.
+	 */
+	key_hash = jhash2((const u32 *)ips, trace_len / sizeof(u32),
+			  stackmap_hash_seed);
+	if (key_hash == 0)
+		key_hash = 1;	/* 0 means free slot */
+
+	idx = key_hash >> (32 - (smap->map_bits + 1));
+
+	while (1) {
+		idx &= (smap->map_size - 1);
+		entry = &smap->entries[idx];
+		test_key = entry->key;
+
+		if (test_key && test_key == key_hash) {
+			val = READ_ONCE(entry->val);
+			if (val && val->nr == nr_entries &&
+			    memcmp(val->ips, ips, trace_len) == 0) {
+				atomic_inc(&val->ref_count);
+				atomic64_inc(&smap->hits);
+				return (int)idx;
+			} else if (unlikely(!val)) {
+				/* Another CPU is mid-insert; retry */
+				dup_try++;
+				if (dup_try > smap->map_size) {
+					atomic64_inc(&smap->drops);
+					break;
+				}
+				continue;
+			}
+		}
+
+		if (!test_key) {
+			/* Free slot: try to claim it */
+			if (!cmpxchg(&entry->key, 0, key_hash)) {
+				struct stackmap_elt *elt;
+
+				elt = stackmap_get_elt(smap);
+				if (!elt) {
+					/*
+					 * Pool exhausted. We claimed this slot with
+					 * cmpxchg but cannot fill it. Leave key set
+					 * so the slot stays "claimed but empty" --
+					 * future lookups will skip it (val == NULL
+					 * triggers the mid-insert retry path which
+					 * will eventually drop). This is safer than
+					 * writing key=0 without cmpxchg, which could
+					 * race with another CPU's cmpxchg on the same
+					 * slot.
+					 */
+					atomic64_inc(&smap->drops);
+					break;
+				}
+
+				elt->nr = nr_entries;
+				atomic_set(&elt->ref_count, 1);
+				memcpy(elt->ips, ips, trace_len);
+
+				/* Ensure elt is fully visible before publish */
+				smp_wmb();
+				WRITE_ONCE(entry->val, elt);
+				atomic64_inc(&smap->hits);
+				return (int)idx;
+			} else {
+				/* cmpxchg failed; someone else claimed it */
+				dup_try++;
+				continue;
+			}
+		}
+
+		idx++;
+		dup_try++;
+		if (dup_try > smap->map_size) {
+			atomic64_inc(&smap->drops);
+			break;
+		}
+	}
+
+	return -ENOSPC;
+}
+
+/* --- Text export: /sys/kernel/debug/tracing/stack_map --- */
+
+struct stackmap_seq_private {
+	struct ftrace_stackmap *smap;
+};
+
+static void *stackmap_seq_start(struct seq_file *m, loff_t *pos)
+{
+	struct stackmap_seq_private *priv = m->private;
+	struct ftrace_stackmap *smap = priv->smap;
+	u32 i;
+
+	if (!smap)
+		return NULL;
+	for (i = *pos; i < smap->map_size; i++) {
+		if (smap->entries[i].key && smap->entries[i].val) {
+			*pos = i;
+			return &smap->entries[i];
+		}
+	}
+	return NULL;
+}
+
+static void *stackmap_seq_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	struct stackmap_seq_private *priv = m->private;
+	struct ftrace_stackmap *smap = priv->smap;
+	u32 i;
+
+	for (i = *pos + 1; i < smap->map_size; i++) {
+		if (smap->entries[i].key && smap->entries[i].val) {
+			*pos = i;
+			return &smap->entries[i];
+		}
+	}
+	*pos = i;
+	return NULL;
+}
+
+static void stackmap_seq_stop(struct seq_file *m, void *v) { }
+
+static int stackmap_seq_show(struct seq_file *m, void *v)
+{
+	struct stackmap_entry *entry = v;
+	struct stackmap_elt *elt = entry->val;
+	struct stackmap_seq_private *priv = m->private;
+	u32 idx = entry - priv->smap->entries;
+	u32 i;
+
+	if (!elt)
+		return 0;
+
+	seq_printf(m, "stack_id %u [ref %u, depth %u]\n",
+		   idx, atomic_read(&elt->ref_count), elt->nr);
+	for (i = 0; i < elt->nr; i++)
+		seq_printf(m, " [%u] %pS\n", i, (void *)elt->ips[i]);
+	seq_putc(m, '\n');
+	return 0;
+}
+
+static const struct seq_operations stackmap_seq_ops = {
+	.start = stackmap_seq_start,
+	.next = stackmap_seq_next,
+	.stop = stackmap_seq_stop,
+	.show = stackmap_seq_show,
+};
+
+static int stackmap_open(struct inode *inode, struct file *file)
+{
+	struct stackmap_seq_private *priv;
+	struct seq_file *m;
+	int ret;
+
+	ret = seq_open_private(file, &stackmap_seq_ops,
+			       sizeof(struct stackmap_seq_private));
+	if (ret)
+		return ret;
+	m = file->private_data;
+	priv = m->private;
+	priv->smap = inode->i_private;
+	return 0;
+}
+
+static ssize_t stackmap_write(struct file *file, const char __user *ubuf,
+			      size_t count, loff_t *ppos)
+{
+	struct seq_file *m = file->private_data;
+	struct stackmap_seq_private *priv = m->private;
+	char buf[8];
+	size_t n = min(count, sizeof(buf) - 1);
+
+	if (copy_from_user(buf, ubuf, n))
+		return -EFAULT;
+	buf[n] = '\0';
+	if (n == 0 || (buf[0] != '0' && strncmp(buf, "reset", 5) != 0))
+		return -EINVAL;
+
+	ftrace_stackmap_reset(priv->smap);
+	return count;
+}
+
+const struct file_operations ftrace_stackmap_fops = {
+	.open = stackmap_open,
+	.read = seq_read,
+	.write = stackmap_write,
+	.llseek = seq_lseek,
+	.release = seq_release_private,
+};
+
+/* --- Stats --- */
+
+static int stackmap_stat_show(struct seq_file *m, void *v)
+{
+	struct ftrace_stackmap *smap = m->private;
+	u32 entries;
+	u64 hits, drops;
+
+	if (!smap) {
+		seq_puts(m, "stackmap not initialized\n");
+		return 0;
+	}
+
+	entries = atomic_read(&smap->next_elt);
+	hits = atomic64_read(&smap->hits);
+	drops = atomic64_read(&smap->drops);
+
+	seq_printf(m, "entries: %u / %u\n", entries, smap->max_elts);
+	seq_printf(m, "table_size: %u\n", smap->map_size);
+	seq_printf(m, "hits: %llu\n", hits);
+	seq_printf(m, "drops: %llu\n", drops);
+	if (hits + drops > 0)
+		seq_printf(m, "hit_rate: %llu%%\n",
+			   hits * 100 / (hits + drops));
+	return 0;
+}
+
+static int stackmap_stat_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, stackmap_stat_show, inode->i_private);
+}
+
+const struct file_operations ftrace_stackmap_stat_fops = {
+	.open = stackmap_stat_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = single_release,
+};
+
+/* --- Binary export --- */
+
+struct stackmap_bin_snapshot {
+	size_t size;
+	char data[];
+};
+
+static int stackmap_bin_open(struct inode *inode, struct file *file)
+{
+	struct ftrace_stackmap *smap = inode->i_private;
+	struct stackmap_bin_snapshot *snap;
+	struct ftrace_stackmap_bin_header *hdr;
+	size_t alloc_size, off;
+	u32 i, nr_stacks;
+
+	if (!smap)
+		return -ENODEV;
+
+	/*
+	 * Allocate based on actual entry count, not max_elts worst case.
+	 * Each entry needs a header struct plus up to MAX_DEPTH u64 IPs.
+	 * Add 1 to nr_entries to avoid zero-size alloc on empty map.
+	 */
+	{
+		u32 nr_entries = atomic_read(&smap->next_elt);
+
+		alloc_size = sizeof(*hdr) + (nr_entries + 1) *
+			     (sizeof(struct ftrace_stackmap_bin_entry) +
+			      FTRACE_STACKMAP_MAX_DEPTH * sizeof(u64));
+	}
+
+	snap = vmalloc(sizeof(*snap) + alloc_size);
+	if (!snap)
+		return -ENOMEM;
+
+	hdr = (struct ftrace_stackmap_bin_header *)snap->data;
+	hdr->magic = FTRACE_STACKMAP_BIN_MAGIC;
+	hdr->version = FTRACE_STACKMAP_BIN_VERSION;
+	hdr->reserved = 0;
+	off = sizeof(*hdr);
+	nr_stacks = 0;
+
+	for (i = 0; i < smap->map_size; i++) {
+		struct stackmap_entry *entry = &smap->entries[i];
+		struct stackmap_elt *elt;
+		struct ftrace_stackmap_bin_entry *e;
+		u64 *ips_out;
+		u32 k;
+
+		if (!entry->key)
+			continue;
+		elt = READ_ONCE(entry->val);
+		if (!elt)
+			continue;
+
+		e = (struct ftrace_stackmap_bin_entry *)(snap->data + off);
+		e->stack_id = i;
+		e->nr = elt->nr;
+		e->ref_count = atomic_read(&elt->ref_count);
+		e->reserved = 0;
+		off += sizeof(*e);
+
+		ips_out = (u64 *)(snap->data + off);
+		for (k = 0; k < elt->nr; k++)
+			ips_out[k] = (u64)elt->ips[k];
+		off += elt->nr * sizeof(u64);
+		nr_stacks++;
+	}
+
+	hdr->nr_stacks = nr_stacks;
+	snap->size = off;
+	file->private_data = snap;
+	return 0;
+}
+
+static ssize_t stackmap_bin_read(struct file *file, char __user *ubuf,
+				 size_t count, loff_t *ppos)
+{
+	struct stackmap_bin_snapshot *snap = file->private_data;
+
+	if (!snap)
+		return -EINVAL;
+	return simple_read_from_buffer(ubuf, count, ppos, snap->data, snap->size);
+}
+
+static int stackmap_bin_release(struct inode *inode, struct file *file)
+{
+	vfree(file->private_data);
+	return 0;
+}
+
+const struct file_operations ftrace_stackmap_bin_fops = {
+	.open = stackmap_bin_open,
+	.read = stackmap_bin_read,
+	.llseek = default_llseek,
+	.release = stackmap_bin_release,
+};
diff --git a/kernel/trace/trace_stackmap.h b/kernel/trace/trace_stackmap.h
new file mode 100644
index 000000000000..74ad649a79f7
--- /dev/null
+++ b/kernel/trace/trace_stackmap.h
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _TRACE_STACKMAP_H
+#define _TRACE_STACKMAP_H
+
+#include
+#include
+
+#define FTRACE_STACKMAP_MAX_DEPTH	64
+
+/* Binary export format */
+#define FTRACE_STACKMAP_BIN_MAGIC	0x464D5342	/* 'FSMB' */
+#define FTRACE_STACKMAP_BIN_VERSION	2
+
+struct ftrace_stackmap_bin_header {
+	u32 magic;
+	u32 version;
+	u32 nr_stacks;
+	u32 reserved;
+};
+
+struct ftrace_stackmap_bin_entry {
+	u32 stack_id;
+	u32 nr;
+	u32 ref_count;
+	u32 reserved;
+	/* followed by u64 ips[nr] */
+};
+
+#ifdef CONFIG_FTRACE_STACKMAP
+
+struct ftrace_stackmap;
+
+struct ftrace_stackmap *ftrace_stackmap_create(void);
+void ftrace_stackmap_destroy(struct ftrace_stackmap *smap);
+int ftrace_stackmap_get_id(struct ftrace_stackmap *smap,
+			   unsigned long *ips, unsigned int nr_entries);
+void ftrace_stackmap_reset(struct ftrace_stackmap *smap);
+
+extern const struct file_operations ftrace_stackmap_fops;
+extern const struct file_operations ftrace_stackmap_stat_fops;
+extern const struct file_operations ftrace_stackmap_bin_fops;
+
+#else
+
+struct ftrace_stackmap;
+static inline struct ftrace_stackmap *ftrace_stackmap_create(void) { return NULL; }
+static inline void ftrace_stackmap_destroy(struct ftrace_stackmap *s) { }
+static inline int ftrace_stackmap_get_id(struct ftrace_stackmap *s,
+					  unsigned long *ips, unsigned int n)
+{ return -ENOSYS; }
+static inline void ftrace_stackmap_reset(struct ftrace_stackmap *s) { }
+
+#endif
+#endif /* _TRACE_STACKMAP_H */
-- 
2.34.1

From: Li Pengfei
To: linux-trace-kernel@vger.kernel.org
Cc: rostedt@goodmis.org, mhiramat@kernel.org, linux-kernel@vger.kernel.org, cmllamas@google.com, zhangbo56@xiaomi.com, lipengfei28@xiaomi.com
Subject: [RFC PATCH 2/3] trace: integrate stackmap into ftrace stack recording path
Date: Thu, 14 May 2026 11:49:15 +0800
Message-Id: <20260514034916.2162517-3-lipengfei28@xiaomi.com>
In-Reply-To: <20260514034916.2162517-1-lipengfei28@xiaomi.com>
References: <20260514034916.2162517-1-lipengfei28@xiaomi.com>

From: Pengfei Li

Add TRACE_STACK_ID event type and integrate ftrace_stackmap into
__ftrace_trace_stack(). When the 'stackmap' trace option is enabled,
the stack recording path stores a 4-byte stack_id in the ring buffer
instead of the full stack trace.
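
Not part of the patch: a stack_id in the trace is only useful once it is
mapped back to a stack, so here is a minimal userspace sketch of that lookup
against a stack_map_bin snapshot. It assumes the layout declared in
trace_stackmap.h in patch 1/3 (a 16-byte header, then per-stack 16-byte entry
records each followed by u64 ips[nr], all in native byte order). The names
fsmb_find, struct fsmb_stack and the FSMB_* macros are hypothetical and exist
only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FSMB_MAGIC     0x464D5342u /* mirrors FTRACE_STACKMAP_BIN_MAGIC */
#define FSMB_MAX_DEPTH 64          /* mirrors FTRACE_STACKMAP_MAX_DEPTH */

/* Userspace mirrors of the on-disk records from trace_stackmap.h. */
struct fsmb_header { uint32_t magic, version, nr_stacks, reserved; };
struct fsmb_entry  { uint32_t stack_id, nr, ref_count, reserved; };

/* Hypothetical result type: one decoded stack. */
struct fsmb_stack {
	uint32_t nr;
	uint32_t ref_count;
	uint64_t ips[FSMB_MAX_DEPTH];
};

/*
 * Scan a snapshot for stack_id. Returns 0 and fills *out when found,
 * -1 when the id is absent or the buffer is truncated or malformed.
 * memcpy is used for every read so that no alignment is assumed.
 */
static int fsmb_find(const uint8_t *buf, size_t len, uint32_t stack_id,
		     struct fsmb_stack *out)
{
	struct fsmb_header hdr;
	struct fsmb_entry ent;
	size_t off = sizeof(hdr);
	uint32_t i;

	if (len < sizeof(hdr))
		return -1;
	memcpy(&hdr, buf, sizeof(hdr));
	if (hdr.magic != FSMB_MAGIC)
		return -1;

	for (i = 0; i < hdr.nr_stacks; i++) {
		if (len - off < sizeof(ent))
			return -1;
		memcpy(&ent, buf + off, sizeof(ent));
		off += sizeof(ent);

		/* Reject over-deep records and IP arrays that run past the end. */
		if (ent.nr > FSMB_MAX_DEPTH ||
		    (len - off) / sizeof(uint64_t) < ent.nr)
			return -1;

		if (ent.stack_id == stack_id) {
			out->nr = ent.nr;
			out->ref_count = ent.ref_count;
			memcpy(out->ips, buf + off, ent.nr * sizeof(uint64_t));
			return 0;
		}
		off += ent.nr * sizeof(uint64_t);
	}
	return -1;
}
```

A tool would read the whole stack_map_bin file into a buffer once and call
fsmb_find() for each stack_id seen in the trace. The scan is linear per
lookup, so a real consumer would more likely build an index keyed by
stack_id first.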

Changes:

- New TRACE_STACK_ID in trace_type enum
- New stack_id_entry in trace_entries.h (just 'int stack_id')
- New TRACE_ITER_STACKMAP trace option flag
- Modified __ftrace_trace_stack() to call ftrace_stackmap_get_id() when
  stackmap option is active
- Added stack_id print handler in trace_output.c
- Added stackmap field to struct trace_array (per-instance support)

The stack_id event is committed unconditionally (no filter check) since
it is a synthetic side-event tied to the parent event which was already
subject to filtering.

Fallback behavior: if stackmap returns an error (pool exhausted or
resetting), the full stack trace is recorded as before.

Usage:

  echo 1 > /sys/kernel/debug/tracing/options/stackmap
  echo 1 > /sys/kernel/debug/tracing/options/stacktrace

Signed-off-by: Pengfei Li
---
 kernel/trace/trace.c         | 46 ++++++++++++++++++++++++++++++++++++
 kernel/trace/trace.h         | 16 +++++++++++++
 kernel/trace/trace_entries.h | 15 ++++++++++++
 kernel/trace/trace_output.c  | 23 ++++++++++++++++++
 4 files changed, 100 insertions(+)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 6eb4d3097a4d..c72cb8491217 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -57,6 +57,7 @@
 
 #include "trace.h"
 #include "trace_output.h"
+#include "trace_stackmap.h"
 
 #ifdef CONFIG_FTRACE_STARTUP_TEST
 /*
@@ -2184,6 +2185,37 @@ void __ftrace_trace_stack(struct trace_array *tr,
 	}
 #endif
 
+#ifdef CONFIG_FTRACE_STACKMAP
+	/*
+	 * If stackmap dedup is enabled, try to store only the stack_id
+	 * in the ring buffer instead of the full stack trace.
+	 */
+	if (tr->trace_flags & TRACE_ITER_STACKMAP) {
+		struct stack_id_entry *sid_entry;
+		int sid;
+
+		sid = ftrace_stackmap_get_id(tr->stackmap, fstack->calls, nr_entries);
+		if (sid >= 0) {
+			event = __trace_buffer_lock_reserve(buffer,
+							    TRACE_STACK_ID,
+							    sizeof(*sid_entry), trace_ctx);
+			if (!event)
+				goto out;
+			sid_entry = ring_buffer_event_data(event);
+			sid_entry->stack_id = sid;
+			/*
+			 * stack_id is a synthetic side-event attached to a
+			 * primary trace event that was already subject to
+			 * filtering. No per-event filter is defined for
+			 * TRACE_STACK_ID, so commit unconditionally.
+			 */
+			__buffer_unlock_commit(buffer, event);
+			goto out;
+		}
+		/* Fall through to full stack on stackmap failure */
+	}
+#endif
+
 	event = __trace_buffer_lock_reserve(buffer, TRACE_STACK,
 				    struct_size(entry, caller, nr_entries),
 				    trace_ctx);
@@ -9222,6 +9254,20 @@ static __init void tracer_init_tracefs_work_func(struct work_struct *work)
 			NULL, &tracing_dyn_info_fops);
 #endif
 
+#ifdef CONFIG_FTRACE_STACKMAP
+	global_trace.stackmap = ftrace_stackmap_create();
+	if (!IS_ERR(global_trace.stackmap)) {
+		trace_create_file("stack_map", TRACE_MODE_WRITE, NULL,
+				  global_trace.stackmap, &ftrace_stackmap_fops);
+		trace_create_file("stack_map_stat", TRACE_MODE_READ, NULL,
+				  global_trace.stackmap, &ftrace_stackmap_stat_fops);
+		trace_create_file("stack_map_bin", TRACE_MODE_READ, NULL,
+				  global_trace.stackmap, &ftrace_stackmap_bin_fops);
+	} else {
+		pr_warn("ftrace stackmap init failed, dedup disabled\n");
+		global_trace.stackmap = NULL;
+	}
+#endif
 	create_trace_instances(NULL);
 
 	update_tracer_options();
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 80fe152af1dd..74f421a89347 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -57,6 +57,7 @@ enum trace_type {
 	TRACE_TIMERLAT,
 	TRACE_RAW_DATA,
 	TRACE_FUNC_REPEATS,
+	TRACE_STACK_ID,
 
 	__TRACE_LAST_TYPE,
 };
@@ -453,6 +454,9 @@ struct trace_array {
 	struct cond_snapshot	*cond_snapshot;
 #endif
 	struct trace_func_repeats	__percpu *last_func_repeats;
+#ifdef CONFIG_FTRACE_STACKMAP
+	struct ftrace_stackmap	*stackmap;
+#endif
 	/*
 	 * On boot up, the ring buffer is set to the minimum size, so that
 	 * we do not waste memory on systems that are not using tracing.
@@ -579,6 +583,8 @@ extern void __ftrace_bad_type(void);
 			  TRACE_GRAPH_RET);				\
 		IF_ASSIGN(var, ent, struct func_repeats_entry,		\
 			  TRACE_FUNC_REPEATS);				\
+		IF_ASSIGN(var, ent, struct stack_id_entry,		\
+			  TRACE_STACK_ID);				\
 		__ftrace_bad_type();					\
 	} while (0)
 
@@ -1449,7 +1455,16 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf,
 # define STACK_FLAGS
 #endif
 
+#ifdef CONFIG_FTRACE_STACKMAP
+# define STACKMAP_FLAGS				\
+		C(STACKMAP,		"stackmap"),
+#else
+# define STACKMAP_FLAGS
+# define TRACE_ITER_STACKMAP		0UL
+#endif
+
 #ifdef CONFIG_FUNCTION_PROFILER
+
 # define PROFILER_FLAGS					\
 		C(PROF_TEXT_OFFSET,	"prof-text-offset"),
 # ifdef CONFIG_FUNCTION_GRAPH_TRACER
@@ -1506,6 +1521,7 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf,
 		FUNCTION_FLAGS					\
 		FGRAPH_FLAGS					\
 		STACK_FLAGS					\
+		STACKMAP_FLAGS					\
 		BRANCH_FLAGS					\
 		PROFILER_FLAGS					\
 		FPROFILE_FLAGS
diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
index 54417468fdeb..89ed14b7e5fd 100644
--- a/kernel/trace/trace_entries.h
+++ b/kernel/trace/trace_entries.h
@@ -250,6 +250,21 @@ FTRACE_ENTRY(user_stack, userstack_entry,
 		 (void *)__entry->caller[6], (void *)__entry->caller[7])
 );
 
+/*
+ * Stack ID entry - stores only a stack_id referencing the stackmap.
+ * Used when CONFIG_FTRACE_STACKMAP is enabled to deduplicate stacks.
+ */
+FTRACE_ENTRY(stack_id, stack_id_entry,
+
+	TRACE_STACK_ID,
+
+	F_STRUCT(
+		__field(	int,	stack_id	)
+	),
+
+	F_printk("", __entry->stack_id)
+);
+
 /*
  * trace_printk entry:
  */
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index a5ad76175d10..68678ea88159 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -1517,6 +1517,28 @@ static struct trace_event trace_user_stack_event = {
 	.funcs		= &trace_user_stack_funcs,
 };
 
+/* TRACE_STACK_ID */
+static enum print_line_t trace_stack_id_print(struct trace_iterator *iter,
+					      int flags, struct trace_event *event)
+{
+	struct stack_id_entry *field;
+	struct trace_seq *s = &iter->seq;
+
+	trace_assign_type(field, iter->ent);
+	trace_seq_printf(s, "\n", field->stack_id);
+
+	return trace_handle_return(s);
+}
+
+static struct trace_event_functions trace_stack_id_funcs = {
+	.trace		= trace_stack_id_print,
+};
+
+static struct trace_event trace_stack_id_event = {
+	.type		= TRACE_STACK_ID,
+	.funcs		= &trace_stack_id_funcs,
+};
+
 /* TRACE_HWLAT */
 static enum print_line_t
 trace_hwlat_print(struct trace_iterator *iter, int flags,
@@ -1908,6 +1930,7 @@ static struct trace_event *events[] __initdata = {
 	&trace_wake_event,
 	&trace_stack_event,
 	&trace_user_stack_event,
+	&trace_stack_id_event,
 	&trace_bputs_event,
 	&trace_bprint_event,
 	&trace_print_event,
-- 
2.34.1

From nobody Fri Jun 12 15:49:37 2026
Received: from mail-pl1-f182.google.com (mail-pl1-f182.google.com [209.85.214.182])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5D8DD30675F
	for ; Thu, 14 May 2026 03:52:01 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.182
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778730724; cv=none;
From: Li Pengfei
To: linux-trace-kernel@vger.kernel.org
Cc:
 rostedt@goodmis.org, mhiramat@kernel.org, linux-kernel@vger.kernel.org,
 cmllamas@google.com, zhangbo56@xiaomi.com, lipengfei28@xiaomi.com
Subject: [RFC PATCH 3/3] trace: add documentation, selftest and tooling for stackmap
Date: Thu, 14 May 2026 11:49:16 +0800
Message-Id: <20260514034916.2162517-4-lipengfei28@xiaomi.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260514034916.2162517-1-lipengfei28@xiaomi.com>
References: <20260514034916.2162517-1-lipengfei28@xiaomi.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

From: Pengfei Li

Add supporting files for the ftrace stackmap feature:

Documentation/trace/ftrace-stackmap.rst:
  Comprehensive documentation covering design, usage, tracefs interface,
  binary format, and performance characteristics.

tools/testing/selftests/ftrace/test.d/ftrace/stackmap-basic.tc:
  Basic functional selftest that verifies:
  - stackmap tracefs nodes exist
  - enabling stackmap + stacktrace produces stack_id events
  - stack_map_stat shows non-zero hits
  - reset clears entries

tools/tracing/stackmap_dump.py:
  Python script to parse the binary stack_map_bin export. Supports
  offline symbol resolution via addr2line, JSON output, and top-N
  filtering by ref_count.
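
For anyone who wants to exercise the binary layout without a device, here
is a minimal sketch that is not part of this patch. It assumes little-endian
packing and the 16-byte header plus 16-byte per-stack record described in
ftrace-stackmap.rst. The helper names build_blob, parse_blob and top_n are
hypothetical and do not appear in stackmap_dump.py.

```python
import struct

# Magic value as given in ftrace-stackmap.rst.
MAGIC = 0x464D5342
# Assumed little-endian layouts (illustrative, not taken from the kernel code):
HEADER = struct.Struct('<IIII')  # magic, version, nr_stacks, reserved
ENTRY = struct.Struct('<IIII')   # stack_id, nr, ref_count, reserved


def build_blob(stacks, version=2):
    """Serialize [(stack_id, ref_count, [ips])] into the documented layout."""
    out = bytearray(HEADER.pack(MAGIC, version, len(stacks), 0))
    for stack_id, ref_count, ips in stacks:
        out += ENTRY.pack(stack_id, len(ips), ref_count, 0)
        out += struct.pack(f'<{len(ips)}Q', *ips)  # ips are u64 each
    return bytes(out)


def parse_blob(data):
    """Return [(stack_id, ref_count, [ips])]; raise ValueError on bad input."""
    if len(data) < HEADER.size:
        raise ValueError("file too small for header")
    magic, _version, nr_stacks, _reserved = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError(f"bad magic 0x{magic:08x}")
    offset = HEADER.size
    stacks = []
    for _ in range(nr_stacks):
        if offset + ENTRY.size > len(data):
            raise ValueError("truncated entry header")
        stack_id, nr, ref_count, _reserved = ENTRY.unpack_from(data, offset)
        offset += ENTRY.size
        if offset + nr * 8 > len(data):
            raise ValueError("truncated instruction pointers")
        ips = list(struct.unpack_from(f'<{nr}Q', data, offset))
        offset += nr * 8
        stacks.append((stack_id, ref_count, ips))
    return stacks


def top_n(stacks, n):
    """Keep the n stacks with the highest ref_count."""
    return sorted(stacks, key=lambda s: s[1], reverse=True)[:n]
```

Unlike the generator in stackmap_dump.py, which stops quietly on a short
file, this sketch raises on truncation so that a round-trip test fails
loudly. That is a choice made for the sketch, not a statement about how
the tool should behave.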
Signed-off-by: Pengfei Li
---
 Documentation/trace/ftrace-stackmap.rst       | 111 ++++++++++++++++
 .../ftrace/test.d/ftrace/stackmap-basic.tc    |  74 +++++++++++
 tools/tracing/stackmap_dump.py                | 120 ++++++++++++++++++
 3 files changed, 305 insertions(+)
 create mode 100644 Documentation/trace/ftrace-stackmap.rst
 create mode 100755 tools/testing/selftests/ftrace/test.d/ftrace/stackmap-basic.tc
 create mode 100755 tools/tracing/stackmap_dump.py

diff --git a/Documentation/trace/ftrace-stackmap.rst b/Documentation/trace/ftrace-stackmap.rst
new file mode 100644
index 000000000000..8f6410d4258c
--- /dev/null
+++ b/Documentation/trace/ftrace-stackmap.rst
@@ -0,0 +1,111 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================
+Ftrace Stack Map
+======================
+
+:Author: Pengfei Li
+
+Overview
+========
+
+The ftrace stack map provides stack trace deduplication for the ftrace
+ring buffer. When enabled, instead of storing full kernel stack traces
+(typically 80-160 bytes each) in the ring buffer for every event, ftrace
+stores only a 4-byte ``stack_id``. The full stacks are maintained in a
+separate hash table and exported via tracefs for userspace to resolve.
+
+This is inspired by eBPF's ``BPF_MAP_TYPE_STACK_TRACE`` but integrated
+into ftrace's infrastructure, requiring no userspace daemon.
+
+Configuration
+=============
+
+Enable ``CONFIG_FTRACE_STACKMAP=y`` in the kernel config.
+
+Kernel command line parameters:
+
+- ``ftrace_stackmap.bits=N`` - Set map capacity to 2^N unique stacks (default: 14, range: 10-20)
+
+Usage
+=====
+
+Enable stack deduplication::
+
+  echo 1 > /sys/kernel/debug/tracing/options/stackmap
+  echo 1 > /sys/kernel/debug/tracing/options/stacktrace
+  echo function > /sys/kernel/debug/tracing/current_tracer
+
+The trace output will show ```` instead of full stack traces::
+
+  sh-1234 [006] d.h..
123.456789:
+
+To view the actual stacks::
+
+  cat /sys/kernel/debug/tracing/stack_map
+
+Output format::
+
+  stack_id 42 [ref 1337, depth 8]
+    [0] schedule+0x48/0xc0
+    [1] schedule_timeout+0x1c/0x30
+    ...
+
+To view statistics::
+
+  cat /sys/kernel/debug/tracing/stack_map_stat
+
+Output::
+
+  entries: 2500
+  table_size: 5000
+  hits: 148923
+  drops: 0
+  hit_rate: 98%
+
+To reset the stack map::
+
+  echo 0 > /sys/kernel/debug/tracing/stack_map
+
+Tracefs Nodes
+=============
+
+``stack_map``
+  Text export of all deduplicated stacks with symbol resolution.
+  Writing ``0`` or ``reset`` clears all entries.
+
+``stack_map_stat``
+  Statistics: entry count, hits, drops, and hit rate.
+
+``stack_map_bin``
+  Binary export for efficient userspace consumption. Format:
+
+  - Header (16 bytes): magic(u32) + version(u32) + nr_stacks(u32) + reserved(u32)
+  - Per stack: stack_id(u32) + nr(u32) + ref_count(u32) + reserved(u32) + ips(u64 × nr)
+
+  Magic: ``0x464D5342`` ('FSMB'), Version: 2
+
+Design
+======
+
+The stack map is modeled after ``tracing_map.c`` (used by hist triggers),
+using a lock-free design based on Dr.
Cliff Click's non-blocking hash table
+algorithm:
+
+- **Lookup/Insert**: Lock-free via ``cmpxchg``, safe in NMI/IRQ/any context
+- **Memory**: Pre-allocated element pool, zero allocation on the hot path
+  (no GFP_ATOMIC failures under memory pressure)
+- **Collision**: Linear probing with a 2x over-provisioned table
+- **Per-instance**: Each trace_array has its own stackmap, supporting
+  multiple ftrace instances
+- **Hash**: 32-bit jhash of stack IPs; full ``memcmp`` confirms matches
+
+Performance
+===========
+
+Typical results on ARM64 Android device (function tracer, 2 seconds):
+
+- Unique stacks: ~3000
+- Hit rate: 84-98% (depends on workload diversity)
+- Ring buffer savings: ~80% for stack data
+- Overhead per event: ~50ns (one jhash + hash table lookup)
diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/stackmap-basic.tc b/tools/testing/selftests/ftrace/test.d/ftrace/stackmap-basic.tc
new file mode 100755
index 000000000000..3b0a7f60769f
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/ftrace/stackmap-basic.tc
@@ -0,0 +1,74 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: ftrace - stackmap basic functionality
+# requires: stack_map options/stackmap
+
+# Test that ftrace stackmap deduplication works:
+# 1. Enable stackmap + stacktrace options
+# 2. Run function tracer briefly
+# 3. Verify stack_map has entries
+# 4. Verify stack_map_stat shows hits
+# 5. Verify trace contains events
+# 6.
Verify reset works
+
+fail() {
+    echo "FAIL: $1"
+    exit_fail
+}
+
+disable_tracing
+clear_trace
+
+# Verify stackmap files exist
+test -f stack_map || fail "stack_map file missing"
+test -f stack_map_stat || fail "stack_map_stat file missing"
+test -f stack_map_bin || fail "stack_map_bin file missing"
+
+# Enable stackmap dedup
+echo 1 > options/stackmap
+echo 1 > options/stacktrace
+
+# Run function tracer briefly
+echo function > current_tracer
+enable_tracing
+sleep 1
+disable_tracing
+echo nop > current_tracer
+echo 0 > options/stackmap
+
+# Check stack_map_stat has entries
+entries=$(cat stack_map_stat | grep "^entries:" | awk '{print $2}')
+if [ "$entries" -eq 0 ]; then
+    fail "stackmap has zero entries after tracing"
+fi
+
+# Check hits > 0
+hits=$(cat stack_map_stat | grep "^hits:" | awk '{print $2}')
+if [ "$hits" -eq 0 ]; then
+    fail "stackmap has zero hits"
+fi
+
+# Record drops for the summary line (pool should be large enough for 1s trace)
+drops=$(cat stack_map_stat | grep "^drops:" | awk '{print $2}')
+
+# Check stack_map text output is parseable
+first_id=$(cat stack_map | grep "^stack_id" | head -1 | awk '{print $2}')
+if [ -z "$first_id" ]; then
+    fail "stack_map output has no stack_id entries"
+fi
+
+# Check trace has stack_id events
+count=$(cat trace | grep -c "stack_id" || true)
+if [ "$count" -eq 0 ]; then
+    fail "trace has no events"
+fi
+
+# Test reset
+echo 0 > stack_map
+entries_after=$(cat stack_map_stat | grep "^entries:" | awk '{print $2}')
+if [ "$entries_after" -ne 0 ]; then
+    fail "stackmap reset did not clear entries"
+fi
+
+echo "stackmap basic test passed: $entries unique stacks, $hits hits, $drops drops"
+exit 0
diff --git a/tools/tracing/stackmap_dump.py b/tools/tracing/stackmap_dump.py
new file mode 100755
index 000000000000..91ce80c681ea
--- /dev/null
+++ b/tools/tracing/stackmap_dump.py
@@ -0,0 +1,120 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+"""
+stackmap_dump.py - Parse and display ftrace
stack_map_bin binary export.
+
+Usage:
+    # Pull from device and parse
+    adb pull /sys/kernel/debug/tracing/stack_map_bin /tmp/stack_map.bin
+    python3 stackmap_dump.py /tmp/stack_map.bin
+
+    # With vmlinux for offline symbol resolution
+    python3 stackmap_dump.py /tmp/stack_map.bin --vmlinux vmlinux
+
+    # JSON output for tooling
+    python3 stackmap_dump.py /tmp/stack_map.bin --json
+"""
+
+import struct
+import sys
+import argparse
+import json
+import subprocess
+
+MAGIC = 0x464D5342  # 'FSMB'
+HEADER_FMT = '= 1 and lines[0] != '??':
+            return lines[0]
+    except (subprocess.TimeoutExpired, FileNotFoundError):
+        pass
+    return None
+
+
+def parse_stackmap_bin(data):
+    """Parse binary stackmap data, yield (stack_id, ref_count, [ips])."""
+    if len(data) < HEADER_SIZE:
+        raise ValueError("File too small for header")
+
+    magic, version, nr_stacks, _ = struct.unpack_from(HEADER_FMT, data, 0)
+    if magic != MAGIC:
+        raise ValueError(f"Bad magic: 0x{magic:08x}, expected 0x{MAGIC:08x}")
+    if version not in (1, 2):
+        raise ValueError(f"Unsupported version: {version}")
+
+    offset = HEADER_SIZE
+    for _ in range(nr_stacks):
+        if offset + ENTRY_SIZE > len(data):
+            break
+        stack_id, nr, ref_count, _ = struct.unpack_from(ENTRY_FMT, data, offset)
+        offset += ENTRY_SIZE
+
+        ips_size = nr * 8
+        if offset + ips_size > len(data):
+            break
+        ips = struct.unpack_from(f'<{nr}Q', data, offset)
+        offset += ips_size
+
+        yield stack_id, ref_count, list(ips)
+
+
+def main():
+    parser = argparse.ArgumentParser(description='Parse ftrace stack_map_bin')
+    parser.add_argument('file', help='Path to stack_map_bin file')
+    parser.add_argument('--vmlinux', help='Path to vmlinux for symbol resolution')
+    parser.add_argument('--json', action='store_true', help='JSON output')
+    parser.add_argument('--top', type=int, default=0,
+                        help='Show only top N stacks by ref_count')
+    args = parser.parse_args()
+
+    with open(args.file, 'rb') as f:
+        data = f.read()
+
+    stacks = list(parse_stackmap_bin(data))
+
+    if args.top > 0:
+        stacks.sort(key=lambda x: x[1], reverse=True)
+        stacks = stacks[:args.top]
+
+    if args.json:
+        output = []
+        for stack_id, ref_count, ips in stacks:
+            entry = {
+                'stack_id': stack_id,
+                'ref_count': ref_count,
+                'ips': [f'0x{ip:x}' for ip in ips]
+            }
+            if args.vmlinux:
+                entry['symbols'] = [addr2line(args.vmlinux, ip) or f'0x{ip:x}'
+                                    for ip in ips]
+            output.append(entry)
+        print(json.dumps(output, indent=2))
+    else:
+        for stack_id, ref_count, ips in stacks:
+            print(f"stack_id {stack_id} [ref {ref_count}, depth {len(ips)}]")
+            for i, ip in enumerate(ips):
+                sym = ''
+                if args.vmlinux:
+                    resolved = addr2line(args.vmlinux, ip)
+                    if resolved:
+                        sym = f' {resolved}'
+                print(f"  [{i}] 0x{ip:x}{sym}")
+            print()
+
+    print(f"Total: {len(stacks)} unique stacks", file=sys.stderr)
+
+
+if __name__ == '__main__':
+    main()
-- 
2.34.1
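
The Design section of ftrace-stackmap.rst describes linear probing over a
2x over-provisioned table backed by a fixed element pool. A rough
single-threaded model of that behaviour could look like the sketch below.
It assumes Python's built-in hash() as a stand-in for jhash and ignores the
cmpxchg-based lock-free insert entirely. StackMapSketch and its fields are
hypothetical names used only for illustration; the kernel implementation
lives elsewhere in this series and may differ, including in how ref counts
and hits are accounted.

```python
class StackMapSketch:
    """Toy model of stack dedup: linear probing, fixed pool, 2x table."""

    def __init__(self, bits=4):
        self.max_entries = 1 << bits                  # pool capacity (unique stacks)
        self.table = [None] * (2 * self.max_entries)  # 2x over-provisioned slots
        self.stacks = []                              # stack_id -> [ips, ref_count]
        self.hits = 0
        self.drops = 0

    def get_id(self, ips):
        """Return a stack_id for ips, or -1 when the pool is exhausted."""
        key = tuple(ips)
        h = hash(key) & 0xFFFFFFFF                    # stand-in for a 32-bit jhash
        size = len(self.table)
        idx = h % size
        for _ in range(size):
            slot = self.table[idx]
            if slot is None:
                # Empty slot: the stack is new, so claim a pool element.
                if len(self.stacks) >= self.max_entries:
                    self.drops += 1
                    return -1
                stack_id = len(self.stacks)
                self.stacks.append([key, 1])
                self.table[idx] = (h, stack_id)
                return stack_id
            slot_hash, stack_id = slot
            # Compare hashes first, then confirm with a full comparison.
            if slot_hash == h and self.stacks[stack_id][0] == key:
                self.stacks[stack_id][1] += 1
                self.hits += 1
                return stack_id
            idx = (idx + 1) % size                    # linear probe to next slot
        self.drops += 1                               # table full (not reachable with 2x slots)
        return -1
```

A caller would treat a negative return the same way the patched
__ftrace_trace_stack() does: fall back to recording the full stack trace.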