From: Leon Hwang
To: bpf@vger.kernel.org
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
    Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend, KP Singh,
    Stanislav Fomichev, Hao Luo, Jiri Olsa, David S. Miller, David Ahern,
    Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
    x86@kernel.org, H. Peter Anvin, Matt Bobrowski, Steven Rostedt,
    Masami Hiramatsu, Mathieu Desnoyers, Shuah Khan, Leon Hwang,
    netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-trace-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
    kernel-patches-bot@fb.com
Subject: [PATCH bpf-next 1/3] bpf, x64: Call perf_snapshot_branch_stack in trampoline
Date: Fri, 9 Jan 2026 23:34:18 +0800
Message-ID: <20260109153420.32181-2-leon.hwang@linux.dev>
In-Reply-To: <20260109153420.32181-1-leon.hwang@linux.dev>
References: <20260109153420.32181-1-leon.hwang@linux.dev>

When the PMU LBR is running in branch-sensitive mode,
perf_snapshot_branch_stack() may capture branch entries from the
trampoline entry up to the call site inside a BPF program. These branch
entries are not useful for analyzing the control flow of the tracee.

To eliminate such noise for tracing programs, take the branch snapshot
as early as possible:

* Call perf_snapshot_branch_stack() at the very beginning of the
  trampoline for fentry programs.
* Call perf_snapshot_branch_stack() immediately after invoking the
  tracee for fexit programs.

With this change, LBR snapshots remain meaningful even when multiple
BPF programs execute before the one requesting LBR data. In addition,
more relevant branch entries can be captured on AMD CPUs, which provide
a 16-entry-deep LBR stack.

Signed-off-by: Leon Hwang
---
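A note for reviewers (kept below the "---" cut so it stays out of the git
history): the instruction sequence emitted by invoke_branch_snapshot() is
the "Emit:" comment in that function, expanded. In C it is roughly
equivalent to the sketch below; bpf_tramp_take_branch_snapshot() is only an
illustrative name, the trampoline gets these instructions emitted inline
rather than a call to such a helper.

	/* Sketch only, x86-64 with CONFIG_SMP; mirrors the emitted code. */
	static __always_inline void bpf_tramp_take_branch_snapshot(void)
	{
		/* rbx := &bpf_branch_snapshot adjusted by gs:[this_cpu_off],
		 * i.e. this CPU's slot of the per-CPU buffer.
		 */
		struct bpf_tramp_branch_entries *br = this_cpu_ptr(&bpf_branch_snapshot);

		/* rdi := br->entries, esi := x86_pmu.lbr_nr, then the call
		 * through the static call; eax (number of captured entries)
		 * is stored to br->cnt.
		 */
		br->cnt = static_call(perf_snapshot_branch_stack)(br->entries,
								  x86_pmu.lbr_nr);
	}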
 arch/x86/net/bpf_jit_comp.c | 66 +++++++++++++++++++++++++++++++++++++
 include/linux/bpf.h         | 16 ++++++++-
 2 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index e3b1c4b1d550..a71a6c675392 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -19,6 +20,7 @@
 #include
 #include
 #include
+#include "../events/perf_event.h"
 
 static bool all_callee_regs_used[4] = {true, true, true, true};
 
@@ -3137,6 +3139,54 @@ static int invoke_bpf_mod_ret(const struct btf_func_model *m, u8 **pprog,
 	return 0;
 }
 
+DEFINE_PER_CPU(struct bpf_tramp_branch_entries, bpf_branch_snapshot);
+
+static int invoke_branch_snapshot(u8 **pprog, void *image, void *rw_image)
+{
+	struct bpf_tramp_branch_entries __percpu *pptr = &bpf_branch_snapshot;
+	u8 *prog = *pprog;
+
+	/*
+	 * Emit:
+	 *
+	 * struct bpf_tramp_branch_entries *br = this_cpu_ptr(&bpf_branch_snapshot);
+	 * br->cnt = static_call(perf_snapshot_branch_stack)(br->entries, x86_pmu.lbr_nr);
+	 */
+
+	/* mov rbx, &bpf_branch_snapshot */
+	emit_mov_imm64(&prog, BPF_REG_6, (long) pptr >> 32, (u32)(long) pptr);
+#ifdef CONFIG_SMP
+	/* add rbx, gs:[] */
+	EMIT2(0x65, 0x48);
+	EMIT3(0x03, 0x1C, 0x25);
+	EMIT((u32)(unsigned long)&this_cpu_off, 4);
+#endif
+	/* mov esi, x86_pmu.lbr_nr */
+	EMIT1_off32(0xBE, x86_pmu.lbr_nr);
+	/* lea rdi, [rbx + offsetof(struct bpf_tramp_branch_entries, entries)] */
+	EMIT4(0x48, 0x8D, 0x7B, offsetof(struct bpf_tramp_branch_entries, entries));
+	/* call static_call_query(perf_snapshot_branch_stack) */
+	if (emit_rsb_call(&prog, static_call_query(perf_snapshot_branch_stack),
+			  image + (prog - (u8 *)rw_image)))
+		return -EINVAL;
+	/* mov dword ptr [rbx], eax */
+	EMIT2(0x89, 0x03);
+
+	*pprog = prog;
+	return 0;
+}
+
+static bool bpf_prog_copy_branch_snapshot(struct bpf_tramp_links *tl)
+{
+	bool copy = false;
+	int i;
+
+	for (i = 0; i < tl->nr_links; i++)
+		copy = copy || tl->links[i]->link.prog->copy_branch_snapshot;
+
+	return copy;
+}
+
 /* mov rax, qword ptr [rbp - rounded_stack_depth - 8] */
 #define LOAD_TRAMP_TAIL_CALL_CNT_PTR(stack) \
 	__LOAD_TCC_PTR(-round_up(stack, 8) - 8)
@@ -3366,6 +3416,14 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
 
 	save_args(m, &prog, regs_off, false, flags);
 
+	if (bpf_prog_copy_branch_snapshot(fentry)) {
+		/* Get branch snapshot asap. */
+		if (invoke_branch_snapshot(&prog, image, rw_image)) {
+			ret = -EINVAL;
+			goto cleanup;
+		}
+	}
+
 	if (flags & BPF_TRAMP_F_CALL_ORIG) {
 		/* arg1: mov rdi, im */
 		emit_mov_imm64(&prog, BPF_REG_1, (long) im >> 32, (u32) (long) im);
@@ -3422,6 +3480,14 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
 		emit_nops(&prog, X86_PATCH_SIZE);
 	}
 
+	if (bpf_prog_copy_branch_snapshot(fexit)) {
+		/* Get branch snapshot asap. */
+		if (invoke_branch_snapshot(&prog, image, rw_image)) {
+			ret = -EINVAL;
+			goto cleanup;
+		}
+	}
+
 	if (fmod_ret->nr_links) {
 		/* From Intel 64 and IA-32 Architectures Optimization
 		 * Reference Manual, 3.4.1.4 Code Alignment, Assembly/Compiler
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5936f8e2996f..16dc21836a06 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -6,6 +6,7 @@
 
 #include
 #include
+#include
 
 #include
 #include
@@ -1236,6 +1237,18 @@ struct bpf_tramp_links {
 
 struct bpf_tramp_run_ctx;
 
+#ifdef CONFIG_X86_64
+/* Same as MAX_LBR_ENTRIES in arch/x86/events/perf_event.h */
+#define MAX_BRANCH_ENTRIES 32
+
+struct bpf_tramp_branch_entries {
+	int cnt;
+	struct perf_branch_entry entries[MAX_BRANCH_ENTRIES];
+};
+
+DECLARE_PER_CPU(struct bpf_tramp_branch_entries, bpf_branch_snapshot);
+#endif
+
 /* Different use cases for BPF trampoline:
  * 1. replace nop at the function entry (kprobe equivalent)
  *    flags = BPF_TRAMP_F_RESTORE_REGS
@@ -1780,7 +1793,8 @@ struct bpf_prog {
 				call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
 				call_get_func_ip:1, /* Do we call get_func_ip() */
 				tstamp_type_access:1, /* Accessed __sk_buff->tstamp_type */
-				sleepable:1; /* BPF program is sleepable */
+				sleepable:1, /* BPF program is sleepable */
+				copy_branch_snapshot:1; /* Copy branch snapshot from prefetched buffer */
 	enum bpf_prog_type	type;		/* Type of BPF program */
 	enum bpf_attach_type	expected_attach_type; /* For some prog types */
 	u32			len;		/* Number of filter blocks */
-- 
2.52.0
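For context, and purely as an illustration that is not part of this patch:
the kind of consumer the commit message refers to is a tracing program that
reads LBR data through the existing bpf_get_branch_snapshot() helper, as in
the sketch below. The attach target and names are arbitrary, and how a
program ends up with its copy_branch_snapshot bit set is handled outside
this patch.

	/* lbr_consumer.bpf.c - minimal fentry consumer of the branch snapshot */
	#include "vmlinux.h"
	#include <bpf/bpf_helpers.h>
	#include <bpf/bpf_tracing.h>

	#define MAX_ENTRIES 32	/* matches MAX_BRANCH_ENTRIES above */

	struct perf_branch_entry entries[MAX_ENTRIES];
	__u64 nr_entries;

	SEC("fentry/do_sys_openat2")
	int BPF_PROG(lbr_on_openat2)
	{
		long written;

		/* flags must be 0; the helper returns a byte count */
		written = bpf_get_branch_snapshot(entries, sizeof(entries), 0);
		nr_entries = written > 0 ? written / sizeof(struct perf_branch_entry) : 0;
		return 0;
	}

	char _license[] SEC("license") = "GPL";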