From: Puranjay Mohan
To: bpf@vger.kernel.org
Cc: Puranjay Mohan, Puranjay Mohan, Alexei Starovoitov, Daniel Borkmann,
    John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman,
    Song Liu, Yonghong Song, Will Deacon, Mark Rutland, Catalin Marinas,
    Leo Yan, Rob Herring, Peter Zijlstra, Ingo Molnar,
    Arnaldo Carvalho de Melo, Namhyung Kim, James Clark, Ian Rogers,
    Adrian Hunter, Shuah Khan, Breno Leitao, Ravi Bangoria,
    Stephane Eranian, Kumar Kartikeya Dwivedi, Usama Arif,
    linux-arm-kernel@lists.infradead.org, linux-perf-users@vger.kernel.org,
    linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org,
    kernel-team@meta.com
Subject: [PATCH v3 1/4] perf/core: Fix NULL pmu_ctx passed to PMU sched_task callback
Date: Mon, 13 Apr 2026 11:57:20 -0700
Message-ID: <20260413185740.3286146-2-puranjay@kernel.org>
In-Reply-To: <20260413185740.3286146-1-puranjay@kernel.org>
References: <20260413185740.3286146-1-puranjay@kernel.org>

__perf_pmu_sched_task() passes cpc->task_epc to pmu->sched_task(), which
is NULL when no per-task events exist for this PMU. With CPU-wide
branch-stack events, PMU callbacks that dereference pmu_ctx crash.

On ARM64 this is easily triggered with:

  perf record -b -e cycles -a -- ls

which crashes on the first context switch:

  Unable to handle kernel NULL pointer dereference at virtual address 00[.]
  PC is at armv8pmu_sched_task+0x14/0x50
  Call trace:
   armv8pmu_sched_task+0x14/0x50 (P)
   perf_pmu_sched_task+0xac/0x108
   __perf_event_task_sched_out+0x6c/0xe0

Fall back to &cpc->epc when cpc->task_epc is NULL so the callback always
receives a valid pmu_ctx.
Fixes: bd2756811766 ("perf: Rewrite core context handling")
Signed-off-by: Puranjay Mohan
---
 kernel/events/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 1f5699b339ec..2a8fb78e1347 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3906,7 +3906,8 @@ static void __perf_pmu_sched_task(struct perf_cpu_pmu_context *cpc,
 	perf_ctx_lock(cpuctx, cpuctx->task_ctx);
 	perf_pmu_disable(pmu);
 
-	pmu->sched_task(cpc->task_epc, task, sched_in);
+	pmu->sched_task(cpc->task_epc ? cpc->task_epc : &cpc->epc,
+			task, sched_in);
 
 	perf_pmu_enable(pmu);
 	perf_ctx_unlock(cpuctx, cpuctx->task_ctx);
-- 
2.52.0

From: Puranjay Mohan
To: bpf@vger.kernel.org
Subject: [PATCH v3 2/4] perf: Use a union to clear branch entry bitfields
Date: Mon, 13 Apr 2026 11:57:21 -0700
Message-ID: <20260413185740.3286146-3-puranjay@kernel.org>
In-Reply-To: <20260413185740.3286146-1-puranjay@kernel.org>
References: <20260413185740.3286146-1-puranjay@kernel.org>

perf_clear_branch_entry_bitfields() zeroes individual bitfields of struct
perf_branch_entry but has repeatedly fallen out of sync when new fields
were added (new_type and priv were missed).

Wrap the bitfields in an anonymous struct inside a union with a u64
bitfields member, and clear them all with a single assignment. This
avoids having to update the clearing function every time a new bitfield
is added.

Fixes: bfe4daf850f4 ("perf/core: Add perf_clear_branch_entry_bitfields() helper")
Signed-off-by: Puranjay Mohan
---
 include/linux/perf_event.h      |  9 +--------
 include/uapi/linux/perf_event.h | 25 +++++++++++++++----------
 2 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 48d851fbd8ea..f7360c43f902 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1474,14 +1474,7 @@ static inline u32 perf_sample_data_size(struct perf_sample_data *data,
  */
 static inline void perf_clear_branch_entry_bitfields(struct perf_branch_entry *br)
 {
-	br->mispred = 0;
-	br->predicted = 0;
-	br->in_tx = 0;
-	br->abort = 0;
-	br->cycles = 0;
-	br->type = 0;
-	br->spec = PERF_BR_SPEC_NA;
-	br->reserved = 0;
+	br->bitfields = 0;
 }
 
 extern void perf_output_sample(struct perf_output_handle *handle,
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index fd10aa8d697f..c2e7b1b1c4fa 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1491,16 +1491,21 @@ union perf_mem_data_src {
 struct perf_branch_entry {
 	__u64	from;
 	__u64	to;
-	__u64	mispred   :  1, /* target mispredicted */
-		predicted :  1, /* target predicted */
-		in_tx     :  1, /* in transaction */
-		abort     :  1, /* transaction abort */
-		cycles    : 16, /* cycle count to last branch */
-		type      :  4, /* branch type */
-		spec      :  2, /* branch speculation info */
-		new_type  :  4, /* additional branch type */
-		priv      :  3, /* privilege level */
-		reserved  : 31;
+	union {
+		struct {
+			__u64	mispred   :  1, /* target mispredicted */
+				predicted :  1, /* target predicted */
+				in_tx     :  1, /* in transaction */
+				abort     :  1, /* transaction abort */
+				cycles    : 16, /* cycle count to last branch */
+				type      :  4, /* branch type */
+				spec      :  2, /* branch speculation info */
+				new_type  :  4, /* additional branch type */
+				priv      :  3, /* privilege level */
+				reserved  : 31;
+		};
+		__u64	bitfields;
+	};
 };
 
 /* Size of used info bits in struct perf_branch_entry */
-- 
2.52.0

From: Puranjay Mohan
To: bpf@vger.kernel.org
Subject: [PATCH v3 3/4] perf/arm64: Add BRBE support for bpf_get_branch_snapshot()
Date: Mon, 13 Apr 2026 11:57:22 -0700
Message-ID: <20260413185740.3286146-4-puranjay@kernel.org>
In-Reply-To: <20260413185740.3286146-1-puranjay@kernel.org>
References: <20260413185740.3286146-1-puranjay@kernel.org>

Enable bpf_get_branch_snapshot() on ARM64 by implementing the
perf_snapshot_branch_stack static call for BRBE.

BRBE is paused before masking exceptions to avoid branch buffer
pollution from trace_hardirqs_off().
Exceptions are then masked with local_daif_save() to prevent PMU
overflow pseudo-NMIs from interfering. If an overflow between pause and
DAIF save re-enables BRBE, the snapshot detects this via
BRBFCR_EL1.PAUSED and bails out.

Branch records are read using perf_entry_from_brbe_regset() with a NULL
event pointer to bypass event-specific filtering. The buffer is
invalidated after reading.

Introduce a for_each_brbe_entry() iterator to deduplicate bank iteration
between brbe_read_filtered_entries() and the snapshot.

Signed-off-by: Puranjay Mohan
---
 drivers/perf/arm_brbe.c  | 107 ++++++++++++++++++++++++++++++++-------
 drivers/perf/arm_brbe.h  |   9 ++++
 drivers/perf/arm_pmuv3.c |   5 +-
 3 files changed, 103 insertions(+), 18 deletions(-)

diff --git a/drivers/perf/arm_brbe.c b/drivers/perf/arm_brbe.c
index ba554e0c846c..fd62019ddc83 100644
--- a/drivers/perf/arm_brbe.c
+++ b/drivers/perf/arm_brbe.c
@@ -9,6 +9,7 @@
 #include
 #include
 #include
+#include
 #include "arm_brbe.h"
 
 #define BRBFCR_EL1_BRANCH_FILTERS (BRBFCR_EL1_DIRECT |	\
@@ -271,6 +272,20 @@ static void select_brbe_bank(int bank)
 	isb();
 }
 
+static inline void __brbe_advance(int *bank, int *idx, int nr_hw)
+{
+	if (++(*idx) >= BRBE_BANK_MAX_ENTRIES &&
+	    *bank * BRBE_BANK_MAX_ENTRIES + *idx < nr_hw) {
+		*idx = 0;
+		select_brbe_bank(++(*bank));
+	}
+}
+
+#define for_each_brbe_entry(idx, nr_hw)					\
+	for (int __bank = (select_brbe_bank(0), 0), idx = 0;		\
+	     __bank * BRBE_BANK_MAX_ENTRIES + idx < (nr_hw);		\
+	     __brbe_advance(&__bank, &idx, (nr_hw)))
+
 static bool __read_brbe_regset(struct brbe_regset *entry, int idx)
 {
 	entry->brbinf = get_brbinf_reg(idx);
@@ -618,10 +633,10 @@ static bool perf_entry_from_brbe_regset(int index, struct perf_branch_entry *ent
 
 	brbe_set_perf_entry_type(entry, brbinf);
 
-	if (!branch_sample_no_cycles(event))
+	if (!event || !branch_sample_no_cycles(event))
 		entry->cycles = brbinf_get_cycles(brbinf);
 
-	if (!branch_sample_no_flags(event)) {
+	if (!event || !branch_sample_no_flags(event)) {
 		/* Mispredict info is available for source only and complete branch records. */
 		if (!brbe_record_is_target_only(brbinf)) {
 			entry->mispred = brbinf_get_mispredict(brbinf);
@@ -774,32 +789,90 @@ void brbe_read_filtered_entries(struct perf_branch_stack *branch_stack,
 {
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	int nr_hw = brbe_num_branch_records(cpu_pmu);
-	int nr_banks = DIV_ROUND_UP(nr_hw, BRBE_BANK_MAX_ENTRIES);
 	int nr_filtered = 0;
 	u64 branch_sample_type = event->attr.branch_sample_type;
 	DECLARE_BITMAP(event_type_mask, PERF_BR_ARM64_MAX);
 
 	prepare_event_branch_type_mask(branch_sample_type, event_type_mask);
 
-	for (int bank = 0; bank < nr_banks; bank++) {
-		int nr_remaining = nr_hw - (bank * BRBE_BANK_MAX_ENTRIES);
-		int nr_this_bank = min(nr_remaining, BRBE_BANK_MAX_ENTRIES);
+	for_each_brbe_entry(i, nr_hw) {
+		struct perf_branch_entry *pbe = &branch_stack->entries[nr_filtered];
 
-		select_brbe_bank(bank);
+		if (!perf_entry_from_brbe_regset(i, pbe, event))
+			break;
 
-		for (int i = 0; i < nr_this_bank; i++) {
-			struct perf_branch_entry *pbe = &branch_stack->entries[nr_filtered];
+		if (!filter_branch_record(pbe, branch_sample_type, event_type_mask))
+			continue;
 
-			if (!perf_entry_from_brbe_regset(i, pbe, event))
-				goto done;
+		nr_filtered++;
+	}
 
-			if (!filter_branch_record(pbe, branch_sample_type, event_type_mask))
-				continue;
+	branch_stack->nr = nr_filtered;
+}
 
-			nr_filtered++;
-		}
+/*
+ * Best-effort BRBE snapshot for BPF tracing. Pause BRBE to avoid
+ * self-recording and return 0 if the snapshot state appears disturbed.
+ */
+int arm_brbe_snapshot_branch_stack(struct perf_branch_entry *entries, unsigned int cnt)
+{
+	unsigned long flags;
+	int nr_hw, nr_copied = 0;
+	u64 brbfcr, brbcr;
+
+	if (!cnt)
+		return 0;
+
+	/*
+	 * Pause BRBE first to avoid recording our own branches. The
+	 * sysreg read/write and ISB are branchless, so pausing before
+	 * checking BRBCR avoids polluting the buffer with our own
+	 * conditional branches.
+	 */
+	brbfcr = read_sysreg_s(SYS_BRBFCR_EL1);
+	brbcr = read_sysreg_s(SYS_BRBCR_EL1);
+	write_sysreg_s(brbfcr | BRBFCR_EL1_PAUSED, SYS_BRBFCR_EL1);
+	isb();
+
+	/* Bail out if BRBE is not enabled (BRBCR_EL1 == 0). */
+	if (!brbcr) {
+		write_sysreg_s(brbfcr, SYS_BRBFCR_EL1);
+		return 0;
 	}
 
-done:
-	branch_stack->nr = nr_filtered;
+	/* Block local exception delivery while reading the buffer. */
+	flags = local_daif_save();
+
+	/*
+	 * A PMU overflow before local_daif_save() could have re-enabled
+	 * BRBE, clearing the PAUSED bit. The overflow handler already
+	 * restored BRBE to its correct state, so just bail out.
+	 */
+	if (!(read_sysreg_s(SYS_BRBFCR_EL1) & BRBFCR_EL1_PAUSED)) {
+		local_daif_restore(flags);
+		return 0;
+	}
+
+	nr_hw = FIELD_GET(BRBIDR0_EL1_NUMREC_MASK,
+			  read_sysreg_s(SYS_BRBIDR0_EL1));
+
+	for_each_brbe_entry(i, nr_hw) {
+		if (nr_copied >= cnt)
+			break;
+
+		if (!perf_entry_from_brbe_regset(i, &entries[nr_copied], NULL))
+			break;
+
+		nr_copied++;
+	}
+
+	brbe_invalidate();
+
+	/* Restore BRBCR before unpausing via BRBFCR, matching brbe_enable(). */
+	write_sysreg_s(brbcr, SYS_BRBCR_EL1);
+	isb();
+	write_sysreg_s(brbfcr, SYS_BRBFCR_EL1);
+	local_daif_restore(flags);
+
+	return nr_copied;
 }
diff --git a/drivers/perf/arm_brbe.h b/drivers/perf/arm_brbe.h
index b7c7d8796c86..c2a1824437fb 100644
--- a/drivers/perf/arm_brbe.h
+++ b/drivers/perf/arm_brbe.h
@@ -10,6 +10,7 @@
 struct arm_pmu;
 struct perf_branch_stack;
 struct perf_event;
+struct perf_branch_entry;
 
 #ifdef CONFIG_ARM64_BRBE
 void brbe_probe(struct arm_pmu *arm_pmu);
@@ -22,6 +23,8 @@ void brbe_disable(void);
 bool brbe_branch_attr_valid(struct perf_event *event);
 void brbe_read_filtered_entries(struct perf_branch_stack *branch_stack,
 				const struct perf_event *event);
+int arm_brbe_snapshot_branch_stack(struct perf_branch_entry *entries,
+				   unsigned int cnt);
 #else
 static inline void brbe_probe(struct arm_pmu *arm_pmu) { }
 static inline unsigned int brbe_num_branch_records(const struct arm_pmu *armpmu)
@@ -44,4 +47,10 @@ static void brbe_read_filtered_entries(struct perf_branch_stack *branch_stack,
 					const struct perf_event *event)
 {
 }
+
+static inline int arm_brbe_snapshot_branch_stack(struct perf_branch_entry *entries,
+						 unsigned int cnt)
+{
+	return 0;
+}
 #endif
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 8014ff766cff..1a9f129a0f94 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -1449,8 +1449,11 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu, char *name,
 	cpu_pmu->set_event_filter = armv8pmu_set_event_filter;
 
 	cpu_pmu->pmu.event_idx = armv8pmu_user_event_idx;
-	if (brbe_num_branch_records(cpu_pmu))
+	if (brbe_num_branch_records(cpu_pmu)) {
 		cpu_pmu->pmu.sched_task = armv8pmu_sched_task;
+		static_call_update(perf_snapshot_branch_stack,
+				   arm_brbe_snapshot_branch_stack);
+	}
 
 	cpu_pmu->name = name;
 	cpu_pmu->map_event = map_event;
-- 
2.52.0

From: Puranjay Mohan
To: bpf@vger.kernel.org
Subject: [PATCH v3 4/4] selftests/bpf: Adjust wasted entries threshold for ARM64 BRBE
Date: Mon, 13 Apr 2026 11:57:23 -0700
Message-ID: <20260413185740.3286146-5-puranjay@kernel.org>
In-Reply-To: <20260413185740.3286146-1-puranjay@kernel.org>
References: <20260413185740.3286146-1-puranjay@kernel.org>

The get_branch_snapshot test checks that bpf_get_branch_snapshot()
doesn't waste too many branch entries on infrastructure overhead. The
threshold of < 10 was calibrated for x86 where about 7 entries are
wasted. On ARM64, the BPF trampoline generates more branches than x86,
resulting in about 13 wasted entries.
The overhead comes from the BPF trampoline calling __bpf_prog_enter_recur,
which on ARM64 makes out-of-line calls to __rcu_read_lock and generates
more conditional branches than x86:

  [#12] bpf_testmod_loop_test+0x40 -> bpf_trampoline_...+0x48
  [#11] bpf_trampoline_...+0x68 -> __bpf_prog_enter_recur+0x0
  [#10] __bpf_prog_enter_recur+0x20 -> __bpf_prog_enter_recur+0x118
  [#09] __bpf_prog_enter_recur+0x154 -> __bpf_prog_enter_recur+0x160
  [#08] __bpf_prog_enter_recur+0x164 -> __bpf_prog_enter_recur+0x2c
  [#07] __bpf_prog_enter_recur+0x2c -> __rcu_read_lock+0x0
  [#06] __rcu_read_lock+0x18 -> __bpf_prog_enter_recur+0x30
  [#05] __bpf_prog_enter_recur+0x9c -> __bpf_prog_enter_recur+0xf0
  [#04] __bpf_prog_enter_recur+0xf4 -> __bpf_prog_enter_recur+0xa8
  [#03] __bpf_prog_enter_recur+0xb8 -> __bpf_prog_enter_recur+0x100
  [#02] __bpf_prog_enter_recur+0x114 -> bpf_trampoline_...+0x6c
  [#01] bpf_trampoline_...+0x78 -> bpf_prog_...test1+0x0
  [#00] bpf_prog_...test1+0x58 -> arm_brbe_snapshot_branch_stack+0x0

Use an architecture-specific threshold of < 14 for ARM64 to accommodate
this overhead while still detecting regressions.

Signed-off-by: Puranjay Mohan
---
 .../selftests/bpf/prog_tests/get_branch_snapshot.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/get_branch_snapshot.c b/tools/testing/selftests/bpf/prog_tests/get_branch_snapshot.c
index 0394a1156d99..8d1a3480767f 100644
--- a/tools/testing/selftests/bpf/prog_tests/get_branch_snapshot.c
+++ b/tools/testing/selftests/bpf/prog_tests/get_branch_snapshot.c
@@ -116,13 +116,18 @@ void serial_test_get_branch_snapshot(void)
 
 	ASSERT_GT(skel->bss->test1_hits, 6, "find_looptest_in_lbr");
 
-	/* Given we stop LBR in software, we will waste a few entries.
+	/* Given we stop LBR/BRBE in software, we will waste a few entries.
 	 * But we should try to waste as few as possible entries. We are at
-	 * about 7 on x86_64 systems.
-	 * Add a check for < 10 so that we get heads-up when something
-	 * changes and wastes too many entries.
+	 * about 7 on x86_64 and about 13 on arm64 systems (the arm64 BPF
+	 * trampoline generates more branches than x86_64).
+	 * Add a check so that we get heads-up when something changes and
+	 * wastes too many entries.
 	 */
+#if defined(__aarch64__)
+	ASSERT_LT(skel->bss->wasted_entries, 14, "check_wasted_entries");
+#else
 	ASSERT_LT(skel->bss->wasted_entries, 10, "check_wasted_entries");
+#endif
 
 cleanup:
 	get_branch_snapshot__destroy(skel);
-- 
2.52.0