From nobody Mon May 25 04:34:28 2026
Date: Mon, 18 May 2026 15:43:24 -0700
In-Reply-To: <20260518224325.3037838-1-irogers@google.com>
References: <20260518203805.2955241-1-irogers@google.com> <20260518224325.3037838-1-irogers@google.com>
Message-ID: <20260518224325.3037838-2-irogers@google.com>
Subject: [PATCH v8 1/2] perf event: Fix size of synthesized sample with branch stacks
From: Ian Rogers
To: irogers@google.com, acme@kernel.org, namhyung@kernel.org
Cc: adrian.hunter@intel.com, dapeng1.mi@linux.intel.com, james.clark@linaro.org, leo.yan@linux.dev, linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, mingo@redhat.com, peterz@infradead.org, ravi.bangoria@amd.com, thomas.falcon@intel.com
X-Mailing-List: linux-kernel@vger.kernel.org
X-Mailer: git-send-email 2.54.0.631.ge1b05301d1-goog
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Synthesizing branch stacks for Intel PT highlighted an issue where
PERF_SAMPLE_BRANCH_HW_INDEX was assumed to always be set in the
perf_event_attr branch_sample_type. This caused an incorrect size
calculation. Fix the writing of the nr and hw_idx values during sample
event synthesis by passing the branch_sample_type into the sample size
and synthesis functions. Also update hardware tracers (Intel PT, ARM
SPE, CS-ETM) to retrieve and pass their branch_sample_type dynamically
to prevent payload misalignment.

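The layout issue described above can be sketched in isolation. This is
a minimal illustration only, not the perf code: the sketch_* names, the
fixed-capacity entries array and the flag value are all assumptions
made up for the example, and the no_hw_idx handling in the real patch
is left out. It shows that the nr word is always written, while the
hw_idx word is counted and written only when the flag is set in
branch_sample_type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not the perf definitions. */
#define SKETCH_BRANCH_HW_INDEX (1ULL << 0) /* placeholder bit, not the real flag value */
#define SKETCH_MAX_ENTRIES 4               /* fixed capacity keeps the sketch self-contained */

struct sketch_entry {
	uint64_t from, to, flags;
};

struct sketch_branch_stack {
	uint64_t nr;
	uint64_t hw_idx;
	struct sketch_entry entries[SKETCH_MAX_ENTRIES];
};

/* Bytes the branch stack occupies in the sample payload. */
size_t sketch_branch_stack_size(const struct sketch_branch_stack *bs,
				uint64_t branch_sample_type)
{
	size_t sz = sizeof(uint64_t); /* nr is always present */

	assert(bs->nr <= SKETCH_MAX_ENTRIES);
	if (branch_sample_type & SKETCH_BRANCH_HW_INDEX)
		sz += sizeof(uint64_t); /* hw_idx only when requested */
	return sz + bs->nr * sizeof(struct sketch_entry);
}

/* Write nr, optionally hw_idx, then the entries; return the next free slot. */
uint64_t *sketch_write_branch_stack(uint64_t *array,
				    const struct sketch_branch_stack *bs,
				    uint64_t branch_sample_type)
{
	size_t sz = bs->nr * sizeof(struct sketch_entry);

	assert(bs->nr <= SKETCH_MAX_ENTRIES);
	*array++ = bs->nr;
	if (branch_sample_type & SKETCH_BRANCH_HW_INDEX)
		*array++ = bs->hw_idx;
	memcpy(array, bs->entries, sz);
	return array + sz / sizeof(uint64_t);
}
```

Because the size function and the writer test the same flag, the size
placed in the event header and the number of words actually written
stay in agreement, which is the property the patch restores.
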
Fixes: d3f85437ad6a ("perf evsel: Support PERF_SAMPLE_BRANCH_HW_INDEX")
Assisted-by: Gemini:gemini-3.1-pro-preview
Signed-off-by: Ian Rogers
Acked-by: Namhyung Kim
---
 tools/perf/bench/inject-buildid.c  |  9 ++++++---
 tools/perf/builtin-inject.c        | 12 +++++++++---
 tools/perf/tests/dlfilter-test.c   |  8 ++++++--
 tools/perf/tests/sample-parsing.c  |  5 +++--
 tools/perf/util/arm-spe.c          | 28 ++++++++++++++++++++++++----
 tools/perf/util/cs-etm.c           | 28 +++++++++++++++++++++++-----
 tools/perf/util/intel-bts.c        |  3 ++-
 tools/perf/util/intel-pt.c         | 27 +++++++++++++++++++++++----
 tools/perf/util/synthetic-events.c | 25 ++++++++++++++++++-------
 tools/perf/util/synthetic-events.h |  6 ++++--
 10 files changed, 118 insertions(+), 33 deletions(-)

diff --git a/tools/perf/bench/inject-buildid.c b/tools/perf/bench/inject-buildid.c
index aad572a78d7f..bfd2c5ec9488 100644
--- a/tools/perf/bench/inject-buildid.c
+++ b/tools/perf/bench/inject-buildid.c
@@ -228,9 +228,12 @@ static ssize_t synthesize_sample(struct bench_data *data, struct bench_dso *dso,
 
 	event.header.type = PERF_RECORD_SAMPLE;
 	event.header.misc = PERF_RECORD_MISC_USER;
-	event.header.size = perf_event__sample_event_size(&sample, bench_sample_type, 0);
-
-	perf_event__synthesize_sample(&event, bench_sample_type, 0, &sample);
+	event.header.size = perf_event__sample_event_size(&sample, bench_sample_type,
+							  /*read_format=*/0,
+							  /*branch_sample_type=*/0);
+	perf_event__synthesize_sample(&event, bench_sample_type,
+				      /*read_format=*/0,
+				      /*branch_sample_type=*/0, &sample);
 
 	return writen(data->input_pipe[1], &event, event.header.size);
 }
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index a2493f1097df..2f20e782c7f2 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -465,8 +465,13 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool,
 	/* remove sample_type {STACK,REGS}_USER for synthesize */
 	sample_type &= ~(PERF_SAMPLE_STACK_USER | PERF_SAMPLE_REGS_USER);
 
-	perf_event__synthesize_sample(event_copy, sample_type,
-				      evsel->core.attr.read_format, sample);
+	ret = perf_event__synthesize_sample(event_copy, sample_type,
+					    evsel->core.attr.read_format,
+					    evsel->core.attr.branch_sample_type, sample);
+	if (ret) {
+		pr_err("Failed to synthesize sample\n");
+		return ret;
+	}
 	return perf_event__repipe_synth(tool, event_copy);
 }
 
@@ -1102,7 +1107,8 @@ static int perf_inject__sched_stat(const struct perf_tool *tool,
 	sample_sw.period = sample->period;
 	sample_sw.time = sample->time;
 	perf_event__synthesize_sample(event_sw, evsel->core.attr.sample_type,
-				      evsel->core.attr.read_format, &sample_sw);
+				      evsel->core.attr.read_format,
+				      evsel->core.attr.branch_sample_type, &sample_sw);
 	build_id__mark_dso_hit(tool, event_sw, &sample_sw, evsel, machine);
 	ret = perf_event__repipe(tool, event_sw, &sample_sw, machine);
 	perf_sample__exit(&sample_sw);
diff --git a/tools/perf/tests/dlfilter-test.c b/tools/perf/tests/dlfilter-test.c
index e63790c61d53..204663571943 100644
--- a/tools/perf/tests/dlfilter-test.c
+++ b/tools/perf/tests/dlfilter-test.c
@@ -188,8 +188,12 @@ static int write_sample(struct test_data *td, u64 sample_type, u64 id, pid_t pid
 
 	event->header.type = PERF_RECORD_SAMPLE;
 	event->header.misc = PERF_RECORD_MISC_USER;
-	event->header.size = perf_event__sample_event_size(&sample, sample_type, 0);
-	err = perf_event__synthesize_sample(event, sample_type, 0, &sample);
+	event->header.size = perf_event__sample_event_size(&sample, sample_type,
+							   /*read_format=*/0,
+							   /*branch_sample_type=*/0);
+	err = perf_event__synthesize_sample(event, sample_type,
+					    /*read_format=*/0,
+					    /*branch_sample_type=*/0, &sample);
 	if (err)
 		return test_result("perf_event__synthesize_sample() failed", TEST_FAIL);
 
diff --git a/tools/perf/tests/sample-parsing.c b/tools/perf/tests/sample-parsing.c
index a7327c942ca2..55f0b73ca20e 100644
--- a/tools/perf/tests/sample-parsing.c
+++ b/tools/perf/tests/sample-parsing.c
@@ -310,7 +310,8 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
 		sample.read.one.lost = 1;
 	}
 
-	sz = perf_event__sample_event_size(&sample, sample_type, read_format);
+	sz = perf_event__sample_event_size(&sample, sample_type, read_format,
+					   evsel.core.attr.branch_sample_type);
 	bufsz = sz + 4096; /* Add a bit for overrun checking */
 	event = malloc(bufsz);
 	if (!event) {
@@ -324,7 +325,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
 	event->header.size = sz;
 
 	err = perf_event__synthesize_sample(event, sample_type, read_format,
-					    &sample);
+					    evsel.core.attr.branch_sample_type, &sample);
 	if (err) {
 		pr_debug("%s failed for sample_type %#"PRIx64", error %d\n",
 			 "perf_event__synthesize_sample", sample_type, err);
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 2b31da231ef3..31f05f467810 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -487,10 +487,30 @@ static void arm_spe__prep_branch_stack(struct arm_spe_queue *speq)
 	bstack->hw_idx = -1ULL;
 }
 
-static int arm_spe__inject_event(union perf_event *event, struct perf_sample *sample, u64 type)
+static int arm_spe__inject_event(struct arm_spe *spe, union perf_event *event,
+				 struct perf_sample *sample, u64 type)
 {
-	event->header.size = perf_event__sample_event_size(sample, type, 0);
-	return perf_event__synthesize_sample(event, type, 0, sample);
+	struct evsel *evsel = sample->evsel;
+	u64 branch_sample_type = 0;
+	size_t sz;
+
+	if (!evsel && spe->session && spe->session->evlist)
+		evsel = evlist__id2evsel(spe->session->evlist, sample->id);
+
+	if (evsel)
+		branch_sample_type = evsel->core.attr.branch_sample_type;
+
+	event->header.type = PERF_RECORD_SAMPLE;
+	sz = perf_event__sample_event_size(sample, type, /*read_format=*/0,
+					   branch_sample_type);
+	if (sz >= PERF_SAMPLE_MAX_SIZE) {
+		pr_err("Sample size %zu exceeds max size %d\n", sz, PERF_SAMPLE_MAX_SIZE);
+		return -EFAULT;
+	}
+	event->header.size = sz;
+
+	return perf_event__synthesize_sample(event, type, /*read_format=*/0,
+					     branch_sample_type, sample);
 }
 
 static inline int
@@ -502,7 +522,7 @@ arm_spe_deliver_synth_event(struct arm_spe *spe,
 	int ret;
 
 	if (spe->synth_opts.inject) {
-		ret = arm_spe__inject_event(event, sample, spe->sample_type);
+		ret = arm_spe__inject_event(spe, event, sample, spe->sample_type);
 		if (ret)
 			return ret;
 	}
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 8a639d2e51a4..6ec48de29441 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1422,11 +1422,29 @@ static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq,
 	bs->nr += 1;
 }
 
-static int cs_etm__inject_event(union perf_event *event,
+static int cs_etm__inject_event(struct cs_etm_auxtrace *etm, union perf_event *event,
 				struct perf_sample *sample, u64 type)
 {
-	event->header.size = perf_event__sample_event_size(sample, type, 0);
-	return perf_event__synthesize_sample(event, type, 0, sample);
+	struct evsel *evsel = sample->evsel;
+	u64 branch_sample_type = 0;
+	size_t sz;
+
+	if (!evsel && etm->session && etm->session->evlist)
+		evsel = evlist__id2evsel(etm->session->evlist, sample->id);
+
+	if (evsel)
+		branch_sample_type = evsel->core.attr.branch_sample_type;
+
+	sz = perf_event__sample_event_size(sample, type, /*read_format=*/0,
+					   branch_sample_type);
+	if (sz >= PERF_SAMPLE_MAX_SIZE) {
+		pr_err("Sample size %zu exceeds max size %d\n", sz, PERF_SAMPLE_MAX_SIZE);
+		return -EFAULT;
+	}
+	event->header.size = sz;
+
+	return perf_event__synthesize_sample(event, type, /*read_format=*/0,
+					     branch_sample_type, sample);
 }
 
 
@@ -1592,7 +1610,7 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
 		sample.branch_stack = tidq->last_branch;
 
 	if (etm->synth_opts.inject) {
-		ret = cs_etm__inject_event(event, &sample,
+		ret = cs_etm__inject_event(etm, event, &sample,
 					   etm->instructions_sample_type);
 		if (ret)
 			return ret;
@@ -1667,7 +1685,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
 	}
 
 	if (etm->synth_opts.inject) {
-		ret = cs_etm__inject_event(event, &sample,
+		ret = cs_etm__inject_event(etm, event, &sample,
 					   etm->branches_sample_type);
 		if (ret)
 			return ret;
diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c
index 382255393fb3..0b18ebd13f7c 100644
--- a/tools/perf/util/intel-bts.c
+++ b/tools/perf/util/intel-bts.c
@@ -303,7 +303,8 @@ static int intel_bts_synth_branch_sample(struct intel_bts_queue *btsq,
 		event.sample.header.size = bts->branches_event_size;
 		ret = perf_event__synthesize_sample(&event,
 						    bts->branches_sample_type,
-						    0, &sample);
+						    /*read_format=*/0, /*branch_sample_type=*/0,
+						    &sample);
 		if (ret)
 			return ret;
 	}
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index fc9eec8b54b8..dd2637678b40 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -1728,11 +1728,30 @@ static void intel_pt_prep_b_sample(struct intel_pt *pt,
 	event->sample.header.misc = sample->cpumode;
 }
 
-static int intel_pt_inject_event(union perf_event *event,
+static int intel_pt_inject_event(struct intel_pt *pt, union perf_event *event,
 				 struct perf_sample *sample, u64 type)
 {
-	event->header.size = perf_event__sample_event_size(sample, type, 0);
-	return perf_event__synthesize_sample(event, type, 0, sample);
+	struct evsel *evsel = sample->evsel;
+	u64 branch_sample_type = 0;
+	size_t sz;
+
+	if (!evsel && pt->session && pt->session->evlist)
+		evsel = evlist__id2evsel(pt->session->evlist, sample->id);
+
+	if (evsel)
+		branch_sample_type = evsel->core.attr.branch_sample_type;
+
+	event->header.type = PERF_RECORD_SAMPLE;
+	sz = perf_event__sample_event_size(sample, type, /*read_format=*/0,
+					   branch_sample_type);
+	if (sz >= PERF_SAMPLE_MAX_SIZE) {
+		pr_err("Sample size %zu exceeds max size %d\n", sz, PERF_SAMPLE_MAX_SIZE);
+		return -EFAULT;
+	}
+	event->header.size = sz;
+
+	return perf_event__synthesize_sample(event, type, /*read_format=*/0,
+					     branch_sample_type, sample);
 }
 
 static inline int intel_pt_opt_inject(struct intel_pt *pt,
@@ -1742,7 +1761,7 @@ static inline int intel_pt_opt_inject(struct intel_pt *pt,
 	if (!pt->synth_opts.inject)
 		return 0;
 
-	return intel_pt_inject_event(event, sample, type);
+	return intel_pt_inject_event(pt, event, sample, type);
 }
 
 static int intel_pt_deliver_synth_event(struct intel_pt *pt,
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index 85bee747f4cd..2461f25a4d7d 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -1455,7 +1455,8 @@ int perf_event__synthesize_stat_round(const struct perf_tool *tool,
 	return process(tool, (union perf_event *) &event, NULL, machine);
 }
 
-size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format)
+size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format,
+				     u64 branch_sample_type)
 {
 	size_t sz, result = sizeof(struct perf_record_sample);
 
@@ -1515,8 +1516,10 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
 
 	if (type & PERF_SAMPLE_BRANCH_STACK) {
 		sz = sample->branch_stack->nr * sizeof(struct branch_entry);
-		/* nr, hw_idx */
-		sz += 2 * sizeof(u64);
+		/* nr */
+		sz += sizeof(u64);
+		if (branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX)
+			sz += sizeof(u64);
 		result += sz;
 	}
 
@@ -1605,7 +1608,7 @@ static __u64 *copy_read_group_values(__u64 *array, __u64 read_format,
 }
 
 int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format,
-				  const struct perf_sample *sample)
+				  u64 branch_sample_type, const struct perf_sample *sample)
 {
 	__u64 *array;
 	size_t sz;
@@ -1719,9 +1722,17 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
 
 	if (type & PERF_SAMPLE_BRANCH_STACK) {
 		sz = sample->branch_stack->nr * sizeof(struct branch_entry);
-		/* nr, hw_idx */
-		sz += 2 * sizeof(u64);
-		memcpy(array, sample->branch_stack, sz);
+
+		*array++ = sample->branch_stack->nr;
+
+		if (branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX) {
+			if (sample->no_hw_idx)
+				*array++ = 0;
+			else
+				*array++ = sample->branch_stack->hw_idx;
+		}
+
+		memcpy(array, perf_sample__branch_entries((struct perf_sample *)sample), sz);
 		array = (void *)array + sz;
 	}
 
diff --git a/tools/perf/util/synthetic-events.h b/tools/perf/util/synthetic-events.h
index b0edad0c3100..8c7f49f9ccf5 100644
--- a/tools/perf/util/synthetic-events.h
+++ b/tools/perf/util/synthetic-events.h
@@ -81,7 +81,8 @@ int perf_event__synthesize_mmap_events(const struct perf_tool *tool, union perf_
 int perf_event__synthesize_modules(const struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
 int perf_event__synthesize_namespaces(const struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine);
 int perf_event__synthesize_cgroups(const struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
-int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, const struct perf_sample *sample);
+int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format,
+				  u64 branch_sample_type, const struct perf_sample *sample);
 int perf_event__synthesize_stat_config(const struct perf_tool *tool, struct perf_stat_config *config, perf_event__handler_t process, struct machine *machine);
 int perf_event__synthesize_stat_events(struct perf_stat_config *config, const struct perf_tool *tool, struct evlist *evlist, perf_event__handler_t process, bool attrs);
 int perf_event__synthesize_stat_round(const struct perf_tool *tool, u64 time, u64 type, perf_event__handler_t process, struct machine *machine);
@@ -97,7 +98,8 @@ void perf_event__synthesize_final_bpf_metadata(struct perf_session *session,
 
 int perf_tool__process_synth_event(const struct perf_tool *tool, union perf_event *event, struct machine *machine, perf_event__handler_t process);
 
-size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format);
+size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
+				     u64 read_format, u64 branch_sample_type);
 
 int __machine__synthesize_threads(struct machine *machine, const struct perf_tool *tool,
 				  struct target *target, struct perf_thread_map *threads,
-- 
2.54.0.631.ge1b05301d1-goog

From nobody Mon May 25 04:34:28 2026
Date: Mon, 18 May 2026 15:43:25 -0700
In-Reply-To: <20260518224325.3037838-1-irogers@google.com>
References: <20260518203805.2955241-1-irogers@google.com> <20260518224325.3037838-1-irogers@google.com>
Message-ID: <20260518224325.3037838-3-irogers@google.com>
Subject: [PATCH v8 2/2] perf inject: Fix itrace branch stack synthesis
From: Ian Rogers
To: irogers@google.com, acme@kernel.org, namhyung@kernel.org
Cc: adrian.hunter@intel.com, dapeng1.mi@linux.intel.com, james.clark@linaro.org, leo.yan@linux.dev, linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, mingo@redhat.com, peterz@infradead.org, ravi.bangoria@amd.com, thomas.falcon@intel.com
X-Mailing-List: linux-kernel@vger.kernel.org
X-Mailer: git-send-email 2.54.0.631.ge1b05301d1-goog
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"

When using "perf inject --itrace=L" to synthesize branch stacks from AUX
data, several issues caused failures with the generated file:

1. The synthesized samples were delivered without the
   PERF_SAMPLE_BRANCH_STACK flag if it was not in the original event's
   sample_type. Fixed by using sample_type | evsel->synth_sample_type
   in intel_pt_do_synth_pebs_sample.

2. Modifying evsel->core.attr.sample_type early in __cmd_inject caused
   parse failures for subsequent records in the input file. Fixed by
   moving this modification to just before writing the header.

3. perf_event__repipe_sample synthesized samples too broadly. It now
   synthesizes samples only when branch stack injection is requested,
   and perf_inject__cut_auxtrace_sample is restored as the fallback to
   preserve existing functionality.

4. Potential heap overflow in perf_event__repipe_sample: Addressed by
   adding a check that prints an error and returns -EFAULT if the
   calculated event size exceeds PERF_SAMPLE_MAX_SIZE.

5. Header vs payload mismatch in __cmd_inject: Addressed by narrowing
   the condition so that HEADER_BRANCH_STACK is only set in the file
   header if add_last_branch is true.

6. NULL pointer dereference in intel-pt.c: When branch stack injection
   is requested (add_last_branch is true) but last_branch is false
   (e.g., perf inject --itrace=L), ptq->last_branch was not allocated.
   However, PEBS branch stack synthesis (via synth_sample_type) still
   forced LBR handling in do_synth_pebs_sample(), dereferencing the
   NULL ptq->last_branch pointer. Guarding the dereference is not
   sufficient because downstream sample size calculation and synthesis
   strictly require a non-NULL branch_stack when the bit is set. Fixed
   by ensuring ptq->last_branch is allocated in intel_pt_alloc_queue()
   when add_last_branch is requested.

7. Modifying event attributes in perf_event__repipe_attr in-place
   caused SIGSEGV on read-only mmap buffers in file mode and downstream
   parser breakage in pipe mode. Fixed by processing the unmodified
   attribute first, returning immediately in non-pipe mode, and
   synthesizing a new attribute event for pipe output using
   perf_event__synthesize_attr. Also:
   - Added a size validation check and integer underflow protection
     when parsing n_ids.
   - Prevented trailing ID memory corruption by zero-initializing the
     local attr copy and copying at most
     min_t(size_t, sizeof(attr), event->attr.attr.size) bytes.
   - Resolved the downstream ID array parsing mismatch by expanding
     attr.size to sizeof(struct perf_event_attr) before synthesis, so
     that the header and attribute sizes agree.

8. Potential dangling pointer in perf_event__repipe_sample: Addressed
   by restoring the original sample->branch_stack pointer before
   returning, including on early error return paths.

9. Off-by-one error in the sample size check in
   perf_event__repipe_sample: Fixed by checking
   sz >= PERF_SAMPLE_MAX_SIZE instead of sz > PERF_SAMPLE_MAX_SIZE.

10. Unadvertised size field left in the payload by cut_auxtrace_sample:
    Addressed by excluding the 8-byte size field from the copied
    payload to match the cleared PERF_SAMPLE_AUX bit. The AUX sample
    payload is now cut even if its size is 0.

11. Inaccurate sample size calculation and uninitialized memory leaks
    in convert_sample_callchain: Fixed by replacing manual arithmetic
    with perf_event__sample_event_size and adding a bounds check
    against PERF_SAMPLE_MAX_SIZE.

12. Omission of branch_sample_type in file headers: Addressed by
    expanding older, smaller attributes to PERF_ATTR_SIZE_VER2 in
    __cmd_inject to ensure branch_sample_type is not silently omitted.

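The attribute handling in item 7 can be sketched on its own. This is a
simplified illustration under stated assumptions, not the perf code:
struct sketch_attr, SKETCH_HEADER_SIZE and sketch_parse_attr are
hypothetical names, and the attribute is reduced to a few fields. It
shows the two size checks, the zero-initialized bounded copy, and the
n_ids computation that can no longer underflow.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced stand-ins; not the perf definitions. */
#define SKETCH_HEADER_SIZE 8u

struct sketch_attr {
	uint32_t type;
	uint32_t size;               /* size the producer used for its attr */
	uint64_t config;
	uint64_t sample_type;
	uint64_t branch_sample_type; /* absent from older, smaller attrs */
};

/*
 * Parse an attribute event laid out as header, attr, then u64 ids.
 * event_size plays the role of header.size. Returns 0 on success and
 * -1 if the sizes are inconsistent.
 */
int sketch_parse_attr(const uint8_t *event, size_t event_size,
		      struct sketch_attr *out, size_t *n_ids)
{
	size_t payload, copy;
	uint32_t attr_size;

	/* The event must at least hold the header plus attr.type and attr.size. */
	if (event_size < SKETCH_HEADER_SIZE + sizeof(uint64_t))
		return -1;

	memcpy(&attr_size, event + SKETCH_HEADER_SIZE + offsetof(struct sketch_attr, size),
	       sizeof(attr_size));

	/* Reject an attr.size larger than the payload, so the subtraction below cannot wrap. */
	payload = event_size - SKETCH_HEADER_SIZE;
	if (payload < attr_size)
		return -1;

	/* Zero-fill first, then copy no more than the local struct can hold. */
	copy = attr_size < sizeof(*out) ? attr_size : sizeof(*out);
	memset(out, 0, sizeof(*out));
	memcpy(out, event + SKETCH_HEADER_SIZE, copy);

	*n_ids = (payload - attr_size) / sizeof(uint64_t);

	/* Advertise the full local size so a re-synthesized event is self-consistent. */
	out->size = sizeof(*out);
	return 0;
}
```

With a short 16-byte attribute followed by two ids, the fields beyond
the copied prefix (here sample_type and branch_sample_type) read back
as zero rather than as whatever bytes followed in the buffer.
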
Fixes: 0f0aa5e0693c ("perf inject: Add Instruction Tracing support")
Assisted-by: Gemini:gemini-3.1-pro-preview
Signed-off-by: Ian Rogers
---
 tools/perf/builtin-inject.c | 153 ++++++++++++++++++++++++++++++++----
 tools/perf/util/intel-pt.c  |   8 +-
 2 files changed, 142 insertions(+), 19 deletions(-)

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 2f20e782c7f2..7a64935b7e2b 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -216,12 +216,23 @@ static int perf_event__repipe_op4_synth(const struct perf_tool *tool,
 	return perf_event__repipe_synth(tool, event);
 }
 
+static int perf_event__repipe_synth_cb(const struct perf_tool *tool,
+				       union perf_event *event,
+				       struct perf_sample *sample __maybe_unused,
+				       struct machine *machine __maybe_unused)
+{
+	return perf_event__repipe_synth(tool, event);
+}
+
 static int perf_event__repipe_attr(const struct perf_tool *tool,
 				   union perf_event *event,
 				   struct evlist **pevlist)
 {
 	struct perf_inject *inject = container_of(tool, struct perf_inject,
 						  tool);
+	struct perf_event_attr attr;
+	size_t n_ids;
+	u64 *ids;
 	int ret;
 
 	ret = perf_event__process_attr(tool, event, pevlist);
@@ -232,7 +243,37 @@ static int perf_event__repipe_attr(const struct perf_tool *tool,
 	if (!inject->output.is_pipe)
 		return 0;
 
-	return perf_event__repipe_synth(tool, event);
+	if (!inject->itrace_synth_opts.set)
+		return perf_event__repipe_synth(tool, event);
+
+	if (event->header.size < sizeof(struct perf_event_header) + sizeof(u64)) {
+		pr_err("Attribute event size %u is too small\n", event->header.size);
+		return -EINVAL;
+	}
+
+	if (event->header.size - sizeof(event->header) < event->attr.attr.size) {
+		pr_err("Attribute event size %u is too small for attr.size %u\n",
+		       event->header.size, event->attr.attr.size);
+		return -EINVAL;
+	}
+
+	memset(&attr, 0, sizeof(attr));
+	memcpy(&attr, &event->attr.attr,
+	       min_t(size_t, sizeof(attr), (size_t)event->attr.attr.size));
+
+	n_ids = event->header.size - sizeof(event->header) - event->attr.attr.size;
+	n_ids /= sizeof(u64);
+	ids = perf_record_header_attr_id(event);
+
+	attr.size = sizeof(struct perf_event_attr);
+	attr.sample_type &= ~PERF_SAMPLE_AUX;
+
+	if (inject->itrace_synth_opts.add_last_branch) {
+		attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
+		attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX;
+	}
+	return perf_event__synthesize_attr(tool, &attr, (u32)n_ids, ids,
+					   perf_event__repipe_synth_cb);
 }
 
 static int perf_event__repipe_event_update(const struct perf_tool *tool,
@@ -331,8 +372,8 @@ perf_inject__cut_auxtrace_sample(struct perf_inject *inject,
 				 union perf_event *event,
 				 struct perf_sample *sample)
 {
-	size_t sz1 = sample->aux_sample.data - (void *)event;
-	size_t sz2 = event->header.size - sample->aux_sample.size - sz1;
+	size_t sz1 = sample->aux_sample.data - (void *)event - sizeof(u64);
+	size_t sz2 = event->header.size - sample->aux_sample.size - (sz1 + sizeof(u64));
 	union perf_event *ev;
 
 	if (inject->event_copy == NULL) {
@@ -343,13 +384,12 @@ perf_inject__cut_auxtrace_sample(struct perf_inject *inject,
 	ev = (union perf_event *)inject->event_copy;
 	if (sz1 > event->header.size || sz2 > event->header.size ||
 	    sz1 + sz2 > event->header.size ||
-	    sz1 < sizeof(struct perf_event_header) + sizeof(u64))
+	    sz1 < sizeof(struct perf_event_header))
 		return event;
 
 	memcpy(ev, event, sz1);
 	memcpy((void *)ev + sz1, (void *)event + event->header.size - sz2, sz2);
 	ev->header.size = sz1 + sz2;
-	((u64 *)((void *)ev + sz1))[-1] = 0;
 
 	return ev;
 }
@@ -369,14 +409,77 @@ static int perf_event__repipe_sample(const struct perf_tool *tool,
 	struct perf_inject *inject = container_of(tool, struct perf_inject,
 						  tool);
 
-	if (evsel && evsel->handler) {
+	if (evsel == NULL)
+		return perf_event__repipe_synth(tool, event);
+
+	if (evsel->handler) {
 		inject_handler f = evsel->handler;
 		return f(tool, event, sample, evsel, machine);
 	}
 
 	build_id__mark_dso_hit(tool, event, sample, evsel, machine);
 
-	if (inject->itrace_synth_opts.set && sample->aux_sample.size) {
+	if (inject->itrace_synth_opts.set &&
+	    (inject->itrace_synth_opts.last_branch ||
+	     inject->itrace_synth_opts.add_last_branch)) {
+		union perf_event *event_copy = (void *)inject->event_copy;
+		struct branch_stack dummy_bs = { .nr = 0, .hw_idx = 0 };
+		int err;
+		size_t sz;
+		u64 orig_type = evsel->core.attr.sample_type;
+		u64 orig_branch_type = evsel->core.attr.branch_sample_type;
+
+		struct branch_stack *orig_bs = sample->branch_stack;
+
+		if (event_copy == NULL) {
+			inject->event_copy = malloc(PERF_SAMPLE_MAX_SIZE);
+			if (!inject->event_copy)
+				return -ENOMEM;
+
+			event_copy = (void *)inject->event_copy;
+		}
+
+		if (!sample->branch_stack)
+			sample->branch_stack = &dummy_bs;
+
+		if (inject->itrace_synth_opts.add_last_branch) {
+			/* Temporarily add in type bits for synthesis. */
+			evsel->core.attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
+			evsel->core.attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX;
+		}
+		evsel->core.attr.sample_type &= ~PERF_SAMPLE_AUX;
+
+		sz = perf_event__sample_event_size(sample, evsel->core.attr.sample_type,
+						   evsel->core.attr.read_format,
+						   evsel->core.attr.branch_sample_type);
+
+		if (sz >= PERF_SAMPLE_MAX_SIZE) {
+			pr_err("Sample size %zu exceeds max size %d\n", sz, PERF_SAMPLE_MAX_SIZE);
+			evsel->core.attr.sample_type = orig_type;
+			evsel->core.attr.branch_sample_type = orig_branch_type;
+			sample->branch_stack = orig_bs;
+			return -EFAULT;
+		}
+
+		event_copy->header.type = PERF_RECORD_SAMPLE;
+		event_copy->header.misc = event->header.misc;
+		event_copy->header.size = sz;
+
+		err = perf_event__synthesize_sample(event_copy, evsel->core.attr.sample_type,
+						    evsel->core.attr.read_format,
+						    evsel->core.attr.branch_sample_type, sample);
+
+		evsel->core.attr.sample_type = orig_type;
+		evsel->core.attr.branch_sample_type = orig_branch_type;
+		sample->branch_stack = orig_bs;
+
+		if (err) {
+			pr_err("Failed to synthesize sample\n");
+			return err;
+		}
+		event = event_copy;
+	} else if (inject->itrace_synth_opts.set &&
+		   (evsel->core.attr.sample_type & PERF_SAMPLE_AUX)) {
 		event = perf_inject__cut_auxtrace_sample(inject, event, sample);
 		if (IS_ERR(event))
 			return PTR_ERR(event);
@@ -397,7 +500,7 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool,
 	struct callchain_cursor_node *node;
 	struct thread *thread;
 	u64 sample_type = evsel->core.attr.sample_type;
-	u32 sample_size = event->header.size;
+	size_t sz;
 	u64 i, k;
 	int ret;
 
@@ -456,15 +559,18 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool,
 out:
 	memcpy(event_copy, event, sizeof(event->header));
 
-	/* adjust sample size for stack and regs */
-	sample_size -= sample->user_stack.size;
-	sample_size -= (hweight64(evsel->core.attr.sample_regs_user) + 1) * sizeof(u64);
-	sample_size += (sample->callchain->nr + 1) * sizeof(u64);
-	event_copy->header.size = sample_size;
-
 	/* remove sample_type {STACK,REGS}_USER for synthesize */
 	sample_type &= ~(PERF_SAMPLE_STACK_USER | PERF_SAMPLE_REGS_USER);
 
+	sz = perf_event__sample_event_size(sample, sample_type,
+					   evsel->core.attr.read_format,
+					   evsel->core.attr.branch_sample_type);
+	if (sz >= PERF_SAMPLE_MAX_SIZE) {
+		pr_err("Sample size %zu exceeds max size %d\n", sz, PERF_SAMPLE_MAX_SIZE);
+		return -EFAULT;
+	}
+	event_copy->header.size = sz;
+
 	ret = perf_event__synthesize_sample(event_copy, sample_type,
 					    evsel->core.attr.read_format,
 					    evsel->core.attr.branch_sample_type, sample);
@@ -2442,12 +2548,27 @@ static int __cmd_inject(struct perf_inject *inject)
 		 * synthesized hardware events, so clear the feature flag.
*/ if (inject->itrace_synth_opts.set) { + struct evsel *evsel; + perf_header__clear_feat(&session->header, HEADER_AUXTRACE); - if (inject->itrace_synth_opts.last_branch || - inject->itrace_synth_opts.add_last_branch) + + evlist__for_each_entry(session->evlist, evsel) { + evsel->core.attr.sample_type &=3D ~PERF_SAMPLE_AUX; + } + + if (inject->itrace_synth_opts.add_last_branch) { perf_header__set_feat(&session->header, HEADER_BRANCH_STACK); + + evlist__for_each_entry(session->evlist, evsel) { + evsel->core.attr.sample_type |=3D PERF_SAMPLE_BRANCH_STACK; + if (evsel->core.attr.size < PERF_ATTR_SIZE_VER2) + evsel->core.attr.size =3D PERF_ATTR_SIZE_VER2; + evsel->core.attr.branch_sample_type |=3D + PERF_SAMPLE_BRANCH_HW_INDEX; + } + } } =20 /* diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c index dd2637678b40..d9c86ac49748 100644 --- a/tools/perf/util/intel-pt.c +++ b/tools/perf/util/intel-pt.c @@ -1307,7 +1307,8 @@ static struct intel_pt_queue *intel_pt_alloc_queue(st= ruct intel_pt *pt, goto out_free; } =20 - if (pt->synth_opts.last_branch || pt->synth_opts.other_events) { + if (pt->synth_opts.last_branch || pt->synth_opts.add_last_branch || + pt->synth_opts.other_events) { unsigned int entry_cnt =3D max(LBRS_MAX, pt->br_stack_sz); =20 ptq->last_branch =3D intel_pt_alloc_br_stack(entry_cnt); @@ -2505,7 +2506,7 @@ static int intel_pt_do_synth_pebs_sample(struct intel= _pt_queue *ptq, struct evse intel_pt_add_xmm(intr_regs, pos, items, regs_mask); } =20 - if (sample_type & PERF_SAMPLE_BRANCH_STACK) { + if ((sample_type | evsel->synth_sample_type) & PERF_SAMPLE_BRANCH_STACK) { if (items->mask[INTEL_PT_LBR_0_POS] || items->mask[INTEL_PT_LBR_1_POS] || items->mask[INTEL_PT_LBR_2_POS]) { @@ -2576,7 +2577,8 @@ static int intel_pt_do_synth_pebs_sample(struct intel= _pt_queue *ptq, struct evse sample.transaction =3D txn; } =20 - ret =3D intel_pt_deliver_synth_event(pt, event, &sample, sample_type); + ret =3D intel_pt_deliver_synth_event(pt, event, &sample, 
+ sample_type | evsel->synth_sample_type); perf_sample__exit(&sample); return ret; } --=20 2.54.0.631.ge1b05301d1-goog