An intel-pt trace can be turned into LBR events either in perf script
or perf inject with the --itrace=L option. With perf inject the
generated perf.data file failed to be parsed as the sample events were
out of sync with their perf_event_attr. A range of fixes were
required.
This patch was separated from a large perf script refactor that
highlighted the breakage:
https://lore.kernel.org/lkml/20260425224951.174663-1-irogers@google.com/
v5:
- Restored the missing PEBS branch stack synthesis fixes in intel-pt.c
which were accidentally dropped in a previous rebase/conflict
resolution.
- Addressed the pipe mode size mismatch and ID array out-of-bounds read:
* Safely copy the incoming attribute payload using min_t and
memset zero, preventing trailing ID corruption.
* Explicitly set the synthesized event's attr.size to match the tool's
physical sizeof(struct perf_event_attr), guaranteeing perfect offset
alignment and removing any risk of hallucinated/garbage IDs or
underflow out-of-bounds reads.
- Refactored both commit descriptions to strictly focus on code changes,
deferring meta-commentary and implementation details exclusively to
the cover letter.
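
For illustration only, the bounded copy described in the v5 notes could look roughly like the sketch below. The struct layout and the sketch_* names are hypothetical stand-ins and are not taken from the patch: the destination is zeroed, at most sizeof(*dst) bytes are copied from the incoming payload, and the size field is then set to the tool's own struct size.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, much reduced stand-in for struct perf_event_attr. */
struct sketch_attr {
	uint32_t type;
	uint32_t size;
	uint64_t config;
	uint64_t sample_type;
	uint64_t branch_sample_type;
};

/*
 * Copy an incoming attr payload of src_size bytes into dst.
 * A shorter (older) payload leaves the trailing fields zeroed, a longer
 * (newer) payload is truncated to what this build understands.
 */
static void sketch_copy_attr(struct sketch_attr *dst, const void *src,
			     size_t src_size)
{
	size_t n = src_size < sizeof(*dst) ? src_size : sizeof(*dst);

	memset(dst, 0, sizeof(*dst));
	memcpy(dst, src, n);
	/* Advertise the size actually written, not the sender's size. */
	dst->size = (uint32_t)sizeof(*dst);
}
```

With the size field fixed to the local struct size, anything that follows the attr in the record (such as an ID array) starts at a known offset regardless of what the sender claimed.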
v4:
- Avoid temporary regressions in Commit 1:
* Used local masked sample_type in convert_sample_callchain instead of
unmasked evsel attribute, preventing heap overflows.
* Promoted hardware tracer signature changes and dynamic retrieval of
branch_sample_type to Commit 1, removing hardcoded 0 bugs.
* Checked sample->evsel first before performing evlist__id2evsel lookup
to optimize evsel retrieval when already populated.
- Address critical security and correctness feedback in Commit 2:
* Added check in perf_event__repipe_attr to prevent n_ids underflow.
* Fixed early return error path in perf_event__repipe_sample to
prevent state corruption and dangling pointers on dummy_bs.
* Ensured perf_inject__cut_auxtrace_sample cuts the 8-byte size field
even when aux_sample.size is 0 to prevent parser misalignment.
* Expanded older attributes to PERF_ATTR_SIZE_VER2 in file mode
within __cmd_inject to prevent silent truncation of
branch_sample_type.
* Added bounds checks against PERF_SAMPLE_MAX_SIZE to all hardware
tracing synthetic helpers to prevent heap buffer overflows.
* Fixed checkpatch.pl warnings/errors for line-wrapping.
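
As a rough sketch of why the PERF_SAMPLE_MAX_SIZE checks in the v4 notes matter: the header size field is 16 bits wide, so a computed size of 64KiB or more would be silently truncated when stored. The sketch_* names and the constant below are illustrative stand-ins, assuming the limit is 1 << 16.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PERF_SAMPLE_MAX_SIZE (assumed to be 1 << 16). */
#define SKETCH_SAMPLE_MAX_SIZE (1 << 16)

/* Same field widths as struct perf_event_header. */
struct sketch_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

/*
 * Store a computed sample size in the header, refusing sizes that do not
 * fit in the 16-bit field instead of letting them wrap.
 */
static int sketch_set_size(struct sketch_header *hdr, size_t sz)
{
	if (sz >= SKETCH_SAMPLE_MAX_SIZE)
		return -EFAULT;
	hdr->size = (uint16_t)sz;
	return 0;
}
```

The header is left untouched on failure, so a caller can bail out before writing any payload.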
v3:
- Add missing Fixes: tags on both commits.
- Refactor perf_event__repipe_attr to avoid in-place modifications on
read-only mmap buffers, preventing SIGSEGV in file mode and premature
evsel updates in pipe mode.
- Use perf_event__synthesize_attr to correctly construct and repipe
attributes in pipe mode.
- Replace manual arithmetic in convert_sample_callchain with
perf_event__sample_event_size to prevent uninitialized memory leaks.
- Retrieve evsel branch_sample_type dynamically in util/arm-spe.c and
util/cs-etm.c instead of hardcoding 0, resolving missing hw_idx field
on synthesized branch stacks.
v2: Response to sashiko fixes for patch 2, Namhyung's acked-by for patch 1.
v1: https://lore.kernel.org/lkml/20260428070328.1880314-1-irogers@google.com/
Ian Rogers (2):
perf event: Fix size of synthesized sample with branch stacks
perf inject: Fix itrace branch stack synthesis
tools/perf/bench/inject-buildid.c | 9 +-
tools/perf/builtin-inject.c | 153 +++++++++++++++++++++++++----
tools/perf/tests/dlfilter-test.c | 8 +-
tools/perf/tests/sample-parsing.c | 5 +-
tools/perf/util/arm-spe.c | 28 +++++-
tools/perf/util/cs-etm.c | 28 +++++-
tools/perf/util/intel-bts.c | 3 +-
tools/perf/util/intel-pt.c | 32 ++++--
tools/perf/util/synthetic-events.c | 25 +++--
tools/perf/util/synthetic-events.h | 6 +-
10 files changed, 247 insertions(+), 50 deletions(-)
--
2.54.0.631.ge1b05301d1-goog
An intel-pt trace can be turned into LBR events either in perf script
or perf inject with the --itrace=L option. With perf inject the
generated perf.data file failed to be parsed as the sample events were
out of sync with their perf_event_attr. A range of fixes were
required.
This patch was separated from a large perf script refactor that
highlighted the breakage:
https://lore.kernel.org/lkml/20260425224951.174663-1-irogers@google.com/
v6:
- Address critical security and correctness feedback in Commit 2:
* Fixed potential out-of-bounds read in perf_event__repipe_attr() by
moving the header.size validation check before accessing
event->attr.attr.size or copying.
* Prevented 32-bit integer wrapping overflow on header.size validation
by using subtraction instead of addition.
- Restored PEBS LBR synthesis fixes in tools/perf/util/intel-pt.c.
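
The subtraction-based validation mentioned in the v6 notes can be sketched as follows. This is a hypothetical helper, not the code in perf_event__repipe_attr(), and the 32-bit parameter types are an assumption made to show the wrap-around case.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when a record of record_size bytes can hold a header of
 * header_size bytes followed by an attr payload of attr_size bytes.
 */
static bool sketch_attr_fits(uint32_t record_size, uint32_t header_size,
			     uint32_t attr_size)
{
	/* Validate the record size first, before looking at the payload. */
	if (record_size < header_size)
		return false;
	/*
	 * Compare by subtraction. header_size + attr_size could wrap in
	 * 32 bits and make an oversized attr_size look acceptable.
	 */
	return attr_size <= record_size - header_size;
}
```

For example, with header_size = 8 and attr_size = 0xfffffffc the sum wraps to 4, which an addition-based check would accept for a 64-byte record. The subtraction form rejects it.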
v5:
- Restored the missing PEBS branch stack synthesis fixes in intel-pt.c
which were accidentally dropped in a previous rebase/conflict
resolution.
- Addressed the pipe mode size mismatch and ID array out-of-bounds read:
* Safely copy the incoming attribute payload using min_t and
memset zero, preventing trailing ID corruption.
* Explicitly set the synthesized event's attr.size to match the tool's
physical sizeof(struct perf_event_attr), guaranteeing perfect offset
alignment and removing any risk of hallucinated/garbage IDs or
underflow out-of-bounds reads.
- Refactored both commit descriptions to strictly focus on code changes,
deferring meta-commentary and implementation details exclusively to
the cover letter.
v4:
- Avoid temporary regressions in Commit 1:
* Used local masked sample_type in convert_sample_callchain instead of
unmasked evsel attribute, preventing heap overflows.
* Promoted hardware tracer signature changes and dynamic retrieval of
branch_sample_type to Commit 1, removing hardcoded 0 bugs.
* Checked sample->evsel first before performing evlist__id2evsel lookup
to optimize evsel retrieval when already populated.
- Address critical security and correctness feedback in Commit 2:
* Added check in perf_event__repipe_attr to prevent n_ids underflow.
* Fixed early return error path in perf_event__repipe_sample to
prevent state corruption and dangling pointers on dummy_bs.
* Ensured perf_inject__cut_auxtrace_sample cuts the 8-byte size field
even when aux_sample.size is 0 to prevent parser misalignment.
* Expanded older attributes to PERF_ATTR_SIZE_VER2 in file mode
within __cmd_inject to prevent silent truncation of
branch_sample_type.
* Added bounds checks against PERF_SAMPLE_MAX_SIZE to all hardware
tracing synthetic helpers to prevent heap buffer overflows.
* Fixed checkpatch.pl warnings/errors for line-wrapping.
v3:
- Add missing Fixes: tags on both commits.
- Refactor perf_event__repipe_attr to avoid in-place modifications on
read-only mmap buffers, preventing SIGSEGV in file mode and premature
evsel updates in pipe mode.
- Use perf_event__synthesize_attr to correctly construct and repipe
attributes in pipe mode.
- Replace manual arithmetic in convert_sample_callchain with
perf_event__sample_event_size to prevent uninitialized memory leaks.
- Retrieve evsel branch_sample_type dynamically in util/arm-spe.c and
util/cs-etm.c instead of hardcoding 0, resolving missing hw_idx field
on synthesized branch stacks.
v2: Response to sashiko fixes for patch 2, Namhyung's acked-by for patch 1.
v1: https://lore.kernel.org/lkml/20260428070328.1880314-1-irogers@google.com/
Ian Rogers (2):
perf event: Fix size of synthesized sample with branch stacks
perf inject: Fix itrace branch stack synthesis
tools/perf/bench/inject-buildid.c | 9 +-
tools/perf/builtin-inject.c | 160 +++++++++++++++++++++++++----
tools/perf/tests/dlfilter-test.c | 8 +-
tools/perf/tests/sample-parsing.c | 5 +-
tools/perf/util/arm-spe.c | 28 ++++-
tools/perf/util/cs-etm.c | 28 ++++-
tools/perf/util/intel-bts.c | 3 +-
tools/perf/util/intel-pt.c | 32 ++++--
tools/perf/util/synthetic-events.c | 25 +++--
tools/perf/util/synthetic-events.h | 6 +-
10 files changed, 254 insertions(+), 50 deletions(-)
--
2.54.0.631.ge1b05301d1-goog
An intel-pt trace can be turned into LBR events either in perf script
or perf inject with the --itrace=L option. With perf inject the
generated perf.data file failed to be parsed as the sample events were
out of sync with their perf_event_attr. A range of fixes were
required.
This patch was separated from a large perf script refactor that
highlighted the breakage:
https://lore.kernel.org/lkml/20260425224951.174663-1-irogers@google.com/
v7:
- Fixed a critical NULL pointer dereference crash in Commit 2:
* In tools/perf/util/intel-pt.c, when branch stack injection is
requested (add_last_branch is true) but last_branch is false (such
as in perf inject --itrace=L), ptq->last_branch was not allocated.
* When PEBS branch stack synthesis is forced via
evsel->synth_sample_type, the code dereferenced ptq->last_branch
inside do_synth_pebs_sample (either in intel_pt_add_lbrs or by
setting nr = 0), causing a SIGSEGV.
* Fixed by ensuring ptq->last_branch is successfully allocated in
intel_pt_alloc_queue() when add_last_branch is requested.
v6:
- Address critical security and correctness feedback in Commit 2:
* Fixed potential out-of-bounds read in perf_event__repipe_attr() by
moving the header.size validation check before accessing
event->attr.attr.size or copying.
* Prevented 32-bit integer wrapping overflow on header.size validation
by using subtraction instead of addition.
- Restored PEBS LBR synthesis fixes in tools/perf/util/intel-pt.c.
v5:
- Restored the missing PEBS branch stack synthesis fixes in intel-pt.c
which were accidentally dropped in a previous rebase/conflict
resolution.
- Addressed the pipe mode size mismatch and ID array out-of-bounds read:
* Safely copy the incoming attribute payload using min_t and
memset zero, preventing trailing ID corruption.
* Explicitly set the synthesized event's attr.size to match the tool's
physical sizeof(struct perf_event_attr), guaranteeing perfect offset
alignment and removing any risk of hallucinated/garbage IDs or
underflow out-of-bounds reads.
- Refactored both commit descriptions to strictly focus on code changes,
deferring meta-commentary and implementation details exclusively to
the cover letter.
v4:
- Avoid temporary regressions in Commit 1:
* Used local masked sample_type in convert_sample_callchain instead of
unmasked evsel attribute, preventing heap overflows.
* Promoted hardware tracer signature changes and dynamic retrieval of
branch_sample_type to Commit 1, removing hardcoded 0 bugs.
* Checked sample->evsel first before performing evlist__id2evsel lookup
to optimize evsel retrieval when already populated.
- Address critical security and correctness feedback in Commit 2:
* Added check in perf_event__repipe_attr to prevent n_ids underflow.
* Fixed early return error path in perf_event__repipe_sample to
prevent state corruption and dangling pointers on dummy_bs.
* Ensured perf_inject__cut_auxtrace_sample cuts the 8-byte size field
even when aux_sample.size is 0 to prevent parser misalignment.
* Expanded older attributes to PERF_ATTR_SIZE_VER2 in file mode
within __cmd_inject to prevent silent truncation of
branch_sample_type.
* Added bounds checks against PERF_SAMPLE_MAX_SIZE to all hardware
tracing synthetic helpers to prevent heap buffer overflows.
* Fixed checkpatch.pl warnings/errors for line-wrapping.
v3:
- Add missing Fixes: tags on both commits.
- Refactor perf_event__repipe_attr to avoid in-place modifications on
read-only mmap buffers, preventing SIGSEGV in file mode and premature
evsel updates in pipe mode.
- Use perf_event__synthesize_attr to correctly construct and repipe
attributes in pipe mode.
- Replace manual arithmetic in convert_sample_callchain with
perf_event__sample_event_size to prevent uninitialized memory leaks.
- Retrieve evsel branch_sample_type dynamically in util/arm-spe.c and
util/cs-etm.c instead of hardcoding 0, resolving missing hw_idx field
on synthesized branch stacks.
v2: Response to sashiko fixes for patch 2, Namhyung's acked-by for patch 1.
v1: https://lore.kernel.org/lkml/20260428070328.1880314-1-irogers@google.com/
Ian Rogers (2):
perf event: Fix size of synthesized sample with branch stacks
perf inject: Fix itrace branch stack synthesis
tools/perf/bench/inject-buildid.c | 9 +-
tools/perf/builtin-inject.c | 160 +++++++++++++++++++++++++----
tools/perf/tests/dlfilter-test.c | 8 +-
tools/perf/tests/sample-parsing.c | 5 +-
tools/perf/util/arm-spe.c | 28 ++++-
tools/perf/util/cs-etm.c | 28 ++++-
tools/perf/util/intel-bts.c | 3 +-
tools/perf/util/intel-pt.c | 35 +++++--
tools/perf/util/synthetic-events.c | 25 +++--
tools/perf/util/synthetic-events.h | 6 +-
10 files changed, 256 insertions(+), 51 deletions(-)
--
2.54.0.631.ge1b05301d1-goog
An intel-pt trace can be turned into LBR events either in perf script
or perf inject with the --itrace=L option. With perf inject the
generated perf.data file failed to be parsed as the sample events were
out of sync with their perf_event_attr. A range of fixes were
required.
This patch was separated from a large perf script refactor that
highlighted the breakage:
https://lore.kernel.org/lkml/20260425224951.174663-1-irogers@google.com/
v8:
- Avoid potential NULL pointer dereference of evsel in Commit 2:
* If machines__deliver_event() processes a malformed PERF_RECORD_READ
event with an unknown or missing sample ID, the evsel is passed
as NULL.
* Added an early evsel == NULL check in perf_event__repipe_sample()
to safely fallback and repipe the raw event unmodified, preventing
any segmentation faults.
v7:
- Fixed a critical NULL pointer dereference crash in Commit 2:
* In tools/perf/util/intel-pt.c, when branch stack injection is
requested (add_last_branch is true) but last_branch is false (such
as in perf inject --itrace=L), ptq->last_branch was not allocated.
* When PEBS branch stack synthesis is forced via
evsel->synth_sample_type, the code dereferenced ptq->last_branch
inside do_synth_pebs_sample (either in intel_pt_add_lbrs or by
setting nr = 0), causing a SIGSEGV.
* Fixed by ensuring ptq->last_branch is successfully allocated in
intel_pt_alloc_queue() when add_last_branch is requested.
v6:
- Address critical security and correctness feedback in Commit 2:
* Fixed potential out-of-bounds read in perf_event__repipe_attr() by
moving the header.size validation check before accessing
event->attr.attr.size or copying.
* Prevented 32-bit integer wrapping overflow on header.size validation
by using subtraction instead of addition.
- Restored PEBS LBR synthesis fixes in tools/perf/util/intel-pt.c.
v5:
- Restored the missing PEBS branch stack synthesis fixes in intel-pt.c
which were accidentally dropped in a previous rebase/conflict
resolution.
- Addressed the pipe mode size mismatch and ID array out-of-bounds read:
* Safely copy the incoming attribute payload using min_t and
memset zero, preventing trailing ID corruption.
* Explicitly set the synthesized event's attr.size to match the tool's
physical sizeof(struct perf_event_attr), guaranteeing perfect offset
alignment and removing any risk of hallucinated/garbage IDs or
underflow out-of-bounds reads.
- Refactored both commit descriptions to strictly focus on code changes,
deferring meta-commentary and implementation details exclusively to
the cover letter.
v4:
- Avoid temporary regressions in Commit 1:
* Used local masked sample_type in convert_sample_callchain instead of
unmasked evsel attribute, preventing heap overflows.
* Promoted hardware tracer signature changes and dynamic retrieval of
branch_sample_type to Commit 1, removing hardcoded 0 bugs.
* Checked sample->evsel first before performing evlist__id2evsel lookup
to optimize evsel retrieval when already populated.
- Address critical security and correctness feedback in Commit 2:
* Added check in perf_event__repipe_attr to prevent n_ids underflow.
* Fixed early return error path in perf_event__repipe_sample to
prevent state corruption and dangling pointers on dummy_bs.
* Ensured perf_inject__cut_auxtrace_sample cuts the 8-byte size field
even when aux_sample.size is 0 to prevent parser misalignment.
* Expanded older attributes to PERF_ATTR_SIZE_VER2 in file mode
within __cmd_inject to prevent silent truncation of
branch_sample_type.
* Added bounds checks against PERF_SAMPLE_MAX_SIZE to all hardware
tracing synthetic helpers to prevent heap buffer overflows.
* Fixed checkpatch.pl warnings/errors for line-wrapping.
v3:
- Add missing Fixes: tags on both commits.
- Refactor perf_event__repipe_attr to avoid in-place modifications on
read-only mmap buffers, preventing SIGSEGV in file mode and premature
evsel updates in pipe mode.
- Use perf_event__synthesize_attr to correctly construct and repipe
attributes in pipe mode.
- Replace manual arithmetic in convert_sample_callchain with
perf_event__sample_event_size to prevent uninitialized memory leaks.
- Retrieve evsel branch_sample_type dynamically in util/arm-spe.c and
util/cs-etm.c instead of hardcoding 0, resolving missing hw_idx field
on synthesized branch stacks.
v2: Response to sashiko fixes for patch 2, Namhyung's acked-by for patch 1.
v1: https://lore.kernel.org/lkml/20260428070328.1880314-1-irogers@google.com/
Ian Rogers (2):
perf event: Fix size of synthesized sample with branch stacks
perf inject: Fix itrace branch stack synthesis
tools/perf/bench/inject-buildid.c | 9 +-
tools/perf/builtin-inject.c | 165 +++++++++++++++++++++++++----
tools/perf/tests/dlfilter-test.c | 8 +-
tools/perf/tests/sample-parsing.c | 5 +-
tools/perf/util/arm-spe.c | 28 ++++-
tools/perf/util/cs-etm.c | 28 ++++-
tools/perf/util/intel-bts.c | 3 +-
tools/perf/util/intel-pt.c | 35 ++++--
tools/perf/util/synthetic-events.c | 25 +++--
tools/perf/util/synthetic-events.h | 6 +-
10 files changed, 260 insertions(+), 52 deletions(-)
--
2.54.0.631.ge1b05301d1-goog
On Mon, May 18, 2026 at 03:43:23PM -0700, Ian Rogers wrote:
> An intel-pt trace can be turned into LBR events either in perf script
> or perf inject with the --itrace=L option. With perf inject the
> generated perf.data file failed to be parsed as the sample events were
> out of sync with their perf_event_attr. A range of fixes were
> required.
>
> This patch was separated from a large perf script refactor that
> highlighted the breakage:
> https://lore.kernel.org/lkml/20260425224951.174663-1-irogers@google.com/

Thanks, applied to perf-tools-next, for v7.2.

- Arnaldo
Synthesizing branch stacks for Intel-PT highlighted an issue where
PERF_SAMPLE_BRANCH_HW_INDEX was assumed to always be set in the
perf_event_attr branch_sample_type. This caused an incorrect size
calculation.
Fix the writing of the nr and hw_idx values during sample event
synthesis by passing the branch_sample_type into the sample size
and synthesis functions. Also update hardware tracers (Intel PT,
ARM SPE, CS-ETM) to retrieve and pass their branch_sample_type
dynamically to prevent payload misalignment.
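
A minimal standalone sketch of the idea, assuming simplified stand-in types: the size calculation and the writer both consult branch_sample_type, so the hw_idx word is counted and emitted only when the flag is set. The sketch_* names and the flag value are illustrative and do not come from the perf headers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative stand-in for PERF_SAMPLE_BRANCH_HW_INDEX. */
#define SKETCH_BRANCH_HW_INDEX (1ULL << 17)

/* Illustrative stand-in for struct branch_entry (three 64-bit words). */
struct sketch_branch_entry {
	uint64_t from;
	uint64_t to;
	uint64_t flags;
};

/* Bytes taken by the branch stack part of a synthesized sample. */
static size_t sketch_branch_stack_size(uint64_t nr, uint64_t branch_sample_type)
{
	size_t sz = nr * sizeof(struct sketch_branch_entry);

	sz += sizeof(uint64_t);			/* nr is always present */
	if (branch_sample_type & SKETCH_BRANCH_HW_INDEX)
		sz += sizeof(uint64_t);		/* hw_idx only when requested */
	return sz;
}

/* Write nr, optionally hw_idx, then the entries. Returns the new cursor. */
static uint64_t *sketch_write_branch_stack(uint64_t *array, uint64_t nr,
					   uint64_t hw_idx,
					   const struct sketch_branch_entry *entries,
					   uint64_t branch_sample_type)
{
	size_t sz = nr * sizeof(*entries);

	*array++ = nr;
	if (branch_sample_type & SKETCH_BRANCH_HW_INDEX)
		*array++ = hw_idx;
	memcpy(array, entries, sz);
	return array + sz / sizeof(uint64_t);
}
```

Because both functions apply the same test, the number of bytes written always equals the size reported, which is the property the parser relies on.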
Fixes: d3f85437ad6a ("perf evsel: Support PERF_SAMPLE_BRANCH_HW_INDEX")
Assisted-by: Gemini:gemini-3.1-pro-preview
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/bench/inject-buildid.c | 9 ++++++---
tools/perf/builtin-inject.c | 12 +++++++++---
tools/perf/tests/dlfilter-test.c | 8 ++++++--
tools/perf/tests/sample-parsing.c | 5 +++--
tools/perf/util/arm-spe.c | 28 ++++++++++++++++++++++++----
tools/perf/util/cs-etm.c | 28 +++++++++++++++++++++++-----
tools/perf/util/intel-bts.c | 3 ++-
tools/perf/util/intel-pt.c | 27 +++++++++++++++++++++++----
tools/perf/util/synthetic-events.c | 25 ++++++++++++++++++-------
tools/perf/util/synthetic-events.h | 6 ++++--
10 files changed, 118 insertions(+), 33 deletions(-)
diff --git a/tools/perf/bench/inject-buildid.c b/tools/perf/bench/inject-buildid.c
index aad572a78d7f..bfd2c5ec9488 100644
--- a/tools/perf/bench/inject-buildid.c
+++ b/tools/perf/bench/inject-buildid.c
@@ -228,9 +228,12 @@ static ssize_t synthesize_sample(struct bench_data *data, struct bench_dso *dso,
event.header.type = PERF_RECORD_SAMPLE;
event.header.misc = PERF_RECORD_MISC_USER;
- event.header.size = perf_event__sample_event_size(&sample, bench_sample_type, 0);
-
- perf_event__synthesize_sample(&event, bench_sample_type, 0, &sample);
+ event.header.size = perf_event__sample_event_size(&sample, bench_sample_type,
+ /*read_format=*/0,
+ /*branch_sample_type=*/0);
+ perf_event__synthesize_sample(&event, bench_sample_type,
+ /*read_format=*/0,
+ /*branch_sample_type=*/0, &sample);
return writen(data->input_pipe[1], &event, event.header.size);
}
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index a2493f1097df..2f20e782c7f2 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -465,8 +465,13 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool,
/* remove sample_type {STACK,REGS}_USER for synthesize */
sample_type &= ~(PERF_SAMPLE_STACK_USER | PERF_SAMPLE_REGS_USER);
- perf_event__synthesize_sample(event_copy, sample_type,
- evsel->core.attr.read_format, sample);
+ ret = perf_event__synthesize_sample(event_copy, sample_type,
+ evsel->core.attr.read_format,
+ evsel->core.attr.branch_sample_type, sample);
+ if (ret) {
+ pr_err("Failed to synthesize sample\n");
+ return ret;
+ }
return perf_event__repipe_synth(tool, event_copy);
}
@@ -1102,7 +1107,8 @@ static int perf_inject__sched_stat(const struct perf_tool *tool,
sample_sw.period = sample->period;
sample_sw.time = sample->time;
perf_event__synthesize_sample(event_sw, evsel->core.attr.sample_type,
- evsel->core.attr.read_format, &sample_sw);
+ evsel->core.attr.read_format,
+ evsel->core.attr.branch_sample_type, &sample_sw);
build_id__mark_dso_hit(tool, event_sw, &sample_sw, evsel, machine);
ret = perf_event__repipe(tool, event_sw, &sample_sw, machine);
perf_sample__exit(&sample_sw);
diff --git a/tools/perf/tests/dlfilter-test.c b/tools/perf/tests/dlfilter-test.c
index e63790c61d53..204663571943 100644
--- a/tools/perf/tests/dlfilter-test.c
+++ b/tools/perf/tests/dlfilter-test.c
@@ -188,8 +188,12 @@ static int write_sample(struct test_data *td, u64 sample_type, u64 id, pid_t pid
event->header.type = PERF_RECORD_SAMPLE;
event->header.misc = PERF_RECORD_MISC_USER;
- event->header.size = perf_event__sample_event_size(&sample, sample_type, 0);
- err = perf_event__synthesize_sample(event, sample_type, 0, &sample);
+ event->header.size = perf_event__sample_event_size(&sample, sample_type,
+ /*read_format=*/0,
+ /*branch_sample_type=*/0);
+ err = perf_event__synthesize_sample(event, sample_type,
+ /*read_format=*/0,
+ /*branch_sample_type=*/0, &sample);
if (err)
return test_result("perf_event__synthesize_sample() failed", TEST_FAIL);
diff --git a/tools/perf/tests/sample-parsing.c b/tools/perf/tests/sample-parsing.c
index a7327c942ca2..55f0b73ca20e 100644
--- a/tools/perf/tests/sample-parsing.c
+++ b/tools/perf/tests/sample-parsing.c
@@ -310,7 +310,8 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
sample.read.one.lost = 1;
}
- sz = perf_event__sample_event_size(&sample, sample_type, read_format);
+ sz = perf_event__sample_event_size(&sample, sample_type, read_format,
+ evsel.core.attr.branch_sample_type);
bufsz = sz + 4096; /* Add a bit for overrun checking */
event = malloc(bufsz);
if (!event) {
@@ -324,7 +325,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
event->header.size = sz;
err = perf_event__synthesize_sample(event, sample_type, read_format,
- &sample);
+ evsel.core.attr.branch_sample_type, &sample);
if (err) {
pr_debug("%s failed for sample_type %#"PRIx64", error %d\n",
"perf_event__synthesize_sample", sample_type, err);
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 2b31da231ef3..31f05f467810 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -487,10 +487,30 @@ static void arm_spe__prep_branch_stack(struct arm_spe_queue *speq)
bstack->hw_idx = -1ULL;
}
-static int arm_spe__inject_event(union perf_event *event, struct perf_sample *sample, u64 type)
+static int arm_spe__inject_event(struct arm_spe *spe, union perf_event *event,
+ struct perf_sample *sample, u64 type)
{
- event->header.size = perf_event__sample_event_size(sample, type, 0);
- return perf_event__synthesize_sample(event, type, 0, sample);
+ struct evsel *evsel = sample->evsel;
+ u64 branch_sample_type = 0;
+ size_t sz;
+
+ if (!evsel && spe->session && spe->session->evlist)
+ evsel = evlist__id2evsel(spe->session->evlist, sample->id);
+
+ if (evsel)
+ branch_sample_type = evsel->core.attr.branch_sample_type;
+
+ event->header.type = PERF_RECORD_SAMPLE;
+ sz = perf_event__sample_event_size(sample, type, /*read_format=*/0,
+ branch_sample_type);
+ if (sz >= PERF_SAMPLE_MAX_SIZE) {
+ pr_err("Sample size %zu exceeds max size %d\n", sz, PERF_SAMPLE_MAX_SIZE);
+ return -EFAULT;
+ }
+ event->header.size = sz;
+
+ return perf_event__synthesize_sample(event, type, /*read_format=*/0,
+ branch_sample_type, sample);
}
static inline int
@@ -502,7 +522,7 @@ arm_spe_deliver_synth_event(struct arm_spe *spe,
int ret;
if (spe->synth_opts.inject) {
- ret = arm_spe__inject_event(event, sample, spe->sample_type);
+ ret = arm_spe__inject_event(spe, event, sample, spe->sample_type);
if (ret)
return ret;
}
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 8a639d2e51a4..6ec48de29441 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1422,11 +1422,29 @@ static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq,
bs->nr += 1;
}
-static int cs_etm__inject_event(union perf_event *event,
+static int cs_etm__inject_event(struct cs_etm_auxtrace *etm, union perf_event *event,
struct perf_sample *sample, u64 type)
{
- event->header.size = perf_event__sample_event_size(sample, type, 0);
- return perf_event__synthesize_sample(event, type, 0, sample);
+ struct evsel *evsel = sample->evsel;
+ u64 branch_sample_type = 0;
+ size_t sz;
+
+ if (!evsel && etm->session && etm->session->evlist)
+ evsel = evlist__id2evsel(etm->session->evlist, sample->id);
+
+ if (evsel)
+ branch_sample_type = evsel->core.attr.branch_sample_type;
+
+ sz = perf_event__sample_event_size(sample, type, /*read_format=*/0,
+ branch_sample_type);
+ if (sz >= PERF_SAMPLE_MAX_SIZE) {
+ pr_err("Sample size %zu exceeds max size %d\n", sz, PERF_SAMPLE_MAX_SIZE);
+ return -EFAULT;
+ }
+ event->header.size = sz;
+
+ return perf_event__synthesize_sample(event, type, /*read_format=*/0,
+ branch_sample_type, sample);
}
@@ -1592,7 +1610,7 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
sample.branch_stack = tidq->last_branch;
if (etm->synth_opts.inject) {
- ret = cs_etm__inject_event(event, &sample,
+ ret = cs_etm__inject_event(etm, event, &sample,
etm->instructions_sample_type);
if (ret)
return ret;
@@ -1667,7 +1685,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
}
if (etm->synth_opts.inject) {
- ret = cs_etm__inject_event(event, &sample,
+ ret = cs_etm__inject_event(etm, event, &sample,
etm->branches_sample_type);
if (ret)
return ret;
diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c
index 382255393fb3..0b18ebd13f7c 100644
--- a/tools/perf/util/intel-bts.c
+++ b/tools/perf/util/intel-bts.c
@@ -303,7 +303,8 @@ static int intel_bts_synth_branch_sample(struct intel_bts_queue *btsq,
event.sample.header.size = bts->branches_event_size;
ret = perf_event__synthesize_sample(&event,
bts->branches_sample_type,
- 0, &sample);
+ /*read_format=*/0, /*branch_sample_type=*/0,
+ &sample);
if (ret)
return ret;
}
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index fc9eec8b54b8..dd2637678b40 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -1728,11 +1728,30 @@ static void intel_pt_prep_b_sample(struct intel_pt *pt,
event->sample.header.misc = sample->cpumode;
}
-static int intel_pt_inject_event(union perf_event *event,
+static int intel_pt_inject_event(struct intel_pt *pt, union perf_event *event,
struct perf_sample *sample, u64 type)
{
- event->header.size = perf_event__sample_event_size(sample, type, 0);
- return perf_event__synthesize_sample(event, type, 0, sample);
+ struct evsel *evsel = sample->evsel;
+ u64 branch_sample_type = 0;
+ size_t sz;
+
+ if (!evsel && pt->session && pt->session->evlist)
+ evsel = evlist__id2evsel(pt->session->evlist, sample->id);
+
+ if (evsel)
+ branch_sample_type = evsel->core.attr.branch_sample_type;
+
+ event->header.type = PERF_RECORD_SAMPLE;
+ sz = perf_event__sample_event_size(sample, type, /*read_format=*/0,
+ branch_sample_type);
+ if (sz >= PERF_SAMPLE_MAX_SIZE) {
+ pr_err("Sample size %zu exceeds max size %d\n", sz, PERF_SAMPLE_MAX_SIZE);
+ return -EFAULT;
+ }
+ event->header.size = sz;
+
+ return perf_event__synthesize_sample(event, type, /*read_format=*/0,
+ branch_sample_type, sample);
}
static inline int intel_pt_opt_inject(struct intel_pt *pt,
@@ -1742,7 +1761,7 @@ static inline int intel_pt_opt_inject(struct intel_pt *pt,
if (!pt->synth_opts.inject)
return 0;
- return intel_pt_inject_event(event, sample, type);
+ return intel_pt_inject_event(pt, event, sample, type);
}
static int intel_pt_deliver_synth_event(struct intel_pt *pt,
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index 85bee747f4cd..2461f25a4d7d 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -1455,7 +1455,8 @@ int perf_event__synthesize_stat_round(const struct perf_tool *tool,
return process(tool, (union perf_event *) &event, NULL, machine);
}
-size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format)
+size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format,
+ u64 branch_sample_type)
{
size_t sz, result = sizeof(struct perf_record_sample);
@@ -1515,8 +1516,10 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
if (type & PERF_SAMPLE_BRANCH_STACK) {
sz = sample->branch_stack->nr * sizeof(struct branch_entry);
- /* nr, hw_idx */
- sz += 2 * sizeof(u64);
+ /* nr */
+ sz += sizeof(u64);
+ if (branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX)
+ sz += sizeof(u64);
result += sz;
}
@@ -1605,7 +1608,7 @@ static __u64 *copy_read_group_values(__u64 *array, __u64 read_format,
}
int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format,
- const struct perf_sample *sample)
+ u64 branch_sample_type, const struct perf_sample *sample)
{
__u64 *array;
size_t sz;
@@ -1719,9 +1722,17 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
if (type & PERF_SAMPLE_BRANCH_STACK) {
sz = sample->branch_stack->nr * sizeof(struct branch_entry);
- /* nr, hw_idx */
- sz += 2 * sizeof(u64);
- memcpy(array, sample->branch_stack, sz);
+
+ *array++ = sample->branch_stack->nr;
+
+ if (branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX) {
+ if (sample->no_hw_idx)
+ *array++ = 0;
+ else
+ *array++ = sample->branch_stack->hw_idx;
+ }
+
+ memcpy(array, perf_sample__branch_entries((struct perf_sample *)sample), sz);
array = (void *)array + sz;
}
diff --git a/tools/perf/util/synthetic-events.h b/tools/perf/util/synthetic-events.h
index b0edad0c3100..8c7f49f9ccf5 100644
--- a/tools/perf/util/synthetic-events.h
+++ b/tools/perf/util/synthetic-events.h
@@ -81,7 +81,8 @@ int perf_event__synthesize_mmap_events(const struct perf_tool *tool, union perf_
int perf_event__synthesize_modules(const struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
int perf_event__synthesize_namespaces(const struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine);
int perf_event__synthesize_cgroups(const struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
-int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, const struct perf_sample *sample);
+int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format,
+ u64 branch_sample_type, const struct perf_sample *sample);
int perf_event__synthesize_stat_config(const struct perf_tool *tool, struct perf_stat_config *config, perf_event__handler_t process, struct machine *machine);
int perf_event__synthesize_stat_events(struct perf_stat_config *config, const struct perf_tool *tool, struct evlist *evlist, perf_event__handler_t process, bool attrs);
int perf_event__synthesize_stat_round(const struct perf_tool *tool, u64 time, u64 type, perf_event__handler_t process, struct machine *machine);
@@ -97,7 +98,8 @@ void perf_event__synthesize_final_bpf_metadata(struct perf_session *session,
int perf_tool__process_synth_event(const struct perf_tool *tool, union perf_event *event, struct machine *machine, perf_event__handler_t process);
-size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format);
+size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
+ u64 read_format, u64 branch_sample_type);
int __machine__synthesize_threads(struct machine *machine, const struct perf_tool *tool,
struct target *target, struct perf_thread_map *threads,
--
2.54.0.631.ge1b05301d1-goog
When using "perf inject --itrace=L" to synthesize branch stacks from
AUX data, several issues caused failures with the generated file:
1. The synthesized samples were delivered without the
PERF_SAMPLE_BRANCH_STACK flag if it was not in the original event's
sample_type. Fixed by using sample_type | evsel->synth_sample_type
in intel_pt_do_synth_pebs_sample.
2. Modifying evsel->core.attr.sample_type early in __cmd_inject caused
parse failures for subsequent records in the input file. Fixed by
moving this modification to just before writing the header.
3. Narrowed perf_event__repipe_sample to synthesize samples only when
branch stack injection is requested, and restored
perf_inject__cut_auxtrace_sample as the fallback to preserve
existing behavior.
4. Potential Heap Overflow in perf_event__repipe_sample: Addressed by
adding a check that prints an error and returns -EFAULT if the
calculated event size exceeds PERF_SAMPLE_MAX_SIZE.
5. Header vs Payload Mismatch in __cmd_inject: Addressed by narrowing
the condition so that HEADER_BRANCH_STACK is only set in the file
header if add_last_branch was true.
6. NULL Pointer Dereference in intel-pt.c: When branch stack injection
is requested (add_last_branch is true) but last_branch is false
(e.g., perf inject --itrace=L), ptq->last_branch was not allocated.
However, PEBS branch stack synthesis (via synth_sample_type) still
forced LBR handling in do_synth_pebs_sample(), dereferencing the
NULL ptq->last_branch pointer. Guarding the dereference is not
sufficient because downstream sample size calculation and synthesis
strictly require a non-NULL branch_stack when the bit is set.
Fixed by ensuring ptq->last_branch is allocated in
intel_pt_alloc_queue() when add_last_branch is requested.
7. Modifying event attributes in perf_event__repipe_attr in-place caused
SIGSEGV on read-only mmap buffers in file mode and downstream parser
breakage in pipe mode. Fixed by processing the unmodified attribute
first, returning immediately in non-pipe mode, and correctly
synthesizing a new attribute event for pipe output using
perf_event__synthesize_attr. Also:
- Added a size validation check and integer underflow protection when
parsing n_ids.
- Prevented trailing ID memory corruption by zero-initializing the
local attr copy and copying at most min_t(size_t, sizeof(attr),
event->attr.attr.size) bytes.
- Resolved the downstream ID array parsing mismatch by setting
attr.size to sizeof(struct perf_event_attr) before synthesis so
that the header and attribute sizes agree.
8. Potential dangling pointer vulnerability in perf_event__repipe_sample:
Addressed by restoring the original sample->branch_stack pointer
before returning, including on early error return paths.
9. Off-by-one error in sample size check in perf_event__repipe_sample:
Fixed by checking if sz >= PERF_SAMPLE_MAX_SIZE instead of >.
10. Unadvertised size field left in payload by cut_auxtrace_sample:
Addressed by excluding the 8-byte size field from the copied
payload to correctly match the cleared PERF_SAMPLE_AUX bit. Cut
the AUX sample payload even if size is 0.
11. Inaccurate sample size calculation and uninitialized memory leaks in
convert_sample_callchain: Fixed by replacing manual arithmetic with
perf_event__sample_event_size and adding a bounds check against
PERF_SAMPLE_MAX_SIZE.
12. Omission of branch_sample_type in file headers: Addressed by
expanding older, smaller attributes to PERF_ATTR_SIZE_VER2 in
__cmd_inject to ensure branch_sample_type is not silently omitted.
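As an illustration of the attribute handling described in item 7, the
following is a minimal standalone sketch, not the perf code itself. The
demo_header and demo_attr structures and the demo_parse_attr_event()
helper are hypothetical stand-ins that are far smaller than the real
perf_event_header and perf_event_attr. The sketch shows the two size
checks, the zero-initialized bounded copy, and the ID count computed
from the producer's attr size rather than the tool's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not the real perf structures. */
struct demo_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;	/* total event size in bytes, header included */
};

struct demo_attr {
	uint32_t type;
	uint32_t size;	/* attr size as written by the producer */
	uint64_t config;
	uint64_t sample_type;
	uint64_t branch_sample_type;
};

/*
 * Copy the attr out of a raw attribute event into *out and return the
 * number of trailing u64 IDs, or -1 if the event is malformed.
 */
static long demo_parse_attr_event(const unsigned char *buf, struct demo_attr *out)
{
	struct demo_header hdr;
	uint32_t attr_size;
	size_t payload, copy;

	assert(buf != NULL && out != NULL);

	memcpy(&hdr, buf, sizeof(hdr));
	/* Need at least the attr's type and size fields after the header. */
	if (hdr.size < sizeof(hdr) + sizeof(uint64_t))
		return -1;
	payload = hdr.size - sizeof(hdr);

	memcpy(&attr_size, buf + sizeof(hdr) + offsetof(struct demo_attr, size),
	       sizeof(attr_size));
	/* Reject events where the ID count would underflow. */
	if (payload < attr_size)
		return -1;

	/* Zero first so fields missing from an older attr read as 0. */
	memset(out, 0, sizeof(*out));
	copy = attr_size < sizeof(*out) ? attr_size : sizeof(*out);
	memcpy(out, buf + sizeof(hdr), copy);

	/* Advertise the size actually emitted when re-synthesizing. */
	out->size = sizeof(*out);

	return (long)((payload - attr_size) / sizeof(uint64_t));
}
```

Copying sizeof(attr) bytes unconditionally would instead pull the first
trailing IDs into sample_type and branch_sample_type whenever the input
attr is smaller than the tool's own structure.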
Fixes: 0f0aa5e0693c ("perf inject: Add Instruction Tracing support")
Assisted-by: Gemini:gemini-3.1-pro-preview
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/builtin-inject.c | 153 ++++++++++++++++++++++++++++++++----
tools/perf/util/intel-pt.c | 8 +-
2 files changed, 142 insertions(+), 19 deletions(-)
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 2f20e782c7f2..7a64935b7e2b 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -216,12 +216,23 @@ static int perf_event__repipe_op4_synth(const struct perf_tool *tool,
return perf_event__repipe_synth(tool, event);
}
+static int perf_event__repipe_synth_cb(const struct perf_tool *tool,
+ union perf_event *event,
+ struct perf_sample *sample __maybe_unused,
+ struct machine *machine __maybe_unused)
+{
+ return perf_event__repipe_synth(tool, event);
+}
+
static int perf_event__repipe_attr(const struct perf_tool *tool,
union perf_event *event,
struct evlist **pevlist)
{
struct perf_inject *inject = container_of(tool, struct perf_inject,
tool);
+ struct perf_event_attr attr;
+ size_t n_ids;
+ u64 *ids;
int ret;
ret = perf_event__process_attr(tool, event, pevlist);
@@ -232,7 +243,37 @@ static int perf_event__repipe_attr(const struct perf_tool *tool,
if (!inject->output.is_pipe)
return 0;
- return perf_event__repipe_synth(tool, event);
+ if (!inject->itrace_synth_opts.set)
+ return perf_event__repipe_synth(tool, event);
+
+ if (event->header.size < sizeof(struct perf_event_header) + sizeof(u64)) {
+ pr_err("Attribute event size %u is too small\n", event->header.size);
+ return -EINVAL;
+ }
+
+ if (event->header.size - sizeof(event->header) < event->attr.attr.size) {
+ pr_err("Attribute event size %u is too small for attr.size %u\n",
+ event->header.size, event->attr.attr.size);
+ return -EINVAL;
+ }
+
+ memset(&attr, 0, sizeof(attr));
+ memcpy(&attr, &event->attr.attr,
+ min_t(size_t, sizeof(attr), (size_t)event->attr.attr.size));
+
+ n_ids = event->header.size - sizeof(event->header) - event->attr.attr.size;
+ n_ids /= sizeof(u64);
+ ids = perf_record_header_attr_id(event);
+
+ attr.size = sizeof(struct perf_event_attr);
+ attr.sample_type &= ~PERF_SAMPLE_AUX;
+
+ if (inject->itrace_synth_opts.add_last_branch) {
+ attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
+ attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX;
+ }
+ return perf_event__synthesize_attr(tool, &attr, (u32)n_ids, ids,
+ perf_event__repipe_synth_cb);
}
static int perf_event__repipe_event_update(const struct perf_tool *tool,
@@ -331,8 +372,8 @@ perf_inject__cut_auxtrace_sample(struct perf_inject *inject,
union perf_event *event,
struct perf_sample *sample)
{
- size_t sz1 = sample->aux_sample.data - (void *)event;
- size_t sz2 = event->header.size - sample->aux_sample.size - sz1;
+ size_t sz1 = sample->aux_sample.data - (void *)event - sizeof(u64);
+ size_t sz2 = event->header.size - sample->aux_sample.size - (sz1 + sizeof(u64));
union perf_event *ev;
if (inject->event_copy == NULL) {
@@ -343,13 +384,12 @@ perf_inject__cut_auxtrace_sample(struct perf_inject *inject,
ev = (union perf_event *)inject->event_copy;
if (sz1 > event->header.size || sz2 > event->header.size ||
sz1 + sz2 > event->header.size ||
- sz1 < sizeof(struct perf_event_header) + sizeof(u64))
+ sz1 < sizeof(struct perf_event_header))
return event;
memcpy(ev, event, sz1);
memcpy((void *)ev + sz1, (void *)event + event->header.size - sz2, sz2);
ev->header.size = sz1 + sz2;
- ((u64 *)((void *)ev + sz1))[-1] = 0;
return ev;
}
@@ -369,14 +409,77 @@ static int perf_event__repipe_sample(const struct perf_tool *tool,
struct perf_inject *inject = container_of(tool, struct perf_inject,
tool);
- if (evsel && evsel->handler) {
+ if (evsel == NULL)
+ return perf_event__repipe_synth(tool, event);
+
+ if (evsel->handler) {
inject_handler f = evsel->handler;
return f(tool, event, sample, evsel, machine);
}
build_id__mark_dso_hit(tool, event, sample, evsel, machine);
- if (inject->itrace_synth_opts.set && sample->aux_sample.size) {
+ if (inject->itrace_synth_opts.set &&
+ (inject->itrace_synth_opts.last_branch ||
+ inject->itrace_synth_opts.add_last_branch)) {
+ union perf_event *event_copy = (void *)inject->event_copy;
+ struct branch_stack dummy_bs = { .nr = 0, .hw_idx = 0 };
+ int err;
+ size_t sz;
+ u64 orig_type = evsel->core.attr.sample_type;
+ u64 orig_branch_type = evsel->core.attr.branch_sample_type;
+
+ struct branch_stack *orig_bs = sample->branch_stack;
+
+ if (event_copy == NULL) {
+ inject->event_copy = malloc(PERF_SAMPLE_MAX_SIZE);
+ if (!inject->event_copy)
+ return -ENOMEM;
+
+ event_copy = (void *)inject->event_copy;
+ }
+
+ if (!sample->branch_stack)
+ sample->branch_stack = &dummy_bs;
+
+ if (inject->itrace_synth_opts.add_last_branch) {
+ /* Temporarily add in type bits for synthesis. */
+ evsel->core.attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
+ evsel->core.attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX;
+ }
+ evsel->core.attr.sample_type &= ~PERF_SAMPLE_AUX;
+
+ sz = perf_event__sample_event_size(sample, evsel->core.attr.sample_type,
+ evsel->core.attr.read_format,
+ evsel->core.attr.branch_sample_type);
+
+ if (sz >= PERF_SAMPLE_MAX_SIZE) {
+ pr_err("Sample size %zu exceeds max size %d\n", sz, PERF_SAMPLE_MAX_SIZE);
+ evsel->core.attr.sample_type = orig_type;
+ evsel->core.attr.branch_sample_type = orig_branch_type;
+ sample->branch_stack = orig_bs;
+ return -EFAULT;
+ }
+
+ event_copy->header.type = PERF_RECORD_SAMPLE;
+ event_copy->header.misc = event->header.misc;
+ event_copy->header.size = sz;
+
+ err = perf_event__synthesize_sample(event_copy, evsel->core.attr.sample_type,
+ evsel->core.attr.read_format,
+ evsel->core.attr.branch_sample_type, sample);
+
+ evsel->core.attr.sample_type = orig_type;
+ evsel->core.attr.branch_sample_type = orig_branch_type;
+ sample->branch_stack = orig_bs;
+
+ if (err) {
+ pr_err("Failed to synthesize sample\n");
+ return err;
+ }
+ event = event_copy;
+ } else if (inject->itrace_synth_opts.set &&
+ (evsel->core.attr.sample_type & PERF_SAMPLE_AUX)) {
event = perf_inject__cut_auxtrace_sample(inject, event, sample);
if (IS_ERR(event))
return PTR_ERR(event);
@@ -397,7 +500,7 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool,
struct callchain_cursor_node *node;
struct thread *thread;
u64 sample_type = evsel->core.attr.sample_type;
- u32 sample_size = event->header.size;
+ size_t sz;
u64 i, k;
int ret;
@@ -456,15 +559,18 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool,
out:
memcpy(event_copy, event, sizeof(event->header));
- /* adjust sample size for stack and regs */
- sample_size -= sample->user_stack.size;
- sample_size -= (hweight64(evsel->core.attr.sample_regs_user) + 1) * sizeof(u64);
- sample_size += (sample->callchain->nr + 1) * sizeof(u64);
- event_copy->header.size = sample_size;
-
/* remove sample_type {STACK,REGS}_USER for synthesize */
sample_type &= ~(PERF_SAMPLE_STACK_USER | PERF_SAMPLE_REGS_USER);
+ sz = perf_event__sample_event_size(sample, sample_type,
+ evsel->core.attr.read_format,
+ evsel->core.attr.branch_sample_type);
+ if (sz >= PERF_SAMPLE_MAX_SIZE) {
+ pr_err("Sample size %zu exceeds max size %d\n", sz, PERF_SAMPLE_MAX_SIZE);
+ return -EFAULT;
+ }
+ event_copy->header.size = sz;
+
ret = perf_event__synthesize_sample(event_copy, sample_type,
evsel->core.attr.read_format,
evsel->core.attr.branch_sample_type, sample);
@@ -2442,12 +2548,27 @@ static int __cmd_inject(struct perf_inject *inject)
* synthesized hardware events, so clear the feature flag.
*/
if (inject->itrace_synth_opts.set) {
+ struct evsel *evsel;
+
perf_header__clear_feat(&session->header,
HEADER_AUXTRACE);
- if (inject->itrace_synth_opts.last_branch ||
- inject->itrace_synth_opts.add_last_branch)
+
+ evlist__for_each_entry(session->evlist, evsel) {
+ evsel->core.attr.sample_type &= ~PERF_SAMPLE_AUX;
+ }
+
+ if (inject->itrace_synth_opts.add_last_branch) {
perf_header__set_feat(&session->header,
HEADER_BRANCH_STACK);
+
+ evlist__for_each_entry(session->evlist, evsel) {
+ evsel->core.attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
+ if (evsel->core.attr.size < PERF_ATTR_SIZE_VER2)
+ evsel->core.attr.size = PERF_ATTR_SIZE_VER2;
+ evsel->core.attr.branch_sample_type |=
+ PERF_SAMPLE_BRANCH_HW_INDEX;
+ }
+ }
}
/*
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index dd2637678b40..d9c86ac49748 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -1307,7 +1307,8 @@ static struct intel_pt_queue *intel_pt_alloc_queue(struct intel_pt *pt,
goto out_free;
}
- if (pt->synth_opts.last_branch || pt->synth_opts.other_events) {
+ if (pt->synth_opts.last_branch || pt->synth_opts.add_last_branch ||
+ pt->synth_opts.other_events) {
unsigned int entry_cnt = max(LBRS_MAX, pt->br_stack_sz);
ptq->last_branch = intel_pt_alloc_br_stack(entry_cnt);
@@ -2505,7 +2506,7 @@ static int intel_pt_do_synth_pebs_sample(struct intel_pt_queue *ptq, struct evse
intel_pt_add_xmm(intr_regs, pos, items, regs_mask);
}
- if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
+ if ((sample_type | evsel->synth_sample_type) & PERF_SAMPLE_BRANCH_STACK) {
if (items->mask[INTEL_PT_LBR_0_POS] ||
items->mask[INTEL_PT_LBR_1_POS] ||
items->mask[INTEL_PT_LBR_2_POS]) {
@@ -2576,7 +2577,8 @@ static int intel_pt_do_synth_pebs_sample(struct intel_pt_queue *ptq, struct evse
sample.transaction = txn;
}
- ret = intel_pt_deliver_synth_event(pt, event, &sample, sample_type);
+ ret = intel_pt_deliver_synth_event(pt, event, &sample,
+ sample_type | evsel->synth_sample_type);
perf_sample__exit(&sample);
return ret;
}
--
2.54.0.631.ge1b05301d1-goog
Synthesizing branch stacks for Intel-PT highlighted an issue where
PERF_SAMPLE_BRANCH_HW_INDEX was assumed to always be set in the
perf_event_attr branch_sample_type. This caused an incorrect size
calculation.
Fix the writing of the nr and hw_idx values during sample event
synthesis by passing the branch_sample_type into the sample size
and synthesis functions. Also update hardware tracers (Intel PT,
ARM SPE, CS-ETM) to retrieve and pass their branch_sample_type
dynamically to prevent payload misalignment.
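The layout rule can be sketched in isolation as follows. This is an
illustrative standalone example under stated assumptions, not the code
in synthetic-events.c: DEMO_BRANCH_HW_INDEX, demo_branch_entry,
demo_branch_stack_size() and demo_write_branch_stack() are hypothetical
names, and the flag value is a placeholder rather than the real
PERF_SAMPLE_BRANCH_HW_INDEX bit. The point is that the size function
and the writer must agree on whether hw_idx is present.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not the perf definitions. */
#define DEMO_BRANCH_HW_INDEX (1ULL << 0)

struct demo_branch_entry {
	uint64_t from;
	uint64_t to;
	uint64_t flags;
};

/* Bytes occupied by the branch stack part of a sample. */
static size_t demo_branch_stack_size(uint64_t nr, uint64_t branch_sample_type)
{
	size_t sz = nr * sizeof(struct demo_branch_entry);

	sz += sizeof(uint64_t);		/* nr is always present */
	if (branch_sample_type & DEMO_BRANCH_HW_INDEX)
		sz += sizeof(uint64_t);	/* hw_idx only when advertised */
	return sz;
}

/* Write nr, the optional hw_idx, then the entries; return the next slot. */
static uint64_t *demo_write_branch_stack(uint64_t *array, uint64_t nr,
					 uint64_t hw_idx,
					 const struct demo_branch_entry *entries,
					 uint64_t branch_sample_type)
{
	assert(array != NULL);

	*array++ = nr;
	if (branch_sample_type & DEMO_BRANCH_HW_INDEX)
		*array++ = hw_idx;
	if (nr)
		memcpy(array, entries, nr * sizeof(*entries));
	return array + nr * (sizeof(*entries) / sizeof(uint64_t));
}
```

With two entries the payload is 56 bytes without the flag and 64 bytes
with it, which is the 8-byte difference the old unconditional
"2 * sizeof(u64)" got wrong whenever hw_idx was not advertised.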
Fixes: d3f85437ad6a ("perf evsel: Support PERF_SAMPLE_BRANCH_HW_INDEX")
Assisted-by: Gemini:gemini-3.1-pro-preview
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/bench/inject-buildid.c | 9 ++++++---
tools/perf/builtin-inject.c | 12 +++++++++---
tools/perf/tests/dlfilter-test.c | 8 ++++++--
tools/perf/tests/sample-parsing.c | 5 +++--
tools/perf/util/arm-spe.c | 28 ++++++++++++++++++++++++----
tools/perf/util/cs-etm.c | 28 +++++++++++++++++++++++-----
tools/perf/util/intel-bts.c | 3 ++-
tools/perf/util/intel-pt.c | 27 +++++++++++++++++++++++----
tools/perf/util/synthetic-events.c | 25 ++++++++++++++++++-------
tools/perf/util/synthetic-events.h | 6 ++++--
10 files changed, 118 insertions(+), 33 deletions(-)
diff --git a/tools/perf/bench/inject-buildid.c b/tools/perf/bench/inject-buildid.c
index aad572a78d7f..bfd2c5ec9488 100644
--- a/tools/perf/bench/inject-buildid.c
+++ b/tools/perf/bench/inject-buildid.c
@@ -228,9 +228,12 @@ static ssize_t synthesize_sample(struct bench_data *data, struct bench_dso *dso,
event.header.type = PERF_RECORD_SAMPLE;
event.header.misc = PERF_RECORD_MISC_USER;
- event.header.size = perf_event__sample_event_size(&sample, bench_sample_type, 0);
-
- perf_event__synthesize_sample(&event, bench_sample_type, 0, &sample);
+ event.header.size = perf_event__sample_event_size(&sample, bench_sample_type,
+ /*read_format=*/0,
+ /*branch_sample_type=*/0);
+ perf_event__synthesize_sample(&event, bench_sample_type,
+ /*read_format=*/0,
+ /*branch_sample_type=*/0, &sample);
return writen(data->input_pipe[1], &event, event.header.size);
}
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index a2493f1097df..2f20e782c7f2 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -465,8 +465,13 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool,
/* remove sample_type {STACK,REGS}_USER for synthesize */
sample_type &= ~(PERF_SAMPLE_STACK_USER | PERF_SAMPLE_REGS_USER);
- perf_event__synthesize_sample(event_copy, sample_type,
- evsel->core.attr.read_format, sample);
+ ret = perf_event__synthesize_sample(event_copy, sample_type,
+ evsel->core.attr.read_format,
+ evsel->core.attr.branch_sample_type, sample);
+ if (ret) {
+ pr_err("Failed to synthesize sample\n");
+ return ret;
+ }
return perf_event__repipe_synth(tool, event_copy);
}
@@ -1102,7 +1107,8 @@ static int perf_inject__sched_stat(const struct perf_tool *tool,
sample_sw.period = sample->period;
sample_sw.time = sample->time;
perf_event__synthesize_sample(event_sw, evsel->core.attr.sample_type,
- evsel->core.attr.read_format, &sample_sw);
+ evsel->core.attr.read_format,
+ evsel->core.attr.branch_sample_type, &sample_sw);
build_id__mark_dso_hit(tool, event_sw, &sample_sw, evsel, machine);
ret = perf_event__repipe(tool, event_sw, &sample_sw, machine);
perf_sample__exit(&sample_sw);
diff --git a/tools/perf/tests/dlfilter-test.c b/tools/perf/tests/dlfilter-test.c
index e63790c61d53..204663571943 100644
--- a/tools/perf/tests/dlfilter-test.c
+++ b/tools/perf/tests/dlfilter-test.c
@@ -188,8 +188,12 @@ static int write_sample(struct test_data *td, u64 sample_type, u64 id, pid_t pid
event->header.type = PERF_RECORD_SAMPLE;
event->header.misc = PERF_RECORD_MISC_USER;
- event->header.size = perf_event__sample_event_size(&sample, sample_type, 0);
- err = perf_event__synthesize_sample(event, sample_type, 0, &sample);
+ event->header.size = perf_event__sample_event_size(&sample, sample_type,
+ /*read_format=*/0,
+ /*branch_sample_type=*/0);
+ err = perf_event__synthesize_sample(event, sample_type,
+ /*read_format=*/0,
+ /*branch_sample_type=*/0, &sample);
if (err)
return test_result("perf_event__synthesize_sample() failed", TEST_FAIL);
diff --git a/tools/perf/tests/sample-parsing.c b/tools/perf/tests/sample-parsing.c
index a7327c942ca2..55f0b73ca20e 100644
--- a/tools/perf/tests/sample-parsing.c
+++ b/tools/perf/tests/sample-parsing.c
@@ -310,7 +310,8 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
sample.read.one.lost = 1;
}
- sz = perf_event__sample_event_size(&sample, sample_type, read_format);
+ sz = perf_event__sample_event_size(&sample, sample_type, read_format,
+ evsel.core.attr.branch_sample_type);
bufsz = sz + 4096; /* Add a bit for overrun checking */
event = malloc(bufsz);
if (!event) {
@@ -324,7 +325,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
event->header.size = sz;
err = perf_event__synthesize_sample(event, sample_type, read_format,
- &sample);
+ evsel.core.attr.branch_sample_type, &sample);
if (err) {
pr_debug("%s failed for sample_type %#"PRIx64", error %d\n",
"perf_event__synthesize_sample", sample_type, err);
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 2b31da231ef3..31f05f467810 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -487,10 +487,30 @@ static void arm_spe__prep_branch_stack(struct arm_spe_queue *speq)
bstack->hw_idx = -1ULL;
}
-static int arm_spe__inject_event(union perf_event *event, struct perf_sample *sample, u64 type)
+static int arm_spe__inject_event(struct arm_spe *spe, union perf_event *event,
+ struct perf_sample *sample, u64 type)
{
- event->header.size = perf_event__sample_event_size(sample, type, 0);
- return perf_event__synthesize_sample(event, type, 0, sample);
+ struct evsel *evsel = sample->evsel;
+ u64 branch_sample_type = 0;
+ size_t sz;
+
+ if (!evsel && spe->session && spe->session->evlist)
+ evsel = evlist__id2evsel(spe->session->evlist, sample->id);
+
+ if (evsel)
+ branch_sample_type = evsel->core.attr.branch_sample_type;
+
+ event->header.type = PERF_RECORD_SAMPLE;
+ sz = perf_event__sample_event_size(sample, type, /*read_format=*/0,
+ branch_sample_type);
+ if (sz >= PERF_SAMPLE_MAX_SIZE) {
+ pr_err("Sample size %zu exceeds max size %d\n", sz, PERF_SAMPLE_MAX_SIZE);
+ return -EFAULT;
+ }
+ event->header.size = sz;
+
+ return perf_event__synthesize_sample(event, type, /*read_format=*/0,
+ branch_sample_type, sample);
}
static inline int
@@ -502,7 +522,7 @@ arm_spe_deliver_synth_event(struct arm_spe *spe,
int ret;
if (spe->synth_opts.inject) {
- ret = arm_spe__inject_event(event, sample, spe->sample_type);
+ ret = arm_spe__inject_event(spe, event, sample, spe->sample_type);
if (ret)
return ret;
}
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 8a639d2e51a4..6ec48de29441 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1422,11 +1422,29 @@ static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq,
bs->nr += 1;
}
-static int cs_etm__inject_event(union perf_event *event,
+static int cs_etm__inject_event(struct cs_etm_auxtrace *etm, union perf_event *event,
struct perf_sample *sample, u64 type)
{
- event->header.size = perf_event__sample_event_size(sample, type, 0);
- return perf_event__synthesize_sample(event, type, 0, sample);
+ struct evsel *evsel = sample->evsel;
+ u64 branch_sample_type = 0;
+ size_t sz;
+
+ if (!evsel && etm->session && etm->session->evlist)
+ evsel = evlist__id2evsel(etm->session->evlist, sample->id);
+
+ if (evsel)
+ branch_sample_type = evsel->core.attr.branch_sample_type;
+
+ sz = perf_event__sample_event_size(sample, type, /*read_format=*/0,
+ branch_sample_type);
+ if (sz >= PERF_SAMPLE_MAX_SIZE) {
+ pr_err("Sample size %zu exceeds max size %d\n", sz, PERF_SAMPLE_MAX_SIZE);
+ return -EFAULT;
+ }
+ event->header.size = sz;
+
+ return perf_event__synthesize_sample(event, type, /*read_format=*/0,
+ branch_sample_type, sample);
}
@@ -1592,7 +1610,7 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
sample.branch_stack = tidq->last_branch;
if (etm->synth_opts.inject) {
- ret = cs_etm__inject_event(event, &sample,
+ ret = cs_etm__inject_event(etm, event, &sample,
etm->instructions_sample_type);
if (ret)
return ret;
@@ -1667,7 +1685,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
}
if (etm->synth_opts.inject) {
- ret = cs_etm__inject_event(event, &sample,
+ ret = cs_etm__inject_event(etm, event, &sample,
etm->branches_sample_type);
if (ret)
return ret;
diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c
index 382255393fb3..0b18ebd13f7c 100644
--- a/tools/perf/util/intel-bts.c
+++ b/tools/perf/util/intel-bts.c
@@ -303,7 +303,8 @@ static int intel_bts_synth_branch_sample(struct intel_bts_queue *btsq,
event.sample.header.size = bts->branches_event_size;
ret = perf_event__synthesize_sample(&event,
bts->branches_sample_type,
- 0, &sample);
+ /*read_format=*/0, /*branch_sample_type=*/0,
+ &sample);
if (ret)
return ret;
}
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index fc9eec8b54b8..dd2637678b40 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -1728,11 +1728,30 @@ static void intel_pt_prep_b_sample(struct intel_pt *pt,
event->sample.header.misc = sample->cpumode;
}
-static int intel_pt_inject_event(union perf_event *event,
+static int intel_pt_inject_event(struct intel_pt *pt, union perf_event *event,
struct perf_sample *sample, u64 type)
{
- event->header.size = perf_event__sample_event_size(sample, type, 0);
- return perf_event__synthesize_sample(event, type, 0, sample);
+ struct evsel *evsel = sample->evsel;
+ u64 branch_sample_type = 0;
+ size_t sz;
+
+ if (!evsel && pt->session && pt->session->evlist)
+ evsel = evlist__id2evsel(pt->session->evlist, sample->id);
+
+ if (evsel)
+ branch_sample_type = evsel->core.attr.branch_sample_type;
+
+ event->header.type = PERF_RECORD_SAMPLE;
+ sz = perf_event__sample_event_size(sample, type, /*read_format=*/0,
+ branch_sample_type);
+ if (sz >= PERF_SAMPLE_MAX_SIZE) {
+ pr_err("Sample size %zu exceeds max size %d\n", sz, PERF_SAMPLE_MAX_SIZE);
+ return -EFAULT;
+ }
+ event->header.size = sz;
+
+ return perf_event__synthesize_sample(event, type, /*read_format=*/0,
+ branch_sample_type, sample);
}
static inline int intel_pt_opt_inject(struct intel_pt *pt,
@@ -1742,7 +1761,7 @@ static inline int intel_pt_opt_inject(struct intel_pt *pt,
if (!pt->synth_opts.inject)
return 0;
- return intel_pt_inject_event(event, sample, type);
+ return intel_pt_inject_event(pt, event, sample, type);
}
static int intel_pt_deliver_synth_event(struct intel_pt *pt,
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index 85bee747f4cd..2461f25a4d7d 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -1455,7 +1455,8 @@ int perf_event__synthesize_stat_round(const struct perf_tool *tool,
return process(tool, (union perf_event *) &event, NULL, machine);
}
-size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format)
+size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format,
+ u64 branch_sample_type)
{
size_t sz, result = sizeof(struct perf_record_sample);
@@ -1515,8 +1516,10 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
if (type & PERF_SAMPLE_BRANCH_STACK) {
sz = sample->branch_stack->nr * sizeof(struct branch_entry);
- /* nr, hw_idx */
- sz += 2 * sizeof(u64);
+ /* nr */
+ sz += sizeof(u64);
+ if (branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX)
+ sz += sizeof(u64);
result += sz;
}
@@ -1605,7 +1608,7 @@ static __u64 *copy_read_group_values(__u64 *array, __u64 read_format,
}
int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format,
- const struct perf_sample *sample)
+ u64 branch_sample_type, const struct perf_sample *sample)
{
__u64 *array;
size_t sz;
@@ -1719,9 +1722,17 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
if (type & PERF_SAMPLE_BRANCH_STACK) {
sz = sample->branch_stack->nr * sizeof(struct branch_entry);
- /* nr, hw_idx */
- sz += 2 * sizeof(u64);
- memcpy(array, sample->branch_stack, sz);
+
+ *array++ = sample->branch_stack->nr;
+
+ if (branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX) {
+ if (sample->no_hw_idx)
+ *array++ = 0;
+ else
+ *array++ = sample->branch_stack->hw_idx;
+ }
+
+ memcpy(array, perf_sample__branch_entries((struct perf_sample *)sample), sz);
array = (void *)array + sz;
}
diff --git a/tools/perf/util/synthetic-events.h b/tools/perf/util/synthetic-events.h
index b0edad0c3100..8c7f49f9ccf5 100644
--- a/tools/perf/util/synthetic-events.h
+++ b/tools/perf/util/synthetic-events.h
@@ -81,7 +81,8 @@ int perf_event__synthesize_mmap_events(const struct perf_tool *tool, union perf_
int perf_event__synthesize_modules(const struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
int perf_event__synthesize_namespaces(const struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine);
int perf_event__synthesize_cgroups(const struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
-int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, const struct perf_sample *sample);
+int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format,
+ u64 branch_sample_type, const struct perf_sample *sample);
int perf_event__synthesize_stat_config(const struct perf_tool *tool, struct perf_stat_config *config, perf_event__handler_t process, struct machine *machine);
int perf_event__synthesize_stat_events(struct perf_stat_config *config, const struct perf_tool *tool, struct evlist *evlist, perf_event__handler_t process, bool attrs);
int perf_event__synthesize_stat_round(const struct perf_tool *tool, u64 time, u64 type, perf_event__handler_t process, struct machine *machine);
@@ -97,7 +98,8 @@ void perf_event__synthesize_final_bpf_metadata(struct perf_session *session,
int perf_tool__process_synth_event(const struct perf_tool *tool, union perf_event *event, struct machine *machine, perf_event__handler_t process);
-size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format);
+size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
+ u64 read_format, u64 branch_sample_type);
int __machine__synthesize_threads(struct machine *machine, const struct perf_tool *tool,
struct target *target, struct perf_thread_map *threads,
--
2.54.0.631.ge1b05301d1-goog
When using "perf inject --itrace=L" to synthesize branch stacks from
AUX data, several issues caused failures with the generated file:
1. The synthesized samples were delivered without the
PERF_SAMPLE_BRANCH_STACK flag if it was not in the original event's
sample_type. Fixed by using sample_type | evsel->synth_sample_type
in intel_pt_do_synth_pebs_sample.
2. Modifying evsel->core.attr.sample_type early in __cmd_inject caused
parse failures for subsequent records in the input file. Fixed by
moving this modification to just before writing the header.
3. Narrowed perf_event__repipe_sample to synthesize samples only when
branch stack injection is requested, and restored
perf_inject__cut_auxtrace_sample as the fallback to preserve
existing behavior.
4. Potential Heap Overflow in perf_event__repipe_sample: Addressed by
adding a check that prints an error and returns -EFAULT if the
calculated event size exceeds PERF_SAMPLE_MAX_SIZE.
5. Header vs Payload Mismatch in __cmd_inject: Addressed by narrowing
the condition so that HEADER_BRANCH_STACK is only set in the file
header if add_last_branch was true.
6. NULL Pointer Dereference in intel-pt.c: When branch stack injection
is requested (add_last_branch is true) but last_branch is false
(e.g., perf inject --itrace=L), ptq->last_branch was not allocated.
However, PEBS branch stack synthesis (via synth_sample_type) still
forced LBR handling in do_synth_pebs_sample(), dereferencing the
NULL ptq->last_branch pointer. Guarding the dereference is not
sufficient because downstream sample size calculation and synthesis
strictly require a non-NULL branch_stack when the bit is set.
Fixed by ensuring ptq->last_branch is allocated in
intel_pt_alloc_queue() when add_last_branch is requested.
7. Modifying event attributes in perf_event__repipe_attr in-place caused
SIGSEGV on read-only mmap buffers in file mode and downstream parser
breakage in pipe mode. Fixed by processing the unmodified attribute
first, returning immediately in non-pipe mode, and correctly
synthesizing a new attribute event for pipe output using
perf_event__synthesize_attr. Also:
- Added a size validation check and integer underflow protection when
parsing n_ids.
- Prevented trailing ID memory corruption by zero-initializing the
local attr copy and copying at most min_t(size_t, sizeof(attr),
event->attr.attr.size) bytes.
- Resolved the downstream ID array parsing mismatch by setting
attr.size to sizeof(struct perf_event_attr) before synthesis so
that the header and attribute sizes agree.
8. Potential dangling pointer vulnerability in perf_event__repipe_sample:
Addressed by restoring the original sample->branch_stack pointer
before returning, including on early error return paths.
9. Off-by-one error in sample size check in perf_event__repipe_sample:
Fixed by checking if sz >= PERF_SAMPLE_MAX_SIZE instead of >.
10. Unadvertised size field left in payload by cut_auxtrace_sample:
Addressed by excluding the 8-byte size field from the copied
payload to correctly match the cleared PERF_SAMPLE_AUX bit. Cut
the AUX sample payload even if size is 0.
11. Inaccurate sample size calculation and uninitialized memory leaks in
convert_sample_callchain: Fixed by replacing manual arithmetic with
perf_event__sample_event_size and adding a bounds check against
PERF_SAMPLE_MAX_SIZE.
12. Omission of branch_sample_type in file headers: Addressed by
expanding older, smaller attributes to PERF_ATTR_SIZE_VER2 in
__cmd_inject to ensure branch_sample_type is not silently omitted.
Fixes: 0f0aa5e0693c ("perf inject: Add Instruction Tracing support")
Assisted-by: Gemini:gemini-3.1-pro-preview
Signed-off-by: Ian Rogers <irogers@google.com>
---
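For readers following item 7, the attribute handling can be sketched
outside of perf roughly as below. This is only an illustration under
stated assumptions: demo_header, demo_attr and demo_parse_attr are
hypothetical stand-ins, not the real perf structures or helpers, and
the attr layout is deliberately tiny. It shows the two size checks,
the zero-filled bounded copy, and the ID count derived from the
on-wire attr size rather than from the local struct size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 8-byte event header (same shape as a perf event header). */
struct demo_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;	/* total event size, header included */
};

/* Hypothetical, shortened attribute; the real one has many more fields. */
struct demo_attr {
	uint32_t type;
	uint32_t size;	/* size of the attr as written by the producer */
	uint64_t config;
	uint64_t sample_type;
	uint64_t branch_sample_type;
};

/*
 * Parse an attr event laid out as: header, attr (attr.size bytes), IDs.
 * Returns 0 on success and -1 if the sizes are inconsistent.
 */
static int demo_parse_attr(const unsigned char *buf, struct demo_attr *out,
			   size_t *n_ids)
{
	struct demo_header hdr;
	uint32_t wire_size;
	size_t payload, copy;

	assert(buf && out && n_ids);

	memcpy(&hdr, buf, sizeof(hdr));
	/* Need at least the attr's type and size fields after the header. */
	if (hdr.size < sizeof(hdr) + sizeof(uint64_t))
		return -1;

	payload = hdr.size - sizeof(hdr);
	memcpy(&wire_size, buf + sizeof(hdr) + offsetof(struct demo_attr, size),
	       sizeof(wire_size));
	/* Reject attr sizes that would make the ID count underflow. */
	if (payload < wire_size)
		return -1;

	/* Zero first, then copy no more than either side can hold. */
	memset(out, 0, sizeof(*out));
	copy = wire_size < sizeof(*out) ? wire_size : sizeof(*out);
	memcpy(out, buf + sizeof(hdr), copy);

	/* IDs start after the on-wire attr, not after sizeof(*out). */
	*n_ids = (payload - wire_size) / sizeof(uint64_t);

	/* Advertise the local struct size for anything re-emitted later. */
	out->size = sizeof(*out);
	return 0;
}
```

With a 16-byte on-wire attr followed by two IDs, the bounded copy
leaves sample_type at zero instead of picking up the first ID, which
is the corruption the min_t() copy in the patch is meant to avoid.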
tools/perf/builtin-inject.c | 148 ++++++++++++++++++++++++++++++++----
tools/perf/util/intel-pt.c | 8 +-
2 files changed, 138 insertions(+), 18 deletions(-)
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 2f20e782c7f2..29470d819442 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -216,12 +216,23 @@ static int perf_event__repipe_op4_synth(const struct perf_tool *tool,
return perf_event__repipe_synth(tool, event);
}
+static int perf_event__repipe_synth_cb(const struct perf_tool *tool,
+ union perf_event *event,
+ struct perf_sample *sample __maybe_unused,
+ struct machine *machine __maybe_unused)
+{
+ return perf_event__repipe_synth(tool, event);
+}
+
static int perf_event__repipe_attr(const struct perf_tool *tool,
union perf_event *event,
struct evlist **pevlist)
{
struct perf_inject *inject = container_of(tool, struct perf_inject,
tool);
+ struct perf_event_attr attr;
+ size_t n_ids;
+ u64 *ids;
int ret;
ret = perf_event__process_attr(tool, event, pevlist);
@@ -232,7 +243,37 @@ static int perf_event__repipe_attr(const struct perf_tool *tool,
if (!inject->output.is_pipe)
return 0;
- return perf_event__repipe_synth(tool, event);
+ if (!inject->itrace_synth_opts.set)
+ return perf_event__repipe_synth(tool, event);
+
+ if (event->header.size < sizeof(struct perf_event_header) + sizeof(u64)) {
+ pr_err("Attribute event size %u is too small\n", event->header.size);
+ return -EINVAL;
+ }
+
+ if (event->header.size - sizeof(event->header) < event->attr.attr.size) {
+ pr_err("Attribute event size %u is too small for attr.size %u\n",
+ event->header.size, event->attr.attr.size);
+ return -EINVAL;
+ }
+
+ memset(&attr, 0, sizeof(attr));
+ memcpy(&attr, &event->attr.attr,
+ min_t(size_t, sizeof(attr), (size_t)event->attr.attr.size));
+
+ n_ids = event->header.size - sizeof(event->header) - event->attr.attr.size;
+ n_ids /= sizeof(u64);
+ ids = perf_record_header_attr_id(event);
+
+ attr.size = sizeof(struct perf_event_attr);
+ attr.sample_type &= ~PERF_SAMPLE_AUX;
+
+ if (inject->itrace_synth_opts.add_last_branch) {
+ attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
+ attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX;
+ }
+ return perf_event__synthesize_attr(tool, &attr, (u32)n_ids, ids,
+ perf_event__repipe_synth_cb);
}
static int perf_event__repipe_event_update(const struct perf_tool *tool,
@@ -331,8 +372,8 @@ perf_inject__cut_auxtrace_sample(struct perf_inject *inject,
union perf_event *event,
struct perf_sample *sample)
{
- size_t sz1 = sample->aux_sample.data - (void *)event;
- size_t sz2 = event->header.size - sample->aux_sample.size - sz1;
+ size_t sz1 = sample->aux_sample.data - (void *)event - sizeof(u64);
+ size_t sz2 = event->header.size - sample->aux_sample.size - (sz1 + sizeof(u64));
union perf_event *ev;
if (inject->event_copy == NULL) {
@@ -343,13 +384,12 @@ perf_inject__cut_auxtrace_sample(struct perf_inject *inject,
ev = (union perf_event *)inject->event_copy;
if (sz1 > event->header.size || sz2 > event->header.size ||
sz1 + sz2 > event->header.size ||
- sz1 < sizeof(struct perf_event_header) + sizeof(u64))
+ sz1 < sizeof(struct perf_event_header))
return event;
memcpy(ev, event, sz1);
memcpy((void *)ev + sz1, (void *)event + event->header.size - sz2, sz2);
ev->header.size = sz1 + sz2;
- ((u64 *)((void *)ev + sz1))[-1] = 0;
return ev;
}
@@ -376,7 +416,67 @@ static int perf_event__repipe_sample(const struct perf_tool *tool,
build_id__mark_dso_hit(tool, event, sample, evsel, machine);
- if (inject->itrace_synth_opts.set && sample->aux_sample.size) {
+ if (inject->itrace_synth_opts.set &&
+ (inject->itrace_synth_opts.last_branch ||
+ inject->itrace_synth_opts.add_last_branch)) {
+ union perf_event *event_copy = (void *)inject->event_copy;
+ struct branch_stack dummy_bs = { .nr = 0, .hw_idx = 0 };
+ int err;
+ size_t sz;
+ u64 orig_type = evsel->core.attr.sample_type;
+ u64 orig_branch_type = evsel->core.attr.branch_sample_type;
+
+ struct branch_stack *orig_bs = sample->branch_stack;
+
+ if (event_copy == NULL) {
+ inject->event_copy = malloc(PERF_SAMPLE_MAX_SIZE);
+ if (!inject->event_copy)
+ return -ENOMEM;
+
+ event_copy = (void *)inject->event_copy;
+ }
+
+ if (!sample->branch_stack)
+ sample->branch_stack = &dummy_bs;
+
+ if (inject->itrace_synth_opts.add_last_branch) {
+ /* Temporarily add in type bits for synthesis. */
+ evsel->core.attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
+ evsel->core.attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX;
+ }
+ evsel->core.attr.sample_type &= ~PERF_SAMPLE_AUX;
+
+ sz = perf_event__sample_event_size(sample, evsel->core.attr.sample_type,
+ evsel->core.attr.read_format,
+ evsel->core.attr.branch_sample_type);
+
+ if (sz >= PERF_SAMPLE_MAX_SIZE) {
+ pr_err("Sample size %zu exceeds max size %d\n", sz, PERF_SAMPLE_MAX_SIZE);
+ evsel->core.attr.sample_type = orig_type;
+ evsel->core.attr.branch_sample_type = orig_branch_type;
+ sample->branch_stack = orig_bs;
+ return -EFAULT;
+ }
+
+ event_copy->header.type = PERF_RECORD_SAMPLE;
+ event_copy->header.misc = event->header.misc;
+ event_copy->header.size = sz;
+
+ err = perf_event__synthesize_sample(event_copy, evsel->core.attr.sample_type,
+ evsel->core.attr.read_format,
+ evsel->core.attr.branch_sample_type, sample);
+
+ evsel->core.attr.sample_type = orig_type;
+ evsel->core.attr.branch_sample_type = orig_branch_type;
+ sample->branch_stack = orig_bs;
+
+ if (err) {
+ pr_err("Failed to synthesize sample\n");
+ return err;
+ }
+ event = event_copy;
+ } else if (inject->itrace_synth_opts.set &&
+ (evsel->core.attr.sample_type & PERF_SAMPLE_AUX)) {
event = perf_inject__cut_auxtrace_sample(inject, event, sample);
if (IS_ERR(event))
return PTR_ERR(event);
@@ -397,7 +497,7 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool,
struct callchain_cursor_node *node;
struct thread *thread;
u64 sample_type = evsel->core.attr.sample_type;
- u32 sample_size = event->header.size;
+ size_t sz;
u64 i, k;
int ret;
@@ -456,15 +556,18 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool,
out:
memcpy(event_copy, event, sizeof(event->header));
- /* adjust sample size for stack and regs */
- sample_size -= sample->user_stack.size;
- sample_size -= (hweight64(evsel->core.attr.sample_regs_user) + 1) * sizeof(u64);
- sample_size += (sample->callchain->nr + 1) * sizeof(u64);
- event_copy->header.size = sample_size;
-
/* remove sample_type {STACK,REGS}_USER for synthesize */
sample_type &= ~(PERF_SAMPLE_STACK_USER | PERF_SAMPLE_REGS_USER);
+ sz = perf_event__sample_event_size(sample, sample_type,
+ evsel->core.attr.read_format,
+ evsel->core.attr.branch_sample_type);
+ if (sz >= PERF_SAMPLE_MAX_SIZE) {
+ pr_err("Sample size %zu exceeds max size %d\n", sz, PERF_SAMPLE_MAX_SIZE);
+ return -EFAULT;
+ }
+ event_copy->header.size = sz;
+
ret = perf_event__synthesize_sample(event_copy, sample_type,
evsel->core.attr.read_format,
evsel->core.attr.branch_sample_type, sample);
@@ -2442,12 +2545,27 @@ static int __cmd_inject(struct perf_inject *inject)
* synthesized hardware events, so clear the feature flag.
*/
if (inject->itrace_synth_opts.set) {
+ struct evsel *evsel;
+
perf_header__clear_feat(&session->header,
HEADER_AUXTRACE);
- if (inject->itrace_synth_opts.last_branch ||
- inject->itrace_synth_opts.add_last_branch)
+
+ evlist__for_each_entry(session->evlist, evsel) {
+ evsel->core.attr.sample_type &= ~PERF_SAMPLE_AUX;
+ }
+
+ if (inject->itrace_synth_opts.add_last_branch) {
perf_header__set_feat(&session->header,
HEADER_BRANCH_STACK);
+
+ evlist__for_each_entry(session->evlist, evsel) {
+ evsel->core.attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
+ if (evsel->core.attr.size < PERF_ATTR_SIZE_VER2)
+ evsel->core.attr.size = PERF_ATTR_SIZE_VER2;
+ evsel->core.attr.branch_sample_type |=
+ PERF_SAMPLE_BRANCH_HW_INDEX;
+ }
+ }
}
/*
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index dd2637678b40..d9c86ac49748 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -1307,7 +1307,8 @@ static struct intel_pt_queue *intel_pt_alloc_queue(struct intel_pt *pt,
goto out_free;
}
- if (pt->synth_opts.last_branch || pt->synth_opts.other_events) {
+ if (pt->synth_opts.last_branch || pt->synth_opts.add_last_branch ||
+ pt->synth_opts.other_events) {
unsigned int entry_cnt = max(LBRS_MAX, pt->br_stack_sz);
ptq->last_branch = intel_pt_alloc_br_stack(entry_cnt);
@@ -2505,7 +2506,7 @@ static int intel_pt_do_synth_pebs_sample(struct intel_pt_queue *ptq, struct evse
intel_pt_add_xmm(intr_regs, pos, items, regs_mask);
}
- if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
+ if ((sample_type | evsel->synth_sample_type) & PERF_SAMPLE_BRANCH_STACK) {
if (items->mask[INTEL_PT_LBR_0_POS] ||
items->mask[INTEL_PT_LBR_1_POS] ||
items->mask[INTEL_PT_LBR_2_POS]) {
@@ -2576,7 +2577,8 @@ static int intel_pt_do_synth_pebs_sample(struct intel_pt_queue *ptq, struct evse
sample.transaction = txn;
}
- ret = intel_pt_deliver_synth_event(pt, event, &sample, sample_type);
+ ret = intel_pt_deliver_synth_event(pt, event, &sample,
+ sample_type | evsel->synth_sample_type);
perf_sample__exit(&sample);
return ret;
}
--
2.54.0.631.ge1b05301d1-goog
Synthesizing branch stacks for Intel-PT highlighted an issue where
PERF_SAMPLE_BRANCH_HW_INDEX was assumed to always be set in the
perf_event_attr branch_sample_type. This caused an incorrect size
calculation.
Fix the writing of the nr and hw_idx values during sample event
synthesis by passing the branch_sample_type into the sample size
and synthesis functions. Also update hardware tracers (Intel PT,
ARM SPE, CS-ETM) to retrieve and pass their branch_sample_type
dynamically to prevent payload misalignment.
Fixes: d3f85437ad6a ("perf evsel: Support PERF_SAMPLE_BRANCH_HW_INDEX")
Assisted-by: Gemini:gemini-3.1-pro-preview
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
---
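As a rough illustration of the size fix described above, the sketch
below computes and writes a branch stack payload with and without the
hw_idx word. The names (DEMO_BRANCH_HW_INDEX, demo_branch_stack and
the two helpers) are hypothetical and the flag value is arbitrary;
unlike perf's struct branch_stack, this demo always keeps hw_idx in
the in-memory struct and only decides whether to emit it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag; not the real PERF_SAMPLE_BRANCH_HW_INDEX value. */
#define DEMO_BRANCH_HW_INDEX (1ULL << 0)
#define DEMO_MAX_ENTRIES 4

struct demo_branch_entry {
	uint64_t from;
	uint64_t to;
	uint64_t flags;
};

struct demo_branch_stack {
	uint64_t nr;
	uint64_t hw_idx;
	struct demo_branch_entry entries[DEMO_MAX_ENTRIES];
};

/* Payload size: nr, optional hw_idx, then nr entries. */
static size_t demo_branch_stack_size(const struct demo_branch_stack *bs,
				     uint64_t branch_sample_type)
{
	size_t sz = sizeof(uint64_t);	/* nr */

	assert(bs->nr <= DEMO_MAX_ENTRIES);
	if (branch_sample_type & DEMO_BRANCH_HW_INDEX)
		sz += sizeof(uint64_t);	/* hw_idx */
	return sz + bs->nr * sizeof(struct demo_branch_entry);
}

/* Write the payload field by field; returns the number of bytes written. */
static size_t demo_write_branch_stack(uint64_t *array,
				      const struct demo_branch_stack *bs,
				      uint64_t branch_sample_type)
{
	uint64_t *start = array;
	size_t entries_sz = bs->nr * sizeof(struct demo_branch_entry);

	assert(bs->nr <= DEMO_MAX_ENTRIES);
	*array++ = bs->nr;
	if (branch_sample_type & DEMO_BRANCH_HW_INDEX)
		*array++ = bs->hw_idx;
	memcpy(array, bs->entries, entries_sz);
	array += entries_sz / sizeof(uint64_t);
	return (size_t)(array - start) * sizeof(uint64_t);
}
```

The point of the sketch is that the sizing and writing helpers take
the same branch_sample_type, so the header size and the bytes written
cannot disagree by one u64.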
tools/perf/bench/inject-buildid.c | 9 ++++++---
tools/perf/builtin-inject.c | 12 +++++++++---
tools/perf/tests/dlfilter-test.c | 8 ++++++--
tools/perf/tests/sample-parsing.c | 5 +++--
tools/perf/util/arm-spe.c | 28 ++++++++++++++++++++++++----
tools/perf/util/cs-etm.c | 28 +++++++++++++++++++++++-----
tools/perf/util/intel-bts.c | 3 ++-
tools/perf/util/intel-pt.c | 27 +++++++++++++++++++++++----
tools/perf/util/synthetic-events.c | 25 ++++++++++++++++++-------
tools/perf/util/synthetic-events.h | 6 ++++--
10 files changed, 118 insertions(+), 33 deletions(-)
diff --git a/tools/perf/bench/inject-buildid.c b/tools/perf/bench/inject-buildid.c
index aad572a78d7f..bfd2c5ec9488 100644
--- a/tools/perf/bench/inject-buildid.c
+++ b/tools/perf/bench/inject-buildid.c
@@ -228,9 +228,12 @@ static ssize_t synthesize_sample(struct bench_data *data, struct bench_dso *dso,
event.header.type = PERF_RECORD_SAMPLE;
event.header.misc = PERF_RECORD_MISC_USER;
- event.header.size = perf_event__sample_event_size(&sample, bench_sample_type, 0);
-
- perf_event__synthesize_sample(&event, bench_sample_type, 0, &sample);
+ event.header.size = perf_event__sample_event_size(&sample, bench_sample_type,
+ /*read_format=*/0,
+ /*branch_sample_type=*/0);
+ perf_event__synthesize_sample(&event, bench_sample_type,
+ /*read_format=*/0,
+ /*branch_sample_type=*/0, &sample);
return writen(data->input_pipe[1], &event, event.header.size);
}
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index a2493f1097df..2f20e782c7f2 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -465,8 +465,13 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool,
/* remove sample_type {STACK,REGS}_USER for synthesize */
sample_type &= ~(PERF_SAMPLE_STACK_USER | PERF_SAMPLE_REGS_USER);
- perf_event__synthesize_sample(event_copy, sample_type,
- evsel->core.attr.read_format, sample);
+ ret = perf_event__synthesize_sample(event_copy, sample_type,
+ evsel->core.attr.read_format,
+ evsel->core.attr.branch_sample_type, sample);
+ if (ret) {
+ pr_err("Failed to synthesize sample\n");
+ return ret;
+ }
return perf_event__repipe_synth(tool, event_copy);
}
@@ -1102,7 +1107,8 @@ static int perf_inject__sched_stat(const struct perf_tool *tool,
sample_sw.period = sample->period;
sample_sw.time = sample->time;
perf_event__synthesize_sample(event_sw, evsel->core.attr.sample_type,
- evsel->core.attr.read_format, &sample_sw);
+ evsel->core.attr.read_format,
+ evsel->core.attr.branch_sample_type, &sample_sw);
build_id__mark_dso_hit(tool, event_sw, &sample_sw, evsel, machine);
ret = perf_event__repipe(tool, event_sw, &sample_sw, machine);
perf_sample__exit(&sample_sw);
diff --git a/tools/perf/tests/dlfilter-test.c b/tools/perf/tests/dlfilter-test.c
index e63790c61d53..204663571943 100644
--- a/tools/perf/tests/dlfilter-test.c
+++ b/tools/perf/tests/dlfilter-test.c
@@ -188,8 +188,12 @@ static int write_sample(struct test_data *td, u64 sample_type, u64 id, pid_t pid
event->header.type = PERF_RECORD_SAMPLE;
event->header.misc = PERF_RECORD_MISC_USER;
- event->header.size = perf_event__sample_event_size(&sample, sample_type, 0);
- err = perf_event__synthesize_sample(event, sample_type, 0, &sample);
+ event->header.size = perf_event__sample_event_size(&sample, sample_type,
+ /*read_format=*/0,
+ /*branch_sample_type=*/0);
+ err = perf_event__synthesize_sample(event, sample_type,
+ /*read_format=*/0,
+ /*branch_sample_type=*/0, &sample);
if (err)
return test_result("perf_event__synthesize_sample() failed", TEST_FAIL);
diff --git a/tools/perf/tests/sample-parsing.c b/tools/perf/tests/sample-parsing.c
index a7327c942ca2..55f0b73ca20e 100644
--- a/tools/perf/tests/sample-parsing.c
+++ b/tools/perf/tests/sample-parsing.c
@@ -310,7 +310,8 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
sample.read.one.lost = 1;
}
- sz = perf_event__sample_event_size(&sample, sample_type, read_format);
+ sz = perf_event__sample_event_size(&sample, sample_type, read_format,
+ evsel.core.attr.branch_sample_type);
bufsz = sz + 4096; /* Add a bit for overrun checking */
event = malloc(bufsz);
if (!event) {
@@ -324,7 +325,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
event->header.size = sz;
err = perf_event__synthesize_sample(event, sample_type, read_format,
- &sample);
+ evsel.core.attr.branch_sample_type, &sample);
if (err) {
pr_debug("%s failed for sample_type %#"PRIx64", error %d\n",
"perf_event__synthesize_sample", sample_type, err);
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 2b31da231ef3..31f05f467810 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -487,10 +487,30 @@ static void arm_spe__prep_branch_stack(struct arm_spe_queue *speq)
bstack->hw_idx = -1ULL;
}
-static int arm_spe__inject_event(union perf_event *event, struct perf_sample *sample, u64 type)
+static int arm_spe__inject_event(struct arm_spe *spe, union perf_event *event,
+ struct perf_sample *sample, u64 type)
{
- event->header.size = perf_event__sample_event_size(sample, type, 0);
- return perf_event__synthesize_sample(event, type, 0, sample);
+ struct evsel *evsel = sample->evsel;
+ u64 branch_sample_type = 0;
+ size_t sz;
+
+ if (!evsel && spe->session && spe->session->evlist)
+ evsel = evlist__id2evsel(spe->session->evlist, sample->id);
+
+ if (evsel)
+ branch_sample_type = evsel->core.attr.branch_sample_type;
+
+ event->header.type = PERF_RECORD_SAMPLE;
+ sz = perf_event__sample_event_size(sample, type, /*read_format=*/0,
+ branch_sample_type);
+ if (sz >= PERF_SAMPLE_MAX_SIZE) {
+ pr_err("Sample size %zu exceeds max size %d\n", sz, PERF_SAMPLE_MAX_SIZE);
+ return -EFAULT;
+ }
+ event->header.size = sz;
+
+ return perf_event__synthesize_sample(event, type, /*read_format=*/0,
+ branch_sample_type, sample);
}
static inline int
@@ -502,7 +522,7 @@ arm_spe_deliver_synth_event(struct arm_spe *spe,
int ret;
if (spe->synth_opts.inject) {
- ret = arm_spe__inject_event(event, sample, spe->sample_type);
+ ret = arm_spe__inject_event(spe, event, sample, spe->sample_type);
if (ret)
return ret;
}
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 8a639d2e51a4..6ec48de29441 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1422,11 +1422,29 @@ static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq,
bs->nr += 1;
}
-static int cs_etm__inject_event(union perf_event *event,
+static int cs_etm__inject_event(struct cs_etm_auxtrace *etm, union perf_event *event,
struct perf_sample *sample, u64 type)
{
- event->header.size = perf_event__sample_event_size(sample, type, 0);
- return perf_event__synthesize_sample(event, type, 0, sample);
+ struct evsel *evsel = sample->evsel;
+ u64 branch_sample_type = 0;
+ size_t sz;
+
+ if (!evsel && etm->session && etm->session->evlist)
+ evsel = evlist__id2evsel(etm->session->evlist, sample->id);
+
+ if (evsel)
+ branch_sample_type = evsel->core.attr.branch_sample_type;
+
+ sz = perf_event__sample_event_size(sample, type, /*read_format=*/0,
+ branch_sample_type);
+ if (sz >= PERF_SAMPLE_MAX_SIZE) {
+ pr_err("Sample size %zu exceeds max size %d\n", sz, PERF_SAMPLE_MAX_SIZE);
+ return -EFAULT;
+ }
+ event->header.size = sz;
+
+ return perf_event__synthesize_sample(event, type, /*read_format=*/0,
+ branch_sample_type, sample);
}
@@ -1592,7 +1610,7 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
sample.branch_stack = tidq->last_branch;
if (etm->synth_opts.inject) {
- ret = cs_etm__inject_event(event, &sample,
+ ret = cs_etm__inject_event(etm, event, &sample,
etm->instructions_sample_type);
if (ret)
return ret;
@@ -1667,7 +1685,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
}
if (etm->synth_opts.inject) {
- ret = cs_etm__inject_event(event, &sample,
+ ret = cs_etm__inject_event(etm, event, &sample,
etm->branches_sample_type);
if (ret)
return ret;
diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c
index 382255393fb3..0b18ebd13f7c 100644
--- a/tools/perf/util/intel-bts.c
+++ b/tools/perf/util/intel-bts.c
@@ -303,7 +303,8 @@ static int intel_bts_synth_branch_sample(struct intel_bts_queue *btsq,
event.sample.header.size = bts->branches_event_size;
ret = perf_event__synthesize_sample(&event,
bts->branches_sample_type,
- 0, &sample);
+ /*read_format=*/0, /*branch_sample_type=*/0,
+ &sample);
if (ret)
return ret;
}
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index fc9eec8b54b8..dd2637678b40 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -1728,11 +1728,30 @@ static void intel_pt_prep_b_sample(struct intel_pt *pt,
event->sample.header.misc = sample->cpumode;
}
-static int intel_pt_inject_event(union perf_event *event,
+static int intel_pt_inject_event(struct intel_pt *pt, union perf_event *event,
struct perf_sample *sample, u64 type)
{
- event->header.size = perf_event__sample_event_size(sample, type, 0);
- return perf_event__synthesize_sample(event, type, 0, sample);
+ struct evsel *evsel = sample->evsel;
+ u64 branch_sample_type = 0;
+ size_t sz;
+
+ if (!evsel && pt->session && pt->session->evlist)
+ evsel = evlist__id2evsel(pt->session->evlist, sample->id);
+
+ if (evsel)
+ branch_sample_type = evsel->core.attr.branch_sample_type;
+
+ event->header.type = PERF_RECORD_SAMPLE;
+ sz = perf_event__sample_event_size(sample, type, /*read_format=*/0,
+ branch_sample_type);
+ if (sz >= PERF_SAMPLE_MAX_SIZE) {
+ pr_err("Sample size %zu exceeds max size %d\n", sz, PERF_SAMPLE_MAX_SIZE);
+ return -EFAULT;
+ }
+ event->header.size = sz;
+
+ return perf_event__synthesize_sample(event, type, /*read_format=*/0,
+ branch_sample_type, sample);
}
static inline int intel_pt_opt_inject(struct intel_pt *pt,
@@ -1742,7 +1761,7 @@ static inline int intel_pt_opt_inject(struct intel_pt *pt,
if (!pt->synth_opts.inject)
return 0;
- return intel_pt_inject_event(event, sample, type);
+ return intel_pt_inject_event(pt, event, sample, type);
}
static int intel_pt_deliver_synth_event(struct intel_pt *pt,
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index 85bee747f4cd..2461f25a4d7d 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -1455,7 +1455,8 @@ int perf_event__synthesize_stat_round(const struct perf_tool *tool,
return process(tool, (union perf_event *) &event, NULL, machine);
}
-size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format)
+size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format,
+ u64 branch_sample_type)
{
size_t sz, result = sizeof(struct perf_record_sample);
@@ -1515,8 +1516,10 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
if (type & PERF_SAMPLE_BRANCH_STACK) {
sz = sample->branch_stack->nr * sizeof(struct branch_entry);
- /* nr, hw_idx */
- sz += 2 * sizeof(u64);
+ /* nr */
+ sz += sizeof(u64);
+ if (branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX)
+ sz += sizeof(u64);
result += sz;
}
@@ -1605,7 +1608,7 @@ static __u64 *copy_read_group_values(__u64 *array, __u64 read_format,
}
int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format,
- const struct perf_sample *sample)
+ u64 branch_sample_type, const struct perf_sample *sample)
{
__u64 *array;
size_t sz;
@@ -1719,9 +1722,17 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
if (type & PERF_SAMPLE_BRANCH_STACK) {
sz = sample->branch_stack->nr * sizeof(struct branch_entry);
- /* nr, hw_idx */
- sz += 2 * sizeof(u64);
- memcpy(array, sample->branch_stack, sz);
+
+ *array++ = sample->branch_stack->nr;
+
+ if (branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX) {
+ if (sample->no_hw_idx)
+ *array++ = 0;
+ else
+ *array++ = sample->branch_stack->hw_idx;
+ }
+
+ memcpy(array, perf_sample__branch_entries((struct perf_sample *)sample), sz);
array = (void *)array + sz;
}
diff --git a/tools/perf/util/synthetic-events.h b/tools/perf/util/synthetic-events.h
index b0edad0c3100..8c7f49f9ccf5 100644
--- a/tools/perf/util/synthetic-events.h
+++ b/tools/perf/util/synthetic-events.h
@@ -81,7 +81,8 @@ int perf_event__synthesize_mmap_events(const struct perf_tool *tool, union perf_
int perf_event__synthesize_modules(const struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
int perf_event__synthesize_namespaces(const struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine);
int perf_event__synthesize_cgroups(const struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
-int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, const struct perf_sample *sample);
+int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format,
+ u64 branch_sample_type, const struct perf_sample *sample);
int perf_event__synthesize_stat_config(const struct perf_tool *tool, struct perf_stat_config *config, perf_event__handler_t process, struct machine *machine);
int perf_event__synthesize_stat_events(struct perf_stat_config *config, const struct perf_tool *tool, struct evlist *evlist, perf_event__handler_t process, bool attrs);
int perf_event__synthesize_stat_round(const struct perf_tool *tool, u64 time, u64 type, perf_event__handler_t process, struct machine *machine);
@@ -97,7 +98,8 @@ void perf_event__synthesize_final_bpf_metadata(struct perf_session *session,
int perf_tool__process_synth_event(const struct perf_tool *tool, union perf_event *event, struct machine *machine, perf_event__handler_t process);
-size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format);
+size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
+ u64 read_format, u64 branch_sample_type);
int __machine__synthesize_threads(struct machine *machine, const struct perf_tool *tool,
struct target *target, struct perf_thread_map *threads,
--
2.54.0.631.ge1b05301d1-goog
When using "perf inject --itrace=L" to synthesize branch stacks from
AUX data, several issues caused failures with the generated file:
1. The synthesized samples were delivered without the
PERF_SAMPLE_BRANCH_STACK flag if it was not in the original event's
sample_type. Fixed by using sample_type | evsel->synth_sample_type
in intel_pt_do_synth_pebs_sample.
2. Modifying evsel->core.attr.sample_type early in __cmd_inject caused
parse failures for subsequent records in the input file. Fixed by
moving this modification to just before writing the header.
3. perf_event__repipe_sample now only synthesizes samples when branch
stack injection is requested; otherwise it falls back to
perf_inject__cut_auxtrace_sample, preserving the existing
functionality.
4. Potential Heap Overflow in perf_event__repipe_sample: Addressed by
adding a check that prints an error and returns -EFAULT if the
calculated event size exceeds PERF_SAMPLE_MAX_SIZE.
5. Header vs Payload Mismatch in __cmd_inject: Addressed by narrowing
the condition so that HEADER_BRANCH_STACK is only set in the file
header if add_last_branch was true.
6. NULL Pointer Dereference in intel-pt.c: Addressed by updating the
condition in intel_pt_do_synth_pebs_sample to fill
sample.branch_stack if it warrants synthesis, even if not in the
original sample_type.
7. Modifying event attributes in perf_event__repipe_attr in-place caused
SIGSEGV on read-only mmap buffers in file mode and downstream parser
breakage in pipe mode. Fixed by processing the unmodified attribute
first, returning immediately in non-pipe mode, and correctly
synthesizing a new attribute event for pipe output using
perf_event__synthesize_attr. Also added a size validation check to
prevent n_ids underflow when parsing header size.
8. Potential dangling pointer vulnerability in perf_event__repipe_sample:
Addressed by restoring the original sample->branch_stack pointer
before returning, including on early error return paths.
9. Off-by-one error in sample size check in perf_event__repipe_sample:
Fixed by checking if sz >= PERF_SAMPLE_MAX_SIZE instead of >.
10. Unadvertised size field left in payload by cut_auxtrace_sample:
Addressed by excluding the 8-byte size field from the copied
payload to correctly match the cleared PERF_SAMPLE_AUX bit. Cut
the AUX sample payload even if size is 0.
11. Inaccurate sample size calculation and uninitialized memory leaks in
convert_sample_callchain: Fixed by replacing manual arithmetic with
perf_event__sample_event_size and adding a bounds check against
PERF_SAMPLE_MAX_SIZE.
12. Omission of branch_sample_type in file headers: Addressed by
expanding older, smaller attributes to PERF_ATTR_SIZE_VER2 in
__cmd_inject to ensure branch_sample_type is not silently omitted.
Fixes: 0f0aa5e0693c ("perf inject: Add Instruction Tracing support")
Assisted-by: Gemini:gemini-3.1-pro-preview
Signed-off-by: Ian Rogers <irogers@google.com>
---
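The offset arithmetic behind item 10 can be sketched in isolation as
below. This assumes a simplified sample layout of header, leading
fields, an 8-byte AUX size word, the AUX data, then trailing fields;
demo_cut_aux and its parameters are hypothetical and do not mirror
the signature of perf_inject__cut_auxtrace_sample. Unlike the perf
function, the demo does not rewrite the header's size field and
simply returns the new length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_HEADER_SIZE 8	/* assumed fixed event header size */

/*
 * Copy src to dst while dropping both the 8-byte AUX size word and the
 * AUX data that follows it. aux_data_off is the offset of the AUX data
 * from the start of the event. Returns the new event length, or 0 if
 * the offsets are inconsistent.
 */
static size_t demo_cut_aux(unsigned char *dst, const unsigned char *src,
			   size_t total, size_t aux_data_off, size_t aux_size)
{
	size_t sz1, sz2;

	assert(dst && src);

	/* The size word must fit between the header and the AUX data. */
	if (aux_data_off < DEMO_HEADER_SIZE + sizeof(uint64_t) ||
	    aux_data_off > total || aux_size > total - aux_data_off)
		return 0;

	sz1 = aux_data_off - sizeof(uint64_t);	/* bytes before the size word */
	sz2 = total - aux_data_off - aux_size;	/* bytes after the AUX data */

	memcpy(dst, src, sz1);
	memcpy(dst + sz1, src + total - sz2, sz2);
	return sz1 + sz2;
}
```

Because the size word is removed together with the data, a sample
with zero bytes of AUX data still shrinks by 8 bytes, matching an
attribute that no longer advertises PERF_SAMPLE_AUX.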
tools/perf/builtin-inject.c | 148 ++++++++++++++++++++++++++++++++----
tools/perf/util/intel-pt.c | 5 +-
2 files changed, 136 insertions(+), 17 deletions(-)
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 2f20e782c7f2..29470d819442 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -216,12 +216,23 @@ static int perf_event__repipe_op4_synth(const struct perf_tool *tool,
return perf_event__repipe_synth(tool, event);
}
+static int perf_event__repipe_synth_cb(const struct perf_tool *tool,
+ union perf_event *event,
+ struct perf_sample *sample __maybe_unused,
+ struct machine *machine __maybe_unused)
+{
+ return perf_event__repipe_synth(tool, event);
+}
+
static int perf_event__repipe_attr(const struct perf_tool *tool,
union perf_event *event,
struct evlist **pevlist)
{
struct perf_inject *inject = container_of(tool, struct perf_inject,
tool);
+ struct perf_event_attr attr;
+ size_t n_ids;
+ u64 *ids;
int ret;
ret = perf_event__process_attr(tool, event, pevlist);
@@ -232,7 +243,37 @@ static int perf_event__repipe_attr(const struct perf_tool *tool,
if (!inject->output.is_pipe)
return 0;
- return perf_event__repipe_synth(tool, event);
+ if (!inject->itrace_synth_opts.set)
+ return perf_event__repipe_synth(tool, event);
+
+ if (event->header.size < sizeof(struct perf_event_header) + sizeof(u64)) {
+ pr_err("Attribute event size %u is too small\n", event->header.size);
+ return -EINVAL;
+ }
+
+ if (event->header.size - sizeof(event->header) < event->attr.attr.size) {
+ pr_err("Attribute event size %u is too small for attr.size %u\n",
+ event->header.size, event->attr.attr.size);
+ return -EINVAL;
+ }
+
+ memset(&attr, 0, sizeof(attr));
+ memcpy(&attr, &event->attr.attr,
+ min_t(size_t, sizeof(attr), (size_t)event->attr.attr.size));
+
+ n_ids = event->header.size - sizeof(event->header) - event->attr.attr.size;
+ n_ids /= sizeof(u64);
+ ids = perf_record_header_attr_id(event);
+
+ attr.size = sizeof(struct perf_event_attr);
+ attr.sample_type &= ~PERF_SAMPLE_AUX;
+
+ if (inject->itrace_synth_opts.add_last_branch) {
+ attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
+ attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX;
+ }
+ return perf_event__synthesize_attr(tool, &attr, (u32)n_ids, ids,
+ perf_event__repipe_synth_cb);
}
static int perf_event__repipe_event_update(const struct perf_tool *tool,
@@ -331,8 +372,8 @@ perf_inject__cut_auxtrace_sample(struct perf_inject *inject,
union perf_event *event,
struct perf_sample *sample)
{
- size_t sz1 = sample->aux_sample.data - (void *)event;
- size_t sz2 = event->header.size - sample->aux_sample.size - sz1;
+ size_t sz1 = sample->aux_sample.data - (void *)event - sizeof(u64);
+ size_t sz2 = event->header.size - sample->aux_sample.size - (sz1 + sizeof(u64));
union perf_event *ev;
if (inject->event_copy == NULL) {
@@ -343,13 +384,12 @@ perf_inject__cut_auxtrace_sample(struct perf_inject *inject,
ev = (union perf_event *)inject->event_copy;
if (sz1 > event->header.size || sz2 > event->header.size ||
sz1 + sz2 > event->header.size ||
- sz1 < sizeof(struct perf_event_header) + sizeof(u64))
+ sz1 < sizeof(struct perf_event_header))
return event;
memcpy(ev, event, sz1);
memcpy((void *)ev + sz1, (void *)event + event->header.size - sz2, sz2);
ev->header.size = sz1 + sz2;
- ((u64 *)((void *)ev + sz1))[-1] = 0;
return ev;
}
@@ -376,7 +416,67 @@ static int perf_event__repipe_sample(const struct perf_tool *tool,
build_id__mark_dso_hit(tool, event, sample, evsel, machine);
- if (inject->itrace_synth_opts.set && sample->aux_sample.size) {
+ if (inject->itrace_synth_opts.set &&
+ (inject->itrace_synth_opts.last_branch ||
+ inject->itrace_synth_opts.add_last_branch)) {
+ union perf_event *event_copy = (void *)inject->event_copy;
+ struct branch_stack dummy_bs = { .nr = 0, .hw_idx = 0 };
+ int err;
+ size_t sz;
+ u64 orig_type = evsel->core.attr.sample_type;
+ u64 orig_branch_type = evsel->core.attr.branch_sample_type;
+
+ struct branch_stack *orig_bs = sample->branch_stack;
+
+ if (event_copy == NULL) {
+ inject->event_copy = malloc(PERF_SAMPLE_MAX_SIZE);
+ if (!inject->event_copy)
+ return -ENOMEM;
+
+ event_copy = (void *)inject->event_copy;
+ }
+
+ if (!sample->branch_stack)
+ sample->branch_stack = &dummy_bs;
+
+ if (inject->itrace_synth_opts.add_last_branch) {
+ /* Temporarily add in type bits for synthesis. */
+ evsel->core.attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
+ evsel->core.attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX;
+ }
+ evsel->core.attr.sample_type &= ~PERF_SAMPLE_AUX;
+
+ sz = perf_event__sample_event_size(sample, evsel->core.attr.sample_type,
+ evsel->core.attr.read_format,
+ evsel->core.attr.branch_sample_type);
+
+ if (sz >= PERF_SAMPLE_MAX_SIZE) {
+ pr_err("Sample size %zu exceeds max size %d\n", sz, PERF_SAMPLE_MAX_SIZE);
+ evsel->core.attr.sample_type = orig_type;
+ evsel->core.attr.branch_sample_type = orig_branch_type;
+ sample->branch_stack = orig_bs;
+ return -EFAULT;
+ }
+
+ event_copy->header.type = PERF_RECORD_SAMPLE;
+ event_copy->header.misc = event->header.misc;
+ event_copy->header.size = sz;
+
+ err = perf_event__synthesize_sample(event_copy, evsel->core.attr.sample_type,
+ evsel->core.attr.read_format,
+ evsel->core.attr.branch_sample_type, sample);
+
+ evsel->core.attr.sample_type = orig_type;
+ evsel->core.attr.branch_sample_type = orig_branch_type;
+ sample->branch_stack = orig_bs;
+
+ if (err) {
+ pr_err("Failed to synthesize sample\n");
+ return err;
+ }
+ event = event_copy;
+ } else if (inject->itrace_synth_opts.set &&
+ (evsel->core.attr.sample_type & PERF_SAMPLE_AUX)) {
event = perf_inject__cut_auxtrace_sample(inject, event, sample);
if (IS_ERR(event))
return PTR_ERR(event);
@@ -397,7 +497,7 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool,
struct callchain_cursor_node *node;
struct thread *thread;
u64 sample_type = evsel->core.attr.sample_type;
- u32 sample_size = event->header.size;
+ size_t sz;
u64 i, k;
int ret;
@@ -456,15 +556,18 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool,
out:
memcpy(event_copy, event, sizeof(event->header));
- /* adjust sample size for stack and regs */
- sample_size -= sample->user_stack.size;
- sample_size -= (hweight64(evsel->core.attr.sample_regs_user) + 1) * sizeof(u64);
- sample_size += (sample->callchain->nr + 1) * sizeof(u64);
- event_copy->header.size = sample_size;
-
/* remove sample_type {STACK,REGS}_USER for synthesize */
sample_type &= ~(PERF_SAMPLE_STACK_USER | PERF_SAMPLE_REGS_USER);
+ sz = perf_event__sample_event_size(sample, sample_type,
+ evsel->core.attr.read_format,
+ evsel->core.attr.branch_sample_type);
+ if (sz >= PERF_SAMPLE_MAX_SIZE) {
+ pr_err("Sample size %zu exceeds max size %d\n", sz, PERF_SAMPLE_MAX_SIZE);
+ return -EFAULT;
+ }
+ event_copy->header.size = sz;
+
ret = perf_event__synthesize_sample(event_copy, sample_type,
evsel->core.attr.read_format,
evsel->core.attr.branch_sample_type, sample);
@@ -2442,12 +2545,27 @@ static int __cmd_inject(struct perf_inject *inject)
* synthesized hardware events, so clear the feature flag.
*/
if (inject->itrace_synth_opts.set) {
+ struct evsel *evsel;
+
perf_header__clear_feat(&session->header,
HEADER_AUXTRACE);
- if (inject->itrace_synth_opts.last_branch ||
- inject->itrace_synth_opts.add_last_branch)
+
+ evlist__for_each_entry(session->evlist, evsel) {
+ evsel->core.attr.sample_type &= ~PERF_SAMPLE_AUX;
+ }
+
+ if (inject->itrace_synth_opts.add_last_branch) {
perf_header__set_feat(&session->header,
HEADER_BRANCH_STACK);
+
+ evlist__for_each_entry(session->evlist, evsel) {
+ evsel->core.attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
+ if (evsel->core.attr.size < PERF_ATTR_SIZE_VER2)
+ evsel->core.attr.size = PERF_ATTR_SIZE_VER2;
+ evsel->core.attr.branch_sample_type |=
+ PERF_SAMPLE_BRANCH_HW_INDEX;
+ }
+ }
}
/*
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index dd2637678b40..7153b48cfe63 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -2505,7 +2505,7 @@ static int intel_pt_do_synth_pebs_sample(struct intel_pt_queue *ptq, struct evse
intel_pt_add_xmm(intr_regs, pos, items, regs_mask);
}
- if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
+ if ((sample_type | evsel->synth_sample_type) & PERF_SAMPLE_BRANCH_STACK) {
if (items->mask[INTEL_PT_LBR_0_POS] ||
items->mask[INTEL_PT_LBR_1_POS] ||
items->mask[INTEL_PT_LBR_2_POS]) {
@@ -2576,7 +2576,8 @@ static int intel_pt_do_synth_pebs_sample(struct intel_pt_queue *ptq, struct evse
sample.transaction = txn;
}
- ret = intel_pt_deliver_synth_event(pt, event, &sample, sample_type);
+ ret = intel_pt_deliver_synth_event(pt, event, &sample,
+ sample_type | evsel->synth_sample_type);
perf_sample__exit(&sample);
return ret;
}
--
2.54.0.631.ge1b05301d1-goog
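
Note on the perf_inject__cut_auxtrace_sample() hunk above: the new sz1/sz2 arithmetic drops the u64 size field that precedes the AUX payload as well as the payload itself, rather than keeping the field and zeroing it. The following is a minimal standalone sketch of that arithmetic, not the perf code. The name cut_aux_payload(), the HDR_SIZE constant, and the flat byte-buffer record are hypothetical and exist only for illustration. The sketch also checks bounds up front instead of relying on unsigned wraparound being caught afterwards.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed header size, for illustration only. */
#define HDR_SIZE 8u

/*
 * Copy 'event' (total bytes long) into 'out', leaving out the AUX payload
 * that starts at byte offset aux_off and is aux_size bytes long, together
 * with the u64 size field stored immediately before the payload.
 *
 * Returns the size of the trimmed record, or 0 if the offsets do not fit
 * inside the record. 'out' must have room for at least 'total' bytes.
 */
static size_t cut_aux_payload(const uint8_t *event, size_t total,
			      size_t aux_off, size_t aux_size, uint8_t *out)
{
	size_t sz1, sz2;

	/* The header and the u64 size field must both precede the payload. */
	if (aux_off < HDR_SIZE + sizeof(uint64_t) || aux_off > total ||
	    aux_size > total - aux_off)
		return 0;

	/* Bytes kept before the cut: everything up to the size field. */
	sz1 = aux_off - sizeof(uint64_t);
	/* Bytes kept after the cut: everything following the payload. */
	sz2 = total - aux_size - aux_off;

	memcpy(out, event, sz1);
	memcpy(out + sz1, event + total - sz2, sz2);
	return sz1 + sz2;
}
```

For a 40-byte record with an 8-byte payload at offset 24, this keeps bytes 0..15 and 32..39, so the trimmed record is 24 bytes and no longer contains a size field. That matches clearing PERF_SAMPLE_AUX from the attr sample_type elsewhere in this patch.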