From: Kan Liang <kan.liang@linux.intel.com>
The PEBS TSC-based timestamps do not appear correctly in the final
perf.data output file from perf record.
The data->time field set up by PEBS in setup_pebs_fixed_sample_data()
is later overwritten by the perf_events generic code in
perf_prepare_sample(). There is an ordering problem.
Set the sample flags when data->time is updated by PEBS, so that
the data->time field is no longer overwritten.
Reported-by: Andreas Kogler <andreas.kogler.0x@gmail.com>
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
arch/x86/events/intel/ds.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index de1f55d51784..01cbe26225c2 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1643,8 +1643,10 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
* We can only do this for the default trace clock.
*/
if (x86_pmu.intel_cap.pebs_format >= 3 &&
- event->attr.use_clockid == 0)
+ event->attr.use_clockid == 0) {
data->time = native_sched_clock_from_tsc(pebs->tsc);
+ data->sample_flags |= PERF_SAMPLE_TIME;
+ }
if (has_branch_stack(event))
data->br_stack = &cpuc->lbr_stack;
@@ -1705,8 +1707,10 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
perf_sample_data_init(data, 0, event->hw.last_period);
data->period = event->hw.last_period;
- if (event->attr.use_clockid == 0)
+ if (event->attr.use_clockid == 0) {
data->time = native_sched_clock_from_tsc(basic->tsc);
+ data->sample_flags |= PERF_SAMPLE_TIME;
+ }
/*
* We must however always use iregs for the unwinder to stay sane; the
--
2.35.1
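
The fix relies on a flag-guarded fill: the hardware-specific path records that it has already supplied a field, and the generic path fills only the fields that are still missing. A minimal user-space sketch of that pattern follows. All names here (sample_data, SAMPLE_TIME, hw_setup, generic_prepare, generic_clock) are hypothetical stand-ins rather than the kernel's definitions, and the real perf_prepare_sample() logic may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit standing in for PERF_SAMPLE_TIME. */
#define SAMPLE_TIME (1u << 2)

/* Hypothetical, reduced stand-in for struct perf_sample_data. */
struct sample_data {
	uint64_t time;
	uint64_t sample_flags;	/* which fields are already filled in */
};

/* Stand-in for the generic clock; returns a fixed, recognizable value. */
uint64_t generic_clock(void)
{
	return 9999;
}

/*
 * Hardware-specific setup: store the hardware timestamp and, if
 * set_flag is nonzero (the behavior the patch adds), mark the field
 * as already populated.
 */
void hw_setup(struct sample_data *d, uint64_t hw_ts, int set_flag)
{
	assert(d != NULL);
	d->time = hw_ts;
	if (set_flag)
		d->sample_flags |= SAMPLE_TIME;
}

/* Generic path: fill only the requested fields that are still missing. */
void generic_prepare(struct sample_data *d, uint64_t requested)
{
	uint64_t todo;

	assert(d != NULL);
	todo = requested & ~d->sample_flags;
	if (todo & SAMPLE_TIME) {
		d->time = generic_clock();
		d->sample_flags |= SAMPLE_TIME;
	}
}
```

In this model, calling hw_setup() with set_flag == 0 and then generic_prepare() replaces the hardware timestamp with the generic clock value, which mirrors the reported symptom. With set_flag != 0 the hardware timestamp survives.
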
Hello,
On Thu, Sep 1, 2022 at 6:10 AM <kan.liang@linux.intel.com> wrote:
>
> From: Kan Liang <kan.liang@linux.intel.com>
>
> The PEBS TSC-based timestamps do not appear correctly in the final
> perf.data output file from perf record.
>
> The data->time field setup by PEBS in the setup_pebs_fixed_sample_data()
> is later overwritten by perf_events generic code in
> perf_prepare_sample(). There is an ordering problem.
>
> Set the sample flags when the data->time is updated by PEBS.
> The data->time field will not be overwritten anymore.
I have a report that it breaks the symbolization of samples.
It seems time is not in sync between perf_clock and PEBS.
One thing I noticed is that the system has a config option
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y.
Looking at the code, it seems sched_clock is doing some
adjustments in that case. So I'm not sure if it'd work well
on those systems.
Thoughts?
Thanks,
Namhyung
>
> Reported-by: Andreas Kogler <andreas.kogler.0x@gmail.com>
> Reported-by: Stephane Eranian <eranian@google.com>
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> ---
> arch/x86/events/intel/ds.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
> index de1f55d51784..01cbe26225c2 100644
> --- a/arch/x86/events/intel/ds.c
> +++ b/arch/x86/events/intel/ds.c
> @@ -1643,8 +1643,10 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
> * We can only do this for the default trace clock.
> */
> if (x86_pmu.intel_cap.pebs_format >= 3 &&
> - event->attr.use_clockid == 0)
> + event->attr.use_clockid == 0) {
> data->time = native_sched_clock_from_tsc(pebs->tsc);
> + data->sample_flags |= PERF_SAMPLE_TIME;
> + }
>
> if (has_branch_stack(event))
> data->br_stack = &cpuc->lbr_stack;
> @@ -1705,8 +1707,10 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
> perf_sample_data_init(data, 0, event->hw.last_period);
> data->period = event->hw.last_period;
>
> - if (event->attr.use_clockid == 0)
> + if (event->attr.use_clockid == 0) {
> data->time = native_sched_clock_from_tsc(basic->tsc);
> + data->sample_flags |= PERF_SAMPLE_TIME;
> + }
>
> /*
> * We must however always use iregs for the unwinder to stay sane; the
> --
> 2.35.1
>
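
The concern above is that a timestamp converted directly from the TSC and a clock that applies further adjustments may not land on the same timeline. The sketch below illustrates the idea in user space, assuming a multiply-shift cycles-to-nanoseconds conversion plus a separate adjustment term. The struct, the function names, and the adjustment parameter are hypothetical and are not taken from the kernel source. The 128-bit intermediate uses the unsigned __int128 extension, so this assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical conversion parameters: ns = offset + (cycles * mul) >> shift. */
struct cyc2ns_model {
	uint32_t mul;
	uint32_t shift;
	uint64_t offset;
};

/* Raw conversion, as a direct TSC-to-nanoseconds path might do it. */
uint64_t cycles_to_ns(const struct cyc2ns_model *c, uint64_t cycles)
{
	unsigned __int128 prod;

	assert(c->shift < 64);
	/* Widen to 128 bits so the multiplication cannot overflow. */
	prod = (unsigned __int128)cycles * c->mul;
	return c->offset + (uint64_t)(prod >> c->shift);
}

/*
 * Same conversion plus an extra adjustment, modeling a clock that
 * corrects the raw value. 'adj' is a made-up stand-in for whatever
 * correction such a clock applies.
 */
uint64_t adjusted_clock_ns(const struct cyc2ns_model *c, uint64_t cycles,
			   int64_t adj)
{
	return cycles_to_ns(c, cycles) + (uint64_t)adj;
}
```

If samples are stamped with the raw value while other records use the adjusted value, the two differ by the adjustment for the same TSC reading. That kind of skew could plausibly mis-order samples relative to other events, although whether this is what breaks symbolization here is not established in the thread.
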
On Tue, Oct 11, 2022 at 11:20:54AM -0700, Namhyung Kim wrote:
> One thing I noticed is that the system has a config option
> CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y.

You can't build x86 without that.

Hi Peter,

On Wed, Oct 12, 2022 at 1:05 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Oct 11, 2022 at 11:20:54AM -0700, Namhyung Kim wrote:
>
> > One thing I noticed is that the system has a config option
> > CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y.
>
> You can't build x86 without that.

Oh, I didn't know that.. so it's not a config problem.

Kan, could you check the following command?

  $ perf record -e cycles:upp dd if=/dev/zero of=/dev/null count=10000

Thanks,
Namhyung

The following commit has been merged into the perf/core branch of tip:
Commit-ID: 47a3aeb39e8dc099ae431cd8b46bdf218f5511b2
Gitweb: https://git.kernel.org/tip/47a3aeb39e8dc099ae431cd8b46bdf218f5511b2
Author: Kan Liang <kan.liang@linux.intel.com>
AuthorDate: Thu, 01 Sep 2022 06:09:55 -07:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 06 Sep 2022 11:33:01 +02:00
perf/x86/intel/pebs: Fix PEBS timestamps overwritten
The PEBS TSC-based timestamps do not appear correctly in the final
perf.data output file from perf record.
The data->time field setup by PEBS in the setup_pebs_fixed_sample_data()
is later overwritten by perf_events generic code in
perf_prepare_sample(). There is an ordering problem.
Set the sample flags when the data->time is updated by PEBS.
The data->time field will not be overwritten anymore.
Reported-by: Andreas Kogler <andreas.kogler.0x@gmail.com>
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220901130959.1285717-3-kan.liang@linux.intel.com
---
arch/x86/events/intel/ds.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index ba60427..cdd857b 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1635,8 +1635,10 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
* We can only do this for the default trace clock.
*/
if (x86_pmu.intel_cap.pebs_format >= 3 &&
- event->attr.use_clockid == 0)
+ event->attr.use_clockid == 0) {
data->time = native_sched_clock_from_tsc(pebs->tsc);
+ data->sample_flags |= PERF_SAMPLE_TIME;
+ }
if (has_branch_stack(event))
data->br_stack = &cpuc->lbr_stack;
@@ -1697,8 +1699,10 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
perf_sample_data_init(data, 0, event->hw.last_period);
data->period = event->hw.last_period;
- if (event->attr.use_clockid == 0)
+ if (event->attr.use_clockid == 0) {
data->time = native_sched_clock_from_tsc(basic->tsc);
+ data->sample_flags |= PERF_SAMPLE_TIME;
+ }
/*
* We must however always use iregs for the unwinder to stay sane; the