[v5] Bug fixes on topdown events reordering

[Patch v5 0/6] Bug fixes on topdown events reordering

Posted by Dapeng Mi 1 year, 4 months ago

Changes:
v5 -> v6:
  * no function change.
  * rebase patchset to latest code of perf-tool-next tree.
  * Add Kan's reviewed-by tag.

History:
  v4: https://lore.kernel.org/all/20240816122938.32228-1-dapeng1.mi@linux.intel.com/
  v3: https://lore.kernel.org/all/20240712170339.185824-1-dapeng1.mi@linux.intel.com/
  v2: https://lore.kernel.org/all/20240708144204.839486-1-dapeng1.mi@linux.intel.com/
  v1: https://lore.kernel.org/all/20240702224037.343958-1-dapeng1.mi@linux.intel.com/

Dapeng Mi (6):
  perf x86/topdown: Complete topdown slots/metrics events check
  perf x86/topdown: Correct leader selection with sample_read enabled
  perf x86/topdown: Don't move topdown metric events in group
  perf tests: Add leader sampling test in record tests
  perf tests: Add topdown events counting and sampling tests
  perf tests: Add more topdown events regroup tests

 tools/perf/arch/x86/util/evlist.c  | 68 ++++++++++++++++++++++++++++--
 tools/perf/arch/x86/util/evsel.c   |  3 +-
 tools/perf/arch/x86/util/topdown.c | 64 +++++++++++++++++++++++++++-
 tools/perf/arch/x86/util/topdown.h |  2 +
 tools/perf/tests/shell/record.sh   | 45 ++++++++++++++++++++
 tools/perf/tests/shell/stat.sh     | 28 +++++++++++-
 6 files changed, 201 insertions(+), 9 deletions(-)


base-commit: 1de5b5dcb8353f36581c963df2d359a5f151a0be
-- 
2.40.1

Re: [Patch v5 0/6] Bug fixes on topdown events reordering

Posted by Namhyung Kim 1 year, 4 months ago

On Fri, 13 Sep 2024 08:47:06 +0000, Dapeng Mi wrote:

> Changes:
> v5 -> v6:
>   * no function change.
>   * rebase patchset to latest code of perf-tool-next tree.
>   * Add Kan's reviewed-by tag.
> 
> History:
>   v4: https://lore.kernel.org/all/20240816122938.32228-1-dapeng1.mi@linux.intel.com/
>   v3: https://lore.kernel.org/all/20240712170339.185824-1-dapeng1.mi@linux.intel.com/
>   v2: https://lore.kernel.org/all/20240708144204.839486-1-dapeng1.mi@linux.intel.com/
>   v1: https://lore.kernel.org/all/20240702224037.343958-1-dapeng1.mi@linux.intel.com/
> 
> [...]

Applied to perf-tools-next, thanks!

Best regards,
Namhyung

Re: [Patch v5 0/6] Bug fixes on topdown events reordering

Posted by Ian Rogers 1 year, 4 months ago

On Tue, Oct 1, 2024 at 2:02 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Fri, 13 Sep 2024 08:47:06 +0000, Dapeng Mi wrote:
>
> > Changes:
> > v5 -> v6:
> >   * no function change.
> >   * rebase patchset to latest code of perf-tool-next tree.
> >   * Add Kan's reviewed-by tag.
> >
> > History:
> >   v4: https://lore.kernel.org/all/20240816122938.32228-1-dapeng1.mi@linux.intel.com/
> >   v3: https://lore.kernel.org/all/20240712170339.185824-1-dapeng1.mi@linux.intel.com/
> >   v2: https://lore.kernel.org/all/20240708144204.839486-1-dapeng1.mi@linux.intel.com/
> >   v1: https://lore.kernel.org/all/20240702224037.343958-1-dapeng1.mi@linux.intel.com/
> >
> > [...]
>
> Applied to perf-tools-next, thanks!

I disagreed with an early patch set and the issue wasn't resolved. Specifically:

https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/commit/?h=perf-tools-next&id=3b5edc0421e2598a0ae7f0adcd592017f37e3cdf
```
  /* Followed by topdown events. */
  if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
  return -1;
- if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs))
+ /*
+ * Move topdown events forward only when topdown events
+ * are not in same group with previous event.
+ */
+ if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs) &&
+     lhs->core.leader != rhs->core.leader)
  return 1;
```
Is a broken comparator as the lhs then rhs behavior varies from the
rhs then lhs behavior. The qsort implementation can randomly order the
events.
Please drop/revert.

Thanks,
Ian

> Best regards,
> Namhyung

Re: [Patch v5 0/6] Bug fixes on topdown events reordering

Posted by Namhyung Kim 1 year, 4 months ago

On Tue, Oct 01, 2024 at 03:32:04PM -0700, Ian Rogers wrote:
> On Tue, Oct 1, 2024 at 2:02 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Fri, 13 Sep 2024 08:47:06 +0000, Dapeng Mi wrote:
> >
> > > Changes:
> > > v5 -> v6:
> > >   * no function change.
> > >   * rebase patchset to latest code of perf-tool-next tree.
> > >   * Add Kan's reviewed-by tag.
> > >
> > > History:
> > >   v4: https://lore.kernel.org/all/20240816122938.32228-1-dapeng1.mi@linux.intel.com/
> > >   v3: https://lore.kernel.org/all/20240712170339.185824-1-dapeng1.mi@linux.intel.com/
> > >   v2: https://lore.kernel.org/all/20240708144204.839486-1-dapeng1.mi@linux.intel.com/
> > >   v1: https://lore.kernel.org/all/20240702224037.343958-1-dapeng1.mi@linux.intel.com/
> > >
> > > [...]
> >
> > Applied to perf-tools-next, thanks!
> 
> I disagreed with an early patch set and the issue wasn't resolved. Specifically:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/commit/?h=perf-tools-next&id=3b5edc0421e2598a0ae7f0adcd592017f37e3cdf
> ```
>   /* Followed by topdown events. */
>   if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
>   return -1;
> - if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs))
> + /*
> + * Move topdown events forward only when topdown events
> + * are not in same group with previous event.
> + */
> + if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs) &&
> +     lhs->core.leader != rhs->core.leader)
>   return 1;
> ```
> Is a broken comparator as the lhs then rhs behavior varies from the
> rhs then lhs behavior. The qsort implementation can randomly order the
> events.
> Please drop/revert.

Can you please provide an example when it's broken?  I'm not sure how it
can produce new errors, but it seems to fix a specific problem.  Do you
have a new test failure after this change?

Thanks,
Namhyung

Re: [Patch v5 0/6] Bug fixes on topdown events reordering

Posted by Ian Rogers 1 year, 4 months ago

On Wed, Oct 2, 2024 at 5:00 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Tue, Oct 01, 2024 at 03:32:04PM -0700, Ian Rogers wrote:
> > On Tue, Oct 1, 2024 at 2:02 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > On Fri, 13 Sep 2024 08:47:06 +0000, Dapeng Mi wrote:
> > >
> > > > Changes:
> > > > v5 -> v6:
> > > >   * no function change.
> > > >   * rebase patchset to latest code of perf-tool-next tree.
> > > >   * Add Kan's reviewed-by tag.
> > > >
> > > > History:
> > > >   v4: https://lore.kernel.org/all/20240816122938.32228-1-dapeng1.mi@linux.intel.com/
> > > >   v3: https://lore.kernel.org/all/20240712170339.185824-1-dapeng1.mi@linux.intel.com/
> > > >   v2: https://lore.kernel.org/all/20240708144204.839486-1-dapeng1.mi@linux.intel.com/
> > > >   v1: https://lore.kernel.org/all/20240702224037.343958-1-dapeng1.mi@linux.intel.com/
> > > >
> > > > [...]
> > >
> > > Applied to perf-tools-next, thanks!
> >
> > I disagreed with an early patch set and the issue wasn't resolved. Specifically:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/commit/?h=perf-tools-next&id=3b5edc0421e2598a0ae7f0adcd592017f37e3cdf
> > ```
> >   /* Followed by topdown events. */
> >   if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
> >   return -1;
> > - if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs))
> > + /*
> > + * Move topdown events forward only when topdown events
> > + * are not in same group with previous event.
> > + */
> > + if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs) &&
> > +     lhs->core.leader != rhs->core.leader)
> >   return 1;
> > ```
> > Is a broken comparator as the lhs then rhs behavior varies from the
> > rhs then lhs behavior. The qsort implementation can randomly order the
> > events.
> > Please drop/revert.
>
> Can you please provide an example when it's broken?  I'm not sure how it
> can produce new errors, but it seems to fix a specific problem.  Do you
> have a new test failure after this change?

It may work with a particular sort routine in a particular library
today, the issue is that if the sort routine were say changed from:

if (cmp(a, b) < 0)

to:

if (cmp(b, a) > 0)

the sort would vary with this change even though such a change in the
sorting code is a no-op. It is fundamental algorithms that this code
is broken, I'm not going to spend the time finding a list of
instructions and a sort routine to demonstrate this.

Thanks,
Ian

Re: [Patch v5 0/6] Bug fixes on topdown events reordering

Posted by Liang, Kan 1 year, 4 months ago


On 2024-10-02 8:57 p.m., Ian Rogers wrote:
> On Wed, Oct 2, 2024 at 5:00 PM Namhyung Kim <namhyung@kernel.org> wrote:
>>
>> On Tue, Oct 01, 2024 at 03:32:04PM -0700, Ian Rogers wrote:
>>> On Tue, Oct 1, 2024 at 2:02 PM Namhyung Kim <namhyung@kernel.org> wrote:
>>>>
>>>> On Fri, 13 Sep 2024 08:47:06 +0000, Dapeng Mi wrote:
>>>>
>>>>> Changes:
>>>>> v5 -> v6:
>>>>>   * no function change.
>>>>>   * rebase patchset to latest code of perf-tool-next tree.
>>>>>   * Add Kan's reviewed-by tag.
>>>>>
>>>>> History:
>>>>>   v4: https://lore.kernel.org/all/20240816122938.32228-1-dapeng1.mi@linux.intel.com/
>>>>>   v3: https://lore.kernel.org/all/20240712170339.185824-1-dapeng1.mi@linux.intel.com/
>>>>>   v2: https://lore.kernel.org/all/20240708144204.839486-1-dapeng1.mi@linux.intel.com/
>>>>>   v1: https://lore.kernel.org/all/20240702224037.343958-1-dapeng1.mi@linux.intel.com/
>>>>>
>>>>> [...]
>>>>
>>>> Applied to perf-tools-next, thanks!
>>>
>>> I disagreed with an early patch set and the issue wasn't resolved. Specifically:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/commit/?h=perf-tools-next&id=3b5edc0421e2598a0ae7f0adcd592017f37e3cdf
>>> ```
>>>   /* Followed by topdown events. */
>>>   if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
>>>   return -1;
>>> - if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs))
>>> + /*
>>> + * Move topdown events forward only when topdown events
>>> + * are not in same group with previous event.
>>> + */
>>> + if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs) &&
>>> +     lhs->core.leader != rhs->core.leader)
>>>   return 1;
>>> ```
>>> Is a broken comparator as the lhs then rhs behavior varies from the
>>> rhs then lhs behavior. The qsort implementation can randomly order the
>>> events.
>>> Please drop/revert.
>>
>> Can you please provide an example when it's broken?  I'm not sure how it
>> can produce new errors, but it seems to fix a specific problem.  Do you
>> have a new test failure after this change?
> 
> It may work with a particular sort routine in a particular library
> today, the issue is that if the sort routine were say changed from:
> 
> if (cmp(a, b) < 0)
> 
> to:
> 
> if (cmp(b, a) > 0)
> 
> the sort would vary with this change even though such a change in the
> sorting code is a no-op. It is fundamental algorithms that this code
> is broken, I'm not going to spend the time finding a list of
> instructions and a sort routine to demonstrate this.


I'm not sure I fully understand. But just want to mention that the
change only impacts the Topdown perf metric group, which is only
available for the ICL and later p-core. Nothing will change if no perf
metrics topdown events are used. Very limited impact.

I like the patch set is because it provides examples and tests to cover
the possible combination of the slots event and topdown metrics events.
So we will have a clear expectation for different combinations caused by
the perf metrics mess.

If the algorithms cannot be changed, can you please give some
suggestions, especially for the sample read failure?

Thanks,
Kan

Re: [Patch v5 0/6] Bug fixes on topdown events reordering

Posted by Ian Rogers 1 year, 4 months ago

On Thu, Oct 3, 2024 at 7:57 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 2024-10-02 8:57 p.m., Ian Rogers wrote:
> > On Wed, Oct 2, 2024 at 5:00 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >>
> >> On Tue, Oct 01, 2024 at 03:32:04PM -0700, Ian Rogers wrote:
> >>> On Tue, Oct 1, 2024 at 2:02 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >>>>
> >>>> On Fri, 13 Sep 2024 08:47:06 +0000, Dapeng Mi wrote:
> >>>>
> >>>>> Changes:
> >>>>> v5 -> v6:
> >>>>>   * no function change.
> >>>>>   * rebase patchset to latest code of perf-tool-next tree.
> >>>>>   * Add Kan's reviewed-by tag.
> >>>>>
> >>>>> History:
> >>>>>   v4: https://lore.kernel.org/all/20240816122938.32228-1-dapeng1.mi@linux.intel.com/
> >>>>>   v3: https://lore.kernel.org/all/20240712170339.185824-1-dapeng1.mi@linux.intel.com/
> >>>>>   v2: https://lore.kernel.org/all/20240708144204.839486-1-dapeng1.mi@linux.intel.com/
> >>>>>   v1: https://lore.kernel.org/all/20240702224037.343958-1-dapeng1.mi@linux.intel.com/
> >>>>>
> >>>>> [...]
> >>>>
> >>>> Applied to perf-tools-next, thanks!
> >>>
> >>> I disagreed with an early patch set and the issue wasn't resolved. Specifically:
> >>>
> >>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/commit/?h=perf-tools-next&id=3b5edc0421e2598a0ae7f0adcd592017f37e3cdf
> >>> ```
> >>>   /* Followed by topdown events. */
> >>>   if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
> >>>   return -1;
> >>> - if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs))
> >>> + /*
> >>> + * Move topdown events forward only when topdown events
> >>> + * are not in same group with previous event.
> >>> + */
> >>> + if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs) &&
> >>> +     lhs->core.leader != rhs->core.leader)
> >>>   return 1;
> >>> ```
> >>> Is a broken comparator as the lhs then rhs behavior varies from the
> >>> rhs then lhs behavior. The qsort implementation can randomly order the
> >>> events.
> >>> Please drop/revert.
> >>
> >> Can you please provide an example when it's broken?  I'm not sure how it
> >> can produce new errors, but it seems to fix a specific problem.  Do you
> >> have a new test failure after this change?
> >
> > It may work with a particular sort routine in a particular library
> > today, the issue is that if the sort routine were say changed from:
> >
> > if (cmp(a, b) < 0)
> >
> > to:
> >
> > if (cmp(b, a) > 0)
> >
> > the sort would vary with this change even though such a change in the
> > sorting code is a no-op. It is fundamental algorithms that this code
> > is broken, I'm not going to spend the time finding a list of
> > instructions and a sort routine to demonstrate this.
>
>
> I'm not sure I fully understand. But just want to mention that the
> change only impacts the Topdown perf metric group, which is only
> available for the ICL and later p-core. Nothing will change if no perf
> metrics topdown events are used. Very limited impact.

How is breaking every ICL and later Intel model (excluding atoms)
limited impact?

> I like the patch set is because it provides examples and tests to cover
> the possible combination of the slots event and topdown metrics events.
> So we will have a clear expectation for different combinations caused by
> the perf metrics mess.

Tests are good. I have no problem with the change, possibly it is
inefficient, except that hacking a sort comparator to not be symmetric
is broken.

> If the algorithms cannot be changed, can you please give some
> suggestions, especially for the sample read failure?

So this is symmetric:
```
if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
  return -1;
if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs))
  return 1;
```
That is were lhs and rhs swapped then you'd get the expected comparison order.
```
if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs) &&
lhs->core.leader != rhs->core.leader)
  return -1;
if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs) &&
lhs->core.leader != rhs->core.leader)
  return 1;
```
Is symmetric as well.
```
if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
  return -1;
if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs) &&
lhs->core.leader != rhs->core.leader)
  return 1;
```
(what this patch does) is not symmetric as the group leader impacts
the greater-than case but not the less-than case.

It is not uncommon to see in a sort function:
```
if (cmp(a, b) <= 0) {
  assert(cmp(b,a) >= 0 && "check for unstable/broken compare functions");
```
and we could add this here:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/list_sort.c?h=perf-tools-next#n22
Try searching up "Comparison method violates its general contract"
which is what Java throws when its TimSort implementation detects
cases like this. If you break the comparator you can get the sort into
an infinite loop - maybe that's enough of an indication that the code
is broken and you don't need the exception. As is common in kernel
code we're missing guard rails in list_sort, were the comparator
passed to qsort then expectations on breakage are a roll of the dice.

I believe when I originally gave this feedback I didn't think the fix
was to do it in the comparator and maybe an additional pass over the
list was going to be necessary. Basically a sort needs to be a sort
and it can't kind of be a sort depending on the order of the
comparison, this is just incurring tech debt and when a sort tweak
happens everything will break. As the usual chump cleaning up this
tech debt I'd prefer it wasn't introduced.

Fwiw, I refuse to carry this change in:
https://github.com/googleprodkernel/linux-perf/commits/google_tools_master/
and if the tests break as a consequence then the appropriate thing is
to revert those too. Linus/upstream maintainers follow a different
plan and that's their business, I can't build off of this code.

It is unclear to me why we get patches that have been pointed out to
be broken rebased/resent repeatedly without the broken-ness corrected.
For example, the abuse of aggregation for metrics in perf script. At
least point to the disagreement in the commit message, call me an
idiot, and let the maintainer make an informed or uninformed choice. I
am frequently an idiot and I apologize for all those cases, to the
best of my knowledge this isn't one of them.

Thanks,
Ian

Re: [Patch v5 0/6] Bug fixes on topdown events reordering

Posted by Namhyung Kim 1 year, 4 months ago

On Thu, Oct 03, 2024 at 08:55:22AM -0700, Ian Rogers wrote:
> On Thu, Oct 3, 2024 at 7:57 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
> >
> >
> >
> > On 2024-10-02 8:57 p.m., Ian Rogers wrote:
> > > On Wed, Oct 2, 2024 at 5:00 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >>
> > >> On Tue, Oct 01, 2024 at 03:32:04PM -0700, Ian Rogers wrote:
> > >>> On Tue, Oct 1, 2024 at 2:02 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >>>>
> > >>>> On Fri, 13 Sep 2024 08:47:06 +0000, Dapeng Mi wrote:
> > >>>>
> > >>>>> Changes:
> > >>>>> v5 -> v6:
> > >>>>>   * no function change.
> > >>>>>   * rebase patchset to latest code of perf-tool-next tree.
> > >>>>>   * Add Kan's reviewed-by tag.
> > >>>>>
> > >>>>> History:
> > >>>>>   v4: https://lore.kernel.org/all/20240816122938.32228-1-dapeng1.mi@linux.intel.com/
> > >>>>>   v3: https://lore.kernel.org/all/20240712170339.185824-1-dapeng1.mi@linux.intel.com/
> > >>>>>   v2: https://lore.kernel.org/all/20240708144204.839486-1-dapeng1.mi@linux.intel.com/
> > >>>>>   v1: https://lore.kernel.org/all/20240702224037.343958-1-dapeng1.mi@linux.intel.com/
> > >>>>>
> > >>>>> [...]
> > >>>>
> > >>>> Applied to perf-tools-next, thanks!
> > >>>
> > >>> I disagreed with an early patch set and the issue wasn't resolved. Specifically:
> > >>>
> > >>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/commit/?h=perf-tools-next&id=3b5edc0421e2598a0ae7f0adcd592017f37e3cdf
> > >>> ```
> > >>>   /* Followed by topdown events. */
> > >>>   if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
> > >>>   return -1;
> > >>> - if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs))
> > >>> + /*
> > >>> + * Move topdown events forward only when topdown events
> > >>> + * are not in same group with previous event.
> > >>> + */
> > >>> + if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs) &&
> > >>> +     lhs->core.leader != rhs->core.leader)
> > >>>   return 1;
> > >>> ```
> > >>> Is a broken comparator as the lhs then rhs behavior varies from the
> > >>> rhs then lhs behavior. The qsort implementation can randomly order the
> > >>> events.
> > >>> Please drop/revert.
> > >>
> > >> Can you please provide an example when it's broken?  I'm not sure how it
> > >> can produce new errors, but it seems to fix a specific problem.  Do you
> > >> have a new test failure after this change?
> > >
> > > It may work with a particular sort routine in a particular library
> > > today, the issue is that if the sort routine were say changed from:
> > >
> > > if (cmp(a, b) < 0)
> > >
> > > to:
> > >
> > > if (cmp(b, a) > 0)
> > >
> > > the sort would vary with this change even though such a change in the
> > > sorting code is a no-op. It is fundamental algorithms that this code
> > > is broken, I'm not going to spend the time finding a list of
> > > instructions and a sort routine to demonstrate this.
> >
> >
> > I'm not sure I fully understand. But just want to mention that the
> > change only impacts the Topdown perf metric group, which is only
> > available for the ICL and later p-core. Nothing will change if no perf
> > metrics topdown events are used. Very limited impact.
> 
> How is breaking every ICL and later Intel model (excluding atoms)
> limited impact?

I guess he meant it's only for Topdown metric groups with an incorrect
order on those models.

> 
> > I like the patch set is because it provides examples and tests to cover
> > the possible combination of the slots event and topdown metrics events.
> > So we will have a clear expectation for different combinations caused by
> > the perf metrics mess.
> 
> Tests are good. I have no problem with the change, possibly it is
> inefficient, except that hacking a sort comparator to not be symmetric
> is broken.

Ok.

> 
> > If the algorithms cannot be changed, can you please give some
> > suggestions, especially for the sample read failure?
> 
> So this is symmetric:
> ```
> if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
>   return -1;
> if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs))
>   return 1;
> ```
> That is were lhs and rhs swapped then you'd get the expected comparison order.
> ```
> if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs) &&
> lhs->core.leader != rhs->core.leader)
>   return -1;
> if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs) &&
> lhs->core.leader != rhs->core.leader)
>   return 1;
> ```
> Is symmetric as well.
> ```
> if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
>   return -1;
> if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs) &&
> lhs->core.leader != rhs->core.leader)
>   return 1;
> ```
> (what this patch does) is not symmetric as the group leader impacts
> the greater-than case but not the less-than case.
> 
> It is not uncommon to see in a sort function:
> ```
> if (cmp(a, b) <= 0) {
>   assert(cmp(b,a) >= 0 && "check for unstable/broken compare functions");
> ```

I see.  So are you proposing this?

diff --git a/tools/perf/arch/x86/util/evlist.c b/tools/perf/arch/x86/util/evlist.c
index 438e4639fa892304..46884fa17fe658a6 100644
--- a/tools/perf/arch/x86/util/evlist.c
+++ b/tools/perf/arch/x86/util/evlist.c
@@ -70,7 +70,8 @@ int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs)
                if (arch_is_topdown_slots(rhs))
                        return 1;
                /* Followed by topdown events. */
-               if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
+               if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs) &&
+                   lhs->core.leader != rhs->core.leader)
                        return -1;
                /*
                 * Move topdown events forward only when topdown events

Dapeng and Kan, can you verify if it's ok?  My quick tests look ok.


> and we could add this here:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/list_sort.c?h=perf-tools-next#n22
> Try searching up "Comparison method violates its general contract"
> which is what Java throws when its TimSort implementation detects
> cases like this. If you break the comparator you can get the sort into
> an infinite loop - maybe that's enough of an indication that the code
> is broken and you don't need the exception. As is common in kernel
> code we're missing guard rails in list_sort, were the comparator
> passed to qsort then expectations on breakage are a roll of the dice.
> 
> I believe when I originally gave this feedback I didn't think the fix
> was to do it in the comparator and maybe an additional pass over the
> list was going to be necessary. Basically a sort needs to be a sort
> and it can't kind of be a sort depending on the order of the
> comparison, this is just incurring tech debt and when a sort tweak
> happens everything will break. As the usual chump cleaning up this
> tech debt I'd prefer it wasn't introduced.

Sorry, I think I didn't follow the previous thread and missed your
feedback.

> 
> Fwiw, I refuse to carry this change in:
> https://github.com/googleprodkernel/linux-perf/commits/google_tools_master/
> and if the tests break as a consequence then the appropriate thing is
> to revert those too. Linus/upstream maintainers follow a different
> plan and that's their business, I can't build off of this code.
> 
> It is unclear to me why we get patches that have been pointed out to
> be broken rebased/resent repeatedly without the broken-ness corrected.
> For example, the abuse of aggregation for metrics in perf script. At
> least point to the disagreement in the commit message, call me an
> idiot, and let the maintainer make an informed or uninformed choice. I
> am frequently an idiot and I apologize for all those cases, to the
> best of my knowledge this isn't one of them.

Yeah I understand your point and it sounds reasonable.  Thanks for your
feedback.  I don't think I'll revert the change though, we can add the
fix on top instead.

Thanks,
Namhyung

Re: [Patch v5 0/6] Bug fixes on topdown events reordering

Posted by Liang, Kan 1 year, 4 months ago


On 2024-10-03 12:45 p.m., Namhyung Kim wrote:
>>> If the algorithms cannot be changed, can you please give some
>>> suggestions, especially for the sample read failure?
>> So this is symmetric:
>> ```
>> if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
>>   return -1;
>> if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs))
>>   return 1;
>> ```
>> That is were lhs and rhs swapped then you'd get the expected comparison order.
>> ```
>> if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs) &&
>> lhs->core.leader != rhs->core.leader)
>>   return -1;
>> if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs) &&
>> lhs->core.leader != rhs->core.leader)
>>   return 1;
>> ```
>> Is symmetric as well.
>> ```
>> if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
>>   return -1;
>> if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs) &&
>> lhs->core.leader != rhs->core.leader)
>>   return 1;
>> ```
>> (what this patch does) is not symmetric as the group leader impacts
>> the greater-than case but not the less-than case.
>>
>> It is not uncommon to see in a sort function:
>> ```
>> if (cmp(a, b) <= 0) {
>>   assert(cmp(b,a) >= 0 && "check for unstable/broken compare functions");
>> ```
> I see.  So are you proposing this?
> 
> diff --git a/tools/perf/arch/x86/util/evlist.c b/tools/perf/arch/x86/util/evlist.c
> index 438e4639fa892304..46884fa17fe658a6 100644
> --- a/tools/perf/arch/x86/util/evlist.c
> +++ b/tools/perf/arch/x86/util/evlist.c
> @@ -70,7 +70,8 @@ int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs)
>                 if (arch_is_topdown_slots(rhs))
>                         return 1;
>                 /* Followed by topdown events. */
> -               if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
> +               if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs) &&
> +                   lhs->core.leader != rhs->core.leader)
>                         return -1;
>                 /*
>                  * Move topdown events forward only when topdown events
> 
> Dapeng and Kan, can you verify if it's ok?  My quick tests look ok.

I verified the above change. It works well.

Tested-by: Kan Liang <kan.liang@linux.intel.com>

Thanks,
Kan

RE: [Patch v5 0/6] Bug fixes on topdown events reordering

Posted by Mi, Dapeng1 1 year, 4 months ago


> -----Original Message-----
> From: Liang, Kan <kan.liang@linux.intel.com>
> Sent: Friday, October 4, 2024 3:45 AM
> To: Namhyung Kim <namhyung@kernel.org>; Ian Rogers <irogers@google.com>
> Cc: Peter Zijlstra <peterz@infradead.org>; Ingo Molnar <mingo@redhat.com>;
> Arnaldo Carvalho de Melo <acme@kernel.org>; Hunter, Adrian
> <adrian.hunter@intel.com>; Alexander Shishkin
> <alexander.shishkin@linux.intel.com>; Dapeng Mi
> <dapeng1.mi@linux.intel.com>; linux-perf-users@vger.kernel.org; linux-
> kernel@vger.kernel.org; Yongwei Ma <yongwei.ma@intel.com>; Mi, Dapeng1
> <dapeng1.mi@intel.com>
> Subject: Re: [Patch v5 0/6] Bug fixes on topdown events reordering
> 
> 
> 
> On 2024-10-03 12:45 p.m., Namhyung Kim wrote:
> >>> If the algorithms cannot be changed, can you please give some
> >>> suggestions, especially for the sample read failure?
> >> So this is symmetric:
> >> ```
> >> if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
> >>   return -1;
> >> if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs))
> >>   return 1;
> >> ```
> >> That is were lhs and rhs swapped then you'd get the expected comparison
> order.
> >> ```
> >> if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs) &&
> >> lhs->core.leader != rhs->core.leader)
> >>   return -1;
> >> if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs) &&
> >> lhs->core.leader != rhs->core.leader)
> >>   return 1;
> >> ```
> >> Is symmetric as well.
> >> ```
> >> if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
> >>   return -1;
> >> if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs) &&
> >> lhs->core.leader != rhs->core.leader)
> >>   return 1;
> >> ```
> >> (what this patch does) is not symmetric as the group leader impacts
> >> the greater-than case but not the less-than case.
> >>
> >> It is not uncommon to see in a sort function:
> >> ```
> >> if (cmp(a, b) <= 0) {
> >>   assert(cmp(b,a) >= 0 && "check for unstable/broken compare
> >> functions"); ```
> > I see.  So are you proposing this?
> >
> > diff --git a/tools/perf/arch/x86/util/evlist.c
> > b/tools/perf/arch/x86/util/evlist.c
> > index 438e4639fa892304..46884fa17fe658a6 100644
> > --- a/tools/perf/arch/x86/util/evlist.c
> > +++ b/tools/perf/arch/x86/util/evlist.c
> > @@ -70,7 +70,8 @@ int arch_evlist__cmp(const struct evsel *lhs, const struct
> evsel *rhs)
> >                 if (arch_is_topdown_slots(rhs))
> >                         return 1;
> >                 /* Followed by topdown events. */
> > -               if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
> > +               if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs)
> &&
> > +                   lhs->core.leader != rhs->core.leader)
> >                         return -1;
> >                 /*
> >                  * Move topdown events forward only when topdown
> > events
> >
> > Dapeng and Kan, can you verify if it's ok?  My quick tests look ok.
> 
> I verified the above change. It works well.
> 
> Tested-by: Kan Liang <kan.liang@linux.intel.com>

Since the linux.intel.com mail server is down, I have to use the intel.com mail server (outlook client) to respond this mail. The mail format may vary. Sorry for this.

Thanks Kan for testing this.

Ian, thanks for pointing that the comparators are not symmetrical. I agree it would lead to an inconsistent sorting results if changing comparing condition from "cmp(a, b) < 0" to "cmp(b, a) > 0" or vice versa, but limit to some certain sort library like perf-tool, the sorting result should be still fixed, right?

Anyway, I would provide a new patch to make the comparators are symmetrical. Thanks.

> 
> Thanks,
> Kan

Re: [Patch v5 0/6] Bug fixes on topdown events reordering

Posted by Ian Rogers 1 year, 4 months ago

On Thu, Oct 3, 2024 at 12:45 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 2024-10-03 12:45 p.m., Namhyung Kim wrote:
> >>> If the algorithms cannot be changed, can you please give some
> >>> suggestions, especially for the sample read failure?
> >> So this is symmetric:
> >> ```
> >> if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
> >>   return -1;
> >> if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs))
> >>   return 1;
> >> ```
> >> That is were lhs and rhs swapped then you'd get the expected comparison order.
> >> ```
> >> if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs) &&
> >> lhs->core.leader != rhs->core.leader)
> >>   return -1;
> >> if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs) &&
> >> lhs->core.leader != rhs->core.leader)
> >>   return 1;
> >> ```
> >> Is symmetric as well.
> >> ```
> >> if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
> >>   return -1;
> >> if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs) &&
> >> lhs->core.leader != rhs->core.leader)
> >>   return 1;
> >> ```
> >> (what this patch does) is not symmetric as the group leader impacts
> >> the greater-than case but not the less-than case.
> >>
> >> It is not uncommon to see in a sort function:
> >> ```
> >> if (cmp(a, b) <= 0) {
> >>   assert(cmp(b,a) >= 0 && "check for unstable/broken compare functions");
> >> ```
> > I see.  So are you proposing this?
> >
> > diff --git a/tools/perf/arch/x86/util/evlist.c b/tools/perf/arch/x86/util/evlist.c
> > index 438e4639fa892304..46884fa17fe658a6 100644
> > --- a/tools/perf/arch/x86/util/evlist.c
> > +++ b/tools/perf/arch/x86/util/evlist.c
> > @@ -70,7 +70,8 @@ int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs)
> >                 if (arch_is_topdown_slots(rhs))
> >                         return 1;
> >                 /* Followed by topdown events. */
> > -               if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
> > +               if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs) &&
> > +                   lhs->core.leader != rhs->core.leader)
> >                         return -1;
> >                 /*
> >                  * Move topdown events forward only when topdown events
> >
> > Dapeng and Kan, can you verify if it's ok?  My quick tests look ok.
>
> I verified the above change. It works well.
>
> Tested-by: Kan Liang <kan.liang@linux.intel.com>

Dapeng's comment should cover replace the comment /* Followed by
topdown events. */ but there are other things amiss. I'm thinking of
something like: "slots,cycles,{instructions,topdown-be-bound}" the
topdown-be-bound should get sorted and grouped with slots, but cycles
and instructions have no reason to be reordered, so do we end up with
slots, instructions and topdown-be-bound being grouped with cycles
sitting ungrouped in the middle of the evlist? I believe there are
assumptions that grouped evsels are adjacent in the evlist, not least
in:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/parse-events.c?h=perf-tools-next#n2106
Does cycles instructions end up being broken out of a group in this
case? Which feels like the case the code was trying to avoid.

Thanks,
Ian

RE: [Patch v5 0/6] Bug fixes on topdown events reordering

Posted by Mi, Dapeng1 1 year, 4 months ago


> -----Original Message-----
> From: Ian Rogers <irogers@google.com>
> Sent: Friday, October 4, 2024 5:26 AM
> To: Liang, Kan <kan.liang@linux.intel.com>
> Cc: Namhyung Kim <namhyung@kernel.org>; Peter Zijlstra
> <peterz@infradead.org>; Ingo Molnar <mingo@redhat.com>; Arnaldo Carvalho
> de Melo <acme@kernel.org>; Hunter, Adrian <adrian.hunter@intel.com>;
> Alexander Shishkin <alexander.shishkin@linux.intel.com>; Dapeng Mi
> <dapeng1.mi@linux.intel.com>; linux-perf-users@vger.kernel.org; linux-
> kernel@vger.kernel.org; Yongwei Ma <yongwei.ma@intel.com>; Mi, Dapeng1
> <dapeng1.mi@intel.com>
> Subject: Re: [Patch v5 0/6] Bug fixes on topdown events reordering
> 
> On Thu, Oct 3, 2024 at 12:45 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
> >
> >
> >
> > On 2024-10-03 12:45 p.m., Namhyung Kim wrote:
> > >>> If the algorithms cannot be changed, can you please give some
> > >>> suggestions, especially for the sample read failure?
> > >> So this is symmetric:
> > >> ```
> > >> if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
> > >>   return -1;
> > >> if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs))
> > >>   return 1;
> > >> ```
> > >> That is were lhs and rhs swapped then you'd get the expected comparison
> order.
> > >> ```
> > >> if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs)
> > >> &&
> > >> lhs->core.leader != rhs->core.leader)
> > >>   return -1;
> > >> if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs)
> > >> &&
> > >> lhs->core.leader != rhs->core.leader)
> > >>   return 1;
> > >> ```
> > >> Is symmetric as well.
> > >> ```
> > >> if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
> > >>   return -1;
> > >> if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs)
> > >> &&
> > >> lhs->core.leader != rhs->core.leader)
> > >>   return 1;
> > >> ```
> > >> (what this patch does) is not symmetric as the group leader impacts
> > >> the greater-than case but not the less-than case.
> > >>
> > >> It is not uncommon to see in a sort function:
> > >> ```
> > >> if (cmp(a, b) <= 0) {
> > >>   assert(cmp(b,a) >= 0 && "check for unstable/broken compare
> > >> functions"); ```
> > > I see.  So are you proposing this?
> > >
> > > diff --git a/tools/perf/arch/x86/util/evlist.c
> > > b/tools/perf/arch/x86/util/evlist.c
> > > index 438e4639fa892304..46884fa17fe658a6 100644
> > > --- a/tools/perf/arch/x86/util/evlist.c
> > > +++ b/tools/perf/arch/x86/util/evlist.c
> > > @@ -70,7 +70,8 @@ int arch_evlist__cmp(const struct evsel *lhs, const
> struct evsel *rhs)
> > >                 if (arch_is_topdown_slots(rhs))
> > >                         return 1;
> > >                 /* Followed by topdown events. */
> > > -               if (arch_is_topdown_metrics(lhs)
> && !arch_is_topdown_metrics(rhs))
> > > +               if (arch_is_topdown_metrics(lhs)
> && !arch_is_topdown_metrics(rhs) &&
> > > +                   lhs->core.leader != rhs->core.leader)
> > >                         return -1;
> > >                 /*
> > >                  * Move topdown events forward only when topdown
> > > events
> > >
> > > Dapeng and Kan, can you verify if it's ok?  My quick tests look ok.
> >
> > I verified the above change. It works well.
> >
> > Tested-by: Kan Liang <kan.liang@linux.intel.com>
> 
> Dapeng's comment should cover replace the comment /* Followed by topdown
> events. */ but there are other things amiss. I'm thinking of something like:

Thanks. I would change the comments.

> "slots,cycles,{instructions,topdown-be-bound}" the topdown-be-bound should get
> sorted and grouped with slots, but cycles and instructions have no reason to be
> reordered, so do we end up with slots, instructions and topdown-be-bound being
> grouped with cycles sitting ungrouped in the middle of the evlist? I believe there
> are assumptions that grouped evsels are adjacent in the evlist, not least
> in:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-
> next.git/tree/tools/perf/util/parse-events.c?h=perf-tools-next#n2106
> Does cycles instructions end up being broken out of a group in this case? Which
> feels like the case the code was trying to avoid.
> 
> Thanks,
> Ian

Re: [Patch v5 0/6] Bug fixes on topdown events reordering

Posted by Namhyung Kim 1 year, 4 months ago

On Thu, Oct 03, 2024 at 02:26:29PM -0700, Ian Rogers wrote:
> On Thu, Oct 3, 2024 at 12:45 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
> >
> >
> >
> > On 2024-10-03 12:45 p.m., Namhyung Kim wrote:
> > >>> If the algorithms cannot be changed, can you please give some
> > >>> suggestions, especially for the sample read failure?
> > >> So this is symmetric:
> > >> ```
> > >> if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
> > >>   return -1;
> > >> if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs))
> > >>   return 1;
> > >> ```
> > >> That is were lhs and rhs swapped then you'd get the expected comparison order.
> > >> ```
> > >> if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs) &&
> > >> lhs->core.leader != rhs->core.leader)
> > >>   return -1;
> > >> if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs) &&
> > >> lhs->core.leader != rhs->core.leader)
> > >>   return 1;
> > >> ```
> > >> Is symmetric as well.
> > >> ```
> > >> if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
> > >>   return -1;
> > >> if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs) &&
> > >> lhs->core.leader != rhs->core.leader)
> > >>   return 1;
> > >> ```
> > >> (what this patch does) is not symmetric as the group leader impacts
> > >> the greater-than case but not the less-than case.
> > >>
> > >> It is not uncommon to see in a sort function:
> > >> ```
> > >> if (cmp(a, b) <= 0) {
> > >>   assert(cmp(b,a) >= 0 && "check for unstable/broken compare functions");
> > >> ```
> > > I see.  So are you proposing this?
> > >
> > > diff --git a/tools/perf/arch/x86/util/evlist.c b/tools/perf/arch/x86/util/evlist.c
> > > index 438e4639fa892304..46884fa17fe658a6 100644
> > > --- a/tools/perf/arch/x86/util/evlist.c
> > > +++ b/tools/perf/arch/x86/util/evlist.c
> > > @@ -70,7 +70,8 @@ int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs)
> > >                 if (arch_is_topdown_slots(rhs))
> > >                         return 1;
> > >                 /* Followed by topdown events. */
> > > -               if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
> > > +               if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs) &&
> > > +                   lhs->core.leader != rhs->core.leader)
> > >                         return -1;
> > >                 /*
> > >                  * Move topdown events forward only when topdown events
> > >
> > > Dapeng and Kan, can you verify if it's ok?  My quick tests look ok.
> >
> > I verified the above change. It works well.
> >
> > Tested-by: Kan Liang <kan.liang@linux.intel.com>

Thanks for the check!

> 
> Dapeng's comment should cover replace the comment /* Followed by
> topdown events. */ but there are other things amiss. I'm thinking of
> something like: "slots,cycles,{instructions,topdown-be-bound}" the
> topdown-be-bound should get sorted and grouped with slots, but cycles
> and instructions have no reason to be reordered, so do we end up with
> slots, instructions and topdown-be-bound being grouped with cycles
> sitting ungrouped in the middle of the evlist? I believe there are
> assumptions that grouped evsels are adjacent in the evlist, not least
> in:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/parse-events.c?h=perf-tools-next#n2106
> Does cycles instructions end up being broken out of a group in this
> case? Which feels like the case the code was trying to avoid.

I got this:

  $ sudo ./perf record -a -e "slots,cycles,{instructions,topdown-be-bound}" true
  Error:
  The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (topdown-be-bound).
  "dmesg | grep -i perf" may provide additional information.

Thanks,
Namhyung

Re: [Patch v5 0/6] Bug fixes on topdown events reordering

Posted by Liang, Kan 1 year, 4 months ago

On 2024-10-03 6:13 p.m., Namhyung Kim wrote:
>> Dapeng's comment should cover replace the comment /* Followed by
>> topdown events. */ but there are other things amiss. I'm thinking of
>> something like: "slots,cycles,{instructions,topdown-be-bound}" the
>> topdown-be-bound should get sorted and grouped with slots, but cycles
>> and instructions have no reason to be reordered, so do we end up with
>> slots, instructions and topdown-be-bound being grouped with cycles
>> sitting ungrouped in the middle of the evlist? I believe there are
>> assumptions that grouped evsels are adjacent in the evlist, not least
>> in:
>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/parse-events.c?h=perf-tools-next#n2106
>> Does cycles instructions end up being broken out of a group in this
>> case? Which feels like the case the code was trying to avoid.
> I got this:
> 
>   $ sudo ./perf record -a -e "slots,cycles,{instructions,topdown-be-bound}" true
>   Error:
>   The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (topdown-be-bound).
>   "dmesg | grep -i perf" may provide additional information.

To be honest, I think the "slots,cycles,{instructions,topdown-be-bound}"
is a meaningless case. Why a user wants to group instructions and
topdown events, but leave the slots out of the group?
There could be hundreds of different combinations caused by the perf
metrics mess. I don't think the re-ordering code should/can fix all of them.

For the case which the re-ordering cannot cover (like above), an error
out is acceptable. So the end user can update their command to a more
meaningful format, either {slots,cycles,instructions,topdown-be-bound}
or {slots,topdown-be-bound},cycles,instructions still works.

I think what the patch set really fixed is the failure of sample read
with perf metrics. Without the patch set, it never works no matter how
you change the order of the events.
A better ordering is just a nice to have feature. If perf cannot
provides a perfect re-ordering, I think an error out is also OK.

Thanks,
Kan

Re: [Patch v5 0/6] Bug fixes on topdown events reordering

Posted by Ian Rogers 1 year, 4 months ago

On Thu, Oct 3, 2024 at 4:29 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 2024-10-03 6:13 p.m., Namhyung Kim wrote:
> >> Dapeng's comment should cover replace the comment /* Followed by
> >> topdown events. */ but there are other things amiss. I'm thinking of
> >> something like: "slots,cycles,{instructions,topdown-be-bound}" the
> >> topdown-be-bound should get sorted and grouped with slots, but cycles
> >> and instructions have no reason to be reordered, so do we end up with
> >> slots, instructions and topdown-be-bound being grouped with cycles
> >> sitting ungrouped in the middle of the evlist? I believe there are
> >> assumptions that grouped evsels are adjacent in the evlist, not least
> >> in:
> >> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/parse-events.c?h=perf-tools-next#n2106
> >> Does cycles instructions end up being broken out of a group in this
> >> case? Which feels like the case the code was trying to avoid.
> > I got this:
> >
> >   $ sudo ./perf record -a -e "slots,cycles,{instructions,topdown-be-bound}" true
> >   Error:
> >   The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (topdown-be-bound).
> >   "dmesg | grep -i perf" may provide additional information.
>
> To be honest, I think the "slots,cycles,{instructions,topdown-be-bound}"
> is a meaningless case. Why a user wants to group instructions and
> topdown events, but leave the slots out of the group?
> There could be hundreds of different combinations caused by the perf
> metrics mess. I don't think the re-ordering code should/can fix all of them.

I'm happy with better code and things don't need to be perfect. Can we
fix the comments though? It'd be nice to also include that some things
are going to be broken. I can imagine groups with topdown events but
without slots, for example we group events in metrics and in
tma_retiring we add "0 * tma_info_thread_slots" to the metric so that
we get a slots event. If the multiply were optimized away as redundant
then we'd have a topdown group without slots, we could pick up slots
and other events from other metrics.

> For the case which the re-ordering cannot cover (like above), an error
> out is acceptable. So the end user can update their command to a more
> meaningful format, either {slots,cycles,instructions,topdown-be-bound}
> or {slots,topdown-be-bound},cycles,instructions still works.

Perhaps we can add an arch error path that could help more for topdown
events given they are a particular pain to open.

> I think what the patch set really fixed is the failure of sample read
> with perf metrics. Without the patch set, it never works no matter how
> you change the order of the events.
> A better ordering is just a nice to have feature. If perf cannot
> provides a perfect re-ordering, I think an error out is also OK.

Agreed, we don't need to fix everything and focussing on the common
use cases makes sense.

Thanks,
Ian

RE: [Patch v5 0/6] Bug fixes on topdown events reordering

Posted by Mi, Dapeng1 1 year, 4 months ago


> -----Original Message-----
> From: Ian Rogers <irogers@google.com>
> Sent: Friday, October 4, 2024 7:36 AM
> To: Liang, Kan <kan.liang@linux.intel.com>
> Cc: Namhyung Kim <namhyung@kernel.org>; Peter Zijlstra
> <peterz@infradead.org>; Ingo Molnar <mingo@redhat.com>; Arnaldo Carvalho
> de Melo <acme@kernel.org>; Hunter, Adrian <adrian.hunter@intel.com>;
> Alexander Shishkin <alexander.shishkin@linux.intel.com>; Dapeng Mi
> <dapeng1.mi@linux.intel.com>; linux-perf-users@vger.kernel.org; linux-
> kernel@vger.kernel.org; Yongwei Ma <yongwei.ma@intel.com>; Mi, Dapeng1
> <dapeng1.mi@intel.com>
> Subject: Re: [Patch v5 0/6] Bug fixes on topdown events reordering
> 
> On Thu, Oct 3, 2024 at 4:29 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
> >
> >
> >
> > On 2024-10-03 6:13 p.m., Namhyung Kim wrote:
> > >> Dapeng's comment should cover replace the comment /* Followed by
> > >> topdown events. */ but there are other things amiss. I'm thinking
> > >> of something like: "slots,cycles,{instructions,topdown-be-bound}"
> > >> the topdown-be-bound should get sorted and grouped with slots, but
> > >> cycles and instructions have no reason to be reordered, so do we
> > >> end up with slots, instructions and topdown-be-bound being grouped
> > >> with cycles sitting ungrouped in the middle of the evlist? I
> > >> believe there are assumptions that grouped evsels are adjacent in
> > >> the evlist, not least
> > >> in:
> > >> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-nex
> > >> t.git/tree/tools/perf/util/parse-events.c?h=perf-tools-next#n2106
> > >> Does cycles instructions end up being broken out of a group in this
> > >> case? Which feels like the case the code was trying to avoid.
> > > I got this:
> > >
> > >   $ sudo ./perf record -a -e "slots,cycles,{instructions,topdown-be-bound}"
> true
> > >   Error:
> > >   The sys_perf_event_open() syscall returned with 22 (Invalid argument) for
> event (topdown-be-bound).
> > >   "dmesg | grep -i perf" may provide additional information.
> >
> > To be honest, I think the "slots,cycles,{instructions,topdown-be-bound}"
> > is a meaningless case. Why a user wants to group instructions and
> > topdown events, but leave the slots out of the group?
> > There could be hundreds of different combinations caused by the perf
> > metrics mess. I don't think the re-ordering code should/can fix all of them.
> 
> I'm happy with better code and things don't need to be perfect. Can we fix the
> comments though? It'd be nice to also include that some things are going to be
> broken. I can imagine groups with topdown events but without slots, for example
> we group events in metrics and in tma_retiring we add "0 *
> tma_info_thread_slots" to the metric so that we get a slots event. If the multiply
> were optimized away as redundant then we'd have a topdown group without
> slots, we could pick up slots and other events from other metrics.

I don't think this patch would break any current regroup case. It just blocks to move topdown metrics event if they are already in same group with slot events. We can see same error for this event combination "slots,cycles,{instructions,topdown-be-bound}" in the original code.

As Kan said, there could be hundreds of topdown metrics combinations, it's unrealistic to cover all these combinations just using sorting, and even it can be done, the comparator would become much complicated and hard to maintain.

I think we'd better just maintain several commonly used regroup cases, it would be fine to raise an error for these unsupported combinations as long as error message is clear enough.

Ian, I don't fully understand your words, could you please give an example? Thanks.


> 
> > For the case which the re-ordering cannot cover (like above), an error
> > out is acceptable. So the end user can update their command to a more
> > meaningful format, either {slots,cycles,instructions,topdown-be-bound}
> > or {slots,topdown-be-bound},cycles,instructions still works.
> 
> Perhaps we can add an arch error path that could help more for topdown events
> given they are a particular pain to open.
> 
> > I think what the patch set really fixed is the failure of sample read
> > with perf metrics. Without the patch set, it never works no matter how
> > you change the order of the events.
> > A better ordering is just a nice to have feature. If perf cannot
> > provides a perfect re-ordering, I think an error out is also OK.
> 
> Agreed, we don't need to fix everything and focussing on the common use cases
> makes sense.
> 
> Thanks,
> Ian

Re: [Patch v5 0/6] Bug fixes on topdown events reordering

Posted by Ian Rogers 1 year, 4 months ago

On Mon, Oct 7, 2024 at 7:52 PM Mi, Dapeng1 <dapeng1.mi@intel.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Ian Rogers <irogers@google.com>
> > Sent: Friday, October 4, 2024 7:36 AM
> > To: Liang, Kan <kan.liang@linux.intel.com>
> > Cc: Namhyung Kim <namhyung@kernel.org>; Peter Zijlstra
> > <peterz@infradead.org>; Ingo Molnar <mingo@redhat.com>; Arnaldo Carvalho
> > de Melo <acme@kernel.org>; Hunter, Adrian <adrian.hunter@intel.com>;
> > Alexander Shishkin <alexander.shishkin@linux.intel.com>; Dapeng Mi
> > <dapeng1.mi@linux.intel.com>; linux-perf-users@vger.kernel.org; linux-
> > kernel@vger.kernel.org; Yongwei Ma <yongwei.ma@intel.com>; Mi, Dapeng1
> > <dapeng1.mi@intel.com>
> > Subject: Re: [Patch v5 0/6] Bug fixes on topdown events reordering
> >
> > On Thu, Oct 3, 2024 at 4:29 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
> > >
> > >
> > >
> > > On 2024-10-03 6:13 p.m., Namhyung Kim wrote:
> > > >> Dapeng's comment should cover replace the comment /* Followed by
> > > >> topdown events. */ but there are other things amiss. I'm thinking
> > > >> of something like: "slots,cycles,{instructions,topdown-be-bound}"
> > > >> the topdown-be-bound should get sorted and grouped with slots, but
> > > >> cycles and instructions have no reason to be reordered, so do we
> > > >> end up with slots, instructions and topdown-be-bound being grouped
> > > >> with cycles sitting ungrouped in the middle of the evlist? I
> > > >> believe there are assumptions that grouped evsels are adjacent in
> > > >> the evlist, not least
> > > >> in:
> > > >> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-nex
> > > >> t.git/tree/tools/perf/util/parse-events.c?h=perf-tools-next#n2106
> > > >> Does cycles instructions end up being broken out of a group in this
> > > >> case? Which feels like the case the code was trying to avoid.
> > > > I got this:
> > > >
> > > >   $ sudo ./perf record -a -e "slots,cycles,{instructions,topdown-be-bound}"
> > true
> > > >   Error:
> > > >   The sys_perf_event_open() syscall returned with 22 (Invalid argument) for
> > event (topdown-be-bound).
> > > >   "dmesg | grep -i perf" may provide additional information.
> > >
> > > To be honest, I think the "slots,cycles,{instructions,topdown-be-bound}"
> > > is a meaningless case. Why a user wants to group instructions and
> > > topdown events, but leave the slots out of the group?
> > > There could be hundreds of different combinations caused by the perf
> > > metrics mess. I don't think the re-ordering code should/can fix all of them.
> >
> > I'm happy with better code and things don't need to be perfect. Can we fix the
> > comments though? It'd be nice to also include that some things are going to be
> > broken. I can imagine groups with topdown events but without slots, for example
> > we group events in metrics and in tma_retiring we add "0 *
> > tma_info_thread_slots" to the metric so that we get a slots event. If the multiply
> > were optimized away as redundant then we'd have a topdown group without
> > slots, we could pick up slots and other events from other metrics.
>
> I don't think this patch would break any current regroup case. It just blocks to move topdown metrics event if they are already in same group with slot events. We can see same error for this event combination "slots,cycles,{instructions,topdown-be-bound}" in the original code.
>
> As Kan said, there could be hundreds of topdown metrics combinations, it's unrealistic to cover all these combinations just using sorting, and even it can be done, the comparator would become much complicated and hard to maintain.
>
> I think we'd better just maintain several commonly used regroup cases, it would be fine to raise an error for these unsupported combinations as long as error message is clear enough.
>
> Ian, I don't fully understand your words, could you please give an example? Thanks.

So in the comparator I think most people won't understand the list of
cases that are supported and those that are not supported:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/arch/x86/util/evlist.c?h=perf-tools-next#n35
The groups usually come from metrics and to work around issues there
are constraints on those that may or may not group events. This could
yield situations that don't work given the cases you list, but I don't
think current metrics will violate this. We do test metrics
individually but not together as part of perf test.

Thanks,
Ian

> >
> > > For the case which the re-ordering cannot cover (like above), an error
> > > out is acceptable. So the end user can update their command to a more
> > > meaningful format, either {slots,cycles,instructions,topdown-be-bound}
> > > or {slots,topdown-be-bound},cycles,instructions still works.
> >
> > Perhaps we can add an arch error path that could help more for topdown events
> > given they are a particular pain to open.
> >
> > > I think what the patch set really fixed is the failure of sample read
> > > with perf metrics. Without the patch set, it never works no matter how
> > > you change the order of the events.
> > > A better ordering is just a nice to have feature. If perf cannot
> > > provides a perfect re-ordering, I think an error out is also OK.
> >
> > Agreed, we don't need to fix everything and focussing on the common use cases
> > makes sense.
> >
> > Thanks,
> > Ian

Re: [Patch v5 0/6] Bug fixes on topdown events reordering

Posted by Namhyung Kim 1 year, 4 months ago

On Thu, Oct 03, 2024 at 04:36:25PM -0700, Ian Rogers wrote:
> On Thu, Oct 3, 2024 at 4:29 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
> >
> >
> >
> > On 2024-10-03 6:13 p.m., Namhyung Kim wrote:
> > >> Dapeng's comment should cover replace the comment /* Followed by
> > >> topdown events. */ but there are other things amiss. I'm thinking of
> > >> something like: "slots,cycles,{instructions,topdown-be-bound}" the
> > >> topdown-be-bound should get sorted and grouped with slots, but cycles
> > >> and instructions have no reason to be reordered, so do we end up with
> > >> slots, instructions and topdown-be-bound being grouped with cycles
> > >> sitting ungrouped in the middle of the evlist? I believe there are
> > >> assumptions that grouped evsels are adjacent in the evlist, not least
> > >> in:
> > >> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/parse-events.c?h=perf-tools-next#n2106
> > >> Does cycles instructions end up being broken out of a group in this
> > >> case? Which feels like the case the code was trying to avoid.
> > > I got this:
> > >
> > >   $ sudo ./perf record -a -e "slots,cycles,{instructions,topdown-be-bound}" true
> > >   Error:
> > >   The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (topdown-be-bound).
> > >   "dmesg | grep -i perf" may provide additional information.
> >
> > To be honest, I think the "slots,cycles,{instructions,topdown-be-bound}"
> > is a meaningless case. Why a user wants to group instructions and
> > topdown events, but leave the slots out of the group?
> > There could be hundreds of different combinations caused by the perf
> > metrics mess. I don't think the re-ordering code should/can fix all of them.
> 
> I'm happy with better code and things don't need to be perfect. Can we
> fix the comments though? It'd be nice to also include that some things
> are going to be broken. I can imagine groups with topdown events but

Can you please propose something?

Thanks,
Namhyung


> without slots, for example we group events in metrics and in
> tma_retiring we add "0 * tma_info_thread_slots" to the metric so that
> we get a slots event. If the multiply were optimized away as redundant
> then we'd have a topdown group without slots, we could pick up slots
> and other events from other metrics.

Re: [Patch v5 0/6] Bug fixes on topdown events reordering

Posted by Ian Rogers 1 year, 4 months ago

On Tue, Oct 1, 2024 at 3:32 PM Ian Rogers <irogers@google.com> wrote:
>
> On Tue, Oct 1, 2024 at 2:02 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Fri, 13 Sep 2024 08:47:06 +0000, Dapeng Mi wrote:
> >
> > > Changes:
> > > v5 -> v6:
> > >   * no function change.
> > >   * rebase patchset to latest code of perf-tool-next tree.
> > >   * Add Kan's reviewed-by tag.
> > >
> > > History:
> > >   v4: https://lore.kernel.org/all/20240816122938.32228-1-dapeng1.mi@linux.intel.com/
> > >   v3: https://lore.kernel.org/all/20240712170339.185824-1-dapeng1.mi@linux.intel.com/
> > >   v2: https://lore.kernel.org/all/20240708144204.839486-1-dapeng1.mi@linux.intel.com/
> > >   v1: https://lore.kernel.org/all/20240702224037.343958-1-dapeng1.mi@linux.intel.com/
> > >
> > > [...]
> >
> > Applied to perf-tools-next, thanks!
>
> I disagreed with an early patch set and the issue wasn't resolved. Specifically:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/commit/?h=perf-tools-next&id=3b5edc0421e2598a0ae7f0adcd592017f37e3cdf
> ```
>   /* Followed by topdown events. */
>   if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
>   return -1;
> - if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs))
> + /*
> + * Move topdown events forward only when topdown events
> + * are not in same group with previous event.
> + */
> + if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs) &&
> +     lhs->core.leader != rhs->core.leader)
>   return 1;
> ```
> Is a broken comparator as the lhs then rhs behavior varies from the
> rhs then lhs behavior. The qsort implementation can randomly order the
> events.
> Please drop/revert.

Tihs is still in the tree:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/commit/?h=perf-tools-next&id=3b5edc0421e2598a0ae7f0adcd592017f37e3cdf
It should be invariant that for all comparators that:
a < b == b > a
This code breaks this property and so event sorting is broken.

Thanks,
Ian