[PATCHSET v4 next 0/3] perf stat: add user_time and system_time tool events

Florian Fischer posted 3 patches 4 years, 1 month ago
[PATCHSET v4 next 0/3] perf stat: add user_time and system_time tool events
Posted by Florian Fischer 4 years, 1 month ago
This patch series adds new internal events to perf stat exposing the times spend
in user and kernel mode in nanoseconds reported by rusage.

During some benchmarking using perf it bothered me that I could not easily
retrieve those times from perf stat when using the machine readable output.

But perf definitely knows about those values because in the human readable output
they are present.

Therefore I exposed the times reported by rusage via the new tool events:
user_time and system_time.

This allows to retrieved them in machine-readable output:

$ ./perf stat -x, -e duration_time,user_time,system_time,cache-misses -- grep -q -r duration_time tools/perf
72134524,ns,duration_time:u,72134524,100.00,,
65225000,ns,user_time:u,65225000,100.00,,
6865000,ns,ssystem_time:u,6865000,100.00,,
38705,,cache-misses:u,71189328,100.00,,

The changes are mostly inspired by the code for the only other available
tool event: 'duration_time'.

For now the new user_time and system_time events are not usable with metrics
(as far as I understand).

The patch applies cleanly on linux-next/next-20220420.

[PATCH v4 1/3] perf stat: introduce stats for the user and system
[PATCH v4 2/3] perf stat: add user_time and system_time events
[PATCH v4 3/3] perf list: print all available tool events

[v2]: Split up the changes into separate commits.

[v3]: * Use nanoseconds as suggested by Xing Zhengjun.
      * Squash [PATCH v2 2/4] and [PATCH v2 4/4], because it only adds the unit string.

[v4]: * Simplify event names to only user_time and system_time as suggested by Ian Rogers.

Florian Fischer
Re: [PATCHSET v4 next 0/3] perf stat: add user_time and system_time tool events
Posted by Arnaldo Carvalho de Melo 4 years, 1 month ago
Em Wed, Apr 20, 2022 at 12:23:51PM +0200, Florian Fischer escreveu:
> This patch series adds new internal events to perf stat exposing the times spend
> in user and kernel mode in nanoseconds reported by rusage.
> 
> During some benchmarking using perf it bothered me that I could not easily
> retrieve those times from perf stat when using the machine readable output.
> 
> But perf definitely knows about those values because in the human readable output
> they are present.
> 
> Therefore I exposed the times reported by rusage via the new tool events:
> user_time and system_time.
> 
> This allows to retrieved them in machine-readable output:
> 
> $ ./perf stat -x, -e duration_time,user_time,system_time,cache-misses -- grep -q -r duration_time tools/perf
> 72134524,ns,duration_time:u,72134524,100.00,,
> 65225000,ns,user_time:u,65225000,100.00,,
> 6865000,ns,ssystem_time:u,6865000,100.00,,
> 38705,,cache-misses:u,71189328,100.00,,
> 
> The changes are mostly inspired by the code for the only other available
> tool event: 'duration_time'.
> 
> For now the new user_time and system_time events are not usable with metrics
> (as far as I understand).
> 
> The patch applies cleanly on linux-next/next-20220420.

I tested and applied the first two patches, please take a look at that
"OR" problem with patch 3.

- Arnaldo
 
> [PATCH v4 1/3] perf stat: introduce stats for the user and system
> [PATCH v4 2/3] perf stat: add user_time and system_time events
> [PATCH v4 3/3] perf list: print all available tool events
> 
> [v2]: Split up the changes into separate commits.
> 
> [v3]: * Use nanoseconds as suggested by Xing Zhengjun.
>       * Squash [PATCH v2 2/4] and [PATCH v2 4/4], because it only adds the unit string.
> 
> [v4]: * Simplify event names to only user_time and system_time as suggested by Ian Rogers.
> 
> Florian Fischer
Re: [PATCHSET v4 next 0/3] perf stat: add user_time and system_time tool events
Posted by Namhyung Kim 4 years, 1 month ago
Hello,

On Fri, Apr 22, 2022 at 3:05 PM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> Em Wed, Apr 20, 2022 at 12:23:51PM +0200, Florian Fischer escreveu:
> > This patch series adds new internal events to perf stat exposing the times spend
> > in user and kernel mode in nanoseconds reported by rusage.
> >
> > During some benchmarking using perf it bothered me that I could not easily
> > retrieve those times from perf stat when using the machine readable output.
> >
> > But perf definitely knows about those values because in the human readable output
> > they are present.
> >
> > Therefore I exposed the times reported by rusage via the new tool events:
> > user_time and system_time.
> >
> > This allows to retrieved them in machine-readable output:
> >
> > $ ./perf stat -x, -e duration_time,user_time,system_time,cache-misses -- grep -q -r duration_time tools/perf
> > 72134524,ns,duration_time:u,72134524,100.00,,
> > 65225000,ns,user_time:u,65225000,100.00,,
> > 6865000,ns,ssystem_time:u,6865000,100.00,,

ssystem?  Looks like a typo.

Anyway it looks a little bit strange to me if we can get
system time in user mode only (the 'u' modifier).

Thanks,
Namhyung


> > 38705,,cache-misses:u,71189328,100.00,,
> >
> > The changes are mostly inspired by the code for the only other available
> > tool event: 'duration_time'.
Re: [PATCHSET v4 next 0/3] perf stat: add user_time and system_time tool events
Posted by Florian Fischer 4 years, 1 month ago
On 22.04.2022 16:52, Namhyung Kim wrote:
> Hello,
> 
> On Fri, Apr 22, 2022 at 3:05 PM Arnaldo Carvalho de Melo
> <acme@kernel.org> wrote:
> >
> > Em Wed, Apr 20, 2022 at 12:23:51PM +0200, Florian Fischer escreveu:
> > > This patch series adds new internal events to perf stat exposing the times spend
> > > in user and kernel mode in nanoseconds reported by rusage.
> > >
> > > During some benchmarking using perf it bothered me that I could not easily
> > > retrieve those times from perf stat when using the machine readable output.
> > >
> > > But perf definitely knows about those values because in the human readable output
> > > they are present.
> > >
> > > Therefore I exposed the times reported by rusage via the new tool events:
> > > user_time and system_time.
> > >
> > > This allows to retrieved them in machine-readable output:
> > >
> > > $ ./perf stat -x, -e duration_time,user_time,system_time,cache-misses -- grep -q -r duration_time tools/perf
> > > 72134524,ns,duration_time:u,72134524,100.00,,
> > > 65225000,ns,user_time:u,65225000,100.00,,
> > > 6865000,ns,ssystem_time:u,6865000,100.00,,
> 
> ssystem?  Looks like a typo.

Yes thanks. Fortunately I fixed it in the commit that got applied [1].

> 
> Anyway it looks a little bit strange to me if we can get
> system time in user mode only (the 'u' modifier).

Sorry but I don't really understand what you mean.
The system_time is reported to userspace via rusage filled by wait4(2).
It will always report the value reported to the user space regardless of what
counters perf has access to.

If you run perf as user you get the same system_time (but with the ':u' suffix)
as when you run perf as root or lower kernel.perf_event_paranoid to allow access
to more counters.

$ perf stat -x, -e system_time sync
31246000,ns,system_time:u,31246000,100.00,,

# perf stat -x, -e system_time sync
33794000,ns,system_time,33794000,100.00,,

> 
> Thanks,
> Namhyung
> 

Flo Fischer

[1] https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/commit/?h=tmp.perf/core&id=b03b89b350034f220cc24fc77c56990a97a796b2
Re: [PATCHSET v4 next 0/3] perf stat: add user_time and system_time tool events
Posted by Namhyung Kim 4 years, 1 month ago
On Sat, Apr 23, 2022 at 5:16 AM Florian Fischer
<florian.fischer@muhq.space> wrote:
>
> On 22.04.2022 16:52, Namhyung Kim wrote:
> > Hello,
> >
> > On Fri, Apr 22, 2022 at 3:05 PM Arnaldo Carvalho de Melo
> > <acme@kernel.org> wrote:
> > >
> > > Em Wed, Apr 20, 2022 at 12:23:51PM +0200, Florian Fischer escreveu:
> > > > This patch series adds new internal events to perf stat exposing the times spend
> > > > in user and kernel mode in nanoseconds reported by rusage.
> > > >
> > > > During some benchmarking using perf it bothered me that I could not easily
> > > > retrieve those times from perf stat when using the machine readable output.
> > > >
> > > > But perf definitely knows about those values because in the human readable output
> > > > they are present.
> > > >
> > > > Therefore I exposed the times reported by rusage via the new tool events:
> > > > user_time and system_time.
> > > >
> > > > This allows to retrieved them in machine-readable output:
> > > >
> > > > $ ./perf stat -x, -e duration_time,user_time,system_time,cache-misses -- grep -q -r duration_time tools/perf
> > > > 72134524,ns,duration_time:u,72134524,100.00,,
> > > > 65225000,ns,user_time:u,65225000,100.00,,
> > > > 6865000,ns,ssystem_time:u,6865000,100.00,,

> > Anyway it looks a little bit strange to me if we can get
> > system time in user mode only (the 'u' modifier).
>
> Sorry but I don't really understand what you mean.
> The system_time is reported to userspace via rusage filled by wait4(2).
> It will always report the value reported to the user space regardless of what
> counters perf has access to.
>
> If you run perf as user you get the same system_time (but with the ':u' suffix)
> as when you run perf as root or lower kernel.perf_event_paranoid to allow access
> to more counters.

The ':u' modifier means that the event should count only in user mode.
So I think system_time:u should be 0 by definition.
Likewise, user_time:k should be handled as such.

But as I said before, we already have the task-clock event, so it's not
clear why we need this too.  Also these tool events won't work for
other use cases like perf record.

Thanks,
Namhyung
Re: [PATCHSET v4 next 0/3] perf stat: add user_time and system_time tool events
Posted by Florian Fischer 4 years, 1 month ago
On 25.04.2022 12:06, Namhyung Kim wrote:
> On Sat, Apr 23, 2022 at 5:16 AM Florian Fischer
> <florian.fischer@muhq.space> wrote:
> >
> > On 22.04.2022 16:52, Namhyung Kim wrote:
> > > Hello,
> > >
> > > On Fri, Apr 22, 2022 at 3:05 PM Arnaldo Carvalho de Melo
> > > <acme@kernel.org> wrote:
> > > >
> > > > Em Wed, Apr 20, 2022 at 12:23:51PM +0200, Florian Fischer escreveu:
> > > > > This patch series adds new internal events to perf stat exposing the times spend
> > > > > in user and kernel mode in nanoseconds reported by rusage.
> > > > >
> > > > > During some benchmarking using perf it bothered me that I could not easily
> > > > > retrieve those times from perf stat when using the machine readable output.
> > > > >
> > > > > But perf definitely knows about those values because in the human readable output
> > > > > they are present.
> > > > >
> > > > > Therefore I exposed the times reported by rusage via the new tool events:
> > > > > user_time and system_time.
> > > > >
> > > > > This allows to retrieved them in machine-readable output:
> > > > >
> > > > > $ ./perf stat -x, -e duration_time,user_time,system_time,cache-misses -- grep -q -r duration_time tools/perf
> > > > > 72134524,ns,duration_time:u,72134524,100.00,,
> > > > > 65225000,ns,user_time:u,65225000,100.00,,
> > > > > 6865000,ns,ssystem_time:u,6865000,100.00,,
> 
> > > Anyway it looks a little bit strange to me if we can get
> > > system time in user mode only (the 'u' modifier).
> >
> > Sorry but I don't really understand what you mean.
> > The system_time is reported to userspace via rusage filled by wait4(2).
> > It will always report the value reported to the user space regardless of what
> > counters perf has access to.
> >
> > If you run perf as user you get the same system_time (but with the ':u' suffix)
> > as when you run perf as root or lower kernel.perf_event_paranoid to allow access
> > to more counters.
> 
> The ':u' modifier means that the event should count only in user mode.
> So I think system_time:u should be 0 by definition.
> Likewise, user_time:k should be handled as such.

Not sure if this logic applies to tool events because the time spend in user mode
or kernel mdoe stays the same regardless of where you count it.

This is consistent with how duration_time works currently in perf stat.

./perf stat -e duration_time,duration_time:u,duration_time:k -- grep -q -r duration_time .

 Performance counter stats for 'grep -q -r duration_time .':

        29,012,558 ns   duration_time
        29,012,558 ns   duration_time:u
        29,012,558 ns   duration_time:k

       0.029012558 seconds time elapsed

       0.029088000 seconds user
       0.000000000 seconds sys

> 
> But as I said before, we already have the task-clock event, so it's not
> clear why we need this too.  Also these tool events won't work for
> other use cases like perf record.

Maybe I do not know enough about perf but I think this is not exactly how those
suffixes work. This is the output of perf of next-20220420 and my rusage tool
event patches applied:

$ sysctl kernel.perf_event_paranoid
kernel.perf_event_paranoid = -1

linux/tools/perf$ ./perf stat -e task-clock,task-clock:u,task-clock:k,\
                                 duration_time,duration_time:u,duration_time:k,\
                                 system_time,system_time:u,system_time:k,\
                                 user_time,user_time:u,user_time:k -- grep -q -r duration_time .

 Performance counter stats for 'grep -q -r duration_time .':

             42.30 msec task-clock                #    0.483 CPUs utilized
             42.30 msec task-clock:u              #    0.483 CPUs utilized
             42.30 msec task-clock:k              #    0.483 CPUs utilized
        87,533,340 ns   duration_time             #    2.069 G/sec
        87,533,340 ns   duration_time:u           #    2.069 G/sec
        87,533,340 ns   duration_time:k           #    2.069 G/sec
        12,841,000 ns   system_time               #  303.547 M/sec
        12,841,000 ns   system_time:u             #  303.547 M/sec
        12,841,000 ns   system_time:k             #  303.547 M/sec
        30,820,000 ns   user_time                 #  728.550 M/sec
        30,820,000 ns   user_time:u               #  728.550 M/sec
        30,820,000 ns   user_time:k               #  728.550 M/sec

       0.087533340 seconds time elapsed

       0.030820000 seconds user
       0.012841000 seconds sys


So task-clock regardless of its ':u' or ':k' suffix shows approximately the wall
clock time.

But for my experiments on a shared system where kernel.perf_event_paranoid is
not set to -1 and I feel uncomfortable to test research prototypes as root I
would like a simple and fast way to tell where my program spends its time.

This can be achieved by using user_time and system_time.
Which are reported in the human readable output anyway.

Maybe I am doing something wrong here but as shown above task-clock does not
provide the information I am interested in.

Florian Fischer
Re: [PATCHSET v4 next 0/3] perf stat: add user_time and system_time tool events
Posted by Namhyung Kim 4 years, 1 month ago
On Tue, Apr 26, 2022 at 1:58 AM Florian Fischer
<florian.fischer@muhq.space> wrote:
>
> On 25.04.2022 12:06, Namhyung Kim wrote:
> > On Sat, Apr 23, 2022 at 5:16 AM Florian Fischer
> > <florian.fischer@muhq.space> wrote:
> > >
> > > On 22.04.2022 16:52, Namhyung Kim wrote:
> > > > Hello,
> > > >
> > > > On Fri, Apr 22, 2022 at 3:05 PM Arnaldo Carvalho de Melo
> > > > <acme@kernel.org> wrote:
> > > > >
> > > > > Em Wed, Apr 20, 2022 at 12:23:51PM +0200, Florian Fischer escreveu:
> > > > > > This patch series adds new internal events to perf stat exposing the times spend
> > > > > > in user and kernel mode in nanoseconds reported by rusage.
> > > > > >
> > > > > > During some benchmarking using perf it bothered me that I could not easily
> > > > > > retrieve those times from perf stat when using the machine readable output.
> > > > > >
> > > > > > But perf definitely knows about those values because in the human readable output
> > > > > > they are present.
> > > > > >
> > > > > > Therefore I exposed the times reported by rusage via the new tool events:
> > > > > > user_time and system_time.
> > > > > >
> > > > > > This allows to retrieved them in machine-readable output:
> > > > > >
> > > > > > $ ./perf stat -x, -e duration_time,user_time,system_time,cache-misses -- grep -q -r duration_time tools/perf
> > > > > > 72134524,ns,duration_time:u,72134524,100.00,,
> > > > > > 65225000,ns,user_time:u,65225000,100.00,,
> > > > > > 6865000,ns,ssystem_time:u,6865000,100.00,,
> >
> > > > Anyway it looks a little bit strange to me if we can get
> > > > system time in user mode only (the 'u' modifier).
> > >
> > > Sorry but I don't really understand what you mean.
> > > The system_time is reported to userspace via rusage filled by wait4(2).
> > > It will always report the value reported to the user space regardless of what
> > > counters perf has access to.
> > >
> > > If you run perf as user you get the same system_time (but with the ':u' suffix)
> > > as when you run perf as root or lower kernel.perf_event_paranoid to allow access
> > > to more counters.
> >
> > The ':u' modifier means that the event should count only in user mode.
> > So I think system_time:u should be 0 by definition.
> > Likewise, user_time:k should be handled as such.
>
> Not sure if this logic applies to tool events because the time spend in user mode
> or kernel mdoe stays the same regardless of where you count it.
>
> This is consistent with how duration_time works currently in perf stat.
>
> ./perf stat -e duration_time,duration_time:u,duration_time:k -- grep -q -r duration_time .
>
>  Performance counter stats for 'grep -q -r duration_time .':
>
>         29,012,558 ns   duration_time
>         29,012,558 ns   duration_time:u
>         29,012,558 ns   duration_time:k
>
>        0.029012558 seconds time elapsed
>
>        0.029088000 seconds user
>        0.000000000 seconds sys
>
> >
> > But as I said before, we already have the task-clock event, so it's not
> > clear why we need this too.  Also these tool events won't work for
> > other use cases like perf record.
>
> Maybe I do not know enough about perf but I think this is not exactly how those
> suffixes work. This is the output of perf of next-20220420 and my rusage tool
> event patches applied:
>
> $ sysctl kernel.perf_event_paranoid
> kernel.perf_event_paranoid = -1
>
> linux/tools/perf$ ./perf stat -e task-clock,task-clock:u,task-clock:k,\
>                                  duration_time,duration_time:u,duration_time:k,\
>                                  system_time,system_time:u,system_time:k,\
>                                  user_time,user_time:u,user_time:k -- grep -q -r duration_time .
>
>  Performance counter stats for 'grep -q -r duration_time .':
>
>              42.30 msec task-clock                #    0.483 CPUs utilized
>              42.30 msec task-clock:u              #    0.483 CPUs utilized
>              42.30 msec task-clock:k              #    0.483 CPUs utilized
>         87,533,340 ns   duration_time             #    2.069 G/sec
>         87,533,340 ns   duration_time:u           #    2.069 G/sec
>         87,533,340 ns   duration_time:k           #    2.069 G/sec
>         12,841,000 ns   system_time               #  303.547 M/sec
>         12,841,000 ns   system_time:u             #  303.547 M/sec
>         12,841,000 ns   system_time:k             #  303.547 M/sec
>         30,820,000 ns   user_time                 #  728.550 M/sec
>         30,820,000 ns   user_time:u               #  728.550 M/sec
>         30,820,000 ns   user_time:k               #  728.550 M/sec
>
>        0.087533340 seconds time elapsed
>
>        0.030820000 seconds user
>        0.012841000 seconds sys
>
>
> So task-clock regardless of its ':u' or ':k' suffix shows approximately the wall
> clock time.

Ok, now I recall that these software events don't work with modifiers well.
Maybe we should reject or ignore setting modifiers on them.  But that's
a different work, of course.

>
> But for my experiments on a shared system where kernel.perf_event_paranoid is
> not set to -1 and I feel uncomfortable to test research prototypes as root I
> would like a simple and fast way to tell where my program spends its time.
>
> This can be achieved by using user_time and system_time.
> Which are reported in the human readable output anyway.
>
> Maybe I am doing something wrong here but as shown above task-clock does not
> provide the information I am interested in.

No, it's just me making this noise.  Thank you for taking the time to take
a look at the issue.

Thanks,
Namhyung