tools/lib/api/fs/fs.c | 6 +- tools/perf/builtin-kmem.c | 12 +- tools/perf/builtin-kwork.c | 3 +- tools/perf/builtin-record.c | 2 - tools/perf/builtin-script.c | 9 +- tools/perf/builtin-trace.c | 79 +++++++++---- tools/perf/tests/Build | 6 +- tools/perf/tests/builtin-test.c | 2 - tools/perf/tests/parse-events.c | 25 +--- tools/perf/util/Build | 2 +- tools/perf/util/data-convert-bt.c | 10 +- tools/perf/util/data-convert-json.c | 8 +- tools/perf/util/evsel.c | 110 +++++++++++++----- tools/perf/util/evsel.h | 9 +- tools/perf/util/evsel_fprintf.c | 4 +- tools/perf/util/parse-events.c | 16 +-- tools/perf/util/perf_event_attr_fprintf.c | 4 - .../util/scripting-engines/trace-event-perl.c | 3 +- .../scripting-engines/trace-event-python.c | 3 +- tools/perf/util/sort.c | 33 ++++-- tools/perf/util/trace-event-parse.c | 2 +- tools/perf/util/trace-event-scripting.c | 10 +- tools/perf/util/trace-event.h | 2 +- 23 files changed, 214 insertions(+), 146 deletions(-)
The tracepoint format isn't needed to open an event, just the id for the config value. Refactor the use of evsel->tp_format to use an accessor that will lazily construct its value. In evsel__newtp_idx read the id so the config value can be set up/used. This allows tracepoints to be used without libtraceevent in a number of tests. Other functionality is enabled without libtracevent, such as mapping a tracepoint id back to its name. There may be some performance benefit to code using tracepoints but not using the format information. v4. Rebase due to conflict with 9ac98662dbd3 ("perf: event: Remove deadcode") v3. Whitespace changes, Arnaldo. v2. Add additional error checking/handling in evsel__tp_format. Ian Rogers (6): tool api fs: Correctly encode errno for read/write open failures perf trace-event: Constify print arguments perf trace-event: Always build trace-event-info.c perf evsel: Add/use accessor for tp_format perf evsel: Allow evsel__newtp without libtraceevent perf tests: Enable tests disabled due to tracepoint parsing tools/lib/api/fs/fs.c | 6 +- tools/perf/builtin-kmem.c | 12 +- tools/perf/builtin-kwork.c | 3 +- tools/perf/builtin-record.c | 2 - tools/perf/builtin-script.c | 9 +- tools/perf/builtin-trace.c | 79 +++++++++---- tools/perf/tests/Build | 6 +- tools/perf/tests/builtin-test.c | 2 - tools/perf/tests/parse-events.c | 25 +--- tools/perf/util/Build | 2 +- tools/perf/util/data-convert-bt.c | 10 +- tools/perf/util/data-convert-json.c | 8 +- tools/perf/util/evsel.c | 110 +++++++++++++----- tools/perf/util/evsel.h | 9 +- tools/perf/util/evsel_fprintf.c | 4 +- tools/perf/util/parse-events.c | 16 +-- tools/perf/util/perf_event_attr_fprintf.c | 4 - .../util/scripting-engines/trace-event-perl.c | 3 +- .../scripting-engines/trace-event-python.c | 3 +- tools/perf/util/sort.c | 33 ++++-- tools/perf/util/trace-event-parse.c | 2 +- tools/perf/util/trace-event-scripting.c | 10 +- tools/perf/util/trace-event.h | 2 +- 23 files changed, 214 insertions(+), 146 deletions(-) -- 2.47.0.277.g8800431eea-goog
On Fri, Nov 08, 2024 at 10:47:45AM -0800, Ian Rogers wrote: > The tracepoint format isn't needed to open an event, just the id for > the config value. Refactor the use of evsel->tp_format to use an > accessor that will lazily construct its value. In evsel__newtp_idx > read the id so the config value can be set up/used. > > This allows tracepoints to be used without libtraceevent in a number > of tests. Other functionality is enabled without libtracevent, such as > mapping a tracepoint id back to its name. There may be some > performance benefit to code using tracepoints but not using the format > information. > > v4. Rebase due to conflict with 9ac98662dbd3 ("perf: event: Remove deadcode") > v3. Whitespace changes, Arnaldo. > v2. Add additional error checking/handling in evsel__tp_format. > > Ian Rogers (6): > tool api fs: Correctly encode errno for read/write open failures > perf trace-event: Constify print arguments > perf trace-event: Always build trace-event-info.c > perf evsel: Add/use accessor for tp_format > perf evsel: Allow evsel__newtp without libtraceevent > perf tests: Enable tests disabled due to tracepoint parsing After applying this series, I'm seeing some test failures. But I don't understand why it affects non-tracepoint events though. $ sudo ./perf test -v pipe --- start --- test child forked, pid 3036123 1bde35-1bdecc l noploop perf does have symbol 'noploop' Record+report pipe test [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.210 MB - ] [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0.517 MB - ] [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0.516 MB - ] Record+report pipe test [Success] Inject -B build-ids test 0xa5c [0x17a4]: failed to process type: 80 Error: failed to process sample Inject build-ids test [Failed - cannot find noploop function in pipe #1] Inject -b build-ids test 0xa5c [0x17a4]: failed to process type: 80 Error: failed to process sample Inject build-ids test [Failed - cannot find noploop function in pipe #1] Inject --buildid-all build-ids test 0xa5c [0x17a4]: failed to process type: 80 Error: failed to process sample Inject build-ids test [Failed - cannot find noploop function in pipe #1] Inject --mmap2-buildid-all build-ids test 0xa5c [0x17a4]: failed to process type: 80 Error: failed to process sample Inject build-ids test [Failed - cannot find noploop function in pipe #1] ---- end(-1) ---- 84: perf pipe recording and injection test : FAILED! $ sudo ./perf test -v Zstd --- start --- test child forked, pid 3036097 Collecting compressed record file: 500+0 records in 500+0 records out 256000 bytes (256 kB, 250 KiB) copied, 0.00169127 s, 151 MB/s [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.032 MB /tmp/perf.data.KBo, compressed (original 0.004 MB, ratio is 3.324) ] Checking compressed events stats: Couldn't decompress data 0x7ca8 [0x4f2]: failed to process type: 81 [Operation not permitted] Error: failed to process sample ---- end(-1) ---- 86: Zstd perf.data compression/decompression : FAILED! Thanks, Namhyung > > tools/lib/api/fs/fs.c | 6 +- > tools/perf/builtin-kmem.c | 12 +- > tools/perf/builtin-kwork.c | 3 +- > tools/perf/builtin-record.c | 2 - > tools/perf/builtin-script.c | 9 +- > tools/perf/builtin-trace.c | 79 +++++++++---- > tools/perf/tests/Build | 6 +- > tools/perf/tests/builtin-test.c | 2 - > tools/perf/tests/parse-events.c | 25 +--- > tools/perf/util/Build | 2 +- > tools/perf/util/data-convert-bt.c | 10 +- > tools/perf/util/data-convert-json.c | 8 +- > tools/perf/util/evsel.c | 110 +++++++++++++----- > tools/perf/util/evsel.h | 9 +- > tools/perf/util/evsel_fprintf.c | 4 +- > tools/perf/util/parse-events.c | 16 +-- > tools/perf/util/perf_event_attr_fprintf.c | 4 - > .../util/scripting-engines/trace-event-perl.c | 3 +- > .../scripting-engines/trace-event-python.c | 3 +- > tools/perf/util/sort.c | 33 ++++-- > tools/perf/util/trace-event-parse.c | 2 +- > tools/perf/util/trace-event-scripting.c | 10 +- > tools/perf/util/trace-event.h | 2 +- > 23 files changed, 214 insertions(+), 146 deletions(-) > > -- > 2.47.0.277.g8800431eea-goog >
On Fri, Nov 8, 2024 at 10:45 PM Namhyung Kim <namhyung@kernel.org> wrote: > On Fri, Nov 08, 2024 at 10:47:45AM -0800, Ian Rogers wrote: > > Ian Rogers (6): > > tool api fs: Correctly encode errno for read/write open failures > > perf trace-event: Constify print arguments > > perf trace-event: Always build trace-event-info.c > > perf evsel: Add/use accessor for tp_format > > perf evsel: Allow evsel__newtp without libtraceevent > > perf tests: Enable tests disabled due to tracepoint parsing > > After applying this series, I'm seeing some test failures. But I don't > understand why it affects non-tracepoint events though. > > $ sudo ./perf test -v pipe > --- start --- > test child forked, pid 3036123 > 1bde35-1bdecc l noploop > perf does have symbol 'noploop' > > Record+report pipe test > [ perf record: Woken up 1 times to write data ] > [ perf record: Captured and wrote 0.210 MB - ] > [ perf record: Woken up 2 times to write data ] > [ perf record: Captured and wrote 0.517 MB - ] > [ perf record: Woken up 2 times to write data ] > [ perf record: Captured and wrote 0.516 MB - ] > Record+report pipe test [Success] > > Inject -B build-ids test > 0xa5c [0x17a4]: failed to process type: 80 > Error: > failed to process sample > Inject build-ids test [Failed - cannot find noploop function in pipe #1] > > Inject -b build-ids test > 0xa5c [0x17a4]: failed to process type: 80 > Error: > failed to process sample > Inject build-ids test [Failed - cannot find noploop function in pipe #1] > > Inject --buildid-all build-ids test > 0xa5c [0x17a4]: failed to process type: 80 > Error: > failed to process sample > Inject build-ids test [Failed - cannot find noploop function in pipe #1] > > Inject --mmap2-buildid-all build-ids test > 0xa5c [0x17a4]: failed to process type: 80 > Error: > failed to process sample > Inject build-ids test [Failed - cannot find noploop function in pipe #1] > ---- end(-1) ---- > 84: perf pipe recording and injection test : FAILED! > > $ sudo ./perf test -v Zstd > --- start --- > test child forked, pid 3036097 > Collecting compressed record file: > 500+0 records in > 500+0 records out > 256000 bytes (256 kB, 250 KiB) copied, 0.00169127 s, 151 MB/s > [ perf record: Woken up 1 times to write data ] > [ perf record: Captured and wrote 0.032 MB /tmp/perf.data.KBo, compressed (original 0.004 MB, ratio is 3.324) ] > Checking compressed events stats: > Couldn't decompress data > 0x7ca8 [0x4f2]: failed to process type: 81 [Operation not permitted] > Error: > failed to process sample > ---- end(-1) ---- > 86: Zstd perf.data compression/decompression : FAILED! > > Thanks, > Namhyung I'm not able to reproduce: ``` $ git log --oneline |head a59bca6eb0a6 perf test: Add a runs-per-test flag 0d0c002eb45c perf tests: Enable tests disabled due to tracepoint parsing 4b8f5c9dfbda perf evsel: Allow evsel__newtp without libtraceevent 7f57057c7884 perf evsel: Add/use accessor for tp_format c27d357d2d4c perf trace-event: Always build trace-event-info.c 20bf7a2cd68a perf trace-event: Constify print arguments f18b07ee2af1 tool api fs: Correctly encode errno for read/write open failures ... $ sudo /tmp/perf/perf test -r 10 Zstd pipe -v 84: perf pipe recording and injection test : Ok 84: perf pipe recording and injection test : Ok 84: perf pipe recording and injection test : Ok 84: perf pipe recording and injection test : Ok 84: perf pipe recording and injection test : Ok 84: perf pipe recording and injection test : Ok 84: perf pipe recording and injection test : Ok 84: perf pipe recording and injection test : Ok 84: perf pipe recording and injection test : Ok 84: perf pipe recording and injection test : Ok 86: Zstd perf.data compression/decompression : Ok 86: Zstd perf.data compression/decompression : Ok 86: Zstd perf.data compression/decompression : Ok 86: Zstd perf.data compression/decompression : Ok 86: Zstd perf.data compression/decompression : Ok 86: Zstd perf.data compression/decompression : Ok 86: Zstd perf.data compression/decompression : Ok 86: Zstd perf.data compression/decompression : Ok 86: Zstd perf.data compression/decompression : Ok 86: Zstd perf.data compression/decompression : Ok ``` Similarly not as root or if runs-per-test is 100. Agreed that the changes are for tracepoints and these tests aren't for tracepoints, so an interaction wouldn't be expected. If you have a reliable reproduction perhaps you can bisect it. Thanks, Ian
On Sat, Nov 09, 2024 at 08:26:20AM -0800, Ian Rogers wrote: > On Fri, Nov 8, 2024 at 10:45 PM Namhyung Kim <namhyung@kernel.org> wrote: > > On Fri, Nov 08, 2024 at 10:47:45AM -0800, Ian Rogers wrote: > > > Ian Rogers (6): > > > tool api fs: Correctly encode errno for read/write open failures > > > perf trace-event: Constify print arguments > > > perf trace-event: Always build trace-event-info.c > > > perf evsel: Add/use accessor for tp_format > > > perf evsel: Allow evsel__newtp without libtraceevent > > > perf tests: Enable tests disabled due to tracepoint parsing > > > > After applying this series, I'm seeing some test failures. But I don't > > understand why it affects non-tracepoint events though. > > > > $ sudo ./perf test -v pipe > > --- start --- > > test child forked, pid 3036123 > > 1bde35-1bdecc l noploop > > perf does have symbol 'noploop' > > > > Record+report pipe test > > [ perf record: Woken up 1 times to write data ] > > [ perf record: Captured and wrote 0.210 MB - ] > > [ perf record: Woken up 2 times to write data ] > > [ perf record: Captured and wrote 0.517 MB - ] > > [ perf record: Woken up 2 times to write data ] > > [ perf record: Captured and wrote 0.516 MB - ] > > Record+report pipe test [Success] > > > > Inject -B build-ids test > > 0xa5c [0x17a4]: failed to process type: 80 > > Error: > > failed to process sample > > Inject build-ids test [Failed - cannot find noploop function in pipe #1] > > > > Inject -b build-ids test > > 0xa5c [0x17a4]: failed to process type: 80 > > Error: > > failed to process sample > > Inject build-ids test [Failed - cannot find noploop function in pipe #1] > > > > Inject --buildid-all build-ids test > > 0xa5c [0x17a4]: failed to process type: 80 > > Error: > > failed to process sample > > Inject build-ids test [Failed - cannot find noploop function in pipe #1] > > > > Inject --mmap2-buildid-all build-ids test > > 0xa5c [0x17a4]: failed to process type: 80 > > Error: > > failed to process sample > > Inject build-ids test [Failed - cannot find noploop function in pipe #1] > > ---- end(-1) ---- > > 84: perf pipe recording and injection test : FAILED! > > > > $ sudo ./perf test -v Zstd > > --- start --- > > test child forked, pid 3036097 > > Collecting compressed record file: > > 500+0 records in > > 500+0 records out > > 256000 bytes (256 kB, 250 KiB) copied, 0.00169127 s, 151 MB/s > > [ perf record: Woken up 1 times to write data ] > > [ perf record: Captured and wrote 0.032 MB /tmp/perf.data.KBo, compressed (original 0.004 MB, ratio is 3.324) ] > > Checking compressed events stats: > > Couldn't decompress data > > 0x7ca8 [0x4f2]: failed to process type: 81 [Operation not permitted] > > Error: > > failed to process sample > > ---- end(-1) ---- > > 86: Zstd perf.data compression/decompression : FAILED! > > > > Thanks, > > Namhyung > > I'm not able to reproduce: > ``` > $ git log --oneline |head > a59bca6eb0a6 perf test: Add a runs-per-test flag > 0d0c002eb45c perf tests: Enable tests disabled due to tracepoint parsing > 4b8f5c9dfbda perf evsel: Allow evsel__newtp without libtraceevent > 7f57057c7884 perf evsel: Add/use accessor for tp_format > c27d357d2d4c perf trace-event: Always build trace-event-info.c > 20bf7a2cd68a perf trace-event: Constify print arguments > f18b07ee2af1 tool api fs: Correctly encode errno for read/write open failures > ... > $ sudo /tmp/perf/perf test -r 10 Zstd pipe -v > 84: perf pipe recording and injection test : Ok > 84: perf pipe recording and injection test : Ok > 84: perf pipe recording and injection test : Ok > 84: perf pipe recording and injection test : Ok > 84: perf pipe recording and injection test : Ok > 84: perf pipe recording and injection test : Ok > 84: perf pipe recording and injection test : Ok > 84: perf pipe recording and injection test : Ok > 84: perf pipe recording and injection test : Ok > 84: perf pipe recording and injection test : Ok > 86: Zstd perf.data compression/decompression : Ok > 86: Zstd perf.data compression/decompression : Ok > 86: Zstd perf.data compression/decompression : Ok > 86: Zstd perf.data compression/decompression : Ok > 86: Zstd perf.data compression/decompression : Ok > 86: Zstd perf.data compression/decompression : Ok > 86: Zstd perf.data compression/decompression : Ok > 86: Zstd perf.data compression/decompression : Ok > 86: Zstd perf.data compression/decompression : Ok > 86: Zstd perf.data compression/decompression : Ok > ``` > Similarly not as root or if runs-per-test is 100. > > Agreed that the changes are for tracepoints and these tests aren't for > tracepoints, so an interaction wouldn't be expected. If you have a > reliable reproduction perhaps you can bisect it. it says: 9c10de391840a35ab095b65e9a5c4fad0ac52068 is the first bad commit commit 9c10de391840a35ab095b65e9a5c4fad0ac52068 (HEAD) Author: Ian Rogers <irogers@google.com> Date: Fri Nov 8 10:47:46 2024 -0800 tool api fs: Correctly encode errno for read/write open failures Switch from returning -1 to -errno so that callers can determine types of failure. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> tools/lib/api/fs/fs.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) Thanks, Namhyung
On Sat, Nov 9, 2024 at 9:04 AM Namhyung Kim <namhyung@kernel.org> wrote: > > On Sat, Nov 09, 2024 at 08:26:20AM -0800, Ian Rogers wrote: > > On Fri, Nov 8, 2024 at 10:45 PM Namhyung Kim <namhyung@kernel.org> wrote: > > > On Fri, Nov 08, 2024 at 10:47:45AM -0800, Ian Rogers wrote: > > > > Ian Rogers (6): > > > > tool api fs: Correctly encode errno for read/write open failures > > > > perf trace-event: Constify print arguments > > > > perf trace-event: Always build trace-event-info.c > > > > perf evsel: Add/use accessor for tp_format > > > > perf evsel: Allow evsel__newtp without libtraceevent > > > > perf tests: Enable tests disabled due to tracepoint parsing > > > > > > After applying this series, I'm seeing some test failures. But I don't > > > understand why it affects non-tracepoint events though. > > > > > > $ sudo ./perf test -v pipe > > > --- start --- > > > test child forked, pid 3036123 > > > 1bde35-1bdecc l noploop > > > perf does have symbol 'noploop' > > > > > > Record+report pipe test > > > [ perf record: Woken up 1 times to write data ] > > > [ perf record: Captured and wrote 0.210 MB - ] > > > [ perf record: Woken up 2 times to write data ] > > > [ perf record: Captured and wrote 0.517 MB - ] > > > [ perf record: Woken up 2 times to write data ] > > > [ perf record: Captured and wrote 0.516 MB - ] > > > Record+report pipe test [Success] > > > > > > Inject -B build-ids test > > > 0xa5c [0x17a4]: failed to process type: 80 > > > Error: > > > failed to process sample > > > Inject build-ids test [Failed - cannot find noploop function in pipe #1] > > > > > > Inject -b build-ids test > > > 0xa5c [0x17a4]: failed to process type: 80 > > > Error: > > > failed to process sample > > > Inject build-ids test [Failed - cannot find noploop function in pipe #1] > > > > > > Inject --buildid-all build-ids test > > > 0xa5c [0x17a4]: failed to process type: 80 > > > Error: > > > failed to process sample > > > Inject build-ids test [Failed - cannot find noploop function in pipe #1] > > > > > > Inject --mmap2-buildid-all build-ids test > > > 0xa5c [0x17a4]: failed to process type: 80 > > > Error: > > > failed to process sample > > > Inject build-ids test [Failed - cannot find noploop function in pipe #1] > > > ---- end(-1) ---- > > > 84: perf pipe recording and injection test : FAILED! > > > > > > $ sudo ./perf test -v Zstd > > > --- start --- > > > test child forked, pid 3036097 > > > Collecting compressed record file: > > > 500+0 records in > > > 500+0 records out > > > 256000 bytes (256 kB, 250 KiB) copied, 0.00169127 s, 151 MB/s > > > [ perf record: Woken up 1 times to write data ] > > > [ perf record: Captured and wrote 0.032 MB /tmp/perf.data.KBo, compressed (original 0.004 MB, ratio is 3.324) ] > > > Checking compressed events stats: > > > Couldn't decompress data > > > 0x7ca8 [0x4f2]: failed to process type: 81 [Operation not permitted] > > > Error: > > > failed to process sample > > > ---- end(-1) ---- > > > 86: Zstd perf.data compression/decompression : FAILED! > > > > > > Thanks, > > > Namhyung > > > > I'm not able to reproduce: > > ``` > > $ git log --oneline |head > > a59bca6eb0a6 perf test: Add a runs-per-test flag > > 0d0c002eb45c perf tests: Enable tests disabled due to tracepoint parsing > > 4b8f5c9dfbda perf evsel: Allow evsel__newtp without libtraceevent > > 7f57057c7884 perf evsel: Add/use accessor for tp_format > > c27d357d2d4c perf trace-event: Always build trace-event-info.c > > 20bf7a2cd68a perf trace-event: Constify print arguments > > f18b07ee2af1 tool api fs: Correctly encode errno for read/write open failures > > ... > > $ sudo /tmp/perf/perf test -r 10 Zstd pipe -v > > 84: perf pipe recording and injection test : Ok > > 84: perf pipe recording and injection test : Ok > > 84: perf pipe recording and injection test : Ok > > 84: perf pipe recording and injection test : Ok > > 84: perf pipe recording and injection test : Ok > > 84: perf pipe recording and injection test : Ok > > 84: perf pipe recording and injection test : Ok > > 84: perf pipe recording and injection test : Ok > > 84: perf pipe recording and injection test : Ok > > 84: perf pipe recording and injection test : Ok > > 86: Zstd perf.data compression/decompression : Ok > > 86: Zstd perf.data compression/decompression : Ok > > 86: Zstd perf.data compression/decompression : Ok > > 86: Zstd perf.data compression/decompression : Ok > > 86: Zstd perf.data compression/decompression : Ok > > 86: Zstd perf.data compression/decompression : Ok > > 86: Zstd perf.data compression/decompression : Ok > > 86: Zstd perf.data compression/decompression : Ok > > 86: Zstd perf.data compression/decompression : Ok > > 86: Zstd perf.data compression/decompression : Ok > > ``` > > Similarly not as root or if runs-per-test is 100. > > > > Agreed that the changes are for tracepoints and these tests aren't for > > tracepoints, so an interaction wouldn't be expected. If you have a > > reliable reproduction perhaps you can bisect it. > > it says: > > 9c10de391840a35ab095b65e9a5c4fad0ac52068 is the first bad commit > commit 9c10de391840a35ab095b65e9a5c4fad0ac52068 (HEAD) > Author: Ian Rogers <irogers@google.com> > Date: Fri Nov 8 10:47:46 2024 -0800 > > tool api fs: Correctly encode errno for read/write open failures > > Switch from returning -1 to -errno so that callers can determine types > of failure. > > Signed-off-by: Ian Rogers <irogers@google.com> > Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> > > tools/lib/api/fs/fs.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) So I tried to eye-ball/grep all callers to spot assumptions on the return value like: ``` err = ...__read_int if (err == -1) ``` Didn't spot anything. It seems in the test log the record is failing so I ran this under gdb, set breakpoints on the 3 modified functions and then looked up the call stack to spot bad return value assumptions. Everything looks good. I then tried inject and report, the only file read by these functions is /proc/sys/kernel/perf_event_paranoid as part of symbol initialization (nit, this should probably be read lazily and the restriction should really come from the perf.data file, not the running system) and those calls look good. The change is small and not critical for the series. It improves the error message when reading the tracepoint id fails. So we could move forward with the rest of the series, but that could be annoying for tracepoint users. If I had a reproducer I'd revert the 1 line change on each function to find out which is causing the regression. Once you have that then you can binary search to find the bad call by using some global counter where the first 'n' calls use the new return value and the later use the old value. You can then vary 'n' to binary search and find the bad caller. Is there any chance you can help diagnose this or help me to find the reproducer? Thanks, Ian
On Wed, Nov 13, 2024 at 10:06:13AM -0800, Ian Rogers wrote: > On Sat, Nov 9, 2024 at 9:04 AM Namhyung Kim <namhyung@kernel.org> wrote: > > > > On Sat, Nov 09, 2024 at 08:26:20AM -0800, Ian Rogers wrote: > > > On Fri, Nov 8, 2024 at 10:45 PM Namhyung Kim <namhyung@kernel.org> wrote: > > > > On Fri, Nov 08, 2024 at 10:47:45AM -0800, Ian Rogers wrote: > > > > > Ian Rogers (6): > > > > > tool api fs: Correctly encode errno for read/write open failures > > > > > perf trace-event: Constify print arguments > > > > > perf trace-event: Always build trace-event-info.c > > > > > perf evsel: Add/use accessor for tp_format > > > > > perf evsel: Allow evsel__newtp without libtraceevent > > > > > perf tests: Enable tests disabled due to tracepoint parsing > > > > > > > > After applying this series, I'm seeing some test failures. But I don't > > > > understand why it affects non-tracepoint events though. > > > > > > > > $ sudo ./perf test -v pipe > > > > --- start --- > > > > test child forked, pid 3036123 > > > > 1bde35-1bdecc l noploop > > > > perf does have symbol 'noploop' > > > > > > > > Record+report pipe test > > > > [ perf record: Woken up 1 times to write data ] > > > > [ perf record: Captured and wrote 0.210 MB - ] > > > > [ perf record: Woken up 2 times to write data ] > > > > [ perf record: Captured and wrote 0.517 MB - ] > > > > [ perf record: Woken up 2 times to write data ] > > > > [ perf record: Captured and wrote 0.516 MB - ] > > > > Record+report pipe test [Success] > > > > > > > > Inject -B build-ids test > > > > 0xa5c [0x17a4]: failed to process type: 80 > > > > Error: > > > > failed to process sample > > > > Inject build-ids test [Failed - cannot find noploop function in pipe #1] > > > > > > > > Inject -b build-ids test > > > > 0xa5c [0x17a4]: failed to process type: 80 > > > > Error: > > > > failed to process sample > > > > Inject build-ids test [Failed - cannot find noploop function in pipe #1] > > > > > > > > Inject --buildid-all build-ids test > > > > 0xa5c [0x17a4]: failed to process type: 80 > > > > Error: > > > > failed to process sample > > > > Inject build-ids test [Failed - cannot find noploop function in pipe #1] > > > > > > > > Inject --mmap2-buildid-all build-ids test > > > > 0xa5c [0x17a4]: failed to process type: 80 > > > > Error: > > > > failed to process sample > > > > Inject build-ids test [Failed - cannot find noploop function in pipe #1] > > > > ---- end(-1) ---- > > > > 84: perf pipe recording and injection test : FAILED! > > > > > > > > $ sudo ./perf test -v Zstd > > > > --- start --- > > > > test child forked, pid 3036097 > > > > Collecting compressed record file: > > > > 500+0 records in > > > > 500+0 records out > > > > 256000 bytes (256 kB, 250 KiB) copied, 0.00169127 s, 151 MB/s > > > > [ perf record: Woken up 1 times to write data ] > > > > [ perf record: Captured and wrote 0.032 MB /tmp/perf.data.KBo, compressed (original 0.004 MB, ratio is 3.324) ] > > > > Checking compressed events stats: > > > > Couldn't decompress data > > > > 0x7ca8 [0x4f2]: failed to process type: 81 [Operation not permitted] > > > > Error: > > > > failed to process sample > > > > ---- end(-1) ---- > > > > 86: Zstd perf.data compression/decompression : FAILED! > > > > > > > > Thanks, > > > > Namhyung > > > > > > I'm not able to reproduce: > > > ``` > > > $ git log --oneline |head > > > a59bca6eb0a6 perf test: Add a runs-per-test flag > > > 0d0c002eb45c perf tests: Enable tests disabled due to tracepoint parsing > > > 4b8f5c9dfbda perf evsel: Allow evsel__newtp without libtraceevent > > > 7f57057c7884 perf evsel: Add/use accessor for tp_format > > > c27d357d2d4c perf trace-event: Always build trace-event-info.c > > > 20bf7a2cd68a perf trace-event: Constify print arguments > > > f18b07ee2af1 tool api fs: Correctly encode errno for read/write open failures > > > ... > > > $ sudo /tmp/perf/perf test -r 10 Zstd pipe -v > > > 84: perf pipe recording and injection test : Ok > > > 84: perf pipe recording and injection test : Ok > > > 84: perf pipe recording and injection test : Ok > > > 84: perf pipe recording and injection test : Ok > > > 84: perf pipe recording and injection test : Ok > > > 84: perf pipe recording and injection test : Ok > > > 84: perf pipe recording and injection test : Ok > > > 84: perf pipe recording and injection test : Ok > > > 84: perf pipe recording and injection test : Ok > > > 84: perf pipe recording and injection test : Ok > > > 86: Zstd perf.data compression/decompression : Ok > > > 86: Zstd perf.data compression/decompression : Ok > > > 86: Zstd perf.data compression/decompression : Ok > > > 86: Zstd perf.data compression/decompression : Ok > > > 86: Zstd perf.data compression/decompression : Ok > > > 86: Zstd perf.data compression/decompression : Ok > > > 86: Zstd perf.data compression/decompression : Ok > > > 86: Zstd perf.data compression/decompression : Ok > > > 86: Zstd perf.data compression/decompression : Ok > > > 86: Zstd perf.data compression/decompression : Ok > > > ``` > > > Similarly not as root or if runs-per-test is 100. > > > > > > Agreed that the changes are for tracepoints and these tests aren't for > > > tracepoints, so an interaction wouldn't be expected. If you have a > > > reliable reproduction perhaps you can bisect it. > > > > it says: > > > > 9c10de391840a35ab095b65e9a5c4fad0ac52068 is the first bad commit > > commit 9c10de391840a35ab095b65e9a5c4fad0ac52068 (HEAD) > > Author: Ian Rogers <irogers@google.com> > > Date: Fri Nov 8 10:47:46 2024 -0800 > > > > tool api fs: Correctly encode errno for read/write open failures > > > > Switch from returning -1 to -errno so that callers can determine types > > of failure. > > > > Signed-off-by: Ian Rogers <irogers@google.com> > > Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> > > > > tools/lib/api/fs/fs.c | 6 +++--- > > 1 file changed, 3 insertions(+), 3 deletions(-) > > So I tried to eye-ball/grep all callers to spot assumptions on the > return value like: > ``` > err = ...__read_int > if (err == -1) > ``` > Didn't spot anything. > > It seems in the test log the record is failing so I ran this under > gdb, set breakpoints on the 3 modified functions and then looked up > the call stack to spot bad return value assumptions. Everything looks > good. > I then tried inject and report, the only file read by these functions > is /proc/sys/kernel/perf_event_paranoid as part of symbol > initialization (nit, this should probably be read lazily and the > restriction should really come from the perf.data file, not the > running system) and those calls look good. > > The change is small and not critical for the series. It improves the > error message when reading the tracepoint id fails. So we could move > forward with the rest of the series, but that could be annoying for > tracepoint users. > > If I had a reproducer I'd revert the 1 line change on each function to > find out which is causing the regression. Once you have that then you > can binary search to find the bad call by using some global counter > where the first 'n' calls use the new return value and the later use > the old value. You can then vary 'n' to binary search and find the bad > caller. > > Is there any chance you can help diagnose this or help me to find the > reproducer? I think this depends on the system configuration. I've debugged it failed when it gets cpu topology: ... read int failed: /sys/devices/system/cpu/cpu112/topology/core_id (errno=2) read int failed: /sys/devices/system/cpu/cpu112/topology/physical_package_id (errno=2) read int failed: /sys/devices/system/cpu/cpu112/topology/die_id (errno=2) ... Maybe it's because # online CPUs != # possible CPUs. $ cat /sys/devices/system/cpu/online 0-63 $ cat /sys/devices/system/cpu/possible 0-127 There's a code like cpu__get_socket_id() to use the return value of sysfs__read_int() directly. And it saves the value to aggr_cpu_id which requires exact match like in aggr_cpu_id__equal(). Thanks, Namhyung
On Mon, Nov 18, 2024 at 10:31 AM Namhyung Kim <namhyung@kernel.org> wrote: > > On Wed, Nov 13, 2024 at 10:06:13AM -0800, Ian Rogers wrote: > > On Sat, Nov 9, 2024 at 9:04 AM Namhyung Kim <namhyung@kernel.org> wrote: > > > > > > On Sat, Nov 09, 2024 at 08:26:20AM -0800, Ian Rogers wrote: > > > > On Fri, Nov 8, 2024 at 10:45 PM Namhyung Kim <namhyung@kernel.org> wrote: > > > > > On Fri, Nov 08, 2024 at 10:47:45AM -0800, Ian Rogers wrote: > > > > > > Ian Rogers (6): > > > > > > tool api fs: Correctly encode errno for read/write open failures > > > > > > perf trace-event: Constify print arguments > > > > > > perf trace-event: Always build trace-event-info.c > > > > > > perf evsel: Add/use accessor for tp_format > > > > > > perf evsel: Allow evsel__newtp without libtraceevent > > > > > > perf tests: Enable tests disabled due to tracepoint parsing > > > > > > > > > > After applying this series, I'm seeing some test failures. But I don't > > > > > understand why it affects non-tracepoint events though. > > > > > > > > > > $ sudo ./perf test -v pipe > > > > > --- start --- > > > > > test child forked, pid 3036123 > > > > > 1bde35-1bdecc l noploop > > > > > perf does have symbol 'noploop' > > > > > > > > > > Record+report pipe test > > > > > [ perf record: Woken up 1 times to write data ] > > > > > [ perf record: Captured and wrote 0.210 MB - ] > > > > > [ perf record: Woken up 2 times to write data ] > > > > > [ perf record: Captured and wrote 0.517 MB - ] > > > > > [ perf record: Woken up 2 times to write data ] > > > > > [ perf record: Captured and wrote 0.516 MB - ] > > > > > Record+report pipe test [Success] > > > > > > > > > > Inject -B build-ids test > > > > > 0xa5c [0x17a4]: failed to process type: 80 > > > > > Error: > > > > > failed to process sample > > > > > Inject build-ids test [Failed - cannot find noploop function in pipe #1] > > > > > > > > > > Inject -b build-ids test > > > > > 0xa5c [0x17a4]: failed to process type: 80 > > > > > Error: > > > > > failed to process sample > > > > > Inject build-ids test [Failed - cannot find noploop function in pipe #1] > > > > > > > > > > Inject --buildid-all build-ids test > > > > > 0xa5c [0x17a4]: failed to process type: 80 > > > > > Error: > > > > > failed to process sample > > > > > Inject build-ids test [Failed - cannot find noploop function in pipe #1] > > > > > > > > > > Inject --mmap2-buildid-all build-ids test > > > > > 0xa5c [0x17a4]: failed to process type: 80 > > > > > Error: > > > > > failed to process sample > > > > > Inject build-ids test [Failed - cannot find noploop function in pipe #1] > > > > > ---- end(-1) ---- > > > > > 84: perf pipe recording and injection test : FAILED! > > > > > > > > > > $ sudo ./perf test -v Zstd > > > > > --- start --- > > > > > test child forked, pid 3036097 > > > > > Collecting compressed record file: > > > > > 500+0 records in > > > > > 500+0 records out > > > > > 256000 bytes (256 kB, 250 KiB) copied, 0.00169127 s, 151 MB/s > > > > > [ perf record: Woken up 1 times to write data ] > > > > > [ perf record: Captured and wrote 0.032 MB /tmp/perf.data.KBo, compressed (original 0.004 MB, ratio is 3.324) ] > > > > > Checking compressed events stats: > > > > > Couldn't decompress data > > > > > 0x7ca8 [0x4f2]: failed to process type: 81 [Operation not permitted] > > > > > Error: > > > > > failed to process sample > > > > > ---- end(-1) ---- > > > > > 86: Zstd perf.data compression/decompression : FAILED! > > > > > > > > > > Thanks, > > > > > Namhyung > > > > > > > > I'm not able to reproduce: > > > > ``` > > > > $ git log --oneline |head > > > > a59bca6eb0a6 perf test: Add a runs-per-test flag > > > > 0d0c002eb45c perf tests: Enable tests disabled due to tracepoint parsing > > > > 4b8f5c9dfbda perf evsel: Allow evsel__newtp without libtraceevent > > > > 7f57057c7884 perf evsel: Add/use accessor for tp_format > > > > c27d357d2d4c perf trace-event: Always build trace-event-info.c > > > > 20bf7a2cd68a perf trace-event: Constify print arguments > > > > f18b07ee2af1 tool api fs: Correctly encode errno for read/write open failures > > > > ... > > > > $ sudo /tmp/perf/perf test -r 10 Zstd pipe -v > > > > 84: perf pipe recording and injection test : Ok > > > > 84: perf pipe recording and injection test : Ok > > > > 84: perf pipe recording and injection test : Ok > > > > 84: perf pipe recording and injection test : Ok > > > > 84: perf pipe recording and injection test : Ok > > > > 84: perf pipe recording and injection test : Ok > > > > 84: perf pipe recording and injection test : Ok > > > > 84: perf pipe recording and injection test : Ok > > > > 84: perf pipe recording and injection test : Ok > > > > 84: perf pipe recording and injection test : Ok > > > > 86: Zstd perf.data compression/decompression : Ok > > > > 86: Zstd perf.data compression/decompression : Ok > > > > 86: Zstd perf.data compression/decompression : Ok > > > > 86: Zstd perf.data compression/decompression : Ok > > > > 86: Zstd perf.data compression/decompression : Ok > > > > 86: Zstd perf.data compression/decompression : Ok > > > > 86: Zstd perf.data compression/decompression : Ok > > > > 86: Zstd perf.data compression/decompression : Ok > > > > 86: Zstd perf.data compression/decompression : Ok > > > > 86: Zstd perf.data compression/decompression : Ok > > > > ``` > > > > Similarly not as root or if runs-per-test is 100. > > > > > > > > Agreed that the changes are for tracepoints and these tests aren't for > > > > tracepoints, so an interaction wouldn't be expected. If you have a > > > > reliable reproduction perhaps you can bisect it. > > > > > > it says: > > > > > > 9c10de391840a35ab095b65e9a5c4fad0ac52068 is the first bad commit > > > commit 9c10de391840a35ab095b65e9a5c4fad0ac52068 (HEAD) > > > Author: Ian Rogers <irogers@google.com> > > > Date: Fri Nov 8 10:47:46 2024 -0800 > > > > > > tool api fs: Correctly encode errno for read/write open failures > > > > > > Switch from returning -1 to -errno so that callers can determine types > > > of failure. > > > > > > Signed-off-by: Ian Rogers <irogers@google.com> > > > Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> > > > > > > tools/lib/api/fs/fs.c | 6 +++--- > > > 1 file changed, 3 insertions(+), 3 deletions(-) > > > > So I tried to eye-ball/grep all callers to spot assumptions on the > > return value like: > > ``` > > err = ...__read_int > > if (err == -1) > > ``` > > Didn't spot anything. > > > > It seems in the test log the record is failing so I ran this under > > gdb, set breakpoints on the 3 modified functions and then looked up > > the call stack to spot bad return value assumptions. Everything looks > > good. > > I then tried inject and report, the only file read by these functions > > is /proc/sys/kernel/perf_event_paranoid as part of symbol > > initialization (nit, this should probably be read lazily and the > > restriction should really come from the perf.data file, not the > > running system) and those calls look good. > > > > The change is small and not critical for the series. It improves the > > error message when reading the tracepoint id fails. So we could move > > forward with the rest of the series, but that could be annoying for > > tracepoint users. > > > > If I had a reproducer I'd revert the 1 line change on each function to > > find out which is causing the regression. Once you have that then you > > can binary search to find the bad call by using some global counter > > where the first 'n' calls use the new return value and the later use > > the old value. You can then vary 'n' to binary search and find the bad > > caller. > > > > Is there any chance you can help diagnose this or help me to find the > > reproducer? > > I think this depends on the system configuration. I've debugged it > failed when it gets cpu topology: > > ... > read int failed: /sys/devices/system/cpu/cpu112/topology/core_id (errno=2) > read int failed: /sys/devices/system/cpu/cpu112/topology/physical_package_id (errno=2) > read int failed: /sys/devices/system/cpu/cpu112/topology/die_id (errno=2) > ... > > Maybe it's because # online CPUs != # possible CPUs. > > $ cat /sys/devices/system/cpu/online > 0-63 > $ cat /sys/devices/system/cpu/possible > 0-127 > > There's a code like cpu__get_socket_id() to use the return value of > sysfs__read_int() directly. And it saves the value to aggr_cpu_id which > requires exact match like in aggr_cpu_id__equal(). So this is a latent bug. Are you working on the fix or asking me to do it? I'm not sure why we should fail to describe the topology for offline cores, but if this is a kernel restriction we should probably purge those logical CPUs from the topology. Thanks, Ian
On Mon, Nov 18, 2024 at 11:35:19AM -0800, Ian Rogers wrote: > On Mon, Nov 18, 2024 at 10:31 AM Namhyung Kim <namhyung@kernel.org> wrote: > > > > On Wed, Nov 13, 2024 at 10:06:13AM -0800, Ian Rogers wrote: > > > On Sat, Nov 9, 2024 at 9:04 AM Namhyung Kim <namhyung@kernel.org> wrote: > > > > > > > > On Sat, Nov 09, 2024 at 08:26:20AM -0800, Ian Rogers wrote: > > > > > On Fri, Nov 8, 2024 at 10:45 PM Namhyung Kim <namhyung@kernel.org> wrote: > > > > > > On Fri, Nov 08, 2024 at 10:47:45AM -0800, Ian Rogers wrote: > > > > > > > Ian Rogers (6): > > > > > > > tool api fs: Correctly encode errno for read/write open failures > > > > > > > perf trace-event: Constify print arguments > > > > > > > perf trace-event: Always build trace-event-info.c > > > > > > > perf evsel: Add/use accessor for tp_format > > > > > > > perf evsel: Allow evsel__newtp without libtraceevent > > > > > > > perf tests: Enable tests disabled due to tracepoint parsing > > > > > > > > > > > > After applying this series, I'm seeing some test failures. But I don't > > > > > > understand why it affects non-tracepoint events though. > > > > > > > > > > > > $ sudo ./perf test -v pipe > > > > > > --- start --- > > > > > > test child forked, pid 3036123 > > > > > > 1bde35-1bdecc l noploop > > > > > > perf does have symbol 'noploop' > > > > > > > > > > > > Record+report pipe test > > > > > > [ perf record: Woken up 1 times to write data ] > > > > > > [ perf record: Captured and wrote 0.210 MB - ] > > > > > > [ perf record: Woken up 2 times to write data ] > > > > > > [ perf record: Captured and wrote 0.517 MB - ] > > > > > > [ perf record: Woken up 2 times to write data ] > > > > > > [ perf record: Captured and wrote 0.516 MB - ] > > > > > > Record+report pipe test [Success] > > > > > > > > > > > > Inject -B build-ids test > > > > > > 0xa5c [0x17a4]: failed to process type: 80 > > > > > > Error: > > > > > > failed to process sample > > > > > > Inject build-ids test [Failed - cannot find noploop function in pipe #1] > > > > > > > > > > > > Inject -b build-ids test > > > > > > 0xa5c [0x17a4]: failed to process type: 80 > > > > > > Error: > > > > > > failed to process sample > > > > > > Inject build-ids test [Failed - cannot find noploop function in pipe #1] > > > > > > > > > > > > Inject --buildid-all build-ids test > > > > > > 0xa5c [0x17a4]: failed to process type: 80 > > > > > > Error: > > > > > > failed to process sample > > > > > > Inject build-ids test [Failed - cannot find noploop function in pipe #1] > > > > > > > > > > > > Inject --mmap2-buildid-all build-ids test > > > > > > 0xa5c [0x17a4]: failed to process type: 80 > > > > > > Error: > > > > > > failed to process sample > > > > > > Inject build-ids test [Failed - cannot find noploop function in pipe #1] > > > > > > ---- end(-1) ---- > > > > > > 84: perf pipe recording and injection test : FAILED! > > > > > > > > > > > > $ sudo ./perf test -v Zstd > > > > > > --- start --- > > > > > > test child forked, pid 3036097 > > > > > > Collecting compressed record file: > > > > > > 500+0 records in > > > > > > 500+0 records out > > > > > > 256000 bytes (256 kB, 250 KiB) copied, 0.00169127 s, 151 MB/s > > > > > > [ perf record: Woken up 1 times to write data ] > > > > > > [ perf record: Captured and wrote 0.032 MB /tmp/perf.data.KBo, compressed (original 0.004 MB, ratio is 3.324) ] > > > > > > Checking compressed events stats: > > > > > > Couldn't decompress data > > > > > > 0x7ca8 [0x4f2]: failed to process type: 81 [Operation not permitted] > > > > > > Error: > > > > > > failed to process sample > > > > > > ---- end(-1) ---- > > > > > > 86: Zstd perf.data compression/decompression : FAILED! > > > > > > > > > > > > Thanks, > > > > > > Namhyung > > > > > > > > > > I'm not able to reproduce: > > > > > ``` > > > > > $ git log --oneline |head > > > > > a59bca6eb0a6 perf test: Add a runs-per-test flag > > > > > 0d0c002eb45c perf tests: Enable tests disabled due to tracepoint parsing > > > > > 4b8f5c9dfbda perf evsel: Allow evsel__newtp without libtraceevent > > > > > 7f57057c7884 perf evsel: Add/use accessor for tp_format > > > > > c27d357d2d4c perf trace-event: Always build trace-event-info.c > > > > > 20bf7a2cd68a perf trace-event: Constify print arguments > > > > > f18b07ee2af1 tool api fs: Correctly encode errno for read/write open failures > > > > > ... > > > > > $ sudo /tmp/perf/perf test -r 10 Zstd pipe -v > > > > > 84: perf pipe recording and injection test : Ok > > > > > 84: perf pipe recording and injection test : Ok > > > > > 84: perf pipe recording and injection test : Ok > > > > > 84: perf pipe recording and injection test : Ok > > > > > 84: perf pipe recording and injection test : Ok > > > > > 84: perf pipe recording and injection test : Ok > > > > > 84: perf pipe recording and injection test : Ok > > > > > 84: perf pipe recording and injection test : Ok > > > > > 84: perf pipe recording and injection test : Ok > > > > > 84: perf pipe recording and injection test : Ok > > > > > 86: Zstd perf.data compression/decompression : Ok > > > > > 86: Zstd perf.data compression/decompression : Ok > > > > > 86: Zstd perf.data compression/decompression : Ok > > > > > 86: Zstd perf.data compression/decompression : Ok > > > > > 86: Zstd perf.data compression/decompression : Ok > > > > > 86: Zstd perf.data compression/decompression : Ok > > > > > 86: Zstd perf.data compression/decompression : Ok > > > > > 86: Zstd perf.data compression/decompression : Ok > > > > > 86: Zstd perf.data compression/decompression : Ok > > > > > 86: Zstd perf.data compression/decompression : Ok > > > > > ``` > > > > > Similarly not as root or if runs-per-test is 100. > > > > > > > > > > Agreed that the changes are for tracepoints and these tests aren't for > > > > > tracepoints, so an interaction wouldn't be expected. If you have a > > > > > reliable reproduction perhaps you can bisect it. > > > > > > > > it says: > > > > > > > > 9c10de391840a35ab095b65e9a5c4fad0ac52068 is the first bad commit > > > > commit 9c10de391840a35ab095b65e9a5c4fad0ac52068 (HEAD) > > > > Author: Ian Rogers <irogers@google.com> > > > > Date: Fri Nov 8 10:47:46 2024 -0800 > > > > > > > > tool api fs: Correctly encode errno for read/write open failures > > > > > > > > Switch from returning -1 to -errno so that callers can determine types > > > > of failure. > > > > > > > > Signed-off-by: Ian Rogers <irogers@google.com> > > > > Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> > > > > > > > > tools/lib/api/fs/fs.c | 6 +++--- > > > > 1 file changed, 3 insertions(+), 3 deletions(-) > > > > > > So I tried to eye-ball/grep all callers to spot assumptions on the > > > return value like: > > > ``` > > > err = ...__read_int > > > if (err == -1) > > > ``` > > > Didn't spot anything. > > > > > > It seems in the test log the record is failing so I ran this under > > > gdb, set breakpoints on the 3 modified functions and then looked up > > > the call stack to spot bad return value assumptions. Everything looks > > > good. > > > I then tried inject and report, the only file read by these functions > > > is /proc/sys/kernel/perf_event_paranoid as part of symbol > > > initialization (nit, this should probably be read lazily and the > > > restriction should really come from the perf.data file, not the > > > running system) and those calls look good. > > > > > > The change is small and not critical for the series. It improves the > > > error message when reading the tracepoint id fails. So we could move > > > forward with the rest of the series, but that could be annoying for > > > tracepoint users. > > > > > > If I had a reproducer I'd revert the 1 line change on each function to > > > find out which is causing the regression. Once you have that then you > > > can binary search to find the bad call by using some global counter > > > where the first 'n' calls use the new return value and the later use > > > the old value. You can then vary 'n' to binary search and find the bad > > > caller. > > > > > > Is there any chance you can help diagnose this or help me to find the > > > reproducer? > > > > I think this depends on the system configuration. I've debugged it > > failed when it gets cpu topology: > > > > ... > > read int failed: /sys/devices/system/cpu/cpu112/topology/core_id (errno=2) > > read int failed: /sys/devices/system/cpu/cpu112/topology/physical_package_id (errno=2) > > read int failed: /sys/devices/system/cpu/cpu112/topology/die_id (errno=2) > > ... > > > > Maybe it's because # online CPUs != # possible CPUs. > > > > $ cat /sys/devices/system/cpu/online > > 0-63 > > $ cat /sys/devices/system/cpu/possible > > 0-127 > > > > There's a code like cpu__get_socket_id() to use the return value of > > sysfs__read_int() directly. And it saves the value to aggr_cpu_id which > > requires exact match like in aggr_cpu_id__equal(). > > So this is a latent bug. Are you working on the fix or asking me to do > it? I'm not sure why we should fail to describe the topology for > offline cores, but if this is a kernel restriction we should probably > purge those logical CPUs from the topology. It doesn't have the topology directory for the offline CPUs. $ ls /sys/devices/system/cpu/cpu112/ acpi_cppc cpufreq cpuidle crash_notes crash_notes_size driver firmware_node hotplug node0 online power subsystem uevent $ ls /sys/devices/system/cpu/cpu112/topology ls: cannot access '/sys/devices/system/cpu/cpu112/topology': No such file or directory Anyway, I'm not sure if it's really needed for this change. Maybe you can drop the patch 1 from the series for now and tackle it later. Thanks, Namhyung
On Mon, Nov 18, 2024 at 1:57 PM Namhyung Kim <namhyung@kernel.org> wrote: > > On Mon, Nov 18, 2024 at 11:35:19AM -0800, Ian Rogers wrote: > > On Mon, Nov 18, 2024 at 10:31 AM Namhyung Kim <namhyung@kernel.org> wrote: > > > > > > On Wed, Nov 13, 2024 at 10:06:13AM -0800, Ian Rogers wrote: > > > > On Sat, Nov 9, 2024 at 9:04 AM Namhyung Kim <namhyung@kernel.org> wrote: > > > > > > > > > > On Sat, Nov 09, 2024 at 08:26:20AM -0800, Ian Rogers wrote: > > > > > > On Fri, Nov 8, 2024 at 10:45 PM Namhyung Kim <namhyung@kernel.org> wrote: > > > > > > > On Fri, Nov 08, 2024 at 10:47:45AM -0800, Ian Rogers wrote: > > > > > > > > Ian Rogers (6): > > > > > > > > tool api fs: Correctly encode errno for read/write open failures > > > > > > > > perf trace-event: Constify print arguments > > > > > > > > perf trace-event: Always build trace-event-info.c > > > > > > > > perf evsel: Add/use accessor for tp_format > > > > > > > > perf evsel: Allow evsel__newtp without libtraceevent > > > > > > > > perf tests: Enable tests disabled due to tracepoint parsing > > > > > > > > > > > > > > After applying this series, I'm seeing some test failures. But I don't > > > > > > > understand why it affects non-tracepoint events though. > > > > > > > > > > > > > > $ sudo ./perf test -v pipe > > > > > > > --- start --- > > > > > > > test child forked, pid 3036123 > > > > > > > 1bde35-1bdecc l noploop > > > > > > > perf does have symbol 'noploop' > > > > > > > > > > > > > > Record+report pipe test > > > > > > > [ perf record: Woken up 1 times to write data ] > > > > > > > [ perf record: Captured and wrote 0.210 MB - ] > > > > > > > [ perf record: Woken up 2 times to write data ] > > > > > > > [ perf record: Captured and wrote 0.517 MB - ] > > > > > > > [ perf record: Woken up 2 times to write data ] > > > > > > > [ perf record: Captured and wrote 0.516 MB - ] > > > > > > > Record+report pipe test [Success] > > > > > > > > > > > > > > Inject -B build-ids test > > > > > > > 0xa5c [0x17a4]: failed to process type: 80 > > > > > > > Error: > > > > > > > failed to process sample > > > > > > > Inject build-ids test [Failed - cannot find noploop function in pipe #1] > > > > > > > > > > > > > > Inject -b build-ids test > > > > > > > 0xa5c [0x17a4]: failed to process type: 80 > > > > > > > Error: > > > > > > > failed to process sample > > > > > > > Inject build-ids test [Failed - cannot find noploop function in pipe #1] > > > > > > > > > > > > > > Inject --buildid-all build-ids test > > > > > > > 0xa5c [0x17a4]: failed to process type: 80 > > > > > > > Error: > > > > > > > failed to process sample > > > > > > > Inject build-ids test [Failed - cannot find noploop function in pipe #1] > > > > > > > > > > > > > > Inject --mmap2-buildid-all build-ids test > > > > > > > 0xa5c [0x17a4]: failed to process type: 80 > > > > > > > Error: > > > > > > > failed to process sample > > > > > > > Inject build-ids test [Failed - cannot find noploop function in pipe #1] > > > > > > > ---- end(-1) ---- > > > > > > > 84: perf pipe recording and injection test : FAILED! > > > > > > > > > > > > > > $ sudo ./perf test -v Zstd > > > > > > > --- start --- > > > > > > > test child forked, pid 3036097 > > > > > > > Collecting compressed record file: > > > > > > > 500+0 records in > > > > > > > 500+0 records out > > > > > > > 256000 bytes (256 kB, 250 KiB) copied, 0.00169127 s, 151 MB/s > > > > > > > [ perf record: Woken up 1 times to write data ] > > > > > > > [ perf record: Captured and wrote 0.032 MB /tmp/perf.data.KBo, compressed (original 0.004 MB, ratio is 3.324) ] > > > > > > > Checking compressed events stats: > > > > > > > Couldn't decompress data > > > > > > > 0x7ca8 [0x4f2]: failed to process type: 81 [Operation not permitted] > > > > > > > Error: > > > > > > > failed to process sample > > > > > > > ---- end(-1) ---- > > > > > > > 86: Zstd perf.data compression/decompression : FAILED! > > > > > > > > > > > > > > Thanks, > > > > > > > Namhyung > > > > > > > > > > > > I'm not able to reproduce: > > > > > > ``` > > > > > > $ git log --oneline |head > > > > > > a59bca6eb0a6 perf test: Add a runs-per-test flag > > > > > > 0d0c002eb45c perf tests: Enable tests disabled due to tracepoint parsing > > > > > > 4b8f5c9dfbda perf evsel: Allow evsel__newtp without libtraceevent > > > > > > 7f57057c7884 perf evsel: Add/use accessor for tp_format > > > > > > c27d357d2d4c perf trace-event: Always build trace-event-info.c > > > > > > 20bf7a2cd68a perf trace-event: Constify print arguments > > > > > > f18b07ee2af1 tool api fs: Correctly encode errno for read/write open failures > > > > > > ... > > > > > > $ sudo /tmp/perf/perf test -r 10 Zstd pipe -v > > > > > > 84: perf pipe recording and injection test : Ok > > > > > > 84: perf pipe recording and injection test : Ok > > > > > > 84: perf pipe recording and injection test : Ok > > > > > > 84: perf pipe recording and injection test : Ok > > > > > > 84: perf pipe recording and injection test : Ok > > > > > > 84: perf pipe recording and injection test : Ok > > > > > > 84: perf pipe recording and injection test : Ok > > > > > > 84: perf pipe recording and injection test : Ok > > > > > > 84: perf pipe recording and injection test : Ok > > > > > > 84: perf pipe recording and injection test : Ok > > > > > > 86: Zstd perf.data compression/decompression : Ok > > > > > > 86: Zstd perf.data compression/decompression : Ok > > > > > > 86: Zstd perf.data compression/decompression : Ok > > > > > > 86: Zstd perf.data compression/decompression : Ok > > > > > > 86: Zstd perf.data compression/decompression : Ok > > > > > > 86: Zstd perf.data compression/decompression : Ok > > > > > > 86: Zstd perf.data compression/decompression : Ok > > > > > > 86: Zstd perf.data compression/decompression : Ok > > > > > > 86: Zstd perf.data compression/decompression : Ok > > > > > > 86: Zstd perf.data compression/decompression : Ok > > > > > > ``` > > > > > > Similarly not as root or if runs-per-test is 100. > > > > > > > > > > > > Agreed that the changes are for tracepoints and these tests aren't for > > > > > > tracepoints, so an interaction wouldn't be expected. If you have a > > > > > > reliable reproduction perhaps you can bisect it. > > > > > > > > > > it says: > > > > > > > > > > 9c10de391840a35ab095b65e9a5c4fad0ac52068 is the first bad commit > > > > > commit 9c10de391840a35ab095b65e9a5c4fad0ac52068 (HEAD) > > > > > Author: Ian Rogers <irogers@google.com> > > > > > Date: Fri Nov 8 10:47:46 2024 -0800 > > > > > > > > > > tool api fs: Correctly encode errno for read/write open failures > > > > > > > > > > Switch from returning -1 to -errno so that callers can determine types > > > > > of failure. > > > > > > > > > > Signed-off-by: Ian Rogers <irogers@google.com> > > > > > Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> > > > > > > > > > > tools/lib/api/fs/fs.c | 6 +++--- > > > > > 1 file changed, 3 insertions(+), 3 deletions(-) > > > > > > > > So I tried to eye-ball/grep all callers to spot assumptions on the > > > > return value like: > > > > ``` > > > > err = ...__read_int > > > > if (err == -1) > > > > ``` > > > > Didn't spot anything. > > > > > > > > It seems in the test log the record is failing so I ran this under > > > > gdb, set breakpoints on the 3 modified functions and then looked up > > > > the call stack to spot bad return value assumptions. Everything looks > > > > good. > > > > I then tried inject and report, the only file read by these functions > > > > is /proc/sys/kernel/perf_event_paranoid as part of symbol > > > > initialization (nit, this should probably be read lazily and the > > > > restriction should really come from the perf.data file, not the > > > > running system) and those calls look good. > > > > > > > > The change is small and not critical for the series. It improves the > > > > error message when reading the tracepoint id fails. So we could move > > > > forward with the rest of the series, but that could be annoying for > > > > tracepoint users. > > > > > > > > If I had a reproducer I'd revert the 1 line change on each function to > > > > find out which is causing the regression. Once you have that then you > > > > can binary search to find the bad call by using some global counter > > > > where the first 'n' calls use the new return value and the later use > > > > the old value. You can then vary 'n' to binary search and find the bad > > > > caller. > > > > > > > > Is there any chance you can help diagnose this or help me to find the > > > > reproducer? > > > > > > I think this depends on the system configuration. I've debugged it > > > failed when it gets cpu topology: > > > > > > ... > > > read int failed: /sys/devices/system/cpu/cpu112/topology/core_id (errno=2) > > > read int failed: /sys/devices/system/cpu/cpu112/topology/physical_package_id (errno=2) > > > read int failed: /sys/devices/system/cpu/cpu112/topology/die_id (errno=2) > > > ... > > > > > > Maybe it's because # online CPUs != # possible CPUs. > > > > > > $ cat /sys/devices/system/cpu/online > > > 0-63 > > > $ cat /sys/devices/system/cpu/possible > > > 0-127 > > > > > > There's a code like cpu__get_socket_id() to use the return value of > > > sysfs__read_int() directly. And it saves the value to aggr_cpu_id which > > > requires exact match like in aggr_cpu_id__equal(). > > > > So this is a latent bug. Are you working on the fix or asking me to do > > it? I'm not sure why we should fail to describe the topology for > > offline cores, but if this is a kernel restriction we should probably > > purge those logical CPUs from the topology. > > It doesn't have the topology directory for the offline CPUs. > > $ ls /sys/devices/system/cpu/cpu112/ > acpi_cppc cpufreq cpuidle crash_notes crash_notes_size driver firmware_node hotplug node0 online power subsystem uevent > > $ ls /sys/devices/system/cpu/cpu112/topology > ls: cannot access '/sys/devices/system/cpu/cpu112/topology': No such file or directory > > Anyway, I'm not sure if it's really needed for this change. Maybe you > can drop the patch 1 from the series for now and tackle it later. So this would be a regression. Currently in add_tracepoint the calls look like: add_tracepoint |->evsel__new_tp_idx |-> tp_format |->filename__read_str The filename__read_str error is passed in add_tracepoint for errors to tracepoint_error that handles EACCES, ENOENT and default as separate cases: https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/parse-events.c#n466 The code changes filename__read_str to filename__read_int but if that can only return -1 on error then that'll make all tracepoint open errors look like EPERM. Based on your instructions I was able to reproduce the problem and have a fix to just force the error value to always be -1. I'll add that to the series and send as v5. Thanks, Ian
© 2016 - 2024 Red Hat, Inc.