[PATCH v1] perf trace: Fix inconsistent failures in perf trace's tests

Howard Chu posted 1 patch 10 months, 1 week ago
tools/perf/tests/shell/trace_btf_general.sh | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
[PATCH v1] perf trace: Fix inconsistent failures in perf trace's tests
Posted by Howard Chu 10 months, 1 week ago
There are two failures that frequently occur in perf trace's tests. One
is the failure of 'perf trace BTF general tests'; The other one is the
failure of 'perf trace record and replay', which, when run independently,
always succeeds.

The root cause of the first failure, is that perf trace may give two types
of output, depending on whether the comm of a process can be parsed, for
example:

mv/312705 renameat2(CWD, "/tmp/file1_VJOT", CWD, "/tmp/file2_VJOT", NOREPLACE) = 0
:312774/312774 renameat2(CWD, "/tmp/file1_5YcE", CWD, "/tmp/file2_5YcE", NOREPLACE) = 0

In the test, however, grep is always looking for the comm 'mv', which
sometimes may not be present.

The cause of the second failure is that 'perf trace BTF general tests'
modifies the perf config, and because tests are run concurrently,
subsequent tests use the modified perf config before the BTF general
test can restore the original config. Mark the BTF general tests as
exclusive will solve the failure.

Signed-off-by: Howard Chu <howardchu95@gmail.com>
Fixes: 0255338d6975 ("perf trace: Add tests for BTF general augmentation")
---
 tools/perf/tests/shell/trace_btf_general.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/perf/tests/shell/trace_btf_general.sh b/tools/perf/tests/shell/trace_btf_general.sh
index a25d8744695e..4833618a116b 100755
--- a/tools/perf/tests/shell/trace_btf_general.sh
+++ b/tools/perf/tests/shell/trace_btf_general.sh
@@ -1,5 +1,5 @@
 #!/bin/bash
-# perf trace BTF general tests
+# perf trace BTF general tests (exclusive)
 # SPDX-License-Identifier: GPL-2.0
 
 err=0
@@ -29,7 +29,7 @@ check_vmlinux() {
 trace_test_string() {
   echo "Testing perf trace's string augmentation"
   if ! perf trace -e renameat* --max-events=1 -- mv ${file1} ${file2} 2>&1 | \
-    grep -q -E "^mv/[0-9]+ renameat(2)?\(.*, \"${file1}\", .*, \"${file2}\", .*\) += +[0-9]+$"
+    grep -q -E "^.*/[0-9]+ renameat(2)?\(.*, \"${file1}\", .*, \"${file2}\", .*\) += +[0-9]+$"
   then
     echo "String augmentation test failed"
     err=1
@@ -40,7 +40,7 @@ trace_test_buffer() {
   echo "Testing perf trace's buffer augmentation"
   # echo will insert a newline (\10) at the end of the buffer
   if ! perf trace -e write --max-events=1 -- echo "${buffer}" 2>&1 | \
-    grep -q -E "^echo/[0-9]+ write\([0-9]+, ${buffer}.*, [0-9]+\) += +[0-9]+$"
+    grep -q -E "^.*/[0-9]+ write\([0-9]+, ${buffer}.*, [0-9]+\) += +[0-9]+$"
   then
     echo "Buffer augmentation test failed"
     err=1
@@ -50,7 +50,7 @@ trace_test_buffer() {
 trace_test_struct_btf() {
   echo "Testing perf trace's struct augmentation"
   if ! perf trace -e clock_nanosleep --force-btf --max-events=1 -- sleep 1 2>&1 | \
-    grep -q -E "^sleep/[0-9]+ clock_nanosleep\(0, 0, \{1,\}, 0x[0-9a-f]+\) += +[0-9]+$"
+    grep -q -E "^.*/[0-9]+ clock_nanosleep\(0, 0, \{1,\}, 0x[0-9a-f]+\) += +[0-9]+$"
   then
     echo "BTF struct augmentation test failed"
     err=1
-- 
2.45.2
Re: [PATCH v1] perf trace: Fix inconsistent failures in perf trace's tests
Posted by Namhyung Kim 10 months, 1 week ago
On Thu, Apr 03, 2025 at 09:16:52PM -0700, Howard Chu wrote:
> There are two failures that frequently occur in perf trace's tests. One
> is the failure of 'perf trace BTF general tests'; The other one is the
> failure of 'perf trace record and replay', which, when run independently,
> always succeeds.
> 
> The root cause of the first failure, is that perf trace may give two types
> of output, depending on whether the comm of a process can be parsed, for
> example:
> 
> mv/312705 renameat2(CWD, "/tmp/file1_VJOT", CWD, "/tmp/file2_VJOT", NOREPLACE) = 0
> :312774/312774 renameat2(CWD, "/tmp/file1_5YcE", CWD, "/tmp/file2_5YcE", NOREPLACE) = 0
> 
> In the test, however, grep is always looking for the comm 'mv', which
> sometimes may not be present.
> 
> The cause of the second failure is that 'perf trace BTF general tests'
> modifies the perf config, and because tests are run concurrently,
> subsequent tests use the modified perf config before the BTF general
> test can restore the original config. Mark the BTF general tests as
> exclusive will solve the failure.

I'm not sure if the config is the cause of the failure.  Also I don't
see it restored.

IIUC the export only affects child processes from the current shell.
So other tests running in parallel won't see the config change.

But still, there should be something to affect the behavior.  It's
strange to miss the task name in COMM record.

I also confirm that running the test serially fixes it.

Thanks,
Namhyung

> 
> Signed-off-by: Howard Chu <howardchu95@gmail.com>
> Fixes: 0255338d6975 ("perf trace: Add tests for BTF general augmentation")
> ---
>  tools/perf/tests/shell/trace_btf_general.sh | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/perf/tests/shell/trace_btf_general.sh b/tools/perf/tests/shell/trace_btf_general.sh
> index a25d8744695e..4833618a116b 100755
> --- a/tools/perf/tests/shell/trace_btf_general.sh
> +++ b/tools/perf/tests/shell/trace_btf_general.sh
> @@ -1,5 +1,5 @@
>  #!/bin/bash
> -# perf trace BTF general tests
> +# perf trace BTF general tests (exclusive)
>  # SPDX-License-Identifier: GPL-2.0
>  
>  err=0
> @@ -29,7 +29,7 @@ check_vmlinux() {
>  trace_test_string() {
>    echo "Testing perf trace's string augmentation"
>    if ! perf trace -e renameat* --max-events=1 -- mv ${file1} ${file2} 2>&1 | \
> -    grep -q -E "^mv/[0-9]+ renameat(2)?\(.*, \"${file1}\", .*, \"${file2}\", .*\) += +[0-9]+$"
> +    grep -q -E "^.*/[0-9]+ renameat(2)?\(.*, \"${file1}\", .*, \"${file2}\", .*\) += +[0-9]+$"
>    then
>      echo "String augmentation test failed"
>      err=1
> @@ -40,7 +40,7 @@ trace_test_buffer() {
>    echo "Testing perf trace's buffer augmentation"
>    # echo will insert a newline (\10) at the end of the buffer
>    if ! perf trace -e write --max-events=1 -- echo "${buffer}" 2>&1 | \
> -    grep -q -E "^echo/[0-9]+ write\([0-9]+, ${buffer}.*, [0-9]+\) += +[0-9]+$"
> +    grep -q -E "^.*/[0-9]+ write\([0-9]+, ${buffer}.*, [0-9]+\) += +[0-9]+$"
>    then
>      echo "Buffer augmentation test failed"
>      err=1
> @@ -50,7 +50,7 @@ trace_test_buffer() {
>  trace_test_struct_btf() {
>    echo "Testing perf trace's struct augmentation"
>    if ! perf trace -e clock_nanosleep --force-btf --max-events=1 -- sleep 1 2>&1 | \
> -    grep -q -E "^sleep/[0-9]+ clock_nanosleep\(0, 0, \{1,\}, 0x[0-9a-f]+\) += +[0-9]+$"
> +    grep -q -E "^.*/[0-9]+ clock_nanosleep\(0, 0, \{1,\}, 0x[0-9a-f]+\) += +[0-9]+$"
>    then
>      echo "BTF struct augmentation test failed"
>      err=1
> -- 
> 2.45.2
>
Re: [PATCH v1] perf trace: Fix inconsistent failures in perf trace's tests
Posted by Howard Chu 10 months, 1 week ago
Hello,

On Fri, Apr 4, 2025 at 11:02 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Thu, Apr 03, 2025 at 09:16:52PM -0700, Howard Chu wrote:
> > There are two failures that frequently occur in perf trace's tests. One
> > is the failure of 'perf trace BTF general tests'; The other one is the
> > failure of 'perf trace record and replay', which, when run independently,
> > always succeeds.
> >
> > The root cause of the first failure, is that perf trace may give two types
> > of output, depending on whether the comm of a process can be parsed, for
> > example:
> >
> > mv/312705 renameat2(CWD, "/tmp/file1_VJOT", CWD, "/tmp/file2_VJOT", NOREPLACE) = 0
> > :312774/312774 renameat2(CWD, "/tmp/file1_5YcE", CWD, "/tmp/file2_5YcE", NOREPLACE) = 0
> >
> > In the test, however, grep is always looking for the comm 'mv', which
> > sometimes may not be present.
> >
> > The cause of the second failure is that 'perf trace BTF general tests'
> > modifies the perf config, and because tests are run concurrently,
> > subsequent tests use the modified perf config before the BTF general
> > test can restore the original config. Mark the BTF general tests as
> > exclusive will solve the failure.

Yeah, I was wrong — I now suspect it has something to do with two
augmented_syscall BPF programs running at the same time. I noticed the
offcpu test has '(exclusive)' too. Do you think it's a BPF triggering
issue? Like, if test A is trying to capture the clock_nanosleep
syscall and test B is also trying to capture it, could it be that A
ends up capturing both calls while B gets nothing? Just asking before
I dig in further. :)

>
> I'm not sure if the config is the cause of the failure.  Also I don't
> see it restored.
>
> IIUC the export only affects child processes from the current shell.
> So other tests running in parallel won't see the config change.
>
> But still, there should be something to affect the behavior.  It's
> strange to miss the task name in COMM record.

I can look into that too.

>
> I also confirm that running the test serially fixes it.
>
> Thanks,
> Namhyung

Thanks,
Howard
Re: [PATCH v1] perf trace: Fix inconsistent failures in perf trace's tests
Posted by Namhyung Kim 10 months ago
On Fri, Apr 04, 2025 at 07:12:26PM -0700, Howard Chu wrote:
> Hello,
> 
> On Fri, Apr 4, 2025 at 11:02 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Thu, Apr 03, 2025 at 09:16:52PM -0700, Howard Chu wrote:
> > > There are two failures that frequently occur in perf trace's tests. One
> > > is the failure of 'perf trace BTF general tests'; The other one is the
> > > failure of 'perf trace record and replay', which, when run independently,
> > > always succeeds.
> > >
> > > The root cause of the first failure, is that perf trace may give two types
> > > of output, depending on whether the comm of a process can be parsed, for
> > > example:
> > >
> > > mv/312705 renameat2(CWD, "/tmp/file1_VJOT", CWD, "/tmp/file2_VJOT", NOREPLACE) = 0
> > > :312774/312774 renameat2(CWD, "/tmp/file1_5YcE", CWD, "/tmp/file2_5YcE", NOREPLACE) = 0
> > >
> > > In the test, however, grep is always looking for the comm 'mv', which
> > > sometimes may not be present.
> > >
> > > The cause of the second failure is that 'perf trace BTF general tests'
> > > modifies the perf config, and because tests are run concurrently,
> > > subsequent tests use the modified perf config before the BTF general
> > > test can restore the original config. Mark the BTF general tests as
> > > exclusive will solve the failure.
> 
> Yeah, I was wrong — I now suspect it has something to do with two
> augmented_syscall BPF programs running at the same time. I noticed the
> offcpu test has '(exclusive)' too. Do you think it's a BPF triggering
> issue? Like, if test A is trying to capture the clock_nanosleep
> syscall and test B is also trying to capture it, could it be that A
> ends up capturing both calls while B gets nothing? Just asking before
> I dig in further. :)

I don't think that will happen.  I suspect it may have a timing issue
like two threads are racing to setup BPF for syscall tracing and
setting up the second thread's BPF will disable the first one for a
short amount of time.  And 'mv' runs very shortly so it finished while
it's disabled.  But it's just a wild guess.

Thanks,
Namhyung

> 
> >
> > I'm not sure if the config is the cause of the failure.  Also I don't
> > see it restored.
> >
> > IIUC the export only affects child processes from the current shell.
> > So other tests running in parallel won't see the config change.
> >
> > But still, there should be something to affect the behavior.  It's
> > strange to miss the task name in COMM record.
> 
> I can look into that too.
> 
> >
> > I also confirm that running the test serially fixes it.
> >
> > Thanks,
> > Namhyung
> 
> Thanks,
> Howard
Re: [PATCH v1] perf trace: Fix inconsistent failures in perf trace's tests
Posted by Howard Chu 10 months, 1 week ago
Hello Namhyung,

On Fri, Apr 4, 2025 at 11:02 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Thu, Apr 03, 2025 at 09:16:52PM -0700, Howard Chu wrote:
> > There are two failures that frequently occur in perf trace's tests. One
> > is the failure of 'perf trace BTF general tests'; The other one is the
> > failure of 'perf trace record and replay', which, when run independently,
> > always succeeds.
> >
> > The root cause of the first failure, is that perf trace may give two types
> > of output, depending on whether the comm of a process can be parsed, for
> > example:
> >
> > mv/312705 renameat2(CWD, "/tmp/file1_VJOT", CWD, "/tmp/file2_VJOT", NOREPLACE) = 0
> > :312774/312774 renameat2(CWD, "/tmp/file1_5YcE", CWD, "/tmp/file2_5YcE", NOREPLACE) = 0
> >
> > In the test, however, grep is always looking for the comm 'mv', which
> > sometimes may not be present.
> >
> > The cause of the second failure is that 'perf trace BTF general tests'
> > modifies the perf config, and because tests are run concurrently,
> > subsequent tests use the modified perf config before the BTF general
> > test can restore the original config. Mark the BTF general tests as
> > exclusive will solve the failure.
>
> I'm not sure if the config is the cause of the failure.  Also I don't
> see it restored.

Yeah because (exclusive) got it fixed, I didn't restore the
environment variable, just deleted the config file.

>
> IIUC the export only affects child processes from the current shell.
> So other tests running in parallel won't see the config change.

I'll look into it thanks.

>
> But still, there should be something to affect the behavior.  It's
> strange to miss the task name in COMM record.
>
> I also confirm that running the test serially fixes it.

Thanks, do you want a tested-by?

Thanks,
Howard

>
> Thanks,
> Namhyung