[PATCH -next] selftests/ftrace: Prevent potential failure in subsystem-enable test case

Tengda Wu posted 1 patch 2 months, 4 weeks ago
.../selftests/ftrace/test.d/event/subsystem-enable.tc     | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
[PATCH -next] selftests/ftrace: Prevent potential failure in subsystem-enable test case
Posted by Tengda Wu 2 months, 4 weeks ago
The first 100 lines of trace output don't always contain 3 or more
distinct events. In busy systems, they may be dominated by repetitive
events like sched_stat_runtime, causing the `$count -lt 3` check to
fail. Example trace:

$ head -n 100 trace | grep -v ^#
 systemd-timesyn-266     [006] d.h2.   738.778482: sched_stat_runtime: comm=systemd-timesyn pid=266 runtime=976854 [ns]
      ftracetest-8751    [001] d.h2.   738.778512: sched_stat_runtime: comm=ftracetest pid=8751 runtime=938335 [ns]
 systemd-timesyn-266     [006] d.h1.   738.779531: sched_stat_runtime: comm=systemd-timesyn pid=266 runtime=1044284 [ns]
      ftracetest-8751    [001] d.h2.   738.779541: sched_stat_runtime: comm=ftracetest pid=8751 runtime=1028575 [ns]
         systemd-1       [007] d.h5.   738.779657: sched_stat_runtime: comm=systemd pid=1 runtime=642624 [ns]
[...]

With trace cleared, simply check `$count -eq 0` to confirm subsystem
enablement, just like toplevel-enable.tc does.

Fixes: 1a4ea83a6e67 ("selftests/ftrace: Limit length in subsystem-enable tests")
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
---
 .../selftests/ftrace/test.d/event/subsystem-enable.tc     | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc b/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
index b7c8f29c09a9..3a28adc7b727 100644
--- a/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
+++ b/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
@@ -19,8 +19,8 @@ echo 'sched:*' > set_event
 yield
 
 count=`head -n 100 trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
-if [ $count -lt 3 ]; then
-    fail "at least fork, exec and exit events should be recorded"
+if [ $count -eq 0 ]; then
+    fail "none of scheduler events are recorded"
 fi
 
 do_reset
@@ -30,8 +30,8 @@ echo 1 > events/sched/enable
 yield
 
 count=`head -n 100 trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
-if [ $count -lt 3 ]; then
-    fail "at least fork, exec and exit events should be recorded"
+if [ $count -eq 0 ]; then
+    fail "none of scheduler events are recorded"
 fi
 
 do_reset
-- 
2.34.1
Re: [PATCH -next] selftests/ftrace: Prevent potential failure in subsystem-enable test case
Posted by Steven Rostedt 2 months, 4 weeks ago
On Thu, 10 Jul 2025 13:01:34 +0000
Tengda Wu <wutengda@huaweicloud.com> wrote:


> 
> Fixes: 1a4ea83a6e67 ("selftests/ftrace: Limit length in subsystem-enable tests")
> Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
> ---
>  .../selftests/ftrace/test.d/event/subsystem-enable.tc     | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc b/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
> index b7c8f29c09a9..3a28adc7b727 100644
> --- a/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
> +++ b/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
> @@ -19,8 +19,8 @@ echo 'sched:*' > set_event
>  yield
>  
>  count=`head -n 100 trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
> -if [ $count -lt 3 ]; then
> -    fail "at least fork, exec and exit events should be recorded"
> +if [ $count -eq 0 ]; then
> +    fail "none of scheduler events are recorded"
>  fi
>  
>  do_reset
> @@ -30,8 +30,8 @@ echo 1 > events/sched/enable
>  yield
>  
>  count=`head -n 100 trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
> -if [ $count -lt 3 ]; then
> -    fail "at least fork, exec and exit events should be recorded"
> +if [ $count -eq 0 ]; then
> +    fail "none of scheduler events are recorded"

So if there's a bug that causes the system enable to only enable a
single event, this will no longer catch it?

I rather not let the slide.

Can you test this to see if this works for you?

-- Steve

diff --git a/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc b/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
index b7c8f29c09a9..46a9e6d92730 100644
--- a/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
+++ b/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
@@ -14,11 +14,32 @@ fail() { #msg
     exit_fail
 }
 
+check_unique() {
+    cat trace_pipe | grep -v '^#' | awk '
+	BEGIN { cnt = 0; }
+	{
+	    for (i = 0; i < cnt; i++) {
+		if (event[i] == $5) {
+		    break;
+		}
+	    }
+	    if (i == cnt) {
+		event[cnt++] = $5;
+		if (cnt > 2) {
+		    exit;
+		}
+	    }
+	}
+	END {
+	    printf "%d", cnt;
+	}'
+}
+
 echo 'sched:*' > set_event
 
 yield
 
-count=`head -n 100 trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
+count=`check_unique`
 if [ $count -lt 3 ]; then
     fail "at least fork, exec and exit events should be recorded"
 fi
@@ -29,7 +50,7 @@ echo 1 > events/sched/enable
 
 yield
 
-count=`head -n 100 trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
+count=`check_unique`
 if [ $count -lt 3 ]; then
     fail "at least fork, exec and exit events should be recorded"
 fi
Re: [PATCH -next] selftests/ftrace: Prevent potential failure in subsystem-enable test case
Posted by Tengda Wu 2 months, 2 weeks ago

On 2025/7/11 3:34, Steven Rostedt wrote:
> On Thu, 10 Jul 2025 13:01:34 +0000
> Tengda Wu <wutengda@huaweicloud.com> wrote:
> 
> 
>>
>> Fixes: 1a4ea83a6e67 ("selftests/ftrace: Limit length in subsystem-enable tests")
>> Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
>> ---
>>  .../selftests/ftrace/test.d/event/subsystem-enable.tc     | 8 ++++----
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc b/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
>> index b7c8f29c09a9..3a28adc7b727 100644
>> --- a/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
>> +++ b/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
>> @@ -19,8 +19,8 @@ echo 'sched:*' > set_event
>>  yield
>>  
>>  count=`head -n 100 trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
>> -if [ $count -lt 3 ]; then
>> -    fail "at least fork, exec and exit events should be recorded"
>> +if [ $count -eq 0 ]; then
>> +    fail "none of scheduler events are recorded"
>>  fi
>>  
>>  do_reset
>> @@ -30,8 +30,8 @@ echo 1 > events/sched/enable
>>  yield
>>  
>>  count=`head -n 100 trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
>> -if [ $count -lt 3 ]; then
>> -    fail "at least fork, exec and exit events should be recorded"
>> +if [ $count -eq 0 ]; then
>> +    fail "none of scheduler events are recorded"
> 
> So if there's a bug that causes the system enable to only enable a
> single event, this will no longer catch it?
> 
> I rather not let the slide.
> 
> Can you test this to see if this works for you?
> 
> -- Steve
> 
> diff --git a/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc b/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
> index b7c8f29c09a9..46a9e6d92730 100644
> --- a/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
> +++ b/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
> @@ -14,11 +14,32 @@ fail() { #msg
>      exit_fail
>  }
>  
> +check_unique() {
> +    cat trace_pipe | grep -v '^#' | awk '
> +	BEGIN { cnt = 0; }
> +	{
> +	    for (i = 0; i < cnt; i++) {
> +		if (event[i] == $5) {
> +		    break;
> +		}
> +	    }
> +	    if (i == cnt) {
> +		event[cnt++] = $5;
> +		if (cnt > 2) {
> +		    exit;
> +		}
> +	    }
> +	}
> +	END {
> +	    printf "%d", cnt;
> +	}'
> +}
> +
>  echo 'sched:*' > set_event
>  
>  yield
>  
> -count=`head -n 100 trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
> +count=`check_unique`
>  if [ $count -lt 3 ]; then
>      fail "at least fork, exec and exit events should be recorded"
>  fi
> @@ -29,7 +50,7 @@ echo 1 > events/sched/enable
>  
>  yield
>  
> -count=`head -n 100 trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
> +count=`check_unique`
>  if [ $count -lt 3 ]; then
>      fail "at least fork, exec and exit events should be recorded"
>  fi


Hi, Steve

I noticed this patch hasn't been merged yet. Do you plan to merge it soon?
If you're too busy, would you like me to help submit it instead?

Regards,
Tengda
Re: [PATCH -next] selftests/ftrace: Prevent potential failure in subsystem-enable test case
Posted by Steven Rostedt 2 months, 2 weeks ago
On Mon, 21 Jul 2025 09:54:22 +0800
Tengda Wu <wutengda@huaweicloud.com> wrote:

> I noticed this patch hasn't been merged yet. Do you plan to merge it soon?
> If you're too busy, would you like me to help submit it instead?

Nah, I simply forgot about it. Let me go write up a patch and send it
out.

-- Steve
Re: [PATCH -next] selftests/ftrace: Prevent potential failure in subsystem-enable test case
Posted by Tengda Wu 2 months, 1 week ago

On 2025/7/22 1:29, Steven Rostedt wrote:
> On Mon, 21 Jul 2025 09:54:22 +0800
> Tengda Wu <wutengda@huaweicloud.com> wrote:
> 
>> I noticed this patch hasn't been merged yet. Do you plan to merge it soon?
>> If you're too busy, would you like me to help submit it instead?
> 
> Nah, I simply forgot about it. Let me go write up a patch and send it
> out.
> 
> -- Steve

Just got the patch application notice, thanks.
-- Tengda
Re: [PATCH -next] selftests/ftrace: Prevent potential failure in subsystem-enable test case
Posted by Tengda Wu 2 months, 4 weeks ago
On 2025/7/11 3:34, Steven Rostedt wrote:
> On Thu, 10 Jul 2025 13:01:34 +0000
> Tengda Wu <wutengda@huaweicloud.com> wrote:
> 
> 
>>
>> Fixes: 1a4ea83a6e67 ("selftests/ftrace: Limit length in subsystem-enable tests")
>> Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
>> ---
>>  .../selftests/ftrace/test.d/event/subsystem-enable.tc     | 8 ++++----
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc b/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
>> index b7c8f29c09a9..3a28adc7b727 100644
>> --- a/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
>> +++ b/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
>> @@ -19,8 +19,8 @@ echo 'sched:*' > set_event
>>  yield
>>  
>>  count=`head -n 100 trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
>> -if [ $count -lt 3 ]; then
>> -    fail "at least fork, exec and exit events should be recorded"
>> +if [ $count -eq 0 ]; then
>> +    fail "none of scheduler events are recorded"
>>  fi
>>  
>>  do_reset
>> @@ -30,8 +30,8 @@ echo 1 > events/sched/enable
>>  yield
>>  
>>  count=`head -n 100 trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
>> -if [ $count -lt 3 ]; then
>> -    fail "at least fork, exec and exit events should be recorded"
>> +if [ $count -eq 0 ]; then
>> +    fail "none of scheduler events are recorded"
> 
> So if there's a bug that causes the system enable to only enable a
> single event, this will no longer catch it?
> 
> I rather not let the slide.
> 
> Can you test this to see if this works for you?
> 
> -- Steve
> 
> diff --git a/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc b/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
> index b7c8f29c09a9..46a9e6d92730 100644
> --- a/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
> +++ b/tools/testing/selftests/ftrace/test.d/event/subsystem-enable.tc
> @@ -14,11 +14,32 @@ fail() { #msg
>      exit_fail
>  }
>  
> +check_unique() {
> +    cat trace_pipe | grep -v '^#' | awk '
> +	BEGIN { cnt = 0; }
> +	{
> +	    for (i = 0; i < cnt; i++) {
> +		if (event[i] == $5) {
> +		    break;
> +		}
> +	    }
> +	    if (i == cnt) {
> +		event[cnt++] = $5;
> +		if (cnt > 2) {
> +		    exit;
> +		}
> +	    }
> +	}
> +	END {
> +	    printf "%d", cnt;
> +	}'
> +}
> +
>  echo 'sched:*' > set_event
>  
>  yield
>  
> -count=`head -n 100 trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
> +count=`check_unique`
>  if [ $count -lt 3 ]; then
>      fail "at least fork, exec and exit events should be recorded"
>  fi
> @@ -29,7 +50,7 @@ echo 1 > events/sched/enable
>  
>  yield
>  
> -count=`head -n 100 trace | grep -v ^# | awk '{ print $5 }' | sort -u | wc -l`
> +count=`check_unique`
>  if [ $count -lt 3 ]; then
>      fail "at least fork, exec and exit events should be recorded"
>  fi

The patch works well - after ~50 test iterations, we haven't observed any
recurrence of the test case failures.

However, I'm concerned that using 'cat trace_pipe' (like the original
'cat trace' method) could bring back the stopping problem [1] on slower
systems.

Could a slow trace_pipe reader (slower than sched event generation rate)
reintroduce the original race condition?

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1a4ea83a6e67f1415a1f17c1af5e9c814c882bb5

Some test details:

$ ./ftracetest -vvv subsystem-enable.tc
[...]
+ echo sched:*
+ yield
+ ping 127.0.0.1 -c 1
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.538 ms

--- 127.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 1ms
rtt min/avg/max/mdev = 0.538/0.538/0.538/0.000 ms
+ check_unique
+ cat trace_pipe
+ grep -v ^#
+ awk 
        BEGIN { cnt = 0; }
        {
            for (i = 0; i < cnt; i++) {
                if (event[i] == $5) {
                    break;
                }
            }
            if (i == cnt) {
                event[cnt++] = $5;
                if (cnt > 2) {
                    exit;
                }
            }
        }
        END {
            printf "%d", cnt;
        }
+ count=3
+ [ 3 -lt 3 ]
+ do_reset
[...]

Regards,
Tengda
Re: [PATCH -next] selftests/ftrace: Prevent potential failure in subsystem-enable test case
Posted by Steven Rostedt 2 months, 4 weeks ago

On July 10, 2025 10:48:54 PM EDT, Tengda Wu <wutengda@huaweicloud.com> wrote:
>

>The patch works well - after ~50 test iterations, we haven't observed any
>recurrence of the test case failures.
>
>However, I'm concerned that using 'cat trace_pipe' (like the original
>'cat trace' method) could bring back the stopping problem [1] on slower
>systems.
>
>Could a slow trace_pipe reader (slower than sched event generation rate)
>reintroduce the original race condition?
>

Only if it doesn't find three different events, in which case the test would fail regardless.

The awk script exits out as soon as it finds 3: unique events. It won't go forever, even on slower machines.

-- Steve

>[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1a4ea83a6e67f1415a1f17c1af5e9c814c882bb5
>
>Some test details:
>
>$ ./ftracetest -vvv subsystem-enable.tc
>[...]
>+ echo sched:*
>+ yield
>+ ping 127.0.0.1 -c 1
>PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
>64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.538 ms
>
>--- 127.0.0.1 ping statistics ---
>1 packets transmitted, 1 received, 0% packet loss, time 1ms
>rtt min/avg/max/mdev = 0.538/0.538/0.538/0.000 ms
>+ check_unique
>+ cat trace_pipe
>+ grep -v ^#
>+ awk 
>        BEGIN { cnt = 0; }
>        {
>            for (i = 0; i < cnt; i++) {
>                if (event[i] == $5) {
>                    break;
>                }
>            }
>            if (i == cnt) {
>                event[cnt++] = $5;
>                if (cnt > 2) {
>                    exit;
>                }
>            }
>        }
>        END {
>            printf "%d", cnt;
>        }
>+ count=3
>+ [ 3 -lt 3 ]
>+ do_reset
>[...]
>
>Regards,
>Tengda
Re: [PATCH -next] selftests/ftrace: Prevent potential failure in subsystem-enable test case
Posted by Tengda Wu 2 months, 4 weeks ago

On 2025/7/11 11:22, Steven Rostedt wrote:
> 
> 
> On July 10, 2025 10:48:54 PM EDT, Tengda Wu <wutengda@huaweicloud.com> wrote:
>>
> 
>> The patch works well - after ~50 test iterations, we haven't observed any
>> recurrence of the test case failures.
>>
>> However, I'm concerned that using 'cat trace_pipe' (like the original
>> 'cat trace' method) could bring back the stopping problem [1] on slower
>> systems.
>>
>> Could a slow trace_pipe reader (slower than sched event generation rate)
>> reintroduce the original race condition?
>>
> 
> Only if it doesn't find three different events, in which case the test would fail regardless.
> 
> The awk script exits out as soon as it finds 3: unique events. It won't go forever, even on slower machines.
> 
> -- Steve

Got it, thank you for explaining this. I have no further questions.

-- Tengda

> 
>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1a4ea83a6e67f1415a1f17c1af5e9c814c882bb5
>>
>> Some test details:
>>
>> $ ./ftracetest -vvv subsystem-enable.tc
>> [...]
>> + echo sched:*
>> + yield
>> + ping 127.0.0.1 -c 1
>> PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
>> 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.538 ms
>>
>> --- 127.0.0.1 ping statistics ---
>> 1 packets transmitted, 1 received, 0% packet loss, time 1ms
>> rtt min/avg/max/mdev = 0.538/0.538/0.538/0.000 ms
>> + check_unique
>> + cat trace_pipe
>> + grep -v ^#
>> + awk 
>>        BEGIN { cnt = 0; }
>>        {
>>            for (i = 0; i < cnt; i++) {
>>                if (event[i] == $5) {
>>                    break;
>>                }
>>            }
>>            if (i == cnt) {
>>                event[cnt++] = $5;
>>                if (cnt > 2) {
>>                    exit;
>>                }
>>            }
>>        }
>>        END {
>>            printf "%d", cnt;
>>        }
>> + count=3
>> + [ 3 -lt 3 ]
>> + do_reset
>> [...]
>>
>> Regards,
>> Tengda