[PATCH v2 0/4] perf: Make SIGTRAP and __perf_pending_irq() work on RT.

Sebastian Andrzej Siewior posted 4 patches 1 year, 11 months ago
There is a newer version of this series
[PATCH v2 0/4] perf: Make SIGTRAP and __perf_pending_irq() work on RT.
Posted by Sebastian Andrzej Siewior 1 year, 11 months ago
Hi,

Arnaldo reported that "perf test sigtrap" fails on PREEMPT_RT. Sending
the signal gets delayed until event_sched_out() which then uses
task_work_add() for its delivery. This breaks on PREEMPT_RT because the
signal is delivered with disabled preemption.

While looking at this, I also stumbled upon __perf_pending_irq() which
requires disabled interrupts but this is not the case on PREEMPT_RT.

This series aim to address both issues while not introducing a new issue
at the same time ;)
Any testing is appreciated.

v1…v2: https://lore.kernel.org/all/20240308175810.2894694-1-bigeasy@linutronix.de/
    - Marco pointed me to the testsuite that showed two problems:
      - Delayed task_work from NMI / missing events.
        Fixed by triggering dummy irq_work to enforce an interrupt for
	the exit-to-userland path which checks task_work
      - Increased ref-count on clean up/ during exec.
        Mostly addressed by the former change. There is still a window
	if the NMI occurs during execve(). This is addressed by removing
	the task_work before free_event().
      The testsuite (remove_on_exec) fails sometimes if the event/
      SIGTRAP is sent before the sighandler is installed.

Sebastian
Re: [PATCH v2 0/4] perf: Make SIGTRAP and __perf_pending_irq() work on RT.
Posted by Marco Elver 1 year, 11 months ago
On Tue, 12 Mar 2024 at 19:08, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> Hi,
>
> Arnaldo reported that "perf test sigtrap" fails on PREEMPT_RT. Sending
> the signal gets delayed until event_sched_out() which then uses
> task_work_add() for its delivery. This breaks on PREEMPT_RT because the
> signal is delivered with disabled preemption.
>
> While looking at this, I also stumbled upon __perf_pending_irq() which
> requires disabled interrupts but this is not the case on PREEMPT_RT.
>
> This series aim to address both issues while not introducing a new issue
> at the same time ;)
> Any testing is appreciated.
>
> v1…v2: https://lore.kernel.org/all/20240308175810.2894694-1-bigeasy@linutronix.de/
>     - Marco pointed me to the testsuite that showed two problems:
>       - Delayed task_work from NMI / missing events.
>         Fixed by triggering dummy irq_work to enforce an interrupt for
>         the exit-to-userland path which checks task_work
>       - Increased ref-count on clean up/ during exec.
>         Mostly addressed by the former change. There is still a window
>         if the NMI occurs during execve(). This is addressed by removing
>         the task_work before free_event().
>       The testsuite (remove_on_exec) fails sometimes if the event/
>       SIGTRAP is sent before the sighandler is installed.

Tested-by: Marco Elver <elver@google.com>

It does pass the tests in tools/testing/selftests/perf_events (non-RT
kernel, lockdep enabled). But I do recall this being a particularly
sharp corner of perf, so any additional testing and review here is
useful.

Thanks,
-- Marco
Re: [PATCH v2 0/4] perf: Make SIGTRAP and __perf_pending_irq() work on RT.
Posted by Arnaldo Carvalho de Melo 1 year, 11 months ago
On Wed, Mar 13, 2024 at 03:35:27PM +0100, Marco Elver wrote:
> On Tue, 12 Mar 2024 at 19:08, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> > Arnaldo reported that "perf test sigtrap" fails on PREEMPT_RT. Sending
> > the signal gets delayed until event_sched_out() which then uses
> > task_work_add() for its delivery. This breaks on PREEMPT_RT because the
> > signal is delivered with disabled preemption.

> > While looking at this, I also stumbled upon __perf_pending_irq() which
> > requires disabled interrupts but this is not the case on PREEMPT_RT.

> > This series aim to address both issues while not introducing a new issue
> > at the same time ;)
> > Any testing is appreciated.

> > v1…v2: https://lore.kernel.org/all/20240308175810.2894694-1-bigeasy@linutronix.de/
> >     - Marco pointed me to the testsuite that showed two problems:
> >       - Delayed task_work from NMI / missing events.
> >         Fixed by triggering dummy irq_work to enforce an interrupt for
> >         the exit-to-userland path which checks task_work
> >       - Increased ref-count on clean up/ during exec.
> >         Mostly addressed by the former change. There is still a window
> >         if the NMI occurs during execve(). This is addressed by removing
> >         the task_work before free_event().
> >       The testsuite (remove_on_exec) fails sometimes if the event/
> >       SIGTRAP is sent before the sighandler is installed.
 
> Tested-by: Marco Elver <elver@google.com>
 
> It does pass the tests in tools/testing/selftests/perf_events (non-RT
> kernel, lockdep enabled). But I do recall this being a particularly
> sharp corner of perf, so any additional testing and review here is
> useful.

Right, I'm testing with the full 'perf test' suite now.

- Arnaldo
Re: [PATCH v2 0/4] perf: Make SIGTRAP and __perf_pending_irq() work on RT.
Posted by Arnaldo Carvalho de Melo 1 year, 11 months ago
On Wed, Mar 13, 2024 at 12:23:32PM -0300, Arnaldo Carvalho de Melo wrote:
> On Wed, Mar 13, 2024 at 03:35:27PM +0100, Marco Elver wrote:
> > On Tue, 12 Mar 2024 at 19:08, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> > > Arnaldo reported that "perf test sigtrap" fails on PREEMPT_RT. Sending
> > > the signal gets delayed until event_sched_out() which then uses
> > > task_work_add() for its delivery. This breaks on PREEMPT_RT because the
> > > signal is delivered with disabled preemption.
 
> > > While looking at this, I also stumbled upon __perf_pending_irq() which
> > > requires disabled interrupts but this is not the case on PREEMPT_RT.
 
> > > This series aim to address both issues while not introducing a new issue
> > > at the same time ;)
> > > Any testing is appreciated.
 
> > > v1…v2: https://lore.kernel.org/all/20240308175810.2894694-1-bigeasy@linutronix.de/
> > >     - Marco pointed me to the testsuite that showed two problems:
> > >       - Delayed task_work from NMI / missing events.
> > >         Fixed by triggering dummy irq_work to enforce an interrupt for
> > >         the exit-to-userland path which checks task_work
> > >       - Increased ref-count on clean up/ during exec.
> > >         Mostly addressed by the former change. There is still a window
> > >         if the NMI occurs during execve(). This is addressed by removing
> > >         the task_work before free_event().
> > >       The testsuite (remove_on_exec) fails sometimes if the event/
> > >       SIGTRAP is sent before the sighandler is installed.
>  
> > Tested-by: Marco Elver <elver@google.com>
>  
> > It does pass the tests in tools/testing/selftests/perf_events (non-RT
> > kernel, lockdep enabled). But I do recall this being a particularly
> > sharp corner of perf, so any additional testing and review here is
> > useful.

> Right, I'm testing with the full 'perf test' suite now.

'perf test' doesn't show any regression, now I'm running Vince Weaver's
https://github.com/deater/perf_event_tests, storing the results with
this patchset and then without, to do a diff, lets see...

- Arnaldo
Re: [PATCH v2 0/4] perf: Make SIGTRAP and __perf_pending_irq() work on RT.
Posted by Arnaldo Carvalho de Melo 1 year, 11 months ago
On Wed, Mar 13, 2024 at 03:14:28PM -0300, Arnaldo Carvalho de Melo wrote:
> On Wed, Mar 13, 2024 at 12:23:32PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Wed, Mar 13, 2024 at 03:35:27PM +0100, Marco Elver wrote:
> > > On Tue, 12 Mar 2024 at 19:08, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> > > > Arnaldo reported that "perf test sigtrap" fails on PREEMPT_RT. Sending
> > > > the signal gets delayed until event_sched_out() which then uses
> > > > task_work_add() for its delivery. This breaks on PREEMPT_RT because the
> > > > signal is delivered with disabled preemption.
>  
> > > > While looking at this, I also stumbled upon __perf_pending_irq() which
> > > > requires disabled interrupts but this is not the case on PREEMPT_RT.
>  
> > > > This series aim to address both issues while not introducing a new issue
> > > > at the same time ;)
> > > > Any testing is appreciated.
>  
> > > > v1…v2: https://lore.kernel.org/all/20240308175810.2894694-1-bigeasy@linutronix.de/
> > > >     - Marco pointed me to the testsuite that showed two problems:
> > > >       - Delayed task_work from NMI / missing events.
> > > >         Fixed by triggering dummy irq_work to enforce an interrupt for
> > > >         the exit-to-userland path which checks task_work
> > > >       - Increased ref-count on clean up/ during exec.
> > > >         Mostly addressed by the former change. There is still a window
> > > >         if the NMI occurs during execve(). This is addressed by removing
> > > >         the task_work before free_event().
> > > >       The testsuite (remove_on_exec) fails sometimes if the event/
> > > >       SIGTRAP is sent before the sighandler is installed.
> >  
> > > Tested-by: Marco Elver <elver@google.com>

> > > It does pass the tests in tools/testing/selftests/perf_events (non-RT
> > > kernel, lockdep enabled). But I do recall this being a particularly
> > > sharp corner of perf, so any additional testing and review here is
> > > useful.
> 
> > Right, I'm testing with the full 'perf test' suite now.
> 
> 'perf test' doesn't show any regression, now I'm running Vince Weaver's
> https://github.com/deater/perf_event_tests, storing the results with
> this patchset and then without, to do a diff, lets see...

[root@nine perf_event_tests]# diff -u results.6.8.0-rc7-rt6 results.6.8.0-rc7.sebastian-rt6+ | grep ^[+-]
--- results.6.8.0-rc7-rt6	2024-03-13 15:26:37.923323518 -0300
+++ results.6.8.0-rc7.sebastian-rt6+	2024-03-13 15:14:11.505333801 -0300
-Linux nine 6.8.0-rc7-rt6 #1 SMP PREEMPT_RT Fri Mar  8 17:36:48 -03 2024 x86_64 x86_64 x86_64 GNU/Linux
+Linux nine 6.8.0-rc7.sebastian-rt6+ #2 SMP PREEMPT_RT Tue Mar 12 18:01:31 -03 2024 x86_64 x86_64 x86_64 GNU/Linux
-    Testing "branch-misses" generalized event...               FAILED
+    Testing "branch-misses" generalized event...               PASSED
-    Testing uncore events...                                   SKIPPED
+    Testing uncore events...                                   PASSED
-    We are running release 6.8.0-rc7-rt6
+    We are running release 6.8.0-rc7.sebastian-rt6+
-    Running on CPU 4
+    Running on CPU 2
-    Running on CPU 6
+    Running on CPU 2
-    Measuring on CPU 5
-Running on CPU 6
-Measuring on CPU 5
+    Measuring on CPU 6
+Running on CPU 2
+Measuring on CPU 6
[root@nine perf_event_tests]#

So basically:

-    Testing "branch-misses" generalized event...               FAILED
+    Testing "branch-misses" generalized event...               PASSED
-    Testing uncore events...                                   SKIPPED
+    Testing uncore events...                                   PASSED

So things improved! I'll re-run to see if these results are stable...

- Arnaldo
Re: [PATCH v2 0/4] perf: Make SIGTRAP and __perf_pending_irq() work on RT.
Posted by Arnaldo Carvalho de Melo 1 year, 11 months ago
On Wed, Mar 13, 2024 at 03:30:52PM -0300, Arnaldo Carvalho de Melo wrote:
> On Wed, Mar 13, 2024 at 03:14:28PM -0300, Arnaldo Carvalho de Melo wrote:
> > 'perf test' doesn't show any regression, now I'm running Vince Weaver's
> > https://github.com/deater/perf_event_tests, storing the results with
> > this patchset and then without, to do a diff, lets see...
> 
> So things improved! I'll re-run to see if these results are stable...

tldr; No dmesg activity, no kernel splats, most tests passed, nothing
noticeable when running with/without the patch with Vince's regression
tests. So:


Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>

- Arnaldo

Further details:

Without the patch:

[root@nine perf_event_tests]# ./run_tests.sh | tee results.$(uname -r).new ; diff -u results.$(uname -r) results.$(uname -r).new
--- results.6.8.0-rc7-rt6	2024-03-13 15:26:37.923323518 -0300
+++ results.6.8.0-rc7-rt6.new	2024-03-13 15:32:43.983245095 -0300
@@ -296,7 +296,7 @@
   + tests/rdpmc/rdpmc_validation
     Testing if userspace rdpmc reads give expected results...  PASSED
   + tests/rdpmc/rdpmc_multiplexing
-    Testing if userspace rdpmc multiplexing works...           PASSED
+    Testing if userspace rdpmc multiplexing works...           FAILED
   + tests/rdpmc/rdpmc_reset
     Testing if resetting while using rdpmc works...            PASSED
   + tests/rdpmc/rdpmc_group
@@ -304,15 +304,15 @@
   + tests/rdpmc/rdpmc_attach
     Testing if rdpmc attach works...                           PASSED
   + tests/rdpmc/rdpmc_attach_cpu
-    Running on CPU 4
+    Running on CPU 0
 Testing if rdpmc behavior on attach CPU...                 PASSED
   + tests/rdpmc/rdpmc_attach_global_cpu
-    Running on CPU 6
+    Running on CPU 3
 Testing if rdpmc behavior on attach all procs on other CPU... FAILED
   + tests/rdpmc/rdpmc_attach_other_cpu
-    Measuring on CPU 5
-Running on CPU 6
-Measuring on CPU 5
+    Measuring on CPU 0
+Running on CPU 3
+Measuring on CPU 0
 Testing if rdpmc behavior on attach other CPU...           FAILED
   + tests/rdpmc/rdpmc_multiattach
     Testing if rdpmc multi-attach works...                     PASSED

A test flipped results.

Trying again with a more compact output:

[root@nine perf_event_tests]# ./run_tests.sh | tee results.$(uname -r).new ; diff -u results.$(uname -r) results.$(uname -r).new | grep ^[+-]
--- results.6.8.0-rc7-rt6	2024-03-13 15:26:37.923323518 -0300
+++ results.6.8.0-rc7-rt6.new	2024-03-13 17:06:34.944149451 -0300
-    Running on CPU 4
-Testing if rdpmc behavior on attach CPU...                 PASSED
-  + tests/rdpmc/rdpmc_attach_global_cpu
+Testing if rdpmc behavior on attach CPU...                 FAILED
+  + tests/rdpmc/rdpmc_attach_global_cpu
+    Running on CPU 0
-    Measuring on CPU 5
-Running on CPU 6
-Measuring on CPU 5
+    Measuring on CPU 7
+Running on CPU 1
+Measuring on CPU 7
[root@nine perf_event_tests]#

Since its that rdpmc that is now always failing without this patch
series, lets try using that .new as the new baseline:

[root@nine perf_event_tests]# ./run_tests.sh | tee results.$(uname -r).new2 ; diff -u results.$(uname -r).new results.$(uname -r).new2 | grep ^[+-]
--- results.6.8.0-rc7-rt6.new	2024-03-13 17:06:34.944149451 -0300
+++ results.6.8.0-rc7-rt6.new2	2024-03-13 17:08:41.438282558 -0300
-    Testing "branch-misses" generalized event...               FAILED
+    Testing "branch-misses" generalized event...               PASSED
-    Testing if userspace rdpmc multiplexing works...           PASSED
+    Testing if userspace rdpmc multiplexing works...           FAILED
-    Running on CPU 6
-Testing if rdpmc behavior on attach CPU...                 FAILED
+    Running on CPU 2
+Testing if rdpmc behavior on attach CPU...                 PASSED
-    Running on CPU 0
+    Running on CPU 2
-    Measuring on CPU 7
-Running on CPU 1
-Measuring on CPU 7
+    Measuring on CPU 2
+Running on CPU 0
+Measuring on CPU 2
[root@nine perf_event_tests]#
Re: [PATCH v2 0/4] perf: Make SIGTRAP and __perf_pending_irq() work on RT.
Posted by Arnaldo Carvalho de Melo 1 year, 11 months ago
On Wed, Mar 13, 2024 at 05:12:25PM -0300, Arnaldo Carvalho de Melo wrote:
> On Wed, Mar 13, 2024 at 03:30:52PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Wed, Mar 13, 2024 at 03:14:28PM -0300, Arnaldo Carvalho de Melo wrote:
> > > 'perf test' doesn't show any regression, now I'm running Vince Weaver's
> > > https://github.com/deater/perf_event_tests, storing the results with
> > > this patchset and then without, to do a diff, lets see...

> > So things improved! I'll re-run to see if these results are stable...
 
> tldr; No dmesg activity, no kernel splats, most tests passed, nothing
> noticeable when running with/without the patch with Vince's regression
> tests. So:
 
> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Too quick, now I'm testing it on top of torvalds/master, no PREEMPT_RT.

- Arnaldo
 
> - Arnaldo
> 
> Further details:
> 
> Without the patch:
> 
> [root@nine perf_event_tests]# ./run_tests.sh | tee results.$(uname -r).new ; diff -u results.$(uname -r) results.$(uname -r).new
> --- results.6.8.0-rc7-rt6	2024-03-13 15:26:37.923323518 -0300
> +++ results.6.8.0-rc7-rt6.new	2024-03-13 15:32:43.983245095 -0300
> @@ -296,7 +296,7 @@
>    + tests/rdpmc/rdpmc_validation
>      Testing if userspace rdpmc reads give expected results...  PASSED
>    + tests/rdpmc/rdpmc_multiplexing
> -    Testing if userspace rdpmc multiplexing works...           PASSED
> +    Testing if userspace rdpmc multiplexing works...           FAILED
>    + tests/rdpmc/rdpmc_reset
>      Testing if resetting while using rdpmc works...            PASSED
>    + tests/rdpmc/rdpmc_group
> @@ -304,15 +304,15 @@
>    + tests/rdpmc/rdpmc_attach
>      Testing if rdpmc attach works...                           PASSED
>    + tests/rdpmc/rdpmc_attach_cpu
> -    Running on CPU 4
> +    Running on CPU 0
>  Testing if rdpmc behavior on attach CPU...                 PASSED
>    + tests/rdpmc/rdpmc_attach_global_cpu
> -    Running on CPU 6
> +    Running on CPU 3
>  Testing if rdpmc behavior on attach all procs on other CPU... FAILED
>    + tests/rdpmc/rdpmc_attach_other_cpu
> -    Measuring on CPU 5
> -Running on CPU 6
> -Measuring on CPU 5
> +    Measuring on CPU 0
> +Running on CPU 3
> +Measuring on CPU 0
>  Testing if rdpmc behavior on attach other CPU...           FAILED
>    + tests/rdpmc/rdpmc_multiattach
>      Testing if rdpmc multi-attach works...                     PASSED
> 
> A test flipped results.
> 
> Trying again with a more compact output:
> 
> [root@nine perf_event_tests]# ./run_tests.sh | tee results.$(uname -r).new ; diff -u results.$(uname -r) results.$(uname -r).new | grep ^[+-]
> --- results.6.8.0-rc7-rt6	2024-03-13 15:26:37.923323518 -0300
> +++ results.6.8.0-rc7-rt6.new	2024-03-13 17:06:34.944149451 -0300
> -    Running on CPU 4
> -Testing if rdpmc behavior on attach CPU...                 PASSED
> -  + tests/rdpmc/rdpmc_attach_global_cpu
> +Testing if rdpmc behavior on attach CPU...                 FAILED
> +  + tests/rdpmc/rdpmc_attach_global_cpu
> +    Running on CPU 0
> -    Measuring on CPU 5
> -Running on CPU 6
> -Measuring on CPU 5
> +    Measuring on CPU 7
> +Running on CPU 1
> +Measuring on CPU 7
> [root@nine perf_event_tests]#
> 
> Since its that rdpmc that is now always failing without this patch
> series, lets try using that .new as the new baseline:
> 
> [root@nine perf_event_tests]# ./run_tests.sh | tee results.$(uname -r).new2 ; diff -u results.$(uname -r).new results.$(uname -r).new2 | grep ^[+-]
> --- results.6.8.0-rc7-rt6.new	2024-03-13 17:06:34.944149451 -0300
> +++ results.6.8.0-rc7-rt6.new2	2024-03-13 17:08:41.438282558 -0300
> -    Testing "branch-misses" generalized event...               FAILED
> +    Testing "branch-misses" generalized event...               PASSED
> -    Testing if userspace rdpmc multiplexing works...           PASSED
> +    Testing if userspace rdpmc multiplexing works...           FAILED
> -    Running on CPU 6
> -Testing if rdpmc behavior on attach CPU...                 FAILED
> +    Running on CPU 2
> +Testing if rdpmc behavior on attach CPU...                 PASSED
> -    Running on CPU 0
> +    Running on CPU 2
> -    Measuring on CPU 7
> -Running on CPU 1
> -Measuring on CPU 7
> +    Measuring on CPU 2
> +Running on CPU 0
> +Measuring on CPU 2
> [root@nine perf_event_tests]#
Re: [PATCH v2 0/4] perf: Make SIGTRAP and __perf_pending_irq() work on RT.
Posted by Sebastian Andrzej Siewior 1 year, 11 months ago
On 2024-03-13 17:14:25 [-0300], Arnaldo Carvalho de Melo wrote:
> > tldr; No dmesg activity, no kernel splats, most tests passed, nothing
> > noticeable when running with/without the patch with Vince's regression
> > tests. So:
>  
> > Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> > Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> 
> Too quick, now I'm testing it on top of torvalds/master, no PREEMPT_RT.

Just to be clear: You revert your Tested-by because now you test this on
torvalds/master but not because you reported a regression which I
missed.

> - Arnaldo

Sebastian
Re: [PATCH v2 0/4] perf: Make SIGTRAP and __perf_pending_irq() work on RT.
Posted by Arnaldo Carvalho de Melo 1 year, 11 months ago
On Thu, Mar 14, 2024 at 10:10:33AM +0100, Sebastian Andrzej Siewior wrote:
> On 2024-03-13 17:14:25 [-0300], Arnaldo Carvalho de Melo wrote:
> > > tldr; No dmesg activity, no kernel splats, most tests passed, nothing
> > > noticeable when running with/without the patch with Vince's regression
> > > tests. So:
> >  
> > > Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> > > Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> > 
> > Too quick, now I'm testing it on top of torvalds/master, no PREEMPT_RT.
> 
> Just to be clear: You revert your Tested-by because now you test this on
> torvalds/master but not because you reported a regression which I
> missed.

You got it right. No regressions, the code is good, I just need to test
it a bit further, with torvalds/master, without PREEMPT_RT.

- Arnaldo
Re: [PATCH v2 0/4] perf: Make SIGTRAP and __perf_pending_irq() work on RT.
Posted by Arnaldo Carvalho de Melo 1 year, 11 months ago
On Thu, Mar 14, 2024 at 11:34:39AM -0300, Arnaldo Carvalho de Melo wrote:
> On Thu, Mar 14, 2024 at 10:10:33AM +0100, Sebastian Andrzej Siewior wrote:
> > On 2024-03-13 17:14:25 [-0300], Arnaldo Carvalho de Melo wrote:
> > > > tldr; No dmesg activity, no kernel splats, most tests passed, nothing
> > > > noticeable when running with/without the patch with Vince's regression
> > > > tests. So:

> > > > Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> > > > Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>

> > > Too quick, now I'm testing it on top of torvalds/master, no PREEMPT_RT.

> > Just to be clear: You revert your Tested-by because now you test this on
> > torvalds/master but not because you reported a regression which I
> > missed.

> You got it right. No regressions, the code is good, I just need to test
> it a bit further, with torvalds/master, without PREEMPT_RT.

Tests performed, no regressions detected, same behaviour when killing
the remove_on_exec selftests midway:

[acme@nine perf_events]$ perf annotate --stdio2 exec_child | tail
              mov    $0x1,%edi
            → callq  _exit@plt
        b5:   nop           
100.00  b6:   mov    signal_count,%eax
              test   %eax,%eax
            ↑ je     b6     
              nop           
              nop           
              leaveq        
            ← retq          
[acme@nine perf_events]$ pidof exec_child 
28256 28249 28241 28240 28239 28236 28228 28226 28224 28223 28219 28215 28208 28207 28206 28205 28200 28188 28187 28186 28185 28169 28168 28167 28166 28155 28154 28153 28152 28140 28139 28138 28137 28124 28123 28122 28121 28111 28110 28109 28108 28094 28093 28092 28091 28080 28079 28078 28077 28064 28062 28061 28060 28048 28047 28046 28045 28030 28029 28028 28027 28012 28011 28010 28009 27998 27996 27994 27993 27982 27981 27979 27978 27966 27965 27962 27961 27952 27951 27949 27948 27934 27933 27932 27931 27920 27919 27918 27917 27906 27905 27903 27902 27888 27885 27883 27882
[acme@nine perf_events]$

[acme@nine linux]$ uname -a
Linux nine 6.8.0-rc7.sebastian-rt6+ #2 SMP PREEMPT_RT Tue Mar 12 18:01:31 -03 2024 x86_64 x86_64 x86_64 GNU/Linux
[acme@nine linux]$ perf probe -L perf_pending_disable
<perf_pending_disable@/home/acme/git/linux/kernel/events/core.c:0>
      0  static void perf_pending_disable(struct irq_work *entry)
         {
      2         struct perf_event *event = container_of(entry, struct perf_event, pending_disable_irq);
                int rctx;
         
                /*
                 * If we 'fail' here, that's OK, it means recursion is already disabled
                 * and we won't recurse 'further'.
                 */
                rctx = perf_swevent_get_recursion_context();
     10         __perf_pending_disable(event);
     11         if (rctx >= 0)
     12                 perf_swevent_put_recursion_context(rctx);
         }
         
         static void perf_pending_irq(struct irq_work *entry)

[acme@nine linux]$

So I keep my:

Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>

- Arnaldo
Re: [PATCH v2 0/4] perf: Make SIGTRAP and __perf_pending_irq() work on RT.
Posted by Arnaldo Carvalho de Melo 1 year, 11 months ago
On Thu, Mar 14, 2024 at 06:22:38PM -0300, Arnaldo Carvalho de Melo wrote:
> On Thu, Mar 14, 2024 at 11:34:39AM -0300, Arnaldo Carvalho de Melo wrote:
> > On Thu, Mar 14, 2024 at 10:10:33AM +0100, Sebastian Andrzej Siewior wrote:
> > > On 2024-03-13 17:14:25 [-0300], Arnaldo Carvalho de Melo wrote:
> > > > > tldr; No dmesg activity, no kernel splats, most tests passed, nothing
> > > > > noticeable when running with/without the patch with Vince's regression
> > > > > tests. So:
> 
> > > > > Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> > > > > Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> 
> > > > Too quick, now I'm testing it on top of torvalds/master, no PREEMPT_RT.
> 
> > > Just to be clear: You revert your Tested-by because now you test this on
> > > torvalds/master but not because you reported a regression which I
> > > missed.
> 
> > You got it right. No regressions, the code is good, I just need to test
> > it a bit further, with torvalds/master, without PREEMPT_RT.
> 
> Tests performed, no regressions detected, same behaviour when killing
> the remove_on_exec selftests midway:
 
> [acme@nine linux]$ uname -a
> Linux nine 6.8.0-rc7.sebastian-rt6+ #2 SMP PREEMPT_RT Tue Mar 12 18:01:31 -03 2024 x86_64 x86_64 x86_64 GNU/Linux

Re-reading this I noticed I really retested with the rt kernel, d0h, so
here it goes again:

[acme@nine ~]$ uname -r
6.8.0.sigtrapfix+
[acme@nine ~]$ set -o vi
[acme@nine ~]$ perf test sigtrap
 68: Sigtrap                                                         : Ok
[acme@nine ~]$ cd ~acme/git/linux
[acme@nine linux]$ cd tools/testing/selftests/perf_events && make
make: Nothing to be done for 'all'.
[acme@nine perf_events]$ for x in {0..100}; do (./remove_on_exec &); done
<snip>
ok 4 remove_on_exec.exec_stress
# PASSED: 4 / 4 tests passed.
# Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0
#            OK  remove_on_exec.exec_stress
ok 4 remove_on_exec.exec_stress
# PASSED: 4 / 4 tests passed.
# Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0
#            OK  remove_on_exec.exec_stress
ok 4 remove_on_exec.exec_stress
# PASSED: 4 / 4 tests passed.
# Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0
[acme@nine perf_events]$

So despite this mistake all is well, torvalds/master + your patchkit
seems ok.

Sorry for the noise :-\

- Arnaldo
Re: [PATCH v2 0/4] perf: Make SIGTRAP and __perf_pending_irq() work on RT.
Posted by Sebastian Andrzej Siewior 1 year, 10 months ago
On 2024-03-14 18:46:21 [-0300], Arnaldo Carvalho de Melo wrote:
> So despite this mistake all is well, torvalds/master + your patchkit
> seems ok.
> 
> Sorry for the noise :-\

No worries, thank you.

> - Arnaldo

Sebastian
Re: [PATCH v2 0/4] perf: Make SIGTRAP and __perf_pending_irq() work on RT.
Posted by Arnaldo Carvalho de Melo 1 year, 11 months ago
On Tue, Mar 12, 2024 at 07:01:48PM +0100, Sebastian Andrzej Siewior wrote:
> Hi,
> 
> Arnaldo reported that "perf test sigtrap" fails on PREEMPT_RT. Sending
> the signal gets delayed until event_sched_out() which then uses
> task_work_add() for its delivery. This breaks on PREEMPT_RT because the
> signal is delivered with disabled preemption.
> 
> While looking at this, I also stumbled upon __perf_pending_irq() which
> requires disabled interrupts but this is not the case on PREEMPT_RT.
> 
> This series aim to address both issues while not introducing a new issue
> at the same time ;)
> Any testing is appreciated.
> 
> v1…v2: https://lore.kernel.org/all/20240308175810.2894694-1-bigeasy@linutronix.de/
>     - Marco pointed me to the testsuite that showed two problems:
>       - Delayed task_work from NMI / missing events.
>         Fixed by triggering dummy irq_work to enforce an interrupt for
> 	the exit-to-userland path which checks task_work
>       - Increased ref-count on clean up/ during exec.
>         Mostly addressed by the former change. There is still a window
> 	if the NMI occurs during execve(). This is addressed by removing
> 	the task_work before free_event().
>       The testsuite (remove_on_exec) fails sometimes if the event/
>       SIGTRAP is sent before the sighandler is installed.

Survives:

[acme@nine perf_events]$ for x in {0..1000}; do (perf test sigtrap &); done

And:

[acme@nine perf_events]$ pwd
/home/acme/git/linux/tools/testing/selftests/perf_events
[acme@nine perf_events]$ for x in {0..100}; do (./remove_on_exec &); done

Nothing on dmesg.

But:

[acme@nine ~]$ pidof exec_child
24273 24271 24270 24269 24268 24267 24266 24265 24264 24263 24262 24261 24260 24259 24258 24257 24256 24255 24254 24253 24252 24251 24250 24249 24248 24247 24246 24245 24244 24243 24242 24241 24240 24239 24238 24237 24236 24233 24232 24231 24230 24229 24228 24227 24226 24225 24224 24223 24220 24219 24218 24217 24216 24215 24214 24213 24212 24211 24210 24209 24208 24207 24206 24204 24203 24202 24201 24200 24199 24198 24197 24196 24195 24194 24192 24191 24190 24189 24188 24187 24186 24185 24184 24183 24182 24181 24180 24179 24178 24177 24176 24175 24174 24173 24172 24171 24170 24169 24168 24167 24166 24165 24164 24163 24162 24161 24160 24159 24158 24157 24154 24153 24152 24151 24150 24149 24148 24147 24146 24145 24144 24143 24142 24141 24140 24139 24138 24137 24136 24135 24134 24133 24132 24131 24130 24129 24127 24126 24125 24124 24123 24122 24121 24120 24119 24118 24117 24115 24114 24113 24112 24111 24110 24109 24108 24107 24106 24105 24104 24103 24102 24101 24100 24099 24098 24097 24096 24095 24094 24093 24092 24091 24090 24089 24088 24087 24086 24085 24084 24083 24082 24081 24080 24076 24075 24074 24073 24072 24071 24070 24069 24068 24067 24066 24065 24064 24063 24062 24061 24060 24059 24058 24057 24056 24055 24054 24053 24052 24051 24049 24048 24047 24046 24045 24044 24043 24042 24041 24040 24039 24037 24036 24035 24034 24033 24032 24031 24030 24029 24028 24027 24026 24025 24024 24023 24022 24021 24020 24019 24018 24017 24016 24015 24014 24013 24012 24011 24010 24009 24008 24007 24006 24005 24004 24003 24002 23998 23997 23996 23995 23994 23993 23992 23991 23990 23989 23988 23986 23985 23984 23983 23982 23981 23980 23979 23978 23977 23976 23975 23974 23973 23972 23970 23969 23968 23967 23966 23965 23964 23963 23962 23961 23960 23958 23957 23956 23955 23954 23953 23952 23951 23950 23949 23948 23947 23946 23945 23944 23943 23942 23941 23940 23939 23938 23937 23936 23935 23934 23933 23932 23931 23930 23929 23928 23927 23926 23925 23924 23923 23919 23918 23917 23916 23915 23914 23913 23912 23911 23910 23909 23907 23906 23905 23904 23903 23902 23901 23900 23899 23898 23897 23896 23895 23894 23893 23891 23890 23889 23888 23887 23886 23885 23884 23883 23882 23881 23879 23878 23877 23876 23875 23874 23873 23872 23871 23870 23869 23867 23866 23865 23864 23863 23862 23858 23857 23856 23855 23854 23853 23852 23851 23850 23849 23848 23847 23846 23845 23844 23843 23842 23841 23840 23833 23832 23831 23830 23829 23828 23827 23826 23825 23824 23823 23820 23819 23818 23817 23816 23815 23814 23813 23812 23811 23810 23809 23808 23807 23806 23804 23803 23802 23801 23800 23799 23798 23797 23793 23792 23791 23787 23786 23774 23773 23772 23771 23770 23769 23768 23767 23756 23754 23751 23750 23749 23748 23747 23746 23743 23734 23733 23732 23731 23730 23729 23728 23727 23726 23725 23724 23723 23722 23719 23716 23715 23684 23668 23667 23666 23589 23574 23573 23526 23461 23396 23395 23375 23271 23270 23266 23242 23241 23231 23214 23213 23210 23197 23196 23182 23181 23129 23128 23046 22964
[acme@nine ~]$

[root@nine ~]# cat /proc/24263/stack
[<0>] irqentry_exit_to_user_mode+0x1c9/0x1e0
[<0>] asm_sysvec_apic_timer_interrupt+0x16/0x20
[root@nine ~]#

[root@nine ~]# for a in `pidof exec_child` ; do cat /proc/$a/stack ; done | sort | uniq -c
      1 [<0>] asm_common_interrupt+0x22/0x40
    495 [<0>] asm_sysvec_apic_timer_interrupt+0x16/0x20
      2 [<0>] asm_sysvec_reschedule_ipi+0x16/0x20
    498 [<0>] irqentry_exit_to_user_mode+0x1c9/0x1e0
[root@nine ~]#

[acme@nine ~]$ ps ax|grep exec_child| wc -l
504
[acme@nine ~]$ ps ax|grep exec_child| tail
  24264 pts/0    R      0:04 exec_child
  24265 pts/0    R      0:04 exec_child
  24266 pts/0    R      0:04 exec_child
  24267 pts/0    R      0:04 exec_child
  24268 pts/0    R      0:04 exec_child
  24269 pts/0    R      0:04 exec_child
  24270 pts/0    R      0:04 exec_child
  24271 pts/0    R      0:04 exec_child
  24273 pts/0    R      0:04 exec_child
  26704 pts/1    S+     0:00 grep --color=auto exec_child
[acme@nine ~]$

All in 'R' state.

[root@nine ~]# killall exec_child
exec_child: no process found
[root@nine ~]# ps ax | grep exec_child | head -5
  22964 pts/0    R      0:06 exec_child
  23046 pts/0    R      0:05 exec_child
  23128 pts/0    R      0:05 exec_child
  23129 pts/0    R      0:05 exec_child
  23181 pts/0    R      0:05 exec_child
[root@nine ~]# kill 22964 23046 23128 23129 23181
[root@nine ~]# ps ax | grep exec_child | head -5
  23182 pts/0    R      0:06 exec_child
  23196 pts/0    R      0:06 exec_child
  23197 pts/0    R      0:06 exec_child
  23210 pts/0    R      0:06 exec_child
  23213 pts/0    R      0:06 exec_child
[root@nine ~]#

[root@nine ~]# kill `pidof exec_child`
[root@nine ~]# ps ax | grep exec_child | head -5
  26806 pts/0    S+     0:00 grep --color=auto exec_child
[root@nine ~]#

[acme@nine perf_events]$ ps ax|grep exec_child
  26816 pts/0    S+     0:00 grep --color=auto exec_child
[acme@nine perf_events]$ ./remove_on_exec 
TAP version 13
1..4
# Starting 4 tests from 1 test cases.
#  RUN           remove_on_exec.fork_only ...
#            OK  remove_on_exec.fork_only
ok 1 remove_on_exec.fork_only
#  RUN           remove_on_exec.fork_exec_then_enable ...
#            OK  remove_on_exec.fork_exec_then_enable
ok 2 remove_on_exec.fork_exec_then_enable
#  RUN           remove_on_exec.enable_then_fork_exec ...
*#            OK  remove_on_exec.enable_then_fork_exec
ok 3 remove_on_exec.enable_then_fork_exec
#  RUN           remove_on_exec.exec_stress ...
*************************#            OK  remove_on_exec.exec_stress
ok 4 remove_on_exec.exec_stress
# PASSED: 4 / 4 tests passed.
# Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0
[acme@nine perf_events]$ ps ax|grep exec_child
  26858 pts/0    S+     0:00 grep --color=auto exec_child
[acme@nine perf_events]$ ps ax|grep exec_child
  26860 pts/0    S+     0:00 grep --color=auto exec_child
[acme@nine perf_events]$ 

[acme@nine perf_events]$ for x in {0..100}; do (./remove_on_exec &); done
<SNIP>

While it runs I can see:

[acme@nine ~]$ ps ax|grep exec_child
  28627 pts/0    R      0:00 exec_child
  28668 pts/0    R      0:00 exec_child
  28744 pts/0    R      0:00 exec_child
  28813 pts/0    R      0:00 exec_child
  28912 pts/0    R      0:00 exec_child
  29666 pts/1    S+     0:00 grep --color=auto exec_child
[acme@nine ~]$

at the end they disappeared, on this last run.

But if I do a 'killall remove_on_exec' and stop that loop (control+C/Z)
we get all those exec_child running a seemingly eternal loop:

[acme@nine ~]$ ps ax|grep exec_child|wc -l
479
[acme@nine ~]$ ps ax|grep exec_child|head
  30143 pts/0    R      0:01 exec_child
  30144 pts/0    R      0:01 exec_child
  30145 pts/0    R      0:01 exec_child
  30147 pts/0    R      0:01 exec_child
  30150 pts/0    R      0:01 exec_child
  30151 pts/0    R      0:01 exec_child
  30153 pts/0    R      0:01 exec_child
  30154 pts/0    R      0:01 exec_child
  30155 pts/0    R      0:01 exec_child
  30156 pts/0    R      0:01 exec_child
[acme@nine ~]$ sudo cat /proc/30143/stack 
[<0>] irqentry_exit_to_user_mode+0x1c9/0x1e0
[<0>] asm_sysvec_apic_timer_interrupt+0x16/0x20
[acme@nine ~]$

[acme@nine perf_events]$ uptime
 18:36:17 up 16 min,  2 users,  load average: 460.01, 375.77, 240.59
[acme@nine perf_events]$ uptime
 18:36:21 up 16 min,  2 users,  load average: 461.53, 377.49, 241.87
[acme@nine perf_events]$ uptime
 18:36:24 up 16 min,  2 users,  load average: 462.85, 379.16, 243.14
[acme@nine perf_events]$ uptime
 18:36:27 up 16 min,  2 users,  load average: 462.85, 379.16, 243.14
[acme@nine perf_events]$

- Arnaldo
Re: [PATCH v2 0/4] perf: Make SIGTRAP and __perf_pending_irq() work on RT.
Posted by Sebastian Andrzej Siewior 1 year, 11 months ago
On 2024-03-12 18:42:38 [-0300], Arnaldo Carvalho de Melo wrote:
…
> But:
> 
> [acme@nine ~]$ pidof exec_child
> 24273 24271 24270 24269 24268 24267 24266 24265 24264 24263 24262 24261 24260 24259 
…

> [root@nine ~]# cat /proc/24263/stack
> [<0>] irqentry_exit_to_user_mode+0x1c9/0x1e0
> [<0>] asm_sysvec_apic_timer_interrupt+0x16/0x20
> [root@nine ~]#
…
> [acme@nine ~]$ ps ax|grep exec_child| wc -l
> 504
> [acme@nine ~]$ ps ax|grep exec_child| tail
>   24264 pts/0    R      0:04 exec_child
>   24265 pts/0    R      0:04 exec_child
>   24266 pts/0    R      0:04 exec_child
>   24267 pts/0    R      0:04 exec_child
>   24268 pts/0    R      0:04 exec_child
>   24269 pts/0    R      0:04 exec_child
>   24270 pts/0    R      0:04 exec_child
>   24271 pts/0    R      0:04 exec_child
>   24273 pts/0    R      0:04 exec_child
>   26704 pts/1    S+     0:00 grep --color=auto exec_child
> [acme@nine ~]$
> 
> All in 'R' state.
> 
> [root@nine ~]# killall exec_child
> exec_child: no process found
> [root@nine ~]# ps ax | grep exec_child | head -5
>   22964 pts/0    R      0:06 exec_child
>   23046 pts/0    R      0:05 exec_child
>   23128 pts/0    R      0:05 exec_child
>   23129 pts/0    R      0:05 exec_child
>   23181 pts/0    R      0:05 exec_child
> [root@nine ~]# kill 22964 23046 23128 23129 23181
> [root@nine ~]# ps ax | grep exec_child | head -5
>   23182 pts/0    R      0:06 exec_child
>   23196 pts/0    R      0:06 exec_child
>   23197 pts/0    R      0:06 exec_child
>   23210 pts/0    R      0:06 exec_child
>   23213 pts/0    R      0:06 exec_child
> [root@nine ~]#

You can't kill them?

> at the end they disappeared, on this last run.
> 
> But if I do a 'killall remove_on_exec' and stop that loop (control+C/Z)
> we get all those exec_child running a seemingly eternal loop:

Is this new or was it there? Is this VM or bare metal?

One part I don't get: did you let it run or did you kill it?
`exec_child' spins until a signal is received or the parent kills it. So
it shouldn't remain there for ever. And my guess, that it is in spinning
in userland and not in kernel.
I tried it on bare metal and VM and couldn't reproduce this.
…
> 
> - Arnaldo

Sebastian
Re: [PATCH v2 0/4] perf: Make SIGTRAP and __perf_pending_irq() work on RT.
Posted by Arnaldo Carvalho de Melo 1 year, 11 months ago
On Wed, Mar 13, 2024 at 09:13:03AM +0100, Sebastian Andrzej Siewior wrote:
> On 2024-03-12 18:42:38 [-0300], Arnaldo Carvalho de Melo wrote:
> …
> > But:
> > 
> > [acme@nine ~]$ pidof exec_child
> > 24273 24271 24270 24269 24268 24267 24266 24265 24264 24263 24262 24261 24260 24259 
> …
> 
> > [root@nine ~]# cat /proc/24263/stack
> > [<0>] irqentry_exit_to_user_mode+0x1c9/0x1e0
> > [<0>] asm_sysvec_apic_timer_interrupt+0x16/0x20
> > [root@nine ~]#
> …
> > [acme@nine ~]$ ps ax|grep exec_child| wc -l
> > 504
> > [acme@nine ~]$ ps ax|grep exec_child| tail
> >   24264 pts/0    R      0:04 exec_child
> >   24265 pts/0    R      0:04 exec_child
> >   24266 pts/0    R      0:04 exec_child
> >   24267 pts/0    R      0:04 exec_child
> >   24268 pts/0    R      0:04 exec_child
> >   24269 pts/0    R      0:04 exec_child
> >   24270 pts/0    R      0:04 exec_child
> >   24271 pts/0    R      0:04 exec_child
> >   24273 pts/0    R      0:04 exec_child
> >   26704 pts/1    S+     0:00 grep --color=auto exec_child
> > [acme@nine ~]$
> > 
> > All in 'R' state.
> > 
> > [root@nine ~]# killall exec_child
> > exec_child: no process found
> > [root@nine ~]# ps ax | grep exec_child | head -5
> >   22964 pts/0    R      0:06 exec_child
> >   23046 pts/0    R      0:05 exec_child
> >   23128 pts/0    R      0:05 exec_child
> >   23129 pts/0    R      0:05 exec_child
> >   23181 pts/0    R      0:05 exec_child
> > [root@nine ~]# kill 22964 23046 23128 23129 23181
> > [root@nine ~]# ps ax | grep exec_child | head -5
> >   23182 pts/0    R      0:06 exec_child
> >   23196 pts/0    R      0:06 exec_child
> >   23197 pts/0    R      0:06 exec_child
> >   23210 pts/0    R      0:06 exec_child
> >   23213 pts/0    R      0:06 exec_child
> > [root@nine ~]#
 
> You can't kill them?

I can, they remain in R state and I can kill them with 'kill `pidof exec_child`'
 
> > at the end they disappeared, on this last run.

> > But if I do a 'killall remove_on_exec' and stop that loop (control+C/Z)
> > we get all those exec_child running a seemingly eternal loop:
> 
> Is this new or was it there? Is this VM or bare metal?

bare metal.
 
> One part I don't get: did you let it run or did you kill it?

If I let them run they will finish and exit, no exec_child remains.

If I instead try to stop the loop that goes on forking the 100 of them,
then the exec_child remain spinning.

> `exec_child' spins until a signal is received or the parent kills it. So


> it shouldn't remain there for ever. And my guess, that it is in spinning
> in userland and not in kernel.

Checking that now, the stack is the one I posted:

> > [root@nine ~]# cat /proc/24263/stack
> > [<0>] irqentry_exit_to_user_mode+0x1c9/0x1e0
> > [<0>] asm_sysvec_apic_timer_interrupt+0x16/0x20
> > [root@nine ~]#

> I tried it on bare metal and VM and couldn't reproduce this.

All my tests are in bare metal.

- Arnaldo
Re: [PATCH v2 0/4] perf: Make SIGTRAP and __perf_pending_irq() work on RT.
Posted by Arnaldo Carvalho de Melo 1 year, 11 months ago
On Wed, Mar 13, 2024 at 10:28:44AM -0300, Arnaldo Carvalho de Melo wrote:
> On Wed, Mar 13, 2024 at 09:13:03AM +0100, Sebastian Andrzej Siewior wrote:
> > One part I don't get: did you let it run or did you kill it?
 
> If I let them run they will finish and exit, no exec_child remains.
 
> If I instead try to stop the loop that goes on forking the 100 of them,
> then the exec_child remain spinning.
 
> > `exec_child' spins until a signal is received or the parent kills it. So
 
> > it shouldn't remain there for ever. And my guess, that it is in spinning
> > in userland and not in kernel.
 
> Checking that now:

tldr; the tight loop, full details at the end.

100.00  b6:   mov    signal_count,%eax
              test   %eax,%eax
            ↑ je     b6     

remove_on_exec.c

/* For exec'd child. */
static void exec_child(void)
{
        struct sigaction action = {};
        const int val = 42;

        /* Set up sigtrap handler in case we erroneously receive a trap. */
        action.sa_flags = SA_SIGINFO | SA_NODEFER;
        action.sa_sigaction = sigtrap_handler;
        sigemptyset(&action.sa_mask);
        if (sigaction(SIGTRAP, &action, NULL))
                _exit((perror("sigaction failed"), 1));

        /* Signal parent that we're starting to spin. */
        if (write(STDOUT_FILENO, &val, sizeof(int)) == -1)
                _exit((perror("write failed"), 1));

        /* Should hang here until killed. */
        while (!signal_count);
}

So probably just a test needing to be a bit more polished?

Seems like it, on a newer machine, faster, I managed to reproduce it on
a non-RT kernel, with one exec_child remaining:

  1.44  b6:   mov   signal_count,%eax
              test  %eax,%eax
 98.56      ↑ je    b6

same tight loop:

acme@x1:~/git/perf-tools-next/tools/testing/selftests/perf_events$ pidof exec_child
722300
acme@x1:~/git/perf-tools-next/tools/testing/selftests/perf_events$ ps ax|grep exec_child
 722300 pts/2    R      4:08 exec_child
 722502 pts/2    S+     0:00 grep --color=auto exec_child
acme@x1:~/git/perf-tools-next/tools/testing/selftests/perf_events$

- Arnaldo

[root@nine ~]# perf record --call-graph dwarf -p 35785
^C[ perf record: Woken up 48 times to write data ]
[ perf record: Captured and wrote 12.120 MB perf.data (1503 samples) ]

[root@nine ~]# ls -la perf.data
-rw-------. 1 root root 12720152 Mar 13 10:32 perf.data
[root@nine ~]#
[root@nine ~]# perf report --no-child --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1K of event 'cycles:P'
# Event count (approx.): 926018718
#
# Overhead  Command  Shared Object      Symbol                                
# ........  .......  .................  ......................................
#
    98.48%  exe      remove_on_exec     [.] exec_child
            |
            ---exec_child
               main
               __libc_start_call_main
               __libc_start_main@@GLIBC_2.34
               _start

     0.33%  exe      [kernel.kallsyms]  [k] arch_scale_freq_tick
     0.13%  exe      [kernel.kallsyms]  [k] debug_smp_processor_id
     0.13%  exe      [kernel.kallsyms]  [k] check_cpu_stall
     0.13%  exe      [kernel.kallsyms]  [k] acct_account_cputime
     0.13%  exe      [kernel.kallsyms]  [k] cpuacct_account_field
     0.07%  exe      [kernel.kallsyms]  [k] preempt_count_add
     0.07%  exe      [kernel.kallsyms]  [k] update_irq_load_avg
     0.07%  exe      [kernel.kallsyms]  [k] cgroup_rstat_updated
     0.07%  exe      [kernel.kallsyms]  [k] rcu_sched_clock_irq
     0.07%  exe      [kernel.kallsyms]  [k] account_user_time
     0.07%  exe      [kernel.kallsyms]  [k] __hrtimer_run_queues
     0.07%  exe      [kernel.kallsyms]  [k] tick_nohz_highres_handler
     0.07%  exe      [kernel.kallsyms]  [k] ktime_get_update_offsets_now
     0.06%  exe      [kernel.kallsyms]  [k] __enqueue_entity
     0.06%  exe      [kernel.kallsyms]  [k] tick_sched_handle
     0.00%  exe      [kernel.kallsyms]  [k] __intel_pmu_enable_all.constprop.0


#
# (Tip: To show assembler sample contexts use perf record -b / perf script -F +brstackinsn --xed)
#
[root@nine ~]#

[root@nine ~]# perf annotate --stdio2 exec_child 
Samples: 1K of event 'cycles:P', 4000 Hz, Event count (approx.): 911943256, [percent: local period]
exec_child() /home/acme/git/linux/tools/testing/selftests/perf_events/remove_on_exec
Percent        
               
               
            Disassembly of section .text:
               
            00000000004045cf <exec_child>:
              push   %rbp   
              mov    %rsp,%rbp
              sub    $0xb0,%rsp
              lea    -0xa0(%rbp),%rdx
              mov    $0x0,%eax
              mov    $0x13,%ecx
              mov    %rdx,%rdi
              rep    stos %rax,%es:(%rdi)
              movl   $0x2a,-0xa4(%rbp)
              movl   $0x40000004,-0x18(%rbp)
              movq   $0x402a2e,-0xa0(%rbp)
              lea    -0xa0(%rbp),%rax
              add    $0x8,%rax
              mov    %rax,%rdi
            → callq  sigemptyset@plt
              lea    -0xa0(%rbp),%rax
              mov    $0x0,%edx
              mov    %rax,%rsi
              mov    $0x5,%edi
            → callq  sigaction@plt
              test   %eax,%eax
            ↓ je     82     
              mov    $0x4058af,%edi
            → callq  perror@plt
              mov    $0x1,%edi
            → callq  _exit@plt
        82:   lea    -0xa4(%rbp),%rax
              mov    $0x4,%edx
              mov    %rax,%rsi
              mov    $0x1,%edi
            → callq  write@plt
              cmp    $0xffffffffffffffff,%rax
            ↓ jne    b5     
              mov    $0x4058c0,%edi
            → callq  perror@plt
              mov    $0x1,%edi
            → callq  _exit@plt
        b5:   nop           
100.00  b6:   mov    signal_count,%eax
              test   %eax,%eax
            ↑ je     b6     
              nop           
              nop           
              leaveq        
            ← retq          
[root@nine ~]#
Re: [PATCH v2 0/4] perf: Make SIGTRAP and __perf_pending_irq() work on RT.
Posted by Marco Elver 1 year, 11 months ago
On Wed, 13 Mar 2024 at 14:47, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
>
> On Wed, Mar 13, 2024 at 10:28:44AM -0300, Arnaldo Carvalho de Melo wrote:
> > On Wed, Mar 13, 2024 at 09:13:03AM +0100, Sebastian Andrzej Siewior wrote:
> > > One part I don't get: did you let it run or did you kill it?
>
> > If I let them run they will finish and exit, no exec_child remains.
>
> > If I instead try to stop the loop that goes on forking the 100 of them,
> > then the exec_child remain spinning.
>
> > > `exec_child' spins until a signal is received or the parent kills it. So
>
> > > it shouldn't remain there for ever. And my guess, that it is in spinning
> > > in userland and not in kernel.
>
> > Checking that now:
>
> tldr; the tight loop, full details at the end.
>
> 100.00  b6:   mov    signal_count,%eax
>               test   %eax,%eax
>             ↑ je     b6
>
> remove_on_exec.c
>
> /* For exec'd child. */
> static void exec_child(void)
> {
>         struct sigaction action = {};
>         const int val = 42;
>
>         /* Set up sigtrap handler in case we erroneously receive a trap. */
>         action.sa_flags = SA_SIGINFO | SA_NODEFER;
>         action.sa_sigaction = sigtrap_handler;
>         sigemptyset(&action.sa_mask);
>         if (sigaction(SIGTRAP, &action, NULL))
>                 _exit((perror("sigaction failed"), 1));
>
>         /* Signal parent that we're starting to spin. */
>         if (write(STDOUT_FILENO, &val, sizeof(int)) == -1)
>                 _exit((perror("write failed"), 1));
>
>         /* Should hang here until killed. */
>         while (!signal_count);
> }
>
> So probably just a test needing to be a bit more polished?

Yes, possible.

> Seems like it, on a newer machine, faster, I managed to reproduce it on
> a non-RT kernel, with one exec_child remaining:
>
>   1.44  b6:   mov   signal_count,%eax
>               test  %eax,%eax
>  98.56      ↑ je    b6

It's unclear to me why that happens. But I do recall seeing it before,
and my explanation was that with too many concurrent copies of the
test the system either ran out of memory (maybe?) because the stress
test also spawns 30 parallel copies of the "exec_child" subprocess. So
with the 100 parallel copies we end up with 30 * 100 processes. Maybe
that's too much?

In any case, if the kernel didn't fall over during that kind of stress
testing, and the test itself passes when run as a single copy, then
I'd conclude all looks good.

This particular feature of perf along with testing it once before
melted Peter's and my brain [1]. I hope your experience didn't result
in complete brain-melt. ;-)

[1] https://lore.kernel.org/all/Y0VofNVMBXPOJJr7@elver.google.com/

Thanks,
-- Marco
Re: [PATCH v2 0/4] perf: Make SIGTRAP and __perf_pending_irq() work on RT.
Posted by Sebastian Andrzej Siewior 1 year, 11 months ago
On 2024-03-13 10:47:33 [-0300], Arnaldo Carvalho de Melo wrote:
> /* For exec'd child. */
> static void exec_child(void)
> {
>         struct sigaction action = {};
>         const int val = 42;
> 
>         /* Set up sigtrap handler in case we erroneously receive a trap. */
>         action.sa_flags = SA_SIGINFO | SA_NODEFER;
>         action.sa_sigaction = sigtrap_handler;
>         sigemptyset(&action.sa_mask);
>         if (sigaction(SIGTRAP, &action, NULL))
>                 _exit((perror("sigaction failed"), 1));
> 
>         /* Signal parent that we're starting to spin. */
>         if (write(STDOUT_FILENO, &val, sizeof(int)) == -1)
>                 _exit((perror("write failed"), 1));
> 
>         /* Should hang here until killed. */
>         while (!signal_count);
> }
> 
> So probably just a test needing to be a bit more polished?

Maybe. I'm not sure where this is coming from. Either someone should
kill or the signal should be delivered but, hmm… If the signal isn't
coming then it might one of those without a perf counter.

> Seems like it, on a newer machine, faster, I managed to reproduce it on
> a non-RT kernel, with one exec_child remaining:

Okay, so no regression. That is something ;)

> - Arnaldo

Sebastian
Re: [PATCH v2 0/4] perf: Make SIGTRAP and __perf_pending_irq() work on RT.
Posted by Sebastian Andrzej Siewior 1 year, 11 months ago
On 2024-03-13 10:28:41 [-0300], Arnaldo Carvalho de Melo wrote:
> > One part I don't get: did you let it run or did you kill it?
> 
> If I let them run they will finish and exit, no exec_child remains.
> 
> If I instead try to stop the loop that goes on forking the 100 of them,
> then the exec_child remain spinning.

Okay. So that problem only exists if you intervene. And you can
reproduce this odd behaviour with my patches but not without them,
right?

> > it shouldn't remain there for ever. And my guess, that it is in spinning
> > in userland and not in kernel.
> 
> Checking that now, the stack is the one I posted:
> 
> > > [root@nine ~]# cat /proc/24263/stack
> > > [<0>] irqentry_exit_to_user_mode+0x1c9/0x1e0
> > > [<0>] asm_sysvec_apic_timer_interrupt+0x16/0x20
> > > [root@nine ~]#

could you resolve irqentry_exit_to_user_mode+0x1c9/0x1e0, please?

> > I tried it on bare metal and VM and couldn't reproduce this.
> 
> All my tests are in bare metal.

Would you mind sending me your .config? The shell is bash I guess. I
will try to reproduce your setup on another box…

> - Arnaldo

Sebastian
Re: [PATCH v2 0/4] perf: Make SIGTRAP and __perf_pending_irq() work on RT.
Posted by Arnaldo Carvalho de Melo 1 year, 11 months ago
On Wed, Mar 13, 2024 at 02:46:45PM +0100, Sebastian Andrzej Siewior wrote:
> On 2024-03-13 10:28:41 [-0300], Arnaldo Carvalho de Melo wrote:
> > > One part I don't get: did you let it run or did you kill it?
> > 
> > If I let them run they will finish and exit, no exec_child remains.
> > 
> > If I instead try to stop the loop that goes on forking the 100 of them,
> > then the exec_child remain spinning.
> 
> Okay. So that problem only exists if you intervene. And you can
> reproduce this odd behaviour with my patches but not without them,
> right?

See the next message, I managed to reproduce that behaviour in a non-RT
kernel as well.
 
- Arnaldo