tools/perf/builtin-lock.c | 12 ++++++++++-- tools/perf/builtin-sched.c | 20 ++++++++++++++++---- 2 files changed, 26 insertions(+), 6 deletions(-)
Several perf subcommands (sched stats, lock contention) use the pattern
of forking a workload child, calling evlist__start_workload() to uncork
it, and then calling pause() to wait for a signal (typically SIGCHLD
when the child exits, or SIGINT/SIGTERM from the user).
This pattern has a race condition: if the workload is very short-lived,
the child can exit and deliver SIGCHLD in the window between
evlist__start_workload() and pause(). Since pause() only returns when a
signal is received *while the process is suspended*, and SIGCHLD has
already been delivered and handled by the empty sighandler(), pause()
blocks indefinitely.
The fix replaces pause() with a simpler approach:
- The signal handler now sets a 'volatile sig_atomic_t done' flag
to record delivery. The flag is reset before handler registration
so that a signal arriving during the setup phase is not discarded.
- Replace pause() with a unified loop:
while (!done) {
if (argc && waitpid(evlist->workload.pid, NULL, WNOHANG) > 0)
break;
sleep(1);
}
This handles both workload mode (child exit detected via waitpid)
and system-wide mode (user sends SIGINT/SIGTERM setting done).
Using WNOHANG avoids the SA_RESTART problem where a blocking
waitpid() would auto-restart and ignore the done flag if the child
doesn't exit on signal.
Three call sites are affected across two files:
- perf_sched__schedstat_record() in builtin-sched.c
- perf_sched__schedstat_live() in builtin-sched.c
- __cmd_contention() in builtin-lock.c
The two pause() sites in builtin-kwork.c are NOT affected because they
do not register SIGCHLD or fork workload children; they only wait for
user-initiated SIGINT/SIGTERM.
v3: https://lore.kernel.org/all/20260422064700.12345-1-swapnil.sapkal@amd.com/
Changes since v3:
- Replaced blocking waitpid() with waitpid(WNOHANG) in a unified
loop that also checks 'done'. Because perf is compiled with
_GNU_SOURCE, glibc's signal() sets SA_RESTART, causing a blocking
waitpid() to auto-restart after the handler returns. If the
workload ignores SIGINT, the tool would hang. (Sashiko review)
- Moved 'done = 0' before signal handler registration so that an
early signal during the setup phase is not discarded by a later
reset. (Sashiko review)
- Used single unified loop for both workload and system-wide cases
as suggested by Namhyung.
v2: https://lore.kernel.org/all/20260409162249.25581-1-swapnil.sapkal@amd.com/
Changes since v2:
- The signal handler now sets a 'volatile sig_atomic_t done' flag
instead of being an empty function. pause()/sigsuspend() alone
cannot cover the window between signal() handler registration and
sigprocmask(): a signal consumed during that unblocked window was
silently lost. The while(!done) loop detects such early delivery.
(Sashiko review)
- Replaced sigprocmask()/sigsuspend() approach with the much simpler
waitpid() + while(!done) sleep(1) pattern as suggested by the
maintainer.
v1: https://lore.kernel.org/all/20260401064114.141066-1-swapnil.sapkal@amd.com/
Changes since v1:
- Moved sigprocmask() to after evlist__prepare_workload() so the
forked child does not inherit a blocked SIGCHLD mask, which would
break workloads relying on SIGCHLD (e.g., Node.js, Python asyncio).
(Sashiko review)
- Block SIGINT and SIGTERM in addition to SIGCHLD to prevent an
early Ctrl+C during setup from being consumed before sigsuspend().
- Added signal mask restoration on error paths.
(Sashiko review)
Swapnil Sapkal (3):
perf sched stats: Fix SIGCHLD vs pause() race in schedstat_record()
perf sched stats: Fix SIGCHLD vs pause() race in schedstat_live()
perf lock contention: Fix SIGCHLD vs pause() race in __cmd_contention()
tools/perf/builtin-lock.c | 12 ++++++++++--
tools/perf/builtin-sched.c | 20 ++++++++++++++++----
2 files changed, 26 insertions(+), 6 deletions(-)
--
2.43.0
On Wed, May 20, 2026 at 10:20:14AM +0000, Swapnil Sapkal wrote:
> Several perf subcommands (sched stats, lock contention) use the pattern
> of forking a workload child, calling evlist__start_workload() to uncork
> it, and then calling pause() to wait for a signal (typically SIGCHLD
> when the child exits, or SIGINT/SIGTERM from the user).
>
> This pattern has a race condition: if the workload is very short-lived,
> the child can exit and deliver SIGCHLD in the window between
> evlist__start_workload() and pause(). Since pause() only returns when a
> signal is received *while the process is suspended*, and SIGCHLD has
> already been delivered and handled by the empty sighandler(), pause()
> blocks indefinitely.
>
> The fix replaces pause() with a simpler approach:
>
> - The signal handler now sets a 'volatile sig_atomic_t done' flag
> to record delivery. The flag is reset before handler registration
> so that a signal arriving during the setup phase is not discarded.
>
> - Replace pause() with a unified loop:
>
> while (!done) {
> if (argc && waitpid(evlist->workload.pid, NULL, WNOHANG) > 0)
> break;
> sleep(1);
> }
>
> This handles both workload mode (child exit detected via waitpid)
> and system-wide mode (user sends SIGINT/SIGTERM setting done).
> Using WNOHANG avoids the SA_RESTART problem where a blocking
> waitpid() would auto-restart and ignore the done flag if the child
> doesn't exit on signal.
>
> Three call sites are affected across two files:
> - perf_sched__schedstat_record() in builtin-sched.c
> - perf_sched__schedstat_live() in builtin-sched.c
> - __cmd_contention() in builtin-lock.c
>
> The two pause() sites in builtin-kwork.c are NOT affected because they
> do not register SIGCHLD or fork workload children; they only wait for
> user-initiated SIGINT/SIGTERM.
Thanks, applied to perf-tools-next, for v7.2.
- Arnaldo
© 2016 - 2026 Red Hat, Inc.