[PATCH] sched/fork: Fix timer_slack_ns inheritance for RT tasks

Xiao Feng via B4 Relay posted 1 patch 4 days, 9 hours ago
From: Xiao Feng <xiaofeng5@xiaomi.com>

Since commit ed4fb6d7ef68 ("hrtimer: Use and report correct timerslack
values for realtime tasks"), RT tasks have their timer_slack_ns set to 0
in __setscheduler_params(). This introduces two related problems when RT
tasks fork children that end up running as CFS:

Problem 1: sched_reset_on_fork

When an RT task with sched_reset_on_fork set forks a child:

1. dup_task_struct() copies timer_slack_ns (0) from the RT parent.
2. copy_process() sets:
     p->default_timer_slack_ns = current->timer_slack_ns (= 0)
3. sched_fork() demotes the child to SCHED_NORMAL but does not
   restore timer_slack_ns.

Result: CFS child has timer_slack_ns = 0 and default_timer_slack_ns = 0
permanently.
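The three steps above can be condensed into a small userspace model. This is a hypothetical sketch, not kernel code. struct task_model, MODEL_FIFO, fork_model_unpatched() and demo_problem1() are made-up names, and only the fields the steps mention are modelled. Priorities and nice values are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the task_struct fields involved.
 * This is a model for illustration, not kernel code. */
enum model_policy { MODEL_NORMAL, MODEL_FIFO };

struct task_model {
	enum model_policy policy;
	bool reset_on_fork;
	unsigned long timer_slack_ns;
	unsigned long default_timer_slack_ns;
};

/* Pre-patch fork path, following the three steps listed above. */
struct task_model fork_model_unpatched(const struct task_model *parent)
{
	/* 1. dup_task_struct(): the child starts as a copy of the parent,
	 *    so an RT parent hands down timer_slack_ns == 0. */
	struct task_model child = *parent;

	/* 2. copy_process(): the default is seeded from the current value. */
	child.default_timer_slack_ns = parent->timer_slack_ns;

	/* 3. sched_fork(): reset_on_fork demotes the child to normal and is
	 *    cleared in the child, but neither slack field is touched. */
	if (child.reset_on_fork) {
		child.policy = MODEL_NORMAL;
		child.reset_on_fork = false;
	}
	return child;
}

/* Problem 1: an RT parent with reset_on_fork whose default is 50us. */
void demo_problem1(void)
{
	struct task_model parent = {
		.policy = MODEL_FIFO,
		.reset_on_fork = true,
		.timer_slack_ns = 0,             /* zeroed for RT tasks */
		.default_timer_slack_ns = 50000, /* kept across the RT switch */
	};
	struct task_model child = fork_model_unpatched(&parent);

	assert(child.policy == MODEL_NORMAL);
	assert(child.timer_slack_ns == 0);
	assert(child.default_timer_slack_ns == 0);
}
```

The asserts in demo_problem1() encode the "Result" line above: the demoted child ends up with both slack values at 0, even though the parent still holds a 50us default.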

Problem 2: RT fork followed by later policy change

When an RT task forks a child (without reset_on_fork), the child inherits
RT policy with timer_slack_ns = 0. copy_process() sets
default_timer_slack_ns = current->timer_slack_ns (= 0). If the child is
later demoted to CFS via sched_setscheduler(), __setscheduler_params()
tries to restore timer_slack_ns from default_timer_slack_ns, but it is 0.

Result: same as above.
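The same kind of model can trace Problem 2. It is a hypothetical sketch: the names are made up, and setscheduler_model() only paraphrases the slack handling that the description above attributes to __setscheduler_params() since ed4fb6d7ef68 (zero the slack for RT/DL, restore it from the default when switching back). It is an assumption, not a copy of the kernel function.

```c
#include <assert.h>

/* Hypothetical, simplified model; not kernel code. */
enum model_policy { MODEL_NORMAL, MODEL_FIFO };

struct task_model {
	enum model_policy policy;
	unsigned long timer_slack_ns;
	unsigned long default_timer_slack_ns;
};

/* Assumed slack handling of __setscheduler_params(), paraphrased from the
 * patch description: RT zeroes the current value, and a switch back to
 * normal restores it from the default if it is 0. */
void setscheduler_model(struct task_model *p, enum model_policy policy)
{
	p->policy = policy;
	if (policy != MODEL_NORMAL)
		p->timer_slack_ns = 0;
	else if (p->timer_slack_ns == 0)
		p->timer_slack_ns = p->default_timer_slack_ns;
}

/* Pre-patch fork without reset_on_fork: the child stays RT. */
struct task_model fork_model_unpatched(const struct task_model *parent)
{
	struct task_model child = *parent;                     /* dup_task_struct() */
	child.default_timer_slack_ns = parent->timer_slack_ns; /* copy_process() */
	return child;
}

void demo_problem2(void)
{
	struct task_model parent = { MODEL_NORMAL, 50000, 50000 };

	setscheduler_model(&parent, MODEL_FIFO);  /* parent becomes RT */
	assert(parent.timer_slack_ns == 0);
	assert(parent.default_timer_slack_ns == 50000);

	struct task_model child = fork_model_unpatched(&parent);
	setscheduler_model(&child, MODEL_NORMAL); /* later demotion to CFS */
	assert(child.policy == MODEL_NORMAL);
	assert(child.timer_slack_ns == 0);        /* "restored" from a 0 default */
}
```

The parent's 50us default survives its own RT transition. It is lost only because the child's default is seeded from the parent's current value.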

Both problems prevent timer coalescing for these CFS tasks, causing
unnecessary wakeups and increased power consumption. Writing 0 to
/proc/pid/timerslack_ns also cannot restore a proper default.

Fix both issues:

1. In copy_process(), inherit default_timer_slack_ns from the parent's
   default_timer_slack_ns (which is preserved across RT transitions)
   instead of timer_slack_ns (which is 0 for RT tasks).

2. In sched_fork(), when sched_reset_on_fork demotes RT/DL to CFS,
   explicitly restore timer_slack_ns from the parent's
   default_timer_slack_ns, falling back to 50us if it is also 0.
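To check that the two changes cover both problems, here is a simplified model of the fork path with the diff applied. It is a hypothetical sketch. All names are made up, MODEL_FALLBACK_SLACK_NS stands for the 50us literal in the sched_fork() hunk, and setscheduler_model() is an assumed paraphrase of the __setscheduler_params() slack handling described above, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model; not kernel code. */
enum model_policy { MODEL_NORMAL, MODEL_FIFO };

struct task_model {
	enum model_policy policy;
	bool reset_on_fork;
	unsigned long timer_slack_ns;
	unsigned long default_timer_slack_ns;
};

#define MODEL_FALLBACK_SLACK_NS 50000UL /* the 50us fallback in the patch */

/* Fork path with both proposed changes applied. */
struct task_model fork_model_patched(const struct task_model *parent)
{
	struct task_model child = *parent; /* dup_task_struct() */

	/* Fix 1 (copy_process): inherit the parent's default rather than its
	 * current value, which is 0 while the parent is RT. */
	child.default_timer_slack_ns = parent->default_timer_slack_ns;

	/* Fix 2 (sched_fork): on an RT-to-normal reset, restore the slack. */
	if (child.reset_on_fork) {
		if (child.policy != MODEL_NORMAL) {
			child.policy = MODEL_NORMAL;
			child.timer_slack_ns = parent->default_timer_slack_ns ?
				parent->default_timer_slack_ns : MODEL_FALLBACK_SLACK_NS;
			child.default_timer_slack_ns = child.timer_slack_ns;
		}
		child.reset_on_fork = false;
	}
	return child;
}

/* Assumed __setscheduler_params() slack handling since ed4fb6d7ef68. */
void setscheduler_model(struct task_model *p, enum model_policy policy)
{
	p->policy = policy;
	if (policy != MODEL_NORMAL)
		p->timer_slack_ns = 0;
	else if (p->timer_slack_ns == 0)
		p->timer_slack_ns = p->default_timer_slack_ns;
}

void demo_fixes(void)
{
	/* Problem 1: reset_on_fork child of an RT parent. */
	struct task_model rt_reset = { MODEL_FIFO, true, 0, 50000 };
	struct task_model c1 = fork_model_patched(&rt_reset);
	assert(c1.policy == MODEL_NORMAL);
	assert(c1.timer_slack_ns == 50000 && c1.default_timer_slack_ns == 50000);

	/* Problem 2: RT child later demoted with sched_setscheduler(). */
	struct task_model rt = { MODEL_FIFO, false, 0, 50000 };
	struct task_model c2 = fork_model_patched(&rt);
	assert(c2.timer_slack_ns == 0); /* still RT, so still 0 */
	setscheduler_model(&c2, MODEL_NORMAL);
	assert(c2.timer_slack_ns == 50000);
}
```

In this model the non-reset RT child still runs with 0 slack while it is RT. Fix 1 only ensures that a later demotion has a non-zero default to restore.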

Fixes: ed4fb6d7ef68 ("hrtimer: Use and report correct timerslack values for realtime tasks")
Signed-off-by: Xiao Feng <xiaofeng5@xiaomi.com>
---
 kernel/fork.c       | 2 +-
 kernel/sched/core.c | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 5f3fdfdb14c7..c3ef5ebb3037 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2174,7 +2174,7 @@ __latent_entropy struct task_struct *copy_process(
 	retval = -EAGAIN;
 #endif
 
-	p->default_timer_slack_ns = current->timer_slack_ns;
+	p->default_timer_slack_ns = current->default_timer_slack_ns;
 
 #ifdef CONFIG_PSI
 	p->psi_flags = 0;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b8871449d3c6..9b63560a45de 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4710,6 +4710,8 @@ int sched_fork(u64 clone_flags, struct task_struct *p)
 			p->policy = SCHED_NORMAL;
 			p->static_prio = NICE_TO_PRIO(0);
 			p->rt_priority = 0;
+			p->timer_slack_ns = current->default_timer_slack_ns ?: 50000;
+			p->default_timer_slack_ns = p->timer_slack_ns;
 		} else if (PRIO_TO_NICE(p->static_prio) < 0)
 			p->static_prio = NICE_TO_PRIO(0);
 

---
base-commit: 27fa82620cbaa89a7fc11ac3057701d598813e87
change-id: 20260520-sched-fork-fix-timer-slack-6dfacb033d43

Best regards,
--  
Xiao Feng <xiaofeng5@xiaomi.com>
Re: [PATCH] sched/fork: Fix timer_slack_ns inheritance for RT tasks
Posted by K Prateek Nayak 4 days, 7 hours ago
Hello Xiao,

On 5/20/2026 6:37 PM, Xiao Feng via B4 Relay wrote:
> Both problems prevent timer coalescing for these CFS tasks, causing
> unnecessary wakeups and increased power consumption. Writing 0 to
> /proc/pid/timerslack_ns also cannot restore a proper default.
> 
> Fix both issues:
> 
> 1. In copy_process(), inherit default_timer_slack_ns from the parent's
>    default_timer_slack_ns (which is preserved across RT transitions)
>    instead of timer_slack_ns (which is 0 for RT tasks).

The man page for fork() [1] reads:

    The default timer slack value is set to the parent's current
    timer slack value.  See the description of PR_SET_TIMERSLACK in
    prctl(2).

And the description in the man page for PR_SET_TIMERSLACK [2] reads:

    When a new thread is created, the two timer slack values are made
    the same as the "current" value of the creating thread.


The two timer slack values that the man page refers to above are the
"default" and the "current" values, as described in its opening
statement.

From a documentation standpoint, the current code is doing the right
thing. An RT thread returns 0 for PR_GET_TIMERSLACK, and that same value
is set as both the "default" and the "current" value for its children.

[1] https://man7.org/linux/man-pages/man2/fork.2.html
[2] https://man7.org/linux/man-pages/man2/pr_set_timerslack.2const.html
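The documented rule can be observed from userspace for a non-RT parent. The following is a sketch that assumes Linux and glibc's prctl() wrapper. probe_child_slack() and struct child_slack are hypothetical names. It relies on PR_SET_TIMERSLACK with 0 resetting the "current" slack to the thread's "default" slack, which lets the child read back both values. It does not cover an RT parent, since switching to an RT policy needs CAP_SYS_NICE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result type: the slack values seen inside the child. */
struct child_slack {
	long current_ns; /* PR_GET_TIMERSLACK right after fork() */
	long default_ns; /* PR_GET_TIMERSLACK after resetting with 0 */
};

/* Sets the caller's "current" slack to slack_ns, forks, and lets the child
 * report its current and default slack through a pipe. Returns 0 on
 * success, -1 on error. The caller's slack is reset to its own default
 * before returning. Only meaningful for a non-RT caller. */
int probe_child_slack(unsigned long slack_ns, struct child_slack *out)
{
	int fds[2];
	pid_t pid;
	int ret = -1;

	assert(out != NULL);
	if (prctl(PR_SET_TIMERSLACK, slack_ns) != 0)
		return -1;
	if (pipe(fds) != 0)
		goto reset;

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		goto reset;
	}
	if (pid == 0) {
		struct child_slack s;

		close(fds[0]);
		s.current_ns = prctl(PR_GET_TIMERSLACK);
		/* Writing 0 resets "current" to this thread's "default". */
		prctl(PR_SET_TIMERSLACK, 0UL);
		s.default_ns = prctl(PR_GET_TIMERSLACK);
		_exit(write(fds[1], &s, sizeof(s)) == (ssize_t)sizeof(s) ? 0 : 1);
	}

	close(fds[1]);
	if (read(fds[0], out, sizeof(*out)) == (ssize_t)sizeof(*out))
		ret = 0;
	close(fds[0]);
	waitpid(pid, NULL, 0);
reset:
	prctl(PR_SET_TIMERSLACK, 0UL); /* back to our own default */
	return ret;
}
```

Under the documented behavior, probe_child_slack(123456, &s) should report 123456 for both values. Reading the fork.c hunk, with the patch applied the child's default would come from the parent's own default instead, so the two could differ even when the parent is a CFS task.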

> 
> 2. In sched_fork(), when sched_reset_on_fork demotes RT/DL to CFS,
>    explicitly restore timer_slack_ns from the parent's
>    default_timer_slack_ns, falling back to 50us if it is also 0.

As for SCHED_FLAG_RESET_ON_FORK, the man page for sched(7) [3] reads:

    More precisely, if the reset-on-fork flag is set, the following
    rules apply for subsequently created children:

    -  If the calling thread has a scheduling policy of SCHED_FIFO or
        SCHED_RR, the policy is reset to SCHED_OTHER in child
        processes.

    -  If the calling process has a negative nice value, the nice
        value is reset to zero in child processes.

    After the reset-on-fork flag has been enabled, it can be reset
    only if the thread has the CAP_SYS_NICE capability.  This flag is
    disabled in child processes created by fork(2).


Nowhere does it say that anything other than the scheduling policy and
the nice value is affected by this flag. What makes timer_slack special?

[3] https://man7.org/linux/man-pages/man7/sched.7.html

> 
> Fixes: ed4fb6d7ef68 ("hrtimer: Use and report correct timerslack values for realtime tasks")

Based on my reading, this is not fixing anything but instead introducing
a behavior change contrary to what has been currently documented.

If this is acceptable, at the very least the man pages need to be
updated to describe this new behavior and the kernel version that
introduces it.

I'll let others comment since they know these bits better than me.

-- 
Thanks and Regards,
Prateek