On 2025-10-22 14:57:28 [+0200], Thomas Gleixner wrote:
> Time slice extensions are an attempt to provide opportunistic priority
> ceiling without the overhead of an actual priority ceiling protocol, but
> also without the guarantees such a protocol provides.
>
> The intent is to avoid situations where a user space thread is interrupted
> in a critical section and scheduled out, while holding a resource on which
> the preempting thread or other threads in the system might block on. That
> obviously prevents those threads from making progress in the worst case for
> at least a full time slice. Especially in the context of user space
> spinlocks, which are a patently bad idea to begin with, but that's also
> true for other mechanisms.
I've been playing with it a bit with RT enabled and started to debug
this:
| slice_test-2903 [001] d.h.. 2313.285439: local_timer_entry: vector=236
| slice_test-2903 [001] d.h1. 2313.285440: hrtimer_cancel: hrtimer=000000000507e6d5
| slice_test-2903 [001] d.h.. 2313.285440: hrtimer_expire_entry: hrtimer=000000000507e6d5 function=tick_nohz_handler now=2313208001152
| slice_test-2903 [001] d.h1. 2313.285449: sched_stat_runtime: comm=slice_test pid=2903 runtime=3982905 [ns]
| slice_test-2903 [001] dlh.. 2313.285452: softirq_raise: vec=7 [action=SCHED]
| slice_test-2903 [001] dlh.. 2313.285452: hrtimer_expire_exit: hrtimer=000000000507e6d5
| slice_test-2903 [001] dlh1. 2313.285452: hrtimer_start: hrtimer=000000000507e6d5 function=tick_nohz_handler expires=2313212000000 softexpires=2313212000000 mode=ABS
| slice_test-2903 [001] dlh.. 2313.285453: local_timer_exit: vector=236
| slice_test-2903 [001] dl.2. 2313.285453: sched_waking: comm=ksoftirqd/1 pid=32 prio=120 target_cpu=001
| slice_test-2903 [001] dl.3. 2313.285456: sched_wakeup: comm=ksoftirqd/1 pid=32 prio=120 target_cpu=001
| slice_test-2903 [001] d.... 2313.285457: irqentry_exit: rseq_grant_slice_extension(216)
granting the extension and removing the lazy wakup. We are still on
return from IRQ but the 'h' flag has been already removed…
| slice_test-2903 [001] d..1. 2313.285458: hrtimer_start: hrtimer=0000000030a688cc function=rseq_slice_expired expires=2313208047790 softexpires=2313208047790 mode=ABS|PINNED|HARD
| slice_test-2903 [001] d.... 2313.285458: __rseq_arm_slice_extension_timer: timer
| slice_test-2903 [001] d..2. 2313.285484: hrtimer_cancel: hrtimer=0000000030a688cc
extension granted, timer started and revoked and set need resched.
| slice_test-2903 [001] dN.2. 2313.285487: sched_stat_runtime: comm=slice_test pid=2903 runtime=36886 [ns]
This is coming from schedule() already. It took me a while since I was
hunting a missing clear of need-resched.
| slice_test-2903 [001] d..2. 2313.285489: sched_switch: prev_comm=slice_test prev_pid=2903 prev_prio=120 prev_state=R+ ==> next_comm=ksoftirqd/1 next_pid=32 next_prio=120
| ksoftirqd/1-32 [001] ..s.1 2313.285490: softirq_entry: vec=7 [action=SCHED]
| ksoftirqd/1-32 [001] ..s.1 2313.285501: softirq_exit: vec=7 [action=SCHED]
| ksoftirqd/1-32 [001] d..2. 2313.285502: sched_stat_runtime: comm=ksoftirqd/1 pid=32 runtime=16438 [ns]
| ksoftirqd/1-32 [001] d..2. 2313.285503: sched_switch: prev_comm=ksoftirqd/1 prev_pid=32 prev_prio=120 prev_state=S ==> next_comm=slice_test next_pid=2904 next_prio=120
| slice_test-2904 [001] ..... 2313.285507: sys_enter: NR 230 (1, 0, 7f4692c7baa0, 0, 0, 0)
| slice_test-2904 [001] ..... 2313.285507: hrtimer_setup: hrtimer=00000000f2d53899 clockid=CLOCK_MONOTONIC mode=REL
| slice_test-2904 [001] d..1. 2313.285507: hrtimer_start: hrtimer=00000000f2d53899 function=hrtimer_wakeup expires=2313208168792 softexpires=2313208118792 mode=REL
| slice_test-2904 [001] d..2. 2313.285508: sched_stat_runtime: comm=slice_test pid=2904 runtime=6149 [ns]
| slice_test-2904 [001] d..2. 2313.285510: sched_switch: prev_comm=slice_test prev_pid=2904 prev_prio=120 prev_state=S ==> next_comm=slice_test next_pid=2903 next_prio=120
| slice_test-2903 [001] ..... 2313.285510: sys_enter: NR 470 (7fffc04f1ff0, c350, 11a0e0, 0, 7f4692e99000, 0)
slice_test-2903 enters _now_ rseq_slice_yield() so it must have been in
userland during the suppressed wake up at 2313.285457.
But a few iterations later it turns at out this trace event is recorded
_after_ the rseq magic happens at sys_enter time. We entered
rseq_slice_yield() a few cycles after the extension was granted. Buh.
So it seems to work as intended but it is not obvious tell from tracing
why it does not work.
Sebastian