On 2026.02.09.07:47 Peter Zijlstra wrote:
> On Wed, Feb 04, 2026 at 03:45:58PM +0530, K Prateek Nayak wrote:
>
>> # Overflow on enqueue
>>
>> <...>-102371 [255] ... : __enqueue_entity: Overflowed cfs_rq:
>> <...>-102371 [255] ... : dump_h_overflow_cfs_rq: cfs_rq: depth(0) weight(90894772) nr_queued(2) sum_w_vruntime(0) sum_weight(0) zero_vruntime(701164930256050) sum_shift(0) avg_vruntime(701809615900788)
>> <...>-102371 [255] ... : dump_h_overflow_entity: se: weight(3508) vruntime(701809615900788) slice(2800000) deadline(701810568648095) curr?(1) task?(1) <-------- cfs_rq->curr
>> <...>-102371 [255] ... : __enqueue_entity: Overflowed se:
>> <...>-102371 [255] ... : dump_h_overflow_entity: se: weight(90891264) vruntime(701808975077099) slice(2800000) deadline(701808975109401) curr?(0) task?(0) <-------- new se
>
> So I spent a while trying to reproduce the splat, but alas.
>
> That said, I did spot something 'funny' in the above, note that
> zero_vruntime and avg_vruntime/curr->vruntime are significantly apart.
> That is not something that should happen. zero_vruntime is supposed to
> closely track avg_vruntime.
>
> That led me to hypothesise that there is a problem tracking
> zero_vruntime when there is but a single runnable task, and sure
> enough, I could reproduce that, albeit not at such a scale as to lead to
> such problems (probably too much noise on my machine).
>
> I ended up with the below; and I've already pushed out a fresh
> queue/sched/core. Could you please test again?

I tested this "V2". The CPU migration times test results are not good.
We expect the sample time to not deviate from the nominal 1 second
by more than 10 milliseconds for this test. The test ran for about
13 hours and 41 minutes (49,243 samples). Histogram of times:

kernel: 6.19.0-rc8-pz-v2 gov: powersave HWP: enabled
1.000, 29206
1.001, 19598
1.002, 19
1.003, 15
1.004, 32
1.005, 25
1.006, 3
1.007, 3
1.008, 5
1.009, 5
1.010, 13
1.011, 14
1.012, 13
1.013, 6
1.014, 10
1.015, 16
1.016, 54
1.017, 116
1.018, 57
1.019, 7
1.020, 2
1.021, 1
1.023, 2
1.024, 4
1.025, 7
1.026, 1
1.027, 1
1.028, 1
1.029, 1
1.030, 2
1.037, 1
Total: 49240 : Total >= 10 mSec: 329 ( 0.67 percent)

For reference, previous test results are copied and pasted below.

Step 1: Confirm where we left off a year ago:

The exact same kernel from a year ago, the one we ended up happy with, was used.

doug@s19:~/tmp/peterz/6.19/turbo$ cat 613.his
Kernel: 6.13.0-stock gov: powersave HWP: enabled
1.000000, 23195
1.001000, 10897
1.002000, 49
1.003000, 23
1.004000, 21
1.005000, 9
Total: 34194 : Total >= 10 mSec: 0 ( 0.00 percent)

So, over 9 hours, the nominal sample time was never exceeded by more than 5 milliseconds. Very good.

Step 2: Take a baseline sample before this patch set:

Mainline kernel 6.19-rc1 was used:

doug@s19:~/tmp/peterz/6.19/turbo$ cat rc1.his
Kernel: 6.19.0-rc1-stock gov: powersave HWP: enabled
1.000000, 19509
1.001000, 10430
1.002000, 32
1.003000, 19
1.004000, 24
1.005000, 13
1.006000, 9
1.007000, 4
1.008000, 3
1.009000, 4
1.010000, 6
1.011000, 2
1.012000, 1
1.013000, 4
1.014000, 10
1.015000, 10
1.016000, 7
1.017000, 10
1.018000, 20
1.019000, 12
1.020000, 5
1.021000, 3
1.022000, 1
1.023000, 2
1.024000, 2   <<< Clamped. Actually 26 and 25 milliseconds
Total: 30142 : Total >= 10 mSec: 95 ( 0.32 percent)

What!!! Over 8 hours. It seems something has regressed over the last year.
Our threshold of 10 milliseconds was rather arbitrary.
Step 3: This patch set [V1], from Peter's git tree:

doug@s19:~/tmp/peterz/6.19/turbo$ cat 02.his
kernel: 6.19.0-rc1-pz gov: powersave HWP: enabled
1.000000, 19139
1.001000, 9532
1.002000, 19
1.003000, 17
1.004000, 8
1.005000, 3
1.006000, 2
1.009000, 1
Total: 28721 : Total >= 10 mSec: 0 ( 0.00 percent)

Just about 8 hours, and never a time >= our arbitrary threshold of 10 milliseconds. So, good.

My test computer also hung under the very heavy load test,
albeit at a higher load than before.
There was no log information that I could find after the reboot.

References:
https://lore.kernel.org/lkml/000d01dc939e$0fc99fe0$2f5cdfa0$@telus.net/
https://lore.kernel.org/lkml/005f01db5a44$3bb698e0$b323caa0$@telus.net/
https://lore.kernel.org/lkml/004a01dc952b$471c94a0$d555bde0$@telus.net/
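[A minimal sketch of the kind of measurement the histograms above summarize, assuming the test simply sleeps for a nominal 1 second and buckets the observed sample time in 1 ms bins; this is an illustration only, not Doug's actual test program, which additionally exercises CPU migrations and runs for many hours.]

#include <stdio.h>
#include <time.h>

#define BINS 100	/* 1.000 s .. 1.099 s; clamp anything beyond */

int main(void)
{
	static unsigned long hist[BINS];
	const struct timespec one = { .tv_sec = 1, .tv_nsec = 0 };
	struct timespec a, b;

	for (int i = 0; i < 3600; i++) {	/* the real runs last many hours */
		clock_gettime(CLOCK_MONOTONIC, &a);
		nanosleep(&one, NULL);		/* nominal 1 second sample */
		clock_gettime(CLOCK_MONOTONIC, &b);

		long long ns = (b.tv_sec - a.tv_sec) * 1000000000LL +
			       (b.tv_nsec - a.tv_nsec);
		long bin = ns / 1000000 - 1000;	/* deviation from nominal, in ms */
		if (bin < 0)
			bin = 0;
		if (bin >= BINS)
			bin = BINS - 1;
		hist[bin]++;
	}

	for (int i = 0; i < BINS; i++)
		if (hist[i])
			printf("1.%03d, %lu\n", i, hist[i]);
	return 0;
}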
On Tue, Feb 10, 2026 at 07:41:58AM -0800, Doug Smythies wrote:
> On 2026.02.09.07:47 Peter Zijlstra wrote:
> > On Wed, Feb 04, 2026 at 03:45:58PM +0530, K Prateek Nayak wrote:
> >
> >> # Overflow on enqueue
> >>
> >> <...>-102371 [255] ... : __enqueue_entity: Overflowed cfs_rq:
> >> <...>-102371 [255] ... : dump_h_overflow_cfs_rq: cfs_rq: depth(0) weight(90894772) nr_queued(2) sum_w_vruntime(0) sum_weight(0) zero_vruntime(701164930256050) sum_shift(0) avg_vruntime(701809615900788)
> >> <...>-102371 [255] ... : dump_h_overflow_entity: se: weight(3508) vruntime(701809615900788) slice(2800000) deadline(701810568648095) curr?(1) task?(1) <-------- cfs_rq->curr
> >> <...>-102371 [255] ... : __enqueue_entity: Overflowed se:
> >> <...>-102371 [255] ... : dump_h_overflow_entity: se: weight(90891264) vruntime(701808975077099) slice(2800000) deadline(701808975109401) curr?(0) task?(0) <-------- new se
> >
> > So I spent a while trying to reproduce the splat, but alas.
> >
> > That said, I did spot something 'funny' in the above, note that
> > zero_vruntime and avg_vruntime/curr->vruntime are significantly apart.
> > That is not something that should happen. zero_vruntime is supposed to
> > closely track avg_vruntime.
> >
> > That led me to hypothesise that there is a problem tracking
> > zero_vruntime when there is but a single runnable task, and sure
> > enough, I could reproduce that, albeit not at such a scale as to lead to
> > such problems (probably too much noise on my machine).
> >
> > I ended up with the below; and I've already pushed out a fresh
> > queue/sched/core. Could you please test again?
>
> I tested this "V2". The CPU migration times test results are not good.
> We expect the sample time to not deviate from the nominal 1 second
> by more than 10 milliseconds for this test. The test ran for about
> 13 hours and 41 minutes (49,243 samples). Histogram of times:
>
> It seems something has regressed over the last year.
> Our threshold of 10 milliseconds was rather arbitrary.

Moo.. I'll go dig out that benchmark too.
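[The trace quoted above already shows why the drift matters arithmetically. Below is a rough check, assuming the avg_vruntime bookkeeping accumulates weight * (vruntime - zero_vruntime) terms in a signed 64-bit sum; the values are taken from the dump lines above, the variable names and the exact accumulation are assumptions, not the kernel source.]

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* Values from the dump_h_overflow lines quoted above. */
	int64_t zero_vruntime = 701164930256050LL;	/* cfs_rq zero_vruntime */
	int64_t vruntime      = 701808975077099LL;	/* new se vruntime */
	int64_t weight        = 90891264LL;		/* new se weight */

	int64_t delta = vruntime - zero_vruntime;	/* ~6.4e11, the drift */
	__int128 term = (__int128)weight * delta;	/* true, un-truncated product */

	printf("delta           = %lld\n", (long long)delta);
	printf("weight * delta ~= %.3e\n", (double)term);
	printf("INT64_MAX      ~= %.3e\n", (double)INT64_MAX);
	printf("fits in s64: %s\n",
	       term <= (__int128)INT64_MAX ? "yes" : "no");
	return 0;
}

[With zero_vruntime closely tracking avg_vruntime the delta stays small and the product stays well inside 64 bits; with the drift seen in the dump the product is several times past INT64_MAX, which is consistent with the "Overflowed se" splat.]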
Hello Doug,
On 2/10/2026 9:11 PM, Doug Smythies wrote:
> My test computer also hung under the heavy heavy load test,
> albeit at a higher load than before.
> There was no log information that I could find after the re-boot.
Could you run the same scenario with PARANOID_AVG:
echo PARANOID_AVG > /sys/kernel/debug/sched/features
and, once you are past the point where the system would usually have
hung, check whether the "sum_shift" values reported for the cfs_rqs in
debugfs have changed to some non-zero value:
grep "shift.*: [^0]$" /sys/kernel/debug/sched/debug
I'm assuming this is the same "yes" x 12500 copies bomb that failed.
Let me see if I can reproduce this on my setup by leaving it going
overnight on a limited cpuset.
Since you mentioned there is some bound to the number of copies when
the hang is observed, can you please share your system details and
the number of CPUs it has?
--
Thanks and Regards,
Prateek
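[For anyone wanting to approximate the load being discussed: the reproduction is simply N copies of yes(1), 12500 in the report above. A rough C stand-in, assuming all that matters is keeping that many always-runnable tasks on the CPUs under test:]

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	int n = argc > 1 ? atoi(argv[1]) : 12500;	/* number of busy tasks */

	for (int i = 0; i < n; i++) {
		pid_t pid = fork();

		if (pid < 0) {
			perror("fork");
			return 1;
		}
		if (pid == 0) {
			for (;;)	/* stay runnable forever, like yes > /dev/null */
				;
		}
	}
	pause();	/* parent sleeps; kill the process group to clean up */
	return 0;
}

[Running it inside a limited cpuset, as suggested above, keeps the rest of the machine usable while the over-subscribed CPUs soak.]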
On Tue, Feb 10, 2026 at 11:39:14PM +0530, K Prateek Nayak wrote:

> Since you mentioned there is some bound to the number of copies when
> the hang is observed, can you please share your system details and
> the number of CPUs it has?

He had that information in another email: 6 cores/12 threads.

For me, I got to 21160 spread over 12 threads when it went boom. That
translates to around 1763 per CPU. Which is a fairly silly number of
active tasks, but it *should* work.

It's a bit tedious to reproduce, but I'll prod at it some too.
On 2/11/2026 12:05 AM, Peter Zijlstra wrote:
> On Tue, Feb 10, 2026 at 11:39:14PM +0530, K Prateek Nayak wrote:
>
>> Since you mentioned there is some bound to the number of copies when
>> the hang is observed, can you please share your system details and
>> the number of CPUs it has?
>
> He had that information in another email: 6 cores/12 threads.

Ah! My bad. I should have dug deeper.

> For me, I got to 21160 spread over 12 threads when it went boom. That
> translates to around 1763 per CPU. Which is a fairly silly number of
> active tasks, but it *should* work.
>
> It's a bit tedious to reproduce, but I'll prod at it some too.

My machine hasn't cracked yet with 30k instances. I too will prod more
tomorrow.

--
Thanks and Regards,
Prateek