On 2026.02.09.07:47 Peter Zijlstra wrote:
> On Wed, Feb 04, 2026 at 03:45:58PM +0530, K Prateek Nayak wrote:
>
>> # Overflow on enqueue
>>
>> <...>-102371 [255] ... : __enqueue_entity: Overflowed cfs_rq:
>> <...>-102371 [255] ... : dump_h_overflow_cfs_rq: cfs_rq: depth(0) weight(90894772) nr_queued(2) sum_w_vruntime(0) sum_weight(0) zero_vruntime(701164930256050) sum_shift(0) avg_vruntime(701809615900788)
>> <...>-102371 [255] ... : dump_h_overflow_entity: se: weight(3508) vruntime(701809615900788) slice(2800000) deadline(701810568648095) curr?(1) task?(1) <-------- cfs_rq->curr
>> <...>-102371 [255] ... : __enqueue_entity: Overflowed se:
>> <...>-102371 [255] ... : dump_h_overflow_entity: se: weight(90891264) vruntime(701808975077099) slice(2800000) deadline(701808975109401) curr?(0) task?(0) <-------- new se
>
> So I spent a while trying to reproduce the splat, but alas.
>
> That said, I did spot something 'funny' in the above, note that
> zero_vruntime and avg_vruntime/curr->vruntime are significantly apart.
> That is not something that should happen. zero_vruntime is supposed to
> closely track avg_vruntime.
>
> That led me to hypothesise that there is a problem tracking
> zero_vruntime when there is but a single runnable task, and sure
> enough, I could reproduce that, albeit not at such a scale as to lead to
> such problems (probably too much noise on my machine).
>
> I ended up with the below; and I've already pushed out a fresh
> queue/sched/core. Could you please test again?

I tested this "V2". The CPU migration times test results are not good.
We expect the sample time to not deviate from the nominal 1 second
by more than 10 milliseconds for this test. The test ran for about
13 hours and 41 minutes (49,243 samples). Histogram of times:

kernel: 6.19.0-rc8-pz-v2 gov: powersave HWP: enabled
1.000, 29206
1.001, 19598
1.002, 19
1.003, 15
1.004, 32
1.005, 25
1.006, 3
1.007, 3
1.008, 5
1.009, 5
1.010, 13
1.011, 14
1.012, 13
1.013, 6
1.014, 10
1.015, 16
1.016, 54
1.017, 116
1.018, 57
1.019, 7
1.020, 2
1.021, 1
1.023, 2
1.024, 4
1.025, 7
1.026, 1
1.027, 1
1.028, 1
1.029, 1
1.030, 2
1.037, 1
Total: 49240 : Total >= 10 mSec: 329 ( 0.67 percent)

For reference, previous test results are copied and pasted below.

Step 1: Confirm where we left off a year ago:

The exact same kernel from a year ago, the one we ended up happy with, was used.

doug@s19:~/tmp/peterz/6.19/turbo$ cat 613.his
Kernel: 6.13.0-stock gov: powersave HWP: enabled
1.000000, 23195
1.001000, 10897
1.002000, 49
1.003000, 23
1.004000, 21
1.005000, 9
Total: 34194 : Total >= 10 mSec: 0 ( 0.00 percent)

So, over 9 hours, the nominal sample time was never exceeded by more than 5 milliseconds. Very good.

Step 2: Take a baseline sample before this patch set:

Mainline kernel 6.19-rc1 was used:

doug@s19:~/tmp/peterz/6.19/turbo$ cat rc1.his
Kernel: 6.19.0-rc1-stock gov: powersave HWP: enabled
1.000000, 19509
1.001000, 10430
1.002000, 32
1.003000, 19
1.004000, 24
1.005000, 13
1.006000, 9
1.007000, 4
1.008000, 3
1.009000, 4
1.010000, 6
1.011000, 2
1.012000, 1
1.013000, 4
1.014000, 10
1.015000, 10
1.016000, 7
1.017000, 10
1.018000, 20
1.019000, 12
1.020000, 5
1.021000, 3
1.022000, 1
1.023000, 2
1.024000, 2   <<< Clamped. Actually 26 and 25 milliseconds
Total: 30142 : Total >= 10 mSec: 95 ( 0.32 percent)

What!!! Over 8 hours. It seems something has regressed over the last year.
Our threshold of 10 milliseconds was rather arbitrary.
Step 3: This patch set [V1], from Peter's git tree:

doug@s19:~/tmp/peterz/6.19/turbo$ cat 02.his
kernel: 6.19.0-rc1-pz gov: powersave HWP: enabled
1.000000, 19139
1.001000, 9532
1.002000, 19
1.003000, 17
1.004000, 8
1.005000, 3
1.006000, 2
1.009000, 1
Total: 28721 : Total >= 10 mSec: 0 ( 0.00 percent)

Just about 8 hours, and never a time >= our arbitrary threshold of 10 milliseconds. So, good.

My test computer also hung under the very heavy load test,
albeit at a higher load than before.
There was no log information that I could find after the reboot.

References:
https://lore.kernel.org/lkml/000d01dc939e$0fc99fe0$2f5cdfa0$@telus.net/
https://lore.kernel.org/lkml/005f01db5a44$3bb698e0$b323caa0$@telus.net/
https://lore.kernel.org/lkml/004a01dc952b$471c94a0$d555bde0$@telus.net/
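[A minimal sketch of the kind of measurement the histograms above summarize, assuming the test simply sleeps for a nominal 1 second and buckets the observed sample time in 1 ms bins; this is an illustration only, not Doug's actual test program, which additionally exercises CPU migrations and runs for many hours.]

#include <stdio.h>
#include <time.h>

#define BINS 100	/* 1.000 s .. 1.099 s; clamp anything beyond */

int main(void)
{
	static unsigned long hist[BINS];
	const struct timespec one = { .tv_sec = 1, .tv_nsec = 0 };
	struct timespec a, b;

	for (int i = 0; i < 3600; i++) {	/* the real runs last many hours */
		clock_gettime(CLOCK_MONOTONIC, &a);
		nanosleep(&one, NULL);		/* nominal 1 second sample */
		clock_gettime(CLOCK_MONOTONIC, &b);

		long long ns = (b.tv_sec - a.tv_sec) * 1000000000LL +
			       (b.tv_nsec - a.tv_nsec);
		long bin = ns / 1000000 - 1000;	/* deviation from nominal, in ms */
		if (bin < 0)
			bin = 0;
		if (bin >= BINS)
			bin = BINS - 1;
		hist[bin]++;
	}

	for (int i = 0; i < BINS; i++)
		if (hist[i])
			printf("1.%03d, %lu\n", i, hist[i]);
	return 0;
}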
On Tue, Feb 10, 2026 at 07:41:58AM -0800, Doug Smythies wrote:
> On 2026.02.09.07:47 Peter Zijlstra wrote:
> > On Wed, Feb 04, 2026 at 03:45:58PM +0530, K Prateek Nayak wrote:
> >
> >> # Overflow on enqueue
> >>
> >> <...>-102371 [255] ... : __enqueue_entity: Overflowed cfs_rq:
> >> <...>-102371 [255] ... : dump_h_overflow_cfs_rq: cfs_rq: depth(0) weight(90894772) nr_queued(2) sum_w_vruntime(0) sum_weight(0) zero_vruntime(701164930256050) sum_shift(0) avg_vruntime(701809615900788)
> >> <...>-102371 [255] ... : dump_h_overflow_entity: se: weight(3508) vruntime(701809615900788) slice(2800000) deadline(701810568648095) curr?(1) task?(1) <-------- cfs_rq->curr
> >> <...>-102371 [255] ... : __enqueue_entity: Overflowed se:
> >> <...>-102371 [255] ... : dump_h_overflow_entity: se: weight(90891264) vruntime(701808975077099) slice(2800000) deadline(701808975109401) curr?(0) task?(0) <-------- new se
> >
> > So I spent a while trying to reproduce the splat, but alas.
> >
> > That said, I did spot something 'funny' in the above, note that
> > zero_vruntime and avg_vruntime/curr->vruntime are significantly apart.
> > That is not something that should happen. zero_vruntime is supposed to
> > closely track avg_vruntime.
> >
> > That led me to hypothesise that there is a problem tracking
> > zero_vruntime when there is but a single runnable task, and sure
> > enough, I could reproduce that, albeit not at such a scale as to lead to
> > such problems (probably too much noise on my machine).
> >
> > I ended up with the below; and I've already pushed out a fresh
> > queue/sched/core. Could you please test again?
>
> I tested this "V2". The CPU migration times test results are not good.
> We expect the sample time to not deviate from the nominal 1 second
> by more than 10 milliseconds for this test. The test ran for about
> 13 hours and 41 minutes (49,243 samples). Histogram of times:
>
> It seems something has regressed over the last year.
> Our threshold of 10 milliseconds was rather arbitrary.

Moo.. I'll go dig out that benchmark too.
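[The trace quoted above already shows why the drift matters arithmetically. Below is a rough check, assuming the avg_vruntime bookkeeping accumulates weight * (vruntime - zero_vruntime) terms in a signed 64-bit sum; the values are taken from the dump lines above, the variable names and the exact accumulation are assumptions, not the kernel source.]

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* Values from the dump_h_overflow lines quoted above. */
	int64_t zero_vruntime = 701164930256050LL;	/* cfs_rq zero_vruntime */
	int64_t vruntime      = 701808975077099LL;	/* new se vruntime */
	int64_t weight        = 90891264LL;		/* new se weight */

	int64_t delta = vruntime - zero_vruntime;	/* ~6.4e11, the drift */
	__int128 term = (__int128)weight * delta;	/* true, un-truncated product */

	printf("delta           = %lld\n", (long long)delta);
	printf("weight * delta ~= %.3e\n", (double)term);
	printf("INT64_MAX      ~= %.3e\n", (double)INT64_MAX);
	printf("fits in s64: %s\n",
	       term <= (__int128)INT64_MAX ? "yes" : "no");
	return 0;
}

[With zero_vruntime closely tracking avg_vruntime the delta stays small and the product stays well inside 64 bits; with the drift seen in the dump the product is several times past INT64_MAX, which is consistent with the "Overflowed se" splat.]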
Hello Doug,
On 2/10/2026 9:11 PM, Doug Smythies wrote:
> My test computer also hung under the heavy heavy load test,
> albeit at a higher load than before.
> There was no log information that I could find after the re-boot.
Could you run the same scenario with PARANOID_AVG:
echo PARANOID_AVG > /sys/kernel/debug/sched/features
and, once you are past the point where the system would usually have
hung, check whether the "sum_shift" values reported for the cfs_rqs in
debugfs have changed to some non-zero value:
grep "shift.*: [^0]$" /sys/kernel/debug/sched/debug
I'm assuming this is the same "yes" x 12500 copies bomb that failed.
Let me see if I can reproduce this on my setup by leaving it going
overnight on a limited cpuset.
Since you mentioned there is some bound to the number of copies when
the hang is observed, can you please share your system details and
the number of CPUs it has?
--
Thanks and Regards,
Prateek
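[For anyone wanting to approximate the load being discussed: the reproduction is simply N copies of yes(1), 12500 in the report above. A rough C stand-in, assuming all that matters is keeping that many always-runnable tasks on the CPUs under test:]

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	int n = argc > 1 ? atoi(argv[1]) : 12500;	/* number of busy tasks */

	for (int i = 0; i < n; i++) {
		pid_t pid = fork();

		if (pid < 0) {
			perror("fork");
			return 1;
		}
		if (pid == 0) {
			for (;;)	/* stay runnable forever, like yes > /dev/null */
				;
		}
	}
	pause();	/* parent sleeps; kill the process group to clean up */
	return 0;
}

[Running it inside a limited cpuset, as suggested above, keeps the rest of the machine usable while the over-subscribed CPUs soak.]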
On Tue, Feb 10, 2026 at 11:39:14PM +0530, K Prateek Nayak wrote:

> Since you mentioned there is some bound to the number of copies when
> the hang is observed, can you please share your system details and
> the number of CPUs it has?

He had that information in another email: 6 cores/12 threads.

For me, I got to 21160 spread over 12 threads when it went boom. That
translates to around 1763 per CPU. Which is a fairly silly number of
active tasks, but it *should* work.

It's a bit tedious to reproduce, but I'll prod at it some too.
On 2/11/2026 12:05 AM, Peter Zijlstra wrote:
> On Tue, Feb 10, 2026 at 11:39:14PM +0530, K Prateek Nayak wrote:
>
>> Since you mentioned there is some bound to the number of copies when
>> the hang is observed, can you please share your system details and
>> the number of CPUs it has?
>
> He had that information in another email: 6 cores/12 threads.

Ah! My bad. I should have dug deeper.

> For me, I got to 21160 spread over 12 threads when it went boom. That
> translates to around 1763 per CPU. Which is a fairly silly number of
> active tasks, but it *should* work.
>
> It's a bit tedious to reproduce, but I'll prod at it some too.

My machine hasn't cracked yet with 30k instances. I too will prod more
tomorrow.

--
Thanks and Regards,
Prateek