On 2026.02.13 02:50 Peter Zijlstra wrote:
> On Fri, Feb 13, 2026 at 07:44:24AM +0100, Peter Zijlstra wrote:
>> As to wrapper, I just went through math64.h and it appears we have
>> div64_long() that might just DTRT, but I really need to go wake up
>> first.
>>
>> And as you noted, the current branch doesn't boot :/ No idea what I
>> messed up last night, but I did push without test building. I only
>> folded those two division fixes and figured what could possibly go wrong
>> :-)
>
> It's now got div64_long() throughout.
>
> I've built and booted each commit in a vm; built and booted the combined
> stack on 2 different physical machines and re-ran the various
> benchmarks.
>
> Works-for-me.
Works for me also.
Note: I am calling this "V5" (version 5).
But: please consider whether or not there is an issue with test 3 below,
detailed mainly in the attached graphs.
Testing order was from least to most time consuming.
1.) Phoronix version of hackbench, same settings as my previous reports:
Run 1 of 2, 10 tests per run: Average: 22.685 Seconds; Deviation: 0.27%
Run 2 of 2, 10 tests per run: Average: 22.775 Seconds; Deviation: 0.24%
Conclusion: Pass.
2.) a ridiculous number of threads test. Each thread mostly sleeps.
Note 1: Not previously mentioned, but I have been doing this test for years, just to see if it works.
Note 2: Other default limits need to be increased for this test. Example:
doug@s19:~$ cat /sys/fs/cgroup/user.slice/user-1000.slice/pids.max
84080
doug@s19:~$ echo 400000 | sudo tee /sys/fs/cgroup/user.slice/user-1000.slice/pids.max
400000
doug@s19:~$ cat /sys/fs/cgroup/user.slice/user-1000.slice/pids.max
400000
Note 3: The maximum number of threads attempted is determined by the amount of system memory.
In my case the memory limit is around 220,000 threads.
Note 4: There never was an issue with this test, even for previous versions of this patch set.
Details:
doug@s19:~$ cat /sys/fs/cgroup/user.slice/user-1000.slice/pids.current
220015
doug@s19:~$ uptime
08:22:23 up 26 min, 4 users, load average: 0.99, 15.19, 36.38
doug@s19:~$ free -m
               total        used        free      shared  buff/cache   available
Mem:           31928       28767        2985           5         568        3161
Swap:           8191           0        8191
Conclusion: Pass.
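The spawning side of this test isn't shown in the thread; a scaled-down sketch of the same idea (lots of background tasks that mostly sleep) might look like the following. The count of 10 and the 300-second sleep are placeholders, not the actual harness:

```shell
# Toy version of the "huge number of mostly-sleeping threads" test,
# scaled down to 10 background sleepers. The real run reached ~220,000
# tasks and needed the pids.max bump shown above.
N=10
for i in $(seq 1 "$N"); do
    sleep 300 &
done
count=$(jobs -p | wc -l)
echo "spawned $count sleepers"
# Clean up: terminate and reap all the sleepers.
kill $(jobs -p)
wait
```

At the real scale, memory for the kernel stacks and task structs becomes the binding constraint, which is why the ~220,000 figure above tracks system RAM rather than pids.max.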
3.) a ridiculous load. Each thread is 100% load, no sleep. 20,000 X yes:
Conclusion: Pass?
Observation: The spin-out rate of tasks is "clunky", not smooth. It used to be smooth.
A couple of graphs are attached. Note that actual sample times are now used,
after a nominal sleep of 2 seconds between samples. Sometimes the actual
gap is over 1 minute. It takes considerably longer, 2,200 seconds versus
1,309 seconds, to spin out the 20,000 tasks for V5 versus kernel 6.19-rc8.
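For reference, a scaled-down sketch of what this test plausibly does (the actual harness is not shown in the thread): spawn N pure busy-loop `yes` tasks, then terminate them all and time how long it takes to reap ("spin out") them. N=50 here stands in for the real 20,000:

```shell
# Hypothetical scaled-down reconstruction of test 3; the real run used
# 20,000 tasks. Each "yes > /dev/null" is a pure busy loop (100% load).
N=50
for i in $(seq 1 "$N"); do
    yes > /dev/null &
done

start=$(date +%s)
kill $(jobs -p)   # ask every busy task to exit...
wait              # ...and reap them all
end=$(date +%s)
echo "spun out $N tasks in $((end - start)) seconds"
```

At 20,000 tasks the interesting quantity is the shape of the decay curve (smooth versus clunky), which is what the attached graphs plot.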
4.) Back to the original complaint from a year ago [1]: With a ridiculous load, is there a big delay for SSH login?
Note 1: I don't care either way, I just tried it.
Details: SSH login times were all between 6 and 8 seconds.
Conclusion: Pass.
5.) Under 100.0% load (in my case 12 X yes) are there unreasonably long times between samples for turbostat?
Histograms (3 hours and 39 minutes test run time):
Peter's cool way:
$ gawk '/^usec/ {next}
        { if (T) { d = $2 - T; bucket[int(d*1000)]++ } T = $2 }
        END { for (i in bucket) { printf "%0.3f: %d\n", i/1000, bucket[i] } }' \
      < /dev/shm/turbo.log
1.000: 6870
1.001: 5647
1.002: 25
1.003: 6
1.005: 1
My old way: Not done.
[1] https://lore.kernel.org/lkml/002401dbb6bd$4527ec00$cf77c400$@telus.net/
On 2026.02.13 22:31 Doug Smythies wrote:
> On 2026.02.13 02:50 Peter Zijlstra wrote:
>> On Fri, Feb 13, 2026 at 07:44:24AM +0100, Peter Zijlstra wrote:
> ... snip ...
> 3.) a ridiculous load. Each thread is 100% load, no sleep. 20,000 X yes:
> Conclusion: Pass?
> Observation: The spin-out rate of tasks is "clunky", not smooth. It used to be smooth.
> A couple of graphs are attached. Note that actual sample times are now used,
> after a nominal sleep of 2 seconds between samples. Sometimes the actual
> gap is over 1 minute. It takes considerably longer, 2,200 seconds versus
> 1,309 seconds, to spin out the 20,000 tasks for V5 versus kernel 6.19-rc8.

Just a follow-up: the concern reported above with this test never had anything
to do with this patch series. It had everything to do with commit 7dadeaa6e851
("sched: Further restrict the preemption modes") combined with my use of the
Ubuntu kernel configuration. The header of that commit says:

  While Lazy has been the recommended setting for a while, not all
  distributions have managed to make the switch yet. Force things along.

The kernel configuration was automatically modified, eliminating
PREEMPT_VOLUNTARY and leaving PREEMPT_LAZY disabled.
Once I set PREEMPT_LAZY, the above noted concern was gone (although I am
still testing).

References:
https://lore.kernel.org/all/20251219101502.GB1132199@noisy.programming.kicks-ass.net/
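For anyone wanting to check what their own distribution config ended up with, here is one way to inspect the built-in preemption settings. The paths are distro-dependent assumptions (an Ubuntu-style /boot config, or /proc/config.gz when CONFIG_IKCONFIG_PROC is enabled):

```shell
# Sketch: list the CONFIG_PREEMPT* options the running kernel was
# built with, trying the two common locations for the build config.
cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ]; then
    grep -E '^CONFIG_PREEMPT' "$cfg"
elif [ -r /proc/config.gz ]; then
    zcat /proc/config.gz | grep -E '^CONFIG_PREEMPT'
else
    echo "no kernel config found to inspect"
fi
```

With CONFIG_PREEMPT_DYNAMIC, the active mode can also be read (and changed) at runtime via /sys/kernel/debug/sched/preempt.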