> On a second thought, this
>
> mul_u64_u64_div_u64(0x69f98da9ba980c00, 0xfffd213aabd74626, 0x09e00900);
>                         stime               rtime           stime + utime
>
> looks suspicious:
>
> - stime > stime + utime
>
> - rtime = 0xfffd213aabd74626 is absurdly huge
>
> so perhaps there is another problem?
>
it happened when a process with 236 busy-polling threads ran for about 904 days; the total time overflowed 64 bits

non-x86 systems may have the same issue: once (stime + utime) overflows 64 bits, mul_u64_u64_div_u64() from lib/math/div64.c may cause a division by 0

so for cputime, could cputime_adjust() just return stime if stime + utime overflows? Something like:
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 6dab4854..db0c273 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -579,6 +579,10 @@ void cputime_adjust(struct task_cputime *curr, struct prev_cputime *prev,
 		goto update;
 	}
 
+	if (stime > (stime + utime)) {
+		goto update;
+	}
+
 	stime = mul_u64_u64_div_u64(stime, rtime, stime + utime);
 	/*
 	 * Because mul_u64_u64_div_u64() can approximate on some
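As a rough sanity check of these numbers, here is a minimal userspace sketch (an illustration only; it assumes all 236 threads are fully busy, 86400-second days, and the GCC/Clang unsigned __int128 extension):

#include <stdio.h>

int main(void)
{
	/* combined CPU time of 236 always-busy threads over ~904 days, in ns */
	unsigned __int128 total_ns = (unsigned __int128)236   /* threads     */
				   * 904                       /* days        */
				   * 86400                     /* seconds/day */
				   * 1000000000ULL;            /* ns/second   */

	printf("combined cpu time: %.4e ns\n", (double)total_ns);
	printf("u64 max:           %.4e ns\n", (double)~0ULL);
	return 0;
}

This comes to roughly 1.84e19 ns, essentially at the 2^64 ns (about 584 years) limit of the u64 counters, so a wrapped stime + utime is plausible.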
Thanks
-Li
> Oleg.
>
> On 07/08, Oleg Nesterov wrote:
> >
> > On 07/07, Li,Rongqing wrote:
> > >
> > > [78250815.703847] divide error: 0000 [#1] PREEMPT SMP NOPTI
> >
> > ...
> >
> > > It is caused by a process with many threads running for a very long time;
> > > utime+stime overflowed 64 bits, which caused the division below
> > >
> > > mul_u64_u64_div_u64(0x69f98da9ba980c00, 0xfffd213aabd74626,
> > > 0x09e00900);
> > >
> > > I see the comments of mul_u64_u64_div_u64() say:
> > >
> > > Will generate an #DE when the result doesn't fit u64, could fix with
> > > an __ex_table[] entry when it becomes an issue
> > >
> > > Seems the __ex_table[] entry for div does not work?
> >
> > Well, the current version doesn't have an __ex_table[] entry for div...
> >
> > I do not know what can/should we do in this case... Perhaps
> >
> > static inline u64 mul_u64_u64_div_u64(u64 a, u64 mul, u64 div)
> > {
> > int ok = 0;
> > u64 q;
> >
> > asm ("mulq %3; 1: divq %4; movl $1,%1; 2:\n"
> > _ASM_EXTABLE(1b, 2b)
> > : "=a" (q), "+r" (ok)
> > : "a" (a), "rm" (mul), "rm" (div)
> > : "rdx");
> >
> > return ok ? q : -1ul;
> > }
> >
> > ?
> >
> > Should return ULLONG_MAX on #DE.
> >
> > Oleg.
> non-x86 systems may have the same issue: once (stime + utime) overflows 64 bits,
> mul_u64_u64_div_u64() from lib/math/div64.c may cause a division by 0

Correction: mul_u64_u64_div_u64(0x69f98da9ba980c00, 0xfffd213aabd74626, 0x09e00900)
from lib/math/div64.c returns 0xffffffffffffffff

> so for cputime, could cputime_adjust() just return stime if stime + utime overflows?
>
> [...]
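To illustrate why these particular arguments trap, here is a minimal userspace sketch (an illustration only, not the kernel implementation; it assumes the GCC/Clang unsigned __int128 extension):

#include <stdio.h>

int main(void)
{
	unsigned long long stime = 0x69f98da9ba980c00ULL;
	unsigned long long rtime = 0xfffd213aabd74626ULL;
	unsigned long long div   = 0x09e00900ULL;  /* wrapped stime + utime */

	/* true value of stime * rtime / (stime + utime), computed in 128 bits */
	unsigned __int128 q = (unsigned __int128)stime * rtime / div;

	printf("quotient >> 64 = %#llx\n", (unsigned long long)(q >> 64));
	return 0;
}

The high 64 bits of the quotient are nonzero, so the result cannot fit in a u64: the x86-64 divq in the asm version raises #DE in that case, while the generic lib/math/div64.c version, as noted above, returns 0xffffffffffffffff.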
On Mon, 7 Jul 2025 23:41:14 +0000
"Li,Rongqing" <lirongqing@baidu.com> wrote:

> it happened when a process with 236 busy-polling threads ran for about 904 days;
> the total time overflowed 64 bits
>
> [...]
>
> +	if (stime > (stime + utime)) {
> +		goto update;
> +	}
> +
> 	stime = mul_u64_u64_div_u64(stime, rtime, stime + utime);

Are you running 5.10.0? Because a diff of 5.10.238 from 5.10.0 gives:

@@ -579,6 +579,12 @@ void cputime_adjust(struct task_cputime *curr, struct prev_cputime *prev,
 	}
 
 	stime = mul_u64_u64_div_u64(stime, rtime, stime + utime);
+	/*
+	 * Because mul_u64_u64_div_u64() can approximate on some
+	 * achitectures; enforce the constraint that: a*b/(b+c) <= a.
+	 */
+	if (unlikely(stime > rtime))
+		stime = rtime;
 
 update:

Thus the result is what's getting screwed up.

-- Steve
> On Mon, 7 Jul 2025 23:41:14 +0000
> "Li,Rongqing" <lirongqing@baidu.com> wrote:
>
> [...]
>
> Are you running 5.10.0? Because a diff of 5.10.238 from 5.10.0 gives:
>
> @@ -579,6 +579,12 @@ void cputime_adjust(struct task_cputime *curr, struct prev_cputime *prev,
> 	}
>
> 	stime = mul_u64_u64_div_u64(stime, rtime, stime + utime);
> +	/*
> +	 * Because mul_u64_u64_div_u64() can approximate on some
> +	 * achitectures; enforce the constraint that: a*b/(b+c) <= a.
> +	 */
> +	if (unlikely(stime > rtime))
> +		stime = rtime;

My 5.10 does not have the patch "sched/cputime: Fix mul_u64_u64_div_u64() precision
for cputime", but I am sure that patch cannot fix this overflow issue, since the
divide error happened inside mul_u64_u64_div_u64() itself.

Thanks
-Li

> update:
>
> Thus the result is what's getting screwed up.
>
> -- Steve
On Tue, 8 Jul 2025 00:10:54 +0000
"Li,Rongqing" <lirongqing@baidu.com> wrote:

> My 5.10 does not have the patch "sched/cputime: Fix mul_u64_u64_div_u64() precision
> for cputime", but I am sure that patch cannot fix this overflow issue, since the
> divide error happened inside mul_u64_u64_div_u64() itself.

Have you tried it? Or are you just making an assumption?

How can you be so sure? Did you even *look* at the commit?

    sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime

    In extreme test scenarios:
    the 14th field utime in /proc/xx/stat is greater than sum_exec_runtime,
    utime = 18446744073709518790 ns, rtime = 135989749728000 ns

    In cputime_adjust() process, stime is greater than rtime due to
    mul_u64_u64_div_u64() precision problem.
    before call mul_u64_u64_div_u64(),
        stime = 175136586720000, rtime = 135989749728000, utime = 1416780000.
    after call mul_u64_u64_div_u64(),
        stime = 135989949653530

    unsigned reversion occurs because rtime is less than stime.
    utime = rtime - stime = 135989749728000 - 135989949653530
          = -199925530
          = (u64)18446744073709518790

    Trigger condition:
      1). User task run in kernel mode most of time
      2). ARM64 architecture
      3). TICK_CPU_ACCOUNTING=y
          CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set

    Fix mul_u64_u64_div_u64() conversion precision by reset stime to rtime

When stime ends up greater than rtime, it causes utime to go NEGATIVE!

That means *YES* it can overflow a u64 number. That's your bug.

Next time, look to see if there's fixes in the code that is triggering issues
for you and test them out, before bothering upstream.

Goodbye.

-- Steve
On Mon, 7 Jul 2025 20:30:57 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Tue, 8 Jul 2025 00:10:54 +0000
> "Li,Rongqing" <lirongqing@baidu.com> wrote:
>
> > > 	stime = mul_u64_u64_div_u64(stime, rtime, stime + utime);
> > > +	/*
> > > +	 * Because mul_u64_u64_div_u64() can approximate on some
> > > +	 * achitectures; enforce the constraint that: a*b/(b+c) <= a.
> > > +	 */
> > > +	if (unlikely(stime > rtime))
> > > +		stime = rtime;
> >
> > My 5.10 does not have the patch "sched/cputime: Fix mul_u64_u64_div_u64()
> > precision for cputime", but I am sure that patch cannot fix this overflow
> > issue, since the divide error happened inside mul_u64_u64_div_u64() itself.
>
> Have you tried it? Or are you just making an assumption?
>
> How can you be so sure? Did you even *look* at the commit?

It can't be relevant.
That change is after the mul_u64_u64_div_u64() call that trapped.
It is also not relevant for x86-64 because it uses the asm version.

At some point mul_u64_u64_div_u64() got changed to be accurate (and slow) so
that check isn't needed any more.

	David
> It can't be relevant.
> That change is after the mul_u64_u64_div_u64() call that trapped.
> It is also not relevant for x86-64 because it uses the asm version.
>
> At some point mul_u64_u64_div_u64() got changed to be accurate (and slow) so
> that check isn't needed any more.

I see, this patch is not relevant.

Thank you very much for your confirmation

-Li
> Have you tried it? Or are you just making an assumption?
>
> How can you be so sure? Did you even *look* at the commit?
>
> sched/cputime: Fix mul_u64_u64_div_u64() precision for cputime
>
> [...]
>
> When stime ends up greater than rtime, it causes utime to go NEGATIVE!
>
> That means *YES* it can overflow a u64 number. That's your bug.

I will try to test this patch, but I think it is a different case: stime is not
greater than rtime in my case (stime = 0x69f98da9ba980c00, rtime = 0xfffd213aabd74626,
stime + utime = 0x9e00900, so utime should be 0x960672564f47fd00), and this
overflowing process with 236 busy-poll threads has been running for about 904 days,
so I think these times are correct.

Thanks
-Li
On Tue, 8 Jul 2025 01:17:50 +0000
"Li,Rongqing" <lirongqing@baidu.com> wrote:

> Stime is not greater than rtime in my case (stime = 0x69f98da9ba980c00,
> rtime = 0xfffd213aabd74626, stime + utime = 0x9e00900, so utime should be
> 0x960672564f47fd00), and this overflowing process with 236 busy-poll threads
> has been running for about 904 days, so I think these times are correct.

But look at rtime, it is *negative*. So maybe that fix isn't going to fix this
bug, but rtime is most definitely screwed up. That value is:

  0xfffd213aabd74626 = (u64)18445936184654251558 = (s64)-807889055300058

There's no way run time should be 584 years in nanoseconds.

So if it's not fixed by that commit, it's a bug that happened before you even
got to the mul_u64_u64_div_u64() function. Touching that is only putting a
band-aid on the symptom, you haven't touched the real bug.

I bet there's likely another fix between what you are using and 5.10.238.
There's 31,101 commits between those two. You are using a way old kernel
without any fixes to it. It is known to be buggy. You will hit bugs with it.
No need to tell us about it.

-- Steve
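For reference, the arithmetic behind the "584 years" figure (a rough back-of-the-envelope check, assuming 365-day years):

  0xfffd213aabd74626 ns = 18445936184654251558 ns
                        ~= 18445936184654251558 / (1000000000 * 86400 * 365)
                        ~= 584.9 years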