From: "Li,Rongqing"
To: Oleg Nesterov, Peter Zijlstra, David Laight
CC: "linux-kernel@vger.kernel.org", "vschneid@redhat.com", "mgorman@suse.de", "bsegall@google.com", "rostedt@goodmis.org", "dietmar.eggemann@arm.com", "vincent.guittot@linaro.org", "juri.lelli@redhat.com", "mingo@redhat.com"
Subject: Reply: [????] Re: divide error in x86 and cputime
Date: Mon, 7 Jul 2025 23:41:14 +0000
Message-ID: <2ef88def90634827bac1874d90e0e329@baidu.com>
In-Reply-To: <20250707223038.GB15787@redhat.com>
References: <78a0d7bb20504c0884d474868eccd858@baidu.com> <20250707220937.GA15787@redhat.com> <20250707223038.GB15787@redhat.com>

> On a second thought, this
>
> 	mul_u64_u64_div_u64(0x69f98da9ba980c00, 0xfffd213aabd74626, 0x09e00900);
> 	                    stime               rtime               stime + utime
>
> looks suspicious:
>
> 	- stime > stime + utime
>
> 	- rtime = 0xfffd213aabd74626 is absurdly huge
>
> so perhaps there is another problem?
>

It happened with a process running 236 busy-polling threads for about 904
days; the total CPU time of the process overflowed 64 bits.

Non-x86 systems may have the same issue: once (stime + utime) overflows
64 bits, mul_u64_u64_div_u64() from lib/math/div64.c may cause a division
by 0.

So, in cputime_adjust(), could we skip the scaling and use stime as-is
when stime + utime overflows?

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 6dab4854..db0c273 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -579,6 +579,10 @@ void cputime_adjust(struct task_cputime *curr, struct prev_cputime *prev,
 		goto update;
 	}
 
+	if (stime > (stime + utime)) {
+		goto update;
+	}
+
 	stime = mul_u64_u64_div_u64(stime, rtime, stime + utime);
 	/*
 	 * Because mul_u64_u64_div_u64() can approximate on some

Thanks
-Li

> Oleg.
>
> On 07/08, Oleg Nesterov wrote:
> >
> > On 07/07, Li,Rongqing wrote:
> > >
> > > [78250815.703847] divide error: 0000 [#1] PREEMPT SMP NOPTI
> >
> > ...
> >
> > > It is caused by a process with many threads running for a very long
> > > time: utime+stime overflowed 64 bits, which then caused the division
> > > below
> > >
> > > mul_u64_u64_div_u64(0x69f98da9ba980c00, 0xfffd213aabd74626, 0x09e00900);
> > >
> > > I see the comment of mul_u64_u64_div_u64() says:
> > >
> > > Will generate an #DE when the result doesn't fit u64, could fix with
> > > an __ex_table[] entry when it becomes an issue
> > >
> > > It seems the __ex_table[] entry for div does not work?
> >
> > Well, the current version doesn't have an __ex_table[] entry for div...
> >
> > I do not know what we can/should do in this case... Perhaps
> >
> > 	static inline u64 mul_u64_u64_div_u64(u64 a, u64 mul, u64 div)
> > 	{
> > 		int ok = 0;
> > 		u64 q;
> >
> > 		asm ("mulq %3; 1: divq %4; movl $1,%1; 2:\n"
> > 			_ASM_EXTABLE(1b, 2b)
> > 			: "=a" (q), "+r" (ok)
> > 			: "a" (a), "rm" (mul), "rm" (div)
> > 			: "rdx");
> >
> > 		return ok ? q : -1ul;
> > 	}
> >
> > ?
> >
> > Should return ULLONG_MAX on #DE.
> >
> > Oleg.
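The "about 904 days" figure in Li's reply can be sanity-checked with a little arithmetic. This is a sketch under one stated assumption: stime, utime and sum_exec_runtime are u64 nanosecond counts summed over all threads of the process, as in current kernels. Under that assumption 64 bits hold about 213,504 days (roughly 584 years) of CPU time, so 236 threads that never sleep reach the limit in roughly 904.7 days. The helper name is hypothetical and not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: cputime is accounted in nanoseconds in a u64. */
#define NSEC_PER_SEC_D 1000000000.0
#define SEC_PER_DAY_D  86400.0

/*
 * Hypothetical helper: wall-clock days after which nr_threads threads,
 * each 100% busy, accumulate more than UINT64_MAX ns of combined CPU time.
 */
static double days_until_u64_ns_overflow(unsigned int nr_threads)
{
	assert(nr_threads > 0);
	return (double)UINT64_MAX / NSEC_PER_SEC_D / SEC_PER_DAY_D / nr_threads;
}
```

For example, days_until_u64_ns_overflow(236) evaluates to about 904.7, in line with the uptime in the report, and rtime = 0xfffd213aabd74626 is correspondingly just below 2^64.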
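Oleg's first observation (stime > stime + utime) is what an unsigned wrap looks like: the divisor 0x09e00900 printed for stime + utime is far smaller than stime alone. The sketch below shows that the test in Li's patch detects exactly that wrap and agrees with the GCC/Clang builtin __builtin_add_overflow(), on which the kernel's check_add_overflow() in <linux/overflow.h> is built. The helper names are hypothetical, and utime is not given in the thread; the test reconstructs it as the wrapped difference, assuming the printed divisor is stime + utime modulo 2^64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64;

/* The patch's test: an unsigned sum wrapped iff it is smaller than an operand. */
static bool sum_wrapped(u64 stime, u64 utime)
{
	return stime > stime + utime;
}

/* The same question, answered by the GCC/Clang overflow builtin. */
static bool sum_wrapped_builtin(u64 stime, u64 utime)
{
	u64 sum;

	return __builtin_add_overflow(stime, utime, &sum);
}
```

Because utime is itself below 2^64, a wrapped sum is always smaller than stime, so the single comparison catches every overflow of stime + utime.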
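Oleg's exception-table variant is x86 kernel code and cannot run outside the kernel. As a rough userspace model of the behaviour he proposes (return ULLONG_MAX where DIVQ would raise #DE), the sketch below assumes a 64-bit GCC or Clang target with the unsigned __int128 extension; the function name is hypothetical. DIVQ divides RDX:RAX by its operand and faults when the divisor is zero or the quotient does not fit in 64 bits, which happens exactly when the high half of the 128-bit product is greater than or equal to the divisor.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

/*
 * Hypothetical userspace model: a * mul / div with a 128-bit
 * intermediate, saturating to UINT64_MAX (== ULLONG_MAX on LP64 x86-64)
 * in the cases where x86-64 DIVQ would raise #DE.
 */
static u64 mul_u64_u64_div_u64_sat(u64 a, u64 mul, u64 div)
{
	unsigned __int128 prod = (unsigned __int128)a * mul;
	u64 hi = (u64)(prod >> 64);

	/*
	 * The quotient fits in 64 bits iff prod < div * 2^64, i.e. iff
	 * hi < div, because the low half of prod is below 2^64.
	 */
	if (div == 0 || hi >= div)
		return UINT64_MAX;

	return (u64)(prod / div);
}
```

For the arguments in the oops this returns UINT64_MAX instead of faulting, while ordinary inputs such as (10, 20, 7) return 28.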