From nobody Fri Oct 10 09:15:02 2025 Received: from mail-wr1-f49.google.com (mail-wr1-f49.google.com [209.85.221.49]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F1C223FFD for ; Sat, 14 Jun 2025 09:54:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.49 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749894856; cv=none; b=o/5jUevPr2BK3BQdBYy2uOQBdSsvgabJ3t9IaCzoXPxuzVKPjcpemP7ooH2L0tWcyNlS7FZjaoxDzyxs2b6cA8510ue+CDabek2ULRQijKLwiOBUJlCt5Kfv2Vci7YJLfIBTkfk8L88fpDG7ASUwoa+NmV/0M/mAm1cBzFHbb44= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749894856; c=relaxed/simple; bh=EOVDbuv7TFQlHss93ohFwrc/tQHT72C+uLsvZ3IoSzE=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=M/N9JrBA91hdLjNwQ4qw1WFQRJ8DYsZHk0Jv44sL87ZXXv7YifLhDgW9UUfFc1HCDAgQiGin7l85ZBGgSFpP4O8lS6GGe1szylgLqYiiprBKYJhk+I2GFaOnLqbNoG4Pi367HVg5Nkg9HunCyagCwR0qSYgU7fabjsqj03Rrz+M= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=PHaTjL+w; arc=none smtp.client-ip=209.85.221.49 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="PHaTjL+w" Received: by mail-wr1-f49.google.com with SMTP id ffacd0b85a97d-3a3798794d3so2665255f8f.1 for ; Sat, 14 Jun 2025 02:54:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1749894853; x=1750499653; darn=vger.kernel.org; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=BkCL/Vt5q5OcKwnPeY8ySV0HKzOdZLzX2zncNiWmtWQ=; b=PHaTjL+wIvzPkMj+vzN+bMQO4r75DUeMarL68yffB1VFxvw+8gvyQQT7UGYtNR0fDd QRQ+K8eO/64P4B7/ACqttMmYcVzwDYlCi7c5Bl8nOzFWAQ3WzBCxzHOgGoy9F0Fnz7xW gAwavk6DEhg4+2+lSZ0ciQg0V+R4uo7NkvE+Ra9sAs62xQGKJEhEE912HelogGHNFTIG tQF2PMnV8Y0AbUicPH0oJzPrXJ3htEjdOM97Sj0fOXOQvheOo+Cg+74uujPW25GYam6H XXhvDTrhLF/rW7a3mmB80ZcStVMhu3TRhZeG9xLk9/RObIiZtGHqsZJHvQyHdP70H/gy lU6A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1749894853; x=1750499653; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=BkCL/Vt5q5OcKwnPeY8ySV0HKzOdZLzX2zncNiWmtWQ=; b=j3TrWXnjO02lk6uBB5ZNxokSOfxmDuSt83nWhLOY9nEcTuDj8sGD3b3zwdyzv869zO S5V4OC4p6dUozWSTqy8xJYxCiGG+Xooaafdye0wZxDkhivL83SmTjA5BqRPPX0XI4eei VDYGvvWibjy7bUv5jj2mE+YPA7bxebiBmvuC1DrEk2UUZvUrG+cu1lQCie7lILW8+A+o 2nf8EgVTb1yG2WbAYdvkkaV5onGVaOQddeQMLizMBqPCunJWX/fecqOTHlT+sUqvFE+Q BSJz2iOiApP994+ePAQeMKwcCqe1rapwdqXHRyjGfma2x+4i7zfuqMfHNHp2htaIJUda CB0w== X-Forwarded-Encrypted: i=1; AJvYcCU5TtCDLbHVulBMXjUZmhVpC8VbNCK6iHARgRNGHPcpd8Rq0EiUKTeU90qUzXU/4haTihAfYybXSObegdM=@vger.kernel.org X-Gm-Message-State: AOJu0YzVejj86gx19UpGC7FfILTH47EG6C2G8cE17smbamGlsogKjIuD BgZrfI9RBtJkjT5a7dG3gUGO+gdFqwsLnD3nxnsTs3bqeojHOt3lJN8bU5vbuQ== X-Gm-Gg: ASbGnctd8bRIUaO6l0Fx7/IerkawkdbgV/ZAY8PMadLfOaKns0+CkDGjuqbmpyuI+nN iIliGU3I/jPgDr6weVR8Az5t4TjGsY39Uvx2NR4JnfkUmZs84NnaeiqNssBgHGQP+ZrUswRN8Aj H24uJtJ7NWaxc8djHxJJ15qHRH5OQYtEBCxMzEZo/tn+VibslMSLo5joU9xFXfoZXr7jJqei7sG qoQ+IEVp2mWSXmV4UFRNfqf/OC+5OIl2xQygSrNdQx+wJcyOIb5y0W4HgOfXLVxdDpvOZCc34gp A1yXNYjwXHwtMpXhtTK0CjSAsmf3Hg7TRfVk/Gieg4MlOLum2FLMDB4IjlPXmPly7gyH3u5i66a CLtTy/8KFme2Q1yvtQzBf/taqA7Lfk3J/nhAw0csvBr0= X-Google-Smtp-Source: 
AGHT+IFlzdzTjVJtPmR+g6aY5/br4zeP0j7xtigVPw1IDEECv7yqLscSed0KvEFbpYg+t1QOB7yDkQ==
X-Received: by 2002:a05:6000:26cd:b0:3a3:6595:921f with SMTP id ffacd0b85a97d-3a572e79674mr1962401f8f.41.1749894853065; Sat, 14 Jun 2025 02:54:13 -0700 (PDT)
Received: from snowdrop.snailnet.com (82-69-66-36.dsl.in-addr.zen.co.uk. [82.69.66.36]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-3a568b19b32sm4869444f8f.67.2025.06.14.02.54.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 14 Jun 2025 02:54:12 -0700 (PDT)
From: David Laight
To: Andrew Morton , linux-kernel@vger.kernel.org
Cc: David Laight , u.kleine-koenig@baylibre.com, Nicolas Pitre , Oleg Nesterov , Peter Zijlstra , Biju Das
Subject: [PATCH v3 next 01/10] lib: mul_u64_u64_div_u64() rename parameter 'c' to 'd'
Date: Sat, 14 Jun 2025 10:53:37 +0100
Message-Id: <20250614095346.69130-2-david.laight.linux@gmail.com>
X-Mailer: git-send-email 2.39.5
In-Reply-To: <20250614095346.69130-1-david.laight.linux@gmail.com>
References: <20250614095346.69130-1-david.laight.linux@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Change the prototype from mul_u64_u64_div_u64(u64 a, u64 b, u64 c)
to mul_u64_u64_div_u64(u64 a, u64 b, u64 d).
Using 'd' for 'divisor' makes more sense.

An upcoming change adds a 'c' parameter to calculate (a * b + c)/d.

Signed-off-by: David Laight
Reviewed-by: Nicolas Pitre
---
 lib/math/div64.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/lib/math/div64.c b/lib/math/div64.c
index 5faa29208bdb..a5c966a36836 100644
--- a/lib/math/div64.c
+++ b/lib/math/div64.c
@@ -184,10 +184,10 @@ u32 iter_div_u64_rem(u64 dividend, u32 divisor, u64 *=
remainder)
 EXPORT_SYMBOL(iter_div_u64_rem);
=20
 #ifndef mul_u64_u64_div_u64
-u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 c)
+u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d)
 {
 	if (ilog2(a) + ilog2(b) <=3D 62)
-		return div64_u64(a * b, c);
+		return div64_u64(a * b, d);
=20
 #if defined(__SIZEOF_INT128__)
=20
@@ -212,36 +212,36 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 c)
=20
 #endif
=20
-	/* make sure c is not zero, trigger exception otherwise */
+	/* make sure d is not zero, trigger exception otherwise */
 #pragma GCC diagnostic push
 #pragma GCC diagnostic ignored "-Wdiv-by-zero"
-	if (unlikely(c =3D=3D 0))
+	if (unlikely(d =3D=3D 0))
 		return 1/0;
 #pragma GCC diagnostic pop
=20
-	int shift =3D __builtin_ctzll(c);
+	int shift =3D __builtin_ctzll(d);
=20
 	/* try reducing the fraction in case the dividend becomes <=3D 64 bits */
 	if ((n_hi >> shift) =3D=3D 0) {
 		u64 n =3D shift ? (n_lo >> shift) | (n_hi << (64 - shift)) : n_lo;
=20
-		return div64_u64(n, c >> shift);
+		return div64_u64(n, d >> shift);
 		/*
 		 * The remainder value if needed would be:
-		 *   res =3D div64_u64_rem(n, c >> shift, &rem);
+		 *   res =3D div64_u64_rem(n, d >> shift, &rem);
 		 *   rem =3D (rem << shift) + (n_lo - (n << shift));
 		 */
 	}
=20
-	if (n_hi >=3D c) {
+	if (n_hi >=3D d) {
 		/* overflow: result is unrepresentable in a u64 */
 		return -1;
 	}
=20
 	/* Do the full 128 by 64 bits division */
=20
-	shift =3D __builtin_clzll(c);
-	c <<=3D shift;
+	shift =3D __builtin_clzll(d);
+	d <<=3D shift;
=20
 	int p =3D 64 + shift;
 	u64 res =3D 0;
@@ -256,8 +256,8 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 c)
 		n_hi <<=3D shift;
 		n_hi |=3D n_lo >> (64 - shift);
 		n_lo <<=3D shift;
-		if (carry || (n_hi >=3D c)) {
-			n_hi -=3D c;
+		if (carry || (n_hi >=3D d)) {
+			n_hi -=3D d;
 			res |=3D 1ULL << p;
 		}
 	} while (n_hi);
--=20
2.39.5

From nobody Fri Oct 10 09:15:02 2025
Received: from mail-wm1-f41.google.com (mail-wm1-f41.google.com [209.85.128.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5B03C1CCB40 for ; Sat, 14 Jun 2025 09:54:15 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.41
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749894857; cv=none; b=b8eMIJKdy1mN0HvNKYgWxM5opOkiK6Q2yxgy99mp3DHZ2bHUn6CPoPSEAPtV4PiKG9+BcywX34W7FrIlxRxQPOtAjZ9qxBO9v6VOCf5mdlW6sFC7lwe2esBURgYK2lsRw3pdaFMYMvi46EOL+AI3BePlWB+ULjdE7SR15sAyCGE=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749894857; c=relaxed/simple; bh=/DAhbAalRWUaIbB1SD5U9rjy+L7LK5uvY2pUXmbFHSE=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=eNltBUcm3jB0VZKImnDTi3JWkw0D9NnoSi4FA+dbrX8ihSUhcg2jilaaIeHCHX2+o1WAqVYJ6clh1nq6LqQfZh8z3ceVNF4OB93Z8CFpxpjfWJTK8C+WAsCk3jjAqQ0ZArP0OfOmFXYqqxdxjJ0j5wzf0+bgyWDCWSqYDu/soXI=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=fFdQWiis; arc=none smtp.client-ip=209.85.128.41 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="fFdQWiis" Received: by mail-wm1-f41.google.com with SMTP id 5b1f17b1804b1-4533bf4d817so4219995e9.2 for ; Sat, 14 Jun 2025 02:54:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1749894854; x=1750499654; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=ndZldl+b//mjmwm6Wu4ORnjQlkIiVaXqckuA68ipUcs=; b=fFdQWiiswUx6tE4KzmPOVqZvMz26yyA02Y1S5XsaaXGZJbMYWNpq8zZ8RnJUid7uvN SVEa0tIJekEKRO8zTqowNNOXi696EYi9xMm+mV1xuFW1xueYXfFV65WUL015m8ZwYu5i WMtb7vFD/32HWHJfQAacP9n60nnbE4k0b6J3g7myJF2tw38UktDbuc2Bc/7tuN5klRRS msFaU90X7+A4FXR2u5Nchphx4tD1vkDh2dCUcBxXCk1VBqSIUdSVLJka6N8w6oMok0CZ rsiE/grUvs+W/YVH7u24ksC0dRlJTwHfQ6TVfHOa8+f+kkwEvA9ENwF3wsiZym+lNp5K DxgQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1749894854; x=1750499654; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ndZldl+b//mjmwm6Wu4ORnjQlkIiVaXqckuA68ipUcs=; b=wvAOYwQWHL/NHMK9lU4cLyyXQxfgMABTwYnSvl6qdH80385lnUsI3uiFMQotaSqauA wsLAdWgkGdPHVwcFGC24+S4Tiq4NhvjdUBJinEo4uRxVeQFHP+BoQAXoHZEyqnDQ9HLP xO0wFvC0kOrOE5DYf1FD5eW4lYvLi+xA9LbuKX/H6NQ55CSaME4Q9D/3eQFPlGS1pzpz QWROq3EbhP047RHsNaHhomHsfW9m4tVipI+ZcmpWYe6NhRZQRs/bgJmz7MhjN/05BuwT 
QyAxU9h61pvn21RumVtcZh35Xagh+F1vt5Lr40UVzrwssAaGRLp0iecFIr4n4ZrrL7YO HZKg== X-Forwarded-Encrypted: i=1; AJvYcCWCIUMtxmAlC69J8fpUOP+ykYlQR37imTY0E6F7UpaAaw7A2oqA1J/gl6O7fWM3sdEGKruZrTU/SypqJhs=@vger.kernel.org X-Gm-Message-State: AOJu0Yxmfv20G/yw28kkgROu8XLbw5VSxmaOOelpqtLaebSz6qISB6Vf /PTFV2dRJkgicCthE3acgXhgXeRmFeUND2YNxlfX5roWh4vWqMcYuOy9 X-Gm-Gg: ASbGnctVUsBHOgTr2sZ/0RKCK+oDThN3S1ClzzBACPcS7NnroooJpvubWgNrrkkJuiV bOuW/zfg03VzLkVSXrcdA63+mBeWYkz/Yt1CEaR+6N86TmzO9q4SRqAPkGY9TmaiKgtckZiz1gL DutMx/sVHoFsuC3aJS7TxKLu6lwwvA9DYNBMnsLER6aq0PHEdX/Yr3i9+1FpiJAYtecVwjGaH+6 4xT28GosMN314QreH+XZkHgdIB+CXMKAGOJihVQ+XDOb1GUN/uxKV/CPf4iG53qFaHGNXPjC0yT uQyOoJSNKcZ1Ooq38DNcjlwyIFuh+XWcRBg5GeT59Og04UrxTikhGRgK50hLtjPtsX0FpG7fYFp dMYt2s4o/tw8Abboyeeh+iM49Faqyy4xXz/2mP0K+izg= X-Google-Smtp-Source: AGHT+IGSjd/kO18b5Ye+GM/X/c+MeEZNGDbYMf1Lskkd6Y0qzZzOQkIOeqoOJECwhIbt6WAL9LSdCA== X-Received: by 2002:a05:600c:1d07:b0:442:d9f2:c74e with SMTP id 5b1f17b1804b1-4533cab5335mr25414685e9.23.1749894853548; Sat, 14 Jun 2025 02:54:13 -0700 (PDT) Received: from snowdrop.snailnet.com (82-69-66-36.dsl.in-addr.zen.co.uk. [82.69.66.36]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-3a568b19b32sm4869444f8f.67.2025.06.14.02.54.13 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 14 Jun 2025 02:54:13 -0700 (PDT) From: David Laight To: Andrew Morton , linux-kernel@vger.kernel.org Cc: David Laight , u.kleine-koenig@baylibre.com, Nicolas Pitre , Oleg Nesterov , Peter Zijlstra , Biju Das Subject: [PATCH v3 next 02/10] lib: mul_u64_u64_div_u64() Use WARN_ONCE() for divide errors. 
Date: Sat, 14 Jun 2025 10:53:38 +0100
Message-Id: <20250614095346.69130-3-david.laight.linux@gmail.com>
X-Mailer: git-send-email 2.39.5
In-Reply-To: <20250614095346.69130-1-david.laight.linux@gmail.com>
References: <20250614095346.69130-1-david.laight.linux@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Do an explicit WARN_ONCE(!divisor) instead of hoping the 'undefined
behaviour' the compiler generates for a compile-time 1/0 is in any
way useful.

Return 0 (rather than ~(u64)0) because it is less likely to cause
further serious issues.

Add WARN_ONCE() in the divide overflow path.

A new change for v2 of the patchset.
Whereas gcc inserts (IIRC) 'ud2', clang is likely to let the code
continue and generate 'random' results for any 'undefined behaviour'.

v3: Use WARN_ONCE() and return 0 instead of BUG_ON().
    Explicitly #include

Signed-off-by: David Laight
---
 lib/math/div64.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/lib/math/div64.c b/lib/math/div64.c
index a5c966a36836..397578dc9a0b 100644
--- a/lib/math/div64.c
+++ b/lib/math/div64.c
@@ -19,6 +19,7 @@
  */
=20
 #include
+#include
 #include
 #include
 #include
@@ -186,6 +187,15 @@ EXPORT_SYMBOL(iter_div_u64_rem);
 #ifndef mul_u64_u64_div_u64
 u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d)
 {
+	if (WARN_ONCE(!d, "%s: division of (%#llx * %#llx) by zero, returning 0",
+		      __func__, a, b)) {
+		/*
+		 * Return 0 (rather than ~(u64)0) because it is less likely to
+		 * have unexpected side effects.
+		 */
+		return 0;
+	}
+
 	if (ilog2(a) + ilog2(b) <=3D 62)
 		return div64_u64(a * b, d);
=20
@@ -212,12 +222,10 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d)
=20
 #endif
=20
-	/* make sure d is not zero, trigger exception otherwise */
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Wdiv-by-zero"
-	if (unlikely(d =3D=3D 0))
-		return 1/0;
-#pragma GCC diagnostic pop
+	if (WARN_ONCE(n_hi >=3D d,
+		      "%s: division of (%#llx * %#llx =3D %#llx%016llx) by %#llx overflo=
ws, returning ~0",
+		      __func__, a, b, n_hi, n_lo, d))
+		return ~(u64)0;
=20
 	int shift =3D __builtin_ctzll(d);
=20
@@ -233,11 +241,6 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d)
 		 */
 	}
=20
-	if (n_hi >=3D d) {
-		/* overflow: result is unrepresentable in a u64 */
-		return -1;
-	}
-
 	/* Do the full 128 by 64 bits division */
=20
 	shift =3D __builtin_clzll(d);
--=20
2.39.5

From nobody Fri Oct 10 09:15:02 2025
Received: from mail-wm1-f46.google.com (mail-wm1-f46.google.com [209.85.128.46]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CC8CA1D2F42 for ; Sat, 14 Jun 2025 09:54:15 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.46
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749894857; cv=none; b=pe8SZeIdjz5t7sXgLTb7YfAS/ZUOE56V7aINHPyl5FXKJpwhorrcNS9Kw7wAbWtRNIbbEMOoRjO+5OlxTqx+0n5Pg33HiSMd4O1pUKT+a1Rb27MzQPwiEInaEfbDc+l4q05n/EGOFVDm95a/fbArQr26sC1RPdEQxQJyJCauceE=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749894857; c=relaxed/simple; bh=gNPWLk7bVGAvDlMuEwd7oZj30Q4/E6JcxEl7T/MFyqI=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=ZvDCpApSV81/5tTubmUCQlZqprbRdEGFWzWaSO+psynp5DDGC3R8KMd673Ds85cOEkMbdZ/IPte/F3RpIMSKXnRBn6juQWmun/sfkrv027q4e5/JQPWeLQsIn3pL91D0Mmmv5L1g5dGXO3glAXaePuCImJ1K/rZeo0C65N432IU=
ARC-Authentication-Results: i=1;
smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=F9YbXESq; arc=none smtp.client-ip=209.85.128.46 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="F9YbXESq" Received: by mail-wm1-f46.google.com with SMTP id 5b1f17b1804b1-450cf0120cdso23752545e9.2 for ; Sat, 14 Jun 2025 02:54:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1749894854; x=1750499654; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=oEoRmOROics9XZZ0dFappmfLDjCc2Jum5Eh7tphQmqA=; b=F9YbXESqczbrQOCWdRIZpS3Xa1HUpgXRxBuQmfR2wTSWUy9iIV2CQDJYWLrXHa22SL uCx8T3wEzvUxMq4ITTkukomtYxbM49HXqOha4s2TjESKjaur3NTGzx3r+fWKv5UwWVDG PwZP+noo6qQsMAFCpa35rFFaSPfVhLEO6DgSGHT6+qvM8ilUNPeO1SD7nVGcKZpyxQWK kloOrscFphsylWyzE7jTn1+jcaXm16LNiyFKqt7cNRjKSSuAx7NEHaakFDg5PvDhWKg4 T43JeLC08CkM2KCLLo0Ln20FMbTOY1IGiltY9N+bn8yuRgn9NQFnfmEJSfAdHAsmpgOv Q2ag== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1749894854; x=1750499654; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=oEoRmOROics9XZZ0dFappmfLDjCc2Jum5Eh7tphQmqA=; b=I+nH5s4aLHi3UfPEAo+Ocs6a7kQMT5dsoRlO+o7+8lrZOt39ghXTVYyu9EmtgIiBSQ 9qXCCLEQ8Nd2IHnsXwgDrQmzAMA3xO1U9dg0E8uQvhWmCgG0DszcLBhtX7YlWW/4iaUx uqiNqzwH9j1lycT82WVyokN41cqdtnSdMgObQCXIzCFhs4NBNE0YfHBWUTb6Y0xf3FZW /CmFhkq5ol6XEo6EX+1J4LmjGycvUSfGLsJrYh3edsT2XpWeXm+hyG+M78/BDXQ+NF6t 
5tG3h7Exxiwbk2ndEinRSlBVoE+5yCMNrEcbshCtSuGkPaLqCiLv1aqeyhOVIGhp8Xjs JW0w== X-Forwarded-Encrypted: i=1; AJvYcCWgI8cRFiU2g4wBereCnYe95Ue9b4Lb2/FYlU0jjr1Ga2kXBvBFr8NEC0RHD0hKI0jsTN7Q3lDeoDjl0Wc=@vger.kernel.org X-Gm-Message-State: AOJu0Yz1/jz+cc7Dk3qCXk7lAfFXoBAG4sYbmzOEkPM06HH7ygJE7ptT TU68alcqn9pnMjTZuQ5p2qhtx7BYTlANYLRbD9Syuzie7fXs0mJNlQ0Q X-Gm-Gg: ASbGncuE1kLS4F2otdGN2skmYEJeh8Ilhw3r5HIGfKo8VdSuT4g6LITgqppufZakDqL P2vAoNi+A9rVCB0K9hH0+sVJZswhIyz/tU8Q87Wb/sWCRJ2+FHiXBEjqlZIgVUrrVTd9XlZyeAl AaQOoA38sYL6KwoT/m6O2+y/x5bi/rc15JndQ6e04FurfaLh4YxlsGuF/4opeBID3FlvUdmfc2i QeBhki2yFwGuUyMdzgJBDS8CGBDfxH9i4wIVd+WsJc3fVcmwnrXgeZVWNXeEJFFckA9OHaZB/jS c4gOlGpMNWeV8dDDPN1o0TfaleF/+EVbKbaBa/00VCLWmUOO1xpwOlzvtaw15VPvZ1PsBS6M8bg TtfRPYbTu8yqUkD90TIsCJCYWbGslfshsm/nqrd9FEDY= X-Google-Smtp-Source: AGHT+IGPHoMO7+RCfAgqwFmRqb4OO/WnTD2i72BVayXQxIoB4REUTtbCjI5Mlqa31tNkHVk2IqjKqQ== X-Received: by 2002:a05:6000:2002:b0:3a4:f52d:8b11 with SMTP id ffacd0b85a97d-3a572373cbbmr2901003f8f.20.1749894854039; Sat, 14 Jun 2025 02:54:14 -0700 (PDT) Received: from snowdrop.snailnet.com (82-69-66-36.dsl.in-addr.zen.co.uk. 
[82.69.66.36]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-3a568b19b32sm4869444f8f.67.2025.06.14.02.54.13 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 14 Jun 2025 02:54:13 -0700 (PDT)
From: David Laight
To: Andrew Morton , linux-kernel@vger.kernel.org
Cc: David Laight , u.kleine-koenig@baylibre.com, Nicolas Pitre , Oleg Nesterov , Peter Zijlstra , Biju Das
Subject: [PATCH v3 next 03/10] lib: mul_u64_u64_div_u64() simplify check for a 64bit product
Date: Sat, 14 Jun 2025 10:53:39 +0100
Message-Id: <20250614095346.69130-4-david.laight.linux@gmail.com>
X-Mailer: git-send-email 2.39.5
In-Reply-To: <20250614095346.69130-1-david.laight.linux@gmail.com>
References: <20250614095346.69130-1-david.laight.linux@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

If the product is only 64bits, div64_u64() can be used for the divide.
Replace the pre-multiply check (ilog2(a) + ilog2(b) <=3D 62) with a
simple post-multiply check that the high 64bits are zero.

This has the advantage of being simpler, more accurate and less code.
It will always be faster when the product is larger than 64bits.

Most 64bit cpus have a native 64x64=3D128 bit multiply; this is needed
(for the low 64bits) even when div64_u64() is called - so the early
check gains nothing and is just extra code.

32bit cpus will need a compare (etc) to generate the 64bit ilog2()
from two 32bit bit scans - so that is non-trivial.
(Never mind the mess of x86's 'bsr' and any oddball cpu without
fast bit-scan instructions.)
Whereas the additional instructions for the 128bit multiply result
are pretty much one multiply and two adds (typically the 'adc $0,%reg'
can be run in parallel with the instruction that follows).
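
(Illustration only, not part of the patch.) A minimal userspace sketch of
the two checks discussed above, assuming gcc or clang on a 64bit target so
that unsigned __int128 and __builtin_clzll() are available. The helper
names are hypothetical stand-ins, not kernel functions. It shows why the
post-multiply test is the more accurate one: some products fit in 64 bits
but are rejected by the ilog2() test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's ilog2(); v must be non-zero. */
static int ilog2_u64(uint64_t v)
{
	assert(v != 0);
	return 63 - __builtin_clzll(v);
}

/*
 * Old pre-multiply test: conservative. It only proves that the product
 * is below 2^64, so it can reject products that do fit.
 */
static bool fits_by_ilog2(uint64_t a, uint64_t b)
{
	if (!a || !b)
		return true;	/* product is zero */
	return ilog2_u64(a) + ilog2_u64(b) <= 62;
}

/* New post-multiply test: exact, the high 64 bits of the product are zero. */
static bool fits_by_high_half(uint64_t a, uint64_t b)
{
	unsigned __int128 prod = (unsigned __int128)a * b;

	return (uint64_t)(prod >> 64) == 0;
}
```

With a = 3 and b = 0x5555555555555555 the product is 0xffffffffffffffff:
fits_by_ilog2() rejects it (1 + 62 > 62) while fits_by_high_half()
accepts it.
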
The only outliers are 64bit systems without a 128bit multiply and
simple in-order 32bit ones with fast bit scan but needing extra
instructions to get the high bits of the multiply result.
I doubt it makes much difference to either; the latter is definitely
not mainstream.

Split from patch 3 of v2 of this series.

If anyone is worried about the analysis they can look at the
generated code for x86 (especially when cmov isn't used).

Signed-off-by: David Laight
Reviewed-by: Nicolas Pitre
---
 lib/math/div64.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/math/div64.c b/lib/math/div64.c
index 397578dc9a0b..ed9475b9e1ef 100644
--- a/lib/math/div64.c
+++ b/lib/math/div64.c
@@ -196,9 +196,6 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d)
 		return 0;
 	}
=20
-	if (ilog2(a) + ilog2(b) <=3D 62)
-		return div64_u64(a * b, d);
-
 #if defined(__SIZEOF_INT128__)
=20
 	/* native 64x64=3D128 bits multiplication */
@@ -222,6 +219,9 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d)
=20
 #endif
=20
+	if (!n_hi)
+		return div64_u64(n_lo, d);
+
 	if (WARN_ONCE(n_hi >=3D d,
 		      "%s: division of (%#llx * %#llx =3D %#llx%016llx) by %#llx overflo=
ws, returning ~0",
 		      __func__, a, b, n_hi, n_lo, d))
--=20
2.39.5

From nobody Fri Oct 10 09:15:02 2025
Received: from mail-wm1-f53.google.com (mail-wm1-f53.google.com [209.85.128.53]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9C3FB1D54D1 for ; Sat, 14 Jun 2025 09:54:16 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.53
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749894858; cv=none; b=qZRZUmHJW8RTJKdzvyVFUG2DI5MlWdHW8eS17CgA6TCMNCi+6tlbRsMovXNWWetND0PmaJmwOOsa9dGTLmEod06Wrf5YVYAmNn5h8/gMKmSg5mfTdRSEMf/Fb37uWdwQhEO6HmmYQh6ipD61w91JhzYWO0yjDYZmaw+S8zB4cKQ=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749894858;
c=relaxed/simple; bh=V5A8c4F+Anra4BnQo+Mv9DZzO6ZTA0M1l0EhuD5TBgA=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=eYdNA1sjXD8J1k+vBMrreIVe275CQR64phmYbHkW0UuCOR3xkpTPcrE+mjd3L+NYsH67h1Yd6hmtDJUGNUj9pc+rZQvWGAAUtBMxgDTtw8mGCdLAKizSdLxU+cNnX9ohKnJYzg16yx96rBkzvwt+aCd59UpajuHhJQrrig+NQfU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=DkLE41Gd; arc=none smtp.client-ip=209.85.128.53 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="DkLE41Gd" Received: by mail-wm1-f53.google.com with SMTP id 5b1f17b1804b1-450cfb790f7so22620715e9.0 for ; Sat, 14 Jun 2025 02:54:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1749894855; x=1750499655; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=MRv07B3uuhx1oL47u14W0Ds8EkcLpM2/A7/p5FHeIT8=; b=DkLE41GdHMH1XJo+NfcUH5JjEAFNYIHUY9b0GhSR8JDjMxBHn2rXeqUyQdfHJJYQBh CcFXk8hNdp1DLQyuoern3LX1foKWPU2dBs1e6NzIb6GEmAzC/WOZjA/s4YVmTdiWmf7e bFb994HoJPsOWlQTZRk8/Qgwx4Wewp9njOPypGU1tmXCubfOZam8fHDfKeBTmpNA4rwu nVU9J/AtysLJb/1lDXEZzXPYlzu9Vu4HH/flAHvsyN/TD7/CvpMhAxQ/4ReJi5O534+s aCQXd/CpYKMlsw6te2gvvIVGj1aG7UvJC3XYkYIO6R7SbGf6e4+wOgav56/8YlI8jc4s AHWw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1749894855; x=1750499655; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; 
bh=MRv07B3uuhx1oL47u14W0Ds8EkcLpM2/A7/p5FHeIT8=; b=M24H6o/16p1Pd1W5MOVtDtAUcjjFUUPErAD5gpGWonuZHKjV5a9gvPMjbi/lKQb2ea 50IGtMVRbfONSVLW+fYPQgEf+pp23LYTDOx+mH2Q0iN/IalRZ6XJ451qJxyt0ldW7CcO nrgV/myYPn5ySpvbl66CYyUwhwZ+cndlC0qglpQrKeSk4ZkEyfxWvUuDoCA7MG6QOMhT FVd1ji6bJKvbCsJ29s04OT7m2nb9LsA0U7n8jnV1Z/A030SL7KQnqqRsTI2hfBz759lB BdCr/hixIrGaO8z+7C04DhL6WTP+OPlhoEVd6N5VuVAnzvXhrIFoWLWg59vn3pLZRwb9 VrqQ== X-Forwarded-Encrypted: i=1; AJvYcCW6rvNQ67dE5EaPH1IBHoDINGjTpddsJW2mP1OZNd+PFaqveTgzDXo+VhbVUwZG7seQNiRiwTSdU6fYAJ8=@vger.kernel.org X-Gm-Message-State: AOJu0YwbItNlqmOxbgqPkeAAItJWzTl9mTREy/1ZEwWvfdtNMyeMcNl1 pf6mKCy5iZXGlBFmf4NGwDjK4Rpbafg9fISmLrQfFeXUolwnOCkfvnihb8R0EQ== X-Gm-Gg: ASbGncuSqzb11+7On/g76aDlQ5I7q38t+sDi7POi1KqEPdtrsxjReel4tW8WLgF1Iqc V9Tb5HNAnzP6uca4UQATPAcjV/CFZQfV05KB+MO5LjHvRFRemjj0AEkyNWY9yYQymhlJyWCm8st B0X6bHwSZuW9/HkcVviWQ+NGIy1crydLOgrLhPNqzUnjjS2FFjoPynRUWWP6z5UIAkn3qziDhZ4 9+7DIJ9GkUjC6cIjpj4LyZaJM/PpR0Z5oG2KhUsPoxYNCrJ7ofdsLCywaCrqYcfkRy/4dpKoNVf 5QQ5tE+sXYvENXazDSX6qx6tH6Ai9RuA7OnTZWsSN/PuhPE8+glk3eWXdBcGMY45rp798XBn6X3 Su7p3bdI9fFwcM6limdNBLF9aqMupkhqmwZucmsKddsM= X-Google-Smtp-Source: AGHT+IF3OOcl+tXXUMMYOpw+OOYGfeCIfuL1vnnTOB42visF2nbrl9E8AE5w/J9WjsQF4BZyPVK+zw== X-Received: by 2002:a05:600c:1f94:b0:453:a95:f07d with SMTP id 5b1f17b1804b1-4533ca54b90mr36464145e9.10.1749894854533; Sat, 14 Jun 2025 02:54:14 -0700 (PDT) Received: from snowdrop.snailnet.com (82-69-66-36.dsl.in-addr.zen.co.uk. 
[82.69.66.36]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-3a568b19b32sm4869444f8f.67.2025.06.14.02.54.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 14 Jun 2025 02:54:14 -0700 (PDT)
From: David Laight
To: Andrew Morton , linux-kernel@vger.kernel.org
Cc: David Laight , u.kleine-koenig@baylibre.com, Nicolas Pitre , Oleg Nesterov , Peter Zijlstra , Biju Das
Subject: [PATCH v3 next 04/10] lib: Add mul_u64_add_u64_div_u64() and mul_u64_u64_div_u64_roundup()
Date: Sat, 14 Jun 2025 10:53:40 +0100
Message-Id: <20250614095346.69130-5-david.laight.linux@gmail.com>
X-Mailer: git-send-email 2.39.5
In-Reply-To: <20250614095346.69130-1-david.laight.linux@gmail.com>
References: <20250614095346.69130-1-david.laight.linux@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

The existing mul_u64_u64_div_u64() rounds down. A 'rounding up'
variant needs 'divisor - 1' added in between the multiply and the
divide, so it cannot easily be done by a caller.

Add mul_u64_add_u64_div_u64(a, b, c, d) that calculates (a * b + c)/d
and implement the 'round down' and 'round up' variants using it.

Update the x86-64 asm to optimise for 'c' being a constant zero.

Add kernel-doc definitions for all three functions.

Signed-off-by: David Laight

Changes for v2 (formerly patch 1/3):
- Reinstate the early call to div64_u64() on 32bit when 'c' is zero.
  Although I'm not convinced the path is common enough to be worth
  the two ilog2() calls.

Changes for v3 (formerly patch 3/4):
- The early call to div64_u64() has been removed by patch 3.
  It was pretty much guaranteed to be a pessimisation.
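
(Illustration only, not part of the patch.) A userspace reference model
of the semantics described above, assuming gcc or clang on a 64bit target
so that unsigned __int128 is available. The model_* names are
hypothetical; only the return conventions (0 for a zero divisor, ~0 when
the quotient does not fit in 64 bits) are taken from the generic code in
this series. It also shows why the add has to happen on the 128bit
intermediate: a caller cannot add 'd - 1' to a product it never sees in
full.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128_model;	/* compiler extension, assumed available */

/* Hypothetical reference model of (a * b + c) / d with a 128bit intermediate. */
static uint64_t model_mul_add_div(uint64_t a, uint64_t b, uint64_t c, uint64_t d)
{
	u128_model n, q;

	if (d == 0)
		return 0;	/* the generic code warns and returns 0 */

	/* Cannot overflow: (2^64 - 1)^2 + (2^64 - 1) < 2^128 */
	n = (u128_model)a * b + c;
	q = n / d;

	/* Sanity check: q is the floor of n / d */
	assert(q * d <= n && n - q * d < d);

	/* The generic code warns and returns ~0 on overflow */
	return q > UINT64_MAX ? UINT64_MAX : (uint64_t)q;
}

/* Round down: c is zero. */
static uint64_t model_mul_div(uint64_t a, uint64_t b, uint64_t d)
{
	return model_mul_add_div(a, b, 0, d);
}

/* Round up: c is d - 1 (for d == 0 the divisor check returns 0 first). */
static uint64_t model_mul_div_roundup(uint64_t a, uint64_t b, uint64_t d)
{
	return model_mul_add_div(a, b, d - 1, d);
}
```

For example model_mul_div(7, 3, 4) is 5 (21 / 4 rounded down) while
model_mul_div_roundup(7, 3, 4) is 6 ((21 + 3) / 4).
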
Signed-off-by: David Laight
Reviewed-by: Nicolas Pitre
---
 arch/x86/include/asm/div64.h | 19 ++++++++++-----
 include/linux/math64.h       | 45 +++++++++++++++++++++++++++++++++++-
 lib/math/div64.c             | 22 ++++++++++--------
 3 files changed, 69 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/div64.h b/arch/x86/include/asm/div64.h
index 9931e4c7d73f..7a0a916a2d7d 100644
--- a/arch/x86/include/asm/div64.h
+++ b/arch/x86/include/asm/div64.h
@@ -84,21 +84,28 @@ static inline u64 mul_u32_u32(u32 a, u32 b)
  * Will generate an #DE when the result doesn't fit u64, could fix with an
  * __ex_table[] entry when it becomes an issue.
  */
-static inline u64 mul_u64_u64_div_u64(u64 a, u64 mul, u64 div)
+static inline u64 mul_u64_add_u64_div_u64(u64 a, u64 mul, u64 add, u64 div)
 {
 	u64 q;
=20
-	asm ("mulq %2; divq %3" : "=3Da" (q)
-				: "a" (a), "rm" (mul), "rm" (div)
-				: "rdx");
+	if (statically_true(!add)) {
+		asm ("mulq %2; divq %3" : "=3Da" (q)
+					: "a" (a), "rm" (mul), "rm" (div)
+					: "rdx");
+	} else {
+		asm ("mulq %2; addq %3, %%rax; adcq $0, %%rdx; divq %4"
+			: "=3Da" (q)
+			: "a" (a), "rm" (mul), "rm" (add), "rm" (div)
+			: "rdx");
+	}
=20
 	return q;
 }
-#define mul_u64_u64_div_u64 mul_u64_u64_div_u64
+#define mul_u64_add_u64_div_u64 mul_u64_add_u64_div_u64
=20
 static inline u64 mul_u64_u32_div(u64 a, u32 mul, u32 div)
 {
-	return mul_u64_u64_div_u64(a, mul, div);
+	return mul_u64_add_u64_div_u64(a, mul, 0, div);
 }
 #define mul_u64_u32_div mul_u64_u32_div
=20
diff --git a/include/linux/math64.h b/include/linux/math64.h
index 6aaccc1626ab..e1c2e3642cec 100644
--- a/include/linux/math64.h
+++ b/include/linux/math64.h
@@ -282,7 +282,53 @@ static inline u64 mul_u64_u32_div(u64 a, u32 mul, u32 =
divisor)
 }
 #endif /* mul_u64_u32_div */
=20
-u64 mul_u64_u64_div_u64(u64 a, u64 mul, u64 div);
+/**
+ * mul_u64_add_u64_div_u64 - unsigned 64bit multiply, add, and divide
+ * @a: first unsigned 64bit multiplicand
+ * @b: second unsigned 64bit multiplicand
+ * @c: unsigned 64bit addend
+ * @d: unsigned 64bit divisor
+ *
+ * Multiply two 64bit values together to generate a 128bit product
+ * add a third value and then divide by a fourth.
+ * Generic code returns 0 if @d is zero and ~0 if the quotient exceeds 64 =
bits.
+ * Architecture specific code may trap on zero or overflow.
+ *
+ * Return: (@a * @b + @c) / @d
+ */
+u64 mul_u64_add_u64_div_u64(u64 a, u64 b, u64 c, u64 d);
+
+/**
+ * mul_u64_u64_div_u64 - unsigned 64bit multiply and divide
+ * @a: first unsigned 64bit multiplicand
+ * @b: second unsigned 64bit multiplicand
+ * @d: unsigned 64bit divisor
+ *
+ * Multiply two 64bit values together to generate a 128bit product
+ * and then divide by a third value.
+ * Generic code returns 0 if @d is zero and ~0 if the quotient exceeds 64 =
bits.
+ * Architecture specific code may trap on zero or overflow.
+ *
+ * Return: @a * @b / @d
+ */
+#define mul_u64_u64_div_u64(a, b, d) mul_u64_add_u64_div_u64(a, b, 0, d)
+
+/**
+ * mul_u64_u64_div_u64_roundup - unsigned 64bit multiply and divide rounde=
d up
+ * @a: first unsigned 64bit multiplicand
+ * @b: second unsigned 64bit multiplicand
+ * @d: unsigned 64bit divisor
+ *
+ * Multiply two 64bit values together to generate a 128bit product
+ * and then divide and round up.
+ * Generic code returns 0 if @d is zero and ~0 if the quotient exceeds 64 =
bits.
+ * Architecture specific code may trap on zero or overflow.
+ *
+ * Return: (@a * @b + @d - 1) / @d
+ */
+#define mul_u64_u64_div_u64_roundup(a, b, d) \
+	({ u64 _tmp =3D (d); mul_u64_add_u64_div_u64(a, b, _tmp - 1, _tmp); })
+
=20
 /**
  * DIV64_U64_ROUND_UP - unsigned 64bit divide with 64bit divisor rounded up
diff --git a/lib/math/div64.c b/lib/math/div64.c
index ed9475b9e1ef..7850cc0a7596 100644
--- a/lib/math/div64.c
+++ b/lib/math/div64.c
@@ -184,11 +184,11 @@ u32 iter_div_u64_rem(u64 dividend, u32 divisor, u64 *=
remainder)
 }
 EXPORT_SYMBOL(iter_div_u64_rem);
=20
-#ifndef mul_u64_u64_div_u64
-u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d)
+#ifndef mul_u64_add_u64_div_u64
+u64 mul_u64_add_u64_div_u64(u64 a, u64 b, u64 c, u64 d)
 {
-	if (WARN_ONCE(!d, "%s: division of (%#llx * %#llx) by zero, returning 0",
-		      __func__, a, b)) {
+	if (WARN_ONCE(!d, "%s: division of (%#llx * %#llx + %#llx) by zero, retur=
ning 0",
+		      __func__, a, b, c)) {
 		/*
 		 * Return 0 (rather than ~(u64)0) because it is less likely to
 		 * have unexpected side effects.
@@ -199,7 +199,7 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d)
 #if defined(__SIZEOF_INT128__)
=20
 	/* native 64x64=3D128 bits multiplication */
-	u128 prod =3D (u128)a * b;
+	u128 prod =3D (u128)a * b + c;
 	u64 n_lo =3D prod, n_hi =3D prod >> 64;
=20
 #else
@@ -208,8 +208,10 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d)
 	u32 a_lo =3D a, a_hi =3D a >> 32, b_lo =3D b, b_hi =3D b >> 32;
 	u64 x, y, z;
=20
-	x =3D (u64)a_lo * b_lo;
-	y =3D (u64)a_lo * b_hi + (u32)(x >> 32);
+	/* Since (x-1)(x-1) + 2(x-1) =3D=3D x.x - 1 two u32 can be added to a u64=
 */
+	x =3D (u64)a_lo * b_lo + (u32)c;
+	y =3D (u64)a_lo * b_hi + (u32)(c >> 32);
+	y +=3D (u32)(x >> 32);
 	z =3D (u64)a_hi * b_hi + (u32)(y >> 32);
 	y =3D (u64)a_hi * b_lo + (u32)y;
 	z +=3D (u32)(y >> 32);
@@ -223,8 +225,8 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d)
 		return div64_u64(n_lo, d);
=20
 	if (WARN_ONCE(n_hi >=3D d,
-		      "%s: division of (%#llx * %#llx =3D %#llx%016llx) by %#llx overflo=
ws, returning ~0",
-		      __func__, a, b, n_hi, n_lo, d))
+		      "%s: division of (%#llx * %#llx + %#llx =3D %#llx%016llx) by %#llx=
 overflows, returning ~0",
+		      __func__, a, b, c, n_hi, n_lo, d))
 		return ~(u64)0;
=20
 	int shift =3D __builtin_ctzll(d);
@@ -268,5 +270,5 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d)
=20
 	return res;
 }
-EXPORT_SYMBOL(mul_u64_u64_div_u64);
+EXPORT_SYMBOL(mul_u64_add_u64_div_u64);
 #endif
--=20
2.39.5

From nobody Fri Oct 10 09:15:02 2025
Received: from mail-wm1-f45.google.com (mail-wm1-f45.google.com [209.85.128.45]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0468A1DE3A8 for ; Sat, 14 Jun 2025 09:54:16 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.45
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749894859; cv=none; b=Kpt90MlvIkCH5kmcqjZQyJbNzPJfaFMmsiYWqzGEiP7eFvby4JNRzSktk0gX5bYXJyZMttRzks6inFOJOXPX8a5UxNkY1Bzlbf5KRT4ZHX5f41jiIHlE+o24wuzzaJ7R5Sg1PxoFCsPynqYoDlR0n1d2Z/qw95qlewIkcL1FBME=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749894859; c=relaxed/simple; bh=n94r1iGKEr3TiNgolTbPj8u7MorHH7FIByppBdcWdKA=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=kZnKmM72dftfplYx10xJ48+vb9hTIkf920EA/0YQh+EqWHMxXv/LN13OHXDVtG4JyhZN3Q3db+ORCPcmReQThVoRpRUa8qFhjAqGHd3U530cPfdpXLaEpJJZUT75PI2Kxn5bao/AkSO/YSn07srqLT994ZgjNb2mSQt+Ul/zJDA=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=OLHpyKVy; arc=none smtp.client-ip=209.85.128.45
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com
header.i=@gmail.com header.b="OLHpyKVy" Received: by mail-wm1-f45.google.com with SMTP id 5b1f17b1804b1-4533bf4d817so4220165e9.2 for ; Sat, 14 Jun 2025 02:54:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1749894855; x=1750499655; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=zEnTZdM/nEO25TyhM2GgCtjBibybJT5d1uMDui3gNg8=; b=OLHpyKVyGYpP7/iUeDEi4asxly6SC3AoKjknM7+u74jKjiasvv1SCSszSLtCqjPKM4 xaA1UK5pcedzQI7vC648qLyMm/Qti4DD0Dm2sTEZPJVKSqvv1RXISe9eAi7Wfh0Ovywv Wfp835gyzn1jCQB2+mLJQmn+frQ9IrBGFLNHrVL1q/W1RMlO0kdutD5xsgopc+cLwIy3 f9bpPUM+/eUW3T+svLy4SdhXhrRm19uufC2tRgew0gWA2cKhGWOPJ5A2M6p8NBnUyQbS UX15GSE9Qjb9ZRh0xeOJRMNm7X9h/H2MnET3J3guXcOKSW0DosGNEEtX8ixg9ZNyrpyd PL+g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1749894855; x=1750499655; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=zEnTZdM/nEO25TyhM2GgCtjBibybJT5d1uMDui3gNg8=; b=F9UDUc2UQ53QUC0IPwp1/eZ/icAPeGCOVy9AdtIlyCpOAvGdf/p8sh7de9JZ/K9Fy3 /MfRxHpJH2CxrNBisMaMdkq2n3O2sN7jOpDLNmShhfNsIJoQwJ9Jb8CHfS1JnozI3jox ibW6yDHp7MfjuyibGDBt1CjBUFc67nKJtjH6/CbW3REYV49QUs7Sxb46rRqdHQgtVWO7 21M6oq4rN/6eMONcA0xwnMuwgJfRDelPAdi7c695++IwtSNWW4UPChZY8EGcXsvZSqAz STHEMdMT4IJWDoFAi97D8VEzqYEG2WpdWhpaOZGo5C2feGCuCMT2RvSD0yhdkGA/awXk OuQA== X-Forwarded-Encrypted: i=1; AJvYcCWJQQdkgsvIbFpJ/WRul+Eo356D8oitoza8/1uSr/UOoFHysVDFvIqOf7bKL/TaLpvD+CniN+7ptUphMSo=@vger.kernel.org X-Gm-Message-State: AOJu0YzPUsX6JvyxqrcQ3lr2dzuk5naKX5OrNnwrEsELub9TBl2/JgLe b5TJRTNvfrcP8FdgRqMQPCtacGMZdnPX/meqEtLeOO/cTm/XSmUj4AGK X-Gm-Gg: ASbGncv+hJ+tODUWIPwmFOej1xTcn4FbZFndaE3KxFghzn6TeQhN5+4MfdWPwXi3znd 8qX4jgvRYZQMV8dGa6EuMq+O32lzwzoGs9J7axV8PT3GqeOPp4uAIKZWITtIjKkapQqCt1ZPDLJ 
jDDjB0t9NPyy+l8yjVxjZ8VI6DFbRiHwHa+u0lPJbLO4CFuBLtiJl+aA+EpWzVeyODFbvxtSD3C TT5nwexrlCrDyRJ5ZBUrorJyXRIBClkfsefAAKbbfUi84sBpJDeWWo8iUCV7exqJiLZqdARHkkE 16O+IlOg1yVfFKUf1eHF4HSryNixKpSQPqUhZUsFmYNnL2dqtfW3jEAYpU6dW7MRZkhcq+FiSRW ZdUnfj5Ju8RR5oH+Yz4fW4fWa/4sV1KYNlaxoU/5rjKY= X-Google-Smtp-Source: AGHT+IGG7FXFstBCaJSZ99EkRoiD5VoPJ9Uqi8ywctGye8MRdWYs7O71NT1/WI1+uaJ0oiJ5K4lMeQ== X-Received: by 2002:a05:6000:2486:b0:3a5:52d4:5b39 with SMTP id ffacd0b85a97d-3a572398ea5mr2411614f8f.8.1749894855151; Sat, 14 Jun 2025 02:54:15 -0700 (PDT) Received: from snowdrop.snailnet.com (82-69-66-36.dsl.in-addr.zen.co.uk. [82.69.66.36]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-3a568b19b32sm4869444f8f.67.2025.06.14.02.54.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 14 Jun 2025 02:54:14 -0700 (PDT) From: David Laight To: Andrew Morton , linux-kernel@vger.kernel.org Cc: David Laight , u.kleine-koenig@baylibre.com, Nicolas Pitre , Oleg Nesterov , Peter Zijlstra , Biju Das Subject: [PATCH v3 next 05/10] lib: Add tests for mul_u64_u64_div_u64_roundup() Date: Sat, 14 Jun 2025 10:53:41 +0100 Message-Id: <20250614095346.69130-6-david.laight.linux@gmail.com> X-Mailer: git-send-email 2.39.5 In-Reply-To: <20250614095346.69130-1-david.laight.linux@gmail.com> References: <20250614095346.69130-1-david.laight.linux@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Replicate the existing mul_u64_u64_div_u64() test cases with round up. Update the shell script that verifies the table and remove the comment markers so that it can be directly pasted into a shell. Rename the divisor from 'c' to 'd' to match mul_u64_add_u64_div_u64(). If any tests fail then fail the module load with -EINVAL. 
Signed-off-by: David Laight Reviewed-by: Nicolas Pitre --- Changes for v3: - Rename 'c' to 'd' to match mul_u64_add_u64_div_u64() lib/math/test_mul_u64_u64_div_u64.c | 122 +++++++++++++++++----------- 1 file changed, 73 insertions(+), 49 deletions(-) diff --git a/lib/math/test_mul_u64_u64_div_u64.c b/lib/math/test_mul_u64_u6= 4_div_u64.c index 58d058de4e73..ea5b703cccff 100644 --- a/lib/math/test_mul_u64_u64_div_u64.c +++ b/lib/math/test_mul_u64_u64_div_u64.c @@ -10,61 +10,73 @@ #include #include =20 -typedef struct { u64 a; u64 b; u64 c; u64 result; } test_params; +typedef struct { u64 a; u64 b; u64 d; u64 result; uint round_up;} test_par= ams; =20 static test_params test_values[] =3D { /* this contains many edge values followed by a couple random values */ -{ 0xb, 0x7, 0x3, = 0x19 }, -{ 0xffff0000, 0xffff0000, 0xf, 0x1110eeef00= 000000 }, -{ 0xffffffff, 0xffffffff, 0x1, 0xfffffffe00= 000001 }, -{ 0xffffffff, 0xffffffff, 0x2, 0x7fffffff00= 000000 }, -{ 0x1ffffffff, 0xffffffff, 0x2, 0xfffffffe80= 000000 }, -{ 0x1ffffffff, 0xffffffff, 0x3, 0xaaaaaaa9aa= aaaaab }, -{ 0x1ffffffff, 0x1ffffffff, 0x4, 0xffffffff00= 000000 }, -{ 0xffff000000000000, 0xffff000000000000, 0xffff000000000001, 0xfffeffffff= ffffff }, -{ 0x3333333333333333, 0x3333333333333333, 0x5555555555555555, 0x1eb851eb85= 1eb851 }, -{ 0x7fffffffffffffff, 0x2, 0x3, 0x5555555555= 555554 }, -{ 0xffffffffffffffff, 0x2, 0x8000000000000000, = 0x3 }, -{ 0xffffffffffffffff, 0x2, 0xc000000000000000, = 0x2 }, -{ 0xffffffffffffffff, 0x4000000000000004, 0x8000000000000000, 0x8000000000= 000007 }, -{ 0xffffffffffffffff, 0x4000000000000001, 0x8000000000000000, 0x8000000000= 000001 }, -{ 0xffffffffffffffff, 0x8000000000000001, 0xffffffffffffffff, 0x8000000000= 000001 }, -{ 0xfffffffffffffffe, 0x8000000000000001, 0xffffffffffffffff, 0x8000000000= 000000 }, -{ 0xffffffffffffffff, 0x8000000000000001, 0xfffffffffffffffe, 0x8000000000= 000001 }, -{ 0xffffffffffffffff, 0x8000000000000001, 0xfffffffffffffffd, 0x8000000000= 
000002 }, -{ 0x7fffffffffffffff, 0xffffffffffffffff, 0xc000000000000000, 0xaaaaaaaaaa= aaaaa8 }, -{ 0xffffffffffffffff, 0x7fffffffffffffff, 0xa000000000000000, 0xcccccccccc= ccccca }, -{ 0xffffffffffffffff, 0x7fffffffffffffff, 0x9000000000000000, 0xe38e38e38e= 38e38b }, -{ 0x7fffffffffffffff, 0x7fffffffffffffff, 0x5000000000000000, 0xcccccccccc= ccccc9 }, -{ 0xffffffffffffffff, 0xfffffffffffffffe, 0xffffffffffffffff, 0xffffffffff= fffffe }, -{ 0xe6102d256d7ea3ae, 0x70a77d0be4c31201, 0xd63ec35ab3220357, 0x78f8bf8cc8= 6c6e18 }, -{ 0xf53bae05cb86c6e1, 0x3847b32d2f8d32e0, 0xcfd4f55a647f403c, 0x42687f79d8= 998d35 }, -{ 0x9951c5498f941092, 0x1f8c8bfdf287a251, 0xa3c8dc5f81ea3fe2, 0x1d887cb259= 00091f }, -{ 0x374fee9daa1bb2bb, 0x0d0bfbff7b8ae3ef, 0xc169337bd42d5179, 0x03bb2dbaff= cbb961 }, -{ 0xeac0d03ac10eeaf0, 0x89be05dfa162ed9b, 0x92bb1679a41f0e4b, 0xdc5f5cc9e2= 70d216 }, +{ 0xb, 0x7, 0x3, = 0x19, 1 }, +{ 0xffff0000, 0xffff0000, 0xf, 0x1110eeef00= 000000, 0 }, +{ 0xffffffff, 0xffffffff, 0x1, 0xfffffffe00= 000001, 0 }, +{ 0xffffffff, 0xffffffff, 0x2, 0x7fffffff00= 000000, 1 }, +{ 0x1ffffffff, 0xffffffff, 0x2, 0xfffffffe80= 000000, 1 }, +{ 0x1ffffffff, 0xffffffff, 0x3, 0xaaaaaaa9aa= aaaaab, 0 }, +{ 0x1ffffffff, 0x1ffffffff, 0x4, 0xffffffff00= 000000, 1 }, +{ 0xffff000000000000, 0xffff000000000000, 0xffff000000000001, 0xfffeffffff= ffffff, 1 }, +{ 0x3333333333333333, 0x3333333333333333, 0x5555555555555555, 0x1eb851eb85= 1eb851, 1 }, +{ 0x7fffffffffffffff, 0x2, 0x3, 0x5555555555= 555554, 1 }, +{ 0xffffffffffffffff, 0x2, 0x8000000000000000, = 0x3, 1 }, +{ 0xffffffffffffffff, 0x2, 0xc000000000000000, = 0x2, 1 }, +{ 0xffffffffffffffff, 0x4000000000000004, 0x8000000000000000, 0x8000000000= 000007, 1 }, +{ 0xffffffffffffffff, 0x4000000000000001, 0x8000000000000000, 0x8000000000= 000001, 1 }, +{ 0xffffffffffffffff, 0x8000000000000001, 0xffffffffffffffff, 0x8000000000= 000001, 0 }, +{ 0xfffffffffffffffe, 0x8000000000000001, 0xffffffffffffffff, 0x8000000000= 000000, 1 }, +{ 
0xffffffffffffffff, 0x8000000000000001, 0xfffffffffffffffe, 0x8000000000= 000001, 1 }, +{ 0xffffffffffffffff, 0x8000000000000001, 0xfffffffffffffffd, 0x8000000000= 000002, 1 }, +{ 0x7fffffffffffffff, 0xffffffffffffffff, 0xc000000000000000, 0xaaaaaaaaaa= aaaaa8, 1 }, +{ 0xffffffffffffffff, 0x7fffffffffffffff, 0xa000000000000000, 0xcccccccccc= ccccca, 1 }, +{ 0xffffffffffffffff, 0x7fffffffffffffff, 0x9000000000000000, 0xe38e38e38e= 38e38b, 1 }, +{ 0x7fffffffffffffff, 0x7fffffffffffffff, 0x5000000000000000, 0xcccccccccc= ccccc9, 1 }, +{ 0xffffffffffffffff, 0xfffffffffffffffe, 0xffffffffffffffff, 0xffffffffff= fffffe, 0 }, +{ 0xe6102d256d7ea3ae, 0x70a77d0be4c31201, 0xd63ec35ab3220357, 0x78f8bf8cc8= 6c6e18, 1 }, +{ 0xf53bae05cb86c6e1, 0x3847b32d2f8d32e0, 0xcfd4f55a647f403c, 0x42687f79d8= 998d35, 1 }, +{ 0x9951c5498f941092, 0x1f8c8bfdf287a251, 0xa3c8dc5f81ea3fe2, 0x1d887cb259= 00091f, 1 }, +{ 0x374fee9daa1bb2bb, 0x0d0bfbff7b8ae3ef, 0xc169337bd42d5179, 0x03bb2dbaff= cbb961, 1 }, +{ 0xeac0d03ac10eeaf0, 0x89be05dfa162ed9b, 0x92bb1679a41f0e4b, 0xdc5f5cc9e2= 70d216, 1 }, }; =20 /* * The above table can be verified with the following shell script: - * - * #!/bin/sh - * sed -ne 's/^{ \+\(.*\), \+\(.*\), \+\(.*\), \+\(.*\) },$/\1 \2 \3 \4/p'= \ - * lib/math/test_mul_u64_u64_div_u64.c | - * while read a b c r; do - * expected=3D$( printf "obase=3D16; ibase=3D16; %X * %X / %X\n" $a $b $= c | bc ) - * given=3D$( printf "%X\n" $r ) - * if [ "$expected" =3D "$given" ]; then - * echo "$a * $b / $c =3D $r OK" - * else - * echo "$a * $b / $c =3D $r is wrong" >&2 - * echo "should be equivalent to 0x$expected" >&2 - * exit 1 - * fi - * done + +#!/bin/sh +sed -ne 's/^{ \+\(.*\), \+\(.*\), \+\(.*\), \+\(.*\), \+\(.*\) },$/\1 \2 \= 3 \4 \5/p' \ + lib/math/test_mul_u64_u64_div_u64.c | +while read a b d r e; do + expected=3D$( printf "obase=3D16; ibase=3D16; %X * %X / %X\n" $a $b $d |= bc ) + given=3D$( printf "%X\n" $r ) + if [ "$expected" =3D "$given" ]; then + echo "$a * $b / $d =3D $r OK" 
+ else + echo "$a * $b / $d =3D $r is wrong" >&2 + echo "should be equivalent to 0x$expected" >&2 + exit 1 + fi + expected=3D$( printf "obase=3D16; ibase=3D16; (%X * %X + %X) / %X\n" $a = $b $((d-1)) $d | bc ) + given=3D$( printf "%X\n" $((r + e)) ) + if [ "$expected" =3D "$given" ]; then + echo "$a * $b +/ $d =3D $(printf '%#x' $((r + e))) OK" + else + echo "$a * $b +/ $d =3D $(printf '%#x' $((r + e))) is wrong" >&2 + echo "should be equivalent to 0x$expected" >&2 + exit 1 + fi +done + */ =20 static int __init test_init(void) { + int errors =3D 0; + int tests =3D 0; int i; =20 pr_info("Starting mul_u64_u64_div_u64() test\n"); @@ -72,19 +84,31 @@ static int __init test_init(void) for (i =3D 0; i < ARRAY_SIZE(test_values); i++) { u64 a =3D test_values[i].a; u64 b =3D test_values[i].b; - u64 c =3D test_values[i].c; + u64 d =3D test_values[i].d; u64 expected_result =3D test_values[i].result; - u64 result =3D mul_u64_u64_div_u64(a, b, c); + u64 result =3D mul_u64_u64_div_u64(a, b, d); + u64 result_up =3D mul_u64_u64_div_u64_roundup(a, b, d); + + tests +=3D 2; =20 if (result !=3D expected_result) { - pr_err("ERROR: 0x%016llx * 0x%016llx / 0x%016llx\n", a, b, c); + pr_err("ERROR: 0x%016llx * 0x%016llx / 0x%016llx\n", a, b, d); pr_err("ERROR: expected result: %016llx\n", expected_result); pr_err("ERROR: obtained result: %016llx\n", result); + errors++; + } + expected_result +=3D test_values[i].round_up; + if (result_up !=3D expected_result) { + pr_err("ERROR: 0x%016llx * 0x%016llx +/ 0x%016llx\n", a, b, d); + pr_err("ERROR: expected result: %016llx\n", expected_result); + pr_err("ERROR: obtained result: %016llx\n", result_up); + errors++; } } =20 - pr_info("Completed mul_u64_u64_div_u64() test\n"); - return 0; + pr_info("Completed mul_u64_u64_div_u64() test, %d tests, %d errors\n", + tests, errors); + return errors ? 
-EINVAL : 0; } =20 static void __exit test_exit(void) --=20 2.39.5 From nobody Fri Oct 10 09:15:02 2025 Received: from mail-wr1-f54.google.com (mail-wr1-f54.google.com [209.85.221.54]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8378D200BBC for ; Sat, 14 Jun 2025 09:54:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.54 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749894859; cv=none; b=LFnY3bK8bpBrciJHLFXZRFOaARaQwcap/IoFeGtR2cMyvtUCBxj5D5cEBcR5X9qvTYPAJNxLH+KqBIBVJZu+9Z2hEb3w+jpzz5VO+PeRviKidlme/ldSZTr5wCjUSYT1ILyrcYvD+eYRR+79GF2TTvvgWVcIN/MEfyXy8ciEPJU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749894859; c=relaxed/simple; bh=Z5hzu1KYGp1QiejfFXW9fK6Q53roitRAH7xYYDd4dnY=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=pyzeAPzYwjJ9HgqW+MwmnTVFHIdJsyeWyHjSoN8A7l7DVIMUEY7SwpP+FqbC+51r4qjI2+suMVwUtOxC/IBwn364DC7g8qlOu/JNzva2VIDJwb7pXWAGAE8EMkgzGqVl6lbzLWH2ZYFt58Lh7Gn6ilexhTxMruH1NYu/7p6oE5k= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=NFYWjRt1; arc=none smtp.client-ip=209.85.221.54 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="NFYWjRt1" Received: by mail-wr1-f54.google.com with SMTP id ffacd0b85a97d-3a575a988f9so257819f8f.0 for ; Sat, 14 Jun 2025 02:54:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1749894856; x=1750499656; 
darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=pdx9zr3Z1yBW7Zs2OBAHWfDA624/sOwZdHmBJOvy5iU=; b=NFYWjRt1lIltFpZj386tOhNELwUym8YwqKluwNwvX7reTemKhFBLMqdB8t1OKPHfH/ QeRKp9Fc/xPnSWRewSsDFdfC4/RKbkPe5Dn+NtE3bCBc0SfC5CMZiNQcwuxgI67iifpK 5k5Au4qS1kKxYx6VPzilUQ8ikgpawE3ghbcq7v4JTHZLCXuyRZ2YN4iEWKE12nO7UpZQ timY48Jk6plax6+zz0RCUtUT3d4HhDWIG6nFszz8osBqQozaoj7x27sNHDdI7ZwjM5pI NyS5LLg+setEVIzeWr5IZeG3Fc7ykF4Qp/wWp229xfggcsf6wcPZl9r5amlKJOJW+8zv yrCg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1749894856; x=1750499656; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=pdx9zr3Z1yBW7Zs2OBAHWfDA624/sOwZdHmBJOvy5iU=; b=OLkf8cvCkhSKZRtWLx+EGXB7mYLTYTRYRpuS27QhlSj4vYBn+kPVEPlJQsSQiuuXR4 i8x/Duryzx+u69c24ZFp+FAFfzswiPr85mdnajjMrW/zM0qt/turCNYYs9saSF+z96Gv i6q6f4PYadU1s/x77gKS4/jEksWDpZ1eJHpPjVwKk34zaxPH9S5229+rRP2XnXvot8ff hmZajaZHw9wo9Oo54386w7l4gRyRqLdexakTxpicMvLLaCJtK8lY6M1ydSAU7KQcw6pj VkjKrKLIf5Tdbj85ZWm4FrLcgUkjWkTfW0DpzDoxo5qv2aj+W23sFy19c6d3f/DQ7FYi WBtQ== X-Forwarded-Encrypted: i=1; AJvYcCWUhhOzIoN22YJMuCI/ZavEBGW+Xjjp9VutXBQy1pDyAyrq+vTkUXXAzXeCnOyFRQCiVjrOjXtf+mCTlu4=@vger.kernel.org X-Gm-Message-State: AOJu0Yxe/BKFUZjCamaWh0BlKVep1+ifRpD8jByfEsu894pPNfoOymvi cVOb4ftfHO9tNSouMIUz4eiccflhKGbEeMpLOz+vSXYA7/kEw6nukNj4GnkmTQ== X-Gm-Gg: ASbGncuy53HM/WwpfXgYzPt+YIYkv8uqdcik8eRwq1+p2zkw+fp4aMTvZIzzsR1Vzij qk9Rhot64ThL1Q+ocxbDzqmA9a+Q+HkzZwBgjDbTqdMNM+pfSOX6N32in/AbEsroN00tKezbrYy tfx0UIjnY2lBrj7yfeLlfYzluw3kov0J7EWyFF1xYSRuaT3jdezL3i977HqAq25NF+XiTFwUVd2 CK/PvtZIhrBnAb35I9JfyoGdFJqMhrem5+yX3Nn2mG9RXmS1ycYrrFKSCwv70gMo5vh1SURg4DU QkCgjCmSPNOFyAUpKt1EIv/ZDf4rbijy7F5kQz82sR4X8sE1HuJsLbh+i6uSubObSmpvFUv97Gt mSHIlURCHGG1I5ws6j2IhE912lPTvSHLXS1F7Rb6ub2lBm158Q6f0hw== X-Google-Smtp-Source: 
AGHT+IFRtCg+wzVm0vQk2uN9Kk18ZSaVFYJ73nA85I0SubpmWS1UwpViZSgDcDNPhMQlI0jm5e1mWw== X-Received: by 2002:a05:6000:2003:b0:3a4:dd02:f565 with SMTP id ffacd0b85a97d-3a572397756mr2468489f8f.3.1749894855664; Sat, 14 Jun 2025 02:54:15 -0700 (PDT) Received: from snowdrop.snailnet.com (82-69-66-36.dsl.in-addr.zen.co.uk. [82.69.66.36]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-3a568b19b32sm4869444f8f.67.2025.06.14.02.54.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 14 Jun 2025 02:54:15 -0700 (PDT) From: David Laight To: Andrew Morton , linux-kernel@vger.kernel.org Cc: David Laight , u.kleine-koenig@baylibre.com, Nicolas Pitre , Oleg Nesterov , Peter Zijlstra , Biju Das Subject: [PATCH v3 next 06/10] lib: test_mul_u64_u64_div_u64: Test both generic and arch versions Date: Sat, 14 Jun 2025 10:53:42 +0100 Message-Id: <20250614095346.69130-7-david.laight.linux@gmail.com> X-Mailer: git-send-email 2.39.5 In-Reply-To: <20250614095346.69130-1-david.laight.linux@gmail.com> References: <20250614095346.69130-1-david.laight.linux@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Change the #if in div64.c so that test_mul_u64_u64_div_u64.c can compile and test the generic version (including the 'long multiply') on architectures (eg amd64) that define their own copy. Test the kernel version and the locally compiled version on all arch. Output the time taken (in ns) on the 'test completed' trace. For reference, on my zen 5, the optimised version takes ~220ns and the generic version ~3350ns. Using the native multiply saves ~200ns and adding back the ilog2() 'optimis= ation' test adds ~50ms. Signed-off-by: David Laight Reviewed-by: Nicolas Pitre --- New patch for v3, replacing changes in v1 that were removed for v2. 
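For anyone wanting to sanity-check the expected values outside the kernel: the locally compiled copy is called as (a, b, 0, d) for the truncating result and as (a, b, d - 1, d) for the rounded-up one. The sketch below is not part of the patch. It is a user-space reference that assumes a compiler providing unsigned __int128 (gcc or clang on a 64-bit target), and the names ref_mul_add_div(), ref_case, ref_cases and ref_check() are made up for illustration. The three rows are copied from test_values[].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical user-space reference, NOT the kernel implementation:
 * computes (a * b + c) / d using a 128-bit intermediate.
 * Assumes d != 0 and that the quotient fits in 64 bits.
 */
static uint64_t ref_mul_add_div(uint64_t a, uint64_t b, uint64_t c, uint64_t d)
{
	unsigned __int128 n;

	assert(d != 0);
	/* (2^64-1)^2 + (2^64-1) < 2^128, so the sum cannot overflow */
	n = (unsigned __int128)a * b + c;
	return (uint64_t)(n / d);
}

struct ref_case { uint64_t a, b, d, result; unsigned int round_up; };

/* A few rows copied from test_values[]; round_up is 0 or 1 */
static const struct ref_case ref_cases[] = {
	{ 0xb,        0x7,        0x3, 0x19,               1 },
	{ 0xffff0000, 0xffff0000, 0xf, 0x1110eeef00000000, 0 },
	{ 0xffffffff, 0xffffffff, 0x1, 0xfffffffe00000001, 0 },
};

/* Returns the number of mismatches, mirroring the 'errors' count in the test */
static int ref_check(void)
{
	int errors = 0;
	size_t i;

	for (i = 0; i < sizeof(ref_cases) / sizeof(ref_cases[0]); i++) {
		const struct ref_case *t = &ref_cases[i];
		uint64_t res = ref_mul_add_div(t->a, t->b, 0, t->d);
		uint64_t res_up = ref_mul_add_div(t->a, t->b, t->d - 1, t->d);

		if (res != t->result)
			errors++;
		if (res_up != t->result + t->round_up)
			errors++;
	}
	return errors;
}
```

Calling ref_check() from a small main() should return 0 for these rows. The reference only covers the arithmetic; it says nothing about the timing figures quoted above.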
lib/math/div64.c | 8 +++-- lib/math/test_mul_u64_u64_div_u64.c | 48 ++++++++++++++++++++++++----- 2 files changed, 47 insertions(+), 9 deletions(-) diff --git a/lib/math/div64.c b/lib/math/div64.c index 7850cc0a7596..22433e5565c4 100644 --- a/lib/math/div64.c +++ b/lib/math/div64.c @@ -178,13 +178,15 @@ EXPORT_SYMBOL(div64_s64); * Iterative div/mod for use when dividend is not expected to be much * bigger than divisor. */ +#ifndef iter_div_u64_rem u32 iter_div_u64_rem(u64 dividend, u32 divisor, u64 *remainder) { return __iter_div_u64_rem(dividend, divisor, remainder); } EXPORT_SYMBOL(iter_div_u64_rem); +#endif =20 -#ifndef mul_u64_add_u64_div_u64 +#if !defined(mul_u64_add_u64_div_u64) || defined(test_mul_u64_add_u64_div_= u64) u64 mul_u64_add_u64_div_u64(u64 a, u64 b, u64 c, u64 d) { if (WARN_ONCE(!d, "%s: division of (%#llx * %#llx + %#llx) by zero, retur= ning 0", @@ -196,7 +198,7 @@ u64 mul_u64_add_u64_div_u64(u64 a, u64 b, u64 c, u64 d) return 0; } =20 -#if defined(__SIZEOF_INT128__) +#if defined(__SIZEOF_INT128__) && !defined(test_mul_u64_add_u64_div_u64) =20 /* native 64x64=3D128 bits multiplication */ u128 prod =3D (u128)a * b + c; @@ -270,5 +272,7 @@ u64 mul_u64_add_u64_div_u64(u64 a, u64 b, u64 c, u64 d) =20 return res; } +#if !defined(test_mul_u64_add_u64_div_u64) EXPORT_SYMBOL(mul_u64_add_u64_div_u64); #endif +#endif diff --git a/lib/math/test_mul_u64_u64_div_u64.c b/lib/math/test_mul_u64_u6= 4_div_u64.c index ea5b703cccff..f0134f25cb0d 100644 --- a/lib/math/test_mul_u64_u64_div_u64.c +++ b/lib/math/test_mul_u64_u64_div_u64.c @@ -73,21 +73,34 @@ done =20 */ =20 -static int __init test_init(void) +static u64 test_mul_u64_add_u64_div_u64(u64 a, u64 b, u64 c, u64 d); + +static int __init test_run(unsigned int fn_no, const char *fn_name) { + u64 start_time; int errors =3D 0; int tests =3D 0; int i; =20 - pr_info("Starting mul_u64_u64_div_u64() test\n"); + start_time =3D ktime_get_ns(); =20 for (i =3D 0; i < ARRAY_SIZE(test_values); i++) { u64 a =3D 
test_values[i].a; u64 b =3D test_values[i].b; u64 d =3D test_values[i].d; u64 expected_result =3D test_values[i].result; - u64 result =3D mul_u64_u64_div_u64(a, b, d); - u64 result_up =3D mul_u64_u64_div_u64_roundup(a, b, d); + u64 result, result_up; + + switch (fn_no) { + default: + result =3D mul_u64_u64_div_u64(a, b, d); + result_up =3D mul_u64_u64_div_u64_roundup(a, b, d); + break; + case 1: + result =3D test_mul_u64_add_u64_div_u64(a, b, 0, d); + result_up =3D test_mul_u64_add_u64_div_u64(a, b, d - 1, d); + break; + } =20 tests +=3D 2; =20 @@ -106,15 +119,36 @@ static int __init test_init(void) } } =20 - pr_info("Completed mul_u64_u64_div_u64() test, %d tests, %d errors\n", - tests, errors); - return errors ? -EINVAL : 0; + pr_info("Completed %s() test, %d tests, %d errors, %llu ns\n", + fn_name, tests, errors, ktime_get_ns() - start_time); + return errors; +} + +static int __init test_init(void) +{ + pr_info("Starting mul_u64_u64_div_u64() test\n"); + if (test_run(0, "mul_u64_u64_div_u64")) + return -EINVAL; + if (test_run(1, "test_mul_u64_u64_div_u64")) + return -EINVAL; + return 0; } =20 static void __exit test_exit(void) { } =20 +/* Compile the generic mul_u64_add_u64_div_u64() code */ +#define div64_u64 div64_u64 +#define div64_s64 div64_s64 +#define iter_div_u64_rem iter_div_u64_rem + +#undef mul_u64_add_u64_div_u64 +#define mul_u64_add_u64_div_u64 test_mul_u64_add_u64_div_u64 +#define test_mul_u64_add_u64_div_u64 test_mul_u64_add_u64_div_u64 + +#include "div64.c" + module_init(test_init); module_exit(test_exit); =20 --=20 2.39.5 From nobody Fri Oct 10 09:15:02 2025 Received: from mail-wr1-f51.google.com (mail-wr1-f51.google.com [209.85.221.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3142B23D2A9 for ; Sat, 14 Jun 2025 09:54:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.51 ARC-Seal: 
i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749894860; cv=none; b=Yr7w96+f9+VUDG08rA+2G8B8azRczTX/u55fkI1Ue/i7tY8+EW6jsEwBeSJ8fArhXCBZhMGy9u2U96GZKE8tsB2ygGZ/SMvzncPdbaO+C8DAOfe+FmYfxijhc0XoMkhCwVpkdGfrHazCDzowsem/6kn9YV6uHz9Ncbp4i7a6RLk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749894860; c=relaxed/simple; bh=P7fZLPmZMmNJxuvvhuZX2hlKlRwIdsmOpsscv3/F8r4=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=lwBUprdA8Og9GlkMU8CsrRI//3+HLe0EH8WGnTMr7GvLV4ozd+TF2GXIKxQNOrFA8rtMBYswU2/1v0ugC5UvgS0hVJOUkPfTTnWmlQCwX8yBrYAn9dM415EseDimgS5Mu+8jwpq2Q/hr5afDV9tNr9W4hRBrpHKCQMqWFg8kwrE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=bCaEumiy; arc=none smtp.client-ip=209.85.221.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="bCaEumiy" Received: by mail-wr1-f51.google.com with SMTP id ffacd0b85a97d-3a50fc819f2so2453000f8f.2 for ; Sat, 14 Jun 2025 02:54:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1749894856; x=1750499656; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=5S6RWHkm8g4r7/EkPnCUIyR/CtOndR5cnePpxZF3FoU=; b=bCaEumiyWKjpGEE2orqlvWcENavtsPq/If6TKcmXopWsZhAHfQPDdiw3oEVEpjE2kK r/cs4nLQzAlCRrlt5fkiKeDf/RjR9CbmIHuvyIgjDhvUV43nayzar0xgacWpbbQ7V+No slz9UmvCyVptB/vxsfg0+HFzKoqEOYw3ZaVqb6itS4Yl4JTGUlR4A//JAtdUSqr4Ldhc P4SM4CKyxnkYkud0vW6LbWuTEEah584WEzso7f3DXrHiVpWybC1bTnFk0AScPl2LqR83 
xMIHi4lXjM5dSYtd2ucmOOBoDXasdyHQ91JPvRVzDd3/P8l0Nr8weccCFuXdVZk8r3nQ ieZQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1749894856; x=1750499656; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=5S6RWHkm8g4r7/EkPnCUIyR/CtOndR5cnePpxZF3FoU=; b=YLrxyH1mp9PuXT3v/ggrS0pEwnbIrvtc50FGLBkwWOr3lUhob67fngzWWxvEGdwCxA J7yvPi7UlGFF/VxwNbdrU8uMqwCVHY+Oj6sHvrb7wG9BCYkjSccZP/alsO9yCzK326Qg nN0C8bSSX3yONJP2N6viV+nOFiCshthYOFzRoTX8lY5KAkwB1+8DcuIosMGysyrZIo2n EvcfSQvPuaZBRHAvUJ4lSyiMgVft1z40lyxHaAxmdts/B3kBANcmC45FEc/qX1TOgdXv Znyv2DoinmUmo1KBxxtcix5ySrJtFCMr3sWbfRV/ozSfDUQJ/EVblBpqCEsOAZK5O/6Z 1TcA== X-Forwarded-Encrypted: i=1; AJvYcCWFyybMbAS6+bIfOpFk+si/b1ZDvYiftc4iteKl9KYIImgQ8n7u9DE7Sc5lLgU3y7oHnUAvocNCvcr1nkE=@vger.kernel.org X-Gm-Message-State: AOJu0YwADmDWQYWEh7nLLZJAacgEvOMhc/gcxfsODvvPMTGz7G+MLbHf av248IkK2Iu5Z7VMba0NRWT7NTbw0QpAujXP0Xr7MGYavPBgWikfAnxG X-Gm-Gg: ASbGnctfeZdMwfgzeBo+k4yy6UxozvRl7Dun9eJUruqTguCeM0IeCMvYDAuIkDVnEq8 uKe4NBVZ4RB6GEE2StUr5AgFFiDdjGvzgjSSCbhabGAb+BZQh7x4x5Gq+hN57V4J9sE/NZ48S+0 Jkv29vx8YTvysaUbyjo0DLnuuNuX0lQ+bVbR3Z/ukPv4NTlCgeOQ6J0tdJgZd/96aCxgOJLd5xc tNsRFAcoe0N8iDZ2wn2UkdTgUCBzj4SH/JBSdqgn+hQ3aXx7MKhNBbBFaCtfN4bO3hzqgMgCTrS U1CEJq47lbHDLCqFAaaem9pRVjGI9HlUS6UrNEZFtPQR0bDXIq+y7f/BU3gKeYdxk7igaE8Sv1j eiCf1z2qJrgEHC5+2tRzGSJ++1LJ0BN7hmZVWoMjO9io= X-Google-Smtp-Source: AGHT+IERW2dM2HCSqWNUXKt0pJsjtju7uQZsZMFyYm9x1Il+B6p60Uts7Er67qVj3AGDBgeCxt1UXQ== X-Received: by 2002:a5d:5f8f:0:b0:3a4:f439:e715 with SMTP id ffacd0b85a97d-3a572398eb0mr2599419f8f.9.1749894856099; Sat, 14 Jun 2025 02:54:16 -0700 (PDT) Received: from snowdrop.snailnet.com (82-69-66-36.dsl.in-addr.zen.co.uk. 
[82.69.66.36]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-3a568b19b32sm4869444f8f.67.2025.06.14.02.54.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 14 Jun 2025 02:54:15 -0700 (PDT) From: David Laight To: Andrew Morton , linux-kernel@vger.kernel.org Cc: David Laight , u.kleine-koenig@baylibre.com, Nicolas Pitre , Oleg Nesterov , Peter Zijlstra , Biju Das Subject: [PATCH v3 next 07/10] lib: mul_u64_u64_div_u64() optimise multiply on 32bit x86 Date: Sat, 14 Jun 2025 10:53:43 +0100 Message-Id: <20250614095346.69130-8-david.laight.linux@gmail.com> X-Mailer: git-send-email 2.39.5 In-Reply-To: <20250614095346.69130-1-david.laight.linux@gmail.com> References: <20250614095346.69130-1-david.laight.linux@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" gcc generates horrid code for both ((u64)u32_a * u32_b) and (u64_a + u32_b). As well as the extra instructions it can generate a lot of spills to stack (including spills of constant zeros and even multiplies by constant zero). mul_u32_u32() already exists to optimise the multiply. Add a similar add_u64_u32() for the addition. Disable both for clang - it generates better code without them. Use mul_u32_u32() and add_u64_u32() in the 64x64 =3D> 128 multiply in mul_u64_add_u64_div_u64(). Tested by forcing the amd64 build of test_mul_u64_u64_div_u64.ko to use the 32bit asm code. Signed-off-by: David Laight --- New patch for v3. 
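To make the 64x64 => 128 multiply mentioned above easier to follow, here is a user-space sketch of the same sequence of 32x32 => 64 multiply-and-add steps. It is an illustration under assumptions, not the kernel code: mul_u32_u32() and add_u64_u32() are plain C stand-ins for the helpers (no inline asm), and struct u128_parts, mul_u64_add_u64_128() and mul128_matches_native() are hypothetical names. The cross-check assumes a compiler with unsigned __int128.

```c
#include <assert.h>
#include <stdint.h>

/* Plain C stand-ins for the kernel helpers (assumption: no asm needed here) */
static inline uint64_t mul_u32_u32(uint32_t a, uint32_t b)
{
	return (uint64_t)a * b;
}

static inline uint64_t add_u64_u32(uint64_t a, uint32_t b)
{
	return a + b;
}

/* a * b + c; cannot overflow since (2^32-1)^2 + (2^32-1) < 2^64 */
static inline uint64_t mul_add(uint32_t a, uint32_t b, uint32_t c)
{
	return add_u64_u32(mul_u32_u32(a, b), c);
}

/* Hypothetical container for the two halves of the 128-bit result */
struct u128_parts { uint64_t hi, lo; };

/* Hypothetical name: returns a * b + c as two 64-bit halves */
static struct u128_parts mul_u64_add_u64_128(uint64_t a, uint64_t b, uint64_t c)
{
	uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
	uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);
	uint64_t x, y, z;
	struct u128_parts r;

	/* (2^32-1)^2 + 2*(2^32-1) == 2^64 - 1, so two u32 fit on top of a product */
	x = mul_add(a_lo, b_lo, (uint32_t)c);         /* bits 0..63, plus low c */
	y = mul_add(a_lo, b_hi, (uint32_t)(c >> 32)); /* bits 32..95, plus high c */
	y = add_u64_u32(y, (uint32_t)(x >> 32));      /* carry in from x */
	z = mul_add(a_hi, b_hi, (uint32_t)(y >> 32)); /* bits 64..127 */
	y = mul_add(a_hi, b_lo, (uint32_t)y);         /* second middle product */
	z = add_u64_u32(z, (uint32_t)(y >> 32));      /* carry in from y */
	x = (y << 32) + (uint32_t)x;

	r.lo = x;
	r.hi = z;
	return r;
}

/* Cross-check against the compiler's native 128-bit arithmetic */
static int mul128_matches_native(uint64_t a, uint64_t b, uint64_t c)
{
	unsigned __int128 p = (unsigned __int128)a * b + c;
	struct u128_parts r = mul_u64_add_u64_128(a, b, c);

	return r.lo == (uint64_t)p && r.hi == (uint64_t)(p >> 64);
}
```

The all-ones case is a useful edge: (2^64-1)^2 + (2^64-1) is (2^64-1) * 2^64, so the high half should be all ones and the low half zero. That exercises every carry in the sequence.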
arch/x86/include/asm/div64.h | 14 ++++++++++++++ include/linux/math64.h | 11 +++++++++++ lib/math/div64.c | 18 ++++++++++++------ 3 files changed, 37 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/div64.h b/arch/x86/include/asm/div64.h index 7a0a916a2d7d..4a4c29e8602d 100644 --- a/arch/x86/include/asm/div64.h +++ b/arch/x86/include/asm/div64.h @@ -60,6 +60,7 @@ static inline u64 div_u64_rem(u64 dividend, u32 divisor, = u32 *remainder) } #define div_u64_rem div_u64_rem =20 +#ifndef __clang__ static inline u64 mul_u32_u32(u32 a, u32 b) { u32 high, low; @@ -71,6 +72,19 @@ static inline u64 mul_u32_u32(u32 a, u32 b) } #define mul_u32_u32 mul_u32_u32 =20 +static inline u64 add_u64_u32(u64 a, u32 b) +{ + u32 high =3D a >> 32, low =3D a; + + asm ("addl %[b], %[low]; adcl $0, %[high]" + : [low] "+r" (low), [high] "+r" (high) + : [b] "rm" (b) ); + + return low | (u64)high << 32; +} +#define add_u64_u32 add_u64_u32 +#endif + /* * __div64_32() is never called on x86, so prevent the * generic definition from getting built. diff --git a/include/linux/math64.h b/include/linux/math64.h index e1c2e3642cec..5e497836e975 100644 --- a/include/linux/math64.h +++ b/include/linux/math64.h @@ -158,6 +158,17 @@ static inline u64 mul_u32_u32(u32 a, u32 b) } #endif =20 +#ifndef add_u64_u32 +/* + * Many a GCC version also messes this up. + * Zero extending b and then spilling everything to stack. 
+ */ +static inline u64 add_u64_u32(u64 a, u32 b) +{ + return a + b; +} +#endif + #if defined(CONFIG_ARCH_SUPPORTS_INT128) && defined(__SIZEOF_INT128__) =20 #ifndef mul_u64_u32_shr diff --git a/lib/math/div64.c b/lib/math/div64.c index 22433e5565c4..2ac7e25039a1 100644 --- a/lib/math/div64.c +++ b/lib/math/div64.c @@ -187,6 +187,12 @@ EXPORT_SYMBOL(iter_div_u64_rem); #endif =20 #if !defined(mul_u64_add_u64_div_u64) || defined(test_mul_u64_add_u64_div_= u64) + +static u64 mul_add(u32 a, u32 b, u32 c) +{ + return add_u64_u32(mul_u32_u32(a, b), c); +} + u64 mul_u64_add_u64_div_u64(u64 a, u64 b, u64 c, u64 d) { if (WARN_ONCE(!d, "%s: division of (%#llx * %#llx + %#llx) by zero, retur= ning 0", @@ -211,12 +217,12 @@ u64 mul_u64_add_u64_div_u64(u64 a, u64 b, u64 c, u64 = d) u64 x, y, z; =20 /* Since (x-1)(x-1) + 2(x-1) =3D=3D x.x - 1 two u32 can be added to a u64= */ - x =3D (u64)a_lo * b_lo + (u32)c; - y =3D (u64)a_lo * b_hi + (u32)(c >> 32); - y +=3D (u32)(x >> 32); - z =3D (u64)a_hi * b_hi + (u32)(y >> 32); - y =3D (u64)a_hi * b_lo + (u32)y; - z +=3D (u32)(y >> 32); + x =3D mul_add(a_lo, b_lo, c); + y =3D mul_add(a_lo, b_hi, c >> 32); + y =3D add_u64_u32(y, x >> 32); + z =3D mul_add(a_hi, b_hi, y >> 32); + y =3D mul_add(a_hi, b_lo, y); + z =3D add_u64_u32(z, y >> 32); x =3D (y << 32) + (u32)x; =20 u64 n_lo =3D x, n_hi =3D z; --=20 2.39.5 From nobody Fri Oct 10 09:15:02 2025 Received: from mail-wm1-f44.google.com (mail-wm1-f44.google.com [209.85.128.44]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8F4301E411C for ; Sat, 14 Jun 2025 09:54:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.44 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749894860; cv=none; 
b=fxQYy9xEAlXKuh5h/O/VwUiLy6KnLjllALlUyrk6bis6DyPgE1LTDtl+I8OBeBonjnpJ10AsG7DOeYWsNj6g7dn2HC59L94KOYJJ+ByL+0W/4RyqirBIHTUU/Zrd1l1b/aNrzRrj+bM/qDo9RQzC38nG2gQAwmnZVMPqasWylN0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749894860; c=relaxed/simple; bh=f6VoYaziB6JRa0uECDBQhDG51vS3VudPWkMC9wC2bFg=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=JYtfOgB+Wb85O396WzI7X0QEzaO0vF02bUYU97BhBaVlBMdh4jxJbiPMg7JgGPWVgxll/paaP4RrsvNKHWKSY560wGyTE1/Gy+YepxQbXKvjJzOYx16cdzfaDtEvPaFaT5jKAl3BRWpG5Dm7vtsua5L+7f6hoGlsOJeRrKWs5x8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=JjvD9Jwi; arc=none smtp.client-ip=209.85.128.44 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="JjvD9Jwi" Received: by mail-wm1-f44.google.com with SMTP id 5b1f17b1804b1-451d3f72391so35717205e9.3 for ; Sat, 14 Jun 2025 02:54:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1749894857; x=1750499657; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=IsjJMTsptJdgGk7nbEvhtT2Jyi35Ikr94jQB9Jmrevg=; b=JjvD9JwilNhEtHFP3Ry5xHFCM7h0OxPSFjLbrEj3auFk7SSN7VRnQe0pYbapr1uFsY RcpCOLsnKWUA+w8Baeu1xmWL489LQYAtGwk95KTTCwN8H4I6U40NkeaENm2CLQbqiYg3 SFZrKWX6Sn8iWiY5YO8hr5mA/X7jtYXlReksnKRhCiYBbjxs2X7yAK+eFVqmQ57q4tY1 p4mtW9RjkiT5Nco+cAzY8p95filpT21piqoLvhkCOh9+PPSKcfp1eiv7i7i8+1vcxG03 EdW5/JXb5PLaS8dbSoVtgepVg7HYpo2SGat1lJA8cJRIhbvzujJ5Vcf3sOrOTdz5ZbVU 1zQg== 
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1749894857; x=1750499657; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=IsjJMTsptJdgGk7nbEvhtT2Jyi35Ikr94jQB9Jmrevg=; b=rWTt9nFzMtAX9qVzZQ4uDWMjpbYFvTgiVHFdLt3pud1C+/y9XIFcsQkBELqD+A4H3b Dv2eVn6ZTH71UVJMzdq5a7LV85PHCalF7Sv9FYRAA6FR+W/TWiJFDOJXbPn+44SKU8Qd PZxEHM2IzLH3Q/+hHM7P/XSpPwCBJUjJ5gVs1Wv39lpN+IYj6WefMaQJTddz6pWAcoQi 7kmArc//7XbQrl1OYTxdzbRNK5HB0HQ9cZGwOp0a9JAhDag3mGBPef24DjWGski7qCLS NvuIr8Npv/UYjmKUtWLEy4p2U+D5M1E1MXxlm+4g+62f8aDCFLVl1BifZgbEIrGEZD3E evUA== X-Forwarded-Encrypted: i=1; AJvYcCV+/inT1UTTKZPv63wIXk1HTwsN+jnMcLXX73D1fsJ5MWFh+/e595wyyviznTJhVizktiOwxDJK2CCKjm8=@vger.kernel.org X-Gm-Message-State: AOJu0Yxh2br2K4svil/Kq/OQO08Gkgn1tIMpPiq5wuLL2CorPcukldCP zFuOe2lKXvBsudD0wjYMsCwN8gQ9UmisC4oDOE++Nb12NrT0GucjdNz9 X-Gm-Gg: ASbGnctkuYp8d8icd6uFvdbFZu9J9hY08v8BMME33LMxaCL/XYFUuSw1gV+4sAv7BtY TRukwpHWkDHGjLKwisfmfjI2+Zds/S56iERXcxj65fcZC8oJREGd6PYEGhJnSMQhvKEHq04gsNc I8w8l5cp/LcajUUmxE51gbkbay1ot/BdcahY582RF24Et77BcWmXEOlwuEU7zOTyh+/M0qNfIbl 0gRLHuRahklSCa++1jOQzALQYiHAf7NNCBBcSImuKv8Rn78wruTzwzn4VenOnvkjUNISNydKXfm 5DWfJ8XheXPcGaeqjZ1z8LCJEslq85LWksgQW9afhPD+Z/CvyFztZO8o2ikLGl7I1Ae23oLE8Qf uecXgCZX5i/gjenTbR8V4ON2LgrXsM5APFceCj8OC7Jk= X-Google-Smtp-Source: AGHT+IERN1ooEDtZI6VI3kgNKHhz6VB6qACq35VnuE44l1ar0jAEXBqaRGMQvyFsmn+R4Vr0Ne5rgw== X-Received: by 2002:a05:600c:1c1b:b0:44a:b478:1387 with SMTP id 5b1f17b1804b1-4533cab8332mr26434225e9.17.1749894856577; Sat, 14 Jun 2025 02:54:16 -0700 (PDT) Received: from snowdrop.snailnet.com (82-69-66-36.dsl.in-addr.zen.co.uk. 
[82.69.66.36]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-3a568b19b32sm4869444f8f.67.2025.06.14.02.54.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 14 Jun 2025 02:54:16 -0700 (PDT) From: David Laight To: Andrew Morton , linux-kernel@vger.kernel.org Cc: David Laight , u.kleine-koenig@baylibre.com, Nicolas Pitre , Oleg Nesterov , Peter Zijlstra , Biju Das Subject: [PATCH v3 next 08/10] lib: mul_u64_u64_div_u64() Separate multiply to a helper for clarity Date: Sat, 14 Jun 2025 10:53:44 +0100 Message-Id: <20250614095346.69130-9-david.laight.linux@gmail.com> X-Mailer: git-send-email 2.39.5 In-Reply-To: <20250614095346.69130-1-david.laight.linux@gmail.com> References: <20250614095346.69130-1-david.laight.linux@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Move the 64x64 =3D> 128 multiply into a static inline helper function for code clarity. No need for the a/b_hi/lo variables, the implicit casts on the function calls do the work for us. Should have minimal effect on the generated code. Signed-off-by: David Laight Reviewed-by: Nicolas Pitre --- new patch for v3. lib/math/div64.c | 54 +++++++++++++++++++++++++++--------------------- 1 file changed, 30 insertions(+), 24 deletions(-) diff --git a/lib/math/div64.c b/lib/math/div64.c index 2ac7e25039a1..fb77fd9d999d 100644 --- a/lib/math/div64.c +++ b/lib/math/div64.c @@ -193,42 +193,48 @@ static u64 mul_add(u32 a, u32 b, u32 c) return add_u64_u32(mul_u32_u32(a, b), c); } =20 -u64 mul_u64_add_u64_div_u64(u64 a, u64 b, u64 c, u64 d) -{ - if (WARN_ONCE(!d, "%s: division of (%#llx * %#llx + %#llx) by zero, retur= ning 0", - __func__, a, b, c)) { - /* - * Return 0 (rather than ~(u64)0) because it is less likely to - * have unexpected side effects. 
- */ - return 0; - } - #if defined(__SIZEOF_INT128__) && !defined(test_mul_u64_add_u64_div_u64) - +static inline u64 mul_u64_u64_add_u64(u64 *p_lo, u64 a, u64 b, u64 c) +{ /* native 64x64=3D128 bits multiplication */ u128 prod =3D (u128)a * b + c; - u64 n_lo =3D prod, n_hi =3D prod >> 64; =20 -#else + *p_lo =3D prod; + return prod >> 64; +} =20 - /* perform a 64x64=3D128 bits multiplication manually */ - u32 a_lo =3D a, a_hi =3D a >> 32, b_lo =3D b, b_hi =3D b >> 32; +#else +static inline u64 mul_u64_u64_add_u64(u64 *p_lo, u64 a, u64 b, u64 c) +{ + /* perform a 64x64=3D128 bits multiplication in 32bit chunks */ u64 x, y, z; =20 /* Since (x-1)(x-1) + 2(x-1) =3D=3D x.x - 1 two u32 can be added to a u64= */ - x =3D mul_add(a_lo, b_lo, c); - y =3D mul_add(a_lo, b_hi, c >> 32); + x =3D mul_add(a, b, c); + y =3D mul_add(a, b >> 32, c >> 32); y =3D add_u64_u32(y, x >> 32); - z =3D mul_add(a_hi, b_hi, y >> 32); - y =3D mul_add(a_hi, b_lo, y); - z =3D add_u64_u32(z, y >> 32); - x =3D (y << 32) + (u32)x; - - u64 n_lo =3D x, n_hi =3D z; + z =3D mul_add(a >> 32, b >> 32, y >> 32); + y =3D mul_add(a >> 32, b, y); + *p_lo =3D (y << 32) + (u32)x; + return add_u64_u32(z, y >> 32); +} =20 #endif =20 +u64 mul_u64_add_u64_div_u64(u64 a, u64 b, u64 c, u64 d) +{ + u64 n_lo, n_hi; + + if (WARN_ONCE(!d, "%s: division of (%llx * %llx + %llx) by zero, returnin= g 0", + __func__, a, b, c )) { + /* + * Return 0 (rather than ~(u64)0) because it is less likely to + * have unexpected side effects. 
+ */ + return 0; + } + + n_hi =3D mul_u64_u64_add_u64(&n_lo, a, b, c); if (!n_hi) return div64_u64(n_lo, d); =20 --=20 2.39.5 From nobody Fri Oct 10 09:15:02 2025 Received: from mail-wm1-f42.google.com (mail-wm1-f42.google.com [209.85.128.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E2349257458 for ; Sat, 14 Jun 2025 09:54:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.42 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749894862; cv=none; b=V8w48A4ET+tGNcW+XnbuzzD728TM7KTY3zLmlr+/mPu7qkmg2SaqhGHoFg2v5Ier/l82IXNe7IHXBiHlR5Uc9jmMS150UIZtSRogJmXB2DR440N1T4qxAPaT5p10jzGtaVbm26X/pSRaYttFsKgt86TbbDLGdLE1b882Zwm+e7I= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749894862; c=relaxed/simple; bh=dG1ssyasJyjtU9vkTNfZ76hCjcwYZKPPRGfq5YfvYE8=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=jSes3gp2Q2It59Hs3blB+WdNTI09nW2dDJkqG76bpRaom8QEkXg5XuPJap+iWQUvuErAVP9W3e6aPXNvDygSW5zP6IpFJYhKUL1DGQfEQCffCIi4GdPelbjWtn+6MJnA4qSiCCr9ruYgiWezChYRdMNmKB8kFubCRz4snbm8Kh0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=j3pjECzx; arc=none smtp.client-ip=209.85.128.42 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="j3pjECzx" Received: by mail-wm1-f42.google.com with SMTP id 5b1f17b1804b1-451ebd3d149so19355805e9.2 for ; Sat, 14 Jun 2025 02:54:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=gmail.com; s=20230601; t=1749894857; x=1750499657; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=N6AwblgqjFfv250dWTS4cP7IPuV5Z5gc9JjcTsPRvzc=; b=j3pjECzxlyD9nvIYyqUjJHpL3t13+BHSy+Rj4cuvnpEBotGNdYZ0pBD/PpWIYlBz7o zqEjvB5cu7WBDpyfFDiqEJLvMNjWyNlROWo/ZoXjrLobd210m/xLEsXt5fG+F3zt1ZgP dponcy0RyOzQuQscCc/rwy7y0M+WGRKZaFsJnKlv+QSMnYNxJoscJQTUoLP0Iz79Hj0q XdispHWfxsjybsJtZvPpcX/0gA59xxkBdRONT2/kIoszyVL58w6/scfBfU1k51hj6brj +Y+R3wP4aUPeSGnxgU7FCUEaLGatwMyHCkAB3492a/BQ+bj9tzGxow59USeWSXKPNv7K Es2A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1749894857; x=1750499657; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=N6AwblgqjFfv250dWTS4cP7IPuV5Z5gc9JjcTsPRvzc=; b=Jbx06ucRkpF2p7pBneB4cLFJEhXDZA58X2KTF5lM1kTs/Rl5ml76tNReAbrCmPbYkn H7Bdnrbda4EabRxyXhtCa+NyWmDnOIAWg+DNEwl/T3E6YAxQ4Mzu1cFjuJl5sEriWAc7 kVqB0Eq939LjsWFm3rKaNQQ9rWhiu6RbmOilEd8ixmtQHoQFi47YIjhg4KW1Mmc4SizZ 9VCz6hmtYaCjcN+JASrIQHNjLqvecy0cuHFdyblTjYDKwmVRP3CgIW6y0qJSp8mQyTtM PtYN2P2o36J+AHshPV35Lrth9RV6rr8S8xsSzH4nc04666Rp4S2+02JelslprYkH8srk hqlg== X-Forwarded-Encrypted: i=1; AJvYcCXT1yje4fkPoMDppMvj4qLZ3ZA4xm6xY9/CikUTEZJHOnQjm/sUzP0Gxj8lQ3oagjC7LEqhe+64gWcji+o=@vger.kernel.org X-Gm-Message-State: AOJu0YxAk2kWd9aJ7ypdBI25HbhbqqwrmVzbul3ah3Yf7uUukgwgc+S3 MjRQRS0QyU3AbmixAUI9VtdxYYIjXOafNGG0rmwgLz+Dbrh8JMUtrHSx X-Gm-Gg: ASbGncuVfn6+alk7lFo0T8SfCExxLfXeTMI3WnjCnrBqBCyRxVNGmcZB2OWb5cU5F/4 /KhvSI/wUaFCWzCWqx1GLs4COVjyA78WiFL002lewI2IgTKqLYfqOscRzOkC0laOQ8k3nrffu+F WfvDaTDI91mZAUqDxnAv0rWOTqqrprOACfeB9Rzdp+qIIfTz80eFQXgXiE+ff2qgqX1F+NNzT9w B/6WNO4OzPxCKG/rdKFSL4zKGfwOEnt6QSyNfDTm7f11mAbFUvzTneTvyakXKx9DOcz8gPRFP9b V5BSAO8qXqmc7uBZ0vleguw2EEavk6pbsOblvEsTY7BUOmXqYHD717lEuOJiIx/bJa4v8ucmXxS 8DdYjdpBTXVaLhkLNt1CQvDZNV1gXGdFoiYB4rwZEQg4= 
X-Google-Smtp-Source: AGHT+IEcPZHfQsXkx6bHbhcM9iJSTKc3QJHEBv91uXTk4MUcoIJLLMnTTk/gpFeBWs1ggfq173oF0Q== X-Received: by 2002:a05:600c:3505:b0:43d:abd:ad1c with SMTP id 5b1f17b1804b1-4533cadf55emr25556665e9.6.1749894857099; Sat, 14 Jun 2025 02:54:17 -0700 (PDT) Received: from snowdrop.snailnet.com (82-69-66-36.dsl.in-addr.zen.co.uk. [82.69.66.36]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-3a568b19b32sm4869444f8f.67.2025.06.14.02.54.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 14 Jun 2025 02:54:16 -0700 (PDT) From: David Laight To: Andrew Morton , linux-kernel@vger.kernel.org Cc: David Laight , u.kleine-koenig@baylibre.com, Nicolas Pitre , Oleg Nesterov , Peter Zijlstra , Biju Das Subject: [PATCH v3 next 09/10] lib: mul_u64_u64_div_u64() Optimise the divide code Date: Sat, 14 Jun 2025 10:53:45 +0100 Message-Id: <20250614095346.69130-10-david.laight.linux@gmail.com> X-Mailer: git-send-email 2.39.5 In-Reply-To: <20250614095346.69130-1-david.laight.linux@gmail.com> References: <20250614095346.69130-1-david.laight.linux@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Replace the bit by bit algorithm with one that generates 16 bits per iteration on 32bit architectures and 32 bits on 64bit ones. On my zen 5 this reduces the time for the tests (using the generic code) from ~3350ns to ~1000ns. Running the 32bit algorithm on 64bit x86 takes ~1500ns. It'll be slightly slower on a real 32bit system, mostly due to register pressure. The savings for 32bit x86 are much higher (tested in userspace). The worst case (lots of bits in the quotient) drops from ~900 clocks to ~130 (pretty much independent of the arguments). Other 32bit architectures may see better savings. It is possible to optimise for divisors that span less than __LONG_WIDTH__/2 bits.
However I suspect they don't happen that often and it doesn't remove any slow cpu divide instructions which dominate the result. Signed-off-by: David Laight --- new patch for v3. lib/math/div64.c | 124 +++++++++++++++++++++++++++++++++-------------- 1 file changed, 87 insertions(+), 37 deletions(-) diff --git a/lib/math/div64.c b/lib/math/div64.c index fb77fd9d999d..bb318ff2ad87 100644 --- a/lib/math/div64.c +++ b/lib/math/div64.c @@ -221,11 +221,37 @@ static inline u64 mul_u64_u64_add_u64(u64 *p_lo, u64 = a, u64 b, u64 c) =20 #endif =20 +#ifndef BITS_PER_ITER +#define BITS_PER_ITER (__LONG_WIDTH__ >=3D 64 ? 32 : 16) +#endif + +#if BITS_PER_ITER =3D=3D 32 +#define mul_u64_long_add_u64(p_lo, a, b, c) mul_u64_u64_add_u64(p_lo, a, b= , c) +#define add_u64_long(a, b) ((a) + (b)) + +#else +static inline u32 mul_u64_long_add_u64(u64 *p_lo, u64 a, u32 b, u64 c) +{ + u64 n_lo =3D mul_add(a, b, c); + u64 n_med =3D mul_add(a >> 32, b, c >> 32); + + n_med =3D add_u64_u32(n_med, n_lo >> 32); + *p_lo =3D n_med << 32 | (u32)n_lo; + return n_med >> 32; +} + +#define add_u64_long(a, b) add_u64_u32(a, b) + +#endif + u64 mul_u64_add_u64_div_u64(u64 a, u64 b, u64 c, u64 d) { - u64 n_lo, n_hi; + unsigned long d_msig, q_digit; + unsigned int reps, d_z_hi; + u64 quotient, n_lo, n_hi; + u32 overflow; =20 if (WARN_ONCE(!d, "%s: division of (%#llx * %#llx + %#llx) by zero, retur= ning 0", __func__, a, b, c )) { /* * Return 0 (rather than ~(u64)0) because it is less likely to @@ -243,46 +269,70 @@ u64 mul_u64_add_u64_div_u64(u64 a, u64 b, u64 c, u64 = d) __func__, a, b, c, n_hi, n_lo, d)) return ~(u64)0; =20 - int shift =3D __builtin_ctzll(d); - - /* try reducing the fraction in case the dividend becomes <=3D 64 bits */ - if ((n_hi >> shift) =3D=3D 0) { - u64 n =3D shift ? 
(n_lo >> shift) | (n_hi << (64 - shift)) : n_lo; - - return div64_u64(n, d >> shift); - /* - * The remainder value if needed would be: - * res =3D div64_u64_rem(n, d >> shift, &rem); - * rem =3D (rem << shift) + (n_lo - (n << shift)); - */ + /* Left align the divisor, shifting the dividend to match */ + d_z_hi =3D __builtin_clzll(d); + if (d_z_hi) { + d <<=3D d_z_hi; + n_hi =3D n_hi << d_z_hi | n_lo >> (64 - d_z_hi); + n_lo <<=3D d_z_hi; } =20 - /* Do the full 128 by 64 bits division */ - - shift =3D __builtin_clzll(d); - d <<=3D shift; - - int p =3D 64 + shift; - u64 res =3D 0; - bool carry; + reps =3D 64 / BITS_PER_ITER; + /* Optimise loop count for small dividends */ + if (!(u32)(n_hi >> 32)) { + reps -=3D 32 / BITS_PER_ITER; + n_hi =3D n_hi << 32 | n_lo >> 32; + n_lo <<=3D 32; + } +#if BITS_PER_ITER =3D=3D 16 + if (!(u32)(n_hi >> 48)) { + reps--; + n_hi =3D add_u64_u32(n_hi << 16, n_lo >> 48); + n_lo <<=3D 16; + } +#endif =20 - do { - carry =3D n_hi >> 63; - shift =3D carry ? 1 : __builtin_clzll(n_hi); - if (p < shift) - break; - p -=3D shift; - n_hi <<=3D shift; - n_hi |=3D n_lo >> (64 - shift); - n_lo <<=3D shift; - if (carry || (n_hi >=3D d)) { - n_hi -=3D d; - res |=3D 1ULL << p; + /* Invert the dividend so we can use add instead of subtract. */ + n_lo =3D ~n_lo; + n_hi =3D ~n_hi; + + /* + * Get the most significant BITS_PER_ITER bits of the divisor. + * This is used to get a low 'guestimate' of the quotient digit. + */ + d_msig =3D (d >> (64 - BITS_PER_ITER)) + 1; + + /* + * Now do a 'long division' with BITS_PER_ITER bit 'digits'. + * The 'guess' quotient digit can be low and BITS_PER_ITER+1 bits. + * The worst case is dividing ~0 by 0x8000 which requires two subtracts. 
+ */ + quotient =3D 0; + while (reps--) { + q_digit =3D (unsigned long)(~n_hi >> (64 - 2 * BITS_PER_ITER)) / d_msig; + /* Shift 'n' left to align with the product q_digit * d */ + overflow =3D n_hi >> (64 - BITS_PER_ITER); + n_hi =3D add_u64_u32(n_hi << BITS_PER_ITER, n_lo >> (64 - BITS_PER_ITER)= ); + n_lo <<=3D BITS_PER_ITER; + /* Add product to negated divisor */ + overflow +=3D mul_u64_long_add_u64(&n_hi, d, q_digit, n_hi); + /* Adjust for the q_digit 'guestimate' being low */ + while (overflow < 0xffffffff >> (32 - BITS_PER_ITER)) { + q_digit++; + n_hi +=3D d; + overflow +=3D n_hi < d; } - } while (n_hi); - /* The remainder value if needed would be n_hi << p */ + quotient =3D add_u64_long(quotient << BITS_PER_ITER, q_digit); + } =20 - return res; + /* + * The above only ensures the remainder doesn't overflow, + * it can still be possible to add (aka subtract) another copy + * of the divisor. + */ + if ((n_hi + d) > n_hi) + quotient++; + return quotient; } #if !defined(test_mul_u64_add_u64_div_u64) EXPORT_SYMBOL(mul_u64_add_u64_div_u64); --=20 2.39.5 From nobody Fri Oct 10 09:15:02 2025 Received: from mail-wr1-f46.google.com (mail-wr1-f46.google.com [209.85.221.46]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6DCC7256C81 for ; Sat, 14 Jun 2025 09:54:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.46 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749894861; cv=none; b=Y9R/LVT9CAJn41GFNwui6wabHPr/3FU8cfnXs3AaoHai3TA3K4vlsKEF2ll8mnV0jG/O7aWUB9ZlIjmK8bD01jTAjwKsGqb7t1+Ey+k23PBqcKbJU6MWJ33wWpCLuT97mHpi9wTl2wSG8bvP0o82wKqjEhUD74Mfio+Gw4JypTY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749894861; c=relaxed/simple; bh=sw/StANdfeuHD1lCmTFwq6z44Xlo57UZpd7FUXDdO8U=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; 
b=RithnU9UB/8+GvJ4pgqu0aZiOpMNsGTCiqsm8MnzIQW2FVhkidhm5O3ytCWZCmyHoYaq7mc+UYB9pIrqVsuXUHXf/HD+4LO+aGzdPVSmmE0O3/vOGpc9NpCQ73gHawlvFBDy/Kq0HJPFGnBuCvFiaaaxAQAk20FozVdj5wXfxYc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=hIrINYXd; arc=none smtp.client-ip=209.85.221.46 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="hIrINYXd" Received: by mail-wr1-f46.google.com with SMTP id ffacd0b85a97d-3a5257748e1so2130152f8f.2 for ; Sat, 14 Jun 2025 02:54:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1749894858; x=1750499658; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=2hU2TAGn71FWrOpga/52dRwN9mYl6H2hUFYejrisY8s=; b=hIrINYXd3SFFZXSou03VixHxHTRawvGGPHqn5RUgHwuFXOeGBytd2UaRnWir3ShgkK qa1TiEJUwV5SLOPNProN3J5EjQxE+9FR0qd5Yn7QqZZy4mv5FBFAKYu4DnCTcUIZOVEm k7q8531/LOQET+8D4LE9816Wy+0NrtbVQy152pmcLx/kd0Re4WZEXt7WQPQegB43JAWJ 0oKNi8xhRLPY6bXaTDpJqlRIsTbZWgqGhVStd95O5Yxk4Db2mGWUbur2iKjEcP3QlYi5 hfpB8S9gPqT0r4ksQgE4T6pPCRUj+SH5+Q+LJBfOnIYqhm6Xj49P4DyssiWZUtMsPOgp bGvw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1749894858; x=1750499658; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=2hU2TAGn71FWrOpga/52dRwN9mYl6H2hUFYejrisY8s=; b=FdPw44uWbozJL0bqvuq24XNU7oWc9K/+ZQr1wba2RL97Ef3vDboJp3KolZeckuKS9z 
XxWCuQxIzi4s+OGumdMJPQWYLQ4JbF6Tpgp0NzjpXy5Q9stuE0PKCZPqUNnU32p2sSPH /8+7FgiThA/CMIaBZbzeEzswXGkLGqzP7ZFpQyit6uvHhNYwDBFNUH/VJR1a9Xl1w3zw 4iiZd6JKYyKYzlaQKN2Q9pCEbgvIKGN1U8qJ0HTWW00JG7UFWYBKvUBVtyHomr1MmRmW b6dsSqkaV5+/O53prZn1khs0Rc99hg+mDFzEGBtK7ezGCeeoGlGOMXTHdnHuRjlZTl2y 7OzA== X-Forwarded-Encrypted: i=1; AJvYcCWYakRzykyV2nog+fKgQSyTtpHHbiyK+g0M294TTGMlenhDu/c4DriQh9TsatMdp7Xov6QkgkJ+E8r2d1g=@vger.kernel.org X-Gm-Message-State: AOJu0Yxd2soVSwNLjo4UPQ2eWtJ8NUH+h9/qihO+f5hlFb7SB1ZMiwLM GYwI4nWG/g/C5LZmYIiNHjXnSqvnB9EtfAssGJbgaFhQ+7EePIhAE1uATiczlQ== X-Gm-Gg: ASbGnctKVnBaKZvxil7kWLqBysSpYmsZqLy2hpGvfwjOEm2EZPh8Mhlth99RIG/xNUO aFn+/VRB3GZcWn1/sV7/kvZgF8QD5AzAhSFBZNhKXhHA7ES15kgIgDyXWu60ni/rfvEDnCl2Tp3 RnrKIqTmNiSDz1scEiqyga2USQnvW3PRxHk9UqKV1+WfrqZfPa1LKz8Dzo+KoQaGOFiuFOkwPmg tYtLaJ2xhVTiuOg0cIoSqy95ZBG+9O/kp8LVdPS1+hyhoXa4lhbpxvpfkPk8ipkeFWDtJVj+PpX j5JFO2V01i4DMNN/IpU3LDXN31X204+P0VckdUwKLNU9a9kh2m55bAxzqggd+N3knAcgUZ4wFn5 MPkHmnwf4vR06Ac7tbhg8gXrBqFz0Hub6AvGMHZozlCQ= X-Google-Smtp-Source: AGHT+IFNq8WKXC8iWJP1ApWys1+VExNhCOBY+Wn3G77QMl9MpTqIxPJ8mysvwAhoiiZ+XZuLFhDiiA== X-Received: by 2002:a5d:5e8b:0:b0:3a4:d0fe:429f with SMTP id ffacd0b85a97d-3a5723717f5mr2331756f8f.14.1749894857647; Sat, 14 Jun 2025 02:54:17 -0700 (PDT) Received: from snowdrop.snailnet.com (82-69-66-36.dsl.in-addr.zen.co.uk. 
[82.69.66.36]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-3a568b19b32sm4869444f8f.67.2025.06.14.02.54.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 14 Jun 2025 02:54:17 -0700 (PDT) From: David Laight To: Andrew Morton , linux-kernel@vger.kernel.org Cc: David Laight , u.kleine-koenig@baylibre.com, Nicolas Pitre , Oleg Nesterov , Peter Zijlstra , Biju Das Subject: [PATCH v3 next 10/10] lib: test_mul_u64_u64_div_u64: Test the 32bit code on 64bit Date: Sat, 14 Jun 2025 10:53:46 +0100 Message-Id: <20250614095346.69130-11-david.laight.linux@gmail.com> X-Mailer: git-send-email 2.39.5 In-Reply-To: <20250614095346.69130-1-david.laight.linux@gmail.com> References: <20250614095346.69130-1-david.laight.linux@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" There are slight differences in the mul_u64_add_u64_div_u64() code between 32bit and 64bit systems. Compile and test the 32bit version on 64bit hosts for better test coverage. Signed-off-by: David Laight --- new patch for v3. 
lib/math/test_mul_u64_u64_div_u64.c | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/lib/math/test_mul_u64_u64_div_u64.c b/lib/math/test_mul_u64_u6= 4_div_u64.c index f0134f25cb0d..ff5df742ec8a 100644 --- a/lib/math/test_mul_u64_u64_div_u64.c +++ b/lib/math/test_mul_u64_u64_div_u64.c @@ -74,6 +74,10 @@ done */ =20 static u64 test_mul_u64_add_u64_div_u64(u64 a, u64 b, u64 c, u64 d); +#if __LONG_WIDTH__ >=3D 64 +#define TEST_32BIT_DIV +static u64 test_mul_u64_add_u64_div_u64_32bit(u64 a, u64 b, u64 c, u64 d); +#endif =20 static int __init test_run(unsigned int fn_no, const char *fn_name) { @@ -100,6 +104,12 @@ static int __init test_run(unsigned int fn_no, const c= har *fn_name) result =3D test_mul_u64_add_u64_div_u64(a, b, 0, d); result_up =3D test_mul_u64_add_u64_div_u64(a, b, d - 1, d); break; +#ifdef TEST_32BIT_DIV + case 2: + result =3D test_mul_u64_add_u64_div_u64_32bit(a, b, 0, d); + result_up =3D test_mul_u64_add_u64_div_u64_32bit(a, b, d - 1, d); + break; +#endif } =20 tests +=3D 2; @@ -131,6 +141,10 @@ static int __init test_init(void) return -EINVAL; if (test_run(1, "test_mul_u64_u64_div_u64")) return -EINVAL; +#ifdef TEST_32BIT_DIV + if (test_run(2, "test_mul_u64_u64_div_u64_32bit")) + return -EINVAL; +#endif return 0; } =20 @@ -149,6 +163,21 @@ static void __exit test_exit(void) =20 #include "div64.c" =20 +#ifdef TEST_32BIT_DIV +/* Recompile the generic code for 32bit long */ +#undef test_mul_u64_add_u64_div_u64 +#define test_mul_u64_add_u64_div_u64 test_mul_u64_add_u64_div_u64_32bit +#undef BITS_PER_ITER +#define BITS_PER_ITER 16 + +#define mul_add mul_add_32bit +#define mul_u64_u64_add_u64 mul_u64_u64_add_u64_32bit +#undef mul_u64_long_add_u64 +#undef add_u64_long + +#include "div64.c" +#endif + module_init(test_init); module_exit(test_exit); =20 --=20 2.39.5
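For anyone wanting to check the 32bit-chunk multiply from patch 08/10 outside the kernel, here is a minimal userspace sketch. It is not the kernel code: it assumes <stdint.h> typedefs in place of the kernel's u32/u64, uses plain C arithmetic rather than the x86 asm helpers from patch 07/10, and the names mul_u64_u64_add_u64_chunks() and self_test() are made up for illustration. It follows the same sequence of partial products and relies on the same implicit u64 to u32 truncation at the call sites.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

/*
 * a * b + c cannot overflow 64 bits:
 * (2^32 - 1)^2 + (2^32 - 1) == 2^64 - 2^32.
 * That still leaves room to add one more u32 to the result.
 */
static u64 mul_add(u32 a, u32 b, u32 c)
{
	return (u64)a * b + c;
}

/*
 * Hypothetical userspace stand-in for the non-int128 path.
 * Stores the low 64 bits of a * b + c in *p_lo, returns the high 64 bits.
 * Passing a u64 to a u32 parameter keeps only the low 32 bits.
 */
static u64 mul_u64_u64_add_u64_chunks(u64 *p_lo, u64 a, u64 b, u64 c)
{
	u64 x, y, z;

	x = mul_add(a, b, c);			/* a_lo * b_lo + c_lo */
	y = mul_add(a, b >> 32, c >> 32);	/* a_lo * b_hi + c_hi */
	y += x >> 32;				/* cannot wrap, see mul_add() */
	z = mul_add(a >> 32, b >> 32, y >> 32);	/* a_hi * b_hi + carry */
	y = mul_add(a >> 32, b, y);		/* a_hi * b_lo + low half of y */
	*p_lo = (y << 32) + (u32)x;
	return z + (y >> 32);
}

/* Checks against values that can be worked out by hand. */
static void self_test(void)
{
	u64 lo, hi;

	/* (2^64 - 1)^2 + (2^64 - 1) == (2^64 - 1) * 2^64 */
	hi = mul_u64_u64_add_u64_chunks(&lo, ~(u64)0, ~(u64)0, ~(u64)0);
	assert(hi == ~(u64)0 && lo == 0);

	/* 2^32 * 2^32 == 2^64 */
	hi = mul_u64_u64_add_u64_chunks(&lo, (u64)1 << 32, (u64)1 << 32, 0);
	assert(hi == 1 && lo == 0);
}
```

Calling self_test() from a small main() exercises the all-ones case, which is the largest input the helper can be given. On a compiler that provides unsigned __int128, the same function can also be compared against (unsigned __int128)a * b + c over random inputs.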