From: Nicolas Pitre
To: Arnd Bergmann, Russell King
Cc: Nicolas Pitre, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 1/4] lib/math/test_div64: add some edge cases relevant to __div64_const32()
Date: Sun, 7 Jul 2024 21:27:14 -0400
Message-ID: <20240708012749.2098373-2-nico@fluxnic.net>
In-Reply-To: <20240708012749.2098373-1-nico@fluxnic.net>

From: Nicolas Pitre

Be sure to test the extreme cases with and without bias.

Signed-off-by: Nicolas Pitre
---
 lib/math/test_div64.c | 85 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 83 insertions(+), 2 deletions(-)

diff --git a/lib/math/test_div64.c b/lib/math/test_div64.c
index c15edd688d..3cd699b654 100644
--- a/lib/math/test_div64.c
+++ b/lib/math/test_div64.c
@@ -26,6 +26,9 @@ static const u64 test_div64_dividends[] = {
 	0x0072db27380dd689,
 	0x0842f488162e2284,
 	0xf66745411d8ab063,
+	0xfffffffffffffffb,
+	0xfffffffffffffffc,
+	0xffffffffffffffff,
 };
 #define SIZE_DIV64_DIVIDENDS ARRAY_SIZE(test_div64_dividends)
 
@@ -37,7 +40,10 @@ static const u64 test_div64_dividends[] = {
 #define TEST_DIV64_DIVISOR_5 0x0008a880
 #define TEST_DIV64_DIVISOR_6 0x003fd3ae
 #define TEST_DIV64_DIVISOR_7 0x0b658fac
-#define TEST_DIV64_DIVISOR_8 0xdc08b349
+#define TEST_DIV64_DIVISOR_8 0x80000001
+#define TEST_DIV64_DIVISOR_9 0xdc08b349
+#define TEST_DIV64_DIVISOR_A 0xfffffffe
+#define TEST_DIV64_DIVISOR_B 0xffffffff
 
 static const u32 test_div64_divisors[] = {
 	TEST_DIV64_DIVISOR_0,
@@ -49,13 +55,16 @@ static const u32 test_div64_divisors[] = {
 	TEST_DIV64_DIVISOR_6,
 	TEST_DIV64_DIVISOR_7,
 	TEST_DIV64_DIVISOR_8,
+	TEST_DIV64_DIVISOR_9,
+	TEST_DIV64_DIVISOR_A,
+	TEST_DIV64_DIVISOR_B,
 };
 #define SIZE_DIV64_DIVISORS ARRAY_SIZE(test_div64_divisors)
 
 static const struct {
 	u64 quotient;
 	u32 remainder;
-} test_div64_results[SIZE_DIV64_DIVISORS][SIZE_DIV64_DIVIDENDS] = {
+} test_div64_results[SIZE_DIV64_DIVIDENDS][SIZE_DIV64_DIVISORS] = {
 	{
 		{ 0x0000000013045e47, 0x00000001 },
 		{ 0x000000000161596c, 0x00000030 },
@@ -65,6 +74,9 @@ static const struct {
 		{ 0x00000000000013c4, 0x0004ce80 },
 		{ 0x00000000000002ae, 0x001e143c },
 		{ 0x000000000000000f, 0x0033e56c },
+		{ 0x0000000000000001, 0x2b27507f },
+		{ 0x0000000000000000, 0xab275080 },
+		{ 0x0000000000000000, 0xab275080 },
 		{ 0x0000000000000000, 0xab275080 },
 	}, {
 		{ 0x00000001c45c02d1, 0x00000000 },
@@ -75,7 +87,10 @@ static const struct {
 		{ 0x000000000001d637, 0x0004e5d9 },
 		{ 0x0000000000003fc9, 0x000713bb },
 		{ 0x0000000000000165, 0x029abe7d },
+		{ 0x000000000000001f, 0x673c193a },
 		{ 0x0000000000000012, 0x6e9f7e37 },
+		{ 0x000000000000000f, 0xe73c1977 },
+		{ 0x000000000000000f, 0xe73c1968 },
 	}, {
 		{ 0x000000197a3a0cf7, 0x00000002 },
 		{ 0x00000001d9632e5c, 0x00000021 },
@@ -85,7 +100,10 @@ static const struct {
 		{ 0x00000000001a7bb3, 0x00072331 },
 		{ 0x00000000000397ad, 0x0002c61b },
 		{ 0x000000000000141e, 0x06ea2e89 },
+		{ 0x00000000000001ca, 0x4c0a72e7 },
 		{ 0x000000000000010a, 0xab002ad7 },
+		{ 0x00000000000000e5, 0x4c0a767b },
+		{ 0x00000000000000e5, 0x4c0a7596 },
 	}, {
 		{ 0x0000017949e37538, 0x00000001 },
 		{ 0x0000001b62441f37, 0x00000055 },
@@ -95,7 +113,10 @@ static const struct {
 		{ 0x0000000001882ec6, 0x0005cbf9 },
 		{ 0x000000000035333b, 0x0017abdf },
 		{ 0x00000000000129f1, 0x0ab4520d },
+		{ 0x0000000000001a87, 0x18ff0472 },
 		{ 0x0000000000000f6e, 0x8ac0ce9b },
+		{ 0x0000000000000d43, 0x98ff397f },
+		{ 0x0000000000000d43, 0x98ff2c3c },
 	}, {
 		{ 0x000011f321a74e49, 0x00000006 },
 		{ 0x0000014d8481d211, 0x0000005b },
@@ -105,7 +126,10 @@ static const struct {
 		{ 0x0000000012a88828, 0x00036c97 },
 		{ 0x000000000287f16f, 0x002c2a25 },
 		{ 0x00000000000e2cc7, 0x02d581e3 },
+		{ 0x0000000000014318, 0x2ee07d7f },
 		{ 0x000000000000bbf4, 0x1ba08c03 },
+		{ 0x000000000000a18c, 0x2ee303af },
+		{ 0x000000000000a18c, 0x2ee26223 },
 	}, {
 		{ 0x0000d8db8f72935d, 0x00000005 },
 		{ 0x00000fbd5aed7a2e, 0x00000002 },
@@ -115,7 +139,10 @@ static const struct {
 		{ 0x00000000e16b20fa, 0x0002a14a },
 		{ 0x000000001e940d22, 0x00353b2e },
 		{ 0x0000000000ab40ac, 0x06fba6ba },
+		{ 0x00000000000f3f70, 0x0af7eeda },
 		{ 0x000000000008debd, 0x72d98365 },
+		{ 0x0000000000079fb8, 0x0b166dba },
+		{ 0x0000000000079fb8, 0x0b0ece02 },
 	}, {
 		{ 0x000cc3045b8fc281, 0x00000000 },
 		{ 0x0000ed1f48b5c9fc, 0x00000079 },
@@ -125,7 +152,10 @@ static const struct {
 		{ 0x0000000d43fce827, 0x00082b09 },
 		{ 0x00000001ccaba11a, 0x0037e8dd },
 		{ 0x000000000a13f729, 0x0566dffd },
+		{ 0x0000000000e5b64e, 0x3728203b },
 		{ 0x000000000085a14b, 0x23d36726 },
+		{ 0x000000000072db27, 0x38f38cd7 },
+		{ 0x000000000072db27, 0x3880b1b0 },
 	}, {
 		{ 0x00eafeb9c993592b, 0x00000001 },
 		{ 0x00110e5befa9a991, 0x00000048 },
@@ -135,7 +165,10 @@ static const struct {
 		{ 0x000000f4459740fc, 0x00084484 },
 		{ 0x0000002122c47bf9, 0x002ca446 },
 		{ 0x00000000b9936290, 0x004979c4 },
+		{ 0x000000001085e910, 0x05a83974 },
 		{ 0x00000000099ca89d, 0x9db446bf },
+		{ 0x000000000842f488, 0x26b40b94 },
+		{ 0x000000000842f488, 0x1e71170c },
 	}, {
 		{ 0x1b60cece589da1d2, 0x00000001 },
 		{ 0x01fcb42be1453f5b, 0x0000004f },
@@ -145,7 +178,49 @@ static const struct {
 		{ 0x00001c757dfab350, 0x00048863 },
 		{ 0x000003dc4979c652, 0x00224ea7 },
 		{ 0x000000159edc3144, 0x06409ab3 },
+		{ 0x00000001ecce8a7e, 0x30bc25e5 },
 		{ 0x000000011eadfee3, 0xa99c48a8 },
+		{ 0x00000000f6674543, 0x0a593ae9 },
+		{ 0x00000000f6674542, 0x13f1f5a5 },
+	}, {
+		{ 0x1c71c71c71c71c71, 0x00000002 },
+		{ 0x0210842108421084, 0x0000000b },
+		{ 0x007f01fc07f01fc0, 0x000000fb },
+		{ 0x00014245eabf1f9a, 0x0000a63d },
+		{ 0x0000ffffffffffff, 0x0000fffb },
+		{ 0x00001d913cecc509, 0x0007937b },
+		{ 0x00000402c70c678f, 0x0005bfc9 },
+		{ 0x00000016766cb70b, 0x045edf97 },
+		{ 0x00000001fffffffb, 0x80000000 },
+		{ 0x0000000129d84b3a, 0xa2e8fe71 },
+		{ 0x0000000100000001, 0xfffffffd },
+		{ 0x0000000100000000, 0xfffffffb },
+	}, {
+		{ 0x1c71c71c71c71c71, 0x00000003 },
+		{ 0x0210842108421084, 0x0000000c },
+		{ 0x007f01fc07f01fc0, 0x000000fc },
+		{ 0x00014245eabf1f9a, 0x0000a63e },
+		{ 0x0000ffffffffffff, 0x0000fffc },
+		{ 0x00001d913cecc509, 0x0007937c },
+		{ 0x00000402c70c678f, 0x0005bfca },
+		{ 0x00000016766cb70b, 0x045edf98 },
+		{ 0x00000001fffffffc, 0x00000000 },
+		{ 0x0000000129d84b3a, 0xa2e8fe72 },
+		{ 0x0000000100000002, 0x00000000 },
+		{ 0x0000000100000000, 0xfffffffc },
+	}, {
+		{ 0x1c71c71c71c71c71, 0x00000006 },
+		{ 0x0210842108421084, 0x0000000f },
+		{ 0x007f01fc07f01fc0, 0x000000ff },
+		{ 0x00014245eabf1f9a, 0x0000a641 },
+		{ 0x0000ffffffffffff, 0x0000ffff },
+		{ 0x00001d913cecc509, 0x0007937f },
+		{ 0x00000402c70c678f, 0x0005bfcd },
+		{ 0x00000016766cb70b, 0x045edf9b },
+		{ 0x00000001fffffffc, 0x00000003 },
+		{ 0x0000000129d84b3a, 0xa2e8fe75 },
+		{ 0x0000000100000002, 0x00000003 },
+		{ 0x0000000100000001, 0x00000000 },
 	},
 };
 
@@ -208,6 +283,12 @@ static bool __init test_div64(void)
 			return false;
 		if (!test_div64_one(dividend, TEST_DIV64_DIVISOR_8, i, 8))
 			return false;
+		if (!test_div64_one(dividend, TEST_DIV64_DIVISOR_9, i, 9))
+			return false;
+		if (!test_div64_one(dividend, TEST_DIV64_DIVISOR_A, i, 10))
+			return false;
+		if (!test_div64_one(dividend, TEST_DIV64_DIVISOR_B, i, 11))
+			return false;
 		for (j = 0; j < SIZE_DIV64_DIVISORS; j++) {
 			if (!test_div64_one(dividend, test_div64_divisors[j],
 					    i, j))
-- 
2.45.2
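The new expected values in test_div64_results[] can be cross-checked on any host that has native 64-bit division, independently of __div64_const32(). The following is a minimal user-space sketch, assuming a hosted C99 compiler; the struct div64_case type and the div64_case_ok() and count_div64_mismatches() helpers are hypothetical names that do not exist in test_div64.c. The entries are copied from the table in the patch above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One { dividend, divisor, quotient, remainder } entry (hypothetical type). */
struct div64_case {
	uint64_t dividend;
	uint32_t divisor;
	uint64_t quotient;
	uint32_t remainder;
};

/* A selection of entries copied from the updated test_div64_results[]. */
static const struct div64_case edge_cases[] = {
	{ 0x00000000ab275080ULL, 0x80000001, 0x0000000000000001ULL, 0x2b27507f },
	{ 0xfffffffffffffffbULL, 0x80000001, 0x00000001fffffffbULL, 0x80000000 },
	{ 0xfffffffffffffffbULL, 0xfffffffe, 0x0000000100000001ULL, 0xfffffffd },
	{ 0xfffffffffffffffbULL, 0xffffffff, 0x0000000100000000ULL, 0xfffffffb },
	{ 0xfffffffffffffffcULL, 0x80000001, 0x00000001fffffffcULL, 0x00000000 },
	{ 0xfffffffffffffffcULL, 0xfffffffe, 0x0000000100000002ULL, 0x00000000 },
	{ 0xfffffffffffffffcULL, 0xffffffff, 0x0000000100000000ULL, 0xfffffffc },
	{ 0xffffffffffffffffULL, 0x00000009, 0x1c71c71c71c71c71ULL, 0x00000006 },
	{ 0xffffffffffffffffULL, 0x80000001, 0x00000001fffffffcULL, 0x00000003 },
	{ 0xffffffffffffffffULL, 0xfffffffe, 0x0000000100000002ULL, 0x00000003 },
	{ 0xffffffffffffffffULL, 0xffffffff, 0x0000000100000001ULL, 0x00000000 },
};

/* Compare one entry against the compiler's native 64-bit division. */
static bool div64_case_ok(const struct div64_case *c)
{
	assert(c->divisor != 0);
	return c->dividend / c->divisor == c->quotient &&
	       c->dividend % c->divisor == c->remainder;
}

/* Count how many of the copied entries disagree with native division. */
static size_t count_div64_mismatches(void)
{
	size_t bad = 0;

	for (size_t i = 0; i < sizeof(edge_cases) / sizeof(edge_cases[0]); i++)
		if (!div64_case_ok(&edge_cases[i]))
			bad++;
	return bad;
}
```

Calling count_div64_mismatches() should return 0. The divisors 0x80000001, 0xfffffffe and 0xffffffff sit right at the top of the 32-bit range, which is where the reciprocal m and the bias decision in __div64_const32() are most likely to go wrong.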
From: Nicolas Pitre
To: Arnd Bergmann, Russell King
Cc: Nicolas Pitre, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 2/4] asm-generic/div64: optimize/simplify __div64_const32()
Date: Sun, 7 Jul 2024 21:27:15 -0400
Message-ID: <20240708012749.2098373-3-nico@fluxnic.net>
In-Reply-To: <20240708012749.2098373-1-nico@fluxnic.net>

From: Nicolas Pitre

Several years later I just realized that this code could be greatly
simplified.

First, let's formalize the need for overflow handling in
__arch_xprod_64(). Assuming n = UINT64_MAX, there are 2 cases where
an overflow may occur:

1) If a bias must be added, we have:

	m_lo * n_lo + m
or
	m_lo * 0xffffffff + ((m_hi << 32) + m_lo)
or
	((m_lo << 32) - m_lo) + ((m_hi << 32) + m_lo)
or
	(m_lo + m_hi) << 32

   which must be < (1 << 64). So the criterion for no overflow is:

	m_lo + m_hi < (1 << 32)

2) The cross product:

	m_lo * n_hi + m_hi * n_lo
or
	m_lo * 0xffffffff + m_hi * 0xffffffff
or
	((m_lo << 32) - m_lo) + ((m_hi << 32) - m_hi)

   Adding the top part of the result from the previous step,
   (m_lo + m_hi), we again get (m_lo + m_hi) << 32, so the same
   criterion applies.

So let's have a straightforward, simpler version when this is true.
Otherwise some reordering allows for taking care of possible overflows
without any actual conditionals. And avoid generating both code
variants by considering this only when m is seen as a constant by the
compiler.

This, in turn, allows for greatly simplifying __div64_const32(). The
"special case" can go, since the regular case works just fine without
needing a bias. Reduction should then always be applied, as minimizing
m is the key.

Signed-off-by: Nicolas Pitre
---
 include/asm-generic/div64.h | 114 +++++++++++-------------------------
 1 file changed, 35 insertions(+), 79 deletions(-)

diff --git a/include/asm-generic/div64.h b/include/asm-generic/div64.h
index 13f5aa68a4..5d59cf7e73 100644
--- a/include/asm-generic/div64.h
+++ b/include/asm-generic/div64.h
@@ -74,7 +74,8 @@
 	 * do the trick here).						\
 	 */								\
 	uint64_t ___res, ___x, ___t, ___m, ___n = (n);			\
-	uint32_t ___p, ___bias;						\
+	uint32_t ___p;							\
+	bool ___bias = false;						\
 									\
 	/* determine MSB of b */					\
 	___p = 1 << ilog2(___b);					\
@@ -87,22 +88,14 @@
 	___x = ~0ULL / ___b * ___b - 1;					\
 									\
 	/* test our ___m with res = m * x / (p << 64) */		\
-	___res = ((___m & 0xffffffff) * (___x & 0xffffffff)) >> 32;	\
-	___t = ___res += (___m & 0xffffffff) * (___x >> 32);		\
-	___res += (___x & 0xffffffff) * (___m >> 32);			\
-	___t = (___res < ___t) ? (1ULL << 32) : 0;			\
-	___res = (___res >> 32) + ___t;					\
-	___res += (___m >> 32) * (___x >> 32);				\
-	___res /= ___p;							\
+	___res = (___m & 0xffffffff) * (___x & 0xffffffff);		\
+	___t = (___m & 0xffffffff) * (___x >> 32) + (___res >> 32);	\
+	___res = (___m >> 32) * (___x >> 32) + (___t >> 32);		\
+	___t = (___m >> 32) * (___x & 0xffffffff) + (___t & 0xffffffff);\
+	___res = (___res + (___t >> 32)) / ___p;			\
 									\
-	/* Now sanitize and optimize what we've got. */		\
-	if (~0ULL % (___b / (___b & -___b)) == 0) {			\
-		/* special case, can be simplified to ... */		\
-		___n /= (___b & -___b);					\
-		___m = ~0ULL / (___b / (___b & -___b));			\
-		___p = 1;						\
-		___bias = 1;						\
-	} else if (___res != ___x / ___b) {				\
+	/* Now validate what we've got. */				\
+	if (___res != ___x / ___b) {					\
 		/*							\
 		 * We can't get away without a bias to compensate	\
 		 * for bit truncation errors. To avoid it we'd need an	\
@@ -111,45 +104,18 @@
 		 *							\
 		 * Instead we do m = p / b and n / b = (n * m + m) / p.	\
 		 */							\
-		___bias = 1;						\
+		___bias = true;						\
 		/* Compute m = (p << 64) / b */				\
 		___m = (~0ULL / ___b) * ___p;				\
 		___m += ((~0ULL % ___b + 1) * ___p) / ___b;		\
-	} else {							\
-		/*							\
-		 * Reduce m / p, and try to clear bit 31 of m when	\
-		 * possible, otherwise that'll need extra overflow	\
-		 * handling later.					\
-		 */							\
-		uint32_t ___bits = -(___m & -___m);			\
-		___bits |= ___m >> 32;					\
-		___bits = (~___bits) << 1;				\
-		/*							\
-		 * If ___bits == 0 then setting bit 31 is unavoidable.	\
-		 * Simply apply the maximum possible reduction in that	\
-		 * case. Otherwise the MSB of ___bits indicates the	\
-		 * best reduction we should apply.			\
-		 */							\
-		if (!___bits) {						\
-			___p /= (___m & -___m);				\
-			___m /= (___m & -___m);				\
-		} else {						\
-			___p >>= ilog2(___bits);			\
-			___m >>= ilog2(___bits);			\
-		}							\
-		/* No bias needed. */					\
-		___bias = 0;						\
 	}								\
 									\
+	/* Reduce m / p to help avoid overflow handling later. */	\
+	___p /= (___m & -___m);						\
+	___m /= (___m & -___m);						\
+									\
 	/*								\
-	 * Now we have a combination of 2 conditions:			\
-	 *								\
-	 * 1) whether or not we need to apply a bias, and		\
-	 *								\
-	 * 2) whether or not there might be an overflow in the cross	\
-	 *    product determined by (___m & ((1 << 63) | (1 << 31))).	\
-	 *								\
-	 * Select the best way to do (m_bias + m * n) / (1 << 64).	\
+	 * Perform (m_bias + m * n) / (1 << 64).			\
 	 * From now on there will be actual runtime code generated.	\
 	 */								\
 	___res = __arch_xprod_64(___m, ___n, ___bias);			\
@@ -165,7 +131,7 @@
  * Semantic: retval = ((bias ? m : 0) + m * n) >> 64
  *
  * The product is a 128-bit value, scaled down to 64 bits.
- * Assuming constant propagation to optimize away unused conditional code.
+ * Hoping for compile-time optimization of conditional code.
  * Architectures may provide their own optimized assembly implementation.
  */
 static inline uint64_t __arch_xprod_64(const uint64_t m, uint64_t n, bool bias)
@@ -174,38 +140,28 @@ static inline uint64_t __arch_xprod_64(const uint64_t m, uint64_t n, bool bias)
 	uint32_t m_hi = m >> 32;
 	uint32_t n_lo = n;
 	uint32_t n_hi = n >> 32;
-	uint64_t res;
-	uint32_t res_lo, res_hi, tmp;
-
-	if (!bias) {
-		res = ((uint64_t)m_lo * n_lo) >> 32;
-	} else if (!(m & ((1ULL << 63) | (1ULL << 31)))) {
-		/* there can't be any overflow here */
-		res = (m + (uint64_t)m_lo * n_lo) >> 32;
-	} else {
-		res = m + (uint64_t)m_lo * n_lo;
-		res_lo = res >> 32;
-		res_hi = (res_lo < m_hi);
-		res = res_lo | ((uint64_t)res_hi << 32);
-	}
-
-	if (!(m & ((1ULL << 63) | (1ULL << 31)))) {
-		/* there can't be any overflow here */
-		res += (uint64_t)m_lo * n_hi;
-		res += (uint64_t)m_hi * n_lo;
-		res >>= 32;
+	uint64_t x, y;
+
+	/* Determine if overflow handling can be dispensed with. */
+	bool no_ovf = __builtin_constant_p(m) &&
+		      ((m >> 32) + (m & 0xffffffff) < 0x100000000);
+
+	if (no_ovf) {
+		x = (uint64_t)m_lo * n_lo + (bias ? m : 0);
+		x >>= 32;
+		x += (uint64_t)m_lo * n_hi;
+		x += (uint64_t)m_hi * n_lo;
+		x >>= 32;
+		x += (uint64_t)m_hi * n_hi;
 	} else {
-		res += (uint64_t)m_lo * n_hi;
-		tmp = res >> 32;
-		res += (uint64_t)m_hi * n_lo;
-		res_lo = res >> 32;
-		res_hi = (res_lo < tmp);
-		res = res_lo | ((uint64_t)res_hi << 32);
+		x = (uint64_t)m_lo * n_lo + (bias ? m_lo : 0);
+		y = (uint64_t)m_lo * n_hi + (uint32_t)(x >> 32) + (bias ? m_hi : 0);
+		x = (uint64_t)m_hi * n_hi + (uint32_t)(y >> 32);
+		y = (uint64_t)m_hi * n_lo + (uint32_t)y;
+		x += (uint32_t)(y >> 32);
 	}
 
-	res += (uint64_t)m_hi * n_hi;
-
-	return res;
+	return x;
 }
 #endif
 
-- 
2.45.2
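The commit message argues that once m has been validated and reduced, neither the old "special case" nor the old bit-31 juggling is needed. One way to exercise that reasoning outside the kernel is to replay the simplified __div64_const32() selection at run time and compare against plain division. This is a user-space sketch, not a drop-in replacement: it assumes a GCC or Clang host (for __builtin_clz), and the names xprod_64() and div_const32_sketch() are hypothetical. The xprod_64() body mirrors the new generic __arch_xprod_64() except that the __builtin_constant_p() gate becomes an explicit no_ovf argument, and in the kernel everything before the final xprod call folds to constants at compile time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Mirror of the new generic __arch_xprod_64(): returns
 * ((bias ? m : 0) + m * n) >> 64. The caller decides no_ovf instead of
 * relying on __builtin_constant_p(m).
 */
static uint64_t xprod_64(uint64_t m, uint64_t n, bool bias, bool no_ovf)
{
	uint32_t m_lo = m, m_hi = m >> 32;
	uint32_t n_lo = n, n_hi = n >> 32;
	uint64_t x, y;

	if (no_ovf) {
		/* Only valid when m_hi + m_lo < 2^32. */
		x = (uint64_t)m_lo * n_lo + (bias ? m : 0);
		x >>= 32;
		x += (uint64_t)m_lo * n_hi;
		x += (uint64_t)m_hi * n_lo;
		x >>= 32;
		x += (uint64_t)m_hi * n_hi;
	} else {
		/* Carry-safe ordering: no partial sum can exceed 64 bits. */
		x = (uint64_t)m_lo * n_lo + (bias ? m_lo : 0);
		y = (uint64_t)m_lo * n_hi + (uint32_t)(x >> 32) + (bias ? m_hi : 0);
		x = (uint64_t)m_hi * n_hi + (uint32_t)(y >> 32);
		y = (uint64_t)m_hi * n_lo + (uint32_t)y;
		x += (uint32_t)(y >> 32);
	}
	return x;
}

/*
 * Run-time replay of the simplified __div64_const32() logic.
 * b must be neither 0 nor a power of two: power-of-two divisors are
 * handled by a plain shift before this path is reached.
 */
static uint64_t div_const32_sketch(uint64_t n, uint32_t b)
{
	assert(b != 0 && (b & (b - 1)) != 0);

	uint32_t p = 1u << (31 - __builtin_clz(b));	/* MSB of b */
	uint64_t m, x, res, low;
	bool bias = false;

	/* m = ceil((p << 64) / b) */
	m = (~0ULL / b) * p;
	m += (((~0ULL % b + 1) * p) + b - 1) / b;

	/* one less than the dividend with the highest result */
	x = ~0ULL / b * b - 1;

	/* validate: ((m * x) >> 64) / p must equal x / b */
	res = xprod_64(m, x, false, false) / p;
	if (res != x / b) {
		/* fall back to m = floor((p << 64) / b) plus a bias */
		bias = true;
		m = (~0ULL / b) * p;
		m += ((~0ULL % b + 1) * p) / b;
	}

	/* reduce m / p by their common power-of-two factor */
	low = m & -m;
	p = (uint32_t)(p / low);
	m /= low;

	return xprod_64(m, n, bias,
			(m >> 32) + (m & 0xffffffff) < 0x100000000ULL) / p;
}
```

For example, div_const32_sketch(0xffffffffffffffffULL, 3) should equal 0x5555555555555555. The final call only takes the cheaper path when m satisfies m_hi + m_lo < 2^32, which is exactly the criterion derived above.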
From: Nicolas Pitre
To: Arnd Bergmann, Russell King
Cc: Nicolas Pitre, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 3/4] ARM: div64: improve __arch_xprod_64()
Date: Sun, 7 Jul 2024 21:27:16 -0400
Message-ID: <20240708012749.2098373-4-nico@fluxnic.net>
In-Reply-To: <20240708012749.2098373-1-nico@fluxnic.net>

From: Nicolas Pitre

Use the same criterion as the generic code to decide whether overflow
handling is necessary.

Signed-off-by: Nicolas Pitre
---
 arch/arm/include/asm/div64.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
index 4b69cf8504..562d5376ae 100644
--- a/arch/arm/include/asm/div64.h
+++ b/arch/arm/include/asm/div64.h
@@ -56,6 +56,8 @@ static inline uint64_t __arch_xprod_64(uint64_t m, uint64_t n, bool bias)
 {
 	unsigned long long res;
 	register unsigned int tmp asm("ip") = 0;
+	bool no_ovf = __builtin_constant_p(m) &&
+		      ((m >> 32) + (m & 0xffffffff) < 0x100000000);
 
 	if (!bias) {
 		asm (	"umull %Q0, %R0, %Q1, %Q2\n\t"
@@ -63,7 +65,7 @@ static inline uint64_t __arch_xprod_64(uint64_t m, uint64_t n, bool bias)
 			: "=&r" (res)
 			: "r" (m), "r" (n)
 			: "cc");
-	} else if (!(m & ((1ULL << 63) | (1ULL << 31)))) {
+	} else if (no_ovf) {
 		res = m;
 		asm (	"umlal %Q0, %R0, %Q1, %Q2\n\t"
 			"mov %Q0, #0"
@@ -80,7 +82,7 @@ static inline uint64_t __arch_xprod_64(uint64_t m, uint64_t n, bool bias)
 			: "cc");
 	}
 
-	if (!(m & ((1ULL << 63) | (1ULL << 31)))) {
+	if (no_ovf) {
 		asm (	"umlal %R0, %Q0, %R1, %Q2\n\t"
 			"umlal %R0, %Q0, %Q1, %R2\n\t"
 			"mov %R0, #0\n\t"
-- 
2.45.2
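Since the ARM code now shares the generic criterion, it may help to see how the old and new tests compare. The sketch below is user-space C and assumes a GCC or Clang host with unsigned __int128, used only to detect whether a 64-bit accumulation would have wrapped; the function names old_no_ovf(), new_no_ovf() and no_ovf_sequence_fits() are hypothetical. It replays the cheaper sequence for the worst case described in the generic patch (n = UINT64_MAX, bias applied).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Criterion used before this series: bits 63 and 31 of m both clear. */
static bool old_no_ovf(uint64_t m)
{
	return !(m & ((1ULL << 63) | (1ULL << 31)));
}

/* Criterion shared by the generic and ARM code after this series. */
static bool new_no_ovf(uint64_t m)
{
	return (m >> 32) + (m & 0xffffffff) < 0x100000000ULL;
}

/*
 * Replay the no-overflow sequence for n = UINT64_MAX with the bias
 * applied, in 128-bit arithmetic, and report whether both 64-bit
 * accumulation steps would have stayed below 2^64.
 */
static bool no_ovf_sequence_fits(uint64_t m)
{
	const unsigned __int128 lim = (unsigned __int128)1 << 64;
	const unsigned __int128 m_lo = (uint32_t)m, m_hi = m >> 32;
	const unsigned __int128 n_half = 0xffffffff;	/* n_lo == n_hi */
	unsigned __int128 x;

	x = m_lo * n_half + m;			/* m_lo * n_lo + bias */
	if (x >= lim)
		return false;
	x >>= 32;
	x += m_lo * n_half + m_hi * n_half;	/* cross products */
	return x < lim;
}
```

For this worst-case n, no_ovf_sequence_fits(m) agrees with new_no_ovf(m), and every m accepted by the old test is also accepted by the new one. The converse does not hold: m = 0x0000000180000000 has bit 31 set, so the old test forced the slower path, while m_hi + m_lo = 0x80000001 lets the new test take the cheaper one.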
From: Nicolas Pitre
To: Arnd Bergmann, Russell King
Cc: Nicolas Pitre, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 4/4] __arch_xprod64(): make __always_inline when optimizing for performance
Date: Sun, 7 Jul 2024 21:27:17 -0400
Message-ID: <20240708012749.2098373-5-nico@fluxnic.net>
In-Reply-To: <20240708012749.2098373-1-nico@fluxnic.net>

From: Nicolas Pitre

Recent gcc versions no longer systematically inline __arch_xprod_64(),
and that has performance implications. Give the compiler the freedom to
decide only when optimizing for size.

Signed-off-by: Nicolas Pitre
---
 arch/arm/include/asm/div64.h | 7 ++++++-
 include/asm-generic/div64.h  | 7 ++++++-
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
index 562d5376ae..d3ef8e416b 100644
--- a/arch/arm/include/asm/div64.h
+++ b/arch/arm/include/asm/div64.h
@@ -52,7 +52,12 @@ static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
 
 #else
 
-static inline uint64_t __arch_xprod_64(uint64_t m, uint64_t n, bool bias)
+#ifdef CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE
+static __always_inline
+#else
+static inline
+#endif
+uint64_t __arch_xprod_64(uint64_t m, uint64_t n, bool bias)
 {
 	unsigned long long res;
 	register unsigned int tmp asm("ip") = 0;
diff --git a/include/asm-generic/div64.h b/include/asm-generic/div64.h
index 5d59cf7e73..25e7b4b58d 100644
--- a/include/asm-generic/div64.h
+++ b/include/asm-generic/div64.h
@@ -134,7 +134,12 @@
  * Hoping for compile-time optimization of conditional code.
  * Architectures may provide their own optimized assembly implementation.
  */
-static inline uint64_t __arch_xprod_64(const uint64_t m, uint64_t n, bool bias)
+#ifdef CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE
+static __always_inline
+#else
+static inline
+#endif
+uint64_t __arch_xprod_64(const uint64_t m, uint64_t n, bool bias)
 {
 	uint32_t m_lo = m;
 	uint32_t m_hi = m >> 32;
-- 
2.45.2
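The patch spells out the same #ifdef in front of each __arch_xprod_64() definition. If more helpers needed this treatment, the selection could be wrapped in a single macro. The sketch below uses hypothetical names: SKETCH_OPTIMIZE_FOR_PERFORMANCE stands in for CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE, and __attribute__((__always_inline__)) stands in for the kernel's __always_inline. It assumes GCC or Clang for the attribute and for unsigned __int128. The function body is a plain 128-bit reference for ((bias ? m : 0) + m * n) >> 64, not the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y. */
#define SKETCH_OPTIMIZE_FOR_PERFORMANCE 1

#ifdef SKETCH_OPTIMIZE_FOR_PERFORMANCE
/* Force inlining, as "static __always_inline" does in the kernel. */
#define SKETCH_XPROD_INLINE static inline __attribute__((__always_inline__))
#else
/* Leave the inlining decision to the compiler (e.g. when optimizing for size). */
#define SKETCH_XPROD_INLINE static inline
#endif

/* Reference ((bias ? m : 0) + m * n) >> 64 using 128-bit arithmetic. */
SKETCH_XPROD_INLINE uint64_t sketch_xprod_64(uint64_t m, uint64_t n, int bias)
{
	unsigned __int128 p = (unsigned __int128)m * n + (bias ? m : 0);

	return (uint64_t)(p >> 64);
}
```

Removing the SKETCH_OPTIMIZE_FOR_PERFORMANCE definition turns the helper back into a plain static inline function whose inlining is left to the compiler, mirroring the size-optimized configuration in the patch.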