From nobody Wed Apr 15 01:30:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0EEDDC04A68 for ; Thu, 28 Jul 2022 16:12:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231764AbiG1QM2 (ORCPT ); Thu, 28 Jul 2022 12:12:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53006 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231853AbiG1QMU (ORCPT ); Thu, 28 Jul 2022 12:12:20 -0400 Received: from mail-qv1-xf36.google.com (mail-qv1-xf36.google.com [IPv6:2607:f8b0:4864:20::f36]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E35FD6D2D6 for ; Thu, 28 Jul 2022 09:12:18 -0700 (PDT) Received: by mail-qv1-xf36.google.com with SMTP id mn11so1721975qvb.9 for ; Thu, 28 Jul 2022 09:12:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Hcao3mS9A68A4ygDFBhvxMbZZ8matHzGnHuIbBLuhfU=; b=Jt6CsrspqbmJuU7v5pdYKf8Qf1xLBwR83W9TWQE4Zt2kFGYZu0FBW9HDonEdJ5VO8t t+zbRfRtVdAxZ5BEI3NPWR4aqmEFN0qLvM1bZAkBOXfDa+uVnRl2wVvd1zyLkHU0iEA/ kxxdq7UaQxyNVvw1pO/l4nEz6YyCBR740FjDcDvHKuNdzXNWWsO0ZtycbbofpalgVTJv MczIaKEcwK3PhwPZE0PcCHNwI5OojUe0hF39w4UCno0AsrG9KlF7rhfNMjvckJoOzF0i q/q97yyOCEVOjCT3zxvohmE2+9ujvII8mg0A/93Zv1JynHpjzWAFMcnZRSW3uUTbveuj vyZQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Hcao3mS9A68A4ygDFBhvxMbZZ8matHzGnHuIbBLuhfU=; b=US1Y8htE/4IWk51c9EDrHPkCw0yamPL6O5n7qlSWRB3e7nVnIpDXRleaquHSaXeYtQ MMwPnuOxpWaWcG+3gv573CivV6uaoVWT577j/HUWBp0Dc04mfV/kblD2SAy7QlINxoDq CSrBC8Fzc2VXXLVr9xUbQ6/xYH2yde4G0oxJ6RqMXQM/cLgIV7HfFfLo7cvPPMmGZBO4 JQhonU17cbj4AY8QfMJZkbP2Te1jqGzjLPoqgsxN7Mw0cv9aKc5Fo7pdqP5gEP590wO1 m0TtlpKDsqPI/5jaP+uIlaQLbHq1uF3KpVQ3SUx2UfU2razkCJb8xdQNhVMvEDHIW4rn nK9w== X-Gm-Message-State: AJIora86mUpFFMBbNh4NOp+ybagII0OgAE1I/KXcXV6I9urE3MY4zdvh qkGzOzcPBHq9rgFpanPxG3g= X-Google-Smtp-Source: AGRyM1sJ1mwSx9Wpq11csum70XnEQlfimGA6PespiIrUdiD2AailNKiMaUYAXZLhCnOHhgNNZ9lKxw== X-Received: by 2002:a0c:f04f:0:b0:474:499:5000 with SMTP id b15-20020a0cf04f000000b0047404995000mr8812048qvl.125.1659024737956; Thu, 28 Jul 2022 09:12:17 -0700 (PDT) Received: from localhost ([2601:4c1:c100:1230:b984:ba52:c3cf:cb5e]) by smtp.gmail.com with ESMTPSA id y7-20020a05620a44c700b006af039ff090sm755694qkp.97.2022.07.28.09.12.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 28 Jul 2022 09:12:17 -0700 (PDT) From: Yury Norov To: Linus Torvalds , Guenter Roeck , Dennis Zhou , Russell King , Catalin Marinas , Andy Shevchenko , Rasmus Villemoes , Alexey Klimov , Linux Kernel Mailing List Cc: Yury Norov Subject: [PATCH 1/5] lib/find_bit: rename "le" to "need_swab" in __find_next_bit() Date: Thu, 28 Jul 2022 09:12:04 -0700 Message-Id: <20220728161208.865420-2-yury.norov@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220728161208.865420-1-yury.norov@gmail.com> References: <20220728161208.865420-1-yury.norov@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" The parameter is used to swap bytes on BE machines when searching for bits in little-endian bitmaps. On LE machines this parameter is 0, which misleads readers. Signed-off-by: Yury Norov --- lib/find_bit.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/lib/find_bit.c b/lib/find_bit.c index 1b8e4b2a9cba..04c142acfc40 100644 --- a/lib/find_bit.c +++ b/lib/find_bit.c @@ -31,7 +31,7 @@ */ unsigned long _find_next_bit(const unsigned long *addr1, const unsigned long *addr2, unsigned long nbits, - unsigned long start, unsigned long invert, unsigned long le) + unsigned long start, unsigned long invert, bool need_swab) { unsigned long tmp, mask; =20 @@ -45,7 +45,7 @@ unsigned long _find_next_bit(const unsigned long *addr1, =20 /* Handle 1st word. */ mask =3D BITMAP_FIRST_WORD_MASK(start); - if (le) + if (need_swab) mask =3D swab(mask); =20 tmp &=3D mask; @@ -63,7 +63,7 @@ unsigned long _find_next_bit(const unsigned long *addr1, tmp ^=3D invert; } =20 - if (le) + if (need_swab) tmp =3D swab(tmp); =20 return min(start + __ffs(tmp), nbits); --=20 2.34.1 From nobody Wed Apr 15 01:30:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 953A6C04A68 for ; Thu, 28 Jul 2022 16:12:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232482AbiG1QMd (ORCPT ); Thu, 28 Jul 2022 12:12:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53046 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232047AbiG1QMW (ORCPT ); Thu, 28 Jul 2022 12:12:22 -0400 Received: from mail-qv1-xf2d.google.com (mail-qv1-xf2d.google.com [IPv6:2607:f8b0:4864:20::f2d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C52936E2EF for ; Thu, 28 Jul 2022 09:12:20 -0700 (PDT) Received: by mail-qv1-xf2d.google.com with SMTP id j11so1723658qvt.10 for ; Thu, 28 Jul 2022 09:12:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=m9VCKSONvhrO3g3DCwfmUIlgvFSHhyje+UErLAquBos=; b=l57PGFmgeekTFiJHUE9YC4JG7YJQk4hI38s2znWPIBsxolCh6K2OIfqllro+e04aS2 M0MwRCANUKCJGQsQAKbTuzTuswm1nNrD/gtknK/32uABFp4D+RRqfgWndO71pH2Bq14s 2X78Cwc77B5LkU0lzCwyitpF/aFFP6puFRyLQO1G9xkwRsnD383TOFBMtKlS9eacYIIs 99JVMrNYiQTnJTlVocpExO7uvTOeZgd41E/spIO8WzNB81F2/0vlrmsa9fGXPZ11K3Wy 3VytRzffJPp2lKOOnsdFDbyOz9DyXDPCny67844rzoWcm/f7AHQ/e7i4IwAHqd6qkBfj KuQg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=m9VCKSONvhrO3g3DCwfmUIlgvFSHhyje+UErLAquBos=; b=Uek0z29JuhC6+MBM2zYJpEzrFQQl0pogU4cGY/d6q09ivS7uMKAPtYECJuWWIvxMH5 4a/nFmsbbA7mSy3YcXd/jj2Su+2/2mQW1xZdBe/gnW76VIhcFk+Vk0wH7i2Ksd4+NayU fVVZ6/Eh7XVO1K/zMovGfcYcDkCe8ZCx93IiFHGYwheAuPPhm6x4s1XfUzSHJqZTAB9j ukv2eiouBCEsoKTldTKzbaa9Yker3KN0c95k3B0NnQxQ0DJFZ2uMC4H1T0/h+rdGGpfR N1OU0jTRuKftVH0X28A7gt8+vNf5Cw+iC1BLfBxvYXinIRkKbPIuM00J0Jr9uGtsvLAC c1nA== X-Gm-Message-State: AJIora+g8M3cjBUmyYjKdWKHHC1YokgpWfwjdc/kF9a1RYKYiUQoh1Cj WpYzKpKEMEb42G3jZYC12JI= X-Google-Smtp-Source: AGRyM1v50KTeGtTYCDNdNLmmHNSaoQwrv7kuOZxnx8ebs38KHkwkssrlbmbg+5AkWJkD/lsklATDdg== X-Received: by 2002:ad4:5dcc:0:b0:473:9d1d:a1e with SMTP id m12-20020ad45dcc000000b004739d1d0a1emr24402524qvh.54.1659024739649; Thu, 28 Jul 2022 09:12:19 -0700 (PDT) Received: from localhost ([2601:4c1:c100:1230:b984:ba52:c3cf:cb5e]) by smtp.gmail.com with ESMTPSA id w18-20020a05620a425200b006b60c965024sm803133qko.113.2022.07.28.09.12.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 28 Jul 2022 09:12:19 -0700 (PDT) From: Yury Norov To: Linus Torvalds , Guenter Roeck , Dennis Zhou , Russell King , Catalin Marinas , Andy Shevchenko , Rasmus Villemoes , Alexey Klimov , Linux Kernel Mailing List Cc: Yury Norov Subject: [PATCH 2/5] lib/find_bit: optimize find_next_bit() functions Date: Thu, 28 Jul 2022 09:12:05 -0700 Message-Id: <20220728161208.865420-3-yury.norov@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220728161208.865420-1-yury.norov@gmail.com> References: <20220728161208.865420-1-yury.norov@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" The function _find_next_bit() takes parameters that modify its behavior to implement and- zero- and le- flavors. The parameters are passed at compile time, but current design prevents the compiled from optimizing them out. This patch adds wrappers around _find_next_bit(), and turns it into internal inline helper, so that the optimization becomes possible. I ran find_bit_benchmark 5 times on top of 5.19-rc8 and 5 times on top of this patch. The results for kvm/x86_64 are: v5.19-rc8 Optimized Difference (more - better) Random dense bitmap ns ns % sigmas* find_next_bit: 721209 692337 4 0.73 find_next_zero_bit: 738138 701094 5 0.52 find_last_bit: 802393 698133 13 0.49 find_first_bit: 3560900 3574644 0 -0.07 find_first_and_bit: 38601442 37945046 2 0.71 find_next_and_bit: 335574 306184 9 2.36 =20 Random sparse bitmap =20 find_next_bit: 15868 13856 13 0.82 find_next_zero_bit: 1311843 1227418 4 0.72 find_last_bit: 13633 14080 -3 -0.74 find_first_bit: 1273625 1253343 1 0.52 find_first_and_bit: 8548 8157 7 0.32 find_next_and_bit: 8828 8437 6 0.52 * Calculated as: (mean(before) - mean(after)) / max(std(before), std(after)) All at all, optimized code is generally faster, but the difference never reaches solid 3 sigmas. find_next_and_bit almost touches the limit in dence bitmap test, but no... However, bloat-o-meter shows significant ~2.5K decrease of image size So, the optimization has total positive impact. Bloat-o-meter: add/remove: 4/2 grow/shrink: 18/193 up/down: 627/-3108 (-2481) Suggested-by: Linus Torvalds Signed-off-by: Yury Norov --- include/linux/find.h | 25 ++++++++++++++-------- lib/find_bit.c | 49 ++++++++++++++++++++++++++++++++++++++++---- 2 files changed, 62 insertions(+), 12 deletions(-) diff --git a/include/linux/find.h b/include/linux/find.h index 424ef67d4a42..3ace995d7079 100644 --- a/include/linux/find.h +++ b/include/linux/find.h @@ -8,9 +8,18 @@ =20 #include =20 -extern unsigned long _find_next_bit(const unsigned long *addr1, - const unsigned long *addr2, unsigned long nbits, - unsigned long start, unsigned long invert, unsigned long le); +unsigned long _find_next_bit(const unsigned long *addr1, unsigned long nbi= ts, + unsigned long start); +unsigned long _find_next_and_bit(const unsigned long *addr1, const unsigne= d long *addr2, + unsigned long nbits, unsigned long start); +unsigned long _find_next_zero_bit(const unsigned long *addr, unsigned long= nbits, + unsigned long start); +#ifdef __BIG_ENDIAN +unsigned long _find_next_zero_bit_le(const void *addr, unsigned + long size, unsigned long offset); +unsigned long _find_next_bit_le(const void *addr, unsigned + long size, unsigned long offset); +#endif extern unsigned long _find_first_bit(const unsigned long *addr, unsigned l= ong size); extern unsigned long _find_first_and_bit(const unsigned long *addr1, const unsigned long *addr2, unsigned long size); @@ -41,7 +50,7 @@ unsigned long find_next_bit(const unsigned long *addr, un= signed long size, return val ? __ffs(val) : size; } =20 - return _find_next_bit(addr, NULL, size, offset, 0UL, 0); + return _find_next_bit(addr, size, offset); } #endif =20 @@ -71,7 +80,7 @@ unsigned long find_next_and_bit(const unsigned long *addr= 1, return val ? __ffs(val) : size; } =20 - return _find_next_bit(addr1, addr2, size, offset, 0UL, 0); + return _find_next_and_bit(addr1, addr2, size, offset); } #endif =20 @@ -99,7 +108,7 @@ unsigned long find_next_zero_bit(const unsigned long *ad= dr, unsigned long size, return val =3D=3D ~0UL ? size : ffz(val); } =20 - return _find_next_bit(addr, NULL, size, offset, ~0UL, 0); + return _find_next_zero_bit(addr, size, offset); } #endif =20 @@ -247,7 +256,7 @@ unsigned long find_next_zero_bit_le(const void *addr, u= nsigned return val =3D=3D ~0UL ? size : ffz(val); } =20 - return _find_next_bit(addr, NULL, size, offset, ~0UL, 1); + return _find_next_zero_bit_le(addr, size, offset); } #endif =20 @@ -266,7 +275,7 @@ unsigned long find_next_bit_le(const void *addr, unsign= ed return val ? __ffs(val) : size; } =20 - return _find_next_bit(addr, NULL, size, offset, 0UL, 1); + return _find_next_bit_le(addr, size, offset); } #endif =20 diff --git a/lib/find_bit.c b/lib/find_bit.c index 04c142acfc40..4ef3151b3109 100644 --- a/lib/find_bit.c +++ b/lib/find_bit.c @@ -19,9 +19,6 @@ #include #include =20 -#if !defined(find_next_bit) || !defined(find_next_zero_bit) || \ - !defined(find_next_bit_le) || !defined(find_next_zero_bit_le) || \ - !defined(find_next_and_bit) /* * This is a common helper function for find_next_bit, find_next_zero_bit,= and * find_next_and_bit. The differences are: @@ -29,7 +26,7 @@ * searching it for one bits. * - The optional "addr2", which is anded with "addr1" if present. */ -unsigned long _find_next_bit(const unsigned long *addr1, +static inline unsigned long __find_next_bit(const unsigned long *addr1, const unsigned long *addr2, unsigned long nbits, unsigned long start, unsigned long invert, bool need_swab) { @@ -68,9 +65,53 @@ unsigned long _find_next_bit(const unsigned long *addr1, =20 return min(start + __ffs(tmp), nbits); } + +#ifndef find_next_bit +unsigned long _find_next_bit(const unsigned long *addr, unsigned long nbit= s, unsigned long start) +{ + return __find_next_bit(addr, NULL, nbits, start, 0UL, 0); +} EXPORT_SYMBOL(_find_next_bit); #endif =20 +#ifndef find_next_and_bit +unsigned long _find_next_and_bit(const unsigned long *addr1, const unsigne= d long *addr2, + unsigned long nbits, unsigned long start) +{ + return __find_next_bit(addr1, addr2, nbits, start, 0UL, 0); +} +EXPORT_SYMBOL(_find_next_and_bit); +#endif + +#ifndef find_next_zero_bit +unsigned long _find_next_zero_bit(const unsigned long *addr, unsigned long= nbits, + unsigned long start) +{ + return __find_next_bit(addr, NULL, nbits, start, ~0UL, 0); +} +EXPORT_SYMBOL(_find_next_zero_bit); +#endif + +#ifdef __BIG_ENDIAN +#ifndef find_next_zero_bit_le +unsigned long _find_next_zero_bit_le(const void *addr, unsigned + long size, unsigned long offset) +{ + return __find_next_bit(addr, NULL, size, offset, ~0UL, 1); +} +EXPORT_SYMBOL(_find_next_zero_bit_le); +#endif + +#ifndef find_next_bit_le +unsigned long _find_next_bit_le(const void *addr, unsigned + long size, unsigned long offset) +{ + return __find_next_bit(addr, NULL, size, offset, 0UL, 1); +} +EXPORT_SYMBOL(_find_next_bit_le); +#endif +#endif /* __BIG_ENDIAN */ + #ifndef find_first_bit /* * Find the first set bit in a memory region. --=20 2.34.1 From nobody Wed Apr 15 01:30:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B2E94C04A68 for ; Thu, 28 Jul 2022 16:12:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232214AbiG1QMh (ORCPT ); Thu, 28 Jul 2022 12:12:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53048 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231867AbiG1QMX (ORCPT ); Thu, 28 Jul 2022 12:12:23 -0400 Received: from mail-qt1-x832.google.com (mail-qt1-x832.google.com [IPv6:2607:f8b0:4864:20::832]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0C5606D9D6 for ; Thu, 28 Jul 2022 09:12:22 -0700 (PDT) Received: by mail-qt1-x832.google.com with SMTP id r21so1440391qtn.11 for ; Thu, 28 Jul 2022 09:12:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=nVqHrflF9vmMmd/588tMUx02l5nIAJz+TrX/yH1UWPg=; b=SO5jDgoaWFhsn4dVMMRHUCOpRhHElt3Ciii5idJEiO0hFJfn6fgecNVmt4KHry/Cij NYabiutcmn4DRx7NeUg43FHV009+73aJRlMudRel8p0B/DDT1mtb6JQzT0fCrSCotTrb e0AkLa7xkL9CMXzk94FNpizgTEUt7JilClDPE+izqb3tfD7f8X3gdNt/Se+12mn23wyw Rz5QkMo25z2yBn6LDd6O5SwsTdNXg0cB+M2G1QEvhGKzvoZ0F/WVCR+QgfDm9FouFXi6 nL2y5AQFsVXQw5RqYwKOb9FcLMxUcADGH8p7115xi2aLzPBaPoW7pPYlxpeEtNNkct3L 0vvA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=nVqHrflF9vmMmd/588tMUx02l5nIAJz+TrX/yH1UWPg=; b=DfWZCL5B+WtwFBBhR4Ct/XiKsvXDD2MdSOz3kx2JNjpLc+eGawdALZWx8nYGrxbaWa YfSG0o/82i6GquNwUr55a80nL74LZU2chrSQ4CcX+J6BN+llovTCz499qo5Nj9ckP5fX 42e8q1EUxRuyYh/9LgXyT69pZaM3ugPF9OOu9ZuH8Dk0x3cU0a6EW0LtkMmwvZyDa5LW JNF25pdiuwhwYTbaStuoL7iI0HeqUHH3ZZX/xKDUG7UdUzrUdszKCwbLZGD86sRL/V0A 2nOEz05++HzvipEobv/4Yy9RJN/mlvVT/4b6ookjkStPm90jOn0O+LxsVy2eqt/2JYIa 4caQ== X-Gm-Message-State: AJIora+P8tcEmewWCj3D5pWPr8a8Kio0CLYhHKZTQr+sTJ3TL0SpxaYQ KXtJWRzXvjChI0cZm1rTkZ8= X-Google-Smtp-Source: AGRyM1vi7CbQZpdu2lkjfaSDGPtna5N5h6kizl6LAgeyElZzackUdou5wj/NukL9DPzowSKlEGJ5Jg== X-Received: by 2002:a05:622a:53:b0:31f:1fb6:8d3a with SMTP id y19-20020a05622a005300b0031f1fb68d3amr23883580qtw.386.1659024741054; Thu, 28 Jul 2022 09:12:21 -0700 (PDT) Received: from localhost ([2601:4c1:c100:1230:b984:ba52:c3cf:cb5e]) by smtp.gmail.com with ESMTPSA id l21-20020ac87255000000b0031ef21aec36sm633174qtp.32.2022.07.28.09.12.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 28 Jul 2022 09:12:20 -0700 (PDT) From: Yury Norov To: Linus Torvalds , Guenter Roeck , Dennis Zhou , Russell King , Catalin Marinas , Andy Shevchenko , Rasmus Villemoes , Alexey Klimov , Linux Kernel Mailing List Cc: Yury Norov Subject: [PATCH 3/5] lib/find_bit: unify _find_first_{,and,zero}_bit implementations Date: Thu, 28 Jul 2022 09:12:06 -0700 Message-Id: <20220728161208.865420-4-yury.norov@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220728161208.865420-1-yury.norov@gmail.com> References: <20220728161208.865420-1-yury.norov@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" The functions are almost identical, so create a common helper for them so that compiler will be able to either inline the helper and optimize-out parameters known at compile-time, or save some space by keeping it as a real function. On kvm/x86_64, bloat-o-meter reports +9 bytes. Find_bit_benchmark 5 times before and after doesn't show significant (i.e. delta is greater than 3 sigma) difference, except find_next_bit, which is most likely an outlier (although, lucky for the patch): v5.19-rc8 Optimized Difference (more - better) Random dense bitmap ns ns % sigmas find_next_bit: 721209 594936 18 3.19 find_next_zero_bit: 738138 638182 14 1.40 find_last_bit: 802393 940846 -17 -0.31 find_first_bit: 3560900 3379983 5 0.65 find_first_and_bit: 38601442 37683449 2 1.00 find_next_and_bit: 335574 300373 10 2.82 =20 Random sparse bitmap =20 find_next_bit: 15868 13856 13 0.82 find_next_zero_bit: 1311843 1227418 6 0.72 find_last_bit: 13633 14080 -3 -0.74 find_first_bit: 1273625 1253343 2 0.52 find_first_and_bit: 8548 8157 5 0.32 find_next_and_bit: 8828 8437 4 0.52 Signed-off-by: Yury Norov --- lib/find_bit.c | 62 +++++++++++++++++++++++++++----------------------- 1 file changed, 33 insertions(+), 29 deletions(-) diff --git a/lib/find_bit.c b/lib/find_bit.c index 4ef3151b3109..d207d1699834 100644 --- a/lib/find_bit.c +++ b/lib/find_bit.c @@ -20,12 +20,38 @@ #include =20 /* - * This is a common helper function for find_next_bit, find_next_zero_bit,= and - * find_next_and_bit. The differences are: + * This is a common helper functions for find_{first,next}_bit{,_le}. + * Internal parameters are: * - The "invert" argument, which is XORed with each fetched word before - * searching it for one bits. - * - The optional "addr2", which is anded with "addr1" if present. + * searching it for set bits; to implement find_*_zero_bit(). + * - The optional "addr2", which is ANDed with "addr1" if present; to + * implement find_*_and_bit(). + * - The "need_swab" that converts words to BE format; to implement + * find_*_le() on big-endian machines. */ +static inline +unsigned long __find_first_bit(const unsigned long *addr1, const unsigned = long *addr2, + unsigned long size, unsigned long invert, bool need_swab) +{ + unsigned long idx, val; + + for (idx =3D 0; idx * BITS_PER_LONG < size; idx++) { + val =3D addr1[idx]; + if (addr2) + val &=3D addr2[idx]; + + val ^=3D invert; + + if (val) { + if (need_swab) + val =3D swab(val); + return min(idx * BITS_PER_LONG + __ffs(val), size); + } + } + + return size; +} + static inline unsigned long __find_next_bit(const unsigned long *addr1, const unsigned long *addr2, unsigned long nbits, unsigned long start, unsigned long invert, bool need_swab) @@ -118,14 +144,7 @@ EXPORT_SYMBOL(_find_next_bit_le); */ unsigned long _find_first_bit(const unsigned long *addr, unsigned long siz= e) { - unsigned long idx; - - for (idx =3D 0; idx * BITS_PER_LONG < size; idx++) { - if (addr[idx]) - return min(idx * BITS_PER_LONG + __ffs(addr[idx]), size); - } - - return size; + return __find_first_bit(addr, NULL, size, 0UL, false); } EXPORT_SYMBOL(_find_first_bit); #endif @@ -138,15 +157,7 @@ unsigned long _find_first_and_bit(const unsigned long = *addr1, const unsigned long *addr2, unsigned long size) { - unsigned long idx, val; - - for (idx =3D 0; idx * BITS_PER_LONG < size; idx++) { - val =3D addr1[idx] & addr2[idx]; - if (val) - return min(idx * BITS_PER_LONG + __ffs(val), size); - } - - return size; + return __find_first_bit(addr1, addr2, size, 0UL, false); } EXPORT_SYMBOL(_find_first_and_bit); #endif @@ -157,14 +168,7 @@ EXPORT_SYMBOL(_find_first_and_bit); */ unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned lon= g size) { - unsigned long idx; - - for (idx =3D 0; idx * BITS_PER_LONG < size; idx++) { - if (addr[idx] !=3D ~0UL) - return min(idx * BITS_PER_LONG + ffz(addr[idx]), size); - } - - return size; + return __find_first_bit(addr, NULL, size, ~0UL, false); } EXPORT_SYMBOL(_find_first_zero_bit); #endif --=20 2.34.1 From nobody Wed Apr 15 01:30:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6DF97C04A68 for ; Thu, 28 Jul 2022 16:12:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232455AbiG1QMl (ORCPT ); Thu, 28 Jul 2022 12:12:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53074 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232034AbiG1QMY (ORCPT ); Thu, 28 Jul 2022 12:12:24 -0400 Received: from mail-qv1-xf2b.google.com (mail-qv1-xf2b.google.com [IPv6:2607:f8b0:4864:20::f2b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C2AD46E2DE for ; Thu, 28 Jul 2022 09:12:23 -0700 (PDT) Received: by mail-qv1-xf2b.google.com with SMTP id i4so1726578qvv.7 for ; Thu, 28 Jul 2022 09:12:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=uLzhPJuLV78wBr0Tv6Z8LxwLkcgZW3FnmQ38kfC7t60=; b=jPIh/kiGOQp6NfTNaB9ktuEuEwoiIsvSqCjmCsUsq350WFK/FHOEYwDXqBO1/wg1cl OaF9nNK4YurhE/6oZrSO2n1o+h1gq/U6VmXNIIpasAEdkAxFzVXeaBt9O7W0hnXyD5EJ NOrOqxdUN/0BbL3Az/xhiOHIlVqxZJMZcMtqIWJQ4FbcWE0cQBL9L0XbCfJfuYRA6I84 7tOoKY9k9CdO3BgObsEef7SqHb2s5Tt+EwX8fa7ywXLUWzJz/Zp/GjeSpe9IOIfwVwnA jrThVA3joG7iv6LcR/e2jjbkY+Se8JZZ1ZnGhDcPWVmwSFzQEHGV7LaoqiJHwMAwmfLb e9HA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=uLzhPJuLV78wBr0Tv6Z8LxwLkcgZW3FnmQ38kfC7t60=; b=pdkfOSjnpviMXdw+Dy9xgYhQ+RIIDJJXBd7J65Ut5xu5wR6SveWYxySwLH9j7Jl3SK aP0I6H9aqApGxrYO0T65O4Haid+hPsR8EQKMOQ3DVbo4VJNRf0Ps+duqD5vs4NSzPkgu bYUsd3hkXd5rsjxsj3VY5AclzRi3dZR0ieqqDZStdF7ttE38/0ELa/5gMUYskS0Tc6Xq +psM+JrZjmmYTtcG9R5kQbxPzXBpt/OP5O5Lbc5EunlDrmnx3p2jE5LnIoPPwSkT3EQY o5peq4zvBmcPo2G8PiDRtmgsGMdSn4Kg9eVVq9CdOcG4N+AF4XLb0ku4WZapdVE61tzR t6sw== X-Gm-Message-State: AJIora9Wd2JlTPfcXsNIoQFK6+yaE3T6zvDy+ZabOYWNJ9w7eX539Y8+ 5bK1DIcq0fmBfQ9bkjKE6U4= X-Google-Smtp-Source: AGRyM1uJvC3TwxbK73EwdvMezs9OEmGdKB4Mvh2YmymBonKjWnC6fwAUfDn5f1qUF7Kj8l4456qj/w== X-Received: by 2002:a0c:e50d:0:b0:474:6b1e:ff39 with SMTP id l13-20020a0ce50d000000b004746b1eff39mr11370789qvm.2.1659024742656; Thu, 28 Jul 2022 09:12:22 -0700 (PDT) Received: from localhost ([2601:4c1:c100:1230:b984:ba52:c3cf:cb5e]) by smtp.gmail.com with ESMTPSA id g1-20020a05620a40c100b006b55036fd3fsm775859qko.70.2022.07.28.09.12.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 28 Jul 2022 09:12:22 -0700 (PDT) From: Yury Norov To: Linus Torvalds , Guenter Roeck , Dennis Zhou , Russell King , Catalin Marinas , Andy Shevchenko , Rasmus Villemoes , Alexey Klimov , Linux Kernel Mailing List Cc: Yury Norov Subject: [PATCH 4/5] lib/find_bit: create find_first_zero_bit_le() Date: Thu, 28 Jul 2022 09:12:07 -0700 Message-Id: <20220728161208.865420-5-yury.norov@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220728161208.865420-1-yury.norov@gmail.com> References: <20220728161208.865420-1-yury.norov@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" find_first_zero_bit_le() is an alias to find_next_zero_bit_le();=20 despite that 'next' is known to be slower than 'first' version. Now that we have common __find_first_bit(), it's trivial to implement find_first_zero_bit_le() as a real function. Signed-off-by: Yury Norov --- include/linux/find.h | 20 +++++++++++++++----- lib/find_bit.c | 11 +++++++++++ 2 files changed, 26 insertions(+), 5 deletions(-) diff --git a/include/linux/find.h b/include/linux/find.h index 3ace995d7079..8d326d1518f4 100644 --- a/include/linux/find.h +++ b/include/linux/find.h @@ -19,6 +19,7 @@ unsigned long _find_next_zero_bit_le(const void *addr, un= signed long size, unsigned long offset); unsigned long _find_next_bit_le(const void *addr, unsigned long size, unsigned long offset); +unsigned long _find_first_zero_bit_le(const unsigned long *addr, unsigned = long size); #endif extern unsigned long _find_first_bit(const unsigned long *addr, unsigned l= ong size); extern unsigned long _find_first_and_bit(const unsigned long *addr1, @@ -260,6 +261,20 @@ unsigned long find_next_zero_bit_le(const void *addr, = unsigned } #endif =20 +#ifndef find_first_zero_bit_le +static inline +unsigned long find_first_zero_bit_le(const unsigned long *addr, unsigned l= ong size) +{ + if (small_const_nbits(size)) { + unsigned long val =3D swab(*addr) | ~GENMASK(size - 1, 0); + + return val =3D=3D ~0UL ? size : ffz(val); + } + + return _find_first_zero_bit_le(addr, size); +} +#endif + #ifndef find_next_bit_le static inline unsigned long find_next_bit_le(const void *addr, unsigned @@ -279,11 +294,6 @@ unsigned long find_next_bit_le(const void *addr, unsig= ned } #endif =20 -#ifndef find_first_zero_bit_le -#define find_first_zero_bit_le(addr, size) \ - find_next_zero_bit_le((addr), (size), 0) -#endif - #else #error "Please fix " #endif diff --git a/lib/find_bit.c b/lib/find_bit.c index d207d1699834..a56872611a59 100644 --- a/lib/find_bit.c +++ b/lib/find_bit.c @@ -119,6 +119,17 @@ EXPORT_SYMBOL(_find_next_zero_bit); #endif =20 #ifdef __BIG_ENDIAN +#ifndef find_first_zero_bit_le +/* + * Find the first cleared bit in an LE memory region. + */ +unsigned long _find_first_zero_bit_le(const unsigned long *addr, unsigned = long size) +{ + return __find_first_bit(addr, NULL, size, ~0UL, true); +} +EXPORT_SYMBOL(_find_first_zero_bit_le); +#endif + #ifndef find_next_zero_bit_le unsigned long _find_next_zero_bit_le(const void *addr, unsigned long size, unsigned long offset) --=20 2.34.1 From nobody Wed Apr 15 01:30:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 13C00C04A68 for ; Thu, 28 Jul 2022 16:12:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232805AbiG1QMp (ORCPT ); Thu, 28 Jul 2022 12:12:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53252 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232066AbiG1QM0 (ORCPT ); Thu, 28 Jul 2022 12:12:26 -0400 Received: from mail-qv1-xf2c.google.com (mail-qv1-xf2c.google.com [IPv6:2607:f8b0:4864:20::f2c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 449C56E2F1 for ; Thu, 28 Jul 2022 09:12:25 -0700 (PDT) Received: by mail-qv1-xf2c.google.com with SMTP id mh14so1751235qvb.1 for ; Thu, 28 Jul 2022 09:12:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=b+TtgBs1YBfFDFS7bvjbrZrbqBFKrT/paxnRG4cTTIw=; b=h9plRpRDHdI6/W4nREA3xIO8VueRMyfCUC/StFjcjgblj4Dbw8pn8VrKhaLc7ftOY8 C2/Nr91meOZ6tUbL2yWZ0QM+rvHRcdZxh9KdmJrMMmu3TQA2q1N/6actR8uV2MHpJJWn A7o+2HY90z/ifiFZnPrSK3GfcjfFbApL55hURWPRmAe8gp0dPRR76PqeJFVpsKf8r7/a 5D1r/NLR5zV0mOytEaZUv5h42wOY9k5eChzV96kJjVq9vYr+Nz6XeIOkGcNgEk7XE3bI bzXUPnTOIWxHTNUGceziDxyce8JXzc0aESUBctVMbGBO5HhqqZJ1Kc3erM8vga8Bgz6Y PlLg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=b+TtgBs1YBfFDFS7bvjbrZrbqBFKrT/paxnRG4cTTIw=; b=Imk5libakqWYQ3TxwXY9n8Xxf1sTMJ7A7ju7xP8iB7eGPZber1mZqvCNjo4SlPghUH TxeP3C2sL+2Ww7SoIcThLAw919dhoWc1h/hcnkWQHb+gesxYyiEXQ4pWTetxhD3eOQmS hya9xXzLxEj9ieZ+RWj/X4zGBUhSYh8utbZQFvsCy+pvpnjeJAA8OxYu7Oqwok+qV1LV s1yJmPg1Orgp3Ni7zY+Loi2/nWFL70RXp1OZHqwYCEyXzKiqrExaROwWARQquWeL8SLI wlKxhghZWG11Cu35MAfjvxUp01sY3x+e1QpNULReT2YjbGraUElkIqSkQxMOyuh5RlpA SuQw== X-Gm-Message-State: AJIora8dPXD03IACfkwYc65wAzrEfJSY4eZcyMVpwv18rYzSB8eCIwvR IBFNfpqnt47DZ8EJ94mr4AQ= X-Google-Smtp-Source: AGRyM1u6rjSHc/bB6TKyTEE3f0wd9yAr9Vz667bpYhueaYGkf3KF6+gGCTynZRpzIWxIHDJfK/Rwdg== X-Received: by 2002:a0c:b319:0:b0:473:82bd:327b with SMTP id s25-20020a0cb319000000b0047382bd327bmr23785874qve.43.1659024744239; Thu, 28 Jul 2022 09:12:24 -0700 (PDT) Received: from localhost ([2601:4c1:c100:1230:b984:ba52:c3cf:cb5e]) by smtp.gmail.com with ESMTPSA id w16-20020ac857d0000000b0031f0b43629dsm686287qta.23.2022.07.28.09.12.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 28 Jul 2022 09:12:23 -0700 (PDT) From: Yury Norov To: Linus Torvalds , Guenter Roeck , Dennis Zhou , Russell King , Catalin Marinas , Andy Shevchenko , Rasmus Villemoes , Alexey Klimov , Linux Kernel Mailing List Cc: Yury Norov Subject: [RFC PATCH 5/5] lib/find_bit: re-use __find_first_bit() in __find_next_bit() Date: Thu, 28 Jul 2022 09:12:08 -0700 Message-Id: <20220728161208.865420-6-yury.norov@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220728161208.865420-1-yury.norov@gmail.com> References: <20220728161208.865420-1-yury.norov@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Similarly to m68k, switch __find_first_bit() to use __find_next_bit(). The only difference between them is that 'next' should skip #start bits. So do it in prologue, and then just call 'first' version if needed. Signed-off-by: Yury Norov --- This patch is just to see how m68k approach would work in general (x86_64) case. It almost doubles the size of find_next() functions, and adds nothing to performance. I would prefer not taking it upstream. add/remove: 0/0 grow/shrink: 4/0 up/down: 332/0 (332) Function old new delta _find_next_zero_bit 103 191 +88 find_next_clump8 133 219 +86 _find_next_and_bit 120 203 +83 _find_next_bit 99 174 +75 v5.19-rc8 Optimized Difference (more - better) Random dense bitmap ns ns % sigmas find_next_bit: 721209 742050 -3 -0.09 find_next_zero_bit: 738138 920508 -25 -0.51 find_last_bit: 802393 839157 -5 -0.08 find_first_bit: 3560900 3747795 -5 -0.30 find_first_and_bit: 38601442 37423751 3 1.28 find_next_and_bit: 335574 322220 4 1.07 Random sparse bitmap =20 find_next_bit: 15868 13969 12 2.21 find_next_zero_bit: 1311843 1290946 2 0.18 find_last_bit: 13633 13856 -2 -0.37 find_first_bit: 1273625 1233955 3 1.01 find_first_and_bit: 8548 14974 -75 -0.43 find_next_and_bit: 8828 7766 12 1.37 lib/find_bit.c | 29 +++++++++++++++-------------- 1 file changed, 15 insertions(+), 14 deletions(-) diff --git a/lib/find_bit.c b/lib/find_bit.c index a56872611a59..137e606a05a1 100644 --- a/lib/find_bit.c +++ b/lib/find_bit.c @@ -61,12 +61,15 @@ static inline unsigned long __find_next_bit(const unsig= ned long *addr1, if (unlikely(start >=3D nbits)) return nbits; =20 + if (IS_ALIGNED(start, BITS_PER_LONG)) + goto aligned; + + /* Handle 1st word. */ tmp =3D addr1[start / BITS_PER_LONG]; if (addr2) tmp &=3D addr2[start / BITS_PER_LONG]; tmp ^=3D invert; =20 - /* Handle 1st word. */ mask =3D BITMAP_FIRST_WORD_MASK(start); if (need_swab) mask =3D swab(mask); @@ -74,22 +77,20 @@ static inline unsigned long __find_next_bit(const unsig= ned long *addr1, tmp &=3D mask; =20 start =3D round_down(start, BITS_PER_LONG); - - while (!tmp) { - start +=3D BITS_PER_LONG; - if (start >=3D nbits) - return nbits; - - tmp =3D addr1[start / BITS_PER_LONG]; - if (addr2) - tmp &=3D addr2[start / BITS_PER_LONG]; - tmp ^=3D invert; + if (tmp) { + if (need_swab) + tmp =3D swab(tmp); + return min(start + __ffs(tmp), nbits); } =20 - if (need_swab) - tmp =3D swab(tmp); + start +=3D BITS_PER_LONG; + if (start >=3D nbits) + return nbits; =20 - return min(start + __ffs(tmp), nbits); +aligned: + return start + __find_first_bit(addr1 + start/BITS_PER_LONG, + addr2 ? addr2 + start/BITS_PER_LONG : NULL, + nbits - start, invert, need_swab); } =20 #ifndef find_next_bit --=20 2.34.1