From nobody Thu Dec 25 06:46:32 2025 Received: from mail-pl1-f172.google.com (mail-pl1-f172.google.com [209.85.214.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9B23C37702; Sun, 21 Jan 2024 15:37:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705851422; cv=none; b=Fwco7426CbeNXmrK4FI11SB5FPgBtXw+SrSzj3QRra4dAGX3vzM6BOJgCJKjkpsdyLYbuAdMA4g8fnsEHR6mQT783ZGZY7eK1D1mXlrtML9Q8l09KpDJx6M92xU12XqeghY3QPEhH74D5+eFCnK66UoYq0UyDvpsNFxC2x+44Hc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705851422; c=relaxed/simple; bh=F2S6WYJX/d0/D0HlTXYXOZ5EKWmUIRf6nsWTG7vHf00=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=BxIP8scvzY3B0IWxjYB8wlgJ1910BRsZvmDkHIy9MzR2VZB36irNavJ8AvuwanvUfKpXlV8vKw4cXyWqxFYTstdOgvEv5Uweg1ukaxcEIk6ggB/kphJKsI8NeZVQEQuwkgwlBIxANFWHGwuJa40TcponaZRwuKB+HNkU1W5PYFE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=bIOJk50/; arc=none smtp.client-ip=209.85.214.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="bIOJk50/" Received: by mail-pl1-f172.google.com with SMTP id d9443c01a7336-1d3b84173feso6781785ad.1; Sun, 21 Jan 2024 07:37:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1705851420; x=1706456220; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=dLlh0MVCcmdBauFIBsS5gCwpIFl7Oy5O74VPwVPzmOA=; b=bIOJk50/4cJbXL0TWiUIBnrTIF47s029IhoymVaincNjb4iWfdoERUcpzhpNf8mWX2 MTe2MmKAXtZ3Kk5LiFy+G+Dn93YpxO6gbPUA9oxDCUWXaNgDrac53fivD8jmq5oZ8xF1 8+d3STlb495K4Om7LBCHIz5xJaNnh0G5wt7qPyllI4NURF30RQRRIcyN1ZdMHm3h4ie6 PAHFirqMaO366u8hF3Vy/+V6fJnieTH2OvK36V35fvbhBxvUEOzsI1ZyGm1r0XWhR2Mq 4moVshbdU0CQPDMLD6/zMpA1tiOq/Rx6fuyF1hkpwDrv+mILnjjRHCcJRIQA4kCfV1Cp 7U2w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1705851420; x=1706456220; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=dLlh0MVCcmdBauFIBsS5gCwpIFl7Oy5O74VPwVPzmOA=; b=xEKJGDhjSfCgfiuXvo0eJdEZU3PFxqumG9OUN0twqmA1xGA4x+OZvDBALkKnoi0ngq OCt1MGZga2/e7glj2wiXFCnSg8QRVMQsPVpJQXLKP8u9Nwv2qjU0/1MzwN/9+WZlBPlt T/7IdeiS+zuEdspycVFqbjH/lrPWTJitGe63HLm1/vmZUcGb6fJFm4jr5vIXLhmCX0kQ PdX/2Kof3KLRzVckk5O3usjYmadG1V4GwKmELyerTAlWyH4BE0CuJ8bxFGHNpyDCama0 dwjSHQO+mL+YABYHcykVTx0aPJ+zZOsSws0rSw6mbn5+9Tf8qR5UKLmmyuUsH7UG3w8w gVCA== X-Gm-Message-State: AOJu0YwESkbuLjvMC3jcPf8b3rrQic9N6+PISpkPh7SdS6sJV2cGs9Ev ETVZkNWlQGMdE3XDThd9uqgOyamZ1W/vdaECq/LYGVvxxO3iYPI9 X-Google-Smtp-Source: AGHT+IFK80cqD5kDScW3aeShBmpheDZ8RdnJZlICoPR6ox7z5ef7doIKv7RuHqsgBBo46v7ZHK1raQ== X-Received: by 2002:a17:90b:3443:b0:290:4cc1:5051 with SMTP id lj3-20020a17090b344300b002904cc15051mr3596335pjb.2.1705851419911; Sun, 21 Jan 2024 07:36:59 -0800 (PST) Received: from localhost.localdomain ([140.116.154.65]) by smtp.gmail.com with ESMTPSA id sv13-20020a17090b538d00b0028d8fa0171asm7744347pjb.35.2024.01.21.07.36.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 21 Jan 2024 07:36:59 -0800 (PST) From: Kuan-Wei Chiu To: colyli@suse.de, kent.overstreet@linux.dev Cc: bfoster@redhat.com, jserv@ccns.ncku.edu.tw, linux-bcache@vger.kernel.org, linux-kernel@vger.kernel.org, linux-bcachefs@vger.kernel.org, Kuan-Wei Chiu Subject: [PATCH 1/5] bcachefs: Optimize eytzinger0_sort() using bottom-up heapsort Date: Sun, 21 Jan 2024 23:36:45 +0800 Message-Id: <20240121153649.2733274-2-visitorckw@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20240121153649.2733274-1-visitorckw@gmail.com> References: <20240121153649.2733274-1-visitorckw@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" This optimization reduces the average number of comparisons required from 2*n*log2(n) - 3*n + o(n) to n*log2(n) + 0.37*n + o(n). When n is sufficiently large, it results in approximately 50% fewer comparisons. Currently, eytzinger0_sort employs the textbook version of heapsort, where during the heapify process, each level requires two comparisons to determine the maximum among three elements. In contrast, the bottom-up heapsort, during heapify, only compares two children at each level until reaching a leaf node. Then, it backtracks from the leaf node to find the correct position. Since heapify typically continues until very close to the leaf node, the standard heapify requires about 2*log2(n) comparisons, while the bottom-up variant only needs log2(n) comparisons. The experimental data presented below is based on an array generated by get_random_u32(). | N | comparisons(old) | comparisons(new) | time(old) | time(new) | |-------|------------------|------------------|-----------|-----------| | 10000 | 235381 | 136615 | 25545 us | 20366 us | | 20000 | 510694 | 293425 | 31336 us | 18312 us | | 30000 | 800384 | 457412 | 35042 us | 27386 us | | 40000 | 1101617 | 626831 | 48779 us | 38253 us | | 50000 | 1409762 | 799637 | 62238 us | 46950 us | | 60000 | 1721191 | 974521 | 75588 us | 58367 us | | 70000 | 2038536 | 1152171 | 90823 us | 68778 us | | 80000 | 2362958 | 1333472 | 104165 us | 78625 us | | 90000 | 2690900 | 1516065 | 116111 us | 89573 us | | 100000| 3019413 | 1699879 | 133638 us | 100998 us | Refs: BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average, QUICKSORT (if n is not very small) Ingo Wegener Theoretical Computer Science, 118(1); Pages 81-98, 13 September 1993 https://doi.org/10.1016/0304-3975(93)90364-Y Signed-off-by: Kuan-Wei Chiu --- This patch has undergone unit testing and micro benchmarking using the following code [1]. [1]: static long long int cmp_count =3D 0; static int mycmp(const void *a, const void *b, size_t size) { u32 _a =3D *(u32 *)a; u32 _b =3D *(u32 *)b; cmp_count++; if (_a < _b) return -1; else if (_a > _b) return 1; else return 0; } static int test(void) { size_t N, i, L, R; ktime_t start, end; s64 delta; u32 *arr; for (N =3D 10000; N <=3D 100000; N +=3D 10000) { arr =3D kmalloc_array(N, sizeof(u32), GFP_KERNEL); cmp_count =3D 0; for (i =3D 0; i < N; i++) arr[i] =3D get_random_u32(); =09 start =3D ktime_get(); eytzinger0_sort(arr, N, sizeof(u32), mycmp, NULL); end =3D ktime_get(); delta =3D ktime_us_delta(end, start); printk(KERN_INFO "time: %lld\n", delta); printk(KERN_INFO "comparisons: %lld\n", cmp_count); for (int i =3D 0; i < N; i++) { L =3D 2 * i + 1; R =3D 2 * i + 2; if (L < N && arr[i] < arr[L]) goto err; if (R < N && arr[i] > arr[R]) goto err; } kfree(arr); } return 0; err: kfree(arr); return -1; } fs/bcachefs/util.c | 50 +++++++++++++++++++++++++++------------------- 1 file changed, 30 insertions(+), 20 deletions(-) diff --git a/fs/bcachefs/util.c b/fs/bcachefs/util.c index c2ef7cddaa4f..bbc83b43162e 100644 --- a/fs/bcachefs/util.c +++ b/fs/bcachefs/util.c @@ -911,7 +911,7 @@ void eytzinger0_sort(void *base, size_t n, size_t size, int (*cmp_func)(const void *, const void *, size_t), void (*swap_func)(void *, void *, size_t)) { - int i, c, r; + int i, j, k; =20 if (!swap_func) { if (size =3D=3D 4 && alignment_ok(base, 4)) @@ -924,17 +924,22 @@ void eytzinger0_sort(void *base, size_t n, size_t siz= e, =20 /* heapify */ for (i =3D n / 2 - 1; i >=3D 0; --i) { - for (r =3D i; r * 2 + 1 < n; r =3D c) { - c =3D r * 2 + 1; - - if (c + 1 < n && - do_cmp(base, n, size, cmp_func, c, c + 1) < 0) - c++; - - if (do_cmp(base, n, size, cmp_func, r, c) >=3D 0) - break; - - do_swap(base, n, size, swap_func, r, c); + /* Find the sift-down path all the way to the leaves. */ + for (j =3D i; k =3D j * 2 + 1, k + 1 < n;) + j =3D do_cmp(base, n, size, cmp_func, k, k + 1) > 0 ? k : k + 1; + + /* Special case for the last leaf with no sibling. */ + if (j * 2 + 2 =3D=3D n) + j =3D j * 2 + 1; + + /* Backtrack to the correct location. */ + while (j !=3D i && do_cmp(base, n, size, cmp_func, i, j) >=3D 0) + j =3D (j - 1) / 2; + + /* Shift the element into its correct place. */ + for (k =3D j; j !=3D i;) { + j =3D (j - 1) / 2; + do_swap(base, n, size, swap_func, j, k); } } =20 @@ -942,17 +947,22 @@ void eytzinger0_sort(void *base, size_t n, size_t siz= e, for (i =3D n - 1; i > 0; --i) { do_swap(base, n, size, swap_func, 0, i); =20 - for (r =3D 0; r * 2 + 1 < i; r =3D c) { - c =3D r * 2 + 1; + /* Find the sift-down path all the way to the leaves. */ + for (j =3D 0; k =3D j * 2 + 1, k + 1 < i;) + j =3D do_cmp(base, n, size, cmp_func, k, k + 1) > 0 ? k : k + 1; =20 - if (c + 1 < i && - do_cmp(base, n, size, cmp_func, c, c + 1) < 0) - c++; + /* Special case for the last leaf with no sibling. */ + if (j * 2 + 2 =3D=3D i) + j =3D j * 2 + 1; =20 - if (do_cmp(base, n, size, cmp_func, r, c) >=3D 0) - break; + /* Backtrack to the correct location. */ + while (j && do_cmp(base, n, size, cmp_func, 0, j) >=3D 0) + j =3D (j - 1) / 2; =20 - do_swap(base, n, size, swap_func, r, c); + /* Shift the element into its correct place. */ + for (k =3D j; j;) { + j =3D (j - 1) / 2; + do_swap(base, n, size, swap_func, j, k); } } } --=20 2.25.1 From nobody Thu Dec 25 06:46:32 2025 Received: from mail-pf1-f180.google.com (mail-pf1-f180.google.com [209.85.210.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 87C9F3771F; Sun, 21 Jan 2024 15:37:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.180 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705851425; cv=none; b=g136HdmKrKhwih3yT2cuEJv4ACBG9eDxwgWEdWXVjEhJP3Th6NKc7sNk4fOGwjD+Og8YhMV2TvDASOivdi98fUyXKUqP3WpRcPOj0QM3ybjluIjNGkiFrPYMgffljEDCNXkLVWeJirqV6idpOGAdBe9sdK5+ohQ84gLGKOJN+VM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705851425; c=relaxed/simple; bh=td7eYq5RRdZjgP7IOM9O5bSI3K4VazA2jQDoOaruLEY=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=lr8wtKlkoBRQR7rjvjrcSHLBVaqCznk8ECnXgImJ1hBfpD05Cm/S2HJIa9QUqWwiEIZvtuFktSEssvubqnt9vDV7+37NjEk/BoCayBjdox6nInu8iDyfPvLrO3L9u+A7gRFYzeYSGyk17Fy4/Hqkw2TSDfqaPqyDpERU7oKtlLw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=F/uylxvv; arc=none smtp.client-ip=209.85.210.180 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="F/uylxvv" Received: by mail-pf1-f180.google.com with SMTP id d2e1a72fcca58-6d9b41a3cb7so828814b3a.0; Sun, 21 Jan 2024 07:37:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1705851424; x=1706456224; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=2pqHvvCphlALlajToOUOu8ABe7Bhki2MXmZlVOF4qBQ=; b=F/uylxvvWe45LqSORYN5ugmKL87jo+XNva6oXhtSANRpkM5TDGds2oAcUG5Ifa1P8a N8qZNUEZ4XG+FTo8srme4S9gcBHBGhDMb06F3wTk4tts5+HLQ26/bweMC5soMwABPDLP 4IVCxBnYcQtnkQLOoUnUQbMqUuJM6G4NOVwArSxHw55WdNjsj3Wq+DfB/3aUoZ40XVBw NYysfNSdTQPLVHzijOmMTw4MxCS9sIsrugHqpwkMPjhC+oKWNUyS8OojSqjbzlNrqbU6 3Q8wljLamC/soitqdk2h22wBcP0yC0tnzQvYildMNULop9qUZHU0gSputMlkRtrbaA4K TZkA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1705851424; x=1706456224; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=2pqHvvCphlALlajToOUOu8ABe7Bhki2MXmZlVOF4qBQ=; b=dvCPvYuHyKNvQTUsRhVIC4lFNFTaHWi/HwFQ1aRDujbKJ1EIDWjCj4YJ2C5XdtJeVD FPtp20wXhu9oY8H6myI8d68yP+oxsMrT1lXAf4MDVjcHvGCL8EN67bRuu1Pcba5FHXL8 pY/FhwYJ13mU5tDnbOkmJb5PKhkBf/gm5VjlziFrXqnHsF5+1LyRSN/4qMRNFqIN9J8c 4wCFpMIkHeOlvgl+OxMEanLpmwy0ZOboyTrvd6k6E4Q1wVb6KALs63qvu7h22pBtR3l3 r6DJvveZ8FaqkHhgPW4nnjbJhhGV5KjzSx0TcA2JuMN4Hgt40wBZYrdOgRPdhTyma8lN QNjg== X-Gm-Message-State: AOJu0YzAzHVQN0uViyhOO8RLZygQO7LCEwdEU3QrzFSbObkTs870a0b5 yaNKVolek/BtOlwjfTd3J+DRoruea49EnZFcADUBgLgugNWTSKhi X-Google-Smtp-Source: AGHT+IG+D6at0+cAHIsJd/0BlRcS+2PEbB1TO9MOrWa17WDiILOJxIBA0dyciguF6FOdTC2vq+MDJg== X-Received: by 2002:a17:90a:c286:b0:290:19c1:103f with SMTP id f6-20020a17090ac28600b0029019c1103fmr3578520pjt.1.1705851423879; Sun, 21 Jan 2024 07:37:03 -0800 (PST) Received: from localhost.localdomain ([140.116.154.65]) by smtp.gmail.com with ESMTPSA id sv13-20020a17090b538d00b0028d8fa0171asm7744347pjb.35.2024.01.21.07.37.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 21 Jan 2024 07:37:03 -0800 (PST) From: Kuan-Wei Chiu To: colyli@suse.de, kent.overstreet@linux.dev Cc: bfoster@redhat.com, jserv@ccns.ncku.edu.tw, linux-bcache@vger.kernel.org, linux-kernel@vger.kernel.org, linux-bcachefs@vger.kernel.org, Kuan-Wei Chiu Subject: [PATCH 2/5] bcachefs: Introduce parent function for sort_cmp_size() Date: Sun, 21 Jan 2024 23:36:46 +0800 Message-Id: <20240121153649.2733274-3-visitorckw@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20240121153649.2733274-1-visitorckw@gmail.com> References: <20240121153649.2733274-1-visitorckw@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" When dealing with array indices, the parent's index can be obtained using the formula (i - 1) / 2. However, when working with byte offsets, this approach is not straightforward. To address this, we have introduced a branch-free parent function that does not require any division operations to calculate the parent's byte offset. Signed-off-by: Kuan-Wei Chiu --- This patch has undergone unit testing using the following code [1]. [1]: static int test(void) { size_t i, p, size, lsbit; for (i =3D 0; i < 10000; i++) { size =3D get_random_u32() % (1U << 10); lsbit =3D size & -size; i =3D get_random_u32() % (1U << 20) * size + size; p =3D parent(i, lsbit, size); if (p !=3D (i / size - 1) / 2 * size) return -1; } return 0; } fs/bcachefs/util.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/fs/bcachefs/util.c b/fs/bcachefs/util.c index bbc83b43162e..f5bbf96df2ce 100644 --- a/fs/bcachefs/util.c +++ b/fs/bcachefs/util.c @@ -907,6 +907,13 @@ static inline void do_swap(void *base, size_t n, size_= t size, size); } =20 +static inline size_t parent(size_t i, size_t lsbit, size_t size) +{ + i -=3D size; + i -=3D size & -(i & lsbit); + return i >> 1; +} + void eytzinger0_sort(void *base, size_t n, size_t size, int (*cmp_func)(const void *, const void *, size_t), void (*swap_func)(void *, void *, size_t)) --=20 2.25.1 From nobody Thu Dec 25 06:46:32 2025 Received: from mail-pl1-f173.google.com (mail-pl1-f173.google.com [209.85.214.173]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5543C381C1; Sun, 21 Jan 2024 15:37:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.173 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705851429; cv=none; b=KePnUtvS6K+OxijZhAaTVkvZMrPPFMd3XGXik/y0G+Naa7w5BoPyDtJSyxZDB5K1WP054tXp9GsfJ68V5rSA0wVPErXzCS7cx049w1sgKtgBJMX1gMrtttgVSnaviQ98i5xGImY+iA+aOjzOPzeo9gAXyf1WDXy5YEIHmL835C0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705851429; c=relaxed/simple; bh=ZvDQ06JGuVeJWVozfqdeUmELQkKAXTvtxrmXADmelYo=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=cU80gy0Jp2T8hMIiouKcSIgIs1XgXqFD7hJpvwtwuDoumo0q5Dq2pLxCNn5++1GCXUT/oHhP8KYzS6XsejQuGNbEBtUzHg2E44ua9QQCwXbeEKMwDjvSGZnVJOkR8AsjscaDnfO5al0QHQ5AmrrW8aighVu9pfp/SoZ4BP7wQZY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=SOD/BP01; arc=none smtp.client-ip=209.85.214.173 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="SOD/BP01" Received: by mail-pl1-f173.google.com with SMTP id d9443c01a7336-1d3ae9d1109so6079035ad.0; Sun, 21 Jan 2024 07:37:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1705851427; x=1706456227; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Z9tvCyvHjEIJoBvqUMUvKN9uieNjlbwCtwNk5egfjBU=; b=SOD/BP01vqwt0JOkwqjHPAYZy5kczb+7uXQjUmUG2Wrx6b7yRB59zxCQP8jTGCaRXG 2o34x6CbV7a0Zr4t7BSMDN5/uLBOwuUr+FlQG5uxDWgCngF/5PaMV3zRqvFHlVhDfBah 7LuzqxONoMOeUiFsYITr0jGwWpscKMqDcmn9ajZ1EZNUYgOxWCMv6MqY8z7i7+XR9nFu /NKZgusbBGjqFiE4CfC7kX7P+KYe4WBcDIq8YTAGUeGnWVGSMFBaG7zaekdbJz+3BY9x crGS0sQN3/pfyUev0tOddLIL9KmUIegHxFjY04WKOWynBIVQ/LZXx0k7PivHB9ZaIP/p wXxQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1705851427; x=1706456227; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Z9tvCyvHjEIJoBvqUMUvKN9uieNjlbwCtwNk5egfjBU=; b=BUEhoRCaVtd12+J3f/C8gQdZPUWYaQlTB2tunOZ741lCOsHH4UeNLjUyXpao38HB/r qP1HOVVzeWSXWF4F/PEtRGc6sHMq8mO2Y4+nwMPiy0K3eepCRC3p214ypYF64NvBB/8n dCqtJIj28mOG+iK6On/gd/PPNfDa06rWaTJXbOxWlnTCvB5vr5bhdaLdvwlwjT4c883Y 6TmDsjvEN563fdlD0HnXAX7C2FLOacPx3pV8o/RJvgM2KyF8ycUYoUGcLJp6fL+BM6hM +RAqB2vrMOWcsUCeWFqFykpTuvpqq4exoes4iTunWAJqSoKOhfIBQrP03dsjcPBm3vSF MKpQ== X-Gm-Message-State: AOJu0Yz3p7POWKtzE/2ZE3dsLPt4pBPTGml/Wt3YsCgndCyAvDqgcKsF FKou8yxdiSnvIkFJRASsEGktCxenbqw3vPVkLG2/69NmYhYsnzH9 X-Google-Smtp-Source: AGHT+IHT470V+lk0gVqTK0HU/uE8EnYN270KoYPZPNHsQJLGbF7XxIFT/uJP+4iHExMw1AiH3HL7Tg== X-Received: by 2002:a17:902:ed41:b0:1d5:efd6:20f with SMTP id y1-20020a170902ed4100b001d5efd6020fmr6167217plb.1.1705851427577; Sun, 21 Jan 2024 07:37:07 -0800 (PST) Received: from localhost.localdomain ([140.116.154.65]) by smtp.gmail.com with ESMTPSA id sv13-20020a17090b538d00b0028d8fa0171asm7744347pjb.35.2024.01.21.07.37.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 21 Jan 2024 07:37:07 -0800 (PST) From: Kuan-Wei Chiu To: colyli@suse.de, kent.overstreet@linux.dev Cc: bfoster@redhat.com, jserv@ccns.ncku.edu.tw, linux-bcache@vger.kernel.org, linux-kernel@vger.kernel.org, linux-bcachefs@vger.kernel.org, Kuan-Wei Chiu Subject: [PATCH 3/5] bcachefs: Optimize sort_cmp_size() using bottom-up heapsort Date: Sun, 21 Jan 2024 23:36:47 +0800 Message-Id: <20240121153649.2733274-4-visitorckw@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20240121153649.2733274-1-visitorckw@gmail.com> References: <20240121153649.2733274-1-visitorckw@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" This optimization reduces the average number of comparisons required from 2*n*log2(n) - 3*n + o(n) to n*log2(n) + 0.37*n + o(n). When n is sufficiently large, it results in approximately 50% fewer comparisons. Currently, sort_cmp_size employs the textbook version of heapsort, where during the heapify process, each level requires two comparisons to determine the maximum among three elements. In contrast, the bottom-up heapsort, during heapify, only compares two children at each level until reaching a leaf node. Then, it backtracks from the leaf node to find the correct position. Since heapify typically continues until very close to the leaf node, the standard heapify requires about 2*log2(n) comparisons, while the bottom-up variant only needs log2(n) comparisons. Note: This patch depends on patch "bcachefs: Introduce parent function for sort_cmp_size()". The experimental data presented below is based on an array generated by get_random_u32(). | N | comparisons(old) | comparisons(new) | time(old) | time(new) | |-------|------------------|------------------|-----------|-----------| | 10000 | 235498 | 136592 | 17954 us | 14834 us | | 20000 | 510666 | 293254 | 23549 us | 18364 us | | 30000 | 800461 | 457351 | 33361 us | 19493 us | | 40000 | 1101317 | 626661 | 33672 us | 26810 us | | 50000 | 1409743 | 799745 | 42634 us | 34694 us | | 60000 | 1721165 | 974737 | 51414 us | 41367 us | | 70000 | 2037972 | 1152111 | 63084 us | 49146 us | | 80000 | 2362590 | 1333270 | 73802 us | 54715 us | | 90000 | 2690155 | 1516148 | 82693 us | 63070 us | | 100000| 3019730 | 1699757 | 88981 us | 70367 us | Refs: BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average, QUICKSORT (if n is not very small) Ingo Wegener Theoretical Computer Science, 118(1); Pages 81-98, 13 September 1993 https://doi.org/10.1016/0304-3975(93)90364-Y Signed-off-by: Kuan-Wei Chiu --- This patch has undergone unit testing and micro benchmarking using the following code [1]. [1]: static long long int cmp_count =3D 0; static int mycmp(const void *a, const void *b, size_t size) { u32 _a =3D *(u32 *)a; u32 _b =3D *(u32 *)b; cmp_count++; if (_a < _b) return -1; else if (_a > _b) return 1; else return 0; } static int test(void) { size_t N, i; ktime_t start, end; s64 delta; u32 *arr; for (N =3D 10000; N <=3D 100000; N +=3D 10000) { arr =3D kmalloc_array(N, sizeof(u32), GFP_KERNEL); cmp_count =3D 0; for (i =3D 0; i < N; i++) arr[i] =3D get_random_u32(); =09 start =3D ktime_get(); sort_cmp_size(arr, N, sizeof(u32), mycmp, NULL); end =3D ktime_get(); delta =3D ktime_us_delta(end, start); printk(KERN_INFO "time: %lld\n", delta); printk(KERN_INFO "comparisons: %lld\n", cmp_count); for (i =3D 0; i < N - 1; i++) { if (arr[i] > arr[i+1]) { kfree(arr); return -1; } } kfree(arr); } return 0; } fs/bcachefs/util.c | 52 +++++++++++++++++++++++++++++++--------------- 1 file changed, 35 insertions(+), 17 deletions(-) diff --git a/fs/bcachefs/util.c b/fs/bcachefs/util.c index f5bbf96df2ce..4dacb2b9a0a5 100644 --- a/fs/bcachefs/util.c +++ b/fs/bcachefs/util.c @@ -979,7 +979,8 @@ void sort_cmp_size(void *base, size_t num, size_t size, void (*swap_func)(void *, void *, size_t size)) { /* pre-scale counters for performance */ - int i =3D (num/2 - 1) * size, n =3D num * size, c, r; + int i =3D (num / 2 - 1) * size, n =3D num * size, j, k; + const size_t lsbit =3D size & -size; =20 if (!swap_func) { if (size =3D=3D 4 && alignment_ok(base, 4)) @@ -992,28 +993,45 @@ void sort_cmp_size(void *base, size_t num, size_t siz= e, =20 /* heapify */ for ( ; i >=3D 0; i -=3D size) { - for (r =3D i; r * 2 + size < n; r =3D c) { - c =3D r * 2 + size; - if (c < n - size && - cmp_func(base + c, base + c + size, size) < 0) - c +=3D size; - if (cmp_func(base + r, base + c, size) >=3D 0) - break; - swap_func(base + r, base + c, size); + /* Find the sift-down path all the way to the leaves. */ + for (j =3D i; k =3D j * 2 + size, k + size < n;) + j =3D cmp_func(base + k, base + k + size, size) > 0 ? k : k + size; + + /* Special case for the last leaf with no sibling. */ + if (j * 2 + size * 2 =3D=3D n) + j =3D j * 2 + size; + + /* Backtrack to the correct location. */ + while (j !=3D i && cmp_func(base + i, base + j, size) >=3D 0) + j =3D parent(j, lsbit, size); + + /* Shift the element into its correct place. */ + for (k =3D j; j !=3D i;) { + j =3D parent(j, lsbit, size); + swap_func(base + j, base + k, size); } } =20 /* sort */ for (i =3D n - size; i > 0; i -=3D size) { swap_func(base, base + i, size); - for (r =3D 0; r * 2 + size < i; r =3D c) { - c =3D r * 2 + size; - if (c < i - size && - cmp_func(base + c, base + c + size, size) < 0) - c +=3D size; - if (cmp_func(base + r, base + c, size) >=3D 0) - break; - swap_func(base + r, base + c, size); + + /* Find the sift-down path all the way to the leaves. */ + for (j =3D 0; k =3D j * 2 + size, k + size < i;) + j =3D cmp_func(base + k, base + k + size, size) > 0 ? k : k + size; + + /* Special case for the last leaf with no sibling. */ + if (j * 2 + size * 2 =3D=3D i) + j =3D j * 2 + size; + + /* Backtrack to the correct location. */ + while (j && cmp_func(base, base + j, size) >=3D 0) + j =3D parent(j, lsbit, size); + + /* Shift the element into its correct place. */ + for (k =3D j; j;) { + j =3D parent(j, lsbit, size); + swap_func(base + j, base + k, size); } } } --=20 2.25.1 From nobody Thu Dec 25 06:46:32 2025 Received: from mail-pj1-f41.google.com (mail-pj1-f41.google.com [209.85.216.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E1B0B38385; Sun, 21 Jan 2024 15:37:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.41 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705851433; cv=none; b=g2XTBd6dUxEZqeWOL32j2mi0mjI1u1a2/3eayzXzlufdC38ZjXkxfCQMA0KzRZnE/oZfUYYF8RuC30b3qTpe3HnME+AbWbvZUCmOJrI6LmoM55o1suFHIrFpCvXyVrBmwi/z5EA8mM52oGxWBg7UC28IdboU+ipk2nSdyCCHYPI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705851433; c=relaxed/simple; bh=tijDl2hnuKm9WbAUqkhD9/1J1vuoq4jZt7ycAR0fs4A=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=sLNcmpoAx+RGZEjPnpngMS0aK6d3Ooh1juxfDIUhW646INANFx0WkGulKjVY3JWVSvyF6j3xYcpbQ0K7u4qmnAiMgm9Wgtg79QZ+A9uVn3t+2X9EkvI6YXO1yrhELO8x6oUmboMoWjV7IyjI1oXgYIpDETicw8peREnF3zgmskU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=gzlPBqwt; arc=none smtp.client-ip=209.85.216.41 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="gzlPBqwt" Received: by mail-pj1-f41.google.com with SMTP id 98e67ed59e1d1-28b93b04446so606646a91.0; Sun, 21 Jan 2024 07:37:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1705851431; x=1706456231; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=iygKgI5ZDiaIj0nQgrBOB8aT9sAvWiPz2NR8vE+U8Qg=; b=gzlPBqwtr5Kvk1cTFJjZPdDKmd0HoQob82qtmF8u4L0VYypR6kNKzCrAFKvVee948z 8MB7cpNxF5S6e9jFSLrXBjPyKQ+6mO661vEQuD8/1vSgOitDhvmUgazyYl0Ok8q4Bmw5 DHIwJ9DGeCFeekJh+wFe8NqJzs5W1Bb4mz1mHNs2Cun8hCp2K7OMlR8+uvDr/uf2ntaM wuPys2cFJFJlgafiQTrAYkDsDXJ5KL1Zaev3hUTv6b+dwsycG38t0a9x5Kht72c97mSj v1tag6EpyHwqXPYlfWEABw/2GqUqE3u6rrOPimEXhkcF+4i+u5phzNzdz/aLekS7vkXo a+DA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1705851431; x=1706456231; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=iygKgI5ZDiaIj0nQgrBOB8aT9sAvWiPz2NR8vE+U8Qg=; b=ZzVxGJ8MLMwRzBc59NjWTxXw/SxfuvstyymQIAz6nifwMnw/myp62zRjXcBScYusfE rX/WVqn4aRXQrM+YHnQ3HJE7Q8C8/9D+MlkGq2tgP5+rABJaTCxxORqMImz2xMd3Bked ZcwPIqDRDIjGO0zpK/H1JwthArU/KsOd4ATbSuhg+PIP0mQnUYLCvWQ5R7o/itjV/60O cLOsN9BYzsMzUrv9sT7En8f44tw3oQgg5gJiwoszKH+ol+3H3bCcpx88fUF4wv2C7PCV ebz7EZ68ZMWdSEpGL1LN7dn2Jo4KZmHA6go/bAqLdsbfH3jTx5nHraxYXK5gjF+FZ8d5 noeQ== X-Gm-Message-State: AOJu0Yw5Qt5xkgpbZH8RicdMzY1NNTP0r6LBhSGke+BhP555xleSFfIR DGkpUvOVPXhiZ2nOimeGkWqYzvax+S9jWy9xPh5x/2vnfiaGrO3Y X-Google-Smtp-Source: AGHT+IGIdIgbXNxq7W6wY/eMqCNBhWI4UxT+2S/K6rteLkQcBYPfChzZQxgBkWOWB7WfemwzzXQXwA== X-Received: by 2002:a17:90a:c098:b0:28e:3dcc:7381 with SMTP id o24-20020a17090ac09800b0028e3dcc7381mr3733782pjs.0.1705851431249; Sun, 21 Jan 2024 07:37:11 -0800 (PST) Received: from localhost.localdomain ([140.116.154.65]) by smtp.gmail.com with ESMTPSA id sv13-20020a17090b538d00b0028d8fa0171asm7744347pjb.35.2024.01.21.07.37.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 21 Jan 2024 07:37:10 -0800 (PST) From: Kuan-Wei Chiu To: colyli@suse.de, kent.overstreet@linux.dev Cc: bfoster@redhat.com, jserv@ccns.ncku.edu.tw, linux-bcache@vger.kernel.org, linux-kernel@vger.kernel.org, linux-bcachefs@vger.kernel.org, Kuan-Wei Chiu Subject: [PATCH 4/5] bcachefs: Optimize number of comparisons in heap_sift_down Date: Sun, 21 Jan 2024 23:36:48 +0800 Message-Id: <20240121153649.2733274-5-visitorckw@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20240121153649.2733274-1-visitorckw@gmail.com> References: <20240121153649.2733274-1-visitorckw@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Optimize the heap_sift_down() macro, resulting in a significant reduction of approximately 50% in the number of comparisons for large random inputs, while maintaining identical results. The current implementation performs two comparisons per level to identify the maximum among three elements. In contrast, the proposed bottom-up variation uses only one comparison per level to assess two children until reaching the leaves. Then, it sifts up until the correct position is determined. Typically, the process of sifting down proceeds to the leaf level, resulting in O(1) secondary comparisons instead of log2(n). This optimization significantly reduces the number of costly indirect function calls and improves overall performance. The experimental data below is derived from first adding N elements generated by get_random_u32() to the heap, and then performing heap_pop until the heap is empty. | N | comparisons(old) | comparisons(new) | time(old) | time(new) | |-------|------------------|------------------|-----------|-----------| | 10000 | 239233 | 142673 | 1219 us | 1000 us | | 20000 | 518498 | 305394 | 2564 us | 1904 us | | 30000 | 812864 | 476594 | 4197 us | 3203 us | | 40000 | 1117553 | 651737 | 5666 us | 4290 us | | 50000 | 1430128 | 830600 | 7156 us | 5574 us | | 60000 | 1746128 | 1012201 | 8862 us | 7036 us | | 70000 | 2066099 | 1196653 | 10454 us | 8111 us | | 80000 | 2394593 | 1383311 | 11993 us | 9602 us | | 90000 | 2727097 | 1572381 | 13501 us | 10718 us | | 100000| 3059841 | 1763083 | 15325 us | 11776 us | Signed-off-by: Kuan-Wei Chiu --- This patch has undergone unit testing and micro benchmarking using the following code [1]. [1]: static long long int cmp_count =3D 0; typedef HEAP(u32) heap; int mycmp(heap *h, u32 a, u32 b) { cmp_count++; if (a < b) return -1; if (a > b) return 1; return 0; } int check_heap(heap *h, int (*cmp)(heap *, u32, u32)) { size_t i; for (i =3D 0; i < h->used / 2; i++) { if (i * 2 + 1 < h->used) if (cmp(h, h->data[i], h->data[i * 2 + 1]) > 0) return -1; if (i * 2 + 2 < h->used) if (cmp(h, h->data[i], h->data[i * 2 + 2]) > 0) return -1; } return 0; } static int test(void) { size_t N, i; u32 x; heap myheap; ktime_t start, end; s64 delta; /* Test for correctness. */ for (N =3D 1000; N <=3D 10000; N +=3D 1000) { init_heap(&myheap, N, GFP_KERNEL); for (i =3D 0; i < N; i++) { heap_add(&myheap, get_random_u32(), mycmp, NULL); if (check_heap(&myheap, mycmp)) return -1; } for (i =3D 0; i < N; i++) { heap_pop(&myheap, x, mycmp, NULL); if (check_heap(&myheap, mycmp)) return -1; } free_heap(&myheap); } /* Micro-benchmark. */ for (N =3D 10000; N <=3D 100000; N +=3D 10000) { cmp_count =3D 0; init_heap(&myheap, N, GFP_KERNEL); start =3D ktime_get(); for (i =3D 0; i < N; i++) heap_add(&myheap, get_random_u32(), mycmp, NULL); for (i =3D 0; i < N; i++) heap_pop(&myheap, x, mycmp, NULL); end =3D ktime_get(); delta =3D ktime_us_delta(end, start); printk(KERN_INFO "time: %lld\n", delta); printk(KERN_INFO "comparisons: %lld\n", cmp_count); free_heap(&myheap); } return 0; } fs/bcachefs/util.h | 23 +++++++++++++---------- 1 file changed, 13 insertions(+), 10 deletions(-) diff --git a/fs/bcachefs/util.h b/fs/bcachefs/util.h index c75fc31915d3..be22eb63084b 100644 --- a/fs/bcachefs/util.h +++ b/fs/bcachefs/util.h @@ -131,17 +131,20 @@ do { \ =20 #define heap_sift_down(h, i, cmp, set_backpointer) \ do { \ - size_t _c, _j =3D i; \ + size_t _j, _k; \ \ - for (; _j * 2 + 1 < (h)->used; _j =3D _c) { \ - _c =3D _j * 2 + 1; \ - if (_c + 1 < (h)->used && \ - cmp(h, (h)->data[_c], (h)->data[_c + 1]) >=3D 0) \ - _c++; \ - \ - if (cmp(h, (h)->data[_c], (h)->data[_j]) >=3D 0) \ - break; \ - heap_swap(h, _c, _j, set_backpointer); \ + for (_j =3D i; _k =3D _j * 2 + 1, _k + 1 < (h)->used;) { \ + if (cmp(h, (h)->data[_k], (h)->data[_k + 1]) >=3D 0) \ + _k++; \ + _j =3D _k; \ + } \ + if (_j * 2 + 2 =3D=3D (h)->used) \ + _j =3D _j * 2 + 1; \ + while (_j !=3D i && cmp(h, (h)->data[i], (h)->data[_j]) <=3D 0) \ + _j =3D (_j - 1) / 2; \ + for (_k =3D _j; _j !=3D i;) { \ + _j =3D (_j - 1) / 2; \ + heap_swap(h, _j, _k, set_backpointer); \ } \ } while (0) =20 --=20 2.25.1 From nobody Thu Dec 25 06:46:32 2025 Received: from mail-pg1-f171.google.com (mail-pg1-f171.google.com [209.85.215.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9482A383BA; Sun, 21 Jan 2024 15:37:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.171 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705851437; cv=none; b=ezkxdlLcmJIhQZrZDI/mu1MAxAnrRFDMSEssSFLf75m5Sup2jKYuy6w2jb7wvf5dRnaAOcL6ePSDY+7kXNvSsPg8Vl6tmOFiv1X0IW8caZ/LQm6ouh7C6HbByA3r9BBfp03iElUi3Nc/huXKZDlFJ7hbtGw8A//oX3ZKFR4CHEs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705851437; c=relaxed/simple; bh=/VqrbHUIRmrLzrdN+YUUNaTaIebnWjB3AeI+d3LBsk4=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=tab9B317JCvR+flhM7eG0m/Om/QPW6BQZRD2fiVVhk3jHymM/qJbRIl/A4skEYah8dwShTH5aUbbVWd1cNBNHUKscRJ4YQ3WYP4QXGzV/cdIkpJoV9o16ZnWobsTUCAA8GK1vwukjxbu5JXjP7RuxO8ofGCtxbGaEe5HZo2N7IM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Ucqf9o1m; arc=none smtp.client-ip=209.85.215.171 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Ucqf9o1m" Received: by mail-pg1-f171.google.com with SMTP id 41be03b00d2f7-5cdbc42f5efso194278a12.0; Sun, 21 Jan 2024 07:37:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1705851435; x=1706456235; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=l18zRlt2Y1RNd/KM4WSl1s19uASsIpnGpnt5xxTh/gA=; b=Ucqf9o1mFallL8saWFNH6iKgwij3WAxUv6o7wEPAcXxfsLK73YZ+D03lYeObvioHmF WKnrdpmC7zUD1Rj3hc/Catf+UX7BexD8vSjRK+zpLhQZG+d4jBhwFcnY6pGMpnypU8PV yb8/qB9HPto9KejvsALTKtWrFKLh2iILHnFN5oCUh6WLFCvhg/bFqwnobjVqa37eby9F IrEvc/KGD9CM4ZvxB/ZngpVjn2COxYY/UdlySyq/mES2JW95wwfXY10AqoZUO78bBBpe 91PF0fYTTHW4yH5Sc1MVucXKTqsxspMZVxSakNdzI5NCeMwxkIafTAHMDJtVSDkACXaf PaCg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1705851435; x=1706456235; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=l18zRlt2Y1RNd/KM4WSl1s19uASsIpnGpnt5xxTh/gA=; b=JTjG3exd8qK1AGpVxq/Ib1YMcO6ng6PlYvLYEEVyi6UOGQxBFhWLhjkplWZvDT6YIT QxAnw9SBv8VvykOkMS8KnULab+GdeSaq+B868rL/W6lFR8bLVIEg+mo2jYNUUN3khf87 Yj/qYkOpicW7HqoX0OqbN44R0XzGqcRtWz72iHpWlDUROEXpUGjEtFzrPW6snR3QT5eU 6Bt9MxDsHP3VYZKxMEvZxupaGLiIc3CXf1SKsOxd+nQp0AbPh/EiPR2eSDuGLsxBSMgC XTFJlWg/CRONsc1wx8+qWr5dEuDdyYf39xPDFbKoBJ3SrY+qMCRyGWFue9Xcu8eIv7Rh RmDQ== X-Gm-Message-State: AOJu0Yxy5Xi7Pi1rzZEyP63oo4xsPeOPpXZh+yipc1Jbb/xSM2rnAHTG fXjOsK50HjdlrmIwHIV0GxHmPSPu/hIamsxbUobeRQFlmEOUq7Se X-Google-Smtp-Source: AGHT+IEj47iqSM/u5cnpFwEi+Tlit4Kq3eDjT8Fcn9cvyD2BTStrxajADI0G6lDHkblf2BuAGIFa5Q== X-Received: by 2002:a17:902:ee05:b0:1d3:c5e4:b2f6 with SMTP id z5-20020a170902ee0500b001d3c5e4b2f6mr6021894plb.4.1705851434840; Sun, 21 Jan 2024 07:37:14 -0800 (PST) Received: from localhost.localdomain ([140.116.154.65]) by smtp.gmail.com with ESMTPSA id sv13-20020a17090b538d00b0028d8fa0171asm7744347pjb.35.2024.01.21.07.37.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 21 Jan 2024 07:37:14 -0800 (PST) From: Kuan-Wei Chiu To: colyli@suse.de, kent.overstreet@linux.dev Cc: bfoster@redhat.com, jserv@ccns.ncku.edu.tw, linux-bcache@vger.kernel.org, linux-kernel@vger.kernel.org, linux-bcachefs@vger.kernel.org, Kuan-Wei Chiu Subject: [PATCH 5/5] bcache: Optimize number of comparisons in heap_sift Date: Sun, 21 Jan 2024 23:36:49 +0800 Message-Id: <20240121153649.2733274-6-visitorckw@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20240121153649.2733274-1-visitorckw@gmail.com> References: <20240121153649.2733274-1-visitorckw@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Optimize the heap_sift() macro, resulting in a significant reduction of approximately 50% in the number of comparisons for large random inputs, while maintaining identical results. The current implementation performs two comparisons per level to identify the maximum among three elements. In contrast, the proposed bottom-up variation uses only one comparison per level to assess two children until reaching the leaves. Then, it sifts up until the correct position is determined. Typically, the process of sifting down proceeds to the leaf level, resulting in O(1) secondary comparisons instead of log2(n). This optimization significantly reduces the number of costly indirect function calls and improves overall performance. The experimental data below is derived from first adding N elements generated by get_random_u32() to the heap, and then performing heap_pop until the heap is empty. | N | comparisons(old) | comparisons(new) | time(old) | time(new) | |-------|------------------|------------------|-----------|-----------| | 10000 | 249215 | 164872 | 1253 us | 958 us | | 20000 | 539766 | 350113 | 2693 us | 2059 us | | 30000 | 843297 | 543037 | 4120 us | 3119 us | | 40000 | 1159127 | 739595 | 5624 us | 4221 us | | 50000 | 1482608 | 941655 | 7147 us | 5349 us | | 60000 | 1808772 | 1144716 | 8754 us | 6786 us | | 70000 | 2139443 | 1348878 | 10323 us | 8030 us | | 80000 | 2478304 | 1560892 | 11934 us | 9061 us | | 90000 | 2820532 | 1768678 | 13611 us | 10679 us | | 100000| 3163503 | 1983806 | 15244 us | 11745 us | Signed-off-by: Kuan-Wei Chiu --- This patch has undergone unit testing and micro benchmarking using the following code [1]. [1]: static long long int cmp_count =3D 0; typedef DECLARE_HEAP(u32, heap); int mycmp(u32 a, u32 b) { cmp_count++; return a > b; } int check_heap(heap *h, int (*cmp)(u32, u32)) { size_t i; for (i =3D 0; i < h->used / 2; i++) { if (i * 2 + 1 < h->used) if (cmp(h->data[i], h->data[i * 2 + 1])) return -1; if (i * 2 + 2 < h->used) if (cmp(h->data[i], h->data[i * 2 + 2])) return -1; } return 0; } static int test(void) { size_t N, i; u32 x; heap myheap; ktime_t start, end; s64 delta; /* Test for correctness. */ for (N =3D 1000; N <=3D 10000; N +=3D 1000) { init_heap(&myheap, N, GFP_KERNEL); for (i =3D 0; i < N; i++) { heap_add(&myheap, get_random_u32(), mycmp); if (check_heap(&myheap, mycmp)) return -1; } for (i =3D 0; i < N; i++) { heap_pop(&myheap, x, mycmp); if (check_heap(&myheap, mycmp)) return -1; } free_heap(&myheap); } /* Micro-benchmark. */ for(N =3D 10000; N <=3D 100000; N +=3D 10000) { cmp_count =3D 0; init_heap(&myheap, N, GFP_KERNEL); start =3D ktime_get(); for (i =3D 0; i < N; i++) heap_add(&myheap, get_random_u32(), mycmp); for (i =3D 0; i < N; i++) heap_pop(&myheap, x, mycmp); end =3D ktime_get(); delta =3D ktime_us_delta(end, start); printk(KERN_INFO "time: %lld\n", delta); printk(KERN_INFO "comparisons: %lld\n", cmp_count); free_heap(&myheap); } return 0; } drivers/md/bcache/util.h | 23 +++++++++++++---------- 1 file changed, 13 insertions(+), 10 deletions(-) diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h index f61ab1bada6c..3aa74b0d7f0a 100644 --- a/drivers/md/bcache/util.h +++ b/drivers/md/bcache/util.h @@ -56,17 +56,20 @@ do { \ =20 #define heap_sift(h, i, cmp) \ do { \ - size_t _r, _j =3D i; \ + size_t _j, _k; \ \ - for (; _j * 2 + 1 < (h)->used; _j =3D _r) { \ - _r =3D _j * 2 + 1; \ - if (_r + 1 < (h)->used && \ - cmp((h)->data[_r], (h)->data[_r + 1])) \ - _r++; \ - \ - if (cmp((h)->data[_r], (h)->data[_j])) \ - break; \ - heap_swap(h, _r, _j); \ + for (_j =3D i; _k =3D 2 * _j + 1, _k + 1 < (h)->used;) { \ + if (cmp((h)->data[_k], (h)->data[_k + 1])) \ + _k++; \ + _j =3D _k; \ + } \ + if (_j * 2 + 2 =3D=3D (h)->used) \ + _j =3D _j * 2 + 1; \ + while (_j !=3D i && cmp((h)->data[_j], (h)->data[i])) \ + _j =3D (_j - 1) / 2; \ + for (_k =3D _j; _j !=3D i;) { \ + _j =3D (_j - 1) / 2; \ + heap_swap(h, _j, _k); \ } \ } while (0) =20 --=20 2.25.1