[PATCH v3 1/3] mm, swap: speed up hibernation allocation and writeout

Kairui Song via B4 Relay posted 3 patches 1 month, 2 weeks ago
There is a newer version of this series
Posted by Kairui Song via B4 Relay 1 month, 2 weeks ago
From: Kairui Song <kasong@tencent.com>

Since commit 0ff67f990bd4 ("mm, swap: remove swap slot cache"),
hibernation has used the swap slot slow allocation path for simplicity.
This turned out to cause regressions on some devices because the
allocator now rotates clusters too often, leading to slower allocation
and a more random distribution of data.

The fast allocation path is not complex, so implement hibernation
support for it as well.
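
To illustrate the idea only, here is a minimal user-space sketch of "try
the cached cluster first, fall back to the slow path". It is not the
kernel code: every toy_* name, SLOT_INVALID, CLUSTER_SIZE and NR_SLOTS
are hypothetical stand-ins, and locking is left out entirely.

```c
#include <assert.h>

#define SLOT_INVALID 0UL  /* hypothetical stand-in for SWAP_ENTRY_INVALID */
#define CLUSTER_SIZE 8UL  /* toy value, not the kernel's cluster size */
#define NR_SLOTS     32UL /* toy device size, a multiple of CLUSTER_SIZE */

struct toy_device {
	unsigned char used[NR_SLOTS]; /* 1 = slot taken; slot 0 is never handed out */
};

struct toy_cpu_cache {
	struct toy_device *dev; /* device the cached position belongs to */
	unsigned long next;     /* next offset to try, 0 = nothing cached */
};

/* Scan from 'start' to the end of its cluster and claim the first free slot. */
static unsigned long toy_scan_cluster(struct toy_device *dev, unsigned long start)
{
	unsigned long end = (start / CLUSTER_SIZE + 1) * CLUSTER_SIZE;
	unsigned long off;

	for (off = start; off < end && off < NR_SLOTS; off++) {
		if (off != 0 && !dev->used[off]) {
			dev->used[off] = 1;
			return off;
		}
	}
	return SLOT_INVALID;
}

/* Slow path: walk every cluster from the start of the device. */
static unsigned long toy_slow_alloc(struct toy_device *dev)
{
	unsigned long base, off;

	for (base = 0; base < NR_SLOTS; base += CLUSTER_SIZE) {
		off = toy_scan_cluster(dev, base);
		if (off != SLOT_INVALID)
			return off;
	}
	return SLOT_INVALID;
}

/* Fast path first if the cache matches the device, then fall back. */
static unsigned long toy_alloc(struct toy_device *dev, struct toy_cpu_cache *cache)
{
	unsigned long off = SLOT_INVALID;

	assert(dev && cache);
	if (cache->dev == dev && cache->next != 0)
		off = toy_scan_cluster(dev, cache->next);
	if (off == SLOT_INVALID)
		off = toy_slow_alloc(dev);
	if (off != SLOT_INVALID) {
		/* Override the cache; drop it once the cluster is exhausted. */
		cache->dev = dev;
		cache->next = ((off + 1) % CLUSTER_SIZE) ? off + 1 : 0;
	}
	return off;
}
```

In this toy model, repeated toy_alloc() calls on a fresh device return
consecutive offsets starting at 1, which is the sequential layout the
fast path is meant to restore.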

Testing with a Samsung SSD 830 Series (SATA II, 3.0 Gbps) shows the
performance is roughly 9x better [1]:
6.19:               324 seconds
After this series:  35 seconds

Fixes: 0ff67f990bd4 ("mm, swap: remove swap slot cache")
Reported-by: Carsten Grohmann <mail@carstengrohmann.de>
Closes: https://lore.kernel.org/linux-mm/20260206121151.dea3633d1f0ded7bbf49c22e@linux-foundation.org/
Link: https://lore.kernel.org/linux-mm/8b4bdcfa-ce3f-4e23-839f-31367df7c18f@gmx.de/ [1]
Cc: stable@vger.kernel.org
Signed-off-by: Kairui Song <kasong@tencent.com>
---
 mm/swapfile.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index c6863ff7152c..32e0e7545ab8 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1926,8 +1926,9 @@ void swap_put_entries_direct(swp_entry_t entry, int nr)
 /* Allocate a slot for hibernation */
 swp_entry_t swap_alloc_hibernation_slot(int type)
 {
-	struct swap_info_struct *si = swap_type_to_info(type);
-	unsigned long offset;
+	struct swap_info_struct *pcp_si, *si = swap_type_to_info(type);
+	unsigned long pcp_offset, offset = SWAP_ENTRY_INVALID;
+	struct swap_cluster_info *ci;
 	swp_entry_t entry = {0};
 
 	if (!si)
@@ -1937,11 +1938,21 @@ swp_entry_t swap_alloc_hibernation_slot(int type)
 	if (get_swap_device_info(si)) {
 		if (si->flags & SWP_WRITEOK) {
 			/*
-			 * Grab the local lock to be compliant
-			 * with swap table allocation.
+			 * Try the local cluster first if it matches the device. If
+			 * not, try to grab a new cluster and override the local cluster.
 			 */
 			local_lock(&percpu_swap_cluster.lock);
-			offset = cluster_alloc_swap_entry(si, NULL);
+			pcp_si = this_cpu_read(percpu_swap_cluster.si[0]);
+			pcp_offset = this_cpu_read(percpu_swap_cluster.offset[0]);
+			if (pcp_si == si && pcp_offset) {
+				ci = swap_cluster_lock(si, pcp_offset);
+				if (cluster_is_usable(ci, 0))
+					offset = alloc_swap_scan_cluster(si, ci, NULL, pcp_offset);
+				else
+					swap_cluster_unlock(ci);
+			}
+			if (!offset)
+				offset = cluster_alloc_swap_entry(si, NULL);
 			local_unlock(&percpu_swap_cluster.lock);
 			if (offset)
 				entry = swp_entry(si->type, offset);

-- 
2.52.0
Re: [PATCH v3 1/3] mm, swap: speed up hibernation allocation and writeout
Posted by Barry Song 1 month, 2 weeks ago
On Mon, Feb 16, 2026 at 3:00 AM Kairui Song via B4 Relay
<devnull+kasong.tencent.com@kernel.org> wrote:
>
> From: Kairui Song <kasong@tencent.com>
>
> Since commit 0ff67f990bd4 ("mm, swap: remove swap slot cache"),
> hibernation has been using the swap slot slow allocation path for
> simplification, which turns out might cause regression for some
> devices because the allocator now rotates clusters too often, leading to
> slower allocation and more random distribution of data.
>
> Fast allocation is not complex, so implement hibernation support as
> well.
>
> Test result with Samsung SSD 830 Series (SATA II, 3.0 Gbps) shows the
> performance is several times better [1]:
> 6.19:               324 seconds
> After this series:  35 seconds
>
> Fixes: 0ff67f990bd4 ("mm, swap: remove swap slot cache")
> Reported-by: Carsten Grohmann <mail@carstengrohmann.de>
> Closes: https://lore.kernel.org/linux-mm/20260206121151.dea3633d1f0ded7bbf49c22e@linux-foundation.org/
> Link: https://lore.kernel.org/linux-mm/8b4bdcfa-ce3f-4e23-839f-31367df7c18f@gmx.de/ [1]
> Cc: stable@vger.kernel.org
> Signed-off-by: Kairui Song <kasong@tencent.com>
> ---
>  mm/swapfile.c | 21 ++++++++++++++++-----
>  1 file changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index c6863ff7152c..32e0e7545ab8 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1926,8 +1926,9 @@ void swap_put_entries_direct(swp_entry_t entry, int nr)
>  /* Allocate a slot for hibernation */
>  swp_entry_t swap_alloc_hibernation_slot(int type)
>  {
> -       struct swap_info_struct *si = swap_type_to_info(type);
> -       unsigned long offset;
> +       struct swap_info_struct *pcp_si, *si = swap_type_to_info(type);
> +       unsigned long pcp_offset, offset = SWAP_ENTRY_INVALID;
> +       struct swap_cluster_info *ci;
>         swp_entry_t entry = {0};
>
>         if (!si)
> @@ -1937,11 +1938,21 @@ swp_entry_t swap_alloc_hibernation_slot(int type)
>         if (get_swap_device_info(si)) {
>                 if (si->flags & SWP_WRITEOK) {
>                         /*
> -                        * Grab the local lock to be compliant
> -                        * with swap table allocation.
> +                        * Try the local cluster first if it matches the device. If
> +                        * not, try grab a new cluster and override local cluster.
>                          */
>                         local_lock(&percpu_swap_cluster.lock);
> -                       offset = cluster_alloc_swap_entry(si, NULL);
> +                       pcp_si = this_cpu_read(percpu_swap_cluster.si[0]);
> +                       pcp_offset = this_cpu_read(percpu_swap_cluster.offset[0]);
> +                       if (pcp_si == si && pcp_offset) {
> +                               ci = swap_cluster_lock(si, pcp_offset);
> +                               if (cluster_is_usable(ci, 0))
> +                                       offset = alloc_swap_scan_cluster(si, ci, NULL, pcp_offset);
> +                               else
> +                                       swap_cluster_unlock(ci);
> +                       }
> +                       if (!offset)

I assume you mean SWAP_ENTRY_INVALID? Would that be more readable?

> +                               offset = cluster_alloc_swap_entry(si, NULL);
>                         local_unlock(&percpu_swap_cluster.lock);
>                         if (offset)
>                                 entry = swp_entry(si->type, offset);
>
> --
> 2.52.0

Thanks
Barry
Re: [PATCH v3 1/3] mm, swap: speed up hibernation allocation and writeout
Posted by Kairui Song 1 month, 2 weeks ago
On Mon, Feb 16, 2026 at 04:43:40AM +0800, Barry Song wrote:
> > @@ -1937,11 +1938,21 @@ swp_entry_t swap_alloc_hibernation_slot(int type)
> >         if (get_swap_device_info(si)) {
> >                 if (si->flags & SWP_WRITEOK) {
> >                         /*
> > -                        * Grab the local lock to be compliant
> > -                        * with swap table allocation.
> > +                        * Try the local cluster first if it matches the device. If
> > +                        * not, try grab a new cluster and override local cluster.
> >                          */
> >                         local_lock(&percpu_swap_cluster.lock);
> > -                       offset = cluster_alloc_swap_entry(si, NULL);
> > +                       pcp_si = this_cpu_read(percpu_swap_cluster.si[0]);
> > +                       pcp_offset = this_cpu_read(percpu_swap_cluster.offset[0]);
> > +                       if (pcp_si == si && pcp_offset) {
> > +                               ci = swap_cluster_lock(si, pcp_offset);
> > +                               if (cluster_is_usable(ci, 0))
> > +                                       offset = alloc_swap_scan_cluster(si, ci, NULL, pcp_offset);
> > +                               else
> > +                                       swap_cluster_unlock(ci);
> > +                       }
> > +                       if (!offset)
> 
> I assume you mean SWAP_ENTRY_INVALID? Would that be more readable?

Yes, checking !offset is very common in swapfile.c since
SWAP_ENTRY_INVALID is zero. But I agree that checking against
SWAP_ENTRY_INVALID is more readable and maintainable. I'll change this
to SWAP_ENTRY_INVALID and use the macro more in future code.
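
As a standalone illustration (user-space, not kernel code), the two
checks are equivalent only as long as the macro stays zero. The helper
names below are hypothetical, and the macro is redefined locally on the
assumption stated above that SWAP_ENTRY_INVALID is zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: mirrors the statement above that SWAP_ENTRY_INVALID is zero. */
#define SWAP_ENTRY_INVALID 0

/* Make the hidden dependency of the "!offset" idiom explicit at build time. */
_Static_assert(SWAP_ENTRY_INVALID == 0, "!offset relies on a zero sentinel");

/* Hypothetical helper: the implicit form used by the patch. */
static bool offset_invalid_implicit(unsigned long offset)
{
	return !offset;
}

/* Hypothetical helper: the explicit form suggested in review. */
static bool offset_invalid_explicit(unsigned long offset)
{
	return offset == SWAP_ENTRY_INVALID;
}
```

Both helpers agree for every offset today; the explicit form would keep
working if the sentinel value ever changed.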

Thanks!