From: Hui Zhu <zhuhui@kylinos.cn>
swap_cluster_alloc_table() assumes that the caller holds the following
locks:
ci->lock
percpu_swap_cluster.lock
si->global_cluster_lock (required for non-SWP_SOLIDSTATE devices)
There are five call paths leading to swap_cluster_alloc_table():

swap_alloc_hibernation_slot->cluster_alloc_swap_entry
->alloc_swap_scan_list->isolate_lock_cluster->swap_cluster_alloc_table

swap_alloc_slow->cluster_alloc_swap_entry->alloc_swap_scan_list
->isolate_lock_cluster->swap_cluster_alloc_table

swap_alloc_hibernation_slot->cluster_alloc_swap_entry
->swap_reclaim_full_clusters->isolate_lock_cluster
->swap_cluster_alloc_table

swap_alloc_slow->cluster_alloc_swap_entry->swap_reclaim_full_clusters
->isolate_lock_cluster->swap_cluster_alloc_table

swap_reclaim_work->swap_reclaim_full_clusters->isolate_lock_cluster
->swap_cluster_alloc_table
Other paths correctly acquire the necessary locks before calling
swap_cluster_alloc_table().
But the swap_reclaim_work() path fails to acquire
percpu_swap_cluster.lock and, for non-SWP_SOLIDSTATE devices,
si->global_cluster_lock.
This patch fixes the issue by ensuring swap_reclaim_work() properly
acquires the required locks before proceeding with the swap cluster
allocation.
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
---
mm/swapfile.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 94af29d1de88..2e8717f84ba3 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1031,7 +1031,15 @@ static void swap_reclaim_work(struct work_struct *work)
 
 	si = container_of(work, struct swap_info_struct, reclaim_work);
 
+	local_lock(&percpu_swap_cluster.lock);
+	if (!(si->flags & SWP_SOLIDSTATE))
+		spin_lock(&si->global_cluster_lock);
+
 	swap_reclaim_full_clusters(si, true);
+
+	if (!(si->flags & SWP_SOLIDSTATE))
+		spin_unlock(&si->global_cluster_lock);
+	local_unlock(&percpu_swap_cluster.lock);
 }
 
 /*
--
2.43.0
On Fri, Mar 06, 2026 at 07:50:36PM +0800, Hui Zhu wrote:
> From: Hui Zhu <zhuhui@kylinos.cn>

Hello Hui Zhu! :)

>
> swap_cluster_alloc_table() assumes that the caller holds the following
> locks:
> ci->lock
> percpu_swap_cluster.lock
> si->global_cluster_lock (required for non-SWP_SOLIDSTATE devices)
>
> There are five call paths leading to swap_cluster_alloc_table():
> swap_alloc_hibernation_slot->cluster_alloc_swap_entry
> ->alloc_swap_scan_list->isolate_lock_cluster->swap_cluster_alloc_table
>
> swap_alloc_slow->cluster_alloc_swap_entry->alloc_swap_scan_list
> ->isolate_lock_cluster->swap_cluster_alloc_table
>
> swap_alloc_hibernation_slot->cluster_alloc_swap_entry
> ->swap_reclaim_full_clusters->isolate_lock_cluster
> ->swap_cluster_alloc_table
>
> swap_alloc_slow->cluster_alloc_swap_entry->swap_reclaim_full_clusters
> ->isolate_lock_cluster->swap_cluster_alloc_table
>
> swap_reclaim_work->swap_reclaim_full_clusters->isolate_lock_cluster
> ->swap_cluster_alloc_table

Can isolate_lock_cluster() actually invoke swap_cluster_alloc_table()
on a full cluster? My understanding is that full clusters already have
a swap_table allocated, and swap_cluster_alloc_table() is only called
for free clusters that need a new allocation. If isolate_lock_cluster()
checks !cluster_table_is_alloced() before calling
swap_cluster_alloc_table(), wouldn't the full-cluster reclaim path skip
that allocation entirely?

> Other paths correctly acquire the necessary locks before calling
> swap_cluster_alloc_table().
> But the swap_reclaim_work() path fails to acquire
> percpu_swap_cluster.lock and, for non-SWP_SOLIDSTATE devices,
> si->global_cluster_lock.

If my assumption is right, the table is never allocated on this path,
so the extra synchronization is not needed. Also, as I understand them,
percpu_swap_cluster.lock and si->global_cluster_lock protect the percpu
cluster cache and the global cluster state, not the allocation table
itself.
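A minimal sketch of the flow being described (hypothetical pseudocode reflecting the reviewer's reading of isolate_lock_cluster(), not the actual kernel source):

	/* isolate_lock_cluster(), simplified */
	spin_lock(&ci->lock);
	if (!cluster_table_is_alloced(ci))
		swap_cluster_alloc_table(ci);	/* only free clusters, which
						 * have no table yet, reach
						 * this call */

Under that reading, full clusters handed to reclaim never enter the allocation branch, so the reclaim path would not need the allocation-side locks.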
> This patch fixes the issue by ensuring swap_reclaim_work() properly
> acquires the required locks before proceeding with the swap cluster
> allocation.
>
> Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
> ---
>  mm/swapfile.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 94af29d1de88..2e8717f84ba3 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1031,7 +1031,15 @@ static void swap_reclaim_work(struct work_struct *work)
>
>  	si = container_of(work, struct swap_info_struct, reclaim_work);
>
> +	local_lock(&percpu_swap_cluster.lock);
> +	if (!(si->flags & SWP_SOLIDSTATE))
> +		spin_lock(&si->global_cluster_lock);
> +
>  	swap_reclaim_full_clusters(si, true);
> +
> +	if (!(si->flags & SWP_SOLIDSTATE))
> +		spin_unlock(&si->global_cluster_lock);
> +	local_unlock(&percpu_swap_cluster.lock);
>  }
>
>  /*
> --
> 2.43.0

Best Regards,
Youngjun Park
On Fri, Mar 6, 2026 at 9:58 PM YoungJun Park <youngjun.park@lge.com> wrote:
>
> On Fri, Mar 06, 2026 at 07:50:36PM +0800, Hui Zhu wrote:
> > From: Hui Zhu <zhuhui@kylinos.cn>
>
> Hello Hui Zhu! :)
>
> >
> > swap_cluster_alloc_table() assumes that the caller holds the following
> > locks:
> > ci->lock
> > percpu_swap_cluster.lock
> > si->global_cluster_lock (required for non-SWP_SOLIDSTATE devices)
> >
> > There are five call paths leading to swap_cluster_alloc_table():
> > swap_alloc_hibernation_slot->cluster_alloc_swap_entry
> > ->alloc_swap_scan_list->isolate_lock_cluster->swap_cluster_alloc_table
> >
> > swap_alloc_slow->cluster_alloc_swap_entry->alloc_swap_scan_list
> > ->isolate_lock_cluster->swap_cluster_alloc_table
> >
> > swap_alloc_hibernation_slot->cluster_alloc_swap_entry
> > ->swap_reclaim_full_clusters->isolate_lock_cluster
> > ->swap_cluster_alloc_table
> >
> > swap_alloc_slow->cluster_alloc_swap_entry->swap_reclaim_full_clusters
> > ->isolate_lock_cluster->swap_cluster_alloc_table
> >
> > swap_reclaim_work->swap_reclaim_full_clusters->isolate_lock_cluster
> > ->swap_cluster_alloc_table
>
> Can isolate_lock_cluster() actually invoke swap_cluster_alloc_table()
> on a full cluster? My understanding is that full clusters already have
> a swap_table allocated, and swap_cluster_alloc_table() is only called
> for free clusters that need a new allocation. If isolate_lock_cluster()
> checks !cluster_table_is_alloced() before calling
> swap_cluster_alloc_table(), wouldn't the full-cluster reclaim path skip
> that allocation entirely?

Hi all, thanks for the patch and review.

That's correct, I don't think a full cluster will need allocation.

Maybe we can add a VM_WARN to check if `list == si->free_cluster` when
cluster_table_is_alloced is true? Just in case, and to make things
easier to understand.

BTW there is already a comment in swap_cluster_alloc_table:
"Only cluster isolation from the allocator does table allocation."
That comment can be extended if that's not detailed enough.
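One literal reading of that suggestion, as a hypothetical sketch (the exact condition, polarity, and field name would all need checking against the tree before use):

	/* Hypothetical: warn if a cluster that already has a table is
	 * being isolated via the free list. */
	if (cluster_table_is_alloced(ci))
		VM_WARN_ON(list == si->free_cluster);

Such a warning would document the invariant the existing comment describes, namely that only free-list isolation performs table allocation.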