[PATCH 1/2] mm/swap: fix missing locks in swap_reclaim_work()

Hui Zhu posted 2 patches 1 month ago
Posted by Hui Zhu 1 month ago
From: Hui Zhu <zhuhui@kylinos.cn>

swap_cluster_alloc_table() assumes that the caller holds the following
locks:
ci->lock
percpu_swap_cluster.lock
si->global_cluster_lock (required for non-SWP_SOLIDSTATE devices)

There are five call paths leading to swap_cluster_alloc_table():
swap_alloc_hibernation_slot->cluster_alloc_swap_entry
->alloc_swap_scan_list->isolate_lock_cluster->swap_cluster_alloc_table

swap_alloc_slow->cluster_alloc_swap_entry->alloc_swap_scan_list
->isolate_lock_cluster->swap_cluster_alloc_table

swap_alloc_hibernation_slot->cluster_alloc_swap_entry
->swap_reclaim_full_clusters->isolate_lock_cluster
->swap_cluster_alloc_table

swap_alloc_slow->cluster_alloc_swap_entry->swap_reclaim_full_clusters
->isolate_lock_cluster->swap_cluster_alloc_table

swap_reclaim_work->swap_reclaim_full_clusters->isolate_lock_cluster
->swap_cluster_alloc_table

The first four paths correctly acquire the necessary locks before calling
swap_cluster_alloc_table(), but the swap_reclaim_work() path acquires
neither percpu_swap_cluster.lock nor, for non-SWP_SOLIDSTATE devices,
si->global_cluster_lock.

This patch fixes the issue by ensuring swap_reclaim_work() properly
acquires the required locks before proceeding with the swap cluster
allocation.

Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
---
 mm/swapfile.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 94af29d1de88..2e8717f84ba3 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1031,7 +1031,15 @@ static void swap_reclaim_work(struct work_struct *work)
 
 	si = container_of(work, struct swap_info_struct, reclaim_work);
 
+	local_lock(&percpu_swap_cluster.lock);
+	if (!(si->flags & SWP_SOLIDSTATE))
+		spin_lock(&si->global_cluster_lock);
+
 	swap_reclaim_full_clusters(si, true);
+
+	if (!(si->flags & SWP_SOLIDSTATE))
+		spin_unlock(&si->global_cluster_lock);
+	local_unlock(&percpu_swap_cluster.lock);
 }
 
 /*
-- 
2.43.0
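[Editorial note: the lock nesting the patch establishes can be modeled in
userspace. This is an illustrative sketch only, not kernel code: pthread
mutexes stand in for the real primitives (percpu_swap_cluster.lock is
actually a local_lock and ci->lock a spinlock), and the function and
variable names below are hypothetical stand-ins for the kernel symbols
named in the commit message.]

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-ins for percpu_swap_cluster.lock, si->global_cluster_lock
 * and ci->lock, in outermost-to-innermost order. */
static pthread_mutex_t percpu_cluster_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t global_cluster_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t ci_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stands in for (si->flags & SWP_SOLIDSTATE): solid-state devices
 * skip the global cluster lock. */
static bool is_solidstate;

/* Models the fixed swap_reclaim_work(): take the outer locks in the
 * same nesting order the allocation paths use, then do the
 * per-cluster work (which takes ci_lock innermost), then release
 * in reverse order. */
static void reclaim_work_model(void)
{
	pthread_mutex_lock(&percpu_cluster_lock);
	if (!is_solidstate)
		pthread_mutex_lock(&global_cluster_lock);

	/* swap_reclaim_full_clusters() would run here; isolating a
	 * cluster additionally takes ci_lock as the innermost lock. */
	pthread_mutex_lock(&ci_lock);
	pthread_mutex_unlock(&ci_lock);

	if (!is_solidstate)
		pthread_mutex_unlock(&global_cluster_lock);
	pthread_mutex_unlock(&percpu_cluster_lock);
}
```

The point of the model is only the ordering: the outer locks bracket the
whole reclaim pass, exactly as the hunk above brackets
swap_reclaim_full_clusters(), so the reclaim path can never hold
ci->lock without the locks swap_cluster_alloc_table() expects.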
Re: [PATCH 1/2] mm/swap: fix missing locks in swap_reclaim_work()
Posted by YoungJun Park 1 month ago
On Fri, Mar 06, 2026 at 07:50:36PM +0800, Hui Zhu wrote:
> From: Hui Zhu <zhuhui@kylinos.cn>

Hello Hui Zhu! :)
> 
> swap_cluster_alloc_table() assumes that the caller holds the following
> locks:
> ci->lock
> percpu_swap_cluster.lock
> si->global_cluster_lock (required for non-SWP_SOLIDSTATE devices)
> 
> There are five call paths leading to swap_cluster_alloc_table():
> swap_alloc_hibernation_slot->cluster_alloc_swap_entry
> ->alloc_swap_scan_list->isolate_lock_cluster->swap_cluster_alloc_table
> 
> swap_alloc_slow->cluster_alloc_swap_entry->alloc_swap_scan_list
> ->isolate_lock_cluster->swap_cluster_alloc_table
> 
> swap_alloc_hibernation_slot->cluster_alloc_swap_entry
> ->swap_reclaim_full_clusters->isolate_lock_cluster
> ->swap_cluster_alloc_table
> 
> swap_alloc_slow->cluster_alloc_swap_entry->swap_reclaim_full_clusters
> ->isolate_lock_cluster->swap_cluster_alloc_table
> 
> swap_reclaim_work->swap_reclaim_full_clusters->isolate_lock_cluster
> ->swap_cluster_alloc_table

Can isolate_lock_cluster() actually invoke swap_cluster_alloc_table()
on a full cluster? My understanding is that full clusters already have
a swap_table allocated, and swap_cluster_alloc_table() is only called
for free clusters that need a new allocation. If isolate_lock_cluster()
checks !cluster_table_is_alloced() before calling swap_cluster_alloc_table(),
wouldn't the full-cluster reclaim path skip that allocation entirely?

> Other paths correctly acquire the necessary locks before calling
> swap_cluster_alloc_table().
> But the swap_reclaim_work() path fails to acquire
> percpu_swap_cluster.lock and, for non-SWP_SOLIDSTATE devices,
> si->global_cluster_lock.

If my assumption is right, the table is not allocated in this path, so the
extra synchronization is not needed.
Also, as I understand it, percpu_swap_cluster.lock and si->global_cluster_lock
protect the percpu cluster cache and the global cluster state, not the
allocation table itself.

Best Regards
Youngjun Park

> This patch fixes the issue by ensuring swap_reclaim_work() properly
> acquires the required locks before proceeding with the swap cluster
> allocation.
> 
> Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
> ---
>  mm/swapfile.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 94af29d1de88..2e8717f84ba3 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1031,7 +1031,15 @@ static void swap_reclaim_work(struct work_struct *work)
>  
>  	si = container_of(work, struct swap_info_struct, reclaim_work);
>  
> +	local_lock(&percpu_swap_cluster.lock);
> +	if (!(si->flags & SWP_SOLIDSTATE))
> +		spin_lock(&si->global_cluster_lock);
> +
>  	swap_reclaim_full_clusters(si, true);
> +
> +	if (!(si->flags & SWP_SOLIDSTATE))
> +		spin_unlock(&si->global_cluster_lock);
> +	local_unlock(&percpu_swap_cluster.lock);
>  }
>  
>  /*
> -- 
> 2.43.0
> 
>
Re: [PATCH 1/2] mm/swap: fix missing locks in swap_reclaim_work()
Posted by Kairui Song 1 month ago
On Fri, Mar 6, 2026 at 9:58 PM YoungJun Park <youngjun.park@lge.com> wrote:
>
> On Fri, Mar 06, 2026 at 07:50:36PM +0800, Hui Zhu wrote:
> > From: Hui Zhu <zhuhui@kylinos.cn>
>
> Hello Hui Zhu! :)
> >
> > swap_cluster_alloc_table() assumes that the caller holds the following
> > locks:
> > ci->lock
> > percpu_swap_cluster.lock
> > si->global_cluster_lock (required for non-SWP_SOLIDSTATE devices)
> >
> > There are five call paths leading to swap_cluster_alloc_table():
> > swap_alloc_hibernation_slot->cluster_alloc_swap_entry
> > ->alloc_swap_scan_list->isolate_lock_cluster->swap_cluster_alloc_table
> >
> > swap_alloc_slow->cluster_alloc_swap_entry->alloc_swap_scan_list
> > ->isolate_lock_cluster->swap_cluster_alloc_table
> >
> > swap_alloc_hibernation_slot->cluster_alloc_swap_entry
> > ->swap_reclaim_full_clusters->isolate_lock_cluster
> > ->swap_cluster_alloc_table
> >
> > swap_alloc_slow->cluster_alloc_swap_entry->swap_reclaim_full_clusters
> > ->isolate_lock_cluster->swap_cluster_alloc_table
> >
> > swap_reclaim_work->swap_reclaim_full_clusters->isolate_lock_cluster
> > ->swap_cluster_alloc_table
>
> Can isolate_lock_cluster() actually invoke swap_cluster_alloc_table()
> on a full cluster? My understanding is that full clusters already have
> a swap_table allocated, and swap_cluster_alloc_table() is only called
> for free clusters that need a new allocation. If isolate_lock_cluster()
> checks !cluster_table_is_alloced() before calling swap_cluster_alloc_table(),
> wouldn't the full-cluster reclaim path skip that allocation entirely?

Hi all, thanks for the patch and review. That's correct; I don't think
a full cluster will need a table allocation.

Maybe we can add a VM_WARN to check if `list == si->free_cluster` when
cluster_table_is_alloced is true? Just in case, and to make things easier
to understand.
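[Editorial note: one reading of that suggestion, as a hypothetical
untested sketch. The identifiers (ci, list, si->free_clusters,
cluster_table_is_alloced()) follow the names used in this thread; the
exact placement and condition would need checking against the tree this
series is based on.]

```c
/* In isolate_lock_cluster(), once the cluster ci and the list it was
 * isolated from are known: a cluster whose swap table is already
 * allocated should never come from the free list, since only free
 * clusters need a fresh table allocation. */
VM_WARN_ON_ONCE(cluster_table_is_alloced(ci) &&
		list == &si->free_clusters);
```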

BTW there is already a comment in swap_cluster_alloc_table: "Only
cluster isolation from the allocator does table allocation." That
comment can be extended if that's not detailed enough.