From: Hui Zhu
To: Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
    Baoquan He, Barry Song, YoungJun Park, Geliang Tang,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Hui Zhu
Subject: [PATCH v5] mm/swap: strengthen locking assertions and invariants in cluster allocation
Date: Thu, 12 Mar 2026 10:30:24 +0800
Message-ID: <20260312023024.903143-1-hui.zhu@linux.dev>

From: Hui Zhu

The swap_cluster_alloc_table() function requires several locks to be
held by its callers: ci->lock, the per-CPU swap_cluster lock, and, for
non-solid-state devices (non-SWP_SOLIDSTATE), the
si->global_cluster_lock.

While most call paths (e.g., via cluster_alloc_swap_entry() or
alloc_swap_scan_list()) correctly acquire these locks before
invocation, the path through swap_reclaim_work() ->
swap_reclaim_full_clusters() -> isolate_lock_cluster() is distinct.
This path operates exclusively on si->full_clusters, where the swap
allocation tables are guaranteed to be already allocated. Consequently,
isolate_lock_cluster() should never trigger a call to
swap_cluster_alloc_table() for these clusters.

Strengthen the locking and state assertions to formalize these
invariants:

1. Add a lockdep_assert_held() for si->global_cluster_lock in
   swap_cluster_alloc_table() for non-SWP_SOLIDSTATE devices.
2. Reorder existing lockdep assertions in swap_cluster_alloc_table() to
   match the actual lock acquisition order (per-CPU lock, then global
   lock, then cluster lock).
3. Add a VM_WARN_ON_ONCE() in isolate_lock_cluster() to ensure that
   table allocations are only attempted for clusters being isolated
   from the free list. Attempting to allocate a table for a cluster
   from other lists (like the full list during reclaim) indicates a
   violation of subsystem invariants.

These changes ensure locking consistency and help catch potential
synchronization or logic issues during development.

Changelog:
v5:
Per Chris Li's comments, add initialization of flags.
v4:
Per Barry Song's comments, remove a redundant comment.
v3:
Per Kairui Song's comments, squash the patches and fix a logic bug in
isolate_lock_cluster() where flags were cleared before the check.
v2:
Per comments from YoungJun Park, Kairui Song and Chris Li, add a
VM_WARN_ON in isolate_lock_cluster() instead of acquiring locks in
swap_reclaim_work().
Per YoungJun Park's comments, add code in patch 2 to change the order
of lockdep_assert_held() to match the actual lock acquisition order.

Reviewed-by: Youngjun Park
Reviewed-by: Barry Song
Acked-by: Chris Li
Acked-by: Geliang Tang
Signed-off-by: Hui Zhu
---
 mm/swapfile.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 94af29d1de88..de1c2203436e 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -476,8 +476,10 @@ swap_cluster_alloc_table(struct swap_info_struct *si,
 	 * Only cluster isolation from the allocator does table allocation.
 	 * Swap allocator uses percpu clusters and holds the local lock.
 	 */
-	lockdep_assert_held(&ci->lock);
 	lockdep_assert_held(&this_cpu_ptr(&percpu_swap_cluster)->lock);
+	if (!(si->flags & SWP_SOLIDSTATE))
+		lockdep_assert_held(&si->global_cluster_lock);
+	lockdep_assert_held(&ci->lock);
 
 	/* The cluster must be free and was just isolated from the free list. */
 	VM_WARN_ON_ONCE(ci->flags || !cluster_is_empty(ci));
@@ -577,6 +579,7 @@ static struct swap_cluster_info *isolate_lock_cluster(
 		struct swap_info_struct *si, struct list_head *list)
 {
 	struct swap_cluster_info *ci, *found = NULL;
+	u8 flags = CLUSTER_FLAG_NONE;
 
 	spin_lock(&si->lock);
 	list_for_each_entry(ci, list, list) {
@@ -589,6 +592,7 @@ static struct swap_cluster_info *isolate_lock_cluster(
 				ci->flags != CLUSTER_FLAG_FULL);
 
 		list_del(&ci->list);
+		flags = ci->flags;
 		ci->flags = CLUSTER_FLAG_NONE;
 		found = ci;
 		break;
@@ -597,6 +601,7 @@ static struct swap_cluster_info *isolate_lock_cluster(
 
 	if (found && !cluster_table_is_alloced(found)) {
 		/* Only an empty free cluster's swap table can be freed. */
+		VM_WARN_ON_ONCE(flags != CLUSTER_FLAG_FREE);
 		VM_WARN_ON_ONCE(list != &si->free_clusters);
 		VM_WARN_ON_ONCE(!cluster_is_empty(found));
 		return swap_cluster_alloc_table(si, found);
-- 
2.43.0
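
For readers who want to see why the list flag has to be saved before it
is cleared (the logic bug mentioned in the v3 changelog), here is a
minimal userspace sketch. It is not kernel code: the model_* names, the
enum values, and the warning counter are hypothetical stand-ins for
struct swap_cluster_info, the CLUSTER_FLAG_* constants, and
VM_WARN_ON_ONCE(), and the locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's cluster flags (values are illustrative). */
enum model_flag { MODEL_FLAG_NONE = 0, MODEL_FLAG_FREE, MODEL_FLAG_FULL };

/* Hypothetical, heavily simplified model of a swap cluster. */
struct model_cluster {
	enum model_flag flags;   /* which list the cluster sits on */
	bool table_allocated;    /* whether its swap table exists */
};

/* Counts invariant violations; stands in for a kernel warning. */
static int model_warnings;

/*
 * Isolate a cluster: save its list flag, clear the flag, then check the
 * saved copy if a table allocation turns out to be needed. Checking
 * ci->flags after the clear would always see MODEL_FLAG_NONE, so the
 * check has to use the saved value.
 */
static struct model_cluster *model_isolate(struct model_cluster *ci)
{
	enum model_flag flags = MODEL_FLAG_NONE;

	assert(ci != NULL);
	flags = ci->flags;             /* capture before clearing */
	ci->flags = MODEL_FLAG_NONE;

	if (!ci->table_allocated) {
		/* Only a cluster taken from the free list may need a table. */
		if (flags != MODEL_FLAG_FREE)
			model_warnings++;
		ci->table_allocated = true;   /* models the table allocation */
	}
	return ci;
}
```

In this model, isolating a free cluster without a table leaves
model_warnings unchanged, while isolating a full cluster without a
table increments it, which mirrors the case the new VM_WARN_ON_ONCE()
is meant to flag.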