From: Nhat Pham
To: kasong@tencent.com
Cc: Liam.Howlett@oracle.com, akpm@linux-foundation.org, apopple@nvidia.com,
 axelrasmussen@google.com, baohua@kernel.org, baolin.wang@linux.alibaba.com,
 bhe@redhat.com, byungchul@sk.com, cgroups@vger.kernel.org,
 chengming.zhou@linux.dev, chrisl@kernel.org, corbet@lwn.net,
 david@kernel.org, dev.jain@arm.com, gourry@gourry.net, hannes@cmpxchg.org,
 hughd@google.com, jannh@google.com, joshua.hahnjy@gmail.com,
 lance.yang@linux.dev, lenb@kernel.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-pm@vger.kernel.org,
 lorenzo.stoakes@oracle.com, matthew.brost@intel.com, mhocko@suse.com,
 muchun.song@linux.dev, npache@redhat.com, nphamcs@gmail.com,
 pavel@kernel.org, peterx@redhat.com, peterz@infradead.org, pfalcato@suse.de,
 rafael@kernel.org, rakie.kim@sk.com, roman.gushchin@linux.dev,
 rppt@kernel.org, ryan.roberts@arm.com, shakeel.butt@linux.dev,
 shikemeng@huaweicloud.com, surenb@google.com, tglx@kernel.org,
 vbabka@suse.cz, weixugc@google.com, ying.huang@linux.alibaba.com,
 yosry.ahmed@linux.dev, yuanchu@google.com, zhengqi.arch@bytedance.com,
 ziy@nvidia.com, kernel-team@meta.com, riel@surriel.com
Subject: [PATCH v4 03/21] mm: swap: add an abstract API for locking out swapoff
Date: Wed, 18 Mar 2026 15:29:34 -0700
Message-ID: <20260318222953.441758-4-nphamcs@gmail.com>
In-Reply-To: <20260318222953.441758-1-nphamcs@gmail.com>
References: <20260318222953.441758-1-nphamcs@gmail.com>

Currently, we take a reference on the backing swap device to prevent
swapoff from freeing a swap entry's metadata. This does not make sense in
the new virtual swap design, especially once the swap backends are
decoupled: a swap entry might not have any backing swap device at all,
and its backend might change at any time during its lifetime.

In preparation for this, abstract the swapoff lockout behavior behind a
generic API, tryget_swap_entry() and put_swap_entry(). For now these are
thin wrappers around get_swap_device() and put_swap_device(), and callers
check the boolean returned by tryget_swap_entry() rather than the
swap_info_struct pointer.

Signed-off-by: Nhat Pham
---
 include/linux/swap.h | 17 +++++++++++++++++
 mm/memory.c          | 13 +++++++------
 mm/mincore.c         | 15 +++------------
 mm/shmem.c           | 12 ++++++------
 mm/swap_state.c      | 14 +++++++-------
 mm/userfaultfd.c     | 15 +++++++++------
 mm/zswap.c           |  5 ++---
 7 files changed, 51 insertions(+), 40 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index aa29d8ac542d1..3da637b218baf 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -659,5 +659,22 @@ static inline bool mem_cgroup_swap_full(struct folio *folio)
 }
 #endif
 
+static inline bool tryget_swap_entry(swp_entry_t entry,
+				     struct swap_info_struct **sip)
+{
+	struct swap_info_struct *si = get_swap_device(entry);
+
+	if (sip)
+		*sip = si;
+
+	return si;
+}
+
+static inline void put_swap_entry(swp_entry_t entry,
+				  struct swap_info_struct *si)
+{
+	put_swap_device(si);
+}
+
 #endif /* __KERNEL__*/
 #endif /* _LINUX_SWAP_H */
diff --git a/mm/memory.c b/mm/memory.c
index da360a6eb8a48..90031f833f52e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4630,6 +4630,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	struct swap_info_struct *si = NULL;
 	rmap_t rmap_flags = RMAP_NONE;
 	bool need_clear_cache = false;
+	bool swapoff_locked = false;
 	bool exclusive = false;
 	softleaf_t entry;
 	pte_t pte;
@@ -4698,8 +4699,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	}
 
 	/* Prevent swapoff from happening to us. */
-	si = get_swap_device(entry);
-	if (unlikely(!si))
+	swapoff_locked = tryget_swap_entry(entry, &si);
+	if (unlikely(!swapoff_locked))
 		goto out;
 
 	folio = swap_cache_get_folio(entry);
@@ -5047,8 +5048,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		if (waitqueue_active(&swapcache_wq))
 			wake_up(&swapcache_wq);
 	}
-	if (si)
-		put_swap_device(si);
+	if (swapoff_locked)
+		put_swap_entry(entry, si);
 	return ret;
 out_nomap:
 	if (vmf->pte)
@@ -5066,8 +5067,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		if (waitqueue_active(&swapcache_wq))
 			wake_up(&swapcache_wq);
 	}
-	if (si)
-		put_swap_device(si);
+	if (swapoff_locked)
+		put_swap_entry(entry, si);
 	return ret;
 }
 
diff --git a/mm/mincore.c b/mm/mincore.c
index e5d13eea92347..f3eb771249d67 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -77,19 +77,10 @@ static unsigned char mincore_swap(swp_entry_t entry, bool shmem)
 	if (!softleaf_is_swap(entry))
 		return !shmem;
 
-	/*
-	 * Shmem mapping lookup is lockless, so we need to grab the swap
-	 * device. mincore page table walk locks the PTL, and the swap
-	 * device is stable, avoid touching the si for better performance.
-	 */
-	if (shmem) {
-		si = get_swap_device(entry);
-		if (!si)
-			return 0;
-	}
+	if (!tryget_swap_entry(entry, &si))
+		return 0;
 	folio = swap_cache_get_folio(entry);
-	if (shmem)
-		put_swap_device(si);
+	put_swap_entry(entry, si);
 	/* The swap cache space contains either folio, shadow or NULL */
 	if (folio && !xa_is_value(folio)) {
 		present = folio_test_uptodate(folio);
diff --git a/mm/shmem.c b/mm/shmem.c
index 1db97ef2d14eb..b40be22fa5f09 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2307,7 +2307,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	softleaf_t index_entry;
 	struct swap_info_struct *si;
 	struct folio *folio = NULL;
-	bool skip_swapcache = false;
+	bool swapoff_locked, skip_swapcache = false;
 	int error, nr_pages, order;
 	pgoff_t offset;
 
@@ -2319,16 +2319,16 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	if (softleaf_is_poison_marker(index_entry))
 		return -EIO;
 
-	si = get_swap_device(index_entry);
+	swapoff_locked = tryget_swap_entry(index_entry, &si);
 	order = shmem_confirm_swap(mapping, index, index_entry);
-	if (unlikely(!si)) {
+	if (unlikely(!swapoff_locked)) {
 		if (order < 0)
 			return -EEXIST;
 		else
 			return -EINVAL;
 	}
 	if (unlikely(order < 0)) {
-		put_swap_device(si);
+		put_swap_entry(index_entry, si);
 		return -EEXIST;
 	}
 
@@ -2448,7 +2448,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	}
 	folio_mark_dirty(folio);
 	swap_free_nr(swap, nr_pages);
-	put_swap_device(si);
+	put_swap_entry(swap, si);
 
 	*foliop = folio;
 	return 0;
@@ -2466,7 +2466,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		swapcache_clear(si, folio->swap, folio_nr_pages(folio));
 	if (folio)
 		folio_put(folio);
-	put_swap_device(si);
+	put_swap_entry(swap, si);
 
 	return error;
 }
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 34c9d9b243a74..bece18eb540fa 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -538,8 +538,7 @@ struct folio *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 	pgoff_t ilx;
 	struct folio *folio;
 
-	si = get_swap_device(entry);
-	if (!si)
+	if (!tryget_swap_entry(entry, &si))
 		return NULL;
 
 	mpol = get_vma_policy(vma, addr, 0, &ilx);
@@ -550,7 +549,7 @@ struct folio *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 	if (page_allocated)
 		swap_read_folio(folio, plug);
 
-	put_swap_device(si);
+	put_swap_entry(entry, si);
 	return folio;
 }
 
@@ -763,6 +762,7 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 	for (addr = start; addr < end; ilx++, addr += PAGE_SIZE) {
 		struct swap_info_struct *si = NULL;
 		softleaf_t entry;
+		bool swapoff_locked = false;
 
 		if (!pte++) {
 			pte = pte_offset_map(vmf->pmd, addr);
@@ -781,14 +781,14 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 		 * holding a reference to, try to grab a reference, or skip.
 		 */
 		if (swp_type(entry) != swp_type(targ_entry)) {
-			si = get_swap_device(entry);
-			if (!si)
+			swapoff_locked = tryget_swap_entry(entry, &si);
+			if (!swapoff_locked)
 				continue;
 		}
 		folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
 						&page_allocated, false);
-		if (si)
-			put_swap_device(si);
+		if (swapoff_locked)
+			put_swap_entry(entry, si);
 		if (!folio)
 			continue;
 		if (page_allocated) {
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index e6dfd5f28acd7..25f89eba0438c 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1262,9 +1262,11 @@ static long move_pages_ptes(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd
 	pte_t *dst_pte = NULL;
 	pmd_t dummy_pmdval;
 	pmd_t dst_pmdval;
+	softleaf_t entry;
 	struct folio *src_folio = NULL;
 	struct mmu_notifier_range range;
 	long ret = 0;
+	bool swapoff_locked = false;
 
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm,
 				src_addr, src_addr + len);
@@ -1429,7 +1431,7 @@ static long move_pages_ptes(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd
 				len);
 	} else { /* !pte_present() */
 		struct folio *folio = NULL;
-		const softleaf_t entry = softleaf_from_pte(orig_src_pte);
+		entry = softleaf_from_pte(orig_src_pte);
 
 		if (softleaf_is_migration(entry)) {
 			pte_unmap(src_pte);
@@ -1449,8 +1451,8 @@ static long move_pages_ptes(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd
 			goto out;
 		}
 
-		si = get_swap_device(entry);
-		if (unlikely(!si)) {
+		swapoff_locked = tryget_swap_entry(entry, &si);
+		if (unlikely(!swapoff_locked)) {
 			ret = -EAGAIN;
 			goto out;
 		}
@@ -1480,8 +1482,9 @@ static long move_pages_ptes(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd
 			pte_unmap(src_pte);
 			pte_unmap(dst_pte);
 			src_pte = dst_pte = NULL;
-			put_swap_device(si);
+			put_swap_entry(entry, si);
 			si = NULL;
+			swapoff_locked = false;
 			/* now we can block and wait */
 			folio_lock(src_folio);
 			goto retry;
@@ -1507,8 +1510,8 @@ static long move_pages_ptes(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd
 	if (dst_pte)
 		pte_unmap(dst_pte);
 	mmu_notifier_invalidate_range_end(&range);
-	if (si)
-		put_swap_device(si);
+	if (swapoff_locked)
+		put_swap_entry(entry, si);
 
 	return ret;
 }
diff --git a/mm/zswap.c b/mm/zswap.c
index ac9b7a60736bc..315e4d0d08311 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1009,14 +1009,13 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	int ret = 0;
 
 	/* try to allocate swap cache folio */
-	si = get_swap_device(swpentry);
-	if (!si)
+	if (!tryget_swap_entry(swpentry, &si))
 		return -EEXIST;
 
 	mpol = get_task_policy(current);
 	folio = __read_swap_cache_async(swpentry, GFP_KERNEL, mpol,
 			NO_INTERLEAVE_INDEX, &folio_was_allocated, true);
-	put_swap_device(si);
+	put_swap_entry(swpentry, si);
 	if (!folio)
 		return -ENOMEM;
 
-- 
2.52.0
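The contract that tryget_swap_entry() and put_swap_entry() wrap can be modelled outside the kernel. The block below is a minimal userspace sketch, not kernel code. struct model_swap_info, model_entry_t and every model_*() function are hypothetical, and C11 atomics stand in for the per-cpu reference count that get_swap_device() takes in current kernels. The model covers three behaviors:

- A reference taken before swapoff starts holds swapoff off until it is dropped.
- A tryget issued after swapoff starts fails.
- Callers use the idiom from this patch: they act on the returned bool rather than on the si pointer.

```c
/*
 * Hypothetical userspace model of the swapoff lockout contract wrapped by
 * tryget_swap_entry()/put_swap_entry(). Not kernel code: every model_*
 * name is invented, and C11 atomics stand in for the kernel's per-cpu
 * reference count.
 */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct swap_info_struct. */
struct model_swap_info {
	atomic_int users;   /* lockout references currently held */
	atomic_bool dying;  /* set once swapoff has started */
};

/* Hypothetical stand-in for swp_entry_t; dev may be NULL. */
typedef struct {
	struct model_swap_info *dev;
} model_entry_t;

static void model_init(struct model_swap_info *si)
{
	atomic_init(&si->users, 0);
	atomic_init(&si->dying, false);
}

/*
 * Same shape as tryget_swap_entry(): return success as a bool and pass
 * the device (or NULL) back through an optional out-parameter.
 */
static bool model_tryget(model_entry_t entry, struct model_swap_info **sip)
{
	struct model_swap_info *si = entry.dev;

	if (si) {
		/* Publish our reference first, then check for swapoff. */
		atomic_fetch_add(&si->users, 1);
		if (atomic_load(&si->dying)) {
			/* Lost the race with swapoff: back out. */
			atomic_fetch_sub(&si->users, 1);
			si = NULL;
		}
	}
	if (sip)
		*sip = si;
	return si != NULL;
}

/* Same shape as put_swap_entry(): the entry is unused for now. */
static void model_put(model_entry_t entry, struct model_swap_info *si)
{
	(void)entry;
	assert(atomic_load(&si->users) > 0);
	atomic_fetch_sub(&si->users, 1);
}

/*
 * Swapoff side: refuse new references, then report whether all old ones
 * are gone. A real swapoff would sleep until they are; this model polls.
 */
static bool model_swapoff_try_finish(struct model_swap_info *si)
{
	atomic_store(&si->dying, true);
	return atomic_load(&si->users) == 0;
}

/* Caller idiom from the patch: act on the bool, not on the pointer. */
static int model_lookup(model_entry_t entry)
{
	struct model_swap_info *si = NULL;
	bool swapoff_locked = model_tryget(entry, &si);

	if (!swapoff_locked)
		return -1;
	/* ... swap metadata for entry may be read safely here ... */
	model_put(entry, si);
	return 0;
}
```

Suppose a reference is taken with model_tryget(). model_swapoff_try_finish() then keeps returning false until the matching model_put(). Any model_tryget() issued after swapoff has started fails.

The increment-then-check order in model_tryget() depends on the default sequentially consistent atomics. Under that ordering, either the reader sees `dying` or swapoff sees the reader's reference. The kernel gets the equivalent guarantee from its percpu_ref: percpu_ref_tryget_live() on the get side and percpu_ref_kill() in swapoff.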