From: Kairui Song via B4 Relay
Date: Fri, 20 Feb 2026 07:42:06 +0800
Subject: [PATCH RFC 05/15] mm, swap: unify large folio allocation
Message-Id: <20260220-swap-table-p4-v1-5-104795d19815@tencent.com>
References: <20260220-swap-table-p4-v1-0-104795d19815@tencent.com>
In-Reply-To: <20260220-swap-table-p4-v1-0-104795d19815@tencent.com>
To: linux-mm@kvack.org
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, Zi Yan, Baolin Wang,
 Barry Song, Hugh Dickins, Chris Li, Kemeng Shi, Nhat Pham, Baoquan He,
 Johannes Weiner, Yosry Ahmed, Youngjun Park, Chengming Zhou,
 Roman Gushchin, Shakeel Butt, Muchun Song, Qi Zheng,
 linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Kairui Song
Reply-To: kasong@tencent.com
From: Kairui Song

Large order allocation is now supported in the swap cache, so make both
anon and shmem use it instead of implementing their own
different methods.

Signed-off-by: Kairui Song
---
 mm/memory.c     |  77 +++++---------------------
 mm/shmem.c      |  94 ++++++------------------------
 mm/swap.h       |  30 ++---------
 mm/swap_state.c | 163 ++++++++++++----------------------------------------
 mm/swapfile.c   |   3 +-
 5 files changed, 76 insertions(+), 291 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 21bf2517fbce..e58f976508b3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4520,26 +4520,6 @@ static vm_fault_t handle_pte_marker(struct vm_fault *vmf)
         return VM_FAULT_SIGBUS;
 }
 
-static struct folio *__alloc_swap_folio(struct vm_fault *vmf)
-{
-        struct vm_area_struct *vma = vmf->vma;
-        struct folio *folio;
-        softleaf_t entry;
-
-        folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vmf->address);
-        if (!folio)
-                return NULL;
-
-        entry = softleaf_from_pte(vmf->orig_pte);
-        if (mem_cgroup_swapin_charge_folio(folio, vma->vm_mm,
-                                           GFP_KERNEL, entry)) {
-                folio_put(folio);
-                return NULL;
-        }
-
-        return folio;
-}
-
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 /*
  * Check if the PTEs within a range are contiguous swap entries
@@ -4569,8 +4549,6 @@ static bool can_swapin_thp(struct vm_fault *vmf, pte_t *ptep, int nr_pages)
          */
         if (unlikely(swap_zeromap_batch(entry, nr_pages, NULL) != nr_pages))
                 return false;
-        if (unlikely(non_swapcache_batch(entry, nr_pages) != nr_pages))
-                return false;
 
         return true;
 }
@@ -4598,16 +4576,14 @@ static inline unsigned long thp_swap_suitable_orders(pgoff_t swp_offset,
         return orders;
 }
 
-static struct folio *alloc_swap_folio(struct vm_fault *vmf)
+static unsigned long thp_swapin_suitable_orders(struct vm_fault *vmf)
 {
         struct vm_area_struct *vma = vmf->vma;
         unsigned long orders;
-        struct folio *folio;
         unsigned long addr;
         softleaf_t entry;
         spinlock_t *ptl;
         pte_t *pte;
-        gfp_t gfp;
         int order;
 
         /*
@@ -4615,7 +4591,7 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
          * maintain the uffd semantics.
          */
         if (unlikely(userfaultfd_armed(vma)))
-                goto fallback;
+                return 0;
 
         /*
          * A large swapped out folio could be partially or fully in zswap. We
@@ -4623,7 +4599,7 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
          * folio.
          */
         if (!zswap_never_enabled())
-                goto fallback;
+                return 0;
 
         entry = softleaf_from_pte(vmf->orig_pte);
         /*
@@ -4637,12 +4613,12 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
                                            vmf->address, orders);
 
         if (!orders)
-                goto fallback;
+                return 0;
 
         pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd,
                                   vmf->address & PMD_MASK, &ptl);
         if (unlikely(!pte))
-                goto fallback;
+                return 0;
 
         /*
          * For do_swap_page, find the highest order where the aligned range is
@@ -4658,29 +4634,12 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
 
         pte_unmap_unlock(pte, ptl);
 
-        /* Try allocating the highest of the remaining orders. */
-        gfp = vma_thp_gfp_mask(vma);
-        while (orders) {
-                addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
-                folio = vma_alloc_folio(gfp, order, vma, addr);
-                if (folio) {
-                        if (!mem_cgroup_swapin_charge_folio(folio, vma->vm_mm,
-                                                            gfp, entry))
-                                return folio;
-                        count_mthp_stat(order, MTHP_STAT_SWPIN_FALLBACK_CHARGE);
-                        folio_put(folio);
-                }
-                count_mthp_stat(order, MTHP_STAT_SWPIN_FALLBACK);
-                order = next_order(&orders, order);
-        }
-
-fallback:
-        return __alloc_swap_folio(vmf);
+        return orders;
 }
 #else /* !CONFIG_TRANSPARENT_HUGEPAGE */
-static struct folio *alloc_swap_folio(struct vm_fault *vmf)
+static unsigned long thp_swapin_suitable_orders(struct vm_fault *vmf)
 {
-        return __alloc_swap_folio(vmf);
+        return 0;
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
@@ -4785,21 +4744,13 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
         if (folio)
                 swap_update_readahead(folio, vma, vmf->address);
         if (!folio) {
-                if (data_race(si->flags & SWP_SYNCHRONOUS_IO)) {
-                        folio = alloc_swap_folio(vmf);
-                        if (folio) {
-                                /*
-                                 * folio is charged, so swapin can only fail due
-                                 * to raced swapin and return NULL.
-                                 */
-                                swapcache = swapin_folio(entry, folio);
-                                if (swapcache != folio)
-                                        folio_put(folio);
-                                folio = swapcache;
-                        }
-                } else {
+                /* Swapin bypasses readahead for SWP_SYNCHRONOUS_IO devices */
+                if (data_race(si->flags & SWP_SYNCHRONOUS_IO))
+                        folio = swapin_entry(entry, GFP_HIGHUSER_MOVABLE,
+                                             thp_swapin_suitable_orders(vmf),
+                                             vmf, NULL, 0);
+                else
                         folio = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE, vmf);
-                }
 
                 if (!folio) {
                         /*
diff --git a/mm/shmem.c b/mm/shmem.c
index 9f054b5aae8e..0a19ac82ec77 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -159,7 +159,7 @@ static unsigned long shmem_default_max_inodes(void)
 
 static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
                               struct folio **foliop, enum sgp_type sgp, gfp_t gfp,
-                              struct vm_area_struct *vma, vm_fault_t *fault_type);
+                              struct vm_fault *vmf, vm_fault_t *fault_type);
 
 static inline struct shmem_sb_info *SHMEM_SB(struct super_block *sb)
 {
@@ -2014,68 +2014,24 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
 }
 
 static struct folio *shmem_swap_alloc_folio(struct inode *inode,
-                struct vm_area_struct *vma, pgoff_t index,
+                struct vm_fault *vmf, pgoff_t index,
                 swp_entry_t entry, int order, gfp_t gfp)
 {
+        pgoff_t ilx;
+        struct folio *folio;
+        struct mempolicy *mpol;
+        unsigned long orders = BIT(order);
         struct shmem_inode_info *info = SHMEM_I(inode);
-        struct folio *new, *swapcache;
-        int nr_pages = 1 << order;
-        gfp_t alloc_gfp = gfp;
-
-        if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
-                if (WARN_ON_ONCE(order))
-                        return ERR_PTR(-EINVAL);
-        } else if (order) {
-                /*
-                 * If uffd is active for the vma, we need per-page fault
-                 * fidelity to maintain the uffd semantics, then fallback
-                 * to swapin order-0 folio, as well as for zswap case.
-                 * Any existing sub folio in the swap cache also blocks
-                 * mTHP swapin.
-                 */
-                if ((vma && unlikely(userfaultfd_armed(vma))) ||
-                    !zswap_never_enabled() ||
-                    non_swapcache_batch(entry, nr_pages) != nr_pages)
-                        goto fallback;
 
-                alloc_gfp = thp_limit_gfp_mask(vma_thp_gfp_mask(vma), gfp);
-        }
-retry:
-        new = shmem_alloc_folio(alloc_gfp, order, info, index);
-        if (!new) {
-                new = ERR_PTR(-ENOMEM);
-                goto fallback;
-        }
+        if ((vmf && unlikely(userfaultfd_armed(vmf->vma))) ||
+            !zswap_never_enabled())
+                orders = 0;
 
-        if (mem_cgroup_swapin_charge_folio(new, vma ? vma->vm_mm : NULL,
-                                           alloc_gfp, entry)) {
-                folio_put(new);
-                new = ERR_PTR(-ENOMEM);
-                goto fallback;
-        }
+        mpol = shmem_get_pgoff_policy(info, index, order, &ilx);
+        folio = swapin_entry(entry, gfp, orders, vmf, mpol, ilx);
+        mpol_cond_put(mpol);
 
-        swapcache = swapin_folio(entry, new);
-        if (swapcache != new) {
-                folio_put(new);
-                if (!swapcache) {
-                        /*
-                         * The new folio is charged already, swapin can
-                         * only fail due to another raced swapin.
-                         */
-                        new = ERR_PTR(-EEXIST);
-                        goto fallback;
-                }
-        }
-        return swapcache;
-fallback:
-        /* Order 0 swapin failed, nothing to fallback to, abort */
-        if (!order)
-                return new;
-        entry.val += index - round_down(index, nr_pages);
-        alloc_gfp = gfp;
-        nr_pages = 1;
-        order = 0;
-        goto retry;
+        return folio;
 }
 
 /*
@@ -2262,11 +2218,12 @@ static int shmem_split_large_entry(struct inode *inode, pgoff_t index,
  */
 static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
                               struct folio **foliop, enum sgp_type sgp,
-                              gfp_t gfp, struct vm_area_struct *vma,
+                              gfp_t gfp, struct vm_fault *vmf,
                               vm_fault_t *fault_type)
 {
         struct address_space *mapping = inode->i_mapping;
-        struct mm_struct *fault_mm = vma ? vma->vm_mm : NULL;
+        struct vm_area_struct *vma = vmf ? vmf->vma : NULL;
+        struct mm_struct *fault_mm = vmf ?
+                        vmf->vma->vm_mm : NULL;
         struct shmem_inode_info *info = SHMEM_I(inode);
         swp_entry_t swap;
         softleaf_t index_entry;
@@ -2307,20 +2264,15 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
         if (!folio) {
                 if (data_race(si->flags & SWP_SYNCHRONOUS_IO)) {
                         /* Direct swapin skipping swap cache & readahead */
-                        folio = shmem_swap_alloc_folio(inode, vma, index,
-                                                       index_entry, order, gfp);
-                        if (IS_ERR(folio)) {
-                                error = PTR_ERR(folio);
-                                folio = NULL;
-                                goto failed;
-                        }
+                        folio = shmem_swap_alloc_folio(inode, vmf, index,
+                                                       swap, order, gfp);
                 } else {
                         /* Cached swapin only supports order 0 folio */
                         folio = shmem_swapin_cluster(swap, gfp, info, index);
-                        if (!folio) {
-                                error = -ENOMEM;
-                                goto failed;
-                        }
+                }
+                if (!folio) {
+                        error = -ENOMEM;
+                        goto failed;
                 }
                 if (fault_type) {
                         *fault_type |= VM_FAULT_MAJOR;
@@ -2468,7 +2420,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 
         if (xa_is_value(folio)) {
                 error = shmem_swapin_folio(inode, index, &folio,
-                                           sgp, gfp, vma, fault_type);
+                                           sgp, gfp, vmf, fault_type);
                 if (error == -EEXIST)
                         goto repeat;
 
diff --git a/mm/swap.h b/mm/swap.h
index 6774af10a943..80c2f1bf7a57 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -300,7 +300,8 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t flag,
                                      struct mempolicy *mpol, pgoff_t ilx);
 struct folio *swapin_readahead(swp_entry_t entry, gfp_t flag,
                                struct vm_fault *vmf);
-struct folio *swapin_folio(swp_entry_t entry, struct folio *folio);
+struct folio *swapin_entry(swp_entry_t entry, gfp_t flag, unsigned long orders,
+                           struct vm_fault *vmf, struct mempolicy *mpol, pgoff_t ilx);
 void swap_update_readahead(struct folio *folio, struct vm_area_struct *vma,
                            unsigned long addr);
 
@@ -334,24 +335,6 @@ static inline int swap_zeromap_batch(swp_entry_t entry, int max_nr,
         return find_next_bit(sis->zeromap, end, start) - start;
 }
 
-static inline int non_swapcache_batch(swp_entry_t entry, int max_nr)
-{
-        int i;
-
-        /*
-         * While allocating a large folio and doing mTHP swapin, we need to
-         * ensure all entries are not cached, otherwise, the mTHP folio will
-         * be in conflict with the folio in swap cache.
-         */
-        for (i = 0; i < max_nr; i++) {
-                if (swap_cache_has_folio(entry))
-                        return i;
-                entry.val++;
-        }
-
-        return i;
-}
-
 #else /* CONFIG_SWAP */
 struct swap_iocb;
 static inline struct swap_cluster_info *swap_cluster_lock(
@@ -433,7 +416,9 @@ static inline struct folio *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask,
         return NULL;
 }
 
-static inline struct folio *swapin_folio(swp_entry_t entry, struct folio *folio)
+static inline struct folio *swapin_entry(
+                swp_entry_t entry, gfp_t flag, unsigned long orders,
+                struct vm_fault *vmf, struct mempolicy *mpol, pgoff_t ilx)
 {
         return NULL;
 }
@@ -493,10 +478,5 @@ static inline int swap_zeromap_batch(swp_entry_t entry, int max_nr,
 {
         return 0;
 }
-
-static inline int non_swapcache_batch(swp_entry_t entry, int max_nr)
-{
-        return 0;
-}
 #endif /* CONFIG_SWAP */
 #endif /* _MM_SWAP_H */
diff --git a/mm/swap_state.c b/mm/swap_state.c
index e32b06a1f229..0a2a4e084cf2 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -199,43 +199,6 @@ void __swap_cache_add_folio(struct swap_cluster_info *ci,
         lruvec_stat_mod_folio(folio, NR_SWAPCACHE, nr_pages);
 }
 
-/**
- * swap_cache_add_folio - Add a folio into the swap cache.
- * @folio: The folio to be added.
- * @entry: The swap entry corresponding to the folio.
- * @gfp: gfp_mask for XArray node allocation.
- * @shadowp: If a shadow is found, return the shadow.
- *
- * Context: Caller must ensure @entry is valid and protect the swap device
- * with reference count or locks.
- */
-static int swap_cache_add_folio(struct folio *folio, swp_entry_t entry,
-                                void **shadowp)
-{
-        int err;
-        void *shadow = NULL;
-        unsigned int ci_off;
-        struct swap_info_struct *si;
-        struct swap_cluster_info *ci;
-        unsigned long nr_pages = folio_nr_pages(folio);
-
-        si = __swap_entry_to_info(entry);
-        ci = swap_cluster_lock(si, swp_offset(entry));
-        ci_off = swp_cluster_offset(entry);
-        err = __swap_cache_check_batch(ci, ci_off, ci_off, nr_pages, &shadow);
-        if (err) {
-                swap_cluster_unlock(ci);
-                return err;
-        }
-
-        __swap_cache_add_folio(ci, folio, entry);
-        swap_cluster_unlock(ci);
-        if (shadowp)
-                *shadowp = shadow;
-
-        return 0;
-}
-
 static struct folio *__swap_cache_alloc(struct swap_cluster_info *ci,
                                         swp_entry_t targ_entry, gfp_t gfp,
                                         unsigned int order, struct vm_fault *vmf,
@@ -328,30 +291,28 @@ struct folio *swap_cache_alloc_folio(swp_entry_t targ_entry, gfp_t gfp_mask,
                                      unsigned long orders, struct vm_fault *vmf,
                                      struct mempolicy *mpol, pgoff_t ilx)
 {
-        int order;
+        int order, err;
         struct folio *folio;
         struct swap_cluster_info *ci;
 
+        /* Always allow order 0 so swap won't fail under pressure. */
+        order = orders ? highest_order(orders |= BIT(0)) : 0;
         ci = __swap_entry_to_cluster(targ_entry);
-        order = orders ? highest_order(orders) : 0;
         for (;;) {
                 folio = __swap_cache_alloc(ci, targ_entry, gfp_mask, order,
                                            vmf, mpol, ilx);
                 if (!IS_ERR(folio))
                         return folio;
-                if (PTR_ERR(folio) == -EAGAIN)
+                err = PTR_ERR(folio);
+                if (err == -EAGAIN)
                         continue;
-                /* Only -EBUSY means we should fallback and retry. */
-                if (PTR_ERR(folio) != -EBUSY)
-                        return folio;
+                if (!order || (err != -EBUSY && err != -ENOMEM))
+                        break;
                 count_mthp_stat(order, MTHP_STAT_SWPIN_FALLBACK);
                 order = next_order(&orders, order);
-                if (!orders)
-                        break;
         }
-        /* Should never reach here, order 0 should not fail with -EBUSY.
-         */
-        WARN_ON_ONCE(1);
-        return ERR_PTR(-EINVAL);
+
+        return ERR_PTR(err);
 }
 
 /**
@@ -584,51 +545,6 @@ void swap_update_readahead(struct folio *folio, struct vm_area_struct *vma,
         }
 }
 
-/**
- * __swap_cache_prepare_and_add - Prepare the folio and add it to swap cache.
- * @entry: swap entry to be bound to the folio.
- * @folio: folio to be added.
- * @gfp: memory allocation flags for charge, can be 0 if @charged if true.
- * @charged: if the folio is already charged.
- *
- * Update the swap_map and add folio as swap cache, typically before swapin.
- * All swap slots covered by the folio must have a non-zero swap count.
- *
- * Context: Caller must protect the swap device with reference count or locks.
- * Return: 0 if success, error code if failed.
- */
-static int __swap_cache_prepare_and_add(swp_entry_t entry,
-                                        struct folio *folio,
-                                        gfp_t gfp, bool charged)
-{
-        void *shadow;
-        int ret;
-
-        __folio_set_locked(folio);
-        __folio_set_swapbacked(folio);
-        ret = swap_cache_add_folio(folio, entry, &shadow);
-        if (ret)
-                goto failed;
-
-        if (!charged && mem_cgroup_swapin_charge_folio(folio, NULL, gfp, entry)) {
-                swap_cache_del_folio(folio);
-                ret = -ENOMEM;
-                goto failed;
-        }
-
-        memcg1_swapin(entry, folio_nr_pages(folio));
-        if (shadow)
-                workingset_refault(folio, shadow);
-
-        /* Caller will initiate read into locked folio */
-        folio_add_lru(folio);
-        return 0;
-
-failed:
-        folio_unlock(folio);
-        return ret;
-}
-
 static struct folio *swap_cache_read_folio(swp_entry_t entry, gfp_t gfp,
                 struct mempolicy *mpol, pgoff_t ilx,
                 struct swap_iocb **plug, bool readahead)
@@ -649,7 +565,6 @@ static struct folio *swap_cache_read_folio(swp_entry_t entry, gfp_t gfp,
                 folio = swap_cache_get_folio(entry);
                 if (folio)
                         return folio;
-                folio = swap_cache_alloc_folio(entry, gfp, 0, NULL, mpol, ilx);
         } while (PTR_ERR(folio) == -EEXIST);
 
@@ -666,49 +581,37 @@ static struct folio *swap_cache_read_folio(swp_entry_t entry, gfp_t gfp,
 }
 
 /**
- * swapin_folio -
- *   swap-in one or multiple entries skipping readahead.
- * @entry: starting swap entry to swap in
- * @folio: a new allocated and charged folio
+ * swapin_entry - swap-in one or multiple entries skipping readahead.
+ * @entry: swap entry indicating the target slot
+ * @gfp: memory allocation flags
+ * @orders: allowed allocation orders
+ * @vmf: fault information
+ * @mpol: NUMA memory allocation policy to be applied
+ * @ilx: NUMA interleave index, for use only when MPOL_INTERLEAVE
  *
- * Reads @entry into @folio, @folio will be added to the swap cache.
- * If @folio is a large folio, the @entry will be rounded down to align
- * with the folio size.
+ * This allocates a folio of an order suitable given @orders, or returns the
+ * existing folio in the swap cache for @entry, and initiates the IO if needed.
+ * @entry may be rounded down if @orders allows a large allocation.
  *
- * Return: returns pointer to @folio on success. If folio is a large folio
- * and this raced with another swapin, NULL will be returned to allow fallback
- * to order 0. Else, if another folio was already added to the swap cache,
- * return that swap cache folio instead.
+ * Context: Caller must ensure @entry is valid and pin the swap device
+ * with a reference count.
+ * Return: The folio on success, or NULL on failure.
  */
-struct folio *swapin_folio(swp_entry_t entry, struct folio *folio)
+struct folio *swapin_entry(swp_entry_t entry, gfp_t gfp, unsigned long orders,
+                           struct vm_fault *vmf, struct mempolicy *mpol, pgoff_t ilx)
 {
-        int ret;
-        struct folio *swapcache;
-        pgoff_t offset = swp_offset(entry);
-        unsigned long nr_pages = folio_nr_pages(folio);
-
-        entry = swp_entry(swp_type(entry), round_down(offset, nr_pages));
-        for (;;) {
-                ret = __swap_cache_prepare_and_add(entry, folio, 0, true);
-                if (!ret) {
-                        swap_read_folio(folio, NULL);
-                        break;
-                }
+        struct folio *folio;
 
-                /*
-                 * Large order allocation needs special handling on
-                 * race: if a smaller folio exists in cache, swapin needs
-                 * to fallback to order 0, and doing a swap cache lookup
-                 * might return a folio that is irrelevant to the faulting
-                 * entry because @entry is aligned down. Just return NULL.
-                 */
-                if (ret != -EEXIST || nr_pages > 1)
-                        return NULL;
+        do {
+                folio = swap_cache_get_folio(entry);
+                if (folio)
+                        return folio;
+                folio = swap_cache_alloc_folio(entry, gfp, orders, vmf, mpol, ilx);
+        } while (PTR_ERR(folio) == -EEXIST);
 
-                swapcache = swap_cache_get_folio(entry);
-                if (swapcache)
-                        return swapcache;
-        }
+        if (IS_ERR(folio))
+                return NULL;
 
+        swap_read_folio(folio, NULL);
         return folio;
 }
 
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 06b37efad2bd..7e7614a5181a 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1833,8 +1833,7 @@ void folio_put_swap(struct folio *folio, struct page *subpage)
 * do_swap_page()
 *   ...                        swapoff+swapon
 *   swap_cache_alloc_folio()
-*     swap_cache_add_folio()
-*     // check swap_map
+*     // check swap_map
 *     // verify PTE not changed
 *
 * In __swap_duplicate(), the swap_map need to be checked before
-- 
2.53.0