From: Kairui Song <kasong@tencent.com>
The current swap entry allocation/freeing workflow has never had a clear
definition. This makes it hard to debug or add new optimizations.

This commit introduces a proper definition of how swap entries are
allocated and freed. Most operations are now folio based, so they will
never exceed one swap cluster, and we now have a cleaner boundary between
swap and the rest of mm, making it much easier to follow and debug,
especially with the newly added sanity checks. This also makes more
optimizations possible.

Swap entries will mostly be allocated and freed with a folio bound.
The folio lock is useful for resolving many swap-related races.

Now swap allocation (except for hibernation) always starts with a folio in
the swap cache, and entries get duped/freed under the protection of the
folio lock (a rough lifecycle sketch follows this list):

- folio_alloc_swap() - The only allocation entry point now.
Context: The folio must be locked.
This allocates one or a set of contiguous swap slots for a folio and
binds them to the folio by adding the folio to the swap cache. The
swap slots' swap count starts at zero.

- folio_dup_swap() - Increase the swap count of one or more entries.
Context: The folio must be locked and in the swap cache. For now, the
caller still has to lock the new swap entry owner (e.g., PTL).
This increases the ref count of swap entries allocated to a folio.
Newly allocated swap slots' count has to be increased by this helper
as the folio gets unmapped (and swap entries get installed).

- folio_put_swap() - Decrease the swap count of one or more entries.
Context: The folio must be locked and in the swap cache. For now, the
caller still has to lock the swap entry owner (e.g., PTL).
This decreases the ref count of swap entries allocated to a folio.
Typically, swapin will decrease the swap count as the folio gets
installed back and the swap entry gets uninstalled.

This won't remove the folio from the swap cache or free the
slot. Lazy freeing of the swap cache is helpful for reducing IO.
There is already a folio_free_swap() for immediate cache reclaim.
This part could be further optimized later.

The above locking constraints could be further relaxed once the swap
table is fully implemented. Currently, dup still needs the caller
to lock the swap entry container (e.g. PTL), or a concurrent zap
may underflow the swap count.

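To illustrate the intended folio-bound lifecycle (this sketch is not part
of the patch; error handling, the actual callers, and the real locking
helpers are omitted):

  /* swap-out: caller holds the folio lock */
  folio_alloc_swap(folio);        /* slots allocated with count == 0, folio added to swap cache */

  /* for each PTE that gets replaced by a swap entry, under the PTL */
  folio_dup_swap(folio, subpage); /* count goes 0 -> 1 */

  /* swap-in: folio locked and still in the swap cache, under the PTL */
  folio_put_swap(folio, subpage); /* count drops back, slot still pinned by the swap cache */
  folio_free_swap(folio);         /* optional: reclaim the swap cache and free the slots */
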
Some swap users need to interact with the swap count without involving a
folio (e.g. forking/zapping the page table, or mapping truncation without
swapin). In such cases, the caller has to ensure there is no race
condition on whatever owns the swap count and use the helpers below
(a short usage sketch follows this list):

- swap_put_entries_direct() - Decrease the swap count directly.
Context: The caller must lock whatever is referencing the slots to
avoid a race.

Typically, page table zapping or shmem mapping truncation will need
to free swap slots directly. If a slot is cached (has a folio bound),
this will also try to release the swap cache.

- swap_dup_entry_direct() - Increase the swap count directly.
Context: The caller must lock whatever is referencing the entries to
avoid a race, and the entries must already have a swap count > 1.

Typically, forking will need to copy the page table and hence needs to
increase the swap count of the entries in the table. The page table is
locked while referencing the swap entries, so the entries all have a
swap count > 1 and can't be freed.

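A rough usage sketch of the direct helpers, taken from the
copy_nonpresent_pte() and zap_nonpresent_ptes() changes in this patch
with everything else stripped (the surrounding page table walk and error
paths are assumed, not shown):

  /* fork: PTL held, the PTE itself holds a reference so the entry can't be freed */
  if (swap_dup_entry_direct(entry) < 0)
          return -EIO;    /* a swap count continuation could not be allocated */

  /* zap / truncate: owner locked, drop nr references and maybe the swap cache too */
  swap_put_entries_direct(entry, nr);
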
The hibernation subsystem is a bit different, so two special wrappers are
provided:

- swap_alloc_hibernation_slot() - Allocate one entry from one device.
- swap_free_hibernation_slot() - Free one entry allocated by the above
helper.

All hibernation entries are exclusive to the hibernation subsystem and
should not interact with ordinary swap routines.

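For example (sketch only, mirroring the kernel/power/swap.c changes in
this patch), hibernation pairs the two wrappers directly and never goes
through the swap cache:

  entry = swap_alloc_hibernation_slot(type);  /* exclusive slot on the given device */
  ...
  swap_free_hibernation_slot(entry);          /* must pair with the allocation above */
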
By separating the workflows, it will be possible to bind folios more
tightly with the swap cache and get rid of SWAP_HAS_CACHE as a temporary
pin.

This commit should not introduce any behavior change.

Signed-off-by: Kairui Song <kasong@tencent.com>
---
arch/s390/mm/pgtable.c | 2 +-
include/linux/swap.h | 58 +++++++++----------
kernel/power/swap.c | 10 ++--
mm/madvise.c | 2 +-
mm/memory.c | 15 +++--
mm/rmap.c | 7 ++-
mm/shmem.c | 10 ++--
mm/swap.h | 37 +++++++++++++
mm/swapfile.c | 148 ++++++++++++++++++++++++++++++++++---------------
9 files changed, 192 insertions(+), 97 deletions(-)
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 0fde20bbc50b..c51304a4418e 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -692,7 +692,7 @@ static void ptep_zap_swap_entry(struct mm_struct *mm, swp_entry_t entry)
dec_mm_counter(mm, mm_counter(folio));
}
- free_swap_and_cache(entry);
+ swap_put_entries_direct(entry, 1);
}
void ptep_zap_unused(struct mm_struct *mm, unsigned long addr,
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 69025b473472..ac3caa4c6999 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -452,14 +452,8 @@ static inline long get_nr_swap_pages(void)
}
extern void si_swapinfo(struct sysinfo *);
-int folio_alloc_swap(struct folio *folio);
-bool folio_free_swap(struct folio *folio);
void put_swap_folio(struct folio *folio, swp_entry_t entry);
-extern swp_entry_t get_swap_page_of_type(int);
extern int add_swap_count_continuation(swp_entry_t, gfp_t);
-extern int swap_duplicate_nr(swp_entry_t entry, int nr);
-extern void swap_free_nr(swp_entry_t entry, int nr_pages);
-extern void free_swap_and_cache_nr(swp_entry_t entry, int nr);
int swap_type_of(dev_t device, sector_t offset);
int find_first_swap(dev_t *device);
extern unsigned int count_swap_pages(int, int);
@@ -472,6 +466,29 @@ struct backing_dev_info;
extern struct swap_info_struct *get_swap_device(swp_entry_t entry);
sector_t swap_folio_sector(struct folio *folio);
+/*
+ * If there is an existing swap slot reference (swap entry) and the caller
+ * guarantees that there is no racing modification of it (e.g., the PTL
+ * protecting the swap entry in a page table; shmem's cmpxchg protecting
+ * the swap entry in the shmem mapping), the two helpers below can be used
+ * to put/dup the entries directly.
+ *
+ * All entries must be allocated by folio_alloc_swap(). And they must have
+ * a swap count > 1. See comments of the folio_*_swap helpers for more info.
+ */
+int swap_dup_entry_direct(swp_entry_t entry);
+void swap_put_entries_direct(swp_entry_t entry, int nr);
+
+/*
+ * folio_free_swap tries to free the swap entries pinned by a swap cache
+ * folio, it has to be here to be called by other components.
+ */
+bool folio_free_swap(struct folio *folio);
+
+/* Allocate / free (hibernation) exclusive entries */
+swp_entry_t swap_alloc_hibernation_slot(int type);
+void swap_free_hibernation_slot(swp_entry_t entry);
+
static inline void put_swap_device(struct swap_info_struct *si)
{
percpu_ref_put(&si->users);
@@ -499,10 +516,6 @@ static inline void put_swap_device(struct swap_info_struct *si)
#define free_pages_and_swap_cache(pages, nr) \
release_pages((pages), (nr));
-static inline void free_swap_and_cache_nr(swp_entry_t entry, int nr)
-{
-}
-
static inline void free_swap_cache(struct folio *folio)
{
}
@@ -512,12 +525,12 @@ static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask)
return 0;
}
-static inline int swap_duplicate_nr(swp_entry_t swp, int nr_pages)
+static inline int swap_dup_entry_direct(swp_entry_t ent)
{
return 0;
}
-static inline void swap_free_nr(swp_entry_t entry, int nr_pages)
+static inline void swap_put_entries_direct(swp_entry_t ent, int nr)
{
}
@@ -541,11 +554,6 @@ static inline int swp_swapcount(swp_entry_t entry)
return 0;
}
-static inline int folio_alloc_swap(struct folio *folio)
-{
- return -EINVAL;
-}
-
static inline bool folio_free_swap(struct folio *folio)
{
return false;
@@ -558,22 +566,6 @@ static inline int add_swap_extent(struct swap_info_struct *sis,
return -EINVAL;
}
#endif /* CONFIG_SWAP */
-
-static inline int swap_duplicate(swp_entry_t entry)
-{
- return swap_duplicate_nr(entry, 1);
-}
-
-static inline void free_swap_and_cache(swp_entry_t entry)
-{
- free_swap_and_cache_nr(entry, 1);
-}
-
-static inline void swap_free(swp_entry_t entry)
-{
- swap_free_nr(entry, 1);
-}
-
#ifdef CONFIG_MEMCG
static inline int mem_cgroup_swappiness(struct mem_cgroup *memcg)
{
diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index 0beff7eeaaba..546a0c701970 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -179,10 +179,10 @@ sector_t alloc_swapdev_block(int swap)
{
unsigned long offset;
- offset = swp_offset(get_swap_page_of_type(swap));
+ offset = swp_offset(swap_alloc_hibernation_slot(swap));
if (offset) {
if (swsusp_extents_insert(offset))
- swap_free(swp_entry(swap, offset));
+ swap_free_hibernation_slot(swp_entry(swap, offset));
else
return swapdev_block(swap, offset);
}
@@ -197,6 +197,7 @@ sector_t alloc_swapdev_block(int swap)
void free_all_swap_pages(int swap)
{
+ unsigned long offset;
struct rb_node *node;
while ((node = swsusp_extents.rb_node)) {
@@ -204,8 +205,9 @@ void free_all_swap_pages(int swap)
ext = rb_entry(node, struct swsusp_extent, node);
rb_erase(node, &swsusp_extents);
- swap_free_nr(swp_entry(swap, ext->start),
- ext->end - ext->start + 1);
+
+ for (offset = ext->start; offset <= ext->end; offset++)
+ swap_free_hibernation_slot(swp_entry(swap, offset));
kfree(ext);
}
diff --git a/mm/madvise.c b/mm/madvise.c
index fb1c86e630b6..3cf2097d2085 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -697,7 +697,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
max_nr = (end - addr) / PAGE_SIZE;
nr = swap_pte_batch(pte, max_nr, ptent);
nr_swap -= nr;
- free_swap_and_cache_nr(entry, nr);
+ swap_put_entries_direct(entry, nr);
clear_not_present_full_ptes(mm, addr, pte, nr, tlb->fullmm);
} else if (is_hwpoison_entry(entry) ||
is_poisoned_swp_entry(entry)) {
diff --git a/mm/memory.c b/mm/memory.c
index 589d6fc3d424..27d91ae3648a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -933,7 +933,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
swp_entry_t entry = pte_to_swp_entry(orig_pte);
if (likely(!non_swap_entry(entry))) {
- if (swap_duplicate(entry) < 0)
+ if (swap_dup_entry_direct(entry) < 0)
return -EIO;
/* make sure dst_mm is on swapoff's mmlist. */
@@ -1746,7 +1746,7 @@ static inline int zap_nonpresent_ptes(struct mmu_gather *tlb,
nr = swap_pte_batch(pte, max_nr, ptent);
rss[MM_SWAPENTS] -= nr;
- free_swap_and_cache_nr(entry, nr);
+ swap_put_entries_direct(entry, nr);
} else if (is_migration_entry(entry)) {
struct folio *folio = pfn_swap_entry_folio(entry);
@@ -4932,7 +4932,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
/*
* Some architectures may have to restore extra metadata to the page
* when reading from swap. This metadata may be indexed by swap entry
- * so this must be called before swap_free().
+ * so this must be called before folio_put_swap().
*/
arch_swap_restore(folio_swap(entry, folio), folio);
@@ -4970,6 +4970,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
if (unlikely(folio != swapcache)) {
folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE);
folio_add_lru_vma(folio, vma);
+ folio_put_swap(swapcache, NULL);
} else if (!folio_test_anon(folio)) {
/*
* We currently only expect !anon folios that are fully
@@ -4978,9 +4979,12 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
VM_WARN_ON_ONCE_FOLIO(folio_nr_pages(folio) != nr_pages, folio);
VM_WARN_ON_ONCE_FOLIO(folio_mapped(folio), folio);
folio_add_new_anon_rmap(folio, vma, address, rmap_flags);
+ folio_put_swap(folio, NULL);
} else {
+ VM_WARN_ON_ONCE(nr_pages != 1 && nr_pages != folio_nr_pages(folio));
folio_add_anon_rmap_ptes(folio, page, nr_pages, vma, address,
- rmap_flags);
+ rmap_flags);
+ folio_put_swap(folio, nr_pages == 1 ? page : NULL);
}
VM_BUG_ON(!folio_test_anon(folio) ||
@@ -4994,7 +4998,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
* swapcache. Do it after mapping so any raced page fault will
* see the folio in swap cache and wait for us.
*/
- swap_free_nr(entry, nr_pages);
if (should_try_to_free_swap(si, folio, vma, nr_pages, vmf->flags))
folio_free_swap(folio);
@@ -5004,7 +5007,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
* Hold the lock to avoid the swap entry to be reused
* until we take the PT lock for the pte_same() check
* (to avoid false positives from pte_same). For
- * further safety release the lock after the swap_free
+ * further safety release the lock after the folio_put_swap
* so that the swap count won't change under a
* parallel locked swapcache.
*/
diff --git a/mm/rmap.c b/mm/rmap.c
index 1954c538a991..844864831797 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -82,6 +82,7 @@
#include <trace/events/migrate.h>
#include "internal.h"
+#include "swap.h"
static struct kmem_cache *anon_vma_cachep;
static struct kmem_cache *anon_vma_chain_cachep;
@@ -2146,7 +2147,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
goto discard;
}
- if (swap_duplicate(entry) < 0) {
+ if (folio_dup_swap(folio, subpage) < 0) {
set_pte_at(mm, address, pvmw.pte, pteval);
goto walk_abort;
}
@@ -2157,7 +2158,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
* so we'll not check/care.
*/
if (arch_unmap_one(mm, vma, address, pteval) < 0) {
- swap_free(entry);
+ folio_put_swap(folio, subpage);
set_pte_at(mm, address, pvmw.pte, pteval);
goto walk_abort;
}
@@ -2165,7 +2166,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
/* See folio_try_share_anon_rmap(): clear PTE first. */
if (anon_exclusive &&
folio_try_share_anon_rmap_pte(folio, subpage)) {
- swap_free(entry);
+ folio_put_swap(folio, subpage);
set_pte_at(mm, address, pvmw.pte, pteval);
goto walk_abort;
}
diff --git a/mm/shmem.c b/mm/shmem.c
index 46d54a1288fd..5e6cb763d945 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -982,7 +982,7 @@ static long shmem_free_swap(struct address_space *mapping,
old = xa_cmpxchg_irq(&mapping->i_pages, index, radswap, NULL, 0);
if (old != radswap)
return 0;
- free_swap_and_cache_nr(radix_to_swp_entry(radswap), 1 << order);
+ swap_put_entries_direct(radix_to_swp_entry(radswap), 1 << order);
return 1 << order;
}
@@ -1665,7 +1665,7 @@ int shmem_writeout(struct folio *folio, struct swap_iocb **plug,
spin_unlock(&shmem_swaplist_lock);
}
- swap_duplicate_nr(folio->swap, nr_pages);
+ folio_dup_swap(folio, NULL);
shmem_delete_from_page_cache(folio, swp_to_radix_entry(folio->swap));
BUG_ON(folio_mapped(folio));
@@ -1686,7 +1686,7 @@ int shmem_writeout(struct folio *folio, struct swap_iocb **plug,
/* Swap entry might be erased by racing shmem_free_swap() */
if (!error) {
shmem_recalc_inode(inode, 0, -nr_pages);
- swap_free_nr(folio->swap, nr_pages);
+ folio_put_swap(folio, NULL);
}
/*
@@ -2172,6 +2172,7 @@ static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
nr_pages = folio_nr_pages(folio);
folio_wait_writeback(folio);
+ folio_put_swap(folio, NULL);
swap_cache_del_folio(folio);
/*
* Don't treat swapin error folio as alloced. Otherwise inode->i_blocks
@@ -2179,7 +2180,6 @@ static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
* in shmem_evict_inode().
*/
shmem_recalc_inode(inode, -nr_pages, -nr_pages);
- swap_free_nr(swap, nr_pages);
}
static int shmem_split_large_entry(struct inode *inode, pgoff_t index,
@@ -2401,9 +2401,9 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
if (sgp == SGP_WRITE)
folio_mark_accessed(folio);
+ folio_put_swap(folio, NULL);
swap_cache_del_folio(folio);
folio_mark_dirty(folio);
- swap_free_nr(swap, nr_pages);
put_swap_device(si);
*foliop = folio;
diff --git a/mm/swap.h b/mm/swap.h
index a3c5f2dca0d5..74c61129d7b7 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -183,6 +183,28 @@ static inline void swap_cluster_unlock_irq(struct swap_cluster_info *ci)
spin_unlock_irq(&ci->lock);
}
+/*
+ * Below are the core routines for doing swap for a folio.
+ * All helpers require the folio to be locked, and a locked folio
+ * in the swap cache pins the swap entries / slots allocated to the
+ * folio; swap relies heavily on the swap cache and folio lock for
+ * synchronization.
+ *
+ * folio_alloc_swap(): the entry point for a folio to be swapped
+ * out. It allocates swap slots and pins the slots with the swap cache.
+ * The slots start with a swap count of zero.
+ *
+ * folio_dup_swap(): increases the swap count of a folio, usually
+ * when it gets unmapped and a swap entry is installed to replace
+ * it (e.g., a swap entry in the page table). A swap slot with swap
+ * count == 0 should only be increased by this helper.
+ *
+ * folio_put_swap(): does the opposite of folio_dup_swap().
+ */
+int folio_alloc_swap(struct folio *folio);
+int folio_dup_swap(struct folio *folio, struct page *subpage);
+void folio_put_swap(struct folio *folio, struct page *subpage);
+
/* linux/mm/page_io.c */
int sio_pool_init(void);
struct swap_iocb;
@@ -363,9 +385,24 @@ static inline struct swap_info_struct *__swap_entry_to_info(swp_entry_t entry)
return NULL;
}
+static inline int folio_alloc_swap(struct folio *folio, gfp_t gfp)
+{
+ return -EINVAL;
+}
+
+static inline int folio_dup_swap(struct folio *folio, struct page *page)
+{
+ return -EINVAL;
+}
+
+static inline void folio_put_swap(struct folio *folio, struct page *page)
+{
+}
+
static inline void swap_read_folio(struct folio *folio, struct swap_iocb **plug)
{
}
+
static inline void swap_write_unplug(struct swap_iocb *sio)
{
}
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 415db36d85d3..426b0b6d583f 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -58,6 +58,9 @@ static void swap_entries_free(struct swap_info_struct *si,
swp_entry_t entry, unsigned int nr_pages);
static void swap_range_alloc(struct swap_info_struct *si,
unsigned int nr_entries);
+static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr);
+static bool swap_entries_put_map(struct swap_info_struct *si,
+ swp_entry_t entry, int nr);
static bool folio_swapcache_freeable(struct folio *folio);
static void move_cluster(struct swap_info_struct *si,
struct swap_cluster_info *ci, struct list_head *list,
@@ -1467,6 +1470,12 @@ int folio_alloc_swap(struct folio *folio)
*/
WARN_ON_ONCE(swap_cache_add_folio(folio, entry, NULL, true));
+ /*
+ * The allocator should always allocate aligned entries so folio based
+ * operations never cross more than one cluster.
+ */
+ VM_WARN_ON_ONCE_FOLIO(!IS_ALIGNED(folio->swap.val, size), folio);
+
return 0;
out_free:
@@ -1474,6 +1483,62 @@ int folio_alloc_swap(struct folio *folio)
return -ENOMEM;
}
+/**
+ * folio_dup_swap() - Increase the swap count of swap entries of a folio.
+ * @folio: folio with swap entries bound to it.
+ * @subpage: if not NULL, only increase the swap count of this subpage.
+ *
+ * Context: Caller must ensure the folio is locked and in the swap cache.
+ * The caller also has to ensure there is no racing call to
+ * swap_put_entries_direct() before this helper returns, or the swap
+ * map may underflow (TODO: maybe we should allow or avoid underflow to
+ * make swap refcount lockless).
+ */
+int folio_dup_swap(struct folio *folio, struct page *subpage)
+{
+ int err = 0;
+ swp_entry_t entry = folio->swap;
+ unsigned long nr_pages = folio_nr_pages(folio);
+
+ VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
+ VM_WARN_ON_FOLIO(!folio_test_swapcache(folio), folio);
+
+ if (subpage) {
+ entry.val += folio_page_idx(folio, subpage);
+ nr_pages = 1;
+ }
+
+ while (!err && __swap_duplicate(entry, 1, nr_pages) == -ENOMEM)
+ err = add_swap_count_continuation(entry, GFP_ATOMIC);
+
+ return err;
+}
+
+/**
+ * folio_put_swap() - Decrease the swap count of swap entries of a folio.
+ * @folio: folio with swap entries bound to it, must be in the swap cache and locked.
+ * @subpage: if not NULL, only decrease the swap count of this subpage.
+ *
+ * This won't free the swap slots even if the swap count drops to zero; they are
+ * still pinned by the swap cache. Callers may use folio_free_swap() to free them.
+ * Context: Caller must ensure the folio is locked and in the swap cache.
+ */
+void folio_put_swap(struct folio *folio, struct page *subpage)
+{
+ swp_entry_t entry = folio->swap;
+ unsigned long nr_pages = folio_nr_pages(folio);
+
+ VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
+ VM_WARN_ON_FOLIO(!folio_test_swapcache(folio), folio);
+
+ if (subpage) {
+ entry.val += folio_page_idx(folio, subpage);
+ nr_pages = 1;
+ }
+
+ swap_entries_put_map(__swap_entry_to_info(entry), entry, nr_pages);
+}
+
static struct swap_info_struct *_swap_info_get(swp_entry_t entry)
{
struct swap_info_struct *si;
@@ -1714,28 +1779,6 @@ static void swap_entries_free(struct swap_info_struct *si,
partial_free_cluster(si, ci);
}
-/*
- * Caller has made sure that the swap device corresponding to entry
- * is still around or has not been recycled.
- */
-void swap_free_nr(swp_entry_t entry, int nr_pages)
-{
- int nr;
- struct swap_info_struct *sis;
- unsigned long offset = swp_offset(entry);
-
- sis = _swap_info_get(entry);
- if (!sis)
- return;
-
- while (nr_pages) {
- nr = min_t(int, nr_pages, SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
- swap_entries_put_map(sis, swp_entry(sis->type, offset), nr);
- offset += nr;
- nr_pages -= nr;
- }
-}
-
/*
* Called after dropping swapcache to decrease refcnt to swap entries.
*/
@@ -1924,16 +1967,19 @@ bool folio_free_swap(struct folio *folio)
}
/**
- * free_swap_and_cache_nr() - Release reference on range of swap entries and
- * reclaim their cache if no more references remain.
+ * swap_put_entries_direct() - Release reference on range of swap entries and
+ * reclaim their cache if no more references remain.
* @entry: First entry of range.
* @nr: Number of entries in range.
*
* For each swap entry in the contiguous range, release a reference. If any swap
* entries become free, try to reclaim their underlying folios, if present. The
* offset range is defined by [entry.offset, entry.offset + nr).
+ *
+ * Context: Caller must ensure there is no race condition on the reference
+ * owner. e.g., locking the PTL of a PTE containing the entry being released.
*/
-void free_swap_and_cache_nr(swp_entry_t entry, int nr)
+void swap_put_entries_direct(swp_entry_t entry, int nr)
{
const unsigned long start_offset = swp_offset(entry);
const unsigned long end_offset = start_offset + nr;
@@ -1942,10 +1988,9 @@ void free_swap_and_cache_nr(swp_entry_t entry, int nr)
unsigned long offset;
si = get_swap_device(entry);
- if (!si)
+ if (WARN_ON_ONCE(!si))
return;
-
- if (WARN_ON(end_offset > si->max))
+ if (WARN_ON_ONCE(end_offset > si->max))
goto out;
/*
@@ -1989,8 +2034,8 @@ void free_swap_and_cache_nr(swp_entry_t entry, int nr)
}
#ifdef CONFIG_HIBERNATION
-
-swp_entry_t get_swap_page_of_type(int type)
+/* Allocate a slot for hibernation */
+swp_entry_t swap_alloc_hibernation_slot(int type)
{
struct swap_info_struct *si = swap_type_to_info(type);
unsigned long offset;
@@ -2020,6 +2065,27 @@ swp_entry_t get_swap_page_of_type(int type)
return entry;
}
+/* Free a slot allocated by swap_alloc_hibernation_slot */
+void swap_free_hibernation_slot(swp_entry_t entry)
+{
+ struct swap_info_struct *si;
+ struct swap_cluster_info *ci;
+ pgoff_t offset = swp_offset(entry);
+
+ si = get_swap_device(entry);
+ if (WARN_ON(!si))
+ return;
+
+ ci = swap_cluster_lock(si, offset);
+ swap_entry_put_locked(si, ci, entry, 1);
+ WARN_ON(swap_entry_swapped(si, offset));
+ swap_cluster_unlock(ci);
+
+ /* In theory readahead might add it to the swap cache by accident */
+ __try_to_reclaim_swap(si, offset, TTRS_ANYWAY);
+ put_swap_device(si);
+}
+
/*
* Find the swap type that corresponds to given device (if any).
*
@@ -2181,7 +2247,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
/*
* Some architectures may have to restore extra metadata to the page
* when reading from swap. This metadata may be indexed by swap entry
- * so this must be called before swap_free().
+ * so this must be called before folio_put_swap().
*/
arch_swap_restore(folio_swap(entry, folio), folio);
@@ -2222,7 +2288,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
new_pte = pte_mkuffd_wp(new_pte);
setpte:
set_pte_at(vma->vm_mm, addr, pte, new_pte);
- swap_free(entry);
+ folio_put_swap(folio, page);
out:
if (pte)
pte_unmap_unlock(pte, ptl);
@@ -3725,28 +3791,22 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
return err;
}
-/**
- * swap_duplicate_nr() - Increase reference count of nr contiguous swap entries
- * by 1.
- *
+/*
+ * swap_dup_entry_direct() - Increase reference count of a swap entry by one.
* @entry: first swap entry from which we want to increase the refcount.
- * @nr: Number of entries in range.
*
* Returns 0 for success, or -ENOMEM if a swap_count_continuation is required
* but could not be atomically allocated. Returns 0, just as if it succeeded,
* if __swap_duplicate() fails for another reason (-EINVAL or -ENOENT), which
* might occur if a page table entry has got corrupted.
*
- * Note that we are currently not handling the case where nr > 1 and we need to
- * add swap count continuation. This is OK, because no such user exists - shmem
- * is the only user that can pass nr > 1, and it never re-duplicates any swap
- * entry it owns.
+ * Context: Caller must ensure there is no race condition on the reference
+ * owner. e.g., locking the PTL of a PTE containing the entry being increased.
*/
-int swap_duplicate_nr(swp_entry_t entry, int nr)
+int swap_dup_entry_direct(swp_entry_t entry)
{
int err = 0;
-
- while (!err && __swap_duplicate(entry, 1, nr) == -ENOMEM)
+ while (!err && __swap_duplicate(entry, 1, 1) == -ENOMEM)
err = add_swap_count_continuation(entry, GFP_ATOMIC);
return err;
}
--
2.51.1
On Wed, Oct 29, 2025 at 11:58:40PM +0800, Kairui Song wrote:
> From: Kairui Song <kasong@tencent.com>
Hello Kairui!
> The current swap entry allocation/freeing workflow has never had a clear
> definition. This makes it hard to debug or add new optimizations.
>
> This commit introduces a proper definition of how swap entries would be
> allocated and freed. Now, most operations are folio based, so they will
> never exceed one swap cluster, and we now have a cleaner border between
> swap and the rest of mm, making it much easier to follow and debug,
> especially with new added sanity checks. Also making more optimization
> possible.
>
> Swap entry will be mostly allocated and free with a folio bound.
> The folio lock will be useful for resolving many swap ralated races.
>
> Now swap allocation (except hibernation) always starts with a folio in
> the swap cache, and gets duped/freed protected by the folio lock:
>
> - folio_alloc_swap() - The only allocation entry point now.
> Context: The folio must be locked.
> This allocates one or a set of continuous swap slots for a folio and
> binds them to the folio by adding the folio to the swap cache. The
> swap slots' swap count start with zero value.
>
> - folio_dup_swap() - Increase the swap count of one or more entries.
> Context: The folio must be locked and in the swap cache. For now, the
> caller still has to lock the new swap entry owner (e.g., PTL).
> This increases the ref count of swap entries allocated to a folio.
> Newly allocated swap slots' count has to be increased by this helper
> as the folio got unmapped (and swap entries got installed).
>
> - folio_put_swap() - Decrease the swap count of one or more entries.
> Context: The folio must be locked and in the swap cache. For now, the
> caller still has to lock the new swap entry owner (e.g., PTL).
> This decreases the ref count of swap entries allocated to a folio.
> Typically, swapin will decrease the swap count as the folio got
> installed back and the swap entry got uninstalled
>
> This won't remove the folio from the swap cache and free the
> slot. Lazy freeing of swap cache is helpful for reducing IO.
> There is already a folio_free_swap() for immediate cache reclaim.
> This part could be further optimized later.
>
> The above locking constraints could be further relaxed when the swap
> table if fully implemented. Currently dup still needs the caller
> to lock the swap entry container (e.g. PTL), or a concurrent zap
> may underflow the swap count.
>
> Some swap users need to interact with swap count without involving folio
> (e.g. forking/zapping the page table or mapping truncate without swapin).
> In such cases, the caller has to ensure there is no race condition on
> whatever owns the swap count and use the below helpers:
>
> - swap_put_entries_direct() - Decrease the swap count directly.
> Context: The caller must lock whatever is referencing the slots to
> avoid a race.
>
> Typically the page table zapping or shmem mapping truncate will need
> to free swap slots directly. If a slot is cached (has a folio bound),
> this will also try to release the swap cache.
>
> - swap_dup_entry_direct() - Increase the swap count directly.
> Context: The caller must lock whatever is referencing the entries to
> avoid race, and the entries must already have a swap count > 1.
>
> Typically, forking will need to copy the page table and hence needs to
> increase the swap count of the entries in the table. The page table is
> locked while referencing the swap entries, so the entries all have a
> swap count > 1 and can't be freed.
>
> Hibernation subsystem is a bit different, so two special wrappers are here:
>
> - swap_alloc_hibernation_slot() - Allocate one entry from one device.
> - swap_free_hibernation_slot() - Free one entry allocated by the above
> helper.
During the code review, I found something that needs to be verified.
It is not directly relevant to your patch; I am sending this email to
confirm it and to discuss a possible fix on top of this patch.

In the swap_alloc_hibernation_slot() function, nr_swap_pages is
decreased, but I think it is already decreased in swap_range_alloc().

nr_swap_pages is decremented along the call flow below:

cluster_alloc_swap_entry -> alloc_swap_scan_cluster
  -> cluster_alloc_range -> swap_range_alloc

This was introduced in commit 4f78252da887 ("mm: swap: move nr_swap_pages
counter decrement from folio_alloc_swap() to swap_range_alloc()").

#ifdef CONFIG_HIBERNATION
/* Allocate a slot for hibernation */
swp_entry_t swap_alloc_hibernation_slot(int type)
{
....
local_unlock(&percpu_swap_cluster.lock);
if (offset) {
entry = swp_entry(si->type, offset);
atomic_long_dec(&nr_swap_pages); // here
Thank you,
Youngjun Park
On Sat, Nov 1, 2025 at 12:51 PM YoungJun Park <youngjun.park@lge.com> wrote:
>
> During the code review, I found something that needs to be verified.
> It is not directly relevant to your patch; I am sending this email to
> confirm it and to discuss a possible fix on top of this patch.
>
> In the swap_alloc_hibernation_slot() function, nr_swap_pages is
> decreased, but I think it is already decreased in swap_range_alloc().
>
> nr_swap_pages is decremented along the call flow below:
>
> cluster_alloc_swap_entry -> alloc_swap_scan_cluster
>   -> cluster_alloc_range -> swap_range_alloc
>
> This was introduced in commit 4f78252da887 ("mm: swap: move nr_swap_pages
> counter decrement from folio_alloc_swap() to swap_range_alloc()").

Yeah, you are right, that's a bug introduced by 4f78252da887. Will you
send a patch to fix that? Or I can send one; just remove the
atomic_long_dec(&nr_swap_pages) in get_swap_page_of_type() and then we
are fine.
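
The fix would roughly amount to dropping the duplicate decrement (sketch
only, against the function as quoted above; the hunk context is
approximate):

  --- a/mm/swapfile.c
  +++ b/mm/swapfile.c
  @@ swap_alloc_hibernation_slot() (get_swap_page_of_type() before this series)
           if (offset) {
                   entry = swp_entry(si->type, offset);
  -                atomic_long_dec(&nr_swap_pages);
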
On Sat, Nov 01, 2025 at 04:59:05PM +0800, Kairui Song wrote:
> Yeah, you are right, that's a bug introduced by 4f78252da887. Will you
> send a patch to fix that? Or I can send one; just remove the
> atomic_long_dec(&nr_swap_pages) in get_swap_page_of_type() and then we
> are fine.

Thank you for double-checking. I will send a patch soon.

Regards,
Youngjun Park
Hi Kairui,
kernel test robot noticed the following build errors:
[auto build test ERROR on f30d294530d939fa4b77d61bc60f25c4284841fa]
url: https://github.com/intel-lab-lkp/linux/commits/Kairui-Song/mm-swap-rename-__read_swap_cache_async-to-swap_cache_alloc_folio/20251030-000506
base: f30d294530d939fa4b77d61bc60f25c4284841fa
patch link: https://lore.kernel.org/r/20251029-swap-table-p2-v1-14-3d43f3b6ec32%40tencent.com
patch subject: [PATCH 14/19] mm, swap: sanitize swap entry management workflow
config: x86_64-allnoconfig (https://download.01.org/0day-ci/archive/20251030/202510300341.cOYqY4ki-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251030/202510300341.cOYqY4ki-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202510300341.cOYqY4ki-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from mm/shmem.c:44:
mm/swap.h:465:1: warning: non-void function does not return a value [-Wreturn-type]
465 | }
| ^
>> mm/shmem.c:1649:29: error: too few arguments to function call, expected 2, have 1
1649 | if (!folio_alloc_swap(folio)) {
| ~~~~~~~~~~~~~~~~ ^
mm/swap.h:388:19: note: 'folio_alloc_swap' declared here
388 | static inline int folio_alloc_swap(struct folio *folio, gfp_t gfp)
| ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning and 1 error generated.
vim +1649 mm/shmem.c
^1da177e4c3f41 Linus Torvalds 2005-04-16 1563
7b73c12c6ebf00 Matthew Wilcox (Oracle 2025-04-02 1564) /**
7b73c12c6ebf00 Matthew Wilcox (Oracle 2025-04-02 1565) * shmem_writeout - Write the folio to swap
7b73c12c6ebf00 Matthew Wilcox (Oracle 2025-04-02 1566) * @folio: The folio to write
44b1b073eb3614 Christoph Hellwig 2025-06-10 1567 * @plug: swap plug
44b1b073eb3614 Christoph Hellwig 2025-06-10 1568 * @folio_list: list to put back folios on split
7b73c12c6ebf00 Matthew Wilcox (Oracle 2025-04-02 1569) *
7b73c12c6ebf00 Matthew Wilcox (Oracle 2025-04-02 1570) * Move the folio from the page cache to the swap cache.
7b73c12c6ebf00 Matthew Wilcox (Oracle 2025-04-02 1571) */
44b1b073eb3614 Christoph Hellwig 2025-06-10 1572 int shmem_writeout(struct folio *folio, struct swap_iocb **plug,
44b1b073eb3614 Christoph Hellwig 2025-06-10 1573 struct list_head *folio_list)
7b73c12c6ebf00 Matthew Wilcox (Oracle 2025-04-02 1574) {
8ccee8c19c605a Luis Chamberlain 2023-03-09 1575 struct address_space *mapping = folio->mapping;
8ccee8c19c605a Luis Chamberlain 2023-03-09 1576 struct inode *inode = mapping->host;
8ccee8c19c605a Luis Chamberlain 2023-03-09 1577 struct shmem_inode_info *info = SHMEM_I(inode);
2c6efe9cf2d784 Luis Chamberlain 2023-03-09 1578 struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
6922c0c7abd387 Hugh Dickins 2011-08-03 1579 pgoff_t index;
650180760be6bb Baolin Wang 2024-08-12 1580 int nr_pages;
809bc86517cc40 Baolin Wang 2024-08-12 1581 bool split = false;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1582
adae46ac1e38a2 Ricardo Cañuelo Navarro 2025-02-26 1583 if ((info->flags & VM_LOCKED) || sbinfo->noswap)
9a976f0c847b67 Luis Chamberlain 2023-03-09 1584 goto redirty;
9a976f0c847b67 Luis Chamberlain 2023-03-09 1585
9a976f0c847b67 Luis Chamberlain 2023-03-09 1586 if (!total_swap_pages)
9a976f0c847b67 Luis Chamberlain 2023-03-09 1587 goto redirty;
9a976f0c847b67 Luis Chamberlain 2023-03-09 1588
1e6decf30af5c5 Hugh Dickins 2021-09-02 1589 /*
809bc86517cc40 Baolin Wang 2024-08-12 1590 * If CONFIG_THP_SWAP is not enabled, the large folio should be
809bc86517cc40 Baolin Wang 2024-08-12 1591 * split when swapping.
809bc86517cc40 Baolin Wang 2024-08-12 1592 *
809bc86517cc40 Baolin Wang 2024-08-12 1593 * And shrinkage of pages beyond i_size does not split swap, so
809bc86517cc40 Baolin Wang 2024-08-12 1594 * swapout of a large folio crossing i_size needs to split too
809bc86517cc40 Baolin Wang 2024-08-12 1595 * (unless fallocate has been used to preallocate beyond EOF).
1e6decf30af5c5 Hugh Dickins 2021-09-02 1596 */
f530ed0e2d01aa Matthew Wilcox (Oracle 2022-09-02 1597) if (folio_test_large(folio)) {
809bc86517cc40 Baolin Wang 2024-08-12 1598 index = shmem_fallocend(inode,
809bc86517cc40 Baolin Wang 2024-08-12 1599 DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE));
809bc86517cc40 Baolin Wang 2024-08-12 1600 if ((index > folio->index && index < folio_next_index(folio)) ||
809bc86517cc40 Baolin Wang 2024-08-12 1601 !IS_ENABLED(CONFIG_THP_SWAP))
809bc86517cc40 Baolin Wang 2024-08-12 1602 split = true;
809bc86517cc40 Baolin Wang 2024-08-12 1603 }
809bc86517cc40 Baolin Wang 2024-08-12 1604
809bc86517cc40 Baolin Wang 2024-08-12 1605 if (split) {
809bc86517cc40 Baolin Wang 2024-08-12 1606 try_split:
1e6decf30af5c5 Hugh Dickins 2021-09-02 1607 /* Ensure the subpages are still dirty */
f530ed0e2d01aa Matthew Wilcox (Oracle 2022-09-02 1608) folio_test_set_dirty(folio);
44b1b073eb3614 Christoph Hellwig 2025-06-10 1609 if (split_folio_to_list(folio, folio_list))
1e6decf30af5c5 Hugh Dickins 2021-09-02 1610 goto redirty;
f530ed0e2d01aa Matthew Wilcox (Oracle 2022-09-02 1611) folio_clear_dirty(folio);
1e6decf30af5c5 Hugh Dickins 2021-09-02 1612 }
1e6decf30af5c5 Hugh Dickins 2021-09-02 1613
f530ed0e2d01aa Matthew Wilcox (Oracle 2022-09-02 1614) index = folio->index;
650180760be6bb Baolin Wang 2024-08-12 1615 nr_pages = folio_nr_pages(folio);
1635f6a74152f1 Hugh Dickins 2012-05-29 1616
1635f6a74152f1 Hugh Dickins 2012-05-29 1617 /*
1635f6a74152f1 Hugh Dickins 2012-05-29 1618 * This is somewhat ridiculous, but without plumbing a SWAP_MAP_FALLOC
1635f6a74152f1 Hugh Dickins 2012-05-29 1619 * value into swapfile.c, the only way we can correctly account for a
f530ed0e2d01aa Matthew Wilcox (Oracle 2022-09-02 1620) * fallocated folio arriving here is now to initialize it and write it.
1aac1400319d30 Hugh Dickins 2012-05-29 1621 *
f530ed0e2d01aa Matthew Wilcox (Oracle 2022-09-02 1622) * That's okay for a folio already fallocated earlier, but if we have
1aac1400319d30 Hugh Dickins 2012-05-29 1623 * not yet completed the fallocation, then (a) we want to keep track
f530ed0e2d01aa Matthew Wilcox (Oracle 2022-09-02 1624) * of this folio in case we have to undo it, and (b) it may not be a
1aac1400319d30 Hugh Dickins 2012-05-29 1625 * good idea to continue anyway, once we're pushing into swap. So
f530ed0e2d01aa Matthew Wilcox (Oracle 2022-09-02 1626) * reactivate the folio, and let shmem_fallocate() quit when too many.
1635f6a74152f1 Hugh Dickins 2012-05-29 1627 */
f530ed0e2d01aa Matthew Wilcox (Oracle 2022-09-02 1628) if (!folio_test_uptodate(folio)) {
1aac1400319d30 Hugh Dickins 2012-05-29 1629 if (inode->i_private) {
1aac1400319d30 Hugh Dickins 2012-05-29 1630 struct shmem_falloc *shmem_falloc;
1aac1400319d30 Hugh Dickins 2012-05-29 1631 spin_lock(&inode->i_lock);
1aac1400319d30 Hugh Dickins 2012-05-29 1632 shmem_falloc = inode->i_private;
1aac1400319d30 Hugh Dickins 2012-05-29 1633 if (shmem_falloc &&
8e205f779d1443 Hugh Dickins 2014-07-23 1634 !shmem_falloc->waitq &&
1aac1400319d30 Hugh Dickins 2012-05-29 1635 index >= shmem_falloc->start &&
1aac1400319d30 Hugh Dickins 2012-05-29 1636 index < shmem_falloc->next)
d77b90d2b26426 Baolin Wang 2024-12-19 1637 shmem_falloc->nr_unswapped += nr_pages;
1aac1400319d30 Hugh Dickins 2012-05-29 1638 else
1aac1400319d30 Hugh Dickins 2012-05-29 1639 shmem_falloc = NULL;
1aac1400319d30 Hugh Dickins 2012-05-29 1640 spin_unlock(&inode->i_lock);
1aac1400319d30 Hugh Dickins 2012-05-29 1641 if (shmem_falloc)
1aac1400319d30 Hugh Dickins 2012-05-29 1642 goto redirty;
1aac1400319d30 Hugh Dickins 2012-05-29 1643 }
f530ed0e2d01aa Matthew Wilcox (Oracle 2022-09-02 1644) folio_zero_range(folio, 0, folio_size(folio));
f530ed0e2d01aa Matthew Wilcox (Oracle 2022-09-02 1645) flush_dcache_folio(folio);
f530ed0e2d01aa Matthew Wilcox (Oracle 2022-09-02 1646) folio_mark_uptodate(folio);
1635f6a74152f1 Hugh Dickins 2012-05-29 1647 }
1635f6a74152f1 Hugh Dickins 2012-05-29 1648
7d14492199f93c Kairui Song 2025-10-24 @1649 if (!folio_alloc_swap(folio)) {
ea693aaa5ce5ad Hugh Dickins 2025-07-16 1650 bool first_swapped = shmem_recalc_inode(inode, 0, nr_pages);
6344a6d9ce13ae Hugh Dickins 2025-07-16 1651 int error;
ea693aaa5ce5ad Hugh Dickins 2025-07-16 1652
b1dea800ac3959 Hugh Dickins 2011-05-11 1653 /*
b1dea800ac3959 Hugh Dickins 2011-05-11 1654 * Add inode to shmem_unuse()'s list of swapped-out inodes,
f530ed0e2d01aa Matthew Wilcox (Oracle 2022-09-02 1655) * if it's not already there. Do it now before the folio is
ea693aaa5ce5ad Hugh Dickins 2025-07-16 1656 * removed from page cache, when its pagelock no longer
ea693aaa5ce5ad Hugh Dickins 2025-07-16 1657 * protects the inode from eviction. And do it now, after
ea693aaa5ce5ad Hugh Dickins 2025-07-16 1658 * we've incremented swapped, because shmem_unuse() will
ea693aaa5ce5ad Hugh Dickins 2025-07-16 1659 * prune a !swapped inode from the swaplist.
b1dea800ac3959 Hugh Dickins 2011-05-11 1660 */
ea693aaa5ce5ad Hugh Dickins 2025-07-16 1661 if (first_swapped) {
ea693aaa5ce5ad Hugh Dickins 2025-07-16 1662 spin_lock(&shmem_swaplist_lock);
05bf86b4ccfd0f Hugh Dickins 2011-05-14 1663 if (list_empty(&info->swaplist))
b56a2d8af9147a Vineeth Remanan Pillai 2019-03-05 1664 list_add(&info->swaplist, &shmem_swaplist);
ea693aaa5ce5ad Hugh Dickins 2025-07-16 1665 spin_unlock(&shmem_swaplist_lock);
ea693aaa5ce5ad Hugh Dickins 2025-07-16 1666 }
b1dea800ac3959 Hugh Dickins 2011-05-11 1667
80d6ed40156385 Kairui Song 2025-10-29 1668 folio_dup_swap(folio, NULL);
b487a2da3575b6 Kairui Song 2025-03-14 1669 shmem_delete_from_page_cache(folio, swp_to_radix_entry(folio->swap));
267a4c76bbdb95 Hugh Dickins 2015-12-11 1670
f530ed0e2d01aa Matthew Wilcox (Oracle 2022-09-02 1671) BUG_ON(folio_mapped(folio));
6344a6d9ce13ae Hugh Dickins 2025-07-16 1672 error = swap_writeout(folio, plug);
6344a6d9ce13ae Hugh Dickins 2025-07-16 1673 if (error != AOP_WRITEPAGE_ACTIVATE) {
6344a6d9ce13ae Hugh Dickins 2025-07-16 1674 /* folio has been unlocked */
6344a6d9ce13ae Hugh Dickins 2025-07-16 1675 return error;
6344a6d9ce13ae Hugh Dickins 2025-07-16 1676 }
6344a6d9ce13ae Hugh Dickins 2025-07-16 1677
6344a6d9ce13ae Hugh Dickins 2025-07-16 1678 /*
6344a6d9ce13ae Hugh Dickins 2025-07-16 1679 * The intention here is to avoid holding on to the swap when
6344a6d9ce13ae Hugh Dickins 2025-07-16 1680 * zswap was unable to compress and unable to writeback; but
6344a6d9ce13ae Hugh Dickins 2025-07-16 1681 * it will be appropriate if other reactivate cases are added.
6344a6d9ce13ae Hugh Dickins 2025-07-16 1682 */
6344a6d9ce13ae Hugh Dickins 2025-07-16 1683 error = shmem_add_to_page_cache(folio, mapping, index,
6344a6d9ce13ae Hugh Dickins 2025-07-16 1684 swp_to_radix_entry(folio->swap),
6344a6d9ce13ae Hugh Dickins 2025-07-16 1685 __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN);
6344a6d9ce13ae Hugh Dickins 2025-07-16 1686 /* Swap entry might be erased by racing shmem_free_swap() */
6344a6d9ce13ae Hugh Dickins 2025-07-16 1687 if (!error) {
6344a6d9ce13ae Hugh Dickins 2025-07-16 1688 shmem_recalc_inode(inode, 0, -nr_pages);
80d6ed40156385 Kairui Song 2025-10-29 1689 folio_put_swap(folio, NULL);
6344a6d9ce13ae Hugh Dickins 2025-07-16 1690 }
6344a6d9ce13ae Hugh Dickins 2025-07-16 1691
6344a6d9ce13ae Hugh Dickins 2025-07-16 1692 /*
fd8d4f862f8c27 Kairui Song 2025-09-17 1693 * The swap_cache_del_folio() below could be left for
6344a6d9ce13ae Hugh Dickins 2025-07-16 1694 * shrink_folio_list()'s folio_free_swap() to dispose of;
6344a6d9ce13ae Hugh Dickins 2025-07-16 1695 * but I'm a little nervous about letting this folio out of
6344a6d9ce13ae Hugh Dickins 2025-07-16 1696 * shmem_writeout() in a hybrid half-tmpfs-half-swap state
6344a6d9ce13ae Hugh Dickins 2025-07-16 1697 * e.g. folio_mapping(folio) might give an unexpected answer.
6344a6d9ce13ae Hugh Dickins 2025-07-16 1698 */
fd8d4f862f8c27 Kairui Song 2025-09-17 1699 swap_cache_del_folio(folio);
6344a6d9ce13ae Hugh Dickins 2025-07-16 1700 goto redirty;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1701 }
b487a2da3575b6 Kairui Song 2025-03-14 1702 if (nr_pages > 1)
b487a2da3575b6 Kairui Song 2025-03-14 1703 goto try_split;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1704 redirty:
f530ed0e2d01aa Matthew Wilcox (Oracle 2022-09-02 1705) folio_mark_dirty(folio);
f530ed0e2d01aa Matthew Wilcox (Oracle 2022-09-02 1706) return AOP_WRITEPAGE_ACTIVATE; /* Return with folio locked */
^1da177e4c3f41 Linus Torvalds 2005-04-16 1707 }
7b73c12c6ebf00 Matthew Wilcox (Oracle 2025-04-02 1708) EXPORT_SYMBOL_GPL(shmem_writeout);
^1da177e4c3f41 Linus Torvalds 2005-04-16 1709
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Hi Kairui,
kernel test robot noticed the following build errors:
[auto build test ERROR on f30d294530d939fa4b77d61bc60f25c4284841fa]
url: https://github.com/intel-lab-lkp/linux/commits/Kairui-Song/mm-swap-rename-__read_swap_cache_async-to-swap_cache_alloc_folio/20251030-000506
base: f30d294530d939fa4b77d61bc60f25c4284841fa
patch link: https://lore.kernel.org/r/20251029-swap-table-p2-v1-14-3d43f3b6ec32%40tencent.com
patch subject: [PATCH 14/19] mm, swap: sanitize swap entry management workflow
config: i386-allnoconfig (https://download.01.org/0day-ci/archive/20251030/202510300316.UL4gxAlC-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251030/202510300316.UL4gxAlC-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202510300316.UL4gxAlC-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from mm/vmscan.c:70:
mm/swap.h: In function 'swap_cache_add_folio':
mm/swap.h:465:1: warning: no return statement in function returning non-void [-Wreturn-type]
465 | }
| ^
mm/vmscan.c: In function 'shrink_folio_list':
>> mm/vmscan.c:1298:37: error: too few arguments to function 'folio_alloc_swap'
1298 | if (folio_alloc_swap(folio)) {
| ^~~~~~~~~~~~~~~~
mm/swap.h:388:19: note: declared here
388 | static inline int folio_alloc_swap(struct folio *folio, gfp_t gfp)
| ^~~~~~~~~~~~~~~~
mm/vmscan.c:1314:45: error: too few arguments to function 'folio_alloc_swap'
1314 | if (folio_alloc_swap(folio))
| ^~~~~~~~~~~~~~~~
mm/swap.h:388:19: note: declared here
388 | static inline int folio_alloc_swap(struct folio *folio, gfp_t gfp)
| ^~~~~~~~~~~~~~~~
--
In file included from mm/shmem.c:44:
mm/swap.h: In function 'swap_cache_add_folio':
mm/swap.h:465:1: warning: no return statement in function returning non-void [-Wreturn-type]
465 | }
| ^
mm/shmem.c: In function 'shmem_writeout':
>> mm/shmem.c:1649:14: error: too few arguments to function 'folio_alloc_swap'
1649 | if (!folio_alloc_swap(folio)) {
| ^~~~~~~~~~~~~~~~
mm/swap.h:388:19: note: declared here
388 | static inline int folio_alloc_swap(struct folio *folio, gfp_t gfp)
| ^~~~~~~~~~~~~~~~
vim +/folio_alloc_swap +1298 mm/vmscan.c
d791ea676b6648 NeilBrown 2022-05-09 1072
^1da177e4c3f41 Linus Torvalds 2005-04-16 1073 /*
49fd9b6df54e61 Matthew Wilcox (Oracle 2022-09-02 1074) * shrink_folio_list() returns the number of reclaimed pages
^1da177e4c3f41 Linus Torvalds 2005-04-16 1075 */
49fd9b6df54e61 Matthew Wilcox (Oracle 2022-09-02 1076) static unsigned int shrink_folio_list(struct list_head *folio_list,
49fd9b6df54e61 Matthew Wilcox (Oracle 2022-09-02 1077) struct pglist_data *pgdat, struct scan_control *sc,
7d709f49babc28 Gregory Price 2025-04-24 1078 struct reclaim_stat *stat, bool ignore_references,
7d709f49babc28 Gregory Price 2025-04-24 1079 struct mem_cgroup *memcg)
^1da177e4c3f41 Linus Torvalds 2005-04-16 1080 {
bc2ff4cbc3294c Matthew Wilcox (Oracle 2024-02-27 1081) struct folio_batch free_folios;
49fd9b6df54e61 Matthew Wilcox (Oracle 2022-09-02 1082) LIST_HEAD(ret_folios);
49fd9b6df54e61 Matthew Wilcox (Oracle 2022-09-02 1083) LIST_HEAD(demote_folios);
a479b078fddb0a Li Zhijian 2025-01-10 1084 unsigned int nr_reclaimed = 0, nr_demoted = 0;
730ec8c01a2bd6 Maninder Singh 2020-06-03 1085 unsigned int pgactivate = 0;
26aa2d199d6f2c Dave Hansen 2021-09-02 1086 bool do_demote_pass;
2282679fb20bf0 NeilBrown 2022-05-09 1087 struct swap_iocb *plug = NULL;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1088
bc2ff4cbc3294c Matthew Wilcox (Oracle 2024-02-27 1089) folio_batch_init(&free_folios);
060f005f074791 Kirill Tkhai 2019-03-05 1090 memset(stat, 0, sizeof(*stat));
^1da177e4c3f41 Linus Torvalds 2005-04-16 1091 cond_resched();
7d709f49babc28 Gregory Price 2025-04-24 1092 do_demote_pass = can_demote(pgdat->node_id, sc, memcg);
^1da177e4c3f41 Linus Torvalds 2005-04-16 1093
26aa2d199d6f2c Dave Hansen 2021-09-02 1094 retry:
49fd9b6df54e61 Matthew Wilcox (Oracle 2022-09-02 1095) while (!list_empty(folio_list)) {
^1da177e4c3f41 Linus Torvalds 2005-04-16 1096 struct address_space *mapping;
be7c07d60e13ac Matthew Wilcox (Oracle 2021-12-23 1097) struct folio *folio;
49fd9b6df54e61 Matthew Wilcox (Oracle 2022-09-02 1098) enum folio_references references = FOLIOREF_RECLAIM;
d791ea676b6648 NeilBrown 2022-05-09 1099 bool dirty, writeback;
98879b3b9edc16 Yang Shi 2019-07-11 1100 unsigned int nr_pages;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1101
^1da177e4c3f41 Linus Torvalds 2005-04-16 1102 cond_resched();
^1da177e4c3f41 Linus Torvalds 2005-04-16 1103
49fd9b6df54e61 Matthew Wilcox (Oracle 2022-09-02 1104) folio = lru_to_folio(folio_list);
be7c07d60e13ac Matthew Wilcox (Oracle 2021-12-23 1105) list_del(&folio->lru);
^1da177e4c3f41 Linus Torvalds 2005-04-16 1106
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1107) if (!folio_trylock(folio))
^1da177e4c3f41 Linus Torvalds 2005-04-16 1108 goto keep;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1109
1b0449544c6482 Jinjiang Tu 2025-03-18 1110 if (folio_contain_hwpoisoned_page(folio)) {
9f1e8cd0b7c4c9 Jinjiang Tu 2025-06-27 1111 /*
9f1e8cd0b7c4c9 Jinjiang Tu 2025-06-27 1112 * unmap_poisoned_folio() can't handle large
9f1e8cd0b7c4c9 Jinjiang Tu 2025-06-27 1113 * folio, just skip it. memory_failure() will
9f1e8cd0b7c4c9 Jinjiang Tu 2025-06-27 1114 * handle it if the UCE is triggered again.
9f1e8cd0b7c4c9 Jinjiang Tu 2025-06-27 1115 */
9f1e8cd0b7c4c9 Jinjiang Tu 2025-06-27 1116 if (folio_test_large(folio))
9f1e8cd0b7c4c9 Jinjiang Tu 2025-06-27 1117 goto keep_locked;
9f1e8cd0b7c4c9 Jinjiang Tu 2025-06-27 1118
1b0449544c6482 Jinjiang Tu 2025-03-18 1119 unmap_poisoned_folio(folio, folio_pfn(folio), false);
1b0449544c6482 Jinjiang Tu 2025-03-18 1120 folio_unlock(folio);
1b0449544c6482 Jinjiang Tu 2025-03-18 1121 folio_put(folio);
1b0449544c6482 Jinjiang Tu 2025-03-18 1122 continue;
1b0449544c6482 Jinjiang Tu 2025-03-18 1123 }
1b0449544c6482 Jinjiang Tu 2025-03-18 1124
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1125) VM_BUG_ON_FOLIO(folio_test_active(folio), folio);
^1da177e4c3f41 Linus Torvalds 2005-04-16 1126
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1127) nr_pages = folio_nr_pages(folio);
98879b3b9edc16 Yang Shi 2019-07-11 1128
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1129) /* Account the number of base pages */
98879b3b9edc16 Yang Shi 2019-07-11 1130 sc->nr_scanned += nr_pages;
80e4342601abfa Christoph Lameter 2006-02-11 1131
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1132) if (unlikely(!folio_evictable(folio)))
ad6b67041a4549 Minchan Kim 2017-05-03 1133 goto activate_locked;
894bc310419ac9 Lee Schermerhorn 2008-10-18 1134
1bee2c1677bcb5 Matthew Wilcox (Oracle 2022-05-12 1135) if (!sc->may_unmap && folio_mapped(folio))
80e4342601abfa Christoph Lameter 2006-02-11 1136 goto keep_locked;
80e4342601abfa Christoph Lameter 2006-02-11 1137
e2be15f6c3eece Mel Gorman 2013-07-03 1138 /*
894befec4d70b1 Andrey Ryabinin 2018-04-10 1139 * The number of dirty pages determines if a node is marked
8cd7c588decf47 Mel Gorman 2021-11-05 1140 * reclaim_congested. kswapd will stall and start writing
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1141) * folios if the tail of the LRU is all dirty unqueued folios.
e2be15f6c3eece Mel Gorman 2013-07-03 1142 */
e20c41b1091a24 Matthew Wilcox (Oracle 2022-01-17 1143) folio_check_dirty_writeback(folio, &dirty, &writeback);
e2be15f6c3eece Mel Gorman 2013-07-03 1144 if (dirty || writeback)
c79b7b96db8b12 Matthew Wilcox (Oracle 2022-01-17 1145) stat->nr_dirty += nr_pages;
e2be15f6c3eece Mel Gorman 2013-07-03 1146
e2be15f6c3eece Mel Gorman 2013-07-03 1147 if (dirty && !writeback)
c79b7b96db8b12 Matthew Wilcox (Oracle 2022-01-17 1148) stat->nr_unqueued_dirty += nr_pages;
e2be15f6c3eece Mel Gorman 2013-07-03 1149
d04e8acd03e5c3 Mel Gorman 2013-07-03 1150 /*
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1151) * Treat this folio as congested if folios are cycling
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1152) * through the LRU so quickly that the folios marked
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1153) * for immediate reclaim are making it to the end of
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1154) * the LRU a second time.
d04e8acd03e5c3 Mel Gorman 2013-07-03 1155 */
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1156) if (writeback && folio_test_reclaim(folio))
c79b7b96db8b12 Matthew Wilcox (Oracle 2022-01-17 1157) stat->nr_congested += nr_pages;
e2be15f6c3eece Mel Gorman 2013-07-03 1158
e62e384e9da8d9 Michal Hocko 2012-07-31 1159 /*
d33e4e1412c8b6 Matthew Wilcox (Oracle 2022-05-12 1160) * If a folio at the tail of the LRU is under writeback, there
283aba9f9e0e48 Mel Gorman 2013-07-03 1161 * are three cases to consider.
283aba9f9e0e48 Mel Gorman 2013-07-03 1162 *
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1163) * 1) If reclaim is encountering an excessive number
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1164) * of folios under writeback and this folio has both
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1165) * the writeback and reclaim flags set, then it
d33e4e1412c8b6 Matthew Wilcox (Oracle 2022-05-12 1166) * indicates that folios are being queued for I/O but
d33e4e1412c8b6 Matthew Wilcox (Oracle 2022-05-12 1167) * are being recycled through the LRU before the I/O
d33e4e1412c8b6 Matthew Wilcox (Oracle 2022-05-12 1168) * can complete. Waiting on the folio itself risks an
d33e4e1412c8b6 Matthew Wilcox (Oracle 2022-05-12 1169) * indefinite stall if it is impossible to writeback
d33e4e1412c8b6 Matthew Wilcox (Oracle 2022-05-12 1170) * the folio due to I/O error or disconnected storage
d33e4e1412c8b6 Matthew Wilcox (Oracle 2022-05-12 1171) * so instead note that the LRU is being scanned too
d33e4e1412c8b6 Matthew Wilcox (Oracle 2022-05-12 1172) * quickly and the caller can stall after the folio
d33e4e1412c8b6 Matthew Wilcox (Oracle 2022-05-12 1173) * list has been processed.
283aba9f9e0e48 Mel Gorman 2013-07-03 1174 *
d33e4e1412c8b6 Matthew Wilcox (Oracle 2022-05-12 1175) * 2) Global or new memcg reclaim encounters a folio that is
ecf5fc6e9654cd Michal Hocko 2015-08-04 1176 * not marked for immediate reclaim, or the caller does not
ecf5fc6e9654cd Michal Hocko 2015-08-04 1177 * have __GFP_FS (or __GFP_IO if it's simply going to swap,
0c4f8ed498cea1 Joanne Koong 2025-04-14 1178 * not to fs), or the folio belongs to a mapping where
0c4f8ed498cea1 Joanne Koong 2025-04-14 1179 * waiting on writeback during reclaim may lead to a deadlock.
0c4f8ed498cea1 Joanne Koong 2025-04-14 1180 * In this case mark the folio for immediate reclaim and
0c4f8ed498cea1 Joanne Koong 2025-04-14 1181 * continue scanning.
283aba9f9e0e48 Mel Gorman 2013-07-03 1182 *
d791ea676b6648 NeilBrown 2022-05-09 1183 * Require may_enter_fs() because we would wait on fs, which
d33e4e1412c8b6 Matthew Wilcox (Oracle 2022-05-12 1184) * may not have submitted I/O yet. And the loop driver might
d33e4e1412c8b6 Matthew Wilcox (Oracle 2022-05-12 1185) * enter reclaim, and deadlock if it waits on a folio for
283aba9f9e0e48 Mel Gorman 2013-07-03 1186 * which it is needed to do the write (loop masks off
283aba9f9e0e48 Mel Gorman 2013-07-03 1187 * __GFP_IO|__GFP_FS for this reason); but more thought
283aba9f9e0e48 Mel Gorman 2013-07-03 1188 * would probably show more reasons.
283aba9f9e0e48 Mel Gorman 2013-07-03 1189 *
d33e4e1412c8b6 Matthew Wilcox (Oracle 2022-05-12 1190) * 3) Legacy memcg encounters a folio that already has the
d33e4e1412c8b6 Matthew Wilcox (Oracle 2022-05-12 1191) * reclaim flag set. memcg does not have any dirty folio
283aba9f9e0e48 Mel Gorman 2013-07-03 1192 * throttling so we could easily OOM just because too many
d33e4e1412c8b6 Matthew Wilcox (Oracle 2022-05-12 1193) * folios are in writeback and there is nothing else to
283aba9f9e0e48 Mel Gorman 2013-07-03 1194 * reclaim. Wait for the writeback to complete.
c55e8d035b28b2 Johannes Weiner 2017-02-24 1195 *
d33e4e1412c8b6 Matthew Wilcox (Oracle 2022-05-12 1196) * In cases 1) and 2) we activate the folios to get them out of
d33e4e1412c8b6 Matthew Wilcox (Oracle 2022-05-12 1197) * the way while we continue scanning for clean folios on the
c55e8d035b28b2 Johannes Weiner 2017-02-24 1198 * inactive list and refilling from the active list. The
c55e8d035b28b2 Johannes Weiner 2017-02-24 1199 * observation here is that waiting for disk writes is more
c55e8d035b28b2 Johannes Weiner 2017-02-24 1200 * expensive than potentially causing reloads down the line.
c55e8d035b28b2 Johannes Weiner 2017-02-24 1201 * Since they're marked for immediate reclaim, they won't put
c55e8d035b28b2 Johannes Weiner 2017-02-24 1202 * memory pressure on the cache working set any longer than it
c55e8d035b28b2 Johannes Weiner 2017-02-24 1203 * takes to write them to disk.
e62e384e9da8d9 Michal Hocko 2012-07-31 1204 */
d33e4e1412c8b6 Matthew Wilcox (Oracle 2022-05-12 1205) if (folio_test_writeback(folio)) {
0c4f8ed498cea1 Joanne Koong 2025-04-14 1206 mapping = folio_mapping(folio);
0c4f8ed498cea1 Joanne Koong 2025-04-14 1207
283aba9f9e0e48 Mel Gorman 2013-07-03 1208 /* Case 1 above */
283aba9f9e0e48 Mel Gorman 2013-07-03 1209 if (current_is_kswapd() &&
d33e4e1412c8b6 Matthew Wilcox (Oracle 2022-05-12 1210) folio_test_reclaim(folio) &&
599d0c954f91d0 Mel Gorman 2016-07-28 1211 test_bit(PGDAT_WRITEBACK, &pgdat->flags)) {
c79b7b96db8b12 Matthew Wilcox (Oracle 2022-01-17 1212) stat->nr_immediate += nr_pages;
c55e8d035b28b2 Johannes Weiner 2017-02-24 1213 goto activate_locked;
283aba9f9e0e48 Mel Gorman 2013-07-03 1214
283aba9f9e0e48 Mel Gorman 2013-07-03 1215 /* Case 2 above */
b5ead35e7e1d34 Johannes Weiner 2019-11-30 1216 } else if (writeback_throttling_sane(sc) ||
d33e4e1412c8b6 Matthew Wilcox (Oracle 2022-05-12 1217) !folio_test_reclaim(folio) ||
0c4f8ed498cea1 Joanne Koong 2025-04-14 1218 !may_enter_fs(folio, sc->gfp_mask) ||
0c4f8ed498cea1 Joanne Koong 2025-04-14 1219 (mapping &&
0c4f8ed498cea1 Joanne Koong 2025-04-14 1220 mapping_writeback_may_deadlock_on_reclaim(mapping))) {
c3b94f44fcb072 Hugh Dickins 2012-07-31 1221 /*
d33e4e1412c8b6 Matthew Wilcox (Oracle 2022-05-12 1222) * This is slightly racy -
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1223) * folio_end_writeback() might have
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1224) * just cleared the reclaim flag, then
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1225) * setting the reclaim flag here ends up
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1226) * interpreted as the readahead flag - but
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1227) * that does not matter enough to care.
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1228) * What we do want is for this folio to
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1229) * have the reclaim flag set next time
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1230) * memcg reclaim reaches the tests above,
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1231) * so it will then wait for writeback to
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1232) * avoid OOM; and it's also appropriate
d33e4e1412c8b6 Matthew Wilcox (Oracle 2022-05-12 1233) * in global reclaim.
c3b94f44fcb072 Hugh Dickins 2012-07-31 1234 */
d33e4e1412c8b6 Matthew Wilcox (Oracle 2022-05-12 1235) folio_set_reclaim(folio);
c79b7b96db8b12 Matthew Wilcox (Oracle 2022-01-17 1236) stat->nr_writeback += nr_pages;
c55e8d035b28b2 Johannes Weiner 2017-02-24 1237 goto activate_locked;
283aba9f9e0e48 Mel Gorman 2013-07-03 1238
283aba9f9e0e48 Mel Gorman 2013-07-03 1239 /* Case 3 above */
283aba9f9e0e48 Mel Gorman 2013-07-03 1240 } else {
d33e4e1412c8b6 Matthew Wilcox (Oracle 2022-05-12 1241) folio_unlock(folio);
d33e4e1412c8b6 Matthew Wilcox (Oracle 2022-05-12 1242) folio_wait_writeback(folio);
d33e4e1412c8b6 Matthew Wilcox (Oracle 2022-05-12 1243) /* then go back and try same folio again */
49fd9b6df54e61 Matthew Wilcox (Oracle 2022-09-02 1244) list_add_tail(&folio->lru, folio_list);
7fadc820222497 Hugh Dickins 2015-09-08 1245 continue;
e62e384e9da8d9 Michal Hocko 2012-07-31 1246 }
283aba9f9e0e48 Mel Gorman 2013-07-03 1247 }
^1da177e4c3f41 Linus Torvalds 2005-04-16 1248
8940b34a4e082a Minchan Kim 2019-09-25 1249 if (!ignore_references)
d92013d1e5e47f Matthew Wilcox (Oracle 2022-02-15 1250) references = folio_check_references(folio, sc);
02c6de8d757cb3 Minchan Kim 2012-10-08 1251
dfc8d636cdb95f Johannes Weiner 2010-03-05 1252 switch (references) {
49fd9b6df54e61 Matthew Wilcox (Oracle 2022-09-02 1253) case FOLIOREF_ACTIVATE:
^1da177e4c3f41 Linus Torvalds 2005-04-16 1254 goto activate_locked;
49fd9b6df54e61 Matthew Wilcox (Oracle 2022-09-02 1255) case FOLIOREF_KEEP:
98879b3b9edc16 Yang Shi 2019-07-11 1256 stat->nr_ref_keep += nr_pages;
645747462435d8 Johannes Weiner 2010-03-05 1257 goto keep_locked;
49fd9b6df54e61 Matthew Wilcox (Oracle 2022-09-02 1258) case FOLIOREF_RECLAIM:
49fd9b6df54e61 Matthew Wilcox (Oracle 2022-09-02 1259) case FOLIOREF_RECLAIM_CLEAN:
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1260) ; /* try to reclaim the folio below */
dfc8d636cdb95f Johannes Weiner 2010-03-05 1261 }
^1da177e4c3f41 Linus Torvalds 2005-04-16 1262
26aa2d199d6f2c Dave Hansen 2021-09-02 1263 /*
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1264) * Before reclaiming the folio, try to relocate
26aa2d199d6f2c Dave Hansen 2021-09-02 1265 * its contents to another node.
26aa2d199d6f2c Dave Hansen 2021-09-02 1266 */
26aa2d199d6f2c Dave Hansen 2021-09-02 1267 if (do_demote_pass &&
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1268) (thp_migration_supported() || !folio_test_large(folio))) {
49fd9b6df54e61 Matthew Wilcox (Oracle 2022-09-02 1269) list_add(&folio->lru, &demote_folios);
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1270) folio_unlock(folio);
26aa2d199d6f2c Dave Hansen 2021-09-02 1271 continue;
26aa2d199d6f2c Dave Hansen 2021-09-02 1272 }
26aa2d199d6f2c Dave Hansen 2021-09-02 1273
^1da177e4c3f41 Linus Torvalds 2005-04-16 1274 /*
^1da177e4c3f41 Linus Torvalds 2005-04-16 1275 * Anonymous process memory has backing store?
^1da177e4c3f41 Linus Torvalds 2005-04-16 1276 * Try to allocate it some swap space here.
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1277) * Lazyfree folio could be freed directly
^1da177e4c3f41 Linus Torvalds 2005-04-16 1278 */
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1279) if (folio_test_anon(folio) && folio_test_swapbacked(folio)) {
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1280) if (!folio_test_swapcache(folio)) {
63eb6b93ce725e Hugh Dickins 2008-11-19 1281 if (!(sc->gfp_mask & __GFP_IO))
63eb6b93ce725e Hugh Dickins 2008-11-19 1282 goto keep_locked;
d4b4084ac3154c Matthew Wilcox (Oracle 2022-02-04 1283) if (folio_maybe_dma_pinned(folio))
feb889fb40fafc Linus Torvalds 2021-01-16 1284 goto keep_locked;
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1285) if (folio_test_large(folio)) {
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1286) /* cannot split folio, skip it */
8710f6ed34e7bc David Hildenbrand 2024-08-02 1287 if (!can_split_folio(folio, 1, NULL))
b8f593cd0896b8 Ying Huang 2017-07-06 1288 goto activate_locked;
747552b1e71b40 Ying Huang 2017-07-06 1289 /*
5ed890ce514785 Ryan Roberts 2024-04-08 1290 * Split partially mapped folios right away.
5ed890ce514785 Ryan Roberts 2024-04-08 1291 * We can free the unmapped pages without IO.
747552b1e71b40 Ying Huang 2017-07-06 1292 */
8422acdc97ed58 Usama Arif 2024-08-30 1293 if (data_race(!list_empty(&folio->_deferred_list) &&
8422acdc97ed58 Usama Arif 2024-08-30 1294 folio_test_partially_mapped(folio)) &&
5ed890ce514785 Ryan Roberts 2024-04-08 1295 split_folio_to_list(folio, folio_list))
747552b1e71b40 Ying Huang 2017-07-06 1296 goto activate_locked;
747552b1e71b40 Ying Huang 2017-07-06 1297 }
7d14492199f93c Kairui Song 2025-10-24 @1298 if (folio_alloc_swap(folio)) {
d0f048ac39f6a7 Barry Song 2024-04-12 1299 int __maybe_unused order = folio_order(folio);
d0f048ac39f6a7 Barry Song 2024-04-12 1300
09c02e56327bda Matthew Wilcox (Oracle 2022-05-12 1301) if (!folio_test_large(folio))
98879b3b9edc16 Yang Shi 2019-07-11 1302 goto activate_locked_split;
bd4c82c22c367e Ying Huang 2017-09-06 1303 /* Fallback to swap normal pages */
5ed890ce514785 Ryan Roberts 2024-04-08 1304 if (split_folio_to_list(folio, folio_list))
0f0746589e4be0 Minchan Kim 2017-07-06 1305 goto activate_locked;
fe490cc0fe9e6e Ying Huang 2017-09-06 1306 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
5ed890ce514785 Ryan Roberts 2024-04-08 1307 if (nr_pages >= HPAGE_PMD_NR) {
5ed890ce514785 Ryan Roberts 2024-04-08 1308 count_memcg_folio_events(folio,
5ed890ce514785 Ryan Roberts 2024-04-08 1309 THP_SWPOUT_FALLBACK, 1);
fe490cc0fe9e6e Ying Huang 2017-09-06 1310 count_vm_event(THP_SWPOUT_FALLBACK);
5ed890ce514785 Ryan Roberts 2024-04-08 1311 }
fe490cc0fe9e6e Ying Huang 2017-09-06 1312 #endif
e26060d1fbd31a Kanchana P Sridhar 2024-10-02 1313 count_mthp_stat(order, MTHP_STAT_SWPOUT_FALLBACK);
7d14492199f93c Kairui Song 2025-10-24 1314 if (folio_alloc_swap(folio))
98879b3b9edc16 Yang Shi 2019-07-11 1315 goto activate_locked_split;
0f0746589e4be0 Minchan Kim 2017-07-06 1316 }
b487a2da3575b6 Kairui Song 2025-03-14 1317 /*
b487a2da3575b6 Kairui Song 2025-03-14 1318 * Normally the folio will be dirtied in unmap because its
b487a2da3575b6 Kairui Song 2025-03-14 1319 * pte should be dirty. A special case is MADV_FREE page. The
b487a2da3575b6 Kairui Song 2025-03-14 1320 * page's pte could have dirty bit cleared but the folio's
b487a2da3575b6 Kairui Song 2025-03-14 1321 * SwapBacked flag is still set because clearing the dirty bit
b487a2da3575b6 Kairui Song 2025-03-14 1322 * and SwapBacked flag has no lock protected. For such folio,
b487a2da3575b6 Kairui Song 2025-03-14 1323 * unmap will not set dirty bit for it, so folio reclaim will
b487a2da3575b6 Kairui Song 2025-03-14 1324 * not write the folio out. This can cause data corruption when
b487a2da3575b6 Kairui Song 2025-03-14 1325 * the folio is swapped in later. Always setting the dirty flag
b487a2da3575b6 Kairui Song 2025-03-14 1326 * for the folio solves the problem.
b487a2da3575b6 Kairui Song 2025-03-14 1327 */
b487a2da3575b6 Kairui Song 2025-03-14 1328 folio_mark_dirty(folio);
bd4c82c22c367e Ying Huang 2017-09-06 1329 }
e2be15f6c3eece Mel Gorman 2013-07-03 1330 }
^1da177e4c3f41 Linus Torvalds 2005-04-16 1331
98879b3b9edc16 Yang Shi 2019-07-11 1332 /*
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1333) * If the folio was split above, the tail pages will make
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1334) * their own pass through this function and be accounted
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1335) * then.
98879b3b9edc16 Yang Shi 2019-07-11 1336 */
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1337) if ((nr_pages > 1) && !folio_test_large(folio)) {
98879b3b9edc16 Yang Shi 2019-07-11 1338 sc->nr_scanned -= (nr_pages - 1);
98879b3b9edc16 Yang Shi 2019-07-11 1339 nr_pages = 1;
98879b3b9edc16 Yang Shi 2019-07-11 1340 }
98879b3b9edc16 Yang Shi 2019-07-11 1341
^1da177e4c3f41 Linus Torvalds 2005-04-16 1342 /*
1bee2c1677bcb5 Matthew Wilcox (Oracle 2022-05-12 1343) * The folio is mapped into the page tables of one or more
^1da177e4c3f41 Linus Torvalds 2005-04-16 1344 * processes. Try to unmap it here.
^1da177e4c3f41 Linus Torvalds 2005-04-16 1345 */
1bee2c1677bcb5 Matthew Wilcox (Oracle 2022-05-12 1346) if (folio_mapped(folio)) {
013339df116c2e Shakeel Butt 2020-12-14 1347 enum ttu_flags flags = TTU_BATCH_FLUSH;
1bee2c1677bcb5 Matthew Wilcox (Oracle 2022-05-12 1348) bool was_swapbacked = folio_test_swapbacked(folio);
bd4c82c22c367e Ying Huang 2017-09-06 1349
1bee2c1677bcb5 Matthew Wilcox (Oracle 2022-05-12 1350) if (folio_test_pmd_mappable(folio))
bd4c82c22c367e Ying Huang 2017-09-06 1351 flags |= TTU_SPLIT_HUGE_PMD;
73bc32875ee9b1 Barry Song 2024-03-06 1352 /*
73bc32875ee9b1 Barry Song 2024-03-06 1353 * Without TTU_SYNC, try_to_unmap will only begin to
73bc32875ee9b1 Barry Song 2024-03-06 1354 * hold PTL from the first present PTE within a large
73bc32875ee9b1 Barry Song 2024-03-06 1355 * folio. Some initial PTEs might be skipped due to
73bc32875ee9b1 Barry Song 2024-03-06 1356 * races with parallel PTE writes in which PTEs can be
73bc32875ee9b1 Barry Song 2024-03-06 1357 * cleared temporarily before being written new present
73bc32875ee9b1 Barry Song 2024-03-06 1358 * values. This will lead to a large folio is still
73bc32875ee9b1 Barry Song 2024-03-06 1359 * mapped while some subpages have been partially
73bc32875ee9b1 Barry Song 2024-03-06 1360 * unmapped after try_to_unmap; TTU_SYNC helps
73bc32875ee9b1 Barry Song 2024-03-06 1361 * try_to_unmap acquire PTL from the first PTE,
73bc32875ee9b1 Barry Song 2024-03-06 1362 * eliminating the influence of temporary PTE values.
73bc32875ee9b1 Barry Song 2024-03-06 1363 */
e5a119c4a6835a Barry Song 2024-06-30 1364 if (folio_test_large(folio))
73bc32875ee9b1 Barry Song 2024-03-06 1365 flags |= TTU_SYNC;
1f318a9b0dc399 Jaewon Kim 2020-06-03 1366
869f7ee6f64773 Matthew Wilcox (Oracle 2022-02-15 1367) try_to_unmap(folio, flags);
1bee2c1677bcb5 Matthew Wilcox (Oracle 2022-05-12 1368) if (folio_mapped(folio)) {
98879b3b9edc16 Yang Shi 2019-07-11 1369 stat->nr_unmap_fail += nr_pages;
1bee2c1677bcb5 Matthew Wilcox (Oracle 2022-05-12 1370) if (!was_swapbacked &&
1bee2c1677bcb5 Matthew Wilcox (Oracle 2022-05-12 1371) folio_test_swapbacked(folio))
1f318a9b0dc399 Jaewon Kim 2020-06-03 1372 stat->nr_lazyfree_fail += nr_pages;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1373 goto activate_locked;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1374 }
^1da177e4c3f41 Linus Torvalds 2005-04-16 1375 }
^1da177e4c3f41 Linus Torvalds 2005-04-16 1376
d824ec2a154677 Jan Kara 2023-04-28 1377 /*
d824ec2a154677 Jan Kara 2023-04-28 1378 * Folio is unmapped now so it cannot be newly pinned anymore.
d824ec2a154677 Jan Kara 2023-04-28 1379 * No point in trying to reclaim folio if it is pinned.
d824ec2a154677 Jan Kara 2023-04-28 1380 * Furthermore we don't want to reclaim underlying fs metadata
d824ec2a154677 Jan Kara 2023-04-28 1381 * if the folio is pinned and thus potentially modified by the
d824ec2a154677 Jan Kara 2023-04-28 1382 * pinning process as that may upset the filesystem.
d824ec2a154677 Jan Kara 2023-04-28 1383 */
d824ec2a154677 Jan Kara 2023-04-28 1384 if (folio_maybe_dma_pinned(folio))
d824ec2a154677 Jan Kara 2023-04-28 1385 goto activate_locked;
d824ec2a154677 Jan Kara 2023-04-28 1386
5441d4902f9692 Matthew Wilcox (Oracle 2022-05-12 1387) mapping = folio_mapping(folio);
49bd2bf9679f4a Matthew Wilcox (Oracle 2022-05-12 1388) if (folio_test_dirty(folio)) {
e2a80749555d73 Baolin Wang 2025-10-17 1389 if (folio_is_file_lru(folio)) {
49ea7eb65e7c50 Mel Gorman 2011-10-31 1390 /*
49ea7eb65e7c50 Mel Gorman 2011-10-31 1391 * Immediately reclaim when written back.
5a9e34747c9f73 Vishal Moola (Oracle 2022-12-21 1392) * Similar in principle to folio_deactivate()
49bd2bf9679f4a Matthew Wilcox (Oracle 2022-05-12 1393) * except we already have the folio isolated
49ea7eb65e7c50 Mel Gorman 2011-10-31 1394 * and know it's dirty
49ea7eb65e7c50 Mel Gorman 2011-10-31 1395 */
49bd2bf9679f4a Matthew Wilcox (Oracle 2022-05-12 1396) node_stat_mod_folio(folio, NR_VMSCAN_IMMEDIATE,
49bd2bf9679f4a Matthew Wilcox (Oracle 2022-05-12 1397) nr_pages);
e2a80749555d73 Baolin Wang 2025-10-17 1398 if (!folio_test_reclaim(folio))
49bd2bf9679f4a Matthew Wilcox (Oracle 2022-05-12 1399) folio_set_reclaim(folio);
49ea7eb65e7c50 Mel Gorman 2011-10-31 1400
c55e8d035b28b2 Johannes Weiner 2017-02-24 1401 goto activate_locked;
ee72886d8ed5d9 Mel Gorman 2011-10-31 1402 }
ee72886d8ed5d9 Mel Gorman 2011-10-31 1403
49fd9b6df54e61 Matthew Wilcox (Oracle 2022-09-02 1404) if (references == FOLIOREF_RECLAIM_CLEAN)
^1da177e4c3f41 Linus Torvalds 2005-04-16 1405 goto keep_locked;
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1406) if (!may_enter_fs(folio, sc->gfp_mask))
^1da177e4c3f41 Linus Torvalds 2005-04-16 1407 goto keep_locked;
52a8363eae3872 Christoph Lameter 2006-02-01 1408 if (!sc->may_writepage)
^1da177e4c3f41 Linus Torvalds 2005-04-16 1409 goto keep_locked;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1410
d950c9477d51f0 Mel Gorman 2015-09-04 1411 /*
49bd2bf9679f4a Matthew Wilcox (Oracle 2022-05-12 1412) * Folio is dirty. Flush the TLB if a writable entry
49bd2bf9679f4a Matthew Wilcox (Oracle 2022-05-12 1413) * potentially exists to avoid CPU writes after I/O
d950c9477d51f0 Mel Gorman 2015-09-04 1414 * starts and then write it out here.
d950c9477d51f0 Mel Gorman 2015-09-04 1415 */
d950c9477d51f0 Mel Gorman 2015-09-04 1416 try_to_unmap_flush_dirty();
809bc86517cc40 Baolin Wang 2024-08-12 1417 switch (pageout(folio, mapping, &plug, folio_list)) {
^1da177e4c3f41 Linus Torvalds 2005-04-16 1418 case PAGE_KEEP:
^1da177e4c3f41 Linus Torvalds 2005-04-16 1419 goto keep_locked;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1420 case PAGE_ACTIVATE:
809bc86517cc40 Baolin Wang 2024-08-12 1421 /*
809bc86517cc40 Baolin Wang 2024-08-12 1422 * If shmem folio is split when writeback to swap,
809bc86517cc40 Baolin Wang 2024-08-12 1423 * the tail pages will make their own pass through
809bc86517cc40 Baolin Wang 2024-08-12 1424 * this function and be accounted then.
809bc86517cc40 Baolin Wang 2024-08-12 1425 */
809bc86517cc40 Baolin Wang 2024-08-12 1426 if (nr_pages > 1 && !folio_test_large(folio)) {
809bc86517cc40 Baolin Wang 2024-08-12 1427 sc->nr_scanned -= (nr_pages - 1);
809bc86517cc40 Baolin Wang 2024-08-12 1428 nr_pages = 1;
809bc86517cc40 Baolin Wang 2024-08-12 1429 }
^1da177e4c3f41 Linus Torvalds 2005-04-16 1430 goto activate_locked;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1431 case PAGE_SUCCESS:
809bc86517cc40 Baolin Wang 2024-08-12 1432 if (nr_pages > 1 && !folio_test_large(folio)) {
809bc86517cc40 Baolin Wang 2024-08-12 1433 sc->nr_scanned -= (nr_pages - 1);
809bc86517cc40 Baolin Wang 2024-08-12 1434 nr_pages = 1;
809bc86517cc40 Baolin Wang 2024-08-12 1435 }
c79b7b96db8b12 Matthew Wilcox (Oracle 2022-01-17 1436) stat->nr_pageout += nr_pages;
96f8bf4fb1dd26 Johannes Weiner 2020-06-03 1437
49bd2bf9679f4a Matthew Wilcox (Oracle 2022-05-12 1438) if (folio_test_writeback(folio))
41ac1999c3e356 Mel Gorman 2012-05-29 1439 goto keep;
49bd2bf9679f4a Matthew Wilcox (Oracle 2022-05-12 1440) if (folio_test_dirty(folio))
^1da177e4c3f41 Linus Torvalds 2005-04-16 1441 goto keep;
7d3579e8e61937 KOSAKI Motohiro 2010-10-26 1442
^1da177e4c3f41 Linus Torvalds 2005-04-16 1443 /*
^1da177e4c3f41 Linus Torvalds 2005-04-16 1444 * A synchronous write - probably a ramdisk. Go
49bd2bf9679f4a Matthew Wilcox (Oracle 2022-05-12 1445) * ahead and try to reclaim the folio.
^1da177e4c3f41 Linus Torvalds 2005-04-16 1446 */
49bd2bf9679f4a Matthew Wilcox (Oracle 2022-05-12 1447) if (!folio_trylock(folio))
^1da177e4c3f41 Linus Torvalds 2005-04-16 1448 goto keep;
49bd2bf9679f4a Matthew Wilcox (Oracle 2022-05-12 1449) if (folio_test_dirty(folio) ||
49bd2bf9679f4a Matthew Wilcox (Oracle 2022-05-12 1450) folio_test_writeback(folio))
^1da177e4c3f41 Linus Torvalds 2005-04-16 1451 goto keep_locked;
49bd2bf9679f4a Matthew Wilcox (Oracle 2022-05-12 1452) mapping = folio_mapping(folio);
01359eb2013b4b Gustavo A. R. Silva 2020-12-14 1453 fallthrough;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1454 case PAGE_CLEAN:
49bd2bf9679f4a Matthew Wilcox (Oracle 2022-05-12 1455) ; /* try to free the folio below */
^1da177e4c3f41 Linus Torvalds 2005-04-16 1456 }
^1da177e4c3f41 Linus Torvalds 2005-04-16 1457 }
^1da177e4c3f41 Linus Torvalds 2005-04-16 1458
^1da177e4c3f41 Linus Torvalds 2005-04-16 1459 /*
0a36111c8c20b2 Matthew Wilcox (Oracle 2022-05-12 1460) * If the folio has buffers, try to free the buffer
0a36111c8c20b2 Matthew Wilcox (Oracle 2022-05-12 1461) * mappings associated with this folio. If we succeed
0a36111c8c20b2 Matthew Wilcox (Oracle 2022-05-12 1462) * we try to free the folio as well.
^1da177e4c3f41 Linus Torvalds 2005-04-16 1463 *
0a36111c8c20b2 Matthew Wilcox (Oracle 2022-05-12 1464) * We do this even if the folio is dirty.
0a36111c8c20b2 Matthew Wilcox (Oracle 2022-05-12 1465) * filemap_release_folio() does not perform I/O, but it
0a36111c8c20b2 Matthew Wilcox (Oracle 2022-05-12 1466) * is possible for a folio to have the dirty flag set,
0a36111c8c20b2 Matthew Wilcox (Oracle 2022-05-12 1467) * but it is actually clean (all its buffers are clean).
0a36111c8c20b2 Matthew Wilcox (Oracle 2022-05-12 1468) * This happens if the buffers were written out directly,
0a36111c8c20b2 Matthew Wilcox (Oracle 2022-05-12 1469) * with submit_bh(). ext3 will do this, as well as
0a36111c8c20b2 Matthew Wilcox (Oracle 2022-05-12 1470) * the blockdev mapping. filemap_release_folio() will
0a36111c8c20b2 Matthew Wilcox (Oracle 2022-05-12 1471) * discover that cleanness and will drop the buffers
0a36111c8c20b2 Matthew Wilcox (Oracle 2022-05-12 1472) * and mark the folio clean - it can be freed.
^1da177e4c3f41 Linus Torvalds 2005-04-16 1473 *
0a36111c8c20b2 Matthew Wilcox (Oracle 2022-05-12 1474) * Rarely, folios can have buffers and no ->mapping.
0a36111c8c20b2 Matthew Wilcox (Oracle 2022-05-12 1475) * These are the folios which were not successfully
0a36111c8c20b2 Matthew Wilcox (Oracle 2022-05-12 1476) * invalidated in truncate_cleanup_folio(). We try to
0a36111c8c20b2 Matthew Wilcox (Oracle 2022-05-12 1477) * drop those buffers here and if that worked, and the
0a36111c8c20b2 Matthew Wilcox (Oracle 2022-05-12 1478) * folio is no longer mapped into process address space
0a36111c8c20b2 Matthew Wilcox (Oracle 2022-05-12 1479) * (refcount == 1) it can be freed. Otherwise, leave
0a36111c8c20b2 Matthew Wilcox (Oracle 2022-05-12 1480) * the folio on the LRU so it is swappable.
^1da177e4c3f41 Linus Torvalds 2005-04-16 1481 */
0201ebf274a306 David Howells 2023-06-28 1482 if (folio_needs_release(folio)) {
0a36111c8c20b2 Matthew Wilcox (Oracle 2022-05-12 1483) if (!filemap_release_folio(folio, sc->gfp_mask))
^1da177e4c3f41 Linus Torvalds 2005-04-16 1484 goto activate_locked;
0a36111c8c20b2 Matthew Wilcox (Oracle 2022-05-12 1485) if (!mapping && folio_ref_count(folio) == 1) {
0a36111c8c20b2 Matthew Wilcox (Oracle 2022-05-12 1486) folio_unlock(folio);
0a36111c8c20b2 Matthew Wilcox (Oracle 2022-05-12 1487) if (folio_put_testzero(folio))
^1da177e4c3f41 Linus Torvalds 2005-04-16 1488 goto free_it;
e286781d5f2e9c Nicholas Piggin 2008-07-25 1489 else {
e286781d5f2e9c Nicholas Piggin 2008-07-25 1490 /*
e286781d5f2e9c Nicholas Piggin 2008-07-25 1491 * rare race with speculative reference.
e286781d5f2e9c Nicholas Piggin 2008-07-25 1492 * the speculative reference will free
0a36111c8c20b2 Matthew Wilcox (Oracle 2022-05-12 1493) * this folio shortly, so we may
e286781d5f2e9c Nicholas Piggin 2008-07-25 1494 * increment nr_reclaimed here (and
e286781d5f2e9c Nicholas Piggin 2008-07-25 1495 * leave it off the LRU).
e286781d5f2e9c Nicholas Piggin 2008-07-25 1496 */
9aafcffc18785f Miaohe Lin 2022-05-12 1497 nr_reclaimed += nr_pages;
e286781d5f2e9c Nicholas Piggin 2008-07-25 1498 continue;
e286781d5f2e9c Nicholas Piggin 2008-07-25 1499 }
e286781d5f2e9c Nicholas Piggin 2008-07-25 1500 }
^1da177e4c3f41 Linus Torvalds 2005-04-16 1501 }
^1da177e4c3f41 Linus Torvalds 2005-04-16 1502
64daa5d818ae34 Matthew Wilcox (Oracle 2022-05-12 1503) if (folio_test_anon(folio) && !folio_test_swapbacked(folio)) {
802a3a92ad7ac0 Shaohua Li 2017-05-03 1504 /* follow __remove_mapping for reference */
64daa5d818ae34 Matthew Wilcox (Oracle 2022-05-12 1505) if (!folio_ref_freeze(folio, 1))
49d2e9cc454436 Christoph Lameter 2006-01-08 1506 goto keep_locked;
d17be2d9ff6c68 Miaohe Lin 2021-09-02 1507 /*
64daa5d818ae34 Matthew Wilcox (Oracle 2022-05-12 1508) * The folio has only one reference left, which is
d17be2d9ff6c68 Miaohe Lin 2021-09-02 1509 * from the isolation. After the caller puts the
64daa5d818ae34 Matthew Wilcox (Oracle 2022-05-12 1510) * folio back on the lru and drops the reference, the
64daa5d818ae34 Matthew Wilcox (Oracle 2022-05-12 1511) * folio will be freed anyway. It doesn't matter
64daa5d818ae34 Matthew Wilcox (Oracle 2022-05-12 1512) * which lru it goes on. So we don't bother checking
64daa5d818ae34 Matthew Wilcox (Oracle 2022-05-12 1513) * the dirty flag here.
d17be2d9ff6c68 Miaohe Lin 2021-09-02 1514 */
64daa5d818ae34 Matthew Wilcox (Oracle 2022-05-12 1515) count_vm_events(PGLAZYFREED, nr_pages);
64daa5d818ae34 Matthew Wilcox (Oracle 2022-05-12 1516) count_memcg_folio_events(folio, PGLAZYFREED, nr_pages);
be7c07d60e13ac Matthew Wilcox (Oracle 2021-12-23 1517) } else if (!mapping || !__remove_mapping(mapping, folio, true,
b910718a948a91 Johannes Weiner 2019-11-30 1518 sc->target_mem_cgroup))
802a3a92ad7ac0 Shaohua Li 2017-05-03 1519 goto keep_locked;
9a1ea439b16b92 Hugh Dickins 2018-12-28 1520
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1521) folio_unlock(folio);
e286781d5f2e9c Nicholas Piggin 2008-07-25 1522 free_it:
98879b3b9edc16 Yang Shi 2019-07-11 1523 /*
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1524) * Folio may get swapped out as a whole, need to account
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1525) * all pages in it.
98879b3b9edc16 Yang Shi 2019-07-11 1526 */
98879b3b9edc16 Yang Shi 2019-07-11 1527 nr_reclaimed += nr_pages;
abe4c3b50c3f25 Mel Gorman 2010-08-09 1528
f8f931bba0f920 Hugh Dickins 2024-10-27 1529 folio_unqueue_deferred_split(folio);
bc2ff4cbc3294c Matthew Wilcox (Oracle 2024-02-27 1530) if (folio_batch_add(&free_folios, folio) == 0) {
bc2ff4cbc3294c Matthew Wilcox (Oracle 2024-02-27 1531) mem_cgroup_uncharge_folios(&free_folios);
bc2ff4cbc3294c Matthew Wilcox (Oracle 2024-02-27 1532) try_to_unmap_flush();
bc2ff4cbc3294c Matthew Wilcox (Oracle 2024-02-27 1533) free_unref_folios(&free_folios);
bc2ff4cbc3294c Matthew Wilcox (Oracle 2024-02-27 1534) }
^1da177e4c3f41 Linus Torvalds 2005-04-16 1535 continue;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1536
98879b3b9edc16 Yang Shi 2019-07-11 1537 activate_locked_split:
98879b3b9edc16 Yang Shi 2019-07-11 1538 /*
98879b3b9edc16 Yang Shi 2019-07-11 1539 * The tail pages that are failed to add into swap cache
98879b3b9edc16 Yang Shi 2019-07-11 1540 * reach here. Fixup nr_scanned and nr_pages.
98879b3b9edc16 Yang Shi 2019-07-11 1541 */
98879b3b9edc16 Yang Shi 2019-07-11 1542 if (nr_pages > 1) {
98879b3b9edc16 Yang Shi 2019-07-11 1543 sc->nr_scanned -= (nr_pages - 1);
98879b3b9edc16 Yang Shi 2019-07-11 1544 nr_pages = 1;
98879b3b9edc16 Yang Shi 2019-07-11 1545 }
^1da177e4c3f41 Linus Torvalds 2005-04-16 1546 activate_locked:
68a22394c286a2 Rik van Riel 2008-10-18 1547 /* Not a candidate for swapping, so reclaim swap space. */
246b648038096c Matthew Wilcox (Oracle 2022-05-12 1548) if (folio_test_swapcache(folio) &&
9202d527b715f6 Matthew Wilcox (Oracle 2022-09-02 1549) (mem_cgroup_swap_full(folio) || folio_test_mlocked(folio)))
bdb0ed54a4768d Matthew Wilcox (Oracle 2022-09-02 1550) folio_free_swap(folio);
246b648038096c Matthew Wilcox (Oracle 2022-05-12 1551) VM_BUG_ON_FOLIO(folio_test_active(folio), folio);
246b648038096c Matthew Wilcox (Oracle 2022-05-12 1552) if (!folio_test_mlocked(folio)) {
246b648038096c Matthew Wilcox (Oracle 2022-05-12 1553) int type = folio_is_file_lru(folio);
246b648038096c Matthew Wilcox (Oracle 2022-05-12 1554) folio_set_active(folio);
98879b3b9edc16 Yang Shi 2019-07-11 1555 stat->nr_activate[type] += nr_pages;
246b648038096c Matthew Wilcox (Oracle 2022-05-12 1556) count_memcg_folio_events(folio, PGACTIVATE, nr_pages);
ad6b67041a4549 Minchan Kim 2017-05-03 1557 }
^1da177e4c3f41 Linus Torvalds 2005-04-16 1558 keep_locked:
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1559) folio_unlock(folio);
^1da177e4c3f41 Linus Torvalds 2005-04-16 1560 keep:
49fd9b6df54e61 Matthew Wilcox (Oracle 2022-09-02 1561) list_add(&folio->lru, &ret_folios);
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1562) VM_BUG_ON_FOLIO(folio_test_lru(folio) ||
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1563) folio_test_unevictable(folio), folio);
^1da177e4c3f41 Linus Torvalds 2005-04-16 1564 }
49fd9b6df54e61 Matthew Wilcox (Oracle 2022-09-02 1565) /* 'folio_list' is always empty here */
26aa2d199d6f2c Dave Hansen 2021-09-02 1566
c28a0e9695b724 Matthew Wilcox (Oracle 2022-05-12 1567) /* Migrate folios selected for demotion */
a479b078fddb0a Li Zhijian 2025-01-10 1568 nr_demoted = demote_folio_list(&demote_folios, pgdat);
a479b078fddb0a Li Zhijian 2025-01-10 1569 nr_reclaimed += nr_demoted;
a479b078fddb0a Li Zhijian 2025-01-10 1570 stat->nr_demoted += nr_demoted;
49fd9b6df54e61 Matthew Wilcox (Oracle 2022-09-02 1571) /* Folios that could not be demoted are still in @demote_folios */
49fd9b6df54e61 Matthew Wilcox (Oracle 2022-09-02 1572) if (!list_empty(&demote_folios)) {
6b426d071419a4 Mina Almasry 2022-12-01 1573 /* Folios which weren't demoted go back on @folio_list */
49fd9b6df54e61 Matthew Wilcox (Oracle 2022-09-02 1574) list_splice_init(&demote_folios, folio_list);
6b426d071419a4 Mina Almasry 2022-12-01 1575
6b426d071419a4 Mina Almasry 2022-12-01 1576 /*
6b426d071419a4 Mina Almasry 2022-12-01 1577 * goto retry to reclaim the undemoted folios in folio_list if
6b426d071419a4 Mina Almasry 2022-12-01 1578 * desired.
6b426d071419a4 Mina Almasry 2022-12-01 1579 *
6b426d071419a4 Mina Almasry 2022-12-01 1580 * Reclaiming directly from top tier nodes is not often desired
6b426d071419a4 Mina Almasry 2022-12-01 1581 * due to it breaking the LRU ordering: in general memory
6b426d071419a4 Mina Almasry 2022-12-01 1582 * should be reclaimed from lower tier nodes and demoted from
6b426d071419a4 Mina Almasry 2022-12-01 1583 * top tier nodes.
6b426d071419a4 Mina Almasry 2022-12-01 1584 *
6b426d071419a4 Mina Almasry 2022-12-01 1585 * However, disabling reclaim from top tier nodes entirely
6b426d071419a4 Mina Almasry 2022-12-01 1586 * would cause ooms in edge scenarios where lower tier memory
6b426d071419a4 Mina Almasry 2022-12-01 1587 * is unreclaimable for whatever reason, eg memory being
6b426d071419a4 Mina Almasry 2022-12-01 1588 * mlocked or too hot to reclaim. We can disable reclaim
6b426d071419a4 Mina Almasry 2022-12-01 1589 * from top tier nodes in proactive reclaim though as that is
6b426d071419a4 Mina Almasry 2022-12-01 1590 * not real memory pressure.
6b426d071419a4 Mina Almasry 2022-12-01 1591 */
6b426d071419a4 Mina Almasry 2022-12-01 1592 if (!sc->proactive) {
26aa2d199d6f2c Dave Hansen 2021-09-02 1593 do_demote_pass = false;
26aa2d199d6f2c Dave Hansen 2021-09-02 1594 goto retry;
26aa2d199d6f2c Dave Hansen 2021-09-02 1595 }
6b426d071419a4 Mina Almasry 2022-12-01 1596 }
abe4c3b50c3f25 Mel Gorman 2010-08-09 1597
98879b3b9edc16 Yang Shi 2019-07-11 1598 pgactivate = stat->nr_activate[0] + stat->nr_activate[1];
98879b3b9edc16 Yang Shi 2019-07-11 1599
bc2ff4cbc3294c Matthew Wilcox (Oracle 2024-02-27 1600) mem_cgroup_uncharge_folios(&free_folios);
72b252aed506b8 Mel Gorman 2015-09-04 1601 try_to_unmap_flush();
bc2ff4cbc3294c Matthew Wilcox (Oracle 2024-02-27 1602) free_unref_folios(&free_folios);
abe4c3b50c3f25 Mel Gorman 2010-08-09 1603
49fd9b6df54e61 Matthew Wilcox (Oracle 2022-09-02 1604) list_splice(&ret_folios, folio_list);
886cf1901db962 Kirill Tkhai 2019-05-13 1605 count_vm_events(PGACTIVATE, pgactivate);
060f005f074791 Kirill Tkhai 2019-03-05 1606
2282679fb20bf0 NeilBrown 2022-05-09 1607 if (plug)
2282679fb20bf0 NeilBrown 2022-05-09 1608 swap_write_unplug(plug);
05ff51376f01fd Andrew Morton 2006-03-22 1609 return nr_reclaimed;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1610 }
^1da177e4c3f41 Linus Torvalds 2005-04-16 1611
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
On Thu, Oct 30, 2025 at 3:30 AM kernel test robot <lkp@intel.com> wrote:
>
> Hi Kairui,
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on f30d294530d939fa4b77d61bc60f25c4284841fa]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Kairui-Song/mm-swap-rename-__read_swap_cache_async-to-swap_cache_alloc_folio/20251030-000506
> base: f30d294530d939fa4b77d61bc60f25c4284841fa
> patch link: https://lore.kernel.org/r/20251029-swap-table-p2-v1-14-3d43f3b6ec32%40tencent.com
> patch subject: [PATCH 14/19] mm, swap: sanitize swap entry management workflow
> config: i386-allnoconfig (https://download.01.org/0day-ci/archive/20251030/202510300316.UL4gxAlC-lkp@intel.com/config)
> compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251030/202510300316.UL4gxAlC-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202510300316.UL4gxAlC-lkp@intel.com/
>
> All errors (new ones prefixed by >>):
>
> In file included from mm/vmscan.c:70:
> mm/swap.h: In function 'swap_cache_add_folio':
> mm/swap.h:465:1: warning: no return statement in function returning non-void [-Wreturn-type]
> 465 | }
> | ^
> mm/vmscan.c: In function 'shrink_folio_list':
> >> mm/vmscan.c:1298:37: error: too few arguments to function 'folio_alloc_swap'
> 1298 | if (folio_alloc_swap(folio)) {
> | ^~~~~~~~~~~~~~~~
> mm/swap.h:388:19: note: declared here
> 388 | static inline int folio_alloc_swap(struct folio *folio, gfp_t gfp)
> | ^~~~~~~~~~~~~~~~
> mm/vmscan.c:1314:45: error: too few arguments to function 'folio_alloc_swap'
> 1314 | if (folio_alloc_swap(folio))
> | ^~~~~~~~~~~~~~~~
> mm/swap.h:388:19: note: declared here
> 388 | static inline int folio_alloc_swap(struct folio *folio, gfp_t gfp)
> | ^~~~~~~~~~~~~~~~
> --
> In file included from mm/shmem.c:44:
> mm/swap.h: In function 'swap_cache_add_folio':
> mm/swap.h:465:1: warning: no return statement in function returning non-void [-Wreturn-type]
> 465 | }
> | ^
> mm/shmem.c: In function 'shmem_writeout':
> >> mm/shmem.c:1649:14: error: too few arguments to function 'folio_alloc_swap'
> 1649 | if (!folio_alloc_swap(folio)) {
> | ^~~~~~~~~~~~~~~~
> mm/swap.h:388:19: note: declared here
> 388 | static inline int folio_alloc_swap(struct folio *folio, gfp_t gfp)
> | ^~~~~~~~~~~~~~~~
>
Thanks, I forgot to update the empty placeholder stub for folio_alloc_swap during rebase:
diff --git a/mm/swap.h b/mm/swap.h
index 74c61129d7b7..9aa99061573a 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -385,7 +385,7 @@ static inline struct swap_info_struct *__swap_entry_to_info(swp_entry_t entry)
 	return NULL;
 }
 
-static inline int folio_alloc_swap(struct folio *folio, gfp_t gfp)
+static inline int folio_alloc_swap(struct folio *folio)
 {
 	return -EINVAL;
 }
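
For anyone skimming the thread, below is a minimal, self-contained C sketch of why the i386-allnoconfig build broke and what the one-line hunk above restores: the !CONFIG_SWAP placeholder must carry the same single-argument prototype as the new folio-based allocator, otherwise call sites such as mm/vmscan.c:1298 and mm/shmem.c:1649 fail with "too few arguments". This is not the real kernel header; struct folio, try_to_swap_out() and main() are stand-ins invented only for illustration, and the CONFIG_SWAP-side prototype is assumed from the new API.

#include <errno.h>
#include <stdbool.h>

struct folio { int dummy; };			/* stand-in for the kernel's struct folio */

#ifdef CONFIG_SWAP
int folio_alloc_swap(struct folio *folio);	/* assumed prototype of the real allocator */
#else
/* !CONFIG_SWAP placeholder, matching the hunk above */
static inline int folio_alloc_swap(struct folio *folio)
{
	(void)folio;
	return -EINVAL;
}
#endif

/* caller pattern mirroring shrink_folio_list()/shmem_writeout() */
static bool try_to_swap_out(struct folio *folio)
{
	/* one argument now, no gfp_t, as at mm/vmscan.c:1298 */
	if (folio_alloc_swap(folio))
		return false;	/* vmscan would goto activate_locked_split here */
	return true;
}

int main(void)
{
	struct folio f = { 0 };

	return try_to_swap_out(&f) ? 0 : 1;
}

With swap disabled the stub keeps returning -EINVAL, so the reclaim path simply treats the folio as unswappable and activates it instead of writing it out.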