[PATCH v3 05/15] mm, swap: always lock and check the swap cache folio before use

Posted by Kairui Song 2 months, 4 weeks ago
From: Kairui Song <kasong@tencent.com>

Swap cache lookup only increases the reference count of the returned
folio. That's not enough to ensure a folio is stable in the swap
cache, so the folio could be removed from the swap cache at any
time. The caller should always lock and check the folio before using it.

We have just documented this in kerneldoc; now introduce a helper for
swap cache folio verification with proper sanity checks.
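
For reference, callers are expected to do roughly the following after a
lookup (an illustrative sketch only, not part of the diff below; real
callers may use trylock or lock-or-retry variants as appropriate):

  folio = swap_cache_get_folio(entry);
  if (!folio)
          return;         /* not cached, fall back */
  folio_lock(folio);
  if (!folio_matches_swap_entry(folio, entry)) {
          /* Raced with removal from the swap cache, give up. */
          folio_unlock(folio);
          folio_put(folio);
          return;
  }
  /* The folio is locked and still backs the entry, safe to use. */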

Also, sanitize a few current users to follow this convention and use
the new helper for easier debugging. They did not have observable
problems yet, only trivial issues like wasted CPU cycles on swapoff or
reclaim. They would fail in some other way, but it is still better to
always follow this convention to make things robust and to make later
commits easier.

Signed-off-by: Kairui Song <kasong@tencent.com>
---
 mm/memory.c     |  3 +--
 mm/swap.h       | 27 +++++++++++++++++++++++++++
 mm/swap_state.c |  7 +++++--
 mm/swapfile.c   | 11 ++++++++---
 4 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 94a5928e8ace..5808c4ef21b3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4748,8 +4748,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		 * swapcache, we need to check that the page's swap has not
 		 * changed.
 		 */
-		if (unlikely(!folio_test_swapcache(folio) ||
-			     page_swap_entry(page).val != entry.val))
+		if (unlikely(!folio_matches_swap_entry(folio, entry)))
 			goto out_page;
 
 		if (unlikely(PageHWPoison(page))) {
diff --git a/mm/swap.h b/mm/swap.h
index efb6d7ff9f30..7d868f8de696 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -52,6 +52,28 @@ static inline pgoff_t swap_cache_index(swp_entry_t entry)
 	return swp_offset(entry) & SWAP_ADDRESS_SPACE_MASK;
 }
 
+/**
+ * folio_matches_swap_entry - Check if a folio matches a given swap entry.
+ * @folio: The folio.
+ * @entry: The swap entry to check against.
+ *
+ * Context: The caller should have the folio locked to ensure it's stable
+ * and nothing will move it in or out of the swap cache.
+ * Return: true or false.
+ */
+static inline bool folio_matches_swap_entry(const struct folio *folio,
+					    swp_entry_t entry)
+{
+	swp_entry_t folio_entry = folio->swap;
+	long nr_pages = folio_nr_pages(folio);
+
+	VM_WARN_ON_ONCE_FOLIO(!folio_test_locked(folio), folio);
+	if (!folio_test_swapcache(folio))
+		return false;
+	VM_WARN_ON_ONCE_FOLIO(!IS_ALIGNED(folio_entry.val, nr_pages), folio);
+	return folio_entry.val == round_down(entry.val, nr_pages);
+}
+
 void show_swap_cache_info(void);
 void *get_shadow_from_swap_cache(swp_entry_t entry);
 int add_to_swap_cache(struct folio *folio, swp_entry_t entry,
@@ -144,6 +166,11 @@ static inline pgoff_t swap_cache_index(swp_entry_t entry)
 	return 0;
 }
 
+static inline bool folio_matches_swap_entry(const struct folio *folio, swp_entry_t entry)
+{
+	return false;
+}
+
 static inline void show_swap_cache_info(void)
 {
 }
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 68ec531d0f2b..9225d6b695ad 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -79,7 +79,7 @@ void show_swap_cache_info(void)
  * with reference count or locks.
  * Return: Returns the found folio on success, NULL otherwise. The caller
  * must lock and check if the folio still matches the swap entry before
- * use.
+ * use (e.g. with folio_matches_swap_entry).
  */
 struct folio *swap_cache_get_folio(swp_entry_t entry)
 {
@@ -346,7 +346,10 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 	for (;;) {
 		int err;
 
-		/* Check the swap cache in case the folio is already there */
+		/*
+		 * Check the swap cache first, if a cached folio is found,
+		 * return it unlocked. The caller will lock and check it.
+		 */
 		folio = swap_cache_get_folio(entry);
 		if (folio)
 			goto got_folio;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 4baebd8b48f4..f1a4d381d719 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -240,14 +240,12 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
 	 * Offset could point to the middle of a large folio, or folio
 	 * may no longer point to the expected offset before it's locked.
 	 */
-	if (offset < swp_offset(folio->swap) ||
-	    offset >= swp_offset(folio->swap) + nr_pages) {
+	if (!folio_matches_swap_entry(folio, entry)) {
 		folio_unlock(folio);
 		folio_put(folio);
 		goto again;
 	}
 	offset = swp_offset(folio->swap);
-
 	need_reclaim = ((flags & TTRS_ANYWAY) ||
 			((flags & TTRS_UNMAPPED) && !folio_mapped(folio)) ||
 			((flags & TTRS_FULL) && mem_cgroup_swap_full(folio)));
@@ -2004,6 +2002,13 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
 	bool hwpoisoned = false;
 	int ret = 1;
 
+	/*
+	 * If the folio is removed from swap cache by others, continue to
+	 * unuse other PTEs. try_to_unuse may try again if we missed this one.
+	 */
+	if (!folio_matches_swap_entry(folio, entry))
+		return 0;
+
 	swapcache = folio;
 	folio = ksm_might_need_to_copy(folio, vma, addr);
 	if (unlikely(!folio))
-- 
2.51.0
Re: [PATCH v3 05/15] mm, swap: always lock and check the swap cache folio before use
Posted by Nhat Pham 2 months, 3 weeks ago
On Wed, Sep 10, 2025 at 9:09 AM Kairui Song <ryncsn@gmail.com> wrote:
>
> From: Kairui Song <kasong@tencent.com>
>
> Swap cache lookup only increases the reference count of the returned
> folio. That's not enough to ensure a folio is stable in the swap
> cache, so the folio could be removed from the swap cache at any
> time. The caller should always lock and check the folio before using it.
>
> We have just documented this in kerneldoc; now introduce a helper for
> swap cache folio verification with proper sanity checks.
>
> Also, sanitize a few current users to follow this convention and use
> the new helper for easier debugging. They did not have observable
> problems yet, only trivial issues like wasted CPU cycles on swapoff or
> reclaim. They would fail in some other way, but it is still better to
> always follow this convention to make things robust and to make later
> commits easier.
>
> Signed-off-by: Kairui Song <kasong@tencent.com>

Apologies for the slow response. Swap code is dense, so it takes me
a while to read each patch :(

Anyway, this LGTM!

Acked-by: Nhat Pham <nphamcs@gmail.com>
Re: [PATCH v3 05/15] mm, swap: always lock and check the swap cache folio before use
Posted by Chris Li 2 months, 3 weeks ago
On Wed, Sep 10, 2025 at 9:09 AM Kairui Song <ryncsn@gmail.com> wrote:
> @@ -2004,6 +2002,13 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
>         bool hwpoisoned = false;
>         int ret = 1;
>
> +       /*
> +        * If the folio is removed from swap cache by others, continue to
> +        * unuse other PTEs. try_to_unuse may try again if we missed this one.
> +        */

It took me a while to figure out why we add a
folio_matches_swap_entry() check here when this function had no
existing check for folio swap entry matching. Can you confirm that if
a race has happened causing the folio swap entry mismatch, then
try_to_unuse() MUST try again rather than "may" try again? It seems to
me that it is a MUST rather than a "may". I am curious under what
condition the mismatch happens and try_to_unuse() does not need to try
again.

BTW, this function has three kinds of return values: 1, 0, and
negative for error. The 0 and 1 are ignored by the caller; the caller
only cares about the negative value. Overall this unuse_pte() and
try_to_unuse() walk feels complicated and may be a candidate for a
later cleanup. That is not your patch's fault, and I am not requesting
a cleanup in this series.
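
IIRC the caller side in unuse_pte_range() boils down to roughly the
following (a simplified sketch from memory, not the exact code):

  ret = unuse_pte(vma, pmd, addr, entry, folio);
  if (ret < 0)
          return ret;     /* only errors are propagated */
  /* ret == 0 and ret == 1 are treated the same from here on */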

With that Nitpick,

Acked-by: Chris Li <chrisl@kernel.org>

Chris

> +       if (!folio_matches_swap_entry(folio, entry))
> +               return 0;
> +
>         swapcache = folio;
>         folio = ksm_might_need_to_copy(folio, vma, addr);
>         if (unlikely(!folio))
> --
> 2.51.0
>
Re: [PATCH v3 05/15] mm, swap: always lock and check the swap cache folio before use
Posted by Kairui Song 2 months, 3 weeks ago
On Mon, Sep 15, 2025 at 11:07 PM Chris Li <chrisl@kernel.org> wrote:
>
> On Wed, Sep 10, 2025 at 9:09 AM Kairui Song <ryncsn@gmail.com> wrote:
> > @@ -2004,6 +2002,13 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
> >         bool hwpoisoned = false;
> >         int ret = 1;
> >
> > +       /*
> > +        * If the folio is removed from swap cache by others, continue to
> > +        * unuse other PTEs. try_to_unuse may try again if we missed this one.
> > +        */
>
> It took me a while to figure out why we add a
> folio_matches_swap_entry() check here when this function had no
> existing check for folio swap entry matching. Can you confirm that if
> a race has happened causing the folio swap entry mismatch, then
> try_to_unuse() MUST try again rather than "may" try again? It seems to
> me that it is a MUST rather than a "may". I am curious under what
> condition the mismatch happens and try_to_unuse() does not need to try
> again.

It depends; I think it is a "may", because: for example, here we are
unusing folio A with entry S1, and a racing process just swapped in
folio A and then removed it from the swap cache. If the racing process
didn't swap out the PTE again, then there is no need to retry as we
are done with this PTE.

There are many possible races, e.g. someone else swapped out folio A
again using S2. Then here we will see !folio_matches_swap_entry, and
that's OK.

But there have been many other subtle race conditions in other places,
e.g. another folio occupied the same PTE and got swapped out using S1,
causing PTE == S1 with the wrong folio A installed here. That isn't
happening here, because we have removed the SWP_WRITEOK flag from the
si during swapoff...

It's really complex and fragile, so just keeping it simple, checking
folio_matches_swap_entry and aborting early, is safer and follows the
convention.

> BTW, this function has three types of return value, 1, 0, and negative
> for error. The 0 and 1 are ignored by the caller, the caller only
> cares about the negative value. Overall this unuse_pte() and
> try_to_unuse() walk feels complicated and maybe a candidate for later
> clean up. That is not your patch's fault. I am not requesting a
> cleanup in this series.

Right, totally agree, we can clean up the swapoff part later.

> With that Nitpick,
>
> Acked-by: Chris Li <chrisl@kernel.org>

Thanks!
Re: [PATCH v3 05/15] mm, swap: always lock and check the swap cache folio before use
Posted by David Hildenbrand 2 months, 3 weeks ago
On 10.09.25 18:08, Kairui Song wrote:
> From: Kairui Song <kasong@tencent.com>
> 
> Swap cache lookup only increases the reference count of the returned
> folio. That's not enough to ensure a folio is stable in the swap
> cache, so the folio could be removed from the swap cache at any
> time. The caller should always lock and check the folio before using it.
> 
> We have just documented this in kerneldoc; now introduce a helper for
> swap cache folio verification with proper sanity checks.
>
> Also, sanitize a few current users to follow this convention and use
> the new helper for easier debugging. They did not have observable
> problems yet, only trivial issues like wasted CPU cycles on swapoff or
> reclaim. They would fail in some other way, but it is still better to
> always follow this convention to make things robust and to make later
> commits easier.
> 
> Signed-off-by: Kairui Song <kasong@tencent.com>
> ---

[...]

> index 4baebd8b48f4..f1a4d381d719 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -240,14 +240,12 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
>   	 * Offset could point to the middle of a large folio, or folio
>   	 * may no longer point to the expected offset before it's locked.
>   	 */
> -	if (offset < swp_offset(folio->swap) ||
> -	    offset >= swp_offset(folio->swap) + nr_pages) {
> +	if (!folio_matches_swap_entry(folio, entry)) {
>   		folio_unlock(folio);
>   		folio_put(folio);
>   		goto again;
>   	}
>   	offset = swp_offset(folio->swap);
> -

Nit: unrelated change.

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers

David / dhildenb