[PATCH v2 05/15] mm, swap: always lock and check the swap cache folio before use

Posted by Kairui Song 4 days, 10 hours ago
From: Kairui Song <kasong@tencent.com>

Swap cache lookup only increases the reference count of the returned
folio. That's not enough to ensure a folio is stable in the swap
cache, so the folio could be removed from the swap cache at any
time. The caller should always lock and check the folio before using it.

We have just documented this in the kerneldoc; now introduce a helper
for swap cache folio verification with proper sanity checks.

Also, sanitize a few current users to follow this convention, using the
new helper for easier debugging. They had no observable problems yet,
only trivial issues like wasted CPU cycles on swapoff or reclaiming;
the races would fail in some other way anyway. Still, it is better to
always follow this convention, both to make things robust and to make
later commits easier.

Signed-off-by: Kairui Song <kasong@tencent.com>
---
 mm/memory.c     |  3 +--
 mm/swap.h       | 24 ++++++++++++++++++++++++
 mm/swap_state.c |  7 +++++--
 mm/swapfile.c   | 10 +++++++---
 4 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 94a5928e8ace..5808c4ef21b3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4748,8 +4748,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		 * swapcache, we need to check that the page's swap has not
 		 * changed.
 		 */
-		if (unlikely(!folio_test_swapcache(folio) ||
-			     page_swap_entry(page).val != entry.val))
+		if (unlikely(!folio_matches_swap_entry(folio, entry)))
 			goto out_page;
 
 		if (unlikely(PageHWPoison(page))) {
diff --git a/mm/swap.h b/mm/swap.h
index efb6d7ff9f30..a69e18b12b45 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -52,6 +52,25 @@ static inline pgoff_t swap_cache_index(swp_entry_t entry)
 	return swp_offset(entry) & SWAP_ADDRESS_SPACE_MASK;
 }
 
+/**
+ * folio_matches_swap_entry - Check if a folio matches a given swap entry.
+ * @folio: The folio.
+ * @entry: The swap entry to check against.
+ *
+ * Context: The caller should have the folio locked to ensure it's stable
+ * and nothing will move it in or out of the swap cache.
+ * Return: true or false.
+ */
+static inline bool folio_matches_swap_entry(const struct folio *folio,
+					    swp_entry_t entry)
+{
+	VM_WARN_ON_ONCE_FOLIO(!folio_test_locked(folio), folio);
+	if (!folio_test_swapcache(folio))
+		return false;
+	VM_WARN_ON_ONCE_FOLIO(!IS_ALIGNED(folio->swap.val, folio_nr_pages(folio)), folio);
+	return folio->swap.val == round_down(entry.val, folio_nr_pages(folio));
+}
+
 void show_swap_cache_info(void);
 void *get_shadow_from_swap_cache(swp_entry_t entry);
 int add_to_swap_cache(struct folio *folio, swp_entry_t entry,
@@ -144,6 +163,11 @@ static inline pgoff_t swap_cache_index(swp_entry_t entry)
 	return 0;
 }
 
+static inline bool folio_matches_swap_entry(const struct folio *folio, swp_entry_t entry)
+{
+	return false;
+}
+
 static inline void show_swap_cache_info(void)
 {
 }
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 68ec531d0f2b..9225d6b695ad 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -79,7 +79,7 @@ void show_swap_cache_info(void)
  * with reference count or locks.
  * Return: Returns the found folio on success, NULL otherwise. The caller
  * must lock and check if the folio still matches the swap entry before
- * use.
+ * use (e.g. with folio_matches_swap_entry).
  */
 struct folio *swap_cache_get_folio(swp_entry_t entry)
 {
@@ -346,7 +346,10 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 	for (;;) {
 		int err;
 
-		/* Check the swap cache in case the folio is already there */
+		/*
+		 * Check the swap cache first, if a cached folio is found,
+		 * return it unlocked. The caller will lock and check it.
+		 */
 		folio = swap_cache_get_folio(entry);
 		if (folio)
 			goto got_folio;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 4c63fc62f4cb..1bd90f17440f 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -240,14 +240,12 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
 	 * Offset could point to the middle of a large folio, or folio
 	 * may no longer point to the expected offset before it's locked.
 	 */
-	if (offset < swp_offset(folio->swap) ||
-	    offset >= swp_offset(folio->swap) + nr_pages) {
+	if (!folio_matches_swap_entry(folio, entry)) {
 		folio_unlock(folio);
 		folio_put(folio);
 		goto again;
 	}
 	offset = swp_offset(folio->swap);
-
 	need_reclaim = ((flags & TTRS_ANYWAY) ||
 			((flags & TTRS_UNMAPPED) && !folio_mapped(folio)) ||
 			((flags & TTRS_FULL) && mem_cgroup_swap_full(folio)));
@@ -2150,6 +2148,12 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 		}
 
 		folio_lock(folio);
+		if (!folio_matches_swap_entry(folio, entry)) {
+			folio_unlock(folio);
+			folio_put(folio);
+			continue;
+		}
+
 		folio_wait_writeback(folio);
 		ret = unuse_pte(vma, pmd, addr, entry, folio);
 		if (ret < 0) {
-- 
2.51.0
Re: [PATCH v2 05/15] mm, swap: always lock and check the swap cache folio before use
Posted by David Hildenbrand 1 day, 17 hours ago
>   
>   		folio_lock(folio);
> +		if (!folio_matches_swap_entry(folio, entry)) {
> +			folio_unlock(folio);
> +			folio_put(folio);
> +			continue;
> +		}
> +

I wonder if we should put that into unuse_pte() instead. It checks for 
other types of races (like the page table entry getting modified) already.

-- 
Cheers

David / dhildenb
Re: [PATCH v2 05/15] mm, swap: always lock and check the swap cache folio before use
Posted by Kairui Song 14 hours ago
On Mon, Sep 8, 2025 at 10:08 PM David Hildenbrand <david@redhat.com> wrote:
>
>
> >
> >               folio_lock(folio);
> > +             if (!folio_matches_swap_entry(folio, entry)) {
> > +                     folio_unlock(folio);
> > +                     folio_put(folio);
> > +                     continue;
> > +             }
> > +
>
> I wonder if we should put that into unuse_pte() instead. It checks for
> other types of races (like the page table entry getting modified) already.

Doing this earlier here might help to avoid the folio_wait_writeback
below? And checking the folio right after locking seems to follow the
convention more strictly.

I'm fine either way though as there should be almost no difference.

> --
> Cheers
>
> David / dhildenb
>
>
Re: [PATCH v2 05/15] mm, swap: always lock and check the swap cache folio before use
Posted by David Hildenbrand 14 hours ago
On 09.09.25 16:58, Kairui Song wrote:
> On Mon, Sep 8, 2025 at 10:08 PM David Hildenbrand <david@redhat.com> wrote:
>>
>>
>>>
>>>                folio_lock(folio);
>>> +             if (!folio_matches_swap_entry(folio, entry)) {
>>> +                     folio_unlock(folio);
>>> +                     folio_put(folio);
>>> +                     continue;
>>> +             }
>>> +
>>
>> I wonder if we should put that into unuse_pte() instead. It checks for
>> other types of races (like the page table entry getting modified) already.
> 
> Doing this earlier here might help to avoid the folio_wait_writeback
> below? 

Why would we care about optimizing that out in that corner case?

> And checking the folio right after locking seems to follow the
> convention more strictly.

I'd just slap it into unuse_pte() where you can return immediately and 
we don't need another duplicated

	folio_unlock(folio);
	folio_put(folio);
	continue;

-- 
Cheers

David / dhildenb

Re: [PATCH v2 05/15] mm, swap: always lock and check the swap cache folio before use
Posted by Chris Li 4 days, 3 hours ago
Looks correct to me.

Acked-by: Chris Li <chrisl@kernel.org>

With some nitpick follows,

On Fri, Sep 5, 2025 at 12:14 PM Kairui Song <ryncsn@gmail.com> wrote:
>
> From: Kairui Song <kasong@tencent.com>
>
> Swap cache lookup only increases the reference count of the returned
> folio. That's not enough to ensure a folio is stable in the swap
> cache, so the folio could be removed from the swap cache at any
> time. The caller should always lock and check the folio before using it.
>
> We have just documented this in kerneldoc, now introduce a helper for swap
> cache folio verification with proper sanity checks.
>
> Also, sanitize a few current users to use this convention and the new
> helper for easier debugging. They were not having observable problems
> yet, only trivial issues like wasted CPU cycles on swapoff or
> reclaiming. They would fail in some other way, but it is still better to
> always follow this convention to make things robust and make later
> commits easier to do.
>
> Signed-off-by: Kairui Song <kasong@tencent.com>
> ---
>  mm/memory.c     |  3 +--
>  mm/swap.h       | 24 ++++++++++++++++++++++++
>  mm/swap_state.c |  7 +++++--
>  mm/swapfile.c   | 10 +++++++---
>  4 files changed, 37 insertions(+), 7 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 94a5928e8ace..5808c4ef21b3 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4748,8 +4748,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>                  * swapcache, we need to check that the page's swap has not
>                  * changed.
>                  */
> -               if (unlikely(!folio_test_swapcache(folio) ||
> -                            page_swap_entry(page).val != entry.val))
> +               if (unlikely(!folio_matches_swap_entry(folio, entry)))
>                         goto out_page;
>
>                 if (unlikely(PageHWPoison(page))) {
> diff --git a/mm/swap.h b/mm/swap.h
> index efb6d7ff9f30..a69e18b12b45 100644
> --- a/mm/swap.h
> +++ b/mm/swap.h
> @@ -52,6 +52,25 @@ static inline pgoff_t swap_cache_index(swp_entry_t entry)
>         return swp_offset(entry) & SWAP_ADDRESS_SPACE_MASK;
>  }
>
> +/**
> + * folio_matches_swap_entry - Check if a folio matches a given swap entry.
> + * @folio: The folio.
> + * @entry: The swap entry to check against.
> + *
> + * Context: The caller should have the folio locked to ensure it's stable
> + * and nothing will move it in or out of the swap cache.
> + * Return: true or false.
> + */
> +static inline bool folio_matches_swap_entry(const struct folio *folio,
> +                                           swp_entry_t entry)
> +{
> +       VM_WARN_ON_ONCE_FOLIO(!folio_test_locked(folio), folio);
> +       if (!folio_test_swapcache(folio))
> +               return false;
> +       VM_WARN_ON_ONCE_FOLIO(!IS_ALIGNED(folio->swap.val, folio_nr_pages(folio)), folio);

You should cache folio->swap.val in a local variable. Because of the
WARN_ON_ONCE, I think the compiler has no choice but to load it twice?
I haven't verified it myself.

There is no downside from the compiler's point of view to using more
local variables; the compiler generates an internal equivalent of the
local variable anyway.

> +       return folio->swap.val == round_down(entry.val, folio_nr_pages(folio));

Same for folio_nr_pages(folio); you should cache it too. The function
will look less busy.

Chris

> +}
> +
>  void show_swap_cache_info(void);
>  void *get_shadow_from_swap_cache(swp_entry_t entry);
>  int add_to_swap_cache(struct folio *folio, swp_entry_t entry,
> @@ -144,6 +163,11 @@ static inline pgoff_t swap_cache_index(swp_entry_t entry)
>         return 0;
>  }
>
> +static inline bool folio_matches_swap_entry(const struct folio *folio, swp_entry_t entry)
> +{
> +       return false;
> +}
> +
>  static inline void show_swap_cache_info(void)
>  {
>  }
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index 68ec531d0f2b..9225d6b695ad 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -79,7 +79,7 @@ void show_swap_cache_info(void)
>   * with reference count or locks.
>   * Return: Returns the found folio on success, NULL otherwise. The caller
>   * must lock and check if the folio still matches the swap entry before
> - * use.
> + * use (e.g. with folio_matches_swap_entry).
>   */
>  struct folio *swap_cache_get_folio(swp_entry_t entry)
>  {
> @@ -346,7 +346,10 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
>         for (;;) {
>                 int err;
>
> -               /* Check the swap cache in case the folio is already there */
> +               /*
> +                * Check the swap cache first, if a cached folio is found,
> +                * return it unlocked. The caller will lock and check it.
> +                */
>                 folio = swap_cache_get_folio(entry);
>                 if (folio)
>                         goto got_folio;
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 4c63fc62f4cb..1bd90f17440f 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -240,14 +240,12 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
>          * Offset could point to the middle of a large folio, or folio
>          * may no longer point to the expected offset before it's locked.
>          */
> -       if (offset < swp_offset(folio->swap) ||
> -           offset >= swp_offset(folio->swap) + nr_pages) {
> +       if (!folio_matches_swap_entry(folio, entry)) {
>                 folio_unlock(folio);
>                 folio_put(folio);
>                 goto again;
>         }
>         offset = swp_offset(folio->swap);
> -
>         need_reclaim = ((flags & TTRS_ANYWAY) ||
>                         ((flags & TTRS_UNMAPPED) && !folio_mapped(folio)) ||
>                         ((flags & TTRS_FULL) && mem_cgroup_swap_full(folio)));
> @@ -2150,6 +2148,12 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
>                 }
>
>                 folio_lock(folio);
> +               if (!folio_matches_swap_entry(folio, entry)) {
> +                       folio_unlock(folio);
> +                       folio_put(folio);
> +                       continue;
> +               }
> +
>                 folio_wait_writeback(folio);
>                 ret = unuse_pte(vma, pmd, addr, entry, folio);
>                 if (ret < 0) {
> --
> 2.51.0
>
>
Re: [PATCH v2 05/15] mm, swap: always lock and check the swap cache folio before use
Posted by Kairui Song 3 days, 23 hours ago
On Sat, Sep 6, 2025 at 11:51 AM Chris Li <chrisl@kernel.org> wrote:
>
> Looks correct to me.
>
> Acked-by: Chris Li <chrisl@kernel.org>

Thanks.

>
> With some nitpick follows,
>
> On Fri, Sep 5, 2025 at 12:14 PM Kairui Song <ryncsn@gmail.com> wrote:
> >
> > From: Kairui Song <kasong@tencent.com>
> >
> > Swap cache lookup only increases the reference count of the returned
> > folio. That's not enough to ensure a folio is stable in the swap
> > cache, so the folio could be removed from the swap cache at any
> > time. The caller should always lock and check the folio before using it.
> >
> > We have just documented this in kerneldoc, now introduce a helper for swap
> > cache folio verification with proper sanity checks.
> >
> > Also, sanitize a few current users to use this convention and the new
> > helper for easier debugging. They were not having observable problems
> > yet, only trivial issues like wasted CPU cycles on swapoff or
> > reclaiming. They would fail in some other way, but it is still better to
> > always follow this convention to make things robust and make later
> > commits easier to do.
> >
> > Signed-off-by: Kairui Song <kasong@tencent.com>
> > ---
> >  mm/memory.c     |  3 +--
> >  mm/swap.h       | 24 ++++++++++++++++++++++++
> >  mm/swap_state.c |  7 +++++--
> >  mm/swapfile.c   | 10 +++++++---
> >  4 files changed, 37 insertions(+), 7 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 94a5928e8ace..5808c4ef21b3 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -4748,8 +4748,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >                  * swapcache, we need to check that the page's swap has not
> >                  * changed.
> >                  */
> > -               if (unlikely(!folio_test_swapcache(folio) ||
> > -                            page_swap_entry(page).val != entry.val))
> > +               if (unlikely(!folio_matches_swap_entry(folio, entry)))
> >                         goto out_page;
> >
> >                 if (unlikely(PageHWPoison(page))) {
> > diff --git a/mm/swap.h b/mm/swap.h
> > index efb6d7ff9f30..a69e18b12b45 100644
> > --- a/mm/swap.h
> > +++ b/mm/swap.h
> > @@ -52,6 +52,25 @@ static inline pgoff_t swap_cache_index(swp_entry_t entry)
> >         return swp_offset(entry) & SWAP_ADDRESS_SPACE_MASK;
> >  }
> >
> > +/**
> > + * folio_matches_swap_entry - Check if a folio matches a given swap entry.
> > + * @folio: The folio.
> > + * @entry: The swap entry to check against.
> > + *
> > + * Context: The caller should have the folio locked to ensure it's stable
> > + * and nothing will move it in or out of the swap cache.
> > + * Return: true or false.
> > + */
> > +static inline bool folio_matches_swap_entry(const struct folio *folio,
> > +                                           swp_entry_t entry)
> > +{
> > +       VM_WARN_ON_ONCE_FOLIO(!folio_test_locked(folio), folio);
> > +       if (!folio_test_swapcache(folio))
> > +               return false;
> > +       VM_WARN_ON_ONCE_FOLIO(!IS_ALIGNED(folio->swap.val, folio_nr_pages(folio)), folio);
>
> You should cache the folio->swap.val into a local register value.
> Because WARN_ON_ONCE I think the compiler has no choice but to load it
> twice? Haven't verified it myself.
>
> There is no downside in compiler point of view using more local
> variables, the compiler generates an internal version of the local
> variable equivalent anyway.
>
> > +       return folio->swap.val == round_down(entry.val, folio_nr_pages(folio));
>
> Same for folio_nr_pages(folio), you should cache it. The function will
> look less busy.

That's a very good idea; it should also reduce the line length and make
it easier to read.