[PATCH] mm, swap: fix potential UAF issue for VMA readahead

Kairui Song posted 1 patch 2 months, 4 weeks ago
mm/swap_state.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
[PATCH] mm, swap: fix potential UAF issue for VMA readahead
Posted by Kairui Song 2 months, 4 weeks ago
From: Kairui Song <kasong@tencent.com>

Since commit 78524b05f1a3 ("mm, swap: avoid redundant swap device
pinning"), the common helper for allocating and preparing a folio in the
swap cache layer no longer tries to get a swap device reference
internally, because all callers of __read_swap_cache_async are already
holding a swap entry reference. The repeated swap device pinning isn't
needed on the same swap device.

Caller of VMA readahead is also holding a reference to the target
entry's swap device, but VMA readahead walks the page table, so it might
encounter swap entries from other devices, and call
__read_swap_cache_async on another device without holding a reference to
it.

So it is possible to cause a UAF when swapoff of device A raced with
swapin on device B, and VMA readahead tries to read swap entries from
device A. It's not easy to trigger, but in theory, it could cause real
issues.

Make VMA readahead try to get the device reference first if the swap
device is a different one from the target entry.

Cc: stable@vger.kernel.org
Fixes: 78524b05f1a3 ("mm, swap: avoid redundant swap device pinning")
Suggested-by: Huang Ying <ying.huang@linux.alibaba.com>
Signed-off-by: Kairui Song <kasong@tencent.com>
---
Sending as a new patch instead of V2 because the approach is very
different.

Previous patch:
https://lore.kernel.org/linux-mm/20251110-revert-78524b05f1a3-v1-1-88313f2b9b20@tencent.com/
---
 mm/swap_state.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 0cf9853a9232..da0481e163a4 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -745,6 +745,7 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 
 	blk_start_plug(&plug);
 	for (addr = start; addr < end; ilx++, addr += PAGE_SIZE) {
+		struct swap_info_struct *si = NULL;
 		softleaf_t entry;
 
 		if (!pte++) {
@@ -759,8 +760,19 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 			continue;
 		pte_unmap(pte);
 		pte = NULL;
+		/*
+		 * Readahead entry may come from a device that we are not
+		 * holding a reference to, try to grab a reference, or skip.
+		 */
+		if (swp_type(entry) != swp_type(targ_entry)) {
+			si = get_swap_device(entry);
+			if (!si)
+				continue;
+		}
 		folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
 						&page_allocated, false);
+		if (si)
+			put_swap_device(si);
 		if (!folio)
 			continue;
 		if (page_allocated) {

---
base-commit: 565d240810a6c9689817a9f3d08f80adf488ca59
change-id: 20251111-swap-fix-vma-uaf-bec70969250f

Best regards,
-- 
Kairui Song <kasong@tencent.com>
Re: [PATCH] mm, swap: fix potential UAF issue for VMA readahead
Posted by Chris Li 2 months, 4 weeks ago
Acked-by: Chris Li <chrisl@kernel.org>

Chris

On Tue, Nov 11, 2025 at 5:36 AM Kairui Song <ryncsn@gmail.com> wrote:
>
> From: Kairui Song <kasong@tencent.com>
>
> Since commit 78524b05f1a3 ("mm, swap: avoid redundant swap device
> pinning"), the common helper for allocating and preparing a folio in the
> swap cache layer no longer tries to get a swap device reference
> internally, because all callers of __read_swap_cache_async are already
> holding a swap entry reference. The repeated swap device pinning isn't
> needed on the same swap device.
>
> Caller of VMA readahead is also holding a reference to the target
> entry's swap device, but VMA readahead walks the page table, so it might
> encounter swap entries from other devices, and call
> __read_swap_cache_async on another device without holding a reference to
> it.
>
> So it is possible to cause a UAF when swapoff of device A raced with
> swapin on device B, and VMA readahead tries to read swap entries from
> device A. It's not easy to trigger, but in theory, it could cause real
> issues.
>
> Make VMA readahead try to get the device reference first if the swap
> device is a different one from the target entry.
>
> Cc: stable@vger.kernel.org
> Fixes: 78524b05f1a3 ("mm, swap: avoid redundant swap device pinning")
> Suggested-by: Huang Ying <ying.huang@linux.alibaba.com>
> Signed-off-by: Kairui Song <kasong@tencent.com>
> ---
> Sending as a new patch instead of V2 because the approach is very
> different.
>
> Previous patch:
> https://lore.kernel.org/linux-mm/20251110-revert-78524b05f1a3-v1-1-88313f2b9b20@tencent.com/
> ---
>  mm/swap_state.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index 0cf9853a9232..da0481e163a4 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -745,6 +745,7 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
>
>         blk_start_plug(&plug);
>         for (addr = start; addr < end; ilx++, addr += PAGE_SIZE) {
> +               struct swap_info_struct *si = NULL;
>                 softleaf_t entry;
>
>                 if (!pte++) {
> @@ -759,8 +760,19 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
>                         continue;
>                 pte_unmap(pte);
>                 pte = NULL;
> +               /*
> +                * Readahead entry may come from a device that we are not
> +                * holding a reference to, try to grab a reference, or skip.
> +                */
> +               if (swp_type(entry) != swp_type(targ_entry)) {
> +                       si = get_swap_device(entry);
> +                       if (!si)
> +                               continue;
> +               }
>                 folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
>                                                 &page_allocated, false);
> +               if (si)
> +                       put_swap_device(si);
>                 if (!folio)
>                         continue;
>                 if (page_allocated) {
>
> ---
> base-commit: 565d240810a6c9689817a9f3d08f80adf488ca59
> change-id: 20251111-swap-fix-vma-uaf-bec70969250f
>
> Best regards,
> --
> Kairui Song <kasong@tencent.com>
>
Re: [PATCH] mm, swap: fix potential UAF issue for VMA readahead
Posted by Huang, Ying 2 months, 4 weeks ago
Kairui Song <ryncsn@gmail.com> writes:

> From: Kairui Song <kasong@tencent.com>
>
> Since commit 78524b05f1a3 ("mm, swap: avoid redundant swap device
> pinning"), the common helper for allocating and preparing a folio in the
> swap cache layer no longer tries to get a swap device reference
> internally, because all callers of __read_swap_cache_async are already
> holding a swap entry reference. The repeated swap device pinning isn't
> needed on the same swap device.
>
> Caller of VMA readahead is also holding a reference to the target
> entry's swap device, but VMA readahead walks the page table, so it might
> encounter swap entries from other devices, and call
> __read_swap_cache_async on another device without holding a reference to
> it.
>
> So it is possible to cause a UAF when swapoff of device A raced with
> swapin on device B, and VMA readahead tries to read swap entries from
> device A. It's not easy to trigger, but in theory, it could cause real
> issues.
>
> Make VMA readahead try to get the device reference first if the swap
> device is a different one from the target entry.
>
> Cc: stable@vger.kernel.org
> Fixes: 78524b05f1a3 ("mm, swap: avoid redundant swap device pinning")
> Suggested-by: Huang Ying <ying.huang@linux.alibaba.com>
> Signed-off-by: Kairui Song <kasong@tencent.com>
> ---
> Sending as a new patch instead of V2 because the approach is very
> different.
>
> Previous patch:
> https://lore.kernel.org/linux-mm/20251110-revert-78524b05f1a3-v1-1-88313f2b9b20@tencent.com/
> ---
>  mm/swap_state.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index 0cf9853a9232..da0481e163a4 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -745,6 +745,7 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
>  
>  	blk_start_plug(&plug);
>  	for (addr = start; addr < end; ilx++, addr += PAGE_SIZE) {
> +		struct swap_info_struct *si = NULL;
>  		softleaf_t entry;
>  
>  		if (!pte++) {
> @@ -759,8 +760,19 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
>  			continue;
>  		pte_unmap(pte);
>  		pte = NULL;
> +		/*
> +		 * Readahead entry may come from a device that we are not
> +		 * holding a reference to, try to grab a reference, or skip.
> +		 */
> +		if (swp_type(entry) != swp_type(targ_entry)) {
> +			si = get_swap_device(entry);
> +			if (!si)
> +				continue;
> +		}
>  		folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
>  						&page_allocated, false);
> +		if (si)
> +			put_swap_device(si);
>  		if (!folio)
>  			continue;
>  		if (page_allocated) {

Personally, I prefer to call put_swap_device() after all swap operations
on the swap entry, that is, after possible swap_read_folio() and
folio_put() in the loop to make it easier to follow the
get/put_swap_device() rule.  But I understand that it will make

if (!folio)
        continue;

to use 'goto' and introduce more change.  So, it's up to you to decide
whether to do that.

Otherwise, LGTM, Thanks for doing this!  Feel free to add my

Reviewed-by: Huang Ying <ying.huang@linux.alibaba.com>

in the future versions.

---
Best Regards,
Huang, Ying
Re: [PATCH] mm, swap: fix potential UAF issue for VMA readahead
Posted by Chris Li 2 months, 4 weeks ago
On Tue, Nov 11, 2025 at 5:56 PM Huang, Ying
<ying.huang@linux.alibaba.com> wrote:
>
> Kairui Song <ryncsn@gmail.com> writes:
>
> > From: Kairui Song <kasong@tencent.com>
> >
> > Since commit 78524b05f1a3 ("mm, swap: avoid redundant swap device
> > pinning"), the common helper for allocating and preparing a folio in the
> > swap cache layer no longer tries to get a swap device reference
> > internally, because all callers of __read_swap_cache_async are already
> > holding a swap entry reference. The repeated swap device pinning isn't
> > needed on the same swap device.
> >
> > Caller of VMA readahead is also holding a reference to the target
> > entry's swap device, but VMA readahead walks the page table, so it might
> > encounter swap entries from other devices, and call
> > __read_swap_cache_async on another device without holding a reference to
> > it.
> >
> > So it is possible to cause a UAF when swapoff of device A raced with
> > swapin on device B, and VMA readahead tries to read swap entries from
> > device A. It's not easy to trigger, but in theory, it could cause real
> > issues.
> >
> > Make VMA readahead try to get the device reference first if the swap
> > device is a different one from the target entry.
> >
> > Cc: stable@vger.kernel.org
> > Fixes: 78524b05f1a3 ("mm, swap: avoid redundant swap device pinning")
> > Suggested-by: Huang Ying <ying.huang@linux.alibaba.com>
> > Signed-off-by: Kairui Song <kasong@tencent.com>
> > ---
> > Sending as a new patch instead of V2 because the approach is very
> > different.
> >
> > Previous patch:
> > https://lore.kernel.org/linux-mm/20251110-revert-78524b05f1a3-v1-1-88313f2b9b20@tencent.com/
> > ---
> >  mm/swap_state.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/mm/swap_state.c b/mm/swap_state.c
> > index 0cf9853a9232..da0481e163a4 100644
> > --- a/mm/swap_state.c
> > +++ b/mm/swap_state.c
> > @@ -745,6 +745,7 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
> >
> >       blk_start_plug(&plug);
> >       for (addr = start; addr < end; ilx++, addr += PAGE_SIZE) {
> > +             struct swap_info_struct *si = NULL;
> >               softleaf_t entry;
> >
> >               if (!pte++) {
> > @@ -759,8 +760,19 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
> >                       continue;
> >               pte_unmap(pte);
> >               pte = NULL;
> > +             /*
> > +              * Readahead entry may come from a device that we are not
> > +              * holding a reference to, try to grab a reference, or skip.
> > +              */
> > +             if (swp_type(entry) != swp_type(targ_entry)) {
> > +                     si = get_swap_device(entry);
> > +                     if (!si)
> > +                             continue;
> > +             }
> >               folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
> >                                               &page_allocated, false);
> > +             if (si)
> > +                     put_swap_device(si);
> >               if (!folio)
> >                       continue;
> >               if (page_allocated) {
>
> Personally, I prefer to call put_swap_device() after all swap operations
> on the swap entry, that is, after possible swap_read_folio() and
> folio_put() in the loop to make it easier to follow the
> get/put_swap_device() rule.  But I understand that it will make
>
> if (!folio)
>         continue;
>
> to use 'goto' and introduce more change.  So, it's up to you to decide
> whether to do that.

Personally I prefer it to keep the put_swap_device() in the current
location, closer to the matching get_swap_device(). To me that is
simpler, I don't need to reason about other branch out conditions.
Those error handling branch conditions are very error prone, I have
made enough mistakes on those goto branch handling in my past
experience. The si reference is only needed for the
__read_swap_cache_async() anyway.

To it to the end also works, just take more brain power to reason it.

> Otherwise, LGTM, Thanks for doing this!  Feel free to add my
>
> Reviewed-by: Huang Ying <ying.huang@linux.alibaba.com>

Thank you for the review.

Chris
Re: [PATCH] mm, swap: fix potential UAF issue for VMA readahead
Posted by Nhat Pham 2 months, 4 weeks ago
On Tue, Nov 11, 2025 at 5:36 AM Kairui Song <ryncsn@gmail.com> wrote:
>
> From: Kairui Song <kasong@tencent.com>
>
> Since commit 78524b05f1a3 ("mm, swap: avoid redundant swap device
> pinning"), the common helper for allocating and preparing a folio in the
> swap cache layer no longer tries to get a swap device reference
> internally, because all callers of __read_swap_cache_async are already
> holding a swap entry reference. The repeated swap device pinning isn't
> needed on the same swap device.
>
> Caller of VMA readahead is also holding a reference to the target
> entry's swap device, but VMA readahead walks the page table, so it might
> encounter swap entries from other devices, and call
> __read_swap_cache_async on another device without holding a reference to
> it.
>
> So it is possible to cause a UAF when swapoff of device A raced with
> swapin on device B, and VMA readahead tries to read swap entries from
> device A. It's not easy to trigger, but in theory, it could cause real
> issues.
>
> Make VMA readahead try to get the device reference first if the swap
> device is a different one from the target entry.
>
> Cc: stable@vger.kernel.org
> Fixes: 78524b05f1a3 ("mm, swap: avoid redundant swap device pinning")
> Suggested-by: Huang Ying <ying.huang@linux.alibaba.com>
> Signed-off-by: Kairui Song <kasong@tencent.com>
> ---
> Sending as a new patch instead of V2 because the approach is very
> different.
>
> Previous patch:
> https://lore.kernel.org/linux-mm/20251110-revert-78524b05f1a3-v1-1-88313f2b9b20@tencent.com/
> ---
>  mm/swap_state.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index 0cf9853a9232..da0481e163a4 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -745,6 +745,7 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
>
>         blk_start_plug(&plug);
>         for (addr = start; addr < end; ilx++, addr += PAGE_SIZE) {
> +               struct swap_info_struct *si = NULL;
>                 softleaf_t entry;
>
>                 if (!pte++) {
> @@ -759,8 +760,19 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
>                         continue;
>                 pte_unmap(pte);
>                 pte = NULL;
> +               /*
> +                * Readahead entry may come from a device that we are not
> +                * holding a reference to, try to grab a reference, or skip.
> +                */
> +               if (swp_type(entry) != swp_type(targ_entry)) {
> +                       si = get_swap_device(entry);
> +                       if (!si)
> +                               continue;
> +               }
>                 folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
>                                                 &page_allocated, false);
> +               if (si)
> +                       put_swap_device(si);

Shouldn't we reset si to NULL here?

Otherwise, suppose we're swapping in a readahead window. One of the
swap entries in the window is on a different swapfile from the target
entry. We look up and get a reference to that different swapfile,
setting it to si.

We do the swapping in work, then we release the recently acquired reference.

In the next iteration in the for loop, we will still see si != NULL,
and we put_swap_device() it again, i.e double releasing reference to
that swap device.

Or am I missing something?

>                 if (!folio)
>                         continue;
>                 if (page_allocated) {
>
Re: [PATCH] mm, swap: fix potential UAF issue for VMA readahead
Posted by Nhat Pham 2 months, 4 weeks ago
On Tue, Nov 11, 2025 at 11:48 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
> On Tue, Nov 11, 2025 at 5:36 AM Kairui Song <ryncsn@gmail.com> wrote:
> >
> > From: Kairui Song <kasong@tencent.com>
> >
> > Since commit 78524b05f1a3 ("mm, swap: avoid redundant swap device
> > pinning"), the common helper for allocating and preparing a folio in the
> > swap cache layer no longer tries to get a swap device reference
> > internally, because all callers of __read_swap_cache_async are already
> > holding a swap entry reference. The repeated swap device pinning isn't
> > needed on the same swap device.
> >
> > Caller of VMA readahead is also holding a reference to the target
> > entry's swap device, but VMA readahead walks the page table, so it might
> > encounter swap entries from other devices, and call
> > __read_swap_cache_async on another device without holding a reference to
> > it.
> >
> > So it is possible to cause a UAF when swapoff of device A raced with
> > swapin on device B, and VMA readahead tries to read swap entries from
> > device A. It's not easy to trigger, but in theory, it could cause real
> > issues.
> >
> > Make VMA readahead try to get the device reference first if the swap
> > device is a different one from the target entry.
> >
> > Cc: stable@vger.kernel.org
> > Fixes: 78524b05f1a3 ("mm, swap: avoid redundant swap device pinning")
> > Suggested-by: Huang Ying <ying.huang@linux.alibaba.com>
> > Signed-off-by: Kairui Song <kasong@tencent.com>
> > ---
> > Sending as a new patch instead of V2 because the approach is very
> > different.
> >
> > Previous patch:
> > https://lore.kernel.org/linux-mm/20251110-revert-78524b05f1a3-v1-1-88313f2b9b20@tencent.com/
> > ---
> >  mm/swap_state.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/mm/swap_state.c b/mm/swap_state.c
> > index 0cf9853a9232..da0481e163a4 100644
> > --- a/mm/swap_state.c
> > +++ b/mm/swap_state.c
> > @@ -745,6 +745,7 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
> >
> >         blk_start_plug(&plug);
> >         for (addr = start; addr < end; ilx++, addr += PAGE_SIZE) {
> > +               struct swap_info_struct *si = NULL;
> >                 softleaf_t entry;
> >
> >                 if (!pte++) {
> > @@ -759,8 +760,19 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
> >                         continue;
> >                 pte_unmap(pte);
> >                 pte = NULL;
> > +               /*
> > +                * Readahead entry may come from a device that we are not
> > +                * holding a reference to, try to grab a reference, or skip.
> > +                */
> > +               if (swp_type(entry) != swp_type(targ_entry)) {
> > +                       si = get_swap_device(entry);
> > +                       if (!si)
> > +                               continue;
> > +               }
> >                 folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
> >                                                 &page_allocated, false);
> > +               if (si)
> > +                       put_swap_device(si);
>
> Shouldn't we reset si to NULL here?
>
> Otherwise, suppose we're swapping in a readahead window. One of the
> swap entries in the window is on a different swapfile from the target
> entry. We look up and get a reference to that different swapfile,
> setting it to si.
>
> We do the swapping in work, then we release the recently acquired reference.
>
> In the next iteration in the for loop, we will still see si != NULL,
> and we put_swap_device() it again, i.e double releasing reference to
> that swap device.
>
> Or am I missing something?

Nvm - Andrew pointed out to me that si was NULLED at the beginning of
the loop. Looks like I was blind :)

Anyway:
Acked-by: Nhat Pham <nphamcs@gmail.com>