[RFC PATCH] mm/page_alloc: fix use-after-free in swap due to stale page data after split_page()

Posted by Mikhail Gavrilov 1 week, 2 days ago
Hi,

I've been debugging a use-after-free bug in the swap subsystem that manifests
as a crash in free_swap_count_continuations() during swapoff on zram devices.

== Problem ==

KASAN reports wild-memory-access at address 0xdead000000000100 (LIST_POISON1):

  Oops: general protection fault, probably for non-canonical address 0xfbd59c0000000020
  KASAN: maybe wild-memory-access in range [0xdead000000000100-0xdead000000000107]
  RIP: 0010:__do_sys_swapoff+0x1151/0x1860

  RBP: dead0000000000f8
  R13: dead000000000100

The crash occurs when free_swap_count_continuations() iterates over a
list_head containing LIST_POISON values from a previous list_del().

== Root Cause ==

The swap subsystem uses vmalloc_to_page() to get struct page pointers for
the swap_map array, then uses page->private and page->lru for swap count
continuation lists.

When vmalloc allocates high-order pages without __GFP_COMP and splits them
via split_page(), the resulting pages may contain stale data:

1. post_alloc_hook() only clears page->private for the head page (page[0])
2. split_page() only calls set_page_refcounted() for tail pages
3. Tail pages retain whatever was in page->private and page->lru from
   previous use - including LIST_POISON values from prior list_del() calls

In add_swap_count_continuation() (mm/swapfile.c):

    if (!page_private(head)) {
        INIT_LIST_HEAD(&head->lru);
        set_page_private(head, SWP_CONTINUED);
    }

If head is a vmalloc tail page with stale non-zero page->private, the
INIT_LIST_HEAD is skipped, leaving page->lru with poison values. When
free_swap_count_continuations() later iterates this list, it crashes.

The comment at line 3862 says "Page allocation does not initialize the
page's lru field, but it does always reset its private field" - this
assumption is incorrect for vmalloc pages obtained via split_page().
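
A minimal userspace sketch of the failure mode described above. The
types `fake_page` and `list_head`, the small poison constants, and all
helper names are stand-ins invented for illustration; they are not the
kernel's definitions. The sketch only models the guard from
add_swap_count_continuation(): a stale non-zero private value makes the
guard skip list initialization, so the poison left by an earlier
list_del() survives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for illustration only; not the kernel's definitions. */
struct list_head { struct list_head *next, *prev; };

struct fake_page {
	unsigned long private;	/* models page->private */
	struct list_head lru;	/* models page->lru */
};

/* Small fake poison values; the kernel adds an arch-specific offset. */
#define FAKE_POISON1 ((struct list_head *)(uintptr_t)0x100)
#define FAKE_POISON2 ((struct list_head *)(uintptr_t)0x122)
#define FAKE_SWP_CONTINUED (1UL << 5)

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* A page as a previous user might leave it: lru poisoned by list_del(). */
static void make_stale(struct fake_page *p, unsigned long stale_private)
{
	p->private = stale_private;
	p->lru.next = FAKE_POISON1;
	p->lru.prev = FAKE_POISON2;
}

/* Same shape as the guard quoted above: init lru only when private is 0. */
static void guard_then_mark(struct fake_page *head)
{
	if (!head->private) {
		init_list_head(&head->lru);
		head->private = FAKE_SWP_CONTINUED;
	}
}

static bool lru_poisoned(const struct fake_page *p)
{
	return p->lru.next == FAKE_POISON1 || p->lru.prev == FAKE_POISON2;
}

/* Walks through both cases; the asserts document the expected outcome. */
static void demo_stale_private(void)
{
	struct fake_page fresh, stale;

	make_stale(&fresh, 0);		/* private was reset by the allocator */
	guard_then_mark(&fresh);
	assert(!lru_poisoned(&fresh));	/* guard ran INIT_LIST_HEAD */

	make_stale(&stale, 0xabcd);	/* tail page with leftover private */
	guard_then_mark(&stale);
	assert(lru_poisoned(&stale));	/* guard skipped, poison remains */
}
```

Calling demo_stale_private() from a test harness exercises both paths;
the second case is the one that corresponds to the reported crash.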

== Proposed Fix ==

Initialize page->private and page->lru for all pages in split_page().
This matches the documented expectation in mm/vmalloc.c:

  "High-order allocations must be able to be treated as independent
   small pages by callers... Some drivers do their own refcounting
   on vmalloc_to_page() pages, some use page->mapping, page->lru, etc."

--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3122,6 +3122,16 @@ void split_page(struct page *page, unsigned int order)
        VM_BUG_ON_PAGE(PageCompound(page), page);
        VM_BUG_ON_PAGE(!page_count(page), page);

+       /*
+        * Split pages may contain stale data from previous use. Initialize
+        * page->private and page->lru which may have LIST_POISON values.
+        */
+       INIT_LIST_HEAD(&page->lru);
+       for (i = 1; i < (1 << order); i++) {
+               set_page_private(page + i, 0);
+               INIT_LIST_HEAD(&page[i].lru);
+       }
+
        for (i = 1; i < (1 << order); i++)
                set_page_refcounted(page + i);
        split_page_owner(page, order, 0);

== Testing ==

Reproduced with a stress test cycling swapon/swapoff on 8GB zram under
memory pressure:
  - Without patch: crash within ~50 iterations
  - With patch: 1154+ iterations, no crash

The bug was originally discovered on Fedora 44 with kernel 6.19.0-rc7
during normal system shutdown after extended use.

== Questions ==

1. Is split_page() the right place for this fix, or should the swap code
   be more defensive about uninitialized vmalloc pages?

2. Should prep_new_page()/post_alloc_hook() initialize all pages in
   high-order allocations, not just the head?

3. Are there other fields besides page->private and page->lru that
   callers of split_page() might expect to be initialized?

Thoughts?

-- 
Best Regards,
Mike Gavrilov.
Re: [RFC PATCH] mm/page_alloc: fix use-after-free in swap due to stale page data after split_page()
Posted by Kairui Song 1 week, 2 days ago
On Fri, Jan 30, 2026 at 9:49 PM Mikhail Gavrilov
<mikhail.v.gavrilov@gmail.com> wrote:
>
> Hi,
>
> I've been debugging a use-after-free bug in the swap subsystem that manifests
> as a crash in free_swap_count_continuations() during swapoff on zram devices.
>
> == Problem ==
>
> KASAN reports wild-memory-access at address 0xdead000000000100 (LIST_POISON1):
>
>   Oops: general protection fault, probably for non-canonical address 0xfbd59c0000000020
>   KASAN: maybe wild-memory-access in range [0xdead000000000100-0xdead000000000107]
>   RIP: 0010:__do_sys_swapoff+0x1151/0x1860
>
>   RBP: dead0000000000f8
>   R13: dead000000000100
>
> The crash occurs when free_swap_count_continuations() iterates over a
> list_head containing LIST_POISON values from a previous list_del().
>

Hi Mikhail,

Thanks for reporting this issue.

> == Root Cause ==
>
> The swap subsystem uses vmalloc_to_page() to get struct page pointers for
> the swap_map array, then uses page->private and page->lru for swap count
> continuation lists.
>
> When vmalloc allocates high-order pages without __GFP_COMP and splits them
> via split_page(), the resulting pages may contain stale data:

So the problem starts with `swap_map = vzalloc(maxpages);`, right? Would
it be enough to just pass __GFP_COMP here?

Also worth noting, mm/swapfile.c already has the following code:

/*
* Page allocation does not initialize the page's lru field,
* but it does always reset its private field.
*/
if (!page_private(head)) {
    BUG_ON(count & COUNT_CONTINUED);
    INIT_LIST_HEAD(&head->lru);
    set_page_private(head, SWP_CONTINUED);
    si->flags |= SWP_CONTINUED;
}
Re: [RFC PATCH] mm/page_alloc: fix use-after-free in swap due to stale page data after split_page()
Posted by Mikhail Gavrilov 1 week, 2 days ago
On Fri, Jan 30, 2026 at 8:31 PM Kairui Song <ryncsn@gmail.com> wrote:
>
> Hi Mikhail,
>
> Thanks for reporting this issue.
>
> So the problem starts with `swap_map = vzalloc(maxpages);`, right? Would
> it be enough to just pass __GFP_COMP here?

No, __GFP_COMP won't help here. vmalloc always calls split_page() for
high-order allocations to treat them as independent pages (see
mm/vmalloc.c around line 3730). The compound page would be split
anyway.

> Also worth noting, mm/swapfile.c already has the following code:
>
> /*
> * Page allocation does not initialize the page's lru field,
> * but it does always reset its private field.
> */
> if (!page_private(head)) {
>     BUG_ON(count & COUNT_CONTINUED);
>     INIT_LIST_HEAD(&head->lru);
>     set_page_private(head, SWP_CONTINUED);
>     si->flags |= SWP_CONTINUED;
> }

Yes, this comment is the root of the problem - the assumption is
incorrect for vmalloc pages obtained via split_page().
post_alloc_hook() only clears page->private for the head page
(page[0]). When split_page() breaks a high-order page into individual
pages, tail pages keep their stale page->private values.
We could fix this in swapfile.c by always calling INIT_LIST_HEAD(),
but that would only fix swap. The comment in vmalloc.c suggests other
users also rely on these fields:

"Some drivers do their own refcounting on vmalloc_to_page() pages,
some use page->mapping, page->lru, etc."

So fixing it in split_page() seems like the right place to ensure all
callers get properly initialized pages.

What do you think?

-- 
Best Regards,
Mike Gavrilov.
Re: [RFC PATCH] mm/page_alloc: fix use-after-free in swap due to stale page data after split_page()
Posted by Kairui Song 1 week ago
On Fri, Jan 30, 2026 at 11:47 PM Mikhail Gavrilov
<mikhail.v.gavrilov@gmail.com> wrote:
>
> On Fri, Jan 30, 2026 at 8:31 PM Kairui Song <ryncsn@gmail.com> wrote:
> >
> > Hi Mikhail,
> >
> > Thanks for reporting this issue.
> >
> > So the problem starts with `swap_map = vzalloc(maxpages);`, right? Would
> > it be enough to just pass __GFP_COMP here?
>
> No, __GFP_COMP won't help here. vmalloc always calls split_page() for
> high-order allocations to treat them as independent pages (see
> mm/vmalloc.c around line 3730). The compound page would be split
> anyway.

Right, but with __GFP_COMP, prep_compound_page() clears the tail pages'
private too, so the code snippet I posted would initialize their lru on use?

>
> > Also worth noting, mm/swapfile.c already has the following code:
> >
> > /*
> > * Page allocation does not initialize the page's lru field,
> > * but it does always reset its private field.
> > */
> > if (!page_private(head)) {
> >     BUG_ON(count & COUNT_CONTINUED);
> >     INIT_LIST_HEAD(&head->lru);
> >     set_page_private(head, SWP_CONTINUED);
> >     si->flags |= SWP_CONTINUED;
> > }
>
> Yes, this comment is the root of the problem - the assumption is
> incorrect for vmalloc pages obtained via split_page().
> post_alloc_hook() only clears page->private for the head page
> (page[0]). When split_page() breaks a high-order page into individual
> pages, tail pages keep their stale page->private values.
> We could fix this in swapfile.c by always calling INIT_LIST_HEAD(),
> but that would only fix swap. The comment in vmalloc.c suggests other
> users also rely on these fields:
>
> "Some drivers do their own refcounting on vmalloc_to_page() pages,
> some use page->mapping, page->lru, etc."
>
> So fixing it in split_page() seems like the right place to ensure all
> callers get properly initialized pages.
>
> What do you think?

I took a look at the history: commit 3b8000ae185c ("mm/vmalloc: huge
vmalloc backing pages should be split rather than compound") dropped
__GFP_COMP and added split_page(). That is also the commit that added
the comment you mentioned.

Fixing it with the patch you posted could cause other issues, e.g.
split_free_pages / split_free_frozen_pages would call split_page while
the page is on a list, so at least clearing the head page's LRU seems
incorrect?
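
A small userspace sketch of why re-initializing a list node that is
still linked is harmful. The `list_head` type and helpers below are
simplified stand-ins written for this example, not the kernel's list.h,
and the three-node list only loosely models a page sitting on a list
while split_page() runs.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's doubly linked list. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail_node(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Count nodes reachable forward from head, giving up after limit steps. */
static size_t list_len_forward(const struct list_head *head, size_t limit)
{
	const struct list_head *pos;
	size_t n = 0;

	for (pos = head->next; pos != head && n < limit; pos = pos->next)
		n++;
	return n;
}

/* Returns 1 if every visited node's neighbours point back at it. */
static int list_is_consistent(const struct list_head *head, size_t limit)
{
	const struct list_head *pos = head;
	size_t n = 0;

	do {
		if (pos->next->prev != pos || pos->prev->next != pos)
			return 0;
		pos = pos->next;
	} while (pos != head && ++n < limit);
	return 1;
}

/* Returns 1 if re-initializing a linked node broke the list. */
static int reinit_breaks_list(void)
{
	struct list_head free_list, a, b, c;

	init_list_head(&free_list);
	list_add_tail_node(&a, &free_list);
	list_add_tail_node(&b, &free_list);	/* b models the head page's lru */
	list_add_tail_node(&c, &free_list);
	assert(list_is_consistent(&free_list, 16));

	init_list_head(&b);	/* what INIT_LIST_HEAD(&page->lru) would do */
	return !list_is_consistent(&free_list, 16);
}
```

In this model the neighbours still point at the re-initialized node, but
the node points only at itself, so a forward walk never gets past it.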
Re: [RFC PATCH] mm/page_alloc: fix use-after-free in swap due to stale page data after split_page()
Posted by Mikhail Gavrilov 1 week ago
On Mon, Feb 2, 2026 at 8:18 AM Kairui Song <ryncsn@gmail.com> wrote:
>
> Right, but with __GFP_COMP, prep_compound_page() clears the tail pages'
> private too, so the code snippet I posted would initialize their lru on use?

You're right that prep_compound_tail() clears page->private for tail
pages. But the issue is that vmalloc explicitly avoids __GFP_COMP -
see commit 3b8000ae185c ("mm/vmalloc: huge vmalloc backing pages
should be split rather than compound"). So prep_compound_page() is
never called for these allocations.

> I took a look at the history: commit 3b8000ae185c ("mm/vmalloc: huge
> vmalloc backing pages should be split rather than compound") dropped
> __GFP_COMP and added split_page(). That is also the commit that added
> the comment you mentioned.

Good find! That's exactly where the problem was introduced.

> Fixing it with the patch you posted could cause other issues, e.g.
> split_free_pages / split_free_frozen_pages would call split_page while
> the page is on a list, so at least clearing the head page's LRU seems
> incorrect?

You're absolutely right, that's a problem I missed. split_free_page()
calls split_page() on pages that are still on the buddy free list -
initializing head->lru there would corrupt the list.

So the fix should only initialize tail pages, not the head page:
/*
 * Split pages may contain stale data from previous use. Initialize
 * page->private and page->lru for tail pages which may have
 * LIST_POISON values. Head page is left alone as callers like
 * split_free_page() may have it on a list.
 */
for (i = 1; i < (1 << order); i++) {
    set_page_refcounted(page + i);
    set_page_private(page + i, 0);
    INIT_LIST_HEAD(&page[i].lru);
}

But then we still have the problem for the head page in the vmalloc
case. Maybe the fix should be in vmalloc.c instead, after split_page()
returns?

Or alternatively, fix it in swapfile.c by unconditionally calling
INIT_LIST_HEAD(). The comment there is already wrong, so we should
fix both the comment and the code?

What do you think is the cleanest approach?

-- 
Best Regards,
Mike Gavrilov.
Re: [RFC PATCH] mm/page_alloc: fix use-after-free in swap due to stale page data after split_page()
Posted by Kairui Song 6 days, 19 hours ago
On Mon, Feb 2, 2026 at 1:27 PM Mikhail Gavrilov
<mikhail.v.gavrilov@gmail.com> wrote:
>
> On Mon, Feb 2, 2026 at 8:18 AM Kairui Song <ryncsn@gmail.com> wrote:
> >
> > I took a look at the history: commit 3b8000ae185c ("mm/vmalloc: huge
> > vmalloc backing pages should be split rather than compound") dropped
> > __GFP_COMP and added split_page(). That is also the commit that added
> > the comment you mentioned.
>
> Good find! That's exactly where the problem was introduced.

Right, then I think we need a Fixes tag, and is swap really the only
victim of that change? BTW, swap's usage of page->lru will be gone
soon. This definitely still needs to be fixed first for the stable
branches, but it is strange that nothing else has ever hit this.

> Or alternatively, fix it in swapfile.c by unconditionally calling
> INIT_LIST_HEAD() - the comment there is already wrong, so we should
> fix both the comment and the code?

Or maybe clean page->private instead? The problem is triggered by
free_swap_count_continuations which checks page_private to tell if the
page has list data, and ignores the list if not. So the pages should
have their private cleaned upon allocation.

The old comment in swapfile: "Page allocation does not initialize the
page's lru field, but it does always reset its private field" does
suggest that vmalloc should take care of the private field. I'm not
sure if that is supposed to be a convention, but if swap is really the
only user of that, patching from the swap side looks cleaner.
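
A rough userspace model of the teardown-side check described above:
private alone decides whether lru is treated as a list. The struct,
the poison constant, and the function names are hypothetical stand-ins
for this sketch, not kernel code. It suggests that a page with stale
non-zero private would have its lru walked even if the add path never
touched it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for illustration only; not the kernel's definitions. */
struct list_head { struct list_head *next, *prev; };

struct fake_page {
	unsigned long private;	/* models page->private */
	struct list_head lru;	/* models page->lru */
};

/* Small fake poison value standing in for LIST_POISON1. */
#define FAKE_POISON1 ((struct list_head *)(uintptr_t)0x100)

/* Teardown decides from private alone whether lru holds a list. */
static bool teardown_walks_lru(const struct fake_page *head)
{
	return head->private != 0;
}

/* Walking a poisoned lru is what would fault in the kernel. */
static bool teardown_would_fault(const struct fake_page *head)
{
	return teardown_walks_lru(head) && head->lru.next == FAKE_POISON1;
}

static void demo_teardown(void)
{
	struct fake_page page = { .private = 0xabcd };

	page.lru.next = FAKE_POISON1;
	page.lru.prev = FAKE_POISON1;
	assert(teardown_would_fault(&page));	/* stale private: list walked */

	page.private = 0;			/* private cleared at allocation */
	assert(!teardown_would_fault(&page));	/* lru is ignored */
}
```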
Re: [RFC PATCH] mm/page_alloc: fix use-after-free in swap due to stale page data after split_page()
Posted by Mikhail Gavrilov 6 days, 17 hours ago
On Mon, Feb 2, 2026 at 10:55 PM Kairui Song <ryncsn@gmail.com> wrote:
>
> Right, then I think we need a Fixes tag, and is swap really the only
> victim of that change?

Good question. The comment in vmalloc.c mentions "some use
page->mapping, page->lru, etc." but I haven't found other concrete
examples that would hit this bug. Swap seems to be the only one
relying on page->private being zero.

> Or maybe clean page->private instead? The problem is triggered by
> free_swap_count_continuations which checks page_private to tell if the
> page has list data, and ignores the list if not. So the pages should
> have their private cleaned upon allocation.

You're right. Looking at swap_count_continued(), it already checks
page_private(head) != SWP_CONTINUED, not page_private(head) == 0.
The fix in swapfile.c should use the same condition in
add_swap_count_continuation():

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 46d2008e4b99..f131494d4262 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3859,10 +3859,11 @@ int add_swap_count_continuation(swp_entry_t entry, gfp_t gfp_mask)

        spin_lock(&si->cont_lock);
        /*
-        * Page allocation does not initialize the page's lru field,
-        * but it does always reset its private field.
+        * Page allocation does not initialize the page's lru field, and
+        * vmalloc pages from split_page() may have stale page->private.
+        * Check for SWP_CONTINUED not just non-zero.
         */
-       if (!page_private(head)) {
+       if (page_private(head) != SWP_CONTINUED) {
                BUG_ON(count & COUNT_CONTINUED);
                INIT_LIST_HEAD(&head->lru);
                set_page_private(head, SWP_CONTINUED);

This handles both cases:
- page_private == 0 (normal fresh page)
- page_private == stale non-zero value (vmalloc split_page bug)

And the comment should be updated to reflect that vmalloc pages may
have stale page->private.
Should I send a v2 with this approach?

-- 
Best Regards,
Mike Gavrilov.
Re: [RFC PATCH] mm/page_alloc: fix use-after-free in swap due to stale page data after split_page()
Posted by Mikhail Gavrilov 6 days, 6 hours ago
On Tue, Feb 3, 2026 at 1:21 AM Mikhail Gavrilov
<mikhail.v.gavrilov@gmail.com> wrote:
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 46d2008e4b99..f131494d4262 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -3859,10 +3859,11 @@ int add_swap_count_continuation(swp_entry_t entry, gfp_t gfp_mask)
>
>         spin_lock(&si->cont_lock);
>         /*
> -        * Page allocation does not initialize the page's lru field,
> -        * but it does always reset its private field.
> +        * Page allocation does not initialize the page's lru field, and
> +        * vmalloc pages from split_page() may have stale page->private.
> +        * Check for SWP_CONTINUED not just non-zero.
>          */
> -       if (!page_private(head)) {
> +       if (page_private(head) != SWP_CONTINUED) {
>                 BUG_ON(count & COUNT_CONTINUED);
>                 INIT_LIST_HEAD(&head->lru);
>                 set_page_private(head, SWP_CONTINUED);
>

Sorry, I sent the previous message before completing testing. The
swapfile.c fix doesn't actually work.

The problem is that stale page->private could accidentally equal
SWP_CONTINUED (32), so we can't distinguish between a legitimately
initialized page and stale data that happens to be 32. In testing, the
crash still occurred with the swapfile.c-only fix.
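
The ambiguity described above can be sketched with two tiny predicates.
The names and the constant `FAKE_SWP_CONTINUED` are illustrative
stand-ins (the value 32 is the one quoted above); this is a model of the
two guard conditions, not the swapfile.c code itself.

```c
#include <assert.h>
#include <stdbool.h>

#define FAKE_SWP_CONTINUED (1UL << 5)	/* 32, as quoted above */

/* Original guard: initialize only when private is zero. */
static bool zero_guard_inits(unsigned long private)
{
	return private == 0;
}

/* Attempted guard: initialize unless private already equals the flag. */
static bool flag_guard_inits(unsigned long private)
{
	return private != FAKE_SWP_CONTINUED;
}

static void demo_guards(void)
{
	/* Fresh page: both guards initialize. */
	assert(zero_guard_inits(0) && flag_guard_inits(0));

	/* Arbitrary stale value: only the attempted guard initializes. */
	assert(!zero_guard_inits(0xabcd) && flag_guard_inits(0xabcd));

	/* Stale value that happens to be 32: neither guard initializes. */
	assert(!zero_guard_inits(32) && !flag_guard_inits(32));
}
```

Neither predicate can tell a stale 32 from a legitimately set flag,
which is why the value has to be cleared before the page reaches swap.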

The correct fix is in split_page() - clear page->private for tail pages:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cbf758e27aa2..3604a00e2118 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3122,8 +3122,14 @@ void split_page(struct page *page, unsigned int order)
        VM_BUG_ON_PAGE(PageCompound(page), page);
        VM_BUG_ON_PAGE(!page_count(page), page);

-       for (i = 1; i < (1 << order); i++)
+       for (i = 1; i < (1 << order); i++) {
                set_page_refcounted(page + i);
+               /*
+                * Tail pages may have stale page->private from buddy
+                * allocator or previous use. Clear it.
+                */
+               set_page_private(page + i, 0);
+       }
        split_page_owner(page, order, 0);
        pgalloc_tag_split(page_folio(page), order, 0);
        split_page_memcg(page, order);


Note: only clearing page->private, not touching page->lru (to avoid
breaking split_free_page() which may have head on a list).

Tested for 7+ hours with stress test cycling swapon/swapoff on 8GB
zram under memory pressure - no crashes.
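
For readers who want to see the intended effect in isolation, here is a
userspace model of the loop in the diff above. `fake_page`,
`fake_split_page()`, and the refcount field are stand-ins invented for
this sketch; the real split_page() does more (page owner, memcg, and
allocation-tag splitting). The model only checks that tail pages end up
with private cleared while the head page and every lru are left alone.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for illustration only; not the kernel's definitions. */
struct list_head { struct list_head *next, *prev; };

struct fake_page {
	int refcount;		/* models the page reference count */
	unsigned long private;	/* models page->private */
	struct list_head lru;	/* models page->lru */
};

/* Models only the loop from the diff: tails get a ref and private = 0. */
static void fake_split_page(struct fake_page *page, unsigned int order)
{
	unsigned int i;

	for (i = 1; i < (1u << order); i++) {
		page[i].refcount = 1;	/* stands in for set_page_refcounted() */
		page[i].private = 0;	/* the line the patch adds */
	}
}

static void demo_split(void)
{
	struct fake_page pages[4];
	struct list_head elsewhere;	/* pretend each lru is linked here */
	size_t i;

	for (i = 0; i < 4; i++) {
		pages[i].refcount = (i == 0);	/* only the head holds a ref */
		pages[i].private = i ? 0x1000 + i : 0;	/* stale tails */
		pages[i].lru.next = &elsewhere;
		pages[i].lru.prev = &elsewhere;
	}

	fake_split_page(pages, 2);

	for (i = 0; i < 4; i++) {
		assert(pages[i].refcount == 1);
		assert(pages[i].private == 0);
		assert(pages[i].lru.next == &elsewhere);	/* lru untouched */
	}
}
```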

-- 
Best Regards,
Mike Gavrilov.
Re: [RFC PATCH] mm/page_alloc: fix use-after-free in swap due to stale page data after split_page()
Posted by Matthew Wilcox 1 week, 2 days ago
On Fri, Jan 30, 2026 at 06:49:00PM +0500, Mikhail Gavrilov wrote:
> +       /*
> +        * Split pages may contain stale data from previous use. Initialize
> +        * page->private and page->lru which may have LIST_POISON values.
> +        */
> +       INIT_LIST_HEAD(&page->lru);
> +       for (i = 1; i < (1 << order); i++) {
> +               set_page_private(page + i, 0);
> +               INIT_LIST_HEAD(&page[i].lru);
> +       }
> +
>         for (i = 1; i < (1 << order); i++)
>                 set_page_refcounted(page + i);
>         split_page_owner(page, order, 0);

Why add a second loop instead of using the existing one?
Re: [RFC PATCH] mm/page_alloc: fix use-after-free in swap due to stale page data after split_page()
Posted by Mikhail Gavrilov 1 week, 2 days ago
On Fri, Jan 30, 2026 at 6:59 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> Why add a second loop instead of using the existing one?

You're right, no good reason for a separate loop.
Here's v2:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cbf758e27aa2..306493d76ea4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3122,8 +3122,17 @@ void split_page(struct page *page, unsigned int order)
        VM_BUG_ON_PAGE(PageCompound(page), page);
        VM_BUG_ON_PAGE(!page_count(page), page);

-       for (i = 1; i < (1 << order); i++)
+       /*
+        * Split pages may contain stale data from previous use. Initialize
+        * page->private and page->lru which may have LIST_POISON values.
+        */
+       INIT_LIST_HEAD(&page->lru);
+       for (i = 1; i < (1 << order); i++) {
                set_page_refcounted(page + i);
+               set_page_private(page + i, 0);
+               INIT_LIST_HEAD(&page[i].lru);
+       }
+
        split_page_owner(page, order, 0);
        pgalloc_tag_split(page_folio(page), order, 0);
        split_page_memcg(page, order);


Should I send a formal v2 patch?

-- 
Best Regards,
Mike Gavrilov.