get_next_zpdesc() calls get_zspage() which unconditionally dereferences
zpdesc->zspage without a NULL check. This causes a kernel oops when
zpdesc->zspage has been set to NULL by reset_zpdesc() during a race
between zspage destruction and page compaction/migration.
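For reference, get_zspage() currently looks roughly like this (paraphrased
from mm/zsmalloc.c; context may differ slightly between trees):

	static struct zspage *get_zspage(struct zpdesc *zpdesc)
	{
		struct zspage *zspage = zpdesc->zspage;

		BUG_ON(zspage->magic != ZSPAGE_MAGIC);
		return zspage;
	}

The BUG_ON() reads zspage->magic before any NULL check, so a cleared
backpointer faults here rather than being handled gracefully.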
The race window is documented in a TODO comment in zs_page_migrate():
"nothing prevents a zspage from getting destroyed while it is
isolated for migration, as the page lock is temporarily dropped
after zs_page_isolate() succeeded"
The sequence is:
1. Compaction calls zs_page_isolate() on a zpdesc, then drops its
page lock.
2. Concurrently, async_free_zspage() or free_zspage() destroys the
zspage, calling reset_zpdesc() which sets zpdesc->zspage = NULL.
3. A subsequent zs_free() path calls trylock_zspage(), which iterates
zpdescs via get_next_zpdesc(). get_zspage() dereferences the now-
NULL backpointer, causing:
BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:free_zspage+0x26/0x100
Call Trace:
zs_free+0xf4/0x110
zswap_entry_free+0x7e/0x160
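For context, reset_zpdesc() tears down the per-page state, including both
chain fields (abridged sketch from memory; other teardown elided):

	static void reset_zpdesc(struct zpdesc *zpdesc)
	{
		...
		zpdesc->zspage = NULL;
		zpdesc->next = NULL;
		...
	}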
The migration side already has a NULL guard (zs_page_migrate line 1675:
"if (!zpdesc->zspage) return 0;"), but get_next_zpdesc() lacks the same
protection.
Fix this by reading zpdesc->zspage directly in get_next_zpdesc()
instead of going through get_zspage(), and returning NULL when the
backpointer is NULL. This stops iteration safely — the caller treats
it as the end of the page chain.
Signed-off-by: Michael Fara <mjfara@gmail.com>
---
mm/zsmalloc.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -735,7 +735,19 @@ static struct zspage *get_zspage(struct zpdesc *zpdesc)
 
 static struct zpdesc *get_next_zpdesc(struct zpdesc *zpdesc)
 {
-	struct zspage *zspage = get_zspage(zpdesc);
+	struct zspage *zspage = zpdesc->zspage;
+
+	/*
+	 * If the backpointer is NULL, this zpdesc was already freed via
+	 * reset_zpdesc() by a racing async_free_zspage() while isolated
+	 * for compaction. See the TODO comment in zs_page_migrate().
+	 */
+	if (unlikely(!zspage)) {
+		WARN_ON_ONCE(1);
+		return NULL;
+	}
+
+	BUG_ON(zspage->magic != ZSPAGE_MAGIC);
 
 	if (unlikely(ZsHugePage(zspage)))
 		return NULL;
--
2.39.0
On Mon, 9 Feb 2026 19:32:57 +0000 Michael Fara <mjfara@gmail.com> wrote:
Hello Michael,
I hope you are doing well! Thank you for the patch. I'm not entirely sure
the race condition you note here is correct, or whether this is the right
fix.
> get_next_zpdesc() calls get_zspage() which unconditionally dereferences
> zpdesc->zspage without a NULL check. This causes a kernel oops when
> zpdesc->zspage has been set to NULL by reset_zpdesc() during a race
> between zspage destruction and page compaction/migration.
>
> The race window is documented in a TODO comment in zs_page_migrate():
>
> "nothing prevents a zspage from getting destroyed while it is
> isolated for migration, as the page lock is temporarily dropped
> after zs_page_isolate() succeeded"
I'm taking a look at zsmalloc these days, and I remember being confused by
this comment and looking into what was going on here as well :-) My thoughts
were that it should be safe in every other case, though.
> The sequence is:
> 1. Compaction calls zs_page_isolate() on a zpdesc, then drops its
> page lock.
> 2. Concurrently, async_free_zspage() or free_zspage() destroys the
> zspage, calling reset_zpdesc() which sets zpdesc->zspage = NULL.
async_free_zspage() isolates the zspage first by taking the class lock and
then splicing the zspage out. free_zspage() is always called with the
class lock held, and it likewise removes the zspage from the fullness list.
The zs_free() -> free_zspage() -> trylock_zspage() path holds class->lock throughout.
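Roughly, as an abridged outline (not exact source, from my reading of
mm/zsmalloc.c):

	void zs_free(struct zs_pool *pool, unsigned long handle)
	{
		...
		spin_lock(&class->lock);	/* taken before the zspage is touched */
		obj_free(class->size, obj);
		fullness = fix_fullness_group(class, zspage);
		if (fullness == ZS_INUSE_RATIO_0)
			free_zspage(pool, class, zspage);	/* -> trylock_zspage() */
		spin_unlock(&class->lock);
		...
	}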
In zs_page_migrate(), we call replace_sub_page() to remove the zpdesc from the
chain before calling reset_zpdesc(), so I don't think there would be a race there
either.
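That is, something like (abridged; argument order from memory):

	/* zs_page_migrate(), simplified */
	replace_sub_page(class, zspage, newzpdesc, zpdesc);	/* unlink the old zpdesc */
	...
	reset_zpdesc(zpdesc);	/* cleared only once it is off the chain */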
> 3. A subsequent zs_free() path calls trylock_zspage(), which iterates
> zpdescs via get_next_zpdesc(). get_zspage() dereferences the now-
> NULL backpointer, causing:
So I don't see how a subsequent zs_free() could race with reset_zpdesc() on
the same zpdesc; maybe I'm missing something? :-)
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> RIP: 0010:free_zspage+0x26/0x100
> Call Trace:
> zs_free+0xf4/0x110
> zswap_entry_free+0x7e/0x160
Is this from a real crash? A more detailed crash dump would be helpful to
see what else was happening on the other threads!
> The migration side already has a NULL guard (zs_page_migrate line 1675:
> "if (!zpdesc->zspage) return 0;"), but get_next_zpdesc() lacks the same
> protection.
>
> Fix this by reading zpdesc->zspage directly in get_next_zpdesc()
> instead of going through get_zspage(), and returning NULL when the
> backpointer is NULL. This stops iteration safely — the caller treats
> it as the end of the page chain.
If we return NULL early, what happens to the remaining zpdescs in the
zspage? In trylock_zspage(), for instance, we might exit early and think we
successfully locked all zpdescs when that isn't the case. It would eventually
lead to a VM_BUG_ON_PAGE when trying to free the zspage later anyway.
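For reference, trylock_zspage() looks roughly like this (abridged); a
premature NULL from get_next_zpdesc() ends the loop as if every zpdesc
had been locked:

	static int trylock_zspage(struct zspage *zspage)
	{
		struct zpdesc *cursor, *fail;

		for (cursor = get_first_zpdesc(zspage); cursor != NULL;
				cursor = get_next_zpdesc(cursor)) {
			if (!zpdesc_trylock(cursor)) {
				fail = cursor;
				goto unlock;
			}
		}
		return 1;	/* an early NULL would be reported as success */
	unlock:
		for (cursor = get_first_zpdesc(zspage); cursor != fail;
				cursor = get_next_zpdesc(cursor))
			zpdesc_unlock(cursor);
		return 0;
	}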
If you have a more detailed crash trace, that would be really helpful.
Thank you again for this patch, I hope you have a great day!
Joshua
> Signed-off-by: Michael Fara <mjfara@gmail.com>
> ---
> mm/zsmalloc.c | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -735,7 +735,19 @@ static struct zspage *get_zspage(struct zpdesc *zpdesc)
>  
>  static struct zpdesc *get_next_zpdesc(struct zpdesc *zpdesc)
>  {
> -	struct zspage *zspage = get_zspage(zpdesc);
> +	struct zspage *zspage = zpdesc->zspage;
> +
> +	/*
> +	 * If the backpointer is NULL, this zpdesc was already freed via
> +	 * reset_zpdesc() by a racing async_free_zspage() while isolated
> +	 * for compaction. See the TODO comment in zs_page_migrate().
> +	 */
> +	if (unlikely(!zspage)) {
> +		WARN_ON_ONCE(1);
> +		return NULL;
> +	}
> +
> +	BUG_ON(zspage->magic != ZSPAGE_MAGIC);
>  
>  	if (unlikely(ZsHugePage(zspage)))
>  		return NULL;
> --
> 2.39.0
I just noticed that Minchan wasn't Cc-ed in the patch. Adding him to the
conversation here.
On Mon, 9 Feb 2026 14:50:16 -0800 Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
> On Mon, 9 Feb 2026 19:32:57 +0000 Michael Fara <mjfara@gmail.com> wrote:
>
> Hello Michael,
>
> I hope you are doing well! Thank you for the patch. I'm not entirely sure
> the race condition you note here is correct, or whether this is the right
> fix.
>
> > get_next_zpdesc() calls get_zspage() which unconditionally dereferences
> > zpdesc->zspage without a NULL check. This causes a kernel oops when
> > zpdesc->zspage has been set to NULL by reset_zpdesc() during a race
> > between zspage destruction and page compaction/migration.
[...snip...]
Should we add a Fixes tag here as well?
> > Signed-off-by: Michael Fara <mjfara@gmail.com>
> > ---
> > mm/zsmalloc.c | 14 +++++++++++++-
> > 1 file changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> > --- a/mm/zsmalloc.c
> > +++ b/mm/zsmalloc.c
> > @@ -735,7 +735,19 @@ static struct zspage *get_zspage(struct zpdesc *zpdesc)
> >  
> >  static struct zpdesc *get_next_zpdesc(struct zpdesc *zpdesc)
> >  {
> > -	struct zspage *zspage = get_zspage(zpdesc);
> > +	struct zspage *zspage = zpdesc->zspage;
> > +
> > +	/*
> > +	 * If the backpointer is NULL, this zpdesc was already freed via
> > +	 * reset_zpdesc() by a racing async_free_zspage() while isolated
> > +	 * for compaction. See the TODO comment in zs_page_migrate().
> > +	 */
> > +	if (unlikely(!zspage)) {
> > +		WARN_ON_ONCE(1);
> > +		return NULL;
> > +	}
> > +
> > +	BUG_ON(zspage->magic != ZSPAGE_MAGIC);
> >  
> >  	if (unlikely(ZsHugePage(zspage)))
> >  		return NULL;
> > --
> > 2.39.0