syzbot reported a WARNING in __swap_offset_to_cluster() triggered by
an invalid swap offset during swapoff:
WARNING: CPU: 0 PID: 9861 at mm/swap.h:87 swap_cache_get_folio+0x186/0x200
The issue occurs because unuse_pte_range() extracts a swap entry from
a PTE and uses the offset without validating it is within bounds of
the swap area.
While the existing swp_type() check filters entries for other swap
areas, it cannot catch cases where the type bits are valid but the
offset is corrupted or stale - for example, due to a race condition
during PTE updates or memory corruption.
Add validation to ensure offset < si->max before using the swap entry.
Reported-by: syzbot+d7bc9ec4a100437aa7a2@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d7bc9ec4a100437aa7a2
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
---
mm/swapfile.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 46d2008e4b99..fdf358df7116 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2277,6 +2277,8 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 			continue;
 
 		offset = swp_offset(entry);
+		if (offset >= si->max)
+			continue;
 		pte_unmap(pte);
 		pte = NULL;
 
--
2.43.0
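For readers who want to see the two checks side by side, here is a minimal userspace sketch of the idea in the patch. It is not kernel code: the `model_*` names, the 58-bit offset layout, and `struct model_swap_info` are assumptions made up for illustration, and the real swap entry encoding is architecture-specific. The sketch only shows that a matching type does not imply an in-range offset, so the second comparison is what rejects an entry past the end of the area.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration only: high bits hold the type,
 * low 58 bits hold the offset. The kernel's real layout differs. */
#define MODEL_TYPE_SHIFT  58
#define MODEL_OFFSET_MASK ((UINT64_C(1) << MODEL_TYPE_SHIFT) - 1)

/* Stand-in for the per-area state; only the fields the check needs. */
struct model_swap_info {
	unsigned int type; /* which swap area this is */
	uint64_t max;      /* valid offsets are 0 .. max - 1 */
};

static inline uint64_t model_swp_entry(unsigned int type, uint64_t offset)
{
	return ((uint64_t)type << MODEL_TYPE_SHIFT) | (offset & MODEL_OFFSET_MASK);
}

static inline unsigned int model_swp_type(uint64_t entry)
{
	return (unsigned int)(entry >> MODEL_TYPE_SHIFT);
}

static inline uint64_t model_swp_offset(uint64_t entry)
{
	return entry & MODEL_OFFSET_MASK;
}

/* True only if the entry names this area AND its offset is in range. */
static bool model_entry_usable(const struct model_swap_info *si, uint64_t entry)
{
	assert(si != NULL);

	/* First filter: entries that belong to another swap area. */
	if (model_swp_type(entry) != si->type)
		return false;

	/* Second filter (what the patch adds): offset past the end. */
	return model_swp_offset(entry) < si->max;
}
```

With an area of type 1 and `max` of 1024, an entry at offset 1023 passes both filters, while an entry of the same type at offset 1024 passes the type filter and is rejected only by the bounds comparison.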
On Mon, Dec 01, 2025 at 03:07:41PM +0530, Deepanshu Kartikey wrote:
> syzbot reported a WARNING in __swap_offset_to_cluster() triggered by
> an invalid swap offset during swapoff:
>
> WARNING: CPU: 0 PID: 9861 at mm/swap.h:87 swap_cache_get_folio+0x186/0x200
>
> The issue occurs because unuse_pte_range() extracts a swap entry from
> a PTE and uses the offset without validating it is within bounds of
> the swap area.
>
> While the existing swp_type() check filters entries for other swap
> areas, it cannot catch cases where the type bits are valid but the
> offset is corrupted or stale - for example, due to a race condition
> during PTE updates or memory corruption.

Since this indicates a system-level issue (race/corruption), simply
avoiding the crash seems to be the goal here. Should we at least add a
WARN_ON or something? (Unless this corruption is expected to be reported
elsewhere beforehand, in which case a silent skip is fine, I think.)

And it looks like swap_vma_readahead() shares similar logic. The
difference is that it intentionally allows entries from different swap
devices (to support VMA readahead). If the offset is corrupted or invalid
in those paths, wouldn't they suffer from similar issues? Do you think we
should add the boundary check (offset >= si->max) there as well?

Best Regards,
Youngjun Park

>
> Reported-by: syzbot+d7bc9ec4a100437aa7a2@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=d7bc9ec4a100437aa7a2
> Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
> ---
> mm/swapfile.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 46d2008e4b99..fdf358df7116 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -2277,6 +2277,8 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
> continue;
>
> offset = swp_offset(entry);
> + if (offset >= si->max)
> + continue;
> pte_unmap(pte);
> pte = NULL;
>
> --
> 2.43.0
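The readahead question above differs from the swapoff case in one respect: the entry may name any swap device, so the bound to compare against has to be looked up by type first. The following is a userspace sketch of that shape only. Every name in it (`model_swap_device`, `model_entry`, `model_readahead_entry_ok`, `MODEL_MAX_DEVICES`) is hypothetical, and it makes no claim about how swap_vma_readahead() is actually structured or whether it needs such a check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_DEVICES 4 /* hypothetical table size */

/* Stand-in for one swap device: whether it is in use, and its size. */
struct model_swap_device {
	bool active;
	uint64_t max; /* valid offsets are 0 .. max - 1 */
};

/* A decoded swap entry: device type plus offset within that device. */
struct model_entry {
	unsigned int type;
	uint64_t offset;
};

/* Accept entries from any active device, but only with in-range offsets. */
static bool model_readahead_entry_ok(const struct model_swap_device *devs,
				     struct model_entry e)
{
	assert(devs != NULL);

	/* The type itself must index a real slot in the device table. */
	if (e.type >= MODEL_MAX_DEVICES)
		return false;

	/* The device must be in use. */
	if (!devs[e.type].active)
		return false;

	/* The bound comes from the device the entry names. */
	return e.offset < devs[e.type].max;
}
```

The point of the sketch is that the comparison is per device: offset 99 can be valid on a device with 100 slots and invalid on one with 8.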
On Mon, Dec 1, 2025 at 5:39 PM Deepanshu Kartikey <kartikey406@gmail.com> wrote:
>
> syzbot reported a WARNING in __swap_offset_to_cluster() triggered by
> an invalid swap offset during swapoff:
>
> WARNING: CPU: 0 PID: 9861 at mm/swap.h:87 swap_cache_get_folio+0x186/0x200
>
> The issue occurs because unuse_pte_range() extracts a swap entry from
> a PTE and uses the offset without validating it is within bounds of
> the swap area.
>
> While the existing swp_type() check filters entries for other swap
> areas, it cannot catch cases where the type bits are valid but the
> offset is corrupted or stale - for example, due to a race condition
> during PTE updates or memory corruption.
>
> Add validation to ensure offset < si->max before using the swap entry.
>
> Reported-by: syzbot+d7bc9ec4a100437aa7a2@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=d7bc9ec4a100437aa7a2

Thanks for posting a fix! But it seems the report is no longer
triggering after the softleaf v3 change, right? Checking the syzbot link,
the last reproduction was on 11/11, and my analysis was posted here:

https://lore.kernel.org/all/CAMgjq7B=OizLoqKca3RjeV0h3p0GQ4uen+gDo3=WdAxQ1gfxnw@mail.gmail.com/

Then softleaf v3 was merged, and the warning is gone.

Your analysis:

> for example, due to a race condition
> during PTE updates or memory corruption.

What kind of race will lead to an invalid swap entry in the page table?
During swapoff no one can allocate any swap entry from this swap device,
and the swap type can't be used by other swap devices, so any swap entry
still in the page table must be a valid swap entry that was allocated
from this swap device before swapoff started. And we do not release the
swap_map or si->cluster_info until swapoff is done, so there seems to be
no risk of OOB or UAF.

Memory corruption may cause it indeed, but memory corruption can also
cause failures in too many ways. I'm not against a sanity check like
this though, just want to double check before we proceed.

> Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
> ---
> mm/swapfile.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 46d2008e4b99..fdf358df7116 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -2277,6 +2277,8 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
> continue;
>
> offset = swp_offset(entry);
> + if (offset >= si->max)
> + continue;
> pte_unmap(pte);
> pte = NULL;
>
> --
> 2.43.0
>
>
Hi Kairui,

Thank you for the detailed feedback!

> But it seems the report is no longer triggering after the softleaf v3
> change right? Checking the syzbot link, last reproduce was 11/11

You're right - I should have checked the syzbot status more carefully.
If softleaf v3 has already fixed this, then this patch may not be
needed.

Could you point me to which specific change in softleaf v3 fixed it?
I'd like to understand the root cause better.

> What kind of race will lead to an invalid swap entry in the page table?

You make a good point. I was speculating about possible causes without
concrete evidence.

> I'm not against a sanity check like this though, just want to double
> check before we proceed.

If softleaf v3 has fixed the underlying issue, I can withdraw this
patch. Or if you think a defensive sanity check still has value, I can
update the commit message to reflect that it is defensive hardening
rather than a fix for an active bug.

Please let me know how you'd like to proceed.

Thanks,
Deepanshu
On Mon, Dec 1, 2025 at 6:48 PM Deepanshu Kartikey <kartikey406@gmail.com> wrote:
>
> Hi Kairui,
>
> Thank you for the detailed feedback!

You are welcome :)

> > But it seems the report is no longer triggering after the softleaf v3
> > change right? Checking the syzbot link, last reproduce was 11/11
>
> You're right - I should have checked the syzbot status more carefully.
> If softleaf v3 has already fixed this, then this patch may not be
> needed.
>
> Could you point me to which specific change in softleaf v3 fixed it?
> I'd like to understand the root cause better.

This one; I think Lorenzo included it or a similar fix along with
another fix in swapfile.c:

https://lore.kernel.org/all/CAMgjq7AP383YfU3L5ZxJ9U3x-vRPnEkEUtmnPdXD29HiNC8OrA@mail.gmail.com/

> > What kind of race will lead to an invalid swap entry in the page table?
>
> You make a good point. I was speculating about possible causes without
> concrete evidence.
>
> > I'm not against a sanity check like this though, just want to double
> > check before we proceed.
>
> If softleaf v3 has fixed the underlying issue, I can withdraw this
> patch. Or if you think a defensive sanity check still has value, I can
> update the commit message to reflect that it is defensive hardening
> rather than a fix for an active bug.

A sanity check here is acceptable since swapoff is cold and the
overhead is hardly visible. No strong opinion on this one.
On Wed, Dec 3, 2025 at 8:24 AM Kairui Song <ryncsn@gmail.com> wrote:
> > If softleaf v3 has fixed the underlying issue, I can withdraw this
> > patch. Or if you think a defensive sanity check still has value, I can
> > update the commit message to reflect that it is defensive hardening
> > rather than a fix for an active bug.
>
> A sanity check here is acceptable since swapoff is cold and the
> overhead is hardly visible. No strong opinion on this one.
Hi Kairui,
Thank you for the link and clarification!
I'll study Lorenzo's fix to understand the root cause better.
Since you mentioned a sanity check is acceptable here, should I update
the commit message to frame this as defensive hardening rather than a
bug fix? Something like:
mm/swapfile: add defensive bounds check in unuse_pte_range()
Add a sanity check to validate the swap offset is within bounds
before using it. While there is no known code path that can
trigger an out-of-bounds offset, this provides defense against
potential edge cases or memory corruption.
The overhead is negligible since swapoff is a cold path.
Thanks,
Deepanshu
On Sat, Dec 6, 2025 at 4:28 PM Deepanshu Kartikey <kartikey406@gmail.com> wrote:
>
> On Wed, Dec 3, 2025 at 8:24 AM Kairui Song <ryncsn@gmail.com> wrote:
> > > If softleaf v3 has fixed the underlying issue, I can withdraw this
> > > patch. Or if you think a defensive sanity check still has value, I can
> > > update the commit message to reflect that it is defensive hardening
> > > rather than a fix for an active bug.
> >
> > A sanity check here is acceptable since swapoff is cold and the
> > overhead is hardly visible. No strong opinion on this one.
>
> Hi Kairui,
>
> Thank you for the link and clarification!

Thanks for working on this.

>
> I'll study Lorenzo's fix to understand the root cause better.
>
> Since you mentioned a sanity check is acceptable here, should I update
> the commit message to frame this as defensive hardening rather than a
> bug fix? Something like:
>
> mm/swapfile: add defensive bounds check in unuse_pte_range()
>
> Add a sanity check to validate the swap offset is within bounds
> before using it. While there is no known code path that can
> trigger an out-of-bounds offset, this provides defense against
> potential edge cases or memory corruption.

Adding defensive hardening is justified. Basically, the kernel
shouldn't put an invalid swap entry in the PTE. If the swap entry read
back from the PTE is invalid (out of range), that means there is some
kind of bug that might be able to cause data corruption. Bail out and
add a WARN_ONCE, or the preferred equivalent, to expose the possible
data corruption if it is triggered. I understand there is some opinion
about avoiding WARN. I will leave it to others to comment on the best
way to report the possible data corruption. Because it is possible data
corruption we are talking about, silently skipping over the bug is
worse. That is just my take on this.

>
> The overhead is negligible since swapoff is a cold path.

Ack.

Chris
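To make the "report once, then skip" suggestion concrete, here is a small userspace model of the pattern. It is a sketch under stated assumptions, not the kernel's WARN_ONCE: `model_offset_out_of_range()`, `model_unuse_scan()` and `model_warnings_emitted` are hypothetical names, the warning goes to stderr, and the loop keeps the patch's behaviour of skipping the bad entry and continuing. Whether the real code should instead stop the scan, and which reporting primitive it should use, is left open in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Counts how many warnings were actually printed (for inspection). */
static unsigned int model_warnings_emitted;

/* Returns true if offset is out of range; reports only the first hit. */
static bool model_offset_out_of_range(uint64_t offset, uint64_t max)
{
	static bool warned; /* one-shot latch, like a warn-once helper */

	if (offset < max)
		return false;

	if (!warned) {
		warned = true;
		model_warnings_emitted++;
		fprintf(stderr, "swap offset %llu out of range (max %llu)\n",
			(unsigned long long)offset, (unsigned long long)max);
	}
	return true;
}

/* Walks the offsets, skipping bad ones; returns how many were processed. */
static size_t model_unuse_scan(const uint64_t *offsets, size_t n, uint64_t max)
{
	size_t processed = 0;

	assert(offsets != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (model_offset_out_of_range(offsets[i], max))
			continue; /* skip the invalid entry, keep scanning */
		processed++;
	}
	return processed;
}
```

The one-shot latch keeps a corrupted page table from flooding the log while still leaving a trace that something upstream wrote a bad entry.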