Initialize nr_pages to 1 at the start of each loop iteration, like
folio_referenced_one() does.
Without this, nr_pages computed by a previous folio_unmap_pte_batch() call
can be reused on a later iteration that does not run
folio_unmap_pte_batch() again.
mmap a 64K large folio with MAP_ANONYMOUS | MAP_DROPPABLE, then call
madvise(MADV_FREE), then make the last page device-exclusive via
HMM_DMIRROR_EXCLUSIVE.
Trigger node reclaim through sysfs. Now, in try_to_unmap_one(), we will
first clear the first 15 out of 16 entries mapping the lazyfree folio.
This will set nr_pages to 15. In the next pvmw walk, this nr_pages gets
reused on a device-exclusive pte, thus potentially corrupting folio
refcount/mapcount.
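
The stale-counter pattern described above can be modelled in userspace. This is only a simplified sketch, not the kernel code: model_pte_batch(), model_unmap() and enum pte_kind are hypothetical names made up for illustration, and "removed" merely stands in for the count that would be passed on to folio_remove_rmap_ptes().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two kinds of entries in the scenario. */
enum pte_kind { PTE_LAZYFREE, PTE_DEVICE_EXCLUSIVE };

/*
 * Hypothetical stand-in for folio_unmap_pte_batch(): count the run of
 * consecutive lazyfree entries starting at idx.
 */
static int model_pte_batch(const enum pte_kind *ptes, int idx, int max)
{
	int n = 0;

	assert(idx < max);
	while (idx + n < max && ptes[idx + n] == PTE_LAZYFREE)
		n++;
	return n;
}

/*
 * Walk all entries and return how many mappings the model "removes".
 * With reset_each_iteration == false, nr_pages keeps the value of the
 * previous batch when the current entry does not run the batch helper.
 */
static int model_unmap(const enum pte_kind *ptes, int max,
		       bool reset_each_iteration)
{
	int nr_pages = 1;
	int removed = 0;
	int idx = 0;

	while (idx < max) {
		int step = 1;

		if (reset_each_iteration)
			nr_pages = 1;	/* the fix: start every iteration from 1 */

		if (ptes[idx] == PTE_LAZYFREE) {
			nr_pages = model_pte_batch(ptes, idx, max);
			step = nr_pages;
		}
		/* Device-exclusive entry: batch helper is not run. */

		removed += nr_pages;	/* models the rmap removal count */
		idx += step;
	}
	return removed;
}
```

For a 16-entry layout of 15 lazyfree entries followed by one device-exclusive entry, model_unmap(ptes, 16, false) returns 30 (15 + a stale 15), while model_unmap(ptes, 16, true) returns 16.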
At the moment, I have a userspace program which can make the kernel spit
out a trace, but the blow-up is in folio_referenced_one(), because there
are existing bugs in the interaction between device-private and rmap
(which I am also investigating). With a one-line kernel change to avoid
going into folio_referenced_one(), the kernel blows up at
folio_remove_rmap_ptes() in try_to_unmap_one(), which is what I wanted.
Note that the bug dates back to lazyfree folio batching, not to file folio
batching, since device-exclusive only works for anonymous folios.
The userspace-visible effect is simply the kernel crashing somewhere due to
refcount/mapcount corruption.
Fixes: 354dffd29575 ("mm: support batched unmap for lazyfree large folios during reclamation")
Cc: stable@vger.kernel.org
Signed-off-by: Dev Jain <dev.jain@arm.com>
Acked-by: Barry Song <baohua@kernel.org>
---
Applies on mm-unstable. This patch was part of
https://lore.kernel.org/all/20260506094504.2588857-2-dev.jain@arm.com/
mm/rmap.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/mm/rmap.c b/mm/rmap.c
index fb3c351f8c45..1c77d5dc06e9 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2030,6 +2030,8 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 	mmu_notifier_invalidate_range_start(&range);
 
 	while (page_vma_mapped_walk(&pvmw)) {
+		nr_pages = 1;
+
 		/*
 		 * If the folio is in an mlock()d vma, we must not swap it out.
 		 */
--
2.43.0
On Mon, May 18, 2026 at 12:06:56PM +0530, Dev Jain wrote:
> Initialize nr_pages to 1 at the start of each loop iteration, like
> folio_referenced_one() does.
>
> Without this, nr_pages computed by a previous folio_unmap_pte_batch() call
> can be reused on a later iteration that does not run
> folio_unmap_pte_batch() again.
Yikes!
>
> mmap a 64K large folio with MAP_ANONYMOUS | MAP_DROPPABLE, then call
> madvise(MADV_FREE), then make the last page device-exclusive via
> HMM_DMIRROR_EXCLUSIVE.
>
> Trigger node reclaim through sysfs. Now, in try_to_unmap_one(), we will
> first clear the first 15 out of 16 entries mapping the lazyfree folio.
> This will set nr_pages to 15. In the next pvmw walk, this nr_pages gets
> reused on a device-exclusive pte, thus potentially corrupting folio
> refcount/mapcount.
>
> At the moment, I have a userspace program which can make the kernel spit
> out a trace, but the blow up is in folio_referenced_one(), because there
> are existing bugs in the interaction between device-private and rmap
> (which too I am investigating). I did a one liner kernel change to avoid
> going into folio_referenced_one(), and the kernel blows up at
> folio_remove_rmap_ptes in try_to_unmap_one which is what I wanted.
>
> Note that the bug is there not since file folio batching but lazyfree folio
> batching, since device-exclusive only works for anonymous folios.
>
> Userspace visible effect is simply kernel crashing somewhere due to
> refcount/mapcount corruption.
Also yikes!
Thanks for the detailed commit message :)
>
> Fixes: 354dffd29575 ("mm: support batched unmap for lazyfree large folios during reclamation")
> Cc: stable@vger.kernel.org
> Signed-off-by: Dev Jain <dev.jain@arm.com>
Thanks, LGTM so:
Reviewed-by: Lorenzo Stoakes <ljs@kernel.org>
> Acked-by: Barry Song <baohua@kernel.org>
> ---
> Applies on mm-unstable. This patch was part of
> https://lore.kernel.org/all/20260506094504.2588857-2-dev.jain@arm.com/
>
> mm/rmap.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index fb3c351f8c45..1c77d5dc06e9 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -2030,6 +2030,8 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> mmu_notifier_invalidate_range_start(&range);
>
> while (page_vma_mapped_walk(&pvmw)) {
> + nr_pages = 1;
> +
> /*
> * If the folio is in an mlock()d vma, we must not swap it out.
> */
> --
> 2.43.0
>
Cheers, Lorenzo
On 5/18/26 08:36, Dev Jain wrote:
> Initialize nr_pages to 1 at the start of each loop iteration, like
> folio_referenced_one() does.
>
> Without this, nr_pages computed by a previous folio_unmap_pte_batch() call
> can be reused on a later iteration that does not run
> folio_unmap_pte_batch() again.
>
> mmap a 64K large folio with MAP_ANONYMOUS | MAP_DROPPABLE, then call
> madvise(MADV_FREE), then make the last page device-exclusive via
> HMM_DMIRROR_EXCLUSIVE.
>
> Trigger node reclaim through sysfs. Now, in try_to_unmap_one(), we will
> first clear the first 15 out of 16 entries mapping the lazyfree folio.
> This will set nr_pages to 15. In the next pvmw walk, this nr_pages gets
> reused on a device-exclusive pte, thus potentially corrupting folio
> refcount/mapcount.
>
> At the moment, I have a userspace program which can make the kernel spit
> out a trace, but the blow up is in folio_referenced_one(), because there
> are existing bugs in the interaction between device-private and rmap
> (which too I am investigating). I did a one liner kernel change to avoid
> going into folio_referenced_one(), and the kernel blows up at
> folio_remove_rmap_ptes in try_to_unmap_one which is what I wanted.
>
> Note that the bug is there not since file folio batching but lazyfree folio
> batching, since device-exclusive only works for anonymous folios.
>
> Userspace visible effect is simply kernel crashing somewhere due to
> refcount/mapcount corruption.
>
> Fixes: 354dffd29575 ("mm: support batched unmap for lazyfree large folios during reclamation")
> Cc: stable@vger.kernel.org
> Signed-off-by: Dev Jain <dev.jain@arm.com>
> Acked-by: Barry Song <baohua@kernel.org>
> ---
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Thanks!
--
Cheers,
David