mm/rmap: initialize nr_pages to 1 at loop start in try_to_unmap_one

[PATCH] mm/rmap: initialize nr_pages to 1 at loop start in try_to_unmap_one

Posted by Dev Jain 6 days, 22 hours ago

Initialize nr_pages to 1 at the start of each loop iteration, like
folio_referenced_one() does.

Without this, nr_pages computed by a previous folio_unmap_pte_batch() call
can be reused on a later iteration that does not run
folio_unmap_pte_batch() again.

mmap a 64K large folio with MAP_ANONYMOUS | MAP_DROPPABLE, then call
madvise(MADV_FREE), then make the last page device-exclusive via
HMM_DMIRROR_EXCLUSIVE.

Trigger node reclaim through sysfs. Now, in try_to_unmap_one(), we will
first clear the first 15 out of 16 entries mapping the lazyfree folio.
This will set nr_pages to 15. In the next pvmw walk, this nr_pages gets
reused on a device-exclusive pte, thus potentially corrupting folio
refcount/mapcount.

At the moment, I have a userspace program which can make the kernel spit
out a trace, but the blow up is in folio_referenced_one(), because there
are existing bugs in the interaction between device-private and rmap
(which too I am investigating). I did a one liner kernel change to avoid
going into folio_referenced_one(), and the kernel blows up at
folio_remove_rmap_ptes in try_to_unmap_one which is what I wanted. 

Note that the bug is there not since file folio batching but lazyfree folio
batching, since device-exclusive only works for anonymous folios.

Userspace visible effect is simply kernel crashing somewhere due to
refcount/mapcount corruption.

Fixes: 354dffd29575 ("mm: support batched unmap for lazyfree large folios during reclamation")
Cc: stable@vger.kernel.org
Signed-off-by: Dev Jain <dev.jain@arm.com>
Acked-by: Barry Song <baohua@kernel.org>
---
Applies on mm-unstable. This patch was part of
https://lore.kernel.org/all/20260506094504.2588857-2-dev.jain@arm.com/

 mm/rmap.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/rmap.c b/mm/rmap.c
index fb3c351f8c45..1c77d5dc06e9 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2030,6 +2030,8 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 	mmu_notifier_invalidate_range_start(&range);
 
 	while (page_vma_mapped_walk(&pvmw)) {
+		nr_pages = 1;
+
 		/*
 		 * If the folio is in an mlock()d vma, we must not swap it out.
 		 */
-- 
2.43.0

Re: [PATCH] mm/rmap: initialize nr_pages to 1 at loop start in try_to_unmap_one

Posted by Lorenzo Stoakes 6 days, 18 hours ago

On Mon, May 18, 2026 at 12:06:56PM +0530, Dev Jain wrote:
> Initialize nr_pages to 1 at the start of each loop iteration, like
> folio_referenced_one() does.
>
> Without this, nr_pages computed by a previous folio_unmap_pte_batch() call
> can be reused on a later iteration that does not run
> folio_unmap_pte_batch() again.

Yikes!

>
> mmap a 64K large folio with MAP_ANONYMOUS | MAP_DROPPABLE, then call
> madvise(MADV_FREE), then make the last page device-exclusive via
> HMM_DMIRROR_EXCLUSIVE.
>
> Trigger node reclaim through sysfs. Now, in try_to_unmap_one(), we will
> first clear the first 15 out of 16 entries mapping the lazyfree folio.
> This will set nr_pages to 15. In the next pvmw walk, this nr_pages gets
> reused on a device-exclusive pte, thus potentially corrupting folio
> refcount/mapcount.
>
> At the moment, I have a userspace program which can make the kernel spit
> out a trace, but the blow up is in folio_referenced_one(), because there
> are existing bugs in the interaction between device-private and rmap
> (which too I am investigating). I did a one liner kernel change to avoid
> going into folio_referenced_one(), and the kernel blows up at
> folio_remove_rmap_ptes in try_to_unmap_one which is what I wanted.
>
> Note that the bug is there not since file folio batching but lazyfree folio
> batching, since device-exclusive only works for anonymous folios.
>
> Userspace visible effect is simply kernel crashing somewhere due to
> refcount/mapcount corruption.

Also yikes!

Thanks for the detailed commit message :)

>
> Fixes: 354dffd29575 ("mm: support batched unmap for lazyfree large folios during reclamation")
> Cc: stable@vger.kernel.org
> Signed-off-by: Dev Jain <dev.jain@arm.com>

Thanks, LGTM so:

Reviewed-by: Lorenzo Stoakes <ljs@kernel.org>

> Acked-by: Barry Song <baohua@kernel.org>
> ---
> Applies on mm-unstable. This patch was part of
> https://lore.kernel.org/all/20260506094504.2588857-2-dev.jain@arm.com/
>
>  mm/rmap.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index fb3c351f8c45..1c77d5dc06e9 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -2030,6 +2030,8 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>  	mmu_notifier_invalidate_range_start(&range);
>
>  	while (page_vma_mapped_walk(&pvmw)) {
> +		nr_pages = 1;
> +
>  		/*
>  		 * If the folio is in an mlock()d vma, we must not swap it out.
>  		 */
> --
> 2.43.0
>

Cheers, Lorenzo

Re: [PATCH] mm/rmap: initialize nr_pages to 1 at loop start in try_to_unmap_one

Posted by David Hildenbrand (Arm) 6 days, 21 hours ago

On 5/18/26 08:36, Dev Jain wrote:
> Initialize nr_pages to 1 at the start of each loop iteration, like
> folio_referenced_one() does.
> 
> Without this, nr_pages computed by a previous folio_unmap_pte_batch() call
> can be reused on a later iteration that does not run
> folio_unmap_pte_batch() again.
> 
> mmap a 64K large folio with MAP_ANONYMOUS | MAP_DROPPABLE, then call
> madvise(MADV_FREE), then make the last page device-exclusive via
> HMM_DMIRROR_EXCLUSIVE.
> 
> Trigger node reclaim through sysfs. Now, in try_to_unmap_one(), we will
> first clear the first 15 out of 16 entries mapping the lazyfree folio.
> This will set nr_pages to 15. In the next pvmw walk, this nr_pages gets
> reused on a device-exclusive pte, thus potentially corrupting folio
> refcount/mapcount.
> 
> At the moment, I have a userspace program which can make the kernel spit
> out a trace, but the blow up is in folio_referenced_one(), because there
> are existing bugs in the interaction between device-private and rmap
> (which too I am investigating). I did a one liner kernel change to avoid
> going into folio_referenced_one(), and the kernel blows up at
> folio_remove_rmap_ptes in try_to_unmap_one which is what I wanted. 
> 
> Note that the bug is there not since file folio batching but lazyfree folio
> batching, since device-exclusive only works for anonymous folios.
> 
> Userspace visible effect is simply kernel crashing somewhere due to
> refcount/mapcount corruption.
> 
> Fixes: 354dffd29575 ("mm: support batched unmap for lazyfree large folios during reclamation")
> Cc: stable@vger.kernel.org
> Signed-off-by: Dev Jain <dev.jain@arm.com>
> Acked-by: Barry Song <baohua@kernel.org>
> ---

Acked-by: David Hildenbrand (Arm) <david@kernel.org>

Thanks!

-- 
Cheers,

David