[v1] mm/huge_memory: refactor zap_huge_pmd()

[PATCH 4/8] mm/huge_memory: handle buggy PMD entry in zap_huge_pmd()

Posted by Lorenzo Stoakes (Oracle) 2 weeks, 4 days ago

A recent bug I analysed [0] managed to, through a bug in the userfaultfd
implementation, reach an invalid point in the zap_huge_pmd() code where the
PMD was none of:

- A non-DAX, PFN or mixed map.
- The huge zero folio
- A present PMD entry
- A softleaf entry

The code at this point calls folio_test_anon() on a known-NULL
folio. Having logic like this explicitly NULL dereference in the code is
hard to understand, and makes debugging potentially more difficult.

Add an else branch to handle this case and WARN() and exit indicating
failure.

[0]:https://lore.kernel.org/all/6b3d7ad7-49e1-407a-903d-3103704160d8@lucifer.local/

Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
 mm/huge_memory.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index bba1ba1f6b67..8e6b7ba11448 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2478,6 +2478,10 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 
 		if (!thp_migration_supported())
 			WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!");
+	} else {
+		WARN_ON_ONCE(true);
+		spin_unlock(ptl);
+		return false;
 	}
 
 	if (folio_test_anon(folio)) {
-- 
2.53.0

Re: [PATCH 4/8] mm/huge_memory: handle buggy PMD entry in zap_huge_pmd()

Posted by Baolin Wang 2 weeks, 4 days ago


On 3/19/26 4:39 AM, Lorenzo Stoakes (Oracle) wrote:
> A recent bug I analysed [0] managed to, through a bug in the userfaultfd
> implementation, reach an invalid point in the zap_huge_pmd() code where the
> PMD was none of:
> 
> - A non-DAX, PFN or mixed map.
> - The huge zero folio
> - A present PMD entry
> - A softleaf entry
> 
> The code at this point calls folio_test_anon() on a known-NULL
> folio. Having logic like this explicitly NULL dereference in the code is
> hard to understand, and makes debugging potentially more difficult.
> 
> Add an else branch to handle this case and WARN() and exit indicating
> failure.
> 
> [0]:https://lore.kernel.org/all/6b3d7ad7-49e1-407a-903d-3103704160d8@lucifer.local/
> 
> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> ---
>   mm/huge_memory.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index bba1ba1f6b67..8e6b7ba11448 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2478,6 +2478,10 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>   
>   		if (!thp_migration_supported())
>   			WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!");
> +	} else {
> +		WARN_ON_ONCE(true);
> +		spin_unlock(ptl);

The warning looks reasonable to me, but ...

> +		return false;

IIUC, if we return false here, the caller zap_pmd_range() will fall back 
to call zap_pte_range(). Since pmd_trans_huge(pmd) returns true, 
zap_pte_range() will simply return 'addr', causing an infinite loop in 
zap_pmd_range(), right?

Re: [PATCH 4/8] mm/huge_memory: handle buggy PMD entry in zap_huge_pmd()

Posted by Lorenzo Stoakes (Oracle) 2 weeks, 4 days ago

On Thu, Mar 19, 2026 at 03:00:17PM +0800, Baolin Wang wrote:
>
>
> On 3/19/26 4:39 AM, Lorenzo Stoakes (Oracle) wrote:
> > A recent bug I analysed [0] managed to, through a bug in the userfaultfd
> > implementation, reach an invalid point in the zap_huge_pmd() code where the
> > PMD was none of:
> >
> > - A non-DAX, PFN or mixed map.
> > - The huge zero folio
> > - A present PMD entry
> > - A softleaf entry
> >
> > The code at this point calls folio_test_anon() on a known-NULL
> > folio. Having logic like this explicitly NULL dereference in the code is
> > hard to understand, and makes debugging potentially more difficult.
> >
> > Add an else branch to handle this case and WARN() and exit indicating
> > failure.
> >
> > [0]:https://lore.kernel.org/all/6b3d7ad7-49e1-407a-903d-3103704160d8@lucifer.local/
> >
> > Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> > ---
> >   mm/huge_memory.c | 4 ++++
> >   1 file changed, 4 insertions(+)
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index bba1ba1f6b67..8e6b7ba11448 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -2478,6 +2478,10 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> >   		if (!thp_migration_supported())
> >   			WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!");
> > +	} else {
> > +		WARN_ON_ONCE(true);
> > +		spin_unlock(ptl);
>
> The warning looks reasonable to me, but ...
>
> > +		return false;
>
> IIUC, if we return false here, the caller zap_pmd_range() will fall back to
> call zap_pte_range(). Since pmd_trans_huge(pmd) returns true,
> zap_pte_range() will simply return 'addr', causing an infinite loop in
> zap_pmd_range(), right?

You mean because:

	start_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
	if (!pte)
		return addr;

In any case it looks like it degrades a false to potentially carrying on forever:

		addr = zap_pte_range(tlb, vma, pmd, addr, next, details);
		if (addr != next)
			pmd--;

So yeah, this should be a true, annoyingly.

Will fix thanks!

Cheers, Lorenzo

[PATCH 1/8] mm/huge_memory: simplify vma_is_specal_huge()
[PATCH 2/8] mm/huge: avoid big else branch in zap_huge_pmd()
[PATCH 3/8] mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc
[PATCH 4/8] mm/huge_memory: handle buggy PMD entry in zap_huge_pmd()
[PATCH 5/8] mm/huge_memory: add a common exit path to zap_huge_pmd()
[PATCH 6/8] mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE()
[PATCH 7/8] mm/huge_memory: deduplicate zap deposited table call
[PATCH 8/8] mm/huge_memory: deduplicate zap_huge_pmd() further by tracking state