[PATCH 4/8] mm/huge_memory: handle buggy PMD entry in zap_huge_pmd()

Lorenzo Stoakes (Oracle) posted 8 patches 2 weeks, 4 days ago
There is a newer version of this series
[PATCH 4/8] mm/huge_memory: handle buggy PMD entry in zap_huge_pmd()
Posted by Lorenzo Stoakes (Oracle) 2 weeks, 4 days ago
A recent bug I analysed [0] managed to, through a bug in the userfaultfd
implementation, reach an invalid point in the zap_huge_pmd() code where the
PMD was none of:

- A non-DAX, PFN or mixed map.
- The huge zero folio
- A present PMD entry
- A softleaf entry

The code at this point calls folio_test_anon() on a known-NULL
folio. Having logic like this explicitly NULL dereference in the code is
hard to understand, and makes debugging potentially more difficult.

Add an else branch to handle this case and WARN() and exit indicating
failure.

[0]:https://lore.kernel.org/all/6b3d7ad7-49e1-407a-903d-3103704160d8@lucifer.local/

Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
 mm/huge_memory.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index bba1ba1f6b67..8e6b7ba11448 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2478,6 +2478,10 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 
 		if (!thp_migration_supported())
 			WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!");
+	} else {
+		WARN_ON_ONCE(true);
+		spin_unlock(ptl);
+		return false;
 	}
 
 	if (folio_test_anon(folio)) {
-- 
2.53.0
Re: [PATCH 4/8] mm/huge_memory: handle buggy PMD entry in zap_huge_pmd()
Posted by Baolin Wang 2 weeks, 4 days ago

On 3/19/26 4:39 AM, Lorenzo Stoakes (Oracle) wrote:
> A recent bug I analysed [0] managed to, through a bug in the userfaultfd
> implementation, reach an invalid point in the zap_huge_pmd() code where the
> PMD was none of:
> 
> - A non-DAX, PFN or mixed map.
> - The huge zero folio
> - A present PMD entry
> - A softleaf entry
> 
> The code at this point calls folio_test_anon() on a known-NULL
> folio. Having logic like this explicitly NULL dereference in the code is
> hard to understand, and makes debugging potentially more difficult.
> 
> Add an else branch to handle this case and WARN() and exit indicating
> failure.
> 
> [0]:https://lore.kernel.org/all/6b3d7ad7-49e1-407a-903d-3103704160d8@lucifer.local/
> 
> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> ---
>   mm/huge_memory.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index bba1ba1f6b67..8e6b7ba11448 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2478,6 +2478,10 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>   
>   		if (!thp_migration_supported())
>   			WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!");
> +	} else {
> +		WARN_ON_ONCE(true);
> +		spin_unlock(ptl);

The warning looks reasonable to me, but ...

> +		return false;

IIUC, if we return false here, the caller zap_pmd_range() will fall back 
to call zap_pte_range(). Since pmd_trans_huge(pmd) returns true, 
zap_pte_range() will simply return 'addr', causing an infinite loop in 
zap_pmd_range(), right?
Re: [PATCH 4/8] mm/huge_memory: handle buggy PMD entry in zap_huge_pmd()
Posted by Lorenzo Stoakes (Oracle) 2 weeks, 4 days ago
On Thu, Mar 19, 2026 at 03:00:17PM +0800, Baolin Wang wrote:
>
>
> On 3/19/26 4:39 AM, Lorenzo Stoakes (Oracle) wrote:
> > A recent bug I analysed [0] managed to, through a bug in the userfaultfd
> > implementation, reach an invalid point in the zap_huge_pmd() code where the
> > PMD was none of:
> >
> > - A non-DAX, PFN or mixed map.
> > - The huge zero folio
> > - A present PMD entry
> > - A softleaf entry
> >
> > The code at this point calls folio_test_anon() on a known-NULL
> > folio. Having logic like this explicitly NULL dereference in the code is
> > hard to understand, and makes debugging potentially more difficult.
> >
> > Add an else branch to handle this case and WARN() and exit indicating
> > failure.
> >
> > [0]:https://lore.kernel.org/all/6b3d7ad7-49e1-407a-903d-3103704160d8@lucifer.local/
> >
> > Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> > ---
> >   mm/huge_memory.c | 4 ++++
> >   1 file changed, 4 insertions(+)
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index bba1ba1f6b67..8e6b7ba11448 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -2478,6 +2478,10 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> >   		if (!thp_migration_supported())
> >   			WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!");
> > +	} else {
> > +		WARN_ON_ONCE(true);
> > +		spin_unlock(ptl);
>
> The warning looks reasonable to me, but ...
>
> > +		return false;
>
> IIUC, if we return false here, the caller zap_pmd_range() will fall back to
> call zap_pte_range(). Since pmd_trans_huge(pmd) returns true,
> zap_pte_range() will simply return 'addr', causing an infinite loop in
> zap_pmd_range(), right?

You mean because:

	start_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
	if (!pte)
		return addr;

In any case it looks like it degrades a false to potentially carrying on forever:

		addr = zap_pte_range(tlb, vma, pmd, addr, next, details);
		if (addr != next)
			pmd--;

So yeah, this should be a true, annoyingly.

Will fix thanks!

Cheers, Lorenzo