[PATCH] mm/mmap: Fix race in mmap_region() with ftruncate()

Liam R. Howlett posted 1 patch 4 months, 3 weeks ago
There is a newer version of this series
[PATCH] mm/mmap: Fix race in mmap_region() with ftruncate()
Posted by Liam R. Howlett 4 months, 3 weeks ago
From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>

Avoiding the zeroing of the vma tree in mmap_region() introduced a race
with truncate in the page table walk.  To avoid any races, create a hole
in the rmap during the operation by clearing the pagetable entries
earlier under the mmap write lock and (critically) before the new vma is
installed into the vma tree.  The result is that the old vma is still in
the vma tree, but the page tables are cleared while holding the
i_mmap_rwsem.

This change extends the fix required for hugetlbfs and the call_mmap()
function by moving the cleanup higher in the function and running it
unconditionally.

Cc: Jann Horn <jannh@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Fixes: f8d112a4e657 ("mm/mmap: avoid zeroing vma tree in mmap_region()")
Reported-by: Jann Horn <jannh@google.com>
Closes: https://lore.kernel.org/all/CAG48ez0ZpGzxi=-5O_uGQ0xKXOmbjeQ0LjZsRJ1Qtf2X5eOr1w@mail.gmail.com/
Link: https://lore.kernel.org/all/CAG48ez0ZpGzxi=-5O_uGQ0xKXOmbjeQ0LjZsRJ1Qtf2X5eOr1w@mail.gmail.com/
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
---
 mm/mmap.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index dd4b35a25aeb..a20998fb633c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1413,6 +1413,13 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 		vmg.flags = vm_flags;
 	}
 
+	/*
+	 * clear PTEs while the vma is still in the tree so that rmap
+	 * cannot race with the freeing later in the truncate scenario.
+	 * This is also needed for call_mmap(), which is why the vm_ops
+	 * close function is called.
+	 */
+	vms_clean_up_area(&vms, &mas_detach);
 	vma = vma_merge_new_range(&vmg);
 	if (vma)
 		goto expanded;
@@ -1432,11 +1439,6 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 
 	if (file) {
 		vma->vm_file = get_file(file);
-		/*
-		 * call_mmap() may map PTE, so ensure there are no existing PTEs
-		 * and call the vm_ops close function if one exists.
-		 */
-		vms_clean_up_area(&vms, &mas_detach);
 		error = call_mmap(file, vma);
 		if (error)
 			goto unmap_and_free_vma;
-- 
2.43.0
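
For context, the racing pattern under discussion has roughly the following
userspace shape: one thread keeps re-mapping a file-backed range with
MAP_FIXED (so mmap_region() repeatedly tears down the old VMAs under that
range), while another thread keeps truncating the same file (truncation
walks the file's rmap and zaps the affected range).  This is only an
illustrative sketch of that shape -- the file name, sizes and iteration
counts are arbitrary, and it is not claimed to reproduce the bug.

/*
 * Illustrative sketch only: the rough userspace shape of the race
 * discussed above (mmap(MAP_FIXED) over a file mapping vs. ftruncate()
 * on the same file).  Not a reproducer.
 *
 * Build with something like: cc -O2 -pthread race-shape.c
 */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define SZ	(4UL << 20)	/* 4 MiB mapping, arbitrary */
#define ITERS	100000

static int fd;

/* Repeatedly place a MAP_SHARED|MAP_FIXED mapping of the file over the
 * same range, so mmap_region() has to unmap the old VMA each time. */
static void *mapper(void *base)
{
	for (int i = 0; i < ITERS; i++) {
		void *p = mmap(base, SZ, PROT_READ | PROT_WRITE,
			       MAP_SHARED | MAP_FIXED, fd, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			exit(1);
		}
		/* Touch offset 0 to populate page tables; the truncator
		 * never shrinks the file below SZ / 2, so this access
		 * stays within EOF and cannot SIGBUS. */
		*(volatile char *)p = 1;
	}
	return NULL;
}

/* Concurrently shrink and regrow the file; shrinking walks the rmap
 * (i_mmap) of every mapping of the file and zaps the truncated tail. */
static void *truncator(void *unused)
{
	(void)unused;
	for (int i = 0; i < ITERS; i++) {
		if (ftruncate(fd, SZ / 2) || ftruncate(fd, SZ)) {
			perror("ftruncate");
			exit(1);
		}
	}
	return NULL;
}

int main(void)
{
	pthread_t a, b;
	void *base;

	fd = open("race-shape.tmp", O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0 || ftruncate(fd, SZ)) {
		perror("setup");
		return 1;
	}

	/* Establish an initial file-backed mapping whose range the
	 * mapper thread will keep re-mapping with MAP_FIXED. */
	base = mmap(NULL, SZ, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (base == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	pthread_create(&a, NULL, mapper, base);
	pthread_create(&b, NULL, truncator, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	unlink("race-shape.tmp");
	return 0;
}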
Re: [PATCH] mm/mmap: Fix race in mmap_region() with ftruncate()
Posted by Lorenzo Stoakes 4 months, 3 weeks ago
On Tue, Oct 15, 2024 at 12:11:35PM -0400, Liam R. Howlett wrote:
> From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
>
> Avoiding the zeroing of the vma tree in mmap_region() introduced a race
> with truncate in the page table walk.  To avoid any races, create a hole
> in the rmap during the operation by clearing the pagetable entries
> earlier under the mmap write lock and (critically) before the new vma is
> installed into the vma tree.  The result is that the old vma is still in
> the vma tree, but the page tables are cleared while holding the
> i_mmap_rwsem.

Nit: old VMAs (can be more than one).

Hmm, this is not correct I don't think. We are not holding the i_mmap_rwsem
(nor could we, as free_pgtables() ends up manipulating this lock).

Rather, free_pgtables() unlinks the old VMAs from the rmap before removing
their page tables; the key point here is that we have not yet established a
new VMA that makes page tables within the range valid and accessible.

Before, we did so _prior_ to this, making it possible for a racing
ftruncate() to grab a PMD right before we free it and then UAF.

Now we make that impossible.

So I'd reword this to that effect.

>
> This change extends the fix required for hugetlbfs and the call_mmap()
> function by moving the cleanup higher in the function and running it
> unconditionally.
>
> Cc: Jann Horn <jannh@google.com>
> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: David Hildenbrand <david@redhat.com>
> Fixes: f8d112a4e657 ("mm/mmap: avoid zeroing vma tree in mmap_region()")
> Reported-by: Jann Horn <jannh@google.com>
> Closes: https://lore.kernel.org/all/CAG48ez0ZpGzxi=-5O_uGQ0xKXOmbjeQ0LjZsRJ1Qtf2X5eOr1w@mail.gmail.com/
> Link: https://lore.kernel.org/all/CAG48ez0ZpGzxi=-5O_uGQ0xKXOmbjeQ0LjZsRJ1Qtf2X5eOr1w@mail.gmail.com/
> Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

Other than that, LGTM, so:

Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

> ---
>  mm/mmap.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index dd4b35a25aeb..a20998fb633c 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1413,6 +1413,13 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
>  		vmg.flags = vm_flags;
>  	}
>
> +	/*
> +	 * clear PTEs while the vma is still in the tree so that rmap
> +	 * cannot race with the freeing later in the truncate scenario.
> +	 * This is also needed for call_mmap(), which is why the vm_ops
> +	 * close function is called.
> +	 */
> +	vms_clean_up_area(&vms, &mas_detach);
>  	vma = vma_merge_new_range(&vmg);
>  	if (vma)
>  		goto expanded;
> @@ -1432,11 +1439,6 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
>
>  	if (file) {
>  		vma->vm_file = get_file(file);
> -		/*
> -		 * call_mmap() may map PTE, so ensure there are no existing PTEs
> -		 * and call the vm_ops close function if one exists.
> -		 */
> -		vms_clean_up_area(&vms, &mas_detach);
>  		error = call_mmap(file, vma);
>  		if (error)
>  			goto unmap_and_free_vma;
> --
> 2.43.0
>
Re: [PATCH] mm/mmap: Fix race in mmap_region() with ftruncate()
Posted by Liam R. Howlett 4 months, 3 weeks ago
* Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [241015 14:23]:
> On Tue, Oct 15, 2024 at 12:11:35PM -0400, Liam R. Howlett wrote:
> > From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
> >
> > Avoiding the zeroing of the vma tree in mmap_region() introduced a race
> > with truncate in the page table walk.  To avoid any races, create a hole
> > in the rmap during the operation by clearing the pagetable entries
> > earlier under the mmap write lock and (critically) before the new vma is
> > installed into the vma tree.  The result is that the old vma is still in
> > the vma tree, but the page tables are cleared while holding the
> > i_mmap_rwsem.
> 
> Nit: old VMAs (can be more than one).

right, okay.

> 
> Hmm, this is not correct I don't think. We are not holding the i_mmap_rwsem
> (nor could we, as free_pgtables() ends up manipulating this lock).

I meant we hold i_mmap_rwsem when clearing the ptes by calling
free_pgtables(), which does what we need in regards to locking.

> 
> Rather, free_pgtables() unlinks the old VMAs from the rmap before removing
> their page tables; the key point here is that we have not yet established a
> new VMA that makes page tables within the range valid and accessible.
> 
> Before, we did so _prior_ to this, making it possible for a racing
> ftruncate() to grab a PMD right before we free it and then UAF.

You have totally lost me in your wording as well, but I get what you are
saying.  I'll try to make the change log more clear.

> 
> Now we make that impossible.
> 
> So I'd reword this to that effect.
> 
> >
> > This change extends the fix required for hugetlbfs and the call_mmap()
> > function by moving the cleanup higher in the function and running it
> > unconditionally.
> >
> > Cc: Jann Horn <jannh@google.com>
> > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Matthew Wilcox <willy@infradead.org>
> > Cc: David Hildenbrand <david@redhat.com>
> > Fixes: f8d112a4e657 ("mm/mmap: avoid zeroing vma tree in mmap_region()")
> > Reported-by: Jann Horn <jannh@google.com>
> > Closes: https://lore.kernel.org/all/CAG48ez0ZpGzxi=-5O_uGQ0xKXOmbjeQ0LjZsRJ1Qtf2X5eOr1w@mail.gmail.com/
> > Link: https://lore.kernel.org/all/CAG48ez0ZpGzxi=-5O_uGQ0xKXOmbjeQ0LjZsRJ1Qtf2X5eOr1w@mail.gmail.com/
> > Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
> 
> Other than that, LGTM, so:
> 
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> 
> > ---
> >  mm/mmap.c | 12 +++++++-----
> >  1 file changed, 7 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index dd4b35a25aeb..a20998fb633c 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1413,6 +1413,13 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> >  		vmg.flags = vm_flags;
> >  	}
> >
> > +	/*
> > +	 * clear PTEs while the vma is still in the tree so that rmap
> > +	 * cannot race with the freeing later in the truncate scenario.
> > +	 * This is also needed for call_mmap(), which is why the vm_ops
> > +	 * close function is called.
> > +	 */
> > +	vms_clean_up_area(&vms, &mas_detach);
> >  	vma = vma_merge_new_range(&vmg);
> >  	if (vma)
> >  		goto expanded;
> > @@ -1432,11 +1439,6 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> >
> >  	if (file) {
> >  		vma->vm_file = get_file(file);
> > -		/*
> > -		 * call_mmap() may map PTE, so ensure there are no existing PTEs
> > -		 * and call the vm_ops close function if one exists.
> > -		 */
> > -		vms_clean_up_area(&vms, &mas_detach);
> >  		error = call_mmap(file, vma);
> >  		if (error)
> >  			goto unmap_and_free_vma;
> > --
> > 2.43.0
> >
Re: [PATCH] mm/mmap: Fix race in mmap_region() with ftruncate()
Posted by Jann Horn 4 months, 3 weeks ago
On Tue, Oct 15, 2024 at 6:12 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
> Avoiding the zeroing of the vma tree in mmap_region() introduced a race
> with truncate in the page table walk.  To avoid any races, create a hole
> in the rmap during the operation by clearing the pagetable entries
> earlier under the mmap write lock and (critically) before the new vma is
> installed into the vma tree.  The result is that the old vma is still in
> the vma tree, but the page tables are cleared while holding the
> i_mmap_rwsem.
>
> This change extends the fix required for hugetlbfs and the call_mmap()
> function by moving the cleanup higher in the function and running it
> unconditionally.
>
> Cc: Jann Horn <jannh@google.com>
> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: David Hildenbrand <david@redhat.com>
> Fixes: f8d112a4e657 ("mm/mmap: avoid zeroing vma tree in mmap_region()")
> Reported-by: Jann Horn <jannh@google.com>
> Closes: https://lore.kernel.org/all/CAG48ez0ZpGzxi=-5O_uGQ0xKXOmbjeQ0LjZsRJ1Qtf2X5eOr1w@mail.gmail.com/
> Link: https://lore.kernel.org/all/CAG48ez0ZpGzxi=-5O_uGQ0xKXOmbjeQ0LjZsRJ1Qtf2X5eOr1w@mail.gmail.com/
> Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

Thanks, this looks good to me.

Reviewed-by: Jann Horn <jannh@google.com>