[v2] mm: Optimize mseal checks

[PATCH v2 4/6] mm/mremap: Replace can_modify_mm with can_modify_vma

Posted by Pedro Falcato 1 year, 6 months ago

Delegate all can_modify checks to the proper places. Unmap checks are
done in do_unmap (et al).

This patch allows for mremap partial failure in certain cases (for
instance, when destination VMAs aren't sealed, but the source VMA is).
It shouldn't be too troublesome, as you'd need to go out of your way to
do illegal operations on a VMA.

Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
---
v2:
	- Removed a superfluous check in mremap (Jeff Xu)

 mm/mremap.c | 30 ++++--------------------------
 1 file changed, 4 insertions(+), 26 deletions(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index e7ae140fc64..35afb3e38a8 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -821,6 +821,10 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
 	if (!vma)
 		return ERR_PTR(-EFAULT);
 
+	/* Don't allow vma expansion when it has already been sealed */
+	if (!can_modify_vma(vma))
+		return ERR_PTR(-EPERM);
+
 	/*
 	 * !old_len is a special case where an attempt is made to 'duplicate'
 	 * a mapping.  This makes no sense for private mappings as it will
@@ -902,19 +906,6 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
 	if ((mm->map_count + 2) >= sysctl_max_map_count - 3)
 		return -ENOMEM;
 
-	/*
-	 * In mremap_to().
-	 * Move a VMA to another location, check if src addr is sealed.
-	 *
-	 * Place can_modify_mm here because mremap_to()
-	 * does its own checking for address range, and we only
-	 * check the sealing after passing those checks.
-	 *
-	 * can_modify_mm assumes we have acquired the lock on MM.
-	 */
-	if (unlikely(!can_modify_mm(mm, addr, addr + old_len)))
-		return -EPERM;
-
 	if (flags & MREMAP_FIXED) {
 		/*
 		 * In mremap_to().
@@ -1079,19 +1070,6 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 		goto out;
 	}
 
-	/*
-	 * Below is shrink/expand case (not mremap_to())
-	 * Check if src address is sealed, if so, reject.
-	 * In other words, prevent shrinking or expanding a sealed VMA.
-	 *
-	 * Place can_modify_mm here so we can keep the logic related to
-	 * shrink/expand together.
-	 */
-	if (unlikely(!can_modify_mm(mm, addr, addr + old_len))) {
-		ret = -EPERM;
-		goto out;
-	}
-
 	/*
 	 * Always allow a shrinking remap: that just unmaps
 	 * the unnecessary pages..
-- 
2.46.0

Re: [PATCH v2 4/6] mm/mremap: Replace can_modify_mm with can_modify_vma

Posted by Liam R. Howlett 1 year, 6 months ago

* Pedro Falcato <pedro.falcato@gmail.com> [240807 17:13]:
> Delegate all can_modify checks to the proper places. Unmap checks are
> done in do_unmap (et al).
> 
> This patch allows for mremap partial failure in certain cases (for
> instance, when destination VMAs aren't sealed, but the source VMA is).
> It shouldn't be too troublesome, as you'd need to go out of your way to
> do illegal operations on a VMA.

As mseal() is supposed to be a security thing, is the illegal operation
not a concern?

> 
> Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
> ---
> v2:
> 	- Removed a superfluous check in mremap (Jeff Xu)
> 
>  mm/mremap.c | 30 ++++--------------------------
>  1 file changed, 4 insertions(+), 26 deletions(-)
> 
> diff --git a/mm/mremap.c b/mm/mremap.c
> index e7ae140fc64..35afb3e38a8 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -821,6 +821,10 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
>  	if (!vma)
>  		return ERR_PTR(-EFAULT);
>  
> +	/* Don't allow vma expansion when it has already been sealed */
> +	if (!can_modify_vma(vma))
> +		return ERR_PTR(-EPERM);
> +
>  	/*
>  	 * !old_len is a special case where an attempt is made to 'duplicate'
>  	 * a mapping.  This makes no sense for private mappings as it will
> @@ -902,19 +906,6 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
>  	if ((mm->map_count + 2) >= sysctl_max_map_count - 3)
>  		return -ENOMEM;
>  
> -	/*
> -	 * In mremap_to().
> -	 * Move a VMA to another location, check if src addr is sealed.
> -	 *
> -	 * Place can_modify_mm here because mremap_to()
> -	 * does its own checking for address range, and we only
> -	 * check the sealing after passing those checks.
> -	 *
> -	 * can_modify_mm assumes we have acquired the lock on MM.
> -	 */
> -	if (unlikely(!can_modify_mm(mm, addr, addr + old_len)))
> -		return -EPERM;
> -
>  	if (flags & MREMAP_FIXED) {
>  		/*
>  		 * In mremap_to().
> @@ -1079,19 +1070,6 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
>  		goto out;
>  	}
>  
> -	/*
> -	 * Below is shrink/expand case (not mremap_to())
> -	 * Check if src address is sealed, if so, reject.
> -	 * In other words, prevent shrinking or expanding a sealed VMA.
> -	 *
> -	 * Place can_modify_mm here so we can keep the logic related to
> -	 * shrink/expand together.
> -	 */
> -	if (unlikely(!can_modify_mm(mm, addr, addr + old_len))) {
> -		ret = -EPERM;
> -		goto out;
> -	}
> -
>  	/*
>  	 * Always allow a shrinking remap: that just unmaps
>  	 * the unnecessary pages..
> -- 
> 2.46.0
>

Re: [PATCH v2 4/6] mm/mremap: Replace can_modify_mm with can_modify_vma

Posted by Pedro Falcato 1 year, 6 months ago

On Fri, Aug 9, 2024 at 5:06 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
>
> * Pedro Falcato <pedro.falcato@gmail.com> [240807 17:13]:
> > Delegate all can_modify checks to the proper places. Unmap checks are
> > done in do_unmap (et al).
> >
> > This patch allows for mremap partial failure in certain cases (for
> > instance, when destination VMAs aren't sealed, but the source VMA is).
> > It shouldn't be too troublesome, as you'd need to go out of your way to
> > do illegal operations on a VMA.
>
> As mseal() is supposed to be a security thing, is the illegal operation
> not a concern?

My 3 cents (note: I'm not a security guy):

- Linux m*() operations have been allowed to partially fail for ages.
POSIX only disallows this in the munmap case (which is why we need all
that detached VMA logic), but not in any other case. We have a lot of
other failure points in these syscalls, and would require extensive
refactoring to patch this up (very likely with an inevitable
performance regression, as we saw in this case).

- Despite allowing for partial failure, this patch set always keeps
the sealed VMAs untouched (so that invariant isn't broken). The munmap
semantics remain untouched (and POSIXly correct) due to the detached
VMA logic.

- I personally have not heard of a single attack making use of this,
and the performance hit is very measurable and exists _for every
user_, despite mseal being a very niche feature (I cannot find a
single user of mseal upstream, both in debian code search, github,
chromium, v8, glibc, and what have you).

I remember SIGKILL on a bad operation being a possibility during the
mseal patch set talks, and if people are really scared of this partial
stuff, I would guess that's still a possibility. There are no users
after all...

-- 
Pedro

Re: [PATCH v2 4/6] mm/mremap: Replace can_modify_mm with can_modify_vma

Posted by Liam R. Howlett 1 year, 6 months ago

* Pedro Falcato <pedro.falcato@gmail.com> [240809 14:45]:
> On Fri, Aug 9, 2024 at 5:06 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
> >
> > * Pedro Falcato <pedro.falcato@gmail.com> [240807 17:13]:
> > > Delegate all can_modify checks to the proper places. Unmap checks are
> > > done in do_unmap (et al).
> > >
> > > This patch allows for mremap partial failure in certain cases (for
> > > instance, when destination VMAs aren't sealed, but the source VMA is).
> > > It shouldn't be too troublesome, as you'd need to go out of your way to
> > > do illegal operations on a VMA.
> >
> > As mseal() is supposed to be a security thing, is the illegal operation
> > not a concern?
> 
> My 3 cents (note: I'm not a security guy):
> 
> - Linux m*() operations have been allowed to partially fail for ages.
> POSIX only disallows this in the munmap case (which is why we need all
> that detached VMA logic), but not in any other case. We have a lot of
> other failure points in these syscalls, and would require extensive
> refactoring to patch this up (very likely with an inevitable
> performance regression, as we saw in this case).
> 
> - Despite allowing for partial failure, this patch set always keeps
> the sealed VMAs untouched (so that invariant isn't broken). The munmap
> semantics remain untouched (and POSIXly correct) due to the detached
> VMA logic.
> 
> - I personally have not heard of a single attack making use of this,
> and the performance hit is very measurable and exists _for every
> user_, despite mseal being a very niche feature (I cannot find a
> single user of mseal upstream, both in debian code search, github,
> chromium, v8, glibc, and what have you).
> 

...

I really have no disagreement with the above statements, but looking at
this further: vma_to_resize() is called in 2 places:
1. mremap() syscall
mremap() calls vma_lookup() and then later calls vma_to_resize() which
also calls vma_lookup() in the first 5 lines of the function.

2. mremap_to() static function
mremap_to() is called only from mreamp(), but earlier than
vma_to_resize().

If we move the vma check to mremap() after finding the vma, then we can
avoid partial failures due to mseal().  We should probably check as much
as possible there, but that change would be too large to fix a
regression.

iow the check was in the wrong place and was the wrong check, but we can
use your check and move it up ~15 lines and everything will be the same
and faster.

For a later patch, there is an opportunity to even make this faster by
passing through the vma to vma_to_resize().  We could remove another
walk of the vma tree.  Probably not necessary to fix the regression, but
it would at least reduce the instruction count - if not a performance
increase (depending on cache use).

Thanks,
Liam