Delegate all can_modify checks to the proper places. Unmap checks are
done in do_unmap (et al).
This patch allows for mremap partial failure in certain cases (for
instance, when destination VMAs aren't sealed, but the source VMA is).
It shouldn't be too troublesome, as you'd need to go out of your way to
do illegal operations on a VMA.
Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
---
v2:
- Removed a superfluous check in mremap (Jeff Xu)
mm/mremap.c | 30 ++++--------------------------
1 file changed, 4 insertions(+), 26 deletions(-)
diff --git a/mm/mremap.c b/mm/mremap.c
index e7ae140fc64..35afb3e38a8 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -821,6 +821,10 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
if (!vma)
return ERR_PTR(-EFAULT);
+ /* Don't allow vma expansion when it has already been sealed */
+ if (!can_modify_vma(vma))
+ return ERR_PTR(-EPERM);
+
/*
* !old_len is a special case where an attempt is made to 'duplicate'
* a mapping. This makes no sense for private mappings as it will
@@ -902,19 +906,6 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
if ((mm->map_count + 2) >= sysctl_max_map_count - 3)
return -ENOMEM;
- /*
- * In mremap_to().
- * Move a VMA to another location, check if src addr is sealed.
- *
- * Place can_modify_mm here because mremap_to()
- * does its own checking for address range, and we only
- * check the sealing after passing those checks.
- *
- * can_modify_mm assumes we have acquired the lock on MM.
- */
- if (unlikely(!can_modify_mm(mm, addr, addr + old_len)))
- return -EPERM;
-
if (flags & MREMAP_FIXED) {
/*
* In mremap_to().
@@ -1079,19 +1070,6 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
goto out;
}
- /*
- * Below is shrink/expand case (not mremap_to())
- * Check if src address is sealed, if so, reject.
- * In other words, prevent shrinking or expanding a sealed VMA.
- *
- * Place can_modify_mm here so we can keep the logic related to
- * shrink/expand together.
- */
- if (unlikely(!can_modify_mm(mm, addr, addr + old_len))) {
- ret = -EPERM;
- goto out;
- }
-
/*
* Always allow a shrinking remap: that just unmaps
* the unnecessary pages..
--
2.46.0
* Pedro Falcato <pedro.falcato@gmail.com> [240807 17:13]:
> Delegate all can_modify checks to the proper places. Unmap checks are
> done in do_unmap (et al).
>
> This patch allows for mremap partial failure in certain cases (for
> instance, when destination VMAs aren't sealed, but the source VMA is).
> It shouldn't be too troublesome, as you'd need to go out of your way to
> do illegal operations on a VMA.
As mseal() is supposed to be a security thing, is the illegal operation
not a concern?
>
> Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
> ---
> v2:
> - Removed a superfluous check in mremap (Jeff Xu)
>
> mm/mremap.c | 30 ++++--------------------------
> 1 file changed, 4 insertions(+), 26 deletions(-)
>
> diff --git a/mm/mremap.c b/mm/mremap.c
> index e7ae140fc64..35afb3e38a8 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -821,6 +821,10 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
> if (!vma)
> return ERR_PTR(-EFAULT);
>
> + /* Don't allow vma expansion when it has already been sealed */
> + if (!can_modify_vma(vma))
> + return ERR_PTR(-EPERM);
> +
> /*
> * !old_len is a special case where an attempt is made to 'duplicate'
> * a mapping. This makes no sense for private mappings as it will
> @@ -902,19 +906,6 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
> if ((mm->map_count + 2) >= sysctl_max_map_count - 3)
> return -ENOMEM;
>
> - /*
> - * In mremap_to().
> - * Move a VMA to another location, check if src addr is sealed.
> - *
> - * Place can_modify_mm here because mremap_to()
> - * does its own checking for address range, and we only
> - * check the sealing after passing those checks.
> - *
> - * can_modify_mm assumes we have acquired the lock on MM.
> - */
> - if (unlikely(!can_modify_mm(mm, addr, addr + old_len)))
> - return -EPERM;
> -
> if (flags & MREMAP_FIXED) {
> /*
> * In mremap_to().
> @@ -1079,19 +1070,6 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
> goto out;
> }
>
> - /*
> - * Below is shrink/expand case (not mremap_to())
> - * Check if src address is sealed, if so, reject.
> - * In other words, prevent shrinking or expanding a sealed VMA.
> - *
> - * Place can_modify_mm here so we can keep the logic related to
> - * shrink/expand together.
> - */
> - if (unlikely(!can_modify_mm(mm, addr, addr + old_len))) {
> - ret = -EPERM;
> - goto out;
> - }
> -
> /*
> * Always allow a shrinking remap: that just unmaps
> * the unnecessary pages..
> --
> 2.46.0
>
On Fri, Aug 9, 2024 at 5:06 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote: > > * Pedro Falcato <pedro.falcato@gmail.com> [240807 17:13]: > > Delegate all can_modify checks to the proper places. Unmap checks are > > done in do_unmap (et al). > > > > This patch allows for mremap partial failure in certain cases (for > > instance, when destination VMAs aren't sealed, but the source VMA is). > > It shouldn't be too troublesome, as you'd need to go out of your way to > > do illegal operations on a VMA. > > As mseal() is supposed to be a security thing, is the illegal operation > not a concern? My 3 cents (note: I'm not a security guy): - Linux m*() operations have been allowed to partially fail for ages. POSIX only disallows this in the munmap case (which is why we need all that detached VMA logic), but not in any other case. We have a lot of other failure points in these syscalls, and would require extensive refactoring to patch this up (very likely with an inevitable performance regression, as we saw in this case). - Despite allowing for partial failure, this patch set always keeps the sealed VMAs untouched (so that invariant isn't broken). The munmap semantics remain untouched (and POSIXly correct) due to the detached VMA logic. - I personally have not heard of a single attack making use of this, and the performance hit is very measurable and exists _for every user_, despite mseal being a very niche feature (I cannot find a single user of mseal upstream, both in debian code search, github, chromium, v8, glibc, and what have you). I remember SIGKILL on a bad operation being a possibility during the mseal patch set talks, and if people are really scared of this partial stuff, I would guess that's still a possibility. There are no users after all... -- Pedro
* Pedro Falcato <pedro.falcato@gmail.com> [240809 14:45]: > On Fri, Aug 9, 2024 at 5:06 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote: > > > > * Pedro Falcato <pedro.falcato@gmail.com> [240807 17:13]: > > > Delegate all can_modify checks to the proper places. Unmap checks are > > > done in do_unmap (et al). > > > > > > This patch allows for mremap partial failure in certain cases (for > > > instance, when destination VMAs aren't sealed, but the source VMA is). > > > It shouldn't be too troublesome, as you'd need to go out of your way to > > > do illegal operations on a VMA. > > > > As mseal() is supposed to be a security thing, is the illegal operation > > not a concern? > > My 3 cents (note: I'm not a security guy): > > - Linux m*() operations have been allowed to partially fail for ages. > POSIX only disallows this in the munmap case (which is why we need all > that detached VMA logic), but not in any other case. We have a lot of > other failure points in these syscalls, and would require extensive > refactoring to patch this up (very likely with an inevitable > performance regression, as we saw in this case). > > - Despite allowing for partial failure, this patch set always keeps > the sealed VMAs untouched (so that invariant isn't broken). The munmap > semantics remain untouched (and POSIXly correct) due to the detached > VMA logic. > > - I personally have not heard of a single attack making use of this, > and the performance hit is very measurable and exists _for every > user_, despite mseal being a very niche feature (I cannot find a > single user of mseal upstream, both in debian code search, github, > chromium, v8, glibc, and what have you). > ... I really have no disagreement with the above statements, but looking at this further: vma_to_resize() is called in 2 places: 1. mremap() syscall mremap() calls vma_lookup() and then later calls vma_to_resize() which also calls vma_lookup() in the first 5 lines of the function. 2. mremap_to() static function mremap_to() is called only from mreamp(), but earlier than vma_to_resize(). If we move the vma check to mremap() after finding the vma, then we can avoid partial failures due to mseal(). We should probably check as much as possible there, but that change would be too large to fix a regression. iow the check was in the wrong place and was the wrong check, but we can use your check and move it up ~15 lines and everything will be the same and faster. For a later patch, there is an opportunity to even make this faster by passing through the vma to vma_to_resize(). We could remove another walk of the vma tree. Probably not necessary to fix the regression, but it would at least reduce the instruction count - if not a performance increase (depending on cache use). Thanks, Liam
© 2016 - 2026 Red Hat, Inc.