fs/userfaultfd.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
The current implementation of validate_range() in fs/userfaultfd.c
performs a hard check against mmap_min_addr without considering
capabilities, but the mmap() syscall uses security_mmap_addr()
which allows privileged processes (with CAP_SYS_RAWIO) to map below
mmap_min_addr. Furthermore, security_mmap_addr()->cap_mmap_addr() uses
dac_mmap_min_addr variable which can be changed with
/proc/sys/vm/mmap_min_addr.
Because userfaultfd uses a different check, UFFDIO_REGISTER may fail
with -EINVAL for valid memory areas that were successfully mapped
below mmap_min_addr even with appropriate capabilities.
This prevents apps like binary compilers from using UFFD for valid memory
regions mapped by application.
Replace the rigid mmap_min_addr check with security_mmap_addr() to align
userfaultfd with the standard kernel memory mapping security policy.
Signed-off-by: Denis M. Karpov <komlomal@gmail.com>
---
Initial RFC following the discussion on the [BUG] thread.
Link: https://lore.kernel.org/all/CADtiZd0tWysx5HMCUnOXfSHB7PXAuXg1Mh4eY_hUmH29S=sejg@mail.gmail.com/
---
fs/userfaultfd.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index bdc84e521..dbfe5b2a0 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1238,15 +1238,13 @@ static __always_inline int validate_unaligned_range(
return -EINVAL;
if (!len)
return -EINVAL;
- if (start < mmap_min_addr)
- return -EINVAL;
if (start >= task_size)
return -EINVAL;
if (len > task_size - start)
return -EINVAL;
if (start + len <= start)
return -EINVAL;
- return 0;
+ return security_mmap_addr(start);
}
static __always_inline int validate_range(struct mm_struct *mm,
--
2.47.3
On Tue, 7 Apr 2026 11:14:42 +0300 "Denis M. Karpov" <komlomal@gmail.com> wrote: > The current implementation of validate_range() in fs/userfaultfd.c > performs a hard check against mmap_min_addr without considering > capabilities, but the mmap() syscall uses security_mmap_addr() > which allows privileged processes (with CAP_SYS_RAWIO) to map below > mmap_min_addr. Furthermore, security_mmap_addr()->cap_mmap_addr() uses > dac_mmap_min_addr variable which can be changed with > /proc/sys/vm/mmap_min_addr. > > Because userfaultfd uses a different check, UFFDIO_REGISTER may fail > with -EINVAL for valid memory areas that were successfully mapped > below mmap_min_addr even with appropriate capabilities. > > This prevents apps like binary compilers from using UFFD for valid memory > regions mapped by application. > > Replace the rigid mmap_min_addr check with security_mmap_addr() to align > userfaultfd with the standard kernel memory mapping security policy. > > Signed-off-by: Denis M. Karpov <komlomal@gmail.com> > > --- > Initial RFC following the discussion on the [BUG] thread. > Link: https://lore.kernel.org/all/CADtiZd0tWysx5HMCUnOXfSHB7PXAuXg1Mh4eY_hUmH29S=sejg@mail.gmail.com/ > --- > fs/userfaultfd.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c > index bdc84e521..dbfe5b2a0 100644 > --- a/fs/userfaultfd.c > +++ b/fs/userfaultfd.c > @@ -1238,15 +1238,13 @@ static __always_inline int validate_unaligned_range( > return -EINVAL; > if (!len) > return -EINVAL; > - if (start < mmap_min_addr) > - return -EINVAL; > if (start >= task_size) > return -EINVAL; > if (len > task_size - start) > return -EINVAL; > if (start + len <= start) > return -EINVAL; > - return 0; > + return security_mmap_addr(start); Is this introducing an ABI change? The old code returned -EINVAL when start was below mmap_min_addr. The new code calls security_mmap_addr() which returns -EPERM when the caller lacks CAP_SYS_RAWIO. Existing userspace callers checking specifically for -EINVAL would see different behavior start is below mmap_min_addr. > } > > static __always_inline int validate_range(struct mm_struct *mm, > -- > 2.47.3 > >
On Wed, Apr 08, 2026 at 05:36:59AM -0700, Usama Arif wrote: > On Tue, 7 Apr 2026 11:14:42 +0300 "Denis M. Karpov" <komlomal@gmail.com> wrote: > > > The current implementation of validate_range() in fs/userfaultfd.c > > performs a hard check against mmap_min_addr without considering > > capabilities, but the mmap() syscall uses security_mmap_addr() > > which allows privileged processes (with CAP_SYS_RAWIO) to map below > > mmap_min_addr. Furthermore, security_mmap_addr()->cap_mmap_addr() uses > > dac_mmap_min_addr variable which can be changed with > > /proc/sys/vm/mmap_min_addr. > > > > Because userfaultfd uses a different check, UFFDIO_REGISTER may fail > > with -EINVAL for valid memory areas that were successfully mapped > > below mmap_min_addr even with appropriate capabilities. > > > > This prevents apps like binary compilers from using UFFD for valid memory > > regions mapped by application. > > > > Replace the rigid mmap_min_addr check with security_mmap_addr() to align > > userfaultfd with the standard kernel memory mapping security policy. > > > > Signed-off-by: Denis M. Karpov <komlomal@gmail.com> > > > > --- > > Initial RFC following the discussion on the [BUG] thread. > > Link: https://lore.kernel.org/all/CADtiZd0tWysx5HMCUnOXfSHB7PXAuXg1Mh4eY_hUmH29S=sejg@mail.gmail.com/ > > --- > > fs/userfaultfd.c | 4 +--- > > 1 file changed, 1 insertion(+), 3 deletions(-) > > > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c > > index bdc84e521..dbfe5b2a0 100644 > > --- a/fs/userfaultfd.c > > +++ b/fs/userfaultfd.c > > @@ -1238,15 +1238,13 @@ static __always_inline int validate_unaligned_range( > > return -EINVAL; > > if (!len) > > return -EINVAL; > > - if (start < mmap_min_addr) > > - return -EINVAL; > > if (start >= task_size) > > return -EINVAL; > > if (len > task_size - start) > > return -EINVAL; > > if (start + len <= start) > > return -EINVAL; > > - return 0; > > + return security_mmap_addr(start); > > Is this introducing an ABI change? > > The old code returned -EINVAL when start was below mmap_min_addr. > The new code calls security_mmap_addr() which returns -EPERM when > the caller lacks CAP_SYS_RAWIO. Existing userspace callers checking > specifically for -EINVAL would see different behavior start is > below mmap_min_addr. You mean API change? :) we don't guarantee ABI for kernel stuff anyway. Firstly, as with Harry, I don't believe we should be duplicating checks here anyway. UFFD is duplicative enough as it is. And this is such a silly edge case that I don't think it is valid or reasonable for us to account for whichever totally insane user relies on a pointless re-check being done there and _then_ relies on the error code being... -EINVAL... which is overloaded for a million other possible failures. Let's let it be -EFAULT and remove this silly check altogether. > > > } > > > > static __always_inline int validate_range(struct mm_struct *mm, > > -- > > 2.47.3 > > > > Thanks, Lorenzo
On 09/04/2026 09:01, Lorenzo Stoakes wrote: > On Wed, Apr 08, 2026 at 05:36:59AM -0700, Usama Arif wrote: >> On Tue, 7 Apr 2026 11:14:42 +0300 "Denis M. Karpov" <komlomal@gmail.com> wrote: >> >>> The current implementation of validate_range() in fs/userfaultfd.c >>> performs a hard check against mmap_min_addr without considering >>> capabilities, but the mmap() syscall uses security_mmap_addr() >>> which allows privileged processes (with CAP_SYS_RAWIO) to map below >>> mmap_min_addr. Furthermore, security_mmap_addr()->cap_mmap_addr() uses >>> dac_mmap_min_addr variable which can be changed with >>> /proc/sys/vm/mmap_min_addr. >>> >>> Because userfaultfd uses a different check, UFFDIO_REGISTER may fail >>> with -EINVAL for valid memory areas that were successfully mapped >>> below mmap_min_addr even with appropriate capabilities. >>> >>> This prevents apps like binary compilers from using UFFD for valid memory >>> regions mapped by application. >>> >>> Replace the rigid mmap_min_addr check with security_mmap_addr() to align >>> userfaultfd with the standard kernel memory mapping security policy. >>> >>> Signed-off-by: Denis M. Karpov <komlomal@gmail.com> >>> >>> --- >>> Initial RFC following the discussion on the [BUG] thread. >>> Link: https://lore.kernel.org/all/CADtiZd0tWysx5HMCUnOXfSHB7PXAuXg1Mh4eY_hUmH29S=sejg@mail.gmail.com/ >>> --- >>> fs/userfaultfd.c | 4 +--- >>> 1 file changed, 1 insertion(+), 3 deletions(-) >>> >>> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c >>> index bdc84e521..dbfe5b2a0 100644 >>> --- a/fs/userfaultfd.c >>> +++ b/fs/userfaultfd.c >>> @@ -1238,15 +1238,13 @@ static __always_inline int validate_unaligned_range( >>> return -EINVAL; >>> if (!len) >>> return -EINVAL; >>> - if (start < mmap_min_addr) >>> - return -EINVAL; >>> if (start >= task_size) >>> return -EINVAL; >>> if (len > task_size - start) >>> return -EINVAL; >>> if (start + len <= start) >>> return -EINVAL; >>> - return 0; >>> + return security_mmap_addr(start); >> >> Is this introducing an ABI change? >> >> The old code returned -EINVAL when start was below mmap_min_addr. >> The new code calls security_mmap_addr() which returns -EPERM when >> the caller lacks CAP_SYS_RAWIO. Existing userspace callers checking >> specifically for -EINVAL would see different behavior start is >> below mmap_min_addr. > > You mean API change? :) we don't guarantee ABI for kernel stuff anyway. > Ah no, I meant ABI, I hope :) The return value of validate_unaligned_range() flows directly back to the ioctl() return value, which is visible to userspace. The error code a program sees from ioctl(fd, UFFDIO_REGISTER, ...) changes from -EINVAL to -EPERM for the same input, right? Its probably not an issue, but we would need to update https://man7.org/linux/man-pages/man2/ioctl_userfaultfd.2.html right?
On Thu, Apr 09, 2026 at 11:52:12AM +0100, Usama Arif wrote: > > > On 09/04/2026 09:01, Lorenzo Stoakes wrote: > > On Wed, Apr 08, 2026 at 05:36:59AM -0700, Usama Arif wrote: > >> On Tue, 7 Apr 2026 11:14:42 +0300 "Denis M. Karpov" <komlomal@gmail.com> wrote: > >> > >>> The current implementation of validate_range() in fs/userfaultfd.c > >>> performs a hard check against mmap_min_addr without considering > >>> capabilities, but the mmap() syscall uses security_mmap_addr() > >>> which allows privileged processes (with CAP_SYS_RAWIO) to map below > >>> mmap_min_addr. Furthermore, security_mmap_addr()->cap_mmap_addr() uses > >>> dac_mmap_min_addr variable which can be changed with > >>> /proc/sys/vm/mmap_min_addr. > >>> > >>> Because userfaultfd uses a different check, UFFDIO_REGISTER may fail > >>> with -EINVAL for valid memory areas that were successfully mapped > >>> below mmap_min_addr even with appropriate capabilities. > >>> > >>> This prevents apps like binary compilers from using UFFD for valid memory > >>> regions mapped by application. > >>> > >>> Replace the rigid mmap_min_addr check with security_mmap_addr() to align > >>> userfaultfd with the standard kernel memory mapping security policy. > >>> > >>> Signed-off-by: Denis M. Karpov <komlomal@gmail.com> > >>> > >>> --- > >>> Initial RFC following the discussion on the [BUG] thread. > >>> Link: https://lore.kernel.org/all/CADtiZd0tWysx5HMCUnOXfSHB7PXAuXg1Mh4eY_hUmH29S=sejg@mail.gmail.com/ > >>> --- > >>> fs/userfaultfd.c | 4 +--- > >>> 1 file changed, 1 insertion(+), 3 deletions(-) > >>> > >>> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c > >>> index bdc84e521..dbfe5b2a0 100644 > >>> --- a/fs/userfaultfd.c > >>> +++ b/fs/userfaultfd.c > >>> @@ -1238,15 +1238,13 @@ static __always_inline int validate_unaligned_range( > >>> return -EINVAL; > >>> if (!len) > >>> return -EINVAL; > >>> - if (start < mmap_min_addr) > >>> - return -EINVAL; > >>> if (start >= task_size) > >>> return -EINVAL; > >>> if (len > task_size - start) > >>> return -EINVAL; > >>> if (start + len <= start) > >>> return -EINVAL; > >>> - return 0; > >>> + return security_mmap_addr(start); > >> > >> Is this introducing an ABI change? > >> > >> The old code returned -EINVAL when start was below mmap_min_addr. > >> The new code calls security_mmap_addr() which returns -EPERM when > >> the caller lacks CAP_SYS_RAWIO. Existing userspace callers checking > >> specifically for -EINVAL would see different behavior start is > >> below mmap_min_addr. > > > > You mean API change? :) we don't guarantee ABI for kernel stuff anyway. > > > > Ah no, I meant ABI, I hope :) > > The return value of validate_unaligned_range() flows directly back to the > ioctl() return value, which is visible to userspace. The error code a program > sees from ioctl(fd, UFFDIO_REGISTER, ...) changes from -EINVAL to -EPERM for > the same input, right? Its probably not an issue, but we would need to update > https://man7.org/linux/man-pages/man2/ioctl_userfaultfd.2.html > right? Ah right I see, yeah just a doc change then :) Cheers, Lorenzo
(to Harry) > Technically, it's less restrictive only if start < mmap_min_addr > (setting aside the discussion of whether this is an appropriate check). > > Otherwise (start >= mmap_min_addr) it's more restrictive? (now, the process > should have the capability when registering an existing VMA to userfaultfd) Hmm, I can't find any checks for addr >= mmap_min_addr in the security subsystem, only if addr < mmap_min_addr. Otherwise, one would need capabilities for regular mmap() calls as well. > Was it a private discussion? I can't find Andrea's emails on the thread. Oh, it seems Andrea accidentally dropped some recipients from the CC list. I have CC'd him here so he can clarify his points if he feels it is necessary. (to Lorenzo) >Duplicating this kind of logic in the already horribly duplicative (and more >generally, horrible) UFFD implementation is actively buggy and incorrect IMO. So, no security_mmap_addr check, no FIRST_USER_ADDRESS check. Thank you both for the review. I'll prepare the patch. On Thu, Apr 9, 2026 at 11:01 AM Lorenzo Stoakes <ljs@kernel.org> wrote: > > On Wed, Apr 08, 2026 at 05:36:59AM -0700, Usama Arif wrote: > > On Tue, 7 Apr 2026 11:14:42 +0300 "Denis M. Karpov" <komlomal@gmail.com> wrote: > > > > > The current implementation of validate_range() in fs/userfaultfd.c > > > performs a hard check against mmap_min_addr without considering > > > capabilities, but the mmap() syscall uses security_mmap_addr() > > > which allows privileged processes (with CAP_SYS_RAWIO) to map below > > > mmap_min_addr. Furthermore, security_mmap_addr()->cap_mmap_addr() uses > > > dac_mmap_min_addr variable which can be changed with > > > /proc/sys/vm/mmap_min_addr. > > > > > > Because userfaultfd uses a different check, UFFDIO_REGISTER may fail > > > with -EINVAL for valid memory areas that were successfully mapped > > > below mmap_min_addr even with appropriate capabilities. > > > > > > This prevents apps like binary compilers from using UFFD for valid memory > > > regions mapped by application. > > > > > > Replace the rigid mmap_min_addr check with security_mmap_addr() to align > > > userfaultfd with the standard kernel memory mapping security policy. > > > > > > Signed-off-by: Denis M. Karpov <komlomal@gmail.com> > > > > > > --- > > > Initial RFC following the discussion on the [BUG] thread. > > > Link: https://lore.kernel.org/all/CADtiZd0tWysx5HMCUnOXfSHB7PXAuXg1Mh4eY_hUmH29S=sejg@mail.gmail.com/ > > > --- > > > fs/userfaultfd.c | 4 +--- > > > 1 file changed, 1 insertion(+), 3 deletions(-) > > > > > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c > > > index bdc84e521..dbfe5b2a0 100644 > > > --- a/fs/userfaultfd.c > > > +++ b/fs/userfaultfd.c > > > @@ -1238,15 +1238,13 @@ static __always_inline int validate_unaligned_range( > > > return -EINVAL; > > > if (!len) > > > return -EINVAL; > > > - if (start < mmap_min_addr) > > > - return -EINVAL; > > > if (start >= task_size) > > > return -EINVAL; > > > if (len > task_size - start) > > > return -EINVAL; > > > if (start + len <= start) > > > return -EINVAL; > > > - return 0; > > > + return security_mmap_addr(start); > > > > Is this introducing an ABI change? > > > > The old code returned -EINVAL when start was below mmap_min_addr. > > The new code calls security_mmap_addr() which returns -EPERM when > > the caller lacks CAP_SYS_RAWIO. Existing userspace callers checking > > specifically for -EINVAL would see different behavior start is > > below mmap_min_addr. > > You mean API change? :) we don't guarantee ABI for kernel stuff anyway. > > Firstly, as with Harry, I don't believe we should be duplicating checks here > anyway. UFFD is duplicative enough as it is. > > And this is such a silly edge case that I don't think it is valid or reasonable > for us to account for whichever totally insane user relies on a pointless > re-check being done there and _then_ relies on the error code > being... -EINVAL... which is overloaded for a million other possible failures. > > Let's let it be -EFAULT and remove this silly check altogether. > > > > > > } > > > > > > static __always_inline int validate_range(struct mm_struct *mm, > > > -- > > > 2.47.3 > > > > > > > > Thanks, Lorenzo
On Tue, Apr 07, 2026 at 11:14:42AM +0300, Denis M. Karpov wrote:
> The current implementation of validate_range() in fs/userfaultfd.c
> performs a hard check against mmap_min_addr without considering
> capabilities, but the mmap() syscall uses security_mmap_addr()
> which allows privileged processes (with CAP_SYS_RAWIO) to map below
> mmap_min_addr. Furthermore, security_mmap_addr()->cap_mmap_addr() uses
> dac_mmap_min_addr variable which can be changed with
> /proc/sys/vm/mmap_min_addr.
>
> Because userfaultfd uses a different check, UFFDIO_REGISTER may fail
> with -EINVAL for valid memory areas that were successfully mapped
> below mmap_min_addr even with appropriate capabilities.
>
> This prevents apps like binary compilers from using UFFD for valid memory
> regions mapped by application.
>
> Replace the rigid mmap_min_addr check with security_mmap_addr() to align
> userfaultfd with the standard kernel memory mapping security policy.
Perhaps worth adding
Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")
> Signed-off-by: Denis M. Karpov <komlomal@gmail.com>
>
> ---
> fs/userfaultfd.c | 4 +---
> 1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index bdc84e521..dbfe5b2a0 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -1238,15 +1238,13 @@ static __always_inline int validate_unaligned_range(
> return -EINVAL;
> if (!len)
> return -EINVAL;
> - if (start < mmap_min_addr)
> - return -EINVAL;
> if (start >= task_size)
> return -EINVAL;
> if (len > task_size - start)
> return -EINVAL;
> if (start + len <= start)
> return -EINVAL;
> - return 0;
> + return security_mmap_addr(start);
Hmm but it looks bit strange to check capability for address that is
already mapped by mmap(). Why is this required?
> }
--
Cheers,
Harry / Hyeonggon
> Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")
Thank you, I will add this Fixes tag in the next patch.
> Hmm but it looks bit strange to check capability for address that is
> already mapped by mmap(). Why is this required?
Actually, it's not obvious to me either, but I may miss something. My intent was
to replace the current restrictive check with a more flexible one. I think
performing this check here allows us to deny invalid requests early,
before locks or VMA lookups occur.
Removing this check entirely would also allow using UFFD in cases where a task
drops privileges after the initial mmap(). This seems reasonable because the
VMA already exists, i.e. kernel already allowed this mapping.
In the [BUG] thread discussion Andrea Arcangeli also suggested adding
a check for
FIRST_USER_ADDRESS to handle architectural constraints.
Andrea, could you please comment on this? Specifically, would a
check against FIRST_USER_ADDRESS sufficient here, or do we still
need to check caps?
On Wed, Apr 8, 2026 at 6:21 AM Harry Yoo (Oracle) <harry@kernel.org> wrote:
>
> On Tue, Apr 07, 2026 at 11:14:42AM +0300, Denis M. Karpov wrote:
> > The current implementation of validate_range() in fs/userfaultfd.c
> > performs a hard check against mmap_min_addr without considering
> > capabilities, but the mmap() syscall uses security_mmap_addr()
> > which allows privileged processes (with CAP_SYS_RAWIO) to map below
> > mmap_min_addr. Furthermore, security_mmap_addr()->cap_mmap_addr() uses
> > dac_mmap_min_addr variable which can be changed with
> > /proc/sys/vm/mmap_min_addr.
> >
> > Because userfaultfd uses a different check, UFFDIO_REGISTER may fail
> > with -EINVAL for valid memory areas that were successfully mapped
> > below mmap_min_addr even with appropriate capabilities.
> >
> > This prevents apps like binary compilers from using UFFD for valid memory
> > regions mapped by application.
> >
> > Replace the rigid mmap_min_addr check with security_mmap_addr() to align
> > userfaultfd with the standard kernel memory mapping security policy.
>
> Perhaps worth adding
>
> Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")
>
> > Signed-off-by: Denis M. Karpov <komlomal@gmail.com>
> >
> > ---
> > fs/userfaultfd.c | 4 +---
> > 1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > index bdc84e521..dbfe5b2a0 100644
> > --- a/fs/userfaultfd.c
> > +++ b/fs/userfaultfd.c
> > @@ -1238,15 +1238,13 @@ static __always_inline int validate_unaligned_range(
> > return -EINVAL;
> > if (!len)
> > return -EINVAL;
> > - if (start < mmap_min_addr)
> > - return -EINVAL;
> > if (start >= task_size)
> > return -EINVAL;
> > if (len > task_size - start)
> > return -EINVAL;
> > if (start + len <= start)
> > return -EINVAL;
> > - return 0;
> > + return security_mmap_addr(start);
>
> Hmm but it looks bit strange to check capability for address that is
> already mapped by mmap(). Why is this required?
>
> > }
>
> --
> Cheers,
> Harry / Hyeonggon
On Wed, Apr 08, 2026 at 11:09:00AM +0300, Denis M. Karpov wrote:
> > Hmm but it looks bit strange to check capability for address that is
> > already mapped by mmap(). Why is this required?
>
> Actually, it's not obvious to me either, but I may miss something.
> My intent was to replace the current restrictive check with a more flexible one.
Technically, it's less restrictive only if start < mmap_min_addr
(setting aside the discussion of whether this is an appropriate check).
Otherwise (start >= mmap_min_addr) it's more restrictive? (now, the process
should have the capability when registering an existing VMA to userfaultfd)
> I think performing this check here allows us to deny invalid requests early,
> before locks or VMA lookups occur.
But we're not trying to optimize it and we shouldn't add checks without
a proper explanation for the sake of optimization.
> Removing this check entirely would also allow using UFFD in cases where a task
> drops privileges after the initial mmap(). This seems reasonable because the
> VMA already exists, i.e. kernel already allowed this mapping.
Yeah, that seems reasonable to me.
IOW, I don't think "creating a VMA on a specific address (w/ proper
capabilities) is okay but once it is registered to userfaultfd,
it becomes a security hole" is a valid argument.
And we don't unmap those mappings when the process loses the capability
to map them anyway.
> In the [BUG] thread discussion
Was it a private discussion? I can't find Andrea's emails on the thread.
> Andrea Arcangeli also suggested adding a check for
> FIRST_USER_ADDRESS to handle architectural constraints.
Again, what's the point of checking this on the VMA that is already created?
*checks why FIRST_USER_ADDRESS was introduced*
commit e2cdef8c847b480529b7e26991926aab4be008e6
Author: Hugh Dickins <hugh@veritas.com>
Date: Tue Apr 19 13:29:19 2005 -0700
[PATCH] freepgt: free_pgtables from FIRST_USER_ADDRESS
The patches to free_pgtables by vma left problems on any architectures which
leave some user address page table entries unencapsulated by vma. Andi has
fixed the 32-bit vDSO on x86_64 to use a vma. Now fix arm (and arm26), whose
first PAGE_SIZE is reserved (perhaps) for machine vectors.
Our calls to free_pgtables must not touch that area, and exit_mmap's
BUG_ON(nr_ptes) must allow that arm's get_pgd_slow may (or may not) have
allocated an extra page table, which its free_pgd_slow would free later.
FIRST_USER_PGD_NR has misled me and others: until all the arches define
FIRST_USER_ADDRESS instead, a hack in mmap.c to derive one from t'other. This
patch fixes the bugs, the remaining patches just clean it up.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Oh, ok. there might be a raw mapping without VMA below FIRST_USER_ADDRESS.
Adding such a check wouldn't hurt... but if there is no VMA, you can't
register the range to userfaultfd anyway?
> Andrea, could you please comment on this? Specifically, would a
> check against FIRST_USER_ADDRESS sufficient here, or do we still
> need to check caps?
>
> On Wed, Apr 8, 2026 at 6:21 AM Harry Yoo (Oracle) <harry@kernel.org> wrote:
> >
> > On Tue, Apr 07, 2026 at 11:14:42AM +0300, Denis M. Karpov wrote:
> > > The current implementation of validate_range() in fs/userfaultfd.c
> > > performs a hard check against mmap_min_addr without considering
> > > capabilities, but the mmap() syscall uses security_mmap_addr()
> > > which allows privileged processes (with CAP_SYS_RAWIO) to map below
> > > mmap_min_addr. Furthermore, security_mmap_addr()->cap_mmap_addr() uses
> > > dac_mmap_min_addr variable which can be changed with
> > > /proc/sys/vm/mmap_min_addr.
> > >
> > > Because userfaultfd uses a different check, UFFDIO_REGISTER may fail
> > > with -EINVAL for valid memory areas that were successfully mapped
> > > below mmap_min_addr even with appropriate capabilities.
> > >
> > > This prevents apps like binary compilers from using UFFD for valid memory
> > > regions mapped by application.
> > >
> > > Replace the rigid mmap_min_addr check with security_mmap_addr() to align
> > > userfaultfd with the standard kernel memory mapping security policy.
> >
> > Perhaps worth adding
> >
> > Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")
> >
> > > Signed-off-by: Denis M. Karpov <komlomal@gmail.com>
> > >
> > > ---
> > > fs/userfaultfd.c | 4 +---
> > > 1 file changed, 1 insertion(+), 3 deletions(-)
> > >
> > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > > index bdc84e521..dbfe5b2a0 100644
> > > --- a/fs/userfaultfd.c
> > > +++ b/fs/userfaultfd.c
> > > @@ -1238,15 +1238,13 @@ static __always_inline int validate_unaligned_range(
> > > return -EINVAL;
> > > if (!len)
> > > return -EINVAL;
> > > - if (start < mmap_min_addr)
> > > - return -EINVAL;
> > > if (start >= task_size)
> > > return -EINVAL;
> > > if (len > task_size - start)
> > > return -EINVAL;
> > > if (start + len <= start)
> > > return -EINVAL;
> > > - return 0;
> > > + return security_mmap_addr(start);
> >
> > Hmm but it looks bit strange to check capability for address that is
> > already mapped by mmap(). Why is this required?
--
Cheers,
Harry / Hyeonggon
On Thu, Apr 09, 2026 at 11:51:11AM +0900, Harry Yoo (Oracle) wrote: > On Wed, Apr 08, 2026 at 11:09:00AM +0300, Denis M. Karpov wrote: > > > Hmm but it looks bit strange to check capability for address that is > > > already mapped by mmap(). Why is this required? > > > > Actually, it's not obvious to me either, but I may miss something. > > My intent was to replace the current restrictive check with a more flexible one. > > Technically, it's less restrictive only if start < mmap_min_addr > (setting aside the discussion of whether this is an appropriate check). > > Otherwise (start >= mmap_min_addr) it's more restrictive? (now, the process > should have the capability when registering an existing VMA to userfaultfd) > > > I think performing this check here allows us to deny invalid requests early, > > before locks or VMA lookups occur. > > But we're not trying to optimize it and we shouldn't add checks without > a proper explanation for the sake of optimization. Duplicating this kind of logic in the already horribly duplicative (and more generally, horrible) UFFD implementation is actively buggy and incorrect IMO. I also find it extremely odd that we are validating that a... source address... is... mapped that way (in userfaultfd_copy(), we validate uffdio_copy.src using validate_unaligned_range(), as well as the destination via validate_range()). It just makes no sense to me at all. Let's get rid of it. > > > Removing this check entirely would also allow using UFFD in cases where a task > > drops privileges after the initial mmap(). This seems reasonable because the > > VMA already exists, i.e. kernel already allowed this mapping. > > Yeah, that seems reasonable to me. > > IOW, I don't think "creating a VMA on a specific address (w/ proper > capabilities) is okay but once it is registered to userfaultfd, > it becomes a security hole" is a valid argument. Yes. > > And we don't unmap those mappings when the process loses the capability > to map them anyway. Once it's mapped it's mapped...? > > > In the [BUG] thread discussion > > Was it a private discussion? I can't find Andrea's emails on the thread. > > > Andrea Arcangeli also suggested adding a check for > > FIRST_USER_ADDRESS to handle architectural constraints. > > Again, what's the point of checking this on the VMA that is already created? > *checks why FIRST_USER_ADDRESS was introduced* Yeah this is just the exact same thing with a different thing to compare against no? copy_from_user() will handle this in mfill_copy_folio_locked(), returning an error if a user tried to copy from somewhere they shouldn't have (the same way as if the user tried to copy from somewhere else they shouldn't have). Let's not block on off-list sidebars. > > commit e2cdef8c847b480529b7e26991926aab4be008e6 > Author: Hugh Dickins <hugh@veritas.com> > Date: Tue Apr 19 13:29:19 2005 -0700 > > [PATCH] freepgt: free_pgtables from FIRST_USER_ADDRESS > > The patches to free_pgtables by vma left problems on any architectures which > leave some user address page table entries unencapsulated by vma. Andi has > fixed the 32-bit vDSO on x86_64 to use a vma. Now fix arm (and arm26), whose > first PAGE_SIZE is reserved (perhaps) for machine vectors. > > Our calls to free_pgtables must not touch that area, and exit_mmap's > BUG_ON(nr_ptes) must allow that arm's get_pgd_slow may (or may not) have > allocated an extra page table, which its free_pgd_slow would free later. > > FIRST_USER_PGD_NR has misled me and others: until all the arches define > FIRST_USER_ADDRESS instead, a hack in mmap.c to derive one from t'other. This > patch fixes the bugs, the remaining patches just clean it up. > > Signed-off-by: Hugh Dickins <hugh@veritas.com> > Signed-off-by: Andrew Morton <akpm@osdl.org> > Signed-off-by: Linus Torvalds <torvalds@osdl.org> > > Oh, ok. there might be a raw mapping without VMA below FIRST_USER_ADDRESS. > > Adding such a check wouldn't hurt... but if there is no VMA, you can't > register the range to userfaultfd anyway? Exactly... and I don't want to see us randomly do checks that already happened previously. Putting duplicated bitrot-baiting code in what is one of the worst areas of mm is not something I want us to do, and would like us to actively remove anything that already exists like this. And the fact that this is in an fs/ file is even more annoying to me. Really I don't think _any_ meaningful uffd logic belongs there. Especially since we have a bunch of other uffd crap in mm/userfaultfd.c. The fs/userfaultfd.c file should be a bare-bones thing that handles the fs side of uffd _only_. Cheers, Lorenzo
© 2016 - 2026 Red Hat, Inc.