This patch series is in two parts:-

1. Currently there are a number of places in the kernel where we assume
   VM_SHARED implies that a mapping is writable. Let's be slightly less
   strict and relax this restriction in the case that VM_MAYWRITE is not
   set.

   This should have no noticeable impact as the lack of VM_MAYWRITE implies
   that the mapping can not be made writable via mprotect() or any other
   means.

2. Align the behaviour of F_SEAL_WRITE and F_SEAL_FUTURE_WRITE on mmap().
   The latter already clears the VM_MAYWRITE flag for a sealed read-only
   mapping, we simply extend this to F_SEAL_WRITE too.

   For this to have effect, we must also invoke call_mmap() before
   mapping_map_writable().

As this is quite a fundamental change on the assumptions around VM_SHARED
and since this causes a visible change to userland (in permitting read-only
shared mappings on F_SEAL_WRITE mappings), I am putting forward as an RFC
to see if there is anything terribly wrong with it.

I suspect even if the patch series as a whole is unpalatable, there are
probably things we can salvage from it in any case.

Thanks to Andy Lutomirski who inspired the series!

Lorenzo Stoakes (3):
  mm: drop the assumption that VM_SHARED always implies writable
  mm: update seal_check_[future_]write() to include F_SEAL_WRITE as well
  mm: perform the mapping_map_writable() check after call_mmap()

 fs/hugetlbfs/inode.c |  2 +-
 include/linux/fs.h   |  4 ++--
 include/linux/mm.h   | 24 ++++++++++++++++++------
 kernel/fork.c        |  2 +-
 mm/filemap.c         |  2 +-
 mm/madvise.c         |  2 +-
 mm/mmap.c            | 22 +++++++++++-----------
 mm/shmem.c           |  2 +-
 8 files changed, 36 insertions(+), 24 deletions(-)

--
2.40.0

Hi!

On Mon 03-04-23 23:28:29, Lorenzo Stoakes wrote:
> This patch series is in two parts:-
>
> 1. Currently there are a number of places in the kernel where we assume
> VM_SHARED implies that a mapping is writable. Let's be slightly less
> strict and relax this restriction in the case that VM_MAYWRITE is not
> set.
>
> This should have no noticeable impact as the lack of VM_MAYWRITE implies
> that the mapping can not be made writable via mprotect() or any other
> means.
>
> 2. Align the behaviour of F_SEAL_WRITE and F_SEAL_FUTURE_WRITE on mmap().
> The latter already clears the VM_MAYWRITE flag for a sealed read-only
> mapping, we simply extend this to F_SEAL_WRITE too.
>
> For this to have effect, we must also invoke call_mmap() before
> mapping_map_writable().
>
> As this is quite a fundamental change on the assumptions around VM_SHARED
> and since this causes a visible change to userland (in permitting read-only
> shared mappings on F_SEAL_WRITE mappings), I am putting forward as an RFC
> to see if there is anything terribly wrong with it.

So what I miss in this series is what the motivation is. Is it that you need
to map F_SEAL_WRITE read-only? Why?

								Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR

On Fri, Apr 21, 2023 at 11:01:26AM +0200, Jan Kara wrote:
> Hi!
>
> On Mon 03-04-23 23:28:29, Lorenzo Stoakes wrote:
> > This patch series is in two parts:-
> >
> > 1. Currently there are a number of places in the kernel where we assume
> > VM_SHARED implies that a mapping is writable. Let's be slightly less
> > strict and relax this restriction in the case that VM_MAYWRITE is not
> > set.
> >
> > This should have no noticeable impact as the lack of VM_MAYWRITE implies
> > that the mapping can not be made writable via mprotect() or any other
> > means.
> >
> > 2. Align the behaviour of F_SEAL_WRITE and F_SEAL_FUTURE_WRITE on mmap().
> > The latter already clears the VM_MAYWRITE flag for a sealed read-only
> > mapping, we simply extend this to F_SEAL_WRITE too.
> >
> > For this to have effect, we must also invoke call_mmap() before
> > mapping_map_writable().
> >
> > As this is quite a fundamental change on the assumptions around VM_SHARED
> > and since this causes a visible change to userland (in permitting read-only
> > shared mappings on F_SEAL_WRITE mappings), I am putting forward as an RFC
> > to see if there is anything terribly wrong with it.
>
> So what I miss in this series is what the motivation is. Is it that you need
> to map F_SEAL_WRITE read-only? Why?
>

This originated from the discussion in [1], which refers to the bug
reported in [2]. Essentially the user is write-sealing a memfd then trying
to mmap it read-only, but receives an -EPERM error.

F_SEAL_FUTURE_WRITE _does_ explicitly permit this but F_SEAL_WRITE does not.

The fcntl() man page states:

    Furthermore, trying to create new shared, writable memory-mappings via
    mmap(2) will also fail with EPERM.

So the kernel does not behave as the documentation states: the man page
only rules out shared, writable mappings, yet a read-only shared mapping is
rejected too.

I took the user-supplied repro and slightly modified it, enclosed
below. After this patch series, this code works correctly.

I think there's definitely a case for the VM_MAYWRITE part of this patch
series even if the memfd bits are not considered useful, as we do seem to
make the implicit assumption that MAP_SHARED == writable even if
!VM_MAYWRITE which seems odd.

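To make the VM_MAYWRITE point concrete, here is a minimal userspace sketch of
the two checks. It is an illustration only, not code from the series: the
flag values are copied from include/linux/mm.h so that it builds outside the
kernel, and both helper names are hypothetical.

```c
#include <stdbool.h>

/* Flag values as found in include/linux/mm.h, duplicated for userspace. */
#define VM_SHARED   0x00000008UL
#define VM_MAYWRITE 0x00000020UL

/* The implicit assumption: any shared mapping counts as writable. */
static inline bool shared_assumed_writable(unsigned long vm_flags)
{
	return (vm_flags & VM_SHARED) != 0;
}

/*
 * The relaxed check (hypothetical name): a shared mapping only counts as
 * writable if it could ever become writable, i.e. VM_MAYWRITE is also set.
 */
static inline bool shared_maywrite(unsigned long vm_flags)
{
	return (vm_flags & (VM_SHARED | VM_MAYWRITE)) ==
		(VM_SHARED | VM_MAYWRITE);
}
```

A read-only shared mapping with VM_MAYWRITE cleared passes the first check but
not the second, which is the case the series wants to stop treating as
writable.
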
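Reproducer:-

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int fd = memfd_create("test", MFD_ALLOW_SEALING);
	if (fd == -1) {
		perror("memfd_create");
		return EXIT_FAILURE;
	}

	/* Give the file some contents before sealing it. */
	if (write(fd, "test", 4) != 4) {
		perror("write");
		return EXIT_FAILURE;
	}

	if (fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE) == -1) {
		perror("fcntl");
		return EXIT_FAILURE;
	}

	/* Read-only shared mapping: fails with EPERM before this series. */
	void *ret = mmap(NULL, 4, PROT_READ, MAP_SHARED, fd, 0);
	if (ret == MAP_FAILED) {
		perror("mmap");
		return EXIT_FAILURE;
	}

	return EXIT_SUCCESS;
}
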
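
A variant of the reproducer can be used to compare the two seals side by
side. This is only a sketch: map_readonly_after_seal() is a hypothetical
helper, and the fallback definition of F_SEAL_FUTURE_WRITE (taken from
include/uapi/linux/fcntl.h) is there only for older libc headers.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef F_SEAL_FUTURE_WRITE
#define F_SEAL_FUTURE_WRITE 0x0010	/* value from include/uapi/linux/fcntl.h */
#endif

/*
 * Hypothetical helper: seal a fresh memfd with 'seal', then attempt a
 * read-only shared mapping of it. Returns 0 on success, otherwise the
 * errno of the step that failed (EIO for a short write).
 */
static int map_readonly_after_seal(int seal)
{
	void *p;
	int err = 0;
	int fd = memfd_create("test", MFD_ALLOW_SEALING);

	if (fd == -1)
		return errno;

	if (write(fd, "test", 4) != 4) {
		err = EIO;
	} else if (fcntl(fd, F_ADD_SEALS, seal) == -1) {
		err = errno;
	} else {
		p = mmap(NULL, 4, PROT_READ, MAP_SHARED, fd, 0);
		if (p == MAP_FAILED)
			err = errno;
		else
			munmap(p, 4);
	}

	close(fd);
	return err;
}
```

Going by the behaviour described above, calling this with F_SEAL_WRITE should
return EPERM on a kernel without the series, while F_SEAL_FUTURE_WRITE should
return 0 on kernels that support that seal.
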
[1]:https://lore.kernel.org/all/20230324133646.16101dfa666f253c4715d965@linux-foundation.org/
[2]:https://bugzilla.kernel.org/show_bug.cgi?id=217238
> Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
On Fri 21-04-23 22:23:12, Lorenzo Stoakes wrote:
> On Fri, Apr 21, 2023 at 11:01:26AM +0200, Jan Kara wrote:
> > Hi!
> >
> > On Mon 03-04-23 23:28:29, Lorenzo Stoakes wrote:
> > > This patch series is in two parts:-
> > >
> > > 1. Currently there are a number of places in the kernel where we assume
> > > VM_SHARED implies that a mapping is writable. Let's be slightly less
> > > strict and relax this restriction in the case that VM_MAYWRITE is not
> > > set.
> > >
> > > This should have no noticeable impact as the lack of VM_MAYWRITE implies
> > > that the mapping can not be made writable via mprotect() or any other
> > > means.
> > >
> > > 2. Align the behaviour of F_SEAL_WRITE and F_SEAL_FUTURE_WRITE on mmap().
> > > The latter already clears the VM_MAYWRITE flag for a sealed read-only
> > > mapping, we simply extend this to F_SEAL_WRITE too.
> > >
> > > For this to have effect, we must also invoke call_mmap() before
> > > mapping_map_writable().
> > >
> > > As this is quite a fundamental change on the assumptions around VM_SHARED
> > > and since this causes a visible change to userland (in permitting read-only
> > > shared mappings on F_SEAL_WRITE mappings), I am putting forward as an RFC
> > > to see if there is anything terribly wrong with it.
> >
> > So what I miss in this series is what the motivation is. Is it that you need
> > to map F_SEAL_WRITE read-only? Why?
> >
>
> This originated from the discussion in [1], which refers to the bug
> reported in [2]. Essentially the user is write-sealing a memfd then trying
> to mmap it read-only, but receives an -EPERM error.
>
> F_SEAL_FUTURE_WRITE _does_ explicitly permit this but F_SEAL_WRITE does not.
>
> The fcntl() man page states:
>
> Furthermore, trying to create new shared, writable memory-mappings via
> mmap(2) will also fail with EPERM.
>
> So the kernel does not behave as the documentation states.
>
> I took the user-supplied repro and slightly modified it, enclosed
> below. After this patch series, this code works correctly.
>
> I think there's definitely a case for the VM_MAYWRITE part of this patch
> series even if the memfd bits are not considered useful, as we do seem to
> make the implicit assumption that MAP_SHARED == writable even if
> !VM_MAYWRITE which seems odd.

Thanks for the explanation! Could you please include this information in
the cover letter (perhaps in a form of a short note and reference to the
mailing list) for future reference? Thanks!

								Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR

On Mon, Apr 24, 2023 at 02:19:36PM +0200, Jan Kara wrote:
> On Fri 21-04-23 22:23:12, Lorenzo Stoakes wrote:
> > On Fri, Apr 21, 2023 at 11:01:26AM +0200, Jan Kara wrote:
> > > Hi!
> > >
> > > On Mon 03-04-23 23:28:29, Lorenzo Stoakes wrote:
> > > > This patch series is in two parts:-
> > > >
> > > > 1. Currently there are a number of places in the kernel where we assume
> > > > VM_SHARED implies that a mapping is writable. Let's be slightly less
> > > > strict and relax this restriction in the case that VM_MAYWRITE is not
> > > > set.
> > > >
> > > > This should have no noticeable impact as the lack of VM_MAYWRITE implies
> > > > that the mapping can not be made writable via mprotect() or any other
> > > > means.
> > > >
> > > > 2. Align the behaviour of F_SEAL_WRITE and F_SEAL_FUTURE_WRITE on mmap().
> > > > The latter already clears the VM_MAYWRITE flag for a sealed read-only
> > > > mapping, we simply extend this to F_SEAL_WRITE too.
> > > >
> > > > For this to have effect, we must also invoke call_mmap() before
> > > > mapping_map_writable().
> > > >
> > > > As this is quite a fundamental change on the assumptions around VM_SHARED
> > > > and since this causes a visible change to userland (in permitting read-only
> > > > shared mappings on F_SEAL_WRITE mappings), I am putting forward as an RFC
> > > > to see if there is anything terribly wrong with it.
> > >
> > > So what I miss in this series is what the motivation is. Is it that you need
> > > to map F_SEAL_WRITE read-only? Why?
> > >
> >
> > This originated from the discussion in [1], which refers to the bug
> > reported in [2]. Essentially the user is write-sealing a memfd then trying
> > to mmap it read-only, but receives an -EPERM error.
> >
> > F_SEAL_FUTURE_WRITE _does_ explicitly permit this but F_SEAL_WRITE does not.
> >
> > The fcntl() man page states:
> >
> > Furthermore, trying to create new shared, writable memory-mappings via
> > mmap(2) will also fail with EPERM.
> >
> > So the kernel does not behave as the documentation states.
> >
> > I took the user-supplied repro and slightly modified it, enclosed
> > below. After this patch series, this code works correctly.
> >
> > I think there's definitely a case for the VM_MAYWRITE part of this patch
> > series even if the memfd bits are not considered useful, as we do seem to
> > make the implicit assumption that MAP_SHARED == writable even if
> > !VM_MAYWRITE which seems odd.
>
> Thanks for the explanation! Could you please include this information in
> the cover letter (perhaps in a form of a short note and reference to the
> mailing list) for future reference? Thanks!
>
> Honza
>

Sure, apologies for not being clear about that :)

I may respin this as a non-RFC (with updated description of course) as it's
received very little attention as an RFC and I don't think it's so
insane/huge a concept as to warrant remaining one.

> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR