While it's not strictly necessary to lock a newly created vma before
adding it into the vma tree (as long as no further changes are performed
to it), it seems like a good policy to lock it and prevent accidental
changes after it becomes visible to the page faults. Lock the vma before
adding it into the vma tree.
Suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---
mm/mmap.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 3937479d0e07..850a39dee075 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -412,6 +412,8 @@ static int vma_link(struct mm_struct *mm, struct vm_area_struct *vma)
if (vma_iter_prealloc(&vmi))
return -ENOMEM;
+ vma_start_write(vma);
+
if (vma->vm_file) {
mapping = vma->vm_file->f_mapping;
i_mmap_lock_write(mapping);
@@ -477,7 +479,8 @@ static inline void vma_prepare(struct vma_prepare *vp)
vma_start_write(vp->vma);
if (vp->adj_next)
vma_start_write(vp->adj_next);
- /* vp->insert is always a newly created VMA, no need for locking */
+ if (vp->insert)
+ vma_start_write(vp->insert);
if (vp->remove)
vma_start_write(vp->remove);
if (vp->remove2)
@@ -3098,6 +3101,7 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
vma->vm_pgoff = addr >> PAGE_SHIFT;
vm_flags_init(vma, flags);
vma->vm_page_prot = vm_get_page_prot(flags);
+ vma_start_write(vma);
if (vma_iter_store_gfp(vmi, vma, GFP_KERNEL))
goto mas_store_fail;
@@ -3345,7 +3349,6 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
get_file(new_vma->vm_file);
if (new_vma->vm_ops && new_vma->vm_ops->open)
new_vma->vm_ops->open(new_vma);
- vma_start_write(new_vma);
if (vma_link(mm, new_vma))
goto out_vma_link;
*need_rmap_locks = false;
--
2.41.0.585.gd2178a4bd4-goog
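
To make the changelog's argument concrete, here is a minimal user-space sketch of why locking before insertion protects against later changes. It assumes a vma counts as write-locked while its sequence number equals the mm's, and that a fault can only use the per-vma path when they differ. All `toy_` names, the `in_tree` flag and the `write_locked` flag are hypothetical stand-ins, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy types; only the fields needed for the argument. */
struct toy_mm {
	int mm_lock_seq;
	bool write_locked;	/* stands in for mmap_lock held for writing */
};

struct toy_vma {
	struct toy_mm *mm;
	int vm_lock_seq;
	bool in_tree;		/* visible to the fault path once true */
};

/* Writer side: mark the vma write-locked for this mmap_lock cycle. */
static void toy_vma_start_write(struct toy_vma *vma)
{
	assert(vma->mm->write_locked);
	vma->vm_lock_seq = vma->mm->mm_lock_seq;
}

/* Publish the vma; lock_first models the behaviour added by the patch. */
static void toy_vma_link(struct toy_vma *vma, bool lock_first)
{
	if (lock_first)
		toy_vma_start_write(vma);
	vma->in_tree = true;
}

/*
 * Fault side: true if the fault may proceed under the per-vma lock,
 * false if it has to fall back to waiting on mmap_lock.
 */
static bool toy_fault_can_use_vma(const struct toy_vma *vma)
{
	if (!vma->in_tree)
		return false;
	return vma->vm_lock_seq != vma->mm->mm_lock_seq;
}
```

In this model a vma linked without locking is immediately usable by a fault, so any further change to it would race; a vma linked with `lock_first` keeps faults out until the writer drops mmap_lock.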
On Thu, 3 Aug 2023 at 10:27, Suren Baghdasaryan <surenb@google.com> wrote:
>
> While it's not strictly necessary to lock a newly created vma before
> adding it into the vma tree (as long as no further changes are performed
> to it), it seems like a good policy to lock it and prevent accidental
> changes after it becomes visible to the page faults. Lock the vma before
> adding it into the vma tree.

So my main reaction here is that I started to wonder about the vma allocation.

Why doesn't vma_init() do something like

    mmap_assert_write_locked(mm);
    vma->vm_lock_seq = mm->mm_lock_seq;

and instead we seem to expect vma_lock_alloc() to do this (and do it
very badly indeed).

Strange.

Anyway, this observation was just a reaction to that "not strictly
necessary to lock a newly created vma" part of the commentary. I feel
like we could/should just make sure that all newly created vma's are
always simply created write-locked.

             Linus
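
The idea of creating every vma already write-locked can be sketched in user space as below. This is only an illustration under stated assumptions: the `demo_` types and helpers are hypothetical, the mmap write lock is reduced to a boolean, and dropping it is assumed to advance the mm sequence number so that every vma locked in that cycle becomes unlocked at once. The boolean return marks the spot where mmap_assert_write_locked() would fire.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_mm {
	int mm_lock_seq;	/* advanced each time the write lock is dropped */
	bool write_locked;	/* stands in for mmap_lock held for writing */
};

struct demo_vma {
	struct demo_mm *mm;
	int vm_lock_seq;
};

static void demo_mmap_write_lock(struct demo_mm *mm)
{
	assert(!mm->write_locked);	/* single-threaded toy: no contention */
	mm->write_locked = true;
}

/* Dropping the write lock bumps the sequence, unlocking all vmas at once. */
static void demo_mmap_write_unlock(struct demo_mm *mm)
{
	assert(mm->write_locked);
	mm->mm_lock_seq++;
	mm->write_locked = false;
}

/* A vma counts as write-locked while its sequence matches the mm's. */
static bool demo_vma_is_write_locked(const struct demo_vma *vma)
{
	return vma->vm_lock_seq == vma->mm->mm_lock_seq;
}

/*
 * Models the suggested vma_init(): the vma is born write-locked.
 * Returns false where mmap_assert_write_locked() would trigger.
 */
static bool demo_vma_init_locked(struct demo_vma *vma, struct demo_mm *mm)
{
	if (!mm->write_locked)
		return false;
	vma->mm = mm;
	vma->vm_lock_seq = mm->mm_lock_seq;
	return true;
}
```

The sketch only works for callers that already hold the write lock when they create the vma, which is the precondition the proposal relies on.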
* Linus Torvalds <torvalds@linux-foundation.org> [230803 14:02]:
> On Thu, 3 Aug 2023 at 10:27, Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > While it's not strictly necessary to lock a newly created vma before
> > adding it into the vma tree (as long as no further changes are performed
> > to it), it seems like a good policy to lock it and prevent accidental
> > changes after it becomes visible to the page faults. Lock the vma before
> > adding it into the vma tree.
>
> So my main reaction here is that I started to wonder about the vma allocation.
>
> Why doesn't vma_init() do something like
>
>     mmap_assert_write_locked(mm);
>     vma->vm_lock_seq = mm->mm_lock_seq;
>
> and instead we seem to expect vma_lock_alloc() to do this (and do it
> very badly indeed).
>
> Strange.
>
> Anyway, this observation was just a reaction to that "not strictly
> necessary to lock a newly created vma" part of the commentary. I feel
> like we could/should just make sure that all newly created vma's are
> always simply created write-locked.
>

I thought the same thing initially, but Suren pointed out that it's not
necessary to hold the vma lock to allocate a vma object. And it seems
there is at least one user (arch/ia64/mm/init.c) which does allocate
outside the lock during ia64_init_addr_space(), which is fine but I'm
not sure it gains much to do it this way - the insert needs to take the
lock anyways and it is hardly going to be contended.

Anywhere else besides an address space setup would probably introduce a
race.

Thanks,
Liam
On Thu, Aug 3, 2023 at 11:15 AM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
>
> * Linus Torvalds <torvalds@linux-foundation.org> [230803 14:02]:
> > On Thu, 3 Aug 2023 at 10:27, Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > While it's not strictly necessary to lock a newly created vma before
> > > adding it into the vma tree (as long as no further changes are performed
> > > to it), it seems like a good policy to lock it and prevent accidental
> > > changes after it becomes visible to the page faults. Lock the vma before
> > > adding it into the vma tree.
> >
> > So my main reaction here is that I started to wonder about the vma allocation.
> >
> > Why doesn't vma_init() do something like
> >
> >     mmap_assert_write_locked(mm);
> >     vma->vm_lock_seq = mm->mm_lock_seq;
> >
> > and instead we seem to expect vma_lock_alloc() to do this (and do it
> > very badly indeed).
> >
> > Strange.
> >
> > Anyway, this observation was just a reaction to that "not strictly
> > necessary to lock a newly created vma" part of the commentary. I feel
> > like we could/should just make sure that all newly created vma's are
> > always simply created write-locked.
> >
>
> I thought the same thing initially, but Suren pointed out that it's not
> necessary to hold the vma lock to allocate a vma object. And it seems
> there is at least one user (arch/ia64/mm/init.c) which does allocate
> outside the lock during ia64_init_addr_space(), which is fine but I'm
> not sure it gains much to do it this way - the insert needs to take the
> lock anyways and it is hardly going to be contended.

Yeah, I remember discussing that. At the time of VMA creation the
mmap_lock might not be write-locked, so mmap_assert_write_locked()
would trigger and mm->mm_lock_seq is not stable. Maybe we can
necessitate holding mmap_lock at the time of VMA creation but that
sounds like an unnecessary restriction. IIRC some drivers also create
vm_area_structs without holding mmap_lock... I'll double-check.

>
> Anywhere else besides an address space setup would probably introduce a
> race.
>
> Thanks,
> Liam
>
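
The "not stable" point can be illustrated with a small user-space model. It assumes the same sequence-comparison scheme discussed in the thread; the `sketch_` names are hypothetical, and the -1 "never matches" starting value is an assumption of this sketch rather than something stated in the thread. Copying the mm sequence outside the lock makes the vma look write-locked although no writer holds mmap_lock, whereas a sentinel start plus an explicit lock under mmap_lock behaves predictably.

```c
#include <assert.h>
#include <stdbool.h>

struct sketch_mm {
	int lock_seq;
	bool write_locked;	/* stands in for mmap_lock held for writing */
};

struct sketch_vma {
	struct sketch_mm *mm;
	int lock_seq;
};

static bool sketch_is_write_locked(const struct sketch_vma *vma)
{
	return vma->lock_seq == vma->mm->lock_seq;
}

/* Problematic variant: snapshot the mm sequence with no lock held. */
static void sketch_init_copy_seq(struct sketch_vma *vma, struct sketch_mm *mm)
{
	vma->mm = mm;
	vma->lock_seq = mm->lock_seq;	/* value may change right after the read */
}

/* Lock-free creation: start from a value assumed never to match (-1). */
static void sketch_init_sentinel(struct sketch_vma *vma, struct sketch_mm *mm)
{
	vma->mm = mm;
	vma->lock_seq = -1;
}

/* Explicit write-lock, only legal while the mmap write lock is held. */
static void sketch_vma_start_write(struct sketch_vma *vma)
{
	assert(vma->mm->write_locked);
	vma->lock_seq = vma->mm->lock_seq;
}

static void sketch_mmap_write_lock(struct sketch_mm *mm)
{
	mm->write_locked = true;
}

/* Dropping the write lock advances the sequence. */
static void sketch_mmap_write_unlock(struct sketch_mm *mm)
{
	mm->lock_seq++;
	mm->write_locked = false;
}
```

With the sentinel variant the vma is unlocked on creation, locked only after `sketch_vma_start_write()` under the write lock, and unlocked again once the write lock is dropped, which matches the order the patch enforces before insertion.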
On Thu, Aug 3, 2023 at 11:26 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Thu, Aug 3, 2023 at 11:15 AM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
> >
> > * Linus Torvalds <torvalds@linux-foundation.org> [230803 14:02]:
> > > On Thu, 3 Aug 2023 at 10:27, Suren Baghdasaryan <surenb@google.com> wrote:
> > > >
> > > > While it's not strictly necessary to lock a newly created vma before
> > > > adding it into the vma tree (as long as no further changes are performed
> > > > to it), it seems like a good policy to lock it and prevent accidental
> > > > changes after it becomes visible to the page faults. Lock the vma before
> > > > adding it into the vma tree.
> > >
> > > So my main reaction here is that I started to wonder about the vma allocation.
> > >
> > > Why doesn't vma_init() do something like
> > >
> > >     mmap_assert_write_locked(mm);
> > >     vma->vm_lock_seq = mm->mm_lock_seq;
> > >
> > > and instead we seem to expect vma_lock_alloc() to do this (and do it
> > > very badly indeed).
> > >
> > > Strange.
> > >
> > > Anyway, this observation was just a reaction to that "not strictly
> > > necessary to lock a newly created vma" part of the commentary. I feel
> > > like we could/should just make sure that all newly created vma's are
> > > always simply created write-locked.
> > >
> >
> > I thought the same thing initially, but Suren pointed out that it's not
> > necessary to hold the vma lock to allocate a vma object. And it seems
> > there is at least one user (arch/ia64/mm/init.c) which does allocate
> > outside the lock during ia64_init_addr_space(), which is fine but I'm
> > not sure it gains much to do it this way - the insert needs to take the
> > lock anyways and it is hardly going to be contended.
>
> Yeah, I remember discussing that. At the time of VMA creation the
> mmap_lock might not be write-locked, so mmap_assert_write_locked()
> would trigger and mm->mm_lock_seq is not stable. Maybe we can
> necessitate holding mmap_lock at the time of VMA creation but that
> sounds like an unnecessary restriction. IIRC some drivers also create
> vm_area_structs without holding mmap_lock... I'll double-check.

Yeah, there are places like an initcall gate_vma_init() which call
vma_init(). I don't think these are called with a locked mmap_lock.

>
> >
> > Anywhere else besides an address space setup would probably introduce a
> > race.
> >
> > Thanks,
> > Liam
> >