The root anon_vma of all anon_vma's linked to a VMA must by definition be
the same - a VMA and all of its descendants/ancestors must exist in the
same CoW chain.
Commit bb4aa39676f7 ("mm: avoid repeated anon_vma lock/unlock sequences in
anon_vma_clone()") introduced paranoid checking of the root anon_vma
remaining the same throughout all AVC's in 2011.
I think 15 years later we can safely assume that this is always the case.
Additionally, since unfaulted VMAs being cloned from or unlinked are
no-op's, we can simply lock the anon_vma's associated with this rather than
doing any specific dance around this.
This removes unnecessary checks and makes it clear that the root anon_vma
is shared between all anon_vma's in a given VMA's anon_vma_chain.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/rmap.c | 48 ++++++++++++------------------------------------
1 file changed, 12 insertions(+), 36 deletions(-)
diff --git a/mm/rmap.c b/mm/rmap.c
index 9332d1cbc643..60134a566073 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -231,32 +231,6 @@ int __anon_vma_prepare(struct vm_area_struct *vma)
return -ENOMEM;
}
-/*
- * This is a useful helper function for locking the anon_vma root as
- * we traverse the vma->anon_vma_chain, looping over anon_vma's that
- * have the same vma.
- *
- * Such anon_vma's should have the same root, so you'd expect to see
- * just a single mutex_lock for the whole traversal.
- */
-static inline struct anon_vma *lock_anon_vma_root(struct anon_vma *root, struct anon_vma *anon_vma)
-{
- struct anon_vma *new_root = anon_vma->root;
- if (new_root != root) {
- if (WARN_ON_ONCE(root))
- up_write(&root->rwsem);
- root = new_root;
- down_write(&root->rwsem);
- }
- return root;
-}
-
-static inline void unlock_anon_vma_root(struct anon_vma *root)
-{
- if (root)
- up_write(&root->rwsem);
-}
-
static void check_anon_vma_clone(struct vm_area_struct *dst,
struct vm_area_struct *src)
{
@@ -307,26 +281,25 @@ static void check_anon_vma_clone(struct vm_area_struct *dst,
int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
{
struct anon_vma_chain *avc, *pavc;
- struct anon_vma *root = NULL;
if (!src->anon_vma)
return 0;
check_anon_vma_clone(dst, src);
+ anon_vma_lock_write(src->anon_vma);
list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
struct anon_vma *anon_vma;
avc = anon_vma_chain_alloc(GFP_NOWAIT);
if (unlikely(!avc)) {
- unlock_anon_vma_root(root);
- root = NULL;
+ anon_vma_unlock_write(src->anon_vma);
avc = anon_vma_chain_alloc(GFP_KERNEL);
if (!avc)
goto enomem_failure;
+ anon_vma_lock_write(src->anon_vma);
}
anon_vma = pavc->anon_vma;
- root = lock_anon_vma_root(root, anon_vma);
anon_vma_chain_link(dst, avc, anon_vma);
/*
@@ -343,7 +316,8 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
}
if (dst->anon_vma)
dst->anon_vma->num_active_vmas++;
- unlock_anon_vma_root(root);
+
+ anon_vma_unlock_write(src->anon_vma);
return 0;
enomem_failure:
@@ -438,15 +412,17 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
void unlink_anon_vmas(struct vm_area_struct *vma)
{
struct anon_vma_chain *avc, *next;
- struct anon_vma *root = NULL;
+ struct anon_vma *active_anon_vma = vma->anon_vma;
/* Always hold mmap lock, read-lock on unmap possibly. */
mmap_assert_locked(vma->vm_mm);
/* Unfaulted is a no-op. */
- if (!vma->anon_vma)
+ if (!active_anon_vma)
return;
+ anon_vma_lock_write(active_anon_vma);
+
/*
* Unlink each anon_vma chained to the VMA. This list is ordered
* from newest to oldest, ensuring the root anon_vma gets freed last.
@@ -454,7 +430,6 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
list_for_each_entry_safe(avc, next, &vma->anon_vma_chain, same_vma) {
struct anon_vma *anon_vma = avc->anon_vma;
- root = lock_anon_vma_root(root, anon_vma);
anon_vma_interval_tree_remove(avc, &anon_vma->rb_root);
/*
@@ -470,13 +445,14 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
anon_vma_chain_free(avc);
}
- vma->anon_vma->num_active_vmas--;
+ active_anon_vma->num_active_vmas--;
/*
* vma would still be needed after unlink, and anon_vma will be prepared
* when handle fault.
*/
vma->anon_vma = NULL;
- unlock_anon_vma_root(root);
+ anon_vma_unlock_write(active_anon_vma);
+
/*
* Iterate the list once more, it now only contains empty and unlinked
--
2.52.0
On Wed, Dec 17, 2025 at 4:27 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> The root anon_vma of all anon_vma's linked to a VMA must by definition be
> the same - a VMA and all of its descendants/ancestors must exist in the
> same CoW chain.
>
> Commit bb4aa39676f7 ("mm: avoid repeated anon_vma lock/unlock sequences in
> anon_vma_clone()") introduced paranoid checking of the root anon_vma
> remaining the same throughout all AVC's in 2011.
>
> I think 15 years later we can safely assume that this is always the case.
>
> Additionally, since unfaulted VMAs being cloned from or unlinked are
> no-op's, we can simply lock the anon_vma's associated with this rather than
> doing any specific dance around this.
>
> This removes unnecessary checks and makes it clear that the root anon_vma
> is shared between all anon_vma's in a given VMA's anon_vma_chain.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> mm/rmap.c | 48 ++++++++++++------------------------------------
> 1 file changed, 12 insertions(+), 36 deletions(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 9332d1cbc643..60134a566073 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -231,32 +231,6 @@ int __anon_vma_prepare(struct vm_area_struct *vma)
> return -ENOMEM;
> }
>
> -/*
> - * This is a useful helper function for locking the anon_vma root as
> - * we traverse the vma->anon_vma_chain, looping over anon_vma's that
> - * have the same vma.
> - *
> - * Such anon_vma's should have the same root, so you'd expect to see
> - * just a single mutex_lock for the whole traversal.
> - */
> -static inline struct anon_vma *lock_anon_vma_root(struct anon_vma *root, struct anon_vma *anon_vma)
> -{
> - struct anon_vma *new_root = anon_vma->root;
> - if (new_root != root) {
> - if (WARN_ON_ONCE(root))
> - up_write(&root->rwsem);
> - root = new_root;
> - down_write(&root->rwsem);
> - }
> - return root;
> -}
> -
> -static inline void unlock_anon_vma_root(struct anon_vma *root)
> -{
> - if (root)
> - up_write(&root->rwsem);
> -}
> -
> static void check_anon_vma_clone(struct vm_area_struct *dst,
> struct vm_area_struct *src)
> {
> @@ -307,26 +281,25 @@ static void check_anon_vma_clone(struct vm_area_struct *dst,
> int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
> {
> struct anon_vma_chain *avc, *pavc;
> - struct anon_vma *root = NULL;
>
> if (!src->anon_vma)
> return 0;
>
> check_anon_vma_clone(dst, src);
>
> + anon_vma_lock_write(src->anon_vma);
> list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
> struct anon_vma *anon_vma;
>
> avc = anon_vma_chain_alloc(GFP_NOWAIT);
> if (unlikely(!avc)) {
> - unlock_anon_vma_root(root);
> - root = NULL;
> + anon_vma_unlock_write(src->anon_vma);
> avc = anon_vma_chain_alloc(GFP_KERNEL);
> if (!avc)
> goto enomem_failure;
> + anon_vma_lock_write(src->anon_vma);
So, we drop and then reacquire src->anon_vma->root->rwsem, expecting
src->anon_vma and src->anon_vma->root to be the same. And IIUC
src->vm_mm's mmap lock is what guarantees all this. If so, could you
please add a clarifying comment here?
> }
> anon_vma = pavc->anon_vma;
> - root = lock_anon_vma_root(root, anon_vma);
> anon_vma_chain_link(dst, avc, anon_vma);
>
> /*
> @@ -343,7 +316,8 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
> }
> if (dst->anon_vma)
> dst->anon_vma->num_active_vmas++;
> - unlock_anon_vma_root(root);
> +
> + anon_vma_unlock_write(src->anon_vma);
> return 0;
>
> enomem_failure:
> @@ -438,15 +412,17 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
> void unlink_anon_vmas(struct vm_area_struct *vma)
> {
> struct anon_vma_chain *avc, *next;
> - struct anon_vma *root = NULL;
> + struct anon_vma *active_anon_vma = vma->anon_vma;
>
> /* Always hold mmap lock, read-lock on unmap possibly. */
> mmap_assert_locked(vma->vm_mm);
>
> /* Unfaulted is a no-op. */
> - if (!vma->anon_vma)
> + if (!active_anon_vma)
> return;
>
> + anon_vma_lock_write(active_anon_vma);
> +
> /*
> * Unlink each anon_vma chained to the VMA. This list is ordered
> * from newest to oldest, ensuring the root anon_vma gets freed last.
> @@ -454,7 +430,6 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
> list_for_each_entry_safe(avc, next, &vma->anon_vma_chain, same_vma) {
> struct anon_vma *anon_vma = avc->anon_vma;
>
> - root = lock_anon_vma_root(root, anon_vma);
> anon_vma_interval_tree_remove(avc, &anon_vma->rb_root);
>
> /*
> @@ -470,13 +445,14 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
> anon_vma_chain_free(avc);
> }
>
> - vma->anon_vma->num_active_vmas--;
> + active_anon_vma->num_active_vmas--;
> /*
> * vma would still be needed after unlink, and anon_vma will be prepared
> * when handle fault.
> */
> vma->anon_vma = NULL;
> - unlock_anon_vma_root(root);
> + anon_vma_unlock_write(active_anon_vma);
> +
>
> /*
> * Iterate the list once more, it now only contains empty and unlinked
> --
> 2.52.0
>
On Mon, Dec 29, 2025 at 02:17:53PM -0800, Suren Baghdasaryan wrote:
> On Wed, Dec 17, 2025 at 4:27 AM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> >
> > The root anon_vma of all anon_vma's linked to a VMA must by definition be
> > the same - a VMA and all of its descendants/ancestors must exist in the
> > same CoW chain.
> >
> > Commit bb4aa39676f7 ("mm: avoid repeated anon_vma lock/unlock sequences in
> > anon_vma_clone()") introduced paranoid checking of the root anon_vma
> > remaining the same throughout all AVC's in 2011.
> >
> > I think 15 years later we can safely assume that this is always the case.
> >
> > Additionally, since unfaulted VMAs being cloned from or unlinked are
> > no-op's, we can simply lock the anon_vma's associated with this rather than
> > doing any specific dance around this.
> >
> > This removes unnecessary checks and makes it clear that the root anon_vma
> > is shared between all anon_vma's in a given VMA's anon_vma_chain.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > ---
> > mm/rmap.c | 48 ++++++++++++------------------------------------
> > 1 file changed, 12 insertions(+), 36 deletions(-)
> >
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index 9332d1cbc643..60134a566073 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -231,32 +231,6 @@ int __anon_vma_prepare(struct vm_area_struct *vma)
> > return -ENOMEM;
> > }
> >
> > -/*
> > - * This is a useful helper function for locking the anon_vma root as
> > - * we traverse the vma->anon_vma_chain, looping over anon_vma's that
> > - * have the same vma.
> > - *
> > - * Such anon_vma's should have the same root, so you'd expect to see
> > - * just a single mutex_lock for the whole traversal.
> > - */
> > -static inline struct anon_vma *lock_anon_vma_root(struct anon_vma *root, struct anon_vma *anon_vma)
> > -{
> > - struct anon_vma *new_root = anon_vma->root;
> > - if (new_root != root) {
> > - if (WARN_ON_ONCE(root))
> > - up_write(&root->rwsem);
> > - root = new_root;
> > - down_write(&root->rwsem);
> > - }
> > - return root;
> > -}
> > -
> > -static inline void unlock_anon_vma_root(struct anon_vma *root)
> > -{
> > - if (root)
> > - up_write(&root->rwsem);
> > -}
> > -
> > static void check_anon_vma_clone(struct vm_area_struct *dst,
> > struct vm_area_struct *src)
> > {
> > @@ -307,26 +281,25 @@ static void check_anon_vma_clone(struct vm_area_struct *dst,
> > int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
> > {
> > struct anon_vma_chain *avc, *pavc;
> > - struct anon_vma *root = NULL;
> >
> > if (!src->anon_vma)
> > return 0;
> >
> > check_anon_vma_clone(dst, src);
> >
> > + anon_vma_lock_write(src->anon_vma);
> > list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
> > struct anon_vma *anon_vma;
> >
> > avc = anon_vma_chain_alloc(GFP_NOWAIT);
> > if (unlikely(!avc)) {
> > - unlock_anon_vma_root(root);
> > - root = NULL;
> > + anon_vma_unlock_write(src->anon_vma);
> > avc = anon_vma_chain_alloc(GFP_KERNEL);
> > if (!avc)
> > goto enomem_failure;
> > + anon_vma_lock_write(src->anon_vma);
>
> So, we drop and then reacquire src->anon_vma->root->rwsem, expecting
> src->anon_vma and src->anon_vma->root to be the same. And IIUC
I mean did you read the commit message? :)
We're not expecting that, they _have_ to be the same. It simply makes no sense
for them _not_ to be the same.
This is kind of the entire point of the patch.
> src->vm_mm's mmap lock is what guarantees all this. If so, could you
> please add a clarifying comment here?
No that's not what guarantees it? I don't understand what you mean?
I mean in a sense, if you had a totally broken situation where you didn't take
exclusive locks and could do some horribly broken racing here, then sure, you
might end up with something broken. But I think it's super confusing to say 'oh
this lock guarantees it'. It only guarantees that you aren't completely broken;
what guarantees the shared root is how anon_vma_fork() works, which is to:
- Clone.
- If we did not reuse an anon_vma (which by recursion would also have the same
root), allocate a new anon_vma.
- If we allocated a new one, set its root to that of the source VMA's anon_vma,
which by definition also has to be in its anon_vma_chain and have the same
root (itself, if we're cloning from the ultimate parent).
But I don't think it'd be helpful to document all this, or we get into _adding_
confusion by putting _too much_ in a comment.
So I guess I'll just say, as I do in the newly introduced
cleanup_partial_anon_vmas():
/* All anon_vma's share the same root. */
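
To illustrate the three steps above, here is a minimal userspace sketch of the
invariant. It is a toy model and not the kernel code: the toy_* types and
functions are hypothetical stand-ins, the chain is a fixed-size array rather
than a list of AVCs, the anon_vma reuse case is deliberately left out, and
nothing is ever freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for struct anon_vma; only the field relevant here. */
struct toy_anon_vma {
	struct toy_anon_vma *root;	/* root of the CoW tree, self if root */
};

#define TOY_CHAIN_MAX 16

/* Toy stand-in for a VMA; chain[] models vma->anon_vma_chain. */
struct toy_vma {
	struct toy_anon_vma *anon_vma;
	struct toy_anon_vma *chain[TOY_CHAIN_MAX];
	size_t nr_chain;
};

/* First fault in a VMA with no ancestors: the new anon_vma is its own root. */
static int toy_prepare(struct toy_vma *vma)
{
	struct toy_anon_vma *av = calloc(1, sizeof(*av));

	if (!av)
		return -1;
	av->root = av;
	vma->anon_vma = av;
	vma->chain[vma->nr_chain++] = av;
	return 0;
}

/* Fork: clone the parent's chain, then add a new anon_vma with the same root. */
static int toy_fork(struct toy_vma *child, const struct toy_vma *parent)
{
	struct toy_anon_vma *av;
	size_t i;

	/* Unfaulted parent is a no-op. */
	if (!parent->anon_vma)
		return 0;
	if (child->nr_chain + parent->nr_chain + 1 > TOY_CHAIN_MAX)
		return -1;

	/* Step 1: clone. Every entry already shares the parent's root. */
	for (i = 0; i < parent->nr_chain; i++)
		child->chain[child->nr_chain++] = parent->chain[i];

	/* Steps 2 and 3: allocate a new anon_vma and inherit the root. */
	av = calloc(1, sizeof(*av));
	if (!av)
		return -1;
	av->root = parent->anon_vma->root;
	child->anon_vma = av;
	child->chain[child->nr_chain++] = av;
	return 0;
}

/* Returns 1 if every anon_vma in the chain has the same root, else 0. */
static int toy_chain_shares_root(const struct toy_vma *vma)
{
	size_t i;

	for (i = 0; i < vma->nr_chain; i++)
		if (vma->chain[i]->root != vma->anon_vma->root)
			return 0;
	return 1;
}
```

Forking a zero-initialised toy_vma from a prepared parent, and then forking
again from that child, leaves every chain entry pointing at the original
parent's anon_vma as its root, which is the property the patch relies on.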
>
> > }
> > anon_vma = pavc->anon_vma;
> > - root = lock_anon_vma_root(root, anon_vma);
> > anon_vma_chain_link(dst, avc, anon_vma);
> >
> > /*
> > @@ -343,7 +316,8 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
> > }
> > if (dst->anon_vma)
> > dst->anon_vma->num_active_vmas++;
> > - unlock_anon_vma_root(root);
> > +
> > + anon_vma_unlock_write(src->anon_vma);
> > return 0;
> >
> > enomem_failure:
> > @@ -438,15 +412,17 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
> > void unlink_anon_vmas(struct vm_area_struct *vma)
> > {
> > struct anon_vma_chain *avc, *next;
> > - struct anon_vma *root = NULL;
> > + struct anon_vma *active_anon_vma = vma->anon_vma;
> >
> > /* Always hold mmap lock, read-lock on unmap possibly. */
> > mmap_assert_locked(vma->vm_mm);
> >
> > /* Unfaulted is a no-op. */
> > - if (!vma->anon_vma)
> > + if (!active_anon_vma)
> > return;
> >
> > + anon_vma_lock_write(active_anon_vma);
> > +
> > /*
> > * Unlink each anon_vma chained to the VMA. This list is ordered
> > * from newest to oldest, ensuring the root anon_vma gets freed last.
> > @@ -454,7 +430,6 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
> > list_for_each_entry_safe(avc, next, &vma->anon_vma_chain, same_vma) {
> > struct anon_vma *anon_vma = avc->anon_vma;
> >
> > - root = lock_anon_vma_root(root, anon_vma);
> > anon_vma_interval_tree_remove(avc, &anon_vma->rb_root);
> >
> > /*
> > @@ -470,13 +445,14 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
> > anon_vma_chain_free(avc);
> > }
> >
> > - vma->anon_vma->num_active_vmas--;
> > + active_anon_vma->num_active_vmas--;
> > /*
> > * vma would still be needed after unlink, and anon_vma will be prepared
> > * when handle fault.
> > */
> > vma->anon_vma = NULL;
> > - unlock_anon_vma_root(root);
> > + anon_vma_unlock_write(active_anon_vma);
> > +
> >
> > /*
> > * Iterate the list once more, it now only contains empty and unlinked
> > --
> > 2.52.0
> >
On Tue, Jan 6, 2026 at 5:58 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Mon, Dec 29, 2025 at 02:17:53PM -0800, Suren Baghdasaryan wrote:
> > On Wed, Dec 17, 2025 at 4:27 AM Lorenzo Stoakes
> > <lorenzo.stoakes@oracle.com> wrote:
> > >
> > > The root anon_vma of all anon_vma's linked to a VMA must by definition be
> > > the same - a VMA and all of its descendants/ancestors must exist in the
> > > same CoW chain.
> > >
> > > Commit bb4aa39676f7 ("mm: avoid repeated anon_vma lock/unlock sequences in
> > > anon_vma_clone()") introduced paranoid checking of the root anon_vma
> > > remaining the same throughout all AVC's in 2011.
> > >
> > > I think 15 years later we can safely assume that this is always the case.
> > >
> > > Additionally, since unfaulted VMAs being cloned from or unlinked are
> > > no-op's, we can simply lock the anon_vma's associated with this rather than
> > > doing any specific dance around this.
> > >
> > > This removes unnecessary checks and makes it clear that the root anon_vma
> > > is shared between all anon_vma's in a given VMA's anon_vma_chain.
> > >
> > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > > ---
> > > mm/rmap.c | 48 ++++++++++++------------------------------------
> > > 1 file changed, 12 insertions(+), 36 deletions(-)
> > >
> > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > index 9332d1cbc643..60134a566073 100644
> > > --- a/mm/rmap.c
> > > +++ b/mm/rmap.c
> > > @@ -231,32 +231,6 @@ int __anon_vma_prepare(struct vm_area_struct *vma)
> > > return -ENOMEM;
> > > }
> > >
> > > -/*
> > > - * This is a useful helper function for locking the anon_vma root as
> > > - * we traverse the vma->anon_vma_chain, looping over anon_vma's that
> > > - * have the same vma.
> > > - *
> > > - * Such anon_vma's should have the same root, so you'd expect to see
> > > - * just a single mutex_lock for the whole traversal.
> > > - */
> > > -static inline struct anon_vma *lock_anon_vma_root(struct anon_vma *root, struct anon_vma *anon_vma)
> > > -{
> > > - struct anon_vma *new_root = anon_vma->root;
> > > - if (new_root != root) {
> > > - if (WARN_ON_ONCE(root))
> > > - up_write(&root->rwsem);
> > > - root = new_root;
> > > - down_write(&root->rwsem);
> > > - }
> > > - return root;
> > > -}
> > > -
> > > -static inline void unlock_anon_vma_root(struct anon_vma *root)
> > > -{
> > > - if (root)
> > > - up_write(&root->rwsem);
> > > -}
> > > -
> > > static void check_anon_vma_clone(struct vm_area_struct *dst,
> > > struct vm_area_struct *src)
> > > {
> > > @@ -307,26 +281,25 @@ static void check_anon_vma_clone(struct vm_area_struct *dst,
> > > int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
> > > {
> > > struct anon_vma_chain *avc, *pavc;
> > > - struct anon_vma *root = NULL;
> > >
> > > if (!src->anon_vma)
> > > return 0;
> > >
> > > check_anon_vma_clone(dst, src);
> > >
> > > + anon_vma_lock_write(src->anon_vma);
> > > list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
> > > struct anon_vma *anon_vma;
> > >
> > > avc = anon_vma_chain_alloc(GFP_NOWAIT);
> > > if (unlikely(!avc)) {
> > > - unlock_anon_vma_root(root);
> > > - root = NULL;
> > > + anon_vma_unlock_write(src->anon_vma);
> > > avc = anon_vma_chain_alloc(GFP_KERNEL);
> > > if (!avc)
> > > goto enomem_failure;
> > > + anon_vma_lock_write(src->anon_vma);
> >
> > So, we drop and then reacquire src->anon_vma->root->rwsem, expecting
> > src->anon_vma and src->anon_vma->root to be the same. And IIUC
>
> I mean did you read the commit message? :)
>
> We're not expecting that, they _have_ to be the same. It simply makes no sense
> for them _not_ to be the same.
Sorry, maybe I chose my words badly to explain my concern. I meant
that we expect those fields to still be valid between the time when we
drop and re-acquire the lock. The comment next to anon_vma.rwsem
definition says "W: modification, R: walking the list". Here we are
walking the list with the lock but are dropping the lock in the
process. I think there needs to be an explanation why this is safe.
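
For reference, the control flow in question can be modelled in userspace as
below. This is only a single-threaded sketch under stated assumptions: the
toy_* names are hypothetical, the lock is a plain flag rather than an rwsem,
and the "non-blocking" allocator is rigged to fail on a chosen call so that the
fallback path is exercised. It shows where the lock is dropped and retaken; it
says nothing about whether the walked list is stable in between.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy lock: tracks whether it is held and how often it was taken. */
struct toy_lock {
	bool held;
	unsigned int acquisitions;
};

static void toy_lock_write(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
	l->acquisitions++;
}

static void toy_unlock_write(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Hypothetical non-blocking allocator that fails on call number fail_on. */
struct toy_alloc {
	unsigned int calls;
	unsigned int fail_on;
};

static void *toy_alloc_nowait(struct toy_alloc *a, size_t size)
{
	if (++a->calls == a->fail_on)
		return NULL;
	return malloc(size);
}

/* Models an allocation that may sleep, so the lock must not be held. */
static void *toy_alloc_blocking(const struct toy_lock *l, size_t size)
{
	assert(!l->held);
	return malloc(size);
}

/* Returns the number of entries processed, or -1 on allocation failure. */
static int toy_clone(struct toy_lock *l, struct toy_alloc *a, size_t nr_entries)
{
	int linked = 0;
	size_t i;

	toy_lock_write(l);
	for (i = 0; i < nr_entries; i++) {
		void *entry = toy_alloc_nowait(a, 32);

		if (!entry) {
			/* Drop the lock, allocate with blocking allowed, retake. */
			toy_unlock_write(l);
			entry = toy_alloc_blocking(l, 32);
			if (!entry)
				return -1;	/* lock is already dropped here */
			toy_lock_write(l);
		}
		free(entry);	/* real code would link the entry instead */
		linked++;
	}
	toy_unlock_write(l);
	return linked;
}
```

With three entries and the second non-blocking allocation failing, the lock is
taken twice in total, so the walk is not covered by one continuous hold.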
>
> This is kind of the entire point of the patch.
>
> > src->vm_mm's mmap lock is what guarantees all this. If so, could you
> > please add a clarifying comment here?
>
> No that's not what guarantees it? I don't understand what you mean?
>
> I mean in a sense, if you had a totally broken situation where you didn't take
> exclusive locks and could do some horribly broken racing here, then sure you
> might end up with something broken, but I think it's super confusing to say 'oh
> this lock guarantees it', well no it guarantees that you aren't completely
> broken, what guarantees the shared root is how anon_vma_fork() works, which is
> to:
>
> - Clone.
> - If not reused an anon_vma (which by recursion would also have same root)
> allocate new anon_vma.
> - If allocated new, set root to source VMA's anon_vma, which by definition also
> has to be in its anon_vma_chain and have the same root (itself, if we're
> cloning from the ultimate parent).
>
> But I don't think it'd be helpful to document all this, or we get into _adding_
> confusion by putting _too much_ in a comment.
>
> So I guess I'll just say, as I do in the newly introduced
> cleanup_partial_anon_vmas():
>
> /* All anon_vma's share the same root. */
Yeah, my concern was not the root being different but that the list
itself is stable after we drop the lock.
>
> >
> > > }
> > > anon_vma = pavc->anon_vma;
> > > - root = lock_anon_vma_root(root, anon_vma);
> > > anon_vma_chain_link(dst, avc, anon_vma);
> > >
> > > /*
> > > @@ -343,7 +316,8 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
> > > }
> > > if (dst->anon_vma)
> > > dst->anon_vma->num_active_vmas++;
> > > - unlock_anon_vma_root(root);
> > > +
> > > + anon_vma_unlock_write(src->anon_vma);
> > > return 0;
> > >
> > > enomem_failure:
> > > @@ -438,15 +412,17 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
> > > void unlink_anon_vmas(struct vm_area_struct *vma)
> > > {
> > > struct anon_vma_chain *avc, *next;
> > > - struct anon_vma *root = NULL;
> > > + struct anon_vma *active_anon_vma = vma->anon_vma;
> > >
> > > /* Always hold mmap lock, read-lock on unmap possibly. */
> > > mmap_assert_locked(vma->vm_mm);
> > >
> > > /* Unfaulted is a no-op. */
> > > - if (!vma->anon_vma)
> > > + if (!active_anon_vma)
> > > return;
> > >
> > > + anon_vma_lock_write(active_anon_vma);
> > > +
> > > /*
> > > * Unlink each anon_vma chained to the VMA. This list is ordered
> > > * from newest to oldest, ensuring the root anon_vma gets freed last.
> > > @@ -454,7 +430,6 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
> > > list_for_each_entry_safe(avc, next, &vma->anon_vma_chain, same_vma) {
> > > struct anon_vma *anon_vma = avc->anon_vma;
> > >
> > > - root = lock_anon_vma_root(root, anon_vma);
> > > anon_vma_interval_tree_remove(avc, &anon_vma->rb_root);
> > >
> > > /*
> > > @@ -470,13 +445,14 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
> > > anon_vma_chain_free(avc);
> > > }
> > >
> > > - vma->anon_vma->num_active_vmas--;
> > > + active_anon_vma->num_active_vmas--;
> > > /*
> > > * vma would still be needed after unlink, and anon_vma will be prepared
> > > * when handle fault.
> > > */
> > > vma->anon_vma = NULL;
> > > - unlock_anon_vma_root(root);
> > > + anon_vma_unlock_write(active_anon_vma);
> > > +
> > >
> > > /*
> > > * Iterate the list once more, it now only contains empty and unlinked
> > > --
> > > 2.52.0
> > >
On Tue, Jan 06, 2026 at 12:58:46PM -0800, Suren Baghdasaryan wrote:
> On Tue, Jan 6, 2026 at 5:58 AM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> >
> > On Mon, Dec 29, 2025 at 02:17:53PM -0800, Suren Baghdasaryan wrote:
> > > On Wed, Dec 17, 2025 at 4:27 AM Lorenzo Stoakes
> > > <lorenzo.stoakes@oracle.com> wrote:
> > > >
> > > > The root anon_vma of all anon_vma's linked to a VMA must by definition be
> > > > the same - a VMA and all of its descendants/ancestors must exist in the
> > > > same CoW chain.
> > > >
> > > > Commit bb4aa39676f7 ("mm: avoid repeated anon_vma lock/unlock sequences in
> > > > anon_vma_clone()") introduced paranoid checking of the root anon_vma
> > > > remaining the same throughout all AVC's in 2011.
> > > >
> > > > I think 15 years later we can safely assume that this is always the case.
> > > >
> > > > Additionally, since unfaulted VMAs being cloned from or unlinked are
> > > > no-op's, we can simply lock the anon_vma's associated with this rather than
> > > > doing any specific dance around this.
> > > >
> > > > This removes unnecessary checks and makes it clear that the root anon_vma
> > > > is shared between all anon_vma's in a given VMA's anon_vma_chain.
> > > >
> > > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > > > ---
> > > > mm/rmap.c | 48 ++++++++++++------------------------------------
> > > > 1 file changed, 12 insertions(+), 36 deletions(-)
> > > >
> > > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > > index 9332d1cbc643..60134a566073 100644
> > > > --- a/mm/rmap.c
> > > > +++ b/mm/rmap.c
> > > > @@ -231,32 +231,6 @@ int __anon_vma_prepare(struct vm_area_struct *vma)
> > > > return -ENOMEM;
> > > > }
> > > >
> > > > -/*
> > > > - * This is a useful helper function for locking the anon_vma root as
> > > > - * we traverse the vma->anon_vma_chain, looping over anon_vma's that
> > > > - * have the same vma.
> > > > - *
> > > > - * Such anon_vma's should have the same root, so you'd expect to see
> > > > - * just a single mutex_lock for the whole traversal.
> > > > - */
> > > > -static inline struct anon_vma *lock_anon_vma_root(struct anon_vma *root, struct anon_vma *anon_vma)
> > > > -{
> > > > - struct anon_vma *new_root = anon_vma->root;
> > > > - if (new_root != root) {
> > > > - if (WARN_ON_ONCE(root))
> > > > - up_write(&root->rwsem);
> > > > - root = new_root;
> > > > - down_write(&root->rwsem);
> > > > - }
> > > > - return root;
> > > > -}
> > > > -
> > > > -static inline void unlock_anon_vma_root(struct anon_vma *root)
> > > > -{
> > > > - if (root)
> > > > - up_write(&root->rwsem);
> > > > -}
> > > > -
> > > > static void check_anon_vma_clone(struct vm_area_struct *dst,
> > > > struct vm_area_struct *src)
> > > > {
> > > > @@ -307,26 +281,25 @@ static void check_anon_vma_clone(struct vm_area_struct *dst,
> > > > int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
> > > > {
> > > > struct anon_vma_chain *avc, *pavc;
> > > > - struct anon_vma *root = NULL;
> > > >
> > > > if (!src->anon_vma)
> > > > return 0;
> > > >
> > > > check_anon_vma_clone(dst, src);
> > > >
> > > > + anon_vma_lock_write(src->anon_vma);
> > > > list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
> > > > struct anon_vma *anon_vma;
> > > >
> > > > avc = anon_vma_chain_alloc(GFP_NOWAIT);
> > > > if (unlikely(!avc)) {
> > > > - unlock_anon_vma_root(root);
> > > > - root = NULL;
> > > > + anon_vma_unlock_write(src->anon_vma);
> > > > avc = anon_vma_chain_alloc(GFP_KERNEL);
> > > > if (!avc)
> > > > goto enomem_failure;
> > > > + anon_vma_lock_write(src->anon_vma);
> > >
> > > So, we drop and then reacquire src->anon_vma->root->rwsem, expecting
> > > src->anon_vma and src->anon_vma->root to be the same. And IIUC
> >
> > I mean did you read the commit message? :)
> >
> > We're not expecting that, they _have_ to be the same. It simply makes no sense
> > for them _not_ to be the same.
>
> Sorry, maybe I chose my words badly to explain my concern. I meant
> that we expect those fields to still be valid between the time when we
> drop and re-acquire the lock. The comment next to anon_vma.rwsem
> definition says "W: modification, R: walking the list". Here we are
> walking the list with the lock but are dropping the lock in the
> process. I think there needs to be an explanation why this is safe.
This already happened though? And yes it's sketchy.
I don't think this is necessary, as I change this later in the series anyway,
so we'd just be adding an explanation I'd have to delete.
I already explain the locking where I change the scope of the anon_vma rmap
lock elsewhere, so this general 'explaining lock scope' pattern is present in
the final result of the series.
>
>
> >
> > This is kind of the entire point of the patch.
> >
> > > src->vm_mm's mmap lock is what guarantees all this. If so, could you
> > > please add a clarifying comment here?
> >
> > No that's not what guarantees it? I don't understand what you mean?
> >
> > I mean in a sense, if you had a totally broken situation where you didn't take
> > exclusive locks and could do some horribly broken racing here, then sure you
> > might end up with something broken, but I think it's super confusing to say 'oh
> > this lock guarantees it', well no it guarantees that you aren't completely
> > broken, what guarantees the shared root is how anon_vma_fork() works, which is
> > to:
> >
> > - Clone.
> > - If not reused an anon_vma (which by recursion would also have same root)
> > allocate new anon_vma.
> > - If allocated new, set root to source VMA's anon_vma, which by definition also
> > has to be in its anon_vma_chain and have the same root (itself, if we're
> > cloning from the ultimate parent).
> >
> > But I don't think it'd be helpful to document all this, or we get into _adding_
> > confusion by putting _too much_ in a comment.
> >
> > So I guess I'll just say, as I do in the newly introduced
> > cleanup_partial_anon_vmas():
> >
> > /* All anon_vma's share the same root. */
>
> Yeah, my concern was not the root being different but that the list
> itself is stable after we drop the lock.
Again, I'm going to end up deleting any explanation that I add in a later
patch where I extensively change this, which seems like it'd not be a
useful thing to do in the series.
So I think we should leave it as-is.
Thanks, Lorenzo