[WIP] mm/vma: clear dst->anon_vma on anon_vma_clone() failure in dup_anon_vma()

Sasha Levin posted 1 patch 1 month, 1 week ago
mm/vma.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
Posted by Sasha Levin 1 month, 1 week ago
When dup_anon_vma() calls anon_vma_clone() and it fails with -ENOMEM,
dst->anon_vma is left pointing at src->anon_vma without a corresponding
num_active_vmas increment (which only happens on the success path).

The internal cleanup_partial_anon_vmas() correctly frees partially-
allocated AVCs but does not clear dst->anon_vma. Later, when the VMA is
torn down during process exit, unlink_anon_vmas() sees a non-NULL
vma->anon_vma and decrements num_active_vmas without a prior matching
increment, causing an underflow. This eventually triggers:

  WARNING: mm/rmap.c:528 at unlink_anon_vmas+0x68e/0x900 mm/rmap.c:528

First, fault injection in the mlock2 syscall path:

  FAULT_INJECTION: forcing a failure.
  name failslab, interval 1, probability 0, space 0, times 0
  CPU: 3 PID: 4261 Comm: syz.6.96
  Call Trace:
   should_fail_ex.cold+0xd8/0x15d
   should_failslab+0xd4/0x150
   kmem_cache_alloc_noprof+0x60/0x630
   anon_vma_clone+0x2ed/0xcf0
   dup_anon_vma+0x1cb/0x320
   vma_modify+0x16dd/0x2230
   vma_modify_flags+0x1f9/0x350
   mlock_fixup+0x225/0xe10
   apply_vma_lock_flags+0x249/0x360
   do_mlock+0x269/0x7f0
   __x64_sys_mlock2+0xc0/0x100

Followed by the WARNING on the same task during exit:

  WARNING: mm/rmap.c:528 at unlink_anon_vmas+0x68e/0x900
  CPU: 3 PID: 4261 Comm: syz.6.96
  Call Trace:
   free_pgtables+0x312/0x950
   exit_mmap+0x487/0xa80
   __mmput+0x11b/0x540
   exit_mm
   do_exit+0x7b9/0x2c60

Fix this by clearing dst->anon_vma on clone failure, restoring the VMA
to its original unfaulted state. This ensures unlink_anon_vmas() will
correctly bail out early at the !active_anon_vma check.

Other callers of anon_vma_clone() are unaffected: VMA_OP_SPLIT/REMAP
free the dst VMA on error, and VMA_OP_FORK explicitly sets anon_vma to
NULL before cloning.

Fixes: 542eda1a83294 ("mm/rmap: improve anon_vma_clone(), unlink_anon_vmas() comments, add asserts")
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 mm/vma.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/vma.c b/mm/vma.c
index be64f781a3aa7..4cf6a2a05c10a 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -629,8 +629,10 @@ static int dup_anon_vma(struct vm_area_struct *dst,
 		vma_assert_write_locked(dst);
 		dst->anon_vma = src->anon_vma;
 		ret = anon_vma_clone(dst, src, VMA_OP_MERGE_UNFAULTED);
-		if (ret)
+		if (ret) {
+			dst->anon_vma = NULL;
 			return ret;
+		}
 
 		*dup = dst;
 	}
-- 
2.51.0
Re: [WIP] mm/vma: clear dst->anon_vma on anon_vma_clone() failure in dup_anon_vma()
Posted by Lorenzo Stoakes 1 month, 1 week ago
On Mon, Mar 02, 2026 at 10:15:47AM -0500, Sasha Levin wrote:
> When dup_anon_vma() calls anon_vma_clone() and it fails with -ENOMEM,
> dst->anon_vma is left pointing at src->anon_vma without a corresponding
> num_active_vmas increment (which only happens on the success path).
>
> The internal cleanup_partial_anon_vmas() correctly frees partially-
> allocated AVCs but does not clear dst->anon_vma. Later, when the VMA is
> torn down during process exit, unlink_anon_vmas() sees a non-NULL
> vma->anon_vma and decrements num_active_vmas without a prior matching
> increment, causing an underflow. This eventually triggers:

Yikes!

>
>   WARNING: mm/rmap.c:528 at unlink_anon_vmas+0x68e/0x900 mm/rmap.c:528
>
> First, fault injection in the mlock2 syscall path:
>
>   FAULT_INJECTION: forcing a failure.
>   name failslab, interval 1, probability 0, space 0, times 0
>   CPU: 3 PID: 4261 Comm: syz.6.96
>   Call Trace:
>    should_fail_ex.cold+0xd8/0x15d
>    should_failslab+0xd4/0x150
>    kmem_cache_alloc_noprof+0x60/0x630
>    anon_vma_clone+0x2ed/0xcf0
>    dup_anon_vma+0x1cb/0x320
>    vma_modify+0x16dd/0x2230
>    vma_modify_flags+0x1f9/0x350
>    mlock_fixup+0x225/0xe10
>    apply_vma_lock_flags+0x249/0x360
>    do_mlock+0x269/0x7f0
>    __x64_sys_mlock2+0xc0/0x100
>
> Followed by the WARNING on the same task during exit:
>
>   WARNING: mm/rmap.c:528 at unlink_anon_vmas+0x68e/0x900
>   CPU: 3 PID: 4261 Comm: syz.6.96
>   Call Trace:
>    free_pgtables+0x312/0x950
>    exit_mmap+0x487/0xa80
>    __mmput+0x11b/0x540
>    exit_mm
>    do_exit+0x7b9/0x2c60
>
> Fix this by clearing dst->anon_vma on clone failure, restoring the VMA
> to its original unfaulted state. This ensures unlink_anon_vmas() will
> correctly bail out early at the !active_anon_vma check.
>
> Other callers of anon_vma_clone() are unaffected: VMA_OP_SPLIT/REMAP
> free the dst VMA on error, and VMA_OP_FORK explicitly sets anon_vma to
> NULL before cloning.
>
> Fixes: 542eda1a83294 ("mm/rmap: improve anon_vma_clone(), unlink_anon_vmas() comments, add asserts")
> Assisted-by: Claude:claude-opus-4-6
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>  mm/vma.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/mm/vma.c b/mm/vma.c
> index be64f781a3aa7..4cf6a2a05c10a 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -629,8 +629,10 @@ static int dup_anon_vma(struct vm_area_struct *dst,
>  		vma_assert_write_locked(dst);
>  		dst->anon_vma = src->anon_vma;
>  		ret = anon_vma_clone(dst, src, VMA_OP_MERGE_UNFAULTED);
> -		if (ret)
> +		if (ret) {
> +			dst->anon_vma = NULL;
>  			return ret;
> +		}

Hm, I think I'd rather we tackle this at the source to be honest.

I think it makes sense to do this in cleanup_partial_anon_vmas() since that's
handling the rest of the cleanup, and this is what the anon_vma_clone() error
path previously did.

Something like:

static void cleanup_partial_anon_vmas(struct vm_area_struct *vma)
{
	struct anon_vma_chain *avc, *next;

	list_for_each_entry_safe(avc, next, &vma->anon_vma_chain, same_vma) {
		list_del(&avc->same_vma);
		anon_vma_chain_free(avc);
	}
+	vma->anon_vma = NULL;
}


>
>  		*dup = dst;
>  	}
> --
> 2.51.0
>

Thanks for looking at this, this definitely needs fixing. Luckily, real-world
OOMs like this are probably near-impossible to trigger because these are 'too
small to fail' allocations, but we absolutely do need to ensure these code
paths are handled correctly.

Thanks, Lorenzo