mntput_no_expire_slowpath() does not remove a mount from its peer group
(mnt_share list) or slave list before sending it to the free path. If a
mount that was added to a peer group by clone_mnt() is freed through
mntput() without going through umount_tree()/bulk_make_private(), it
remains linked in the peer group's circular list after the slab object
is freed.

When another mount namespace is later torn down, umount_tree() calls
bulk_make_private() -> trace_transfers(), which walks the peer group via
next_peer(). This dereferences the freed mount's mnt_share field,
causing use-after-free:

BUG: KASAN: slab-use-after-free in __list_del_entry_valid_or_report
Read of size 8 at addr ffff88807d533af8

Call Trace:
__list_del_entry_valid_or_report
bulk_make_private
umount_tree
put_mnt_ns
do_exit

Allocated by:
alloc_vfsmnt
clone_mnt
vfs_open_tree

Freed by:
kmem_cache_free
rcu_core

Fix this by calling change_mnt_propagation(mnt, MS_PRIVATE) in
mntput_no_expire_slowpath() after mnt_del_instance(), while holding
lock_mount_hash(). This removes the mount from both the peer group and
any slave list before it enters the cleanup path.

This is safe without namespace_sem: the mount has MNT_DOOMED set and has
been removed from the instance list by mnt_del_instance(), making it
unreachable through normal lookup paths, and lock_mount_hash() prevents
concurrent peer group traversal. The new check is also harmless for
mounts already made private by bulk_make_private(): both IS_MNT_SHARED()
and IS_MNT_SLAVE() are false for them, so change_mnt_propagation() is
not called.

Reported-by: syzbot+c0fd9ea308d049c4e0b9@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c0fd9ea308d049c4e0b9
Fixes: 75db7fd99075b ("umount_tree(): take all victims out of propagation graph at once")
Cc: stable@vger.kernel.org
Signed-off-by: Yuto Ohnuki <ytohnuki@amazon.com>
---
 fs/namespace.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index 854f4fc66469..d25abf051ad6 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1359,6 +1359,11 @@ static void noinline mntput_no_expire_slowpath(struct mount *mnt)
 	rcu_read_unlock();
 
 	mnt_del_instance(mnt);
+
+	/* Remove from peer group / slave list before freeing */
+	if (unlikely(IS_MNT_SHARED(mnt) || IS_MNT_SLAVE(mnt)))
+		change_mnt_propagation(mnt, MS_PRIVATE);
+
 	if (unlikely(!list_empty(&mnt->mnt_expire)))
 		list_del(&mnt->mnt_expire);
 
--
2.50.1
Amazon Web Services EMEA SARL, 38 avenue John F. Kennedy, L-1855 Luxembourg, R.C.S. Luxembourg B186284
Amazon Web Services EMEA SARL, Irish Branch, One Burlington Plaza, Burlington Road, Dublin 4, Ireland, branch registration number 908705
On Sat, Mar 14, 2026 at 06:44:22PM +0000, Yuto Ohnuki wrote:

The last time this reproduced upstream was on:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?id=6de23f81a5e08be8fbf5e8d7e9febc72a5b5f27f

which is v7.0-rc1. At which point the question should be "why?" :)

Fixed by: a41dbf5e004e ("mount: hold namespace_sem across copy in create_new_namespace()")

In any case, thanks for the proposed fix but it is already fixed
upstream and the fix you suggested indicates another bug that is the
real cause.
On Tue, Mar 17, 2026 at 04:24:32PM +0100, Christian Brauner wrote:

Thanks for the review and explanation. I should have checked why the
reproducer stopped firing on current HEAD before sending the patch;
lesson learned. I was testing with a custom reproducer that called
clone_mnt() directly from a module, which bypassed the actual
create_new_namespace() code path and masked the fact that the real
bug was already fixed.

I see now that the real issue was the namespace_sem drop-and-reacquire
race in create_new_namespace(), not a missing cleanup in
mntput_no_expire_slowpath(). a41dbf5e004e properly fixes the root
cause by holding namespace_sem across the copy.

Please disregard this patch.

Thanks again,
Yuto