fs/mount.h | 10 +--------- fs/namespace.c | 19 +++++++++++++++++++ 2 files changed, 20 insertions(+), 9 deletions(-)
The locking in mnt_notify_add(), which was introduced in commit
bf630c401641 ("vfs: add notifications for mount attach and detach"), is a
bit gnarly.
notify_list is protected by namespace_lock, but there are cases where
mnt_notify_add() is called without holding namespace_lock, for example:
__do_sys_fsmount -> mnt_add_to_ns -> mnt_notify_add
Luckily, in cases where the namespace_lock isn't held, the namespace is
always freshly created and can't have any fsnotify marks yet, which means
the notify_list isn't actually accessed.
The existing comment claims that not accessing the notify_list in these
cases is merely an optimization, which is wrong. Fix the comment, and add a
locking assertion.
To allow mnt_notify_add() to reference the namespace_sem, move it into
fs/namespace.c.
Signed-off-by: Jann Horn <jannh@google.com>
---
I'm sending this patch because I spent some time staring at this
trying to figure out if this was buggy or not.
I don't know if this is working as intended or working by accident,
and it might be nice if this was cleaned up to have simpler locking;
but for now, document what's going on for the next person who stares
at this code.
---
fs/mount.h | 10 +---------
fs/namespace.c | 19 +++++++++++++++++++
2 files changed, 20 insertions(+), 9 deletions(-)
diff --git a/fs/mount.h b/fs/mount.h
index e0816c11a198..99016db2f408 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -219,15 +219,7 @@ static inline struct mnt_namespace *to_mnt_ns(struct ns_common *ns)
}
#ifdef CONFIG_FSNOTIFY
-static inline void mnt_notify_add(struct mount *m)
-{
- /* Optimize the case where there are no watches */
- if ((m->mnt_ns && m->mnt_ns->n_fsnotify_marks) ||
- (m->prev_ns && m->prev_ns->n_fsnotify_marks))
- list_add_tail(&m->to_notify, ¬ify_list);
- else
- m->prev_ns = m->mnt_ns;
-}
+void mnt_notify_add(struct mount *m);
#else
static inline void mnt_notify_add(struct mount *m)
{
diff --git a/fs/namespace.c b/fs/namespace.c
index fe919abd2f01..56f130b49e58 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -6431,6 +6431,25 @@ bool mnt_may_suid(struct vfsmount *mnt)
current_in_userns(mnt->mnt_sb->s_user_ns);
}
+#ifdef CONFIG_FSNOTIFY
+void mnt_notify_add(struct mount *m)
+{
+ /*
+ * notify_list is protected by namespace_sem.
+ * It is possible to call this function without holding namespace_sem,
+ * but in those cases, the mount is associated with a new mount
+ * namespace that can't have any fanotify marks yet.
+ */
+ if ((m->mnt_ns && m->mnt_ns->n_fsnotify_marks) ||
+ (m->prev_ns && m->prev_ns->n_fsnotify_marks)) {
+ rwsem_assert_held_write(&namespace_sem);
+ list_add_tail(&m->to_notify, ¬ify_list);
+ } else {
+ m->prev_ns = m->mnt_ns;
+ }
+}
+#endif
+
static struct ns_common *mntns_get(struct task_struct *task)
{
struct ns_common *ns = NULL;
---
base-commit: 8fde5d1d47f69db6082dfa34500c27f8485389a5
change-id: 20260529-vfs-mnt-notify-comment-4369e0558f0b
Best regards,
--
Jann Horn <jannh@google.com>
On Fri, May 29, 2026 at 09:38:08PM +0200, Jann Horn wrote:
> The locking in mnt_notify_add(), which was introduced in commit
> bf630c401641 ("vfs: add notifications for mount attach and detach"), is a
> bit gnarly.
> notify_list is protected by namespace_lock, but there are cases where
> mnt_notify_add() is called without holding namespace_lock, for example:
>
> __do_sys_fsmount -> mnt_add_to_ns -> mnt_notify_add
>
> Luckily, in cases where the namespace_lock isn't held, the namespace is
> always freshly created and can't have any fsnotify marks yet, which means
> the notify_list isn't actually accessed.
When fsmount() is called it creates an anonymous mount namespace. Such
anonymous mount namespaces are pure containers for a mount trees. They
can never actually appear as namespaces to userspace. Nothing can be
registered on them and they can't be setns()'d into. So it is impossible
to ever register any mount notification watch on them.
In general modifying mount notification objects require namespace_sem to
be held (read-side - we downgrade when possible).
>
> The existing comment claims that not accessing the notify_list in these
> cases is merely an optimization, which is wrong. Fix the comment, and add a
> locking assertion.
I don't think this is what the comment intended to communicate but I
guess it can be misread.
>
> To allow mnt_notify_add() to reference the namespace_sem, move it into
> fs/namespace.c.
>
> Signed-off-by: Jann Horn <jannh@google.com>
> ---
> I'm sending this patch because I spent some time staring at this
> trying to figure out if this was buggy or not.
>
> I don't know if this is working as intended or working by accident,
> and it might be nice if this was cleaned up to have simpler locking;
The locking is rather simple: namespace_sem. But maybe I misunderstand.
It just piggy-backs on the same logic as mnt_add_to_ns() in that
mnt_add_to_ns() may be called to add mount into a newly allocated
namespaces without namespace semaphore held. So the locking context is
guaranteed by mnt_add_to_ns(). To put another way: if you have quarrels
with mnt_notify_add() you likely also have quarrels with the contextual
locking of mnt_add_to_ns().
On Tue, Jun 2, 2026 at 3:14 PM Christian Brauner <brauner@kernel.org> wrote: > On Fri, May 29, 2026 at 09:38:08PM +0200, Jann Horn wrote: > > To allow mnt_notify_add() to reference the namespace_sem, move it into > > fs/namespace.c. > > > > Signed-off-by: Jann Horn <jannh@google.com> > > --- > > I'm sending this patch because I spent some time staring at this > > trying to figure out if this was buggy or not. > > > > I don't know if this is working as intended or working by accident, > > and it might be nice if this was cleaned up to have simpler locking; > > The locking is rather simple: namespace_sem. But maybe I misunderstand. > It just piggy-backs on the same logic as mnt_add_to_ns() in that > mnt_add_to_ns() may be called to add mount into a newly allocated > namespaces without namespace semaphore held. So the locking context is > guaranteed by mnt_add_to_ns(). To put another way: if you have quarrels > with mnt_notify_add() you likely also have quarrels with the contextual > locking of mnt_add_to_ns(). Ah, okay, thanks for the explanation. Yeah, I guess I just didn't understand the larger context of what's going on here.
On Fri 29-05-26 21:38:08, Jann Horn wrote:
> The locking in mnt_notify_add(), which was introduced in commit
> bf630c401641 ("vfs: add notifications for mount attach and detach"), is a
> bit gnarly.
> notify_list is protected by namespace_lock, but there are cases where
> mnt_notify_add() is called without holding namespace_lock, for example:
>
> __do_sys_fsmount -> mnt_add_to_ns -> mnt_notify_add
>
> Luckily, in cases where the namespace_lock isn't held, the namespace is
> always freshly created and can't have any fsnotify marks yet, which means
> the notify_list isn't actually accessed.
>
> The existing comment claims that not accessing the notify_list in these
> cases is merely an optimization, which is wrong. Fix the comment, and add a
> locking assertion.
>
> To allow mnt_notify_add() to reference the namespace_sem, move it into
> fs/namespace.c.
>
> Signed-off-by: Jann Horn <jannh@google.com>
Thanks for looking into this. Adding Miklos to CC who actually wrote that
code. Also in cases like path_pivot_root() I don't think we hold the
namespace_sem so the locking is even more complicated if it is correct at
all. Miklos?
Honza
> ---
> I'm sending this patch because I spent some time staring at this
> trying to figure out if this was buggy or not.
>
> I don't know if this is working as intended or working by accident,
> and it might be nice if this was cleaned up to have simpler locking;
> but for now, document what's going on for the next person who stares
> at this code.
> ---
> fs/mount.h | 10 +---------
> fs/namespace.c | 19 +++++++++++++++++++
> 2 files changed, 20 insertions(+), 9 deletions(-)
>
> diff --git a/fs/mount.h b/fs/mount.h
> index e0816c11a198..99016db2f408 100644
> --- a/fs/mount.h
> +++ b/fs/mount.h
> @@ -219,15 +219,7 @@ static inline struct mnt_namespace *to_mnt_ns(struct ns_common *ns)
> }
>
> #ifdef CONFIG_FSNOTIFY
> -static inline void mnt_notify_add(struct mount *m)
> -{
> - /* Optimize the case where there are no watches */
> - if ((m->mnt_ns && m->mnt_ns->n_fsnotify_marks) ||
> - (m->prev_ns && m->prev_ns->n_fsnotify_marks))
> - list_add_tail(&m->to_notify, ¬ify_list);
> - else
> - m->prev_ns = m->mnt_ns;
> -}
> +void mnt_notify_add(struct mount *m);
> #else
> static inline void mnt_notify_add(struct mount *m)
> {
> diff --git a/fs/namespace.c b/fs/namespace.c
> index fe919abd2f01..56f130b49e58 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -6431,6 +6431,25 @@ bool mnt_may_suid(struct vfsmount *mnt)
> current_in_userns(mnt->mnt_sb->s_user_ns);
> }
>
> +#ifdef CONFIG_FSNOTIFY
> +void mnt_notify_add(struct mount *m)
> +{
> + /*
> + * notify_list is protected by namespace_sem.
> + * It is possible to call this function without holding namespace_sem,
> + * but in those cases, the mount is associated with a new mount
> + * namespace that can't have any fanotify marks yet.
> + */
> + if ((m->mnt_ns && m->mnt_ns->n_fsnotify_marks) ||
> + (m->prev_ns && m->prev_ns->n_fsnotify_marks)) {
> + rwsem_assert_held_write(&namespace_sem);
> + list_add_tail(&m->to_notify, ¬ify_list);
> + } else {
> + m->prev_ns = m->mnt_ns;
> + }
> +}
> +#endif
> +
> static struct ns_common *mntns_get(struct task_struct *task)
> {
> struct ns_common *ns = NULL;
>
> ---
> base-commit: 8fde5d1d47f69db6082dfa34500c27f8485389a5
> change-id: 20260529-vfs-mnt-notify-comment-4369e0558f0b
>
> Best regards,
> --
> Jann Horn <jannh@google.com>
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
On Mon, Jun 1, 2026 at 3:04 PM Jan Kara <jack@suse.cz> wrote:
> On Fri 29-05-26 21:38:08, Jann Horn wrote:
> > The locking in mnt_notify_add(), which was introduced in commit
> > bf630c401641 ("vfs: add notifications for mount attach and detach"), is a
> > bit gnarly.
> > notify_list is protected by namespace_lock, but there are cases where
> > mnt_notify_add() is called without holding namespace_lock, for example:
> >
> > __do_sys_fsmount -> mnt_add_to_ns -> mnt_notify_add
> >
> > Luckily, in cases where the namespace_lock isn't held, the namespace is
> > always freshly created and can't have any fsnotify marks yet, which means
> > the notify_list isn't actually accessed.
> >
> > The existing comment claims that not accessing the notify_list in these
> > cases is merely an optimization, which is wrong. Fix the comment, and add a
> > locking assertion.
> >
> > To allow mnt_notify_add() to reference the namespace_sem, move it into
> > fs/namespace.c.
> >
> > Signed-off-by: Jann Horn <jannh@google.com>
>
> Thanks for looking into this. Adding Miklos to CC who actually wrote that
> code. Also in cases like path_pivot_root() I don't think we hold the
> namespace_sem so the locking is even more complicated if it is correct at
> all. Miklos?
I was confused by that code too, but it turns out that actually does
hold the namespace_sem - it's just really hard to see because of
__cleanup() magic.
path_pivot_root() does LOCK_MOUNT(), which is a macro that expands to
LOCK_MOUNT_MAYBE_BENEATH(), which expands to something that calls
do_lock_mount() and does a `__cleanup(unlock_mount)` declaration.
do_lock_mount() does namespace_lock(), and on return from
path_pivot_root(), the cleanup handler unlock_mount() calls
__unlock_mount(), which calls namespace_unlock().
© 2016 - 2026 Red Hat, Inc.