From: Guopeng Zhang <zhangguopeng@kylinos.cn>
Changing mount propagation through the legacy mount API changes
user-visible mountinfo contents, including the shared: and master:
optional fields.
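For readers unfamiliar with these fields, here is a minimal userspace sketch of how the shared: and master: tags might be pulled out of one mountinfo line. It assumes the field layout documented in proc(5): six fixed fields, then zero or more optional fields, then a "-" separator. The names `struct propagation` and `parse_propagation()` are hypothetical and are not part of this patch.

```c
#include <stdlib.h>
#include <string.h>

/* Peer-group IDs found on one mountinfo line; 0 means the tag was absent. */
struct propagation {
	int shared;
	int master;
};

/*
 * Scan the optional fields (field 7 up to the "-" separator) of a single
 * mountinfo line. Returns 0 on success, -1 if no separator was found.
 */
static int parse_propagation(const char *line, struct propagation *out)
{
	const char *p = line;
	int field = 0;

	out->shared = 0;
	out->master = 0;

	while (*p) {
		size_t len;

		p += strspn(p, " \n");    /* skip field separators */
		len = strcspn(p, " \n");  /* length of the current field */
		if (len == 0)
			break;
		field++;
		if (field > 6) {          /* past the six fixed fields */
			if (len == 1 && p[0] == '-')
				return 0; /* end of the optional fields */
			if (len > 7 && strncmp(p, "shared:", 7) == 0)
				out->shared = atoi(p + 7);
			else if (len > 7 && strncmp(p, "master:", 7) == 0)
				out->master = atoi(p + 7);
		}
		p += len;
	}
	return -1;
}
```

A monitor would typically feed each line read from /proc/self/mountinfo to such a function and compare the result with the previous snapshot, which is why a propagation-only change is visible to userspace.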
The mount_setattr() path already touches the mount namespace after
change_mnt_propagation(), so pollers of /proc/<pid>/mountinfo are woken
when the namespace event changes.

The legacy mount --make-* path also changes propagation through
change_mnt_propagation(), and MOVE_MOUNT_SET_GROUP updates the
propagation relationship of the target mount. Both paths currently
return without touching the affected mount namespace.

As a result, userspace polling /proc/<pid>/mountinfo can miss these
propagation-only changes even though mountinfo has changed.

A simple reproducer that polls /proc/self/mountinfo while changing
propagation shows the inconsistency.

Before this change:

  legacy MS_SHARED: poll ret=0 revents=0x0
  mount_setattr MS_SHARED: poll ret=1 revents=0xa

After this change:

  legacy MS_SHARED: poll ret=1 revents=0xa
  mount_setattr MS_SHARED: poll ret=1 revents=0xa
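The reproducer itself is not included in this mail. The following is only a rough sketch of what the legacy half of such a test might look like, assuming a caller with CAP_SYS_ADMIN and an existing mount point. The helper names `drain()`, `revents_str()` and `poll_after_make_shared()` are made up for illustration. On Linux, revents=0xa decodes to POLLPRI|POLLERR.

```c
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <sys/mount.h>
#include <unistd.h>

/* Read mountinfo to EOF so only later changes are of interest. */
static void drain(int fd)
{
	char buf[4096];

	lseek(fd, 0, SEEK_SET);
	while (read(fd, buf, sizeof(buf)) > 0)
		;
}

/* Decode the two revents bits that appear in the output above. */
static void revents_str(short revents, char *out, size_t len)
{
	snprintf(out, len, "%s%s%s",
		 (revents & POLLPRI) ? "POLLPRI" : "",
		 ((revents & POLLPRI) && (revents & POLLERR)) ? "|" : "",
		 (revents & POLLERR) ? "POLLERR" : "");
}

/*
 * Mark @mountpoint shared through the legacy mount(2) API, then poll
 * /proc/self/mountinfo. Returns poll()'s result (0 = timed out,
 * 1 = event reported) or -1 if open() or mount() failed.
 */
static int poll_after_make_shared(const char *mountpoint, int timeout_ms,
				  short *revents)
{
	struct pollfd pfd = { .events = POLLPRI };
	int ret;

	pfd.fd = open("/proc/self/mountinfo", O_RDONLY);
	if (pfd.fd < 0)
		return -1;
	drain(pfd.fd);

	/* Legacy path, comparable to "mount --make-shared <mountpoint>". */
	if (mount(NULL, mountpoint, NULL, MS_SHARED, NULL) < 0) {
		close(pfd.fd);
		return -1;
	}

	ret = poll(&pfd, 1, timeout_ms);
	*revents = pfd.revents;
	close(pfd.fd);
	return ret;
}
```

Run against a private test mount inside a separate mount namespace, a helper like this would be expected to time out (ret=0) before the change and report an event (ret=1) after it, matching the lines above. The mount_setattr() half is omitted here.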
Touch the affected mount namespace after successfully changing
propagation state in do_change_type() and do_set_group(). Take the
vfsmount lock for write around touch_mnt_namespace(), as required by
its locking rules.
Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
---
fs/namespace.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/fs/namespace.c b/fs/namespace.c
index 9a66a806a9b8..f871c7bf3bc8 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2908,6 +2908,10 @@ static int do_change_type(const struct path *path, int ms_flags)
 	for (m = mnt; m; m = (recurse ? next_mnt(m, mnt) : NULL))
 		change_mnt_propagation(m, type);
 
+	lock_mount_hash();
+	touch_mnt_namespace(mnt->mnt_ns);
+	unlock_mount_hash();
+
 	return 0;
 }
 
@@ -3479,6 +3483,11 @@ static int do_set_group(const struct path *from_path, const struct path *to_path
 		list_add(&to->mnt_share, &from->mnt_share);
 		set_mnt_shared(to);
 	}
+
+	lock_mount_hash();
+	touch_mnt_namespace(to->mnt_ns);
+	unlock_mount_hash();
+
 	return 0;
 }
--
2.43.0
On Fri, May 29, 2026 at 05:54:41PM +0800, Guopeng Zhang wrote:
> From: Guopeng Zhang <zhangguopeng@kylinos.cn>
>
> Changing mount propagation through the legacy mount API changes
> user-visible mountinfo contents, including the shared: and master:
> optional fields.
>
> The mount_setattr() path already touches the mount namespace after
> change_mnt_propagation(), so pollers of /proc/<pid>/mountinfo are woken
> when the namespace event changes.
>
> The legacy mount --make-* path also changes propagation through
> change_mnt_propagation(), and MOVE_MOUNT_SET_GROUP updates the
> propagation relationship of the target mount. Both paths currently
> return without touching the affected mount namespace.
>
> As a result, userspace polling /proc/<pid>/mountinfo can miss these
> propagation-only changes even though mountinfo has changed.
>
> A simple reproducer that polls /proc/self/mountinfo while changing
> propagation shows the inconsistency.
>
> Before this change:
>
>   legacy MS_SHARED: poll ret=0 revents=0x0
>   mount_setattr MS_SHARED: poll ret=1 revents=0xa
>
> After this change:
>
>   legacy MS_SHARED: poll ret=1 revents=0xa
>   mount_setattr MS_SHARED: poll ret=1 revents=0xa
>
> Touch the affected mount namespace after successfully changing
> propagation state in do_change_type() and do_set_group(). Take the
> vfsmount lock for write around touch_mnt_namespace(), as required by
> its locking rules.
>
> Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
> ---
>  fs/namespace.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 9a66a806a9b8..f871c7bf3bc8 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2908,6 +2908,10 @@ static int do_change_type(const struct path *path, int ms_flags)
>  	for (m = mnt; m; m = (recurse ? next_mnt(m, mnt) : NULL))
>  		change_mnt_propagation(m, type);
>
> +	lock_mount_hash();
> +	touch_mnt_namespace(mnt->mnt_ns);
> +	unlock_mount_hash();
> +
>  	return 0;
>  }
>
> @@ -3479,6 +3483,11 @@ static int do_set_group(const struct path *from_path, const struct path *to_path
>  		list_add(&to->mnt_share, &from->mnt_share);
>  		set_mnt_shared(to);
>  	}
> +
> +	lock_mount_hash();
> +	touch_mnt_namespace(to->mnt_ns);
> +	unlock_mount_hash();

Doing this would cause seqcount readers to retry on mount propagation
changes when all of them really only care about mount topology changes.
So this can likely use:

	guard(mount_locked_reader)();
	touch_mnt_namespace(mnt_ns);

Even today, observing an unchanged seqcount across mnt->mnt_flags reads
doesn't guarantee that it really wasn't changed.
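The retry concern raised in the review can be illustrated with a simplified, single-threaded userspace model of a sequence counter. This is not the kernel's seqcount implementation. It assumes, as the review comment implies, that lock_mount_hash() advances the mount seqcount while the suggested mount_locked_reader guard does not. All names below are hypothetical.

```c
#include <stdbool.h>

/* Toy model of a sequence counter; odd means "update in progress". */
struct seqcount_model {
	unsigned int seq;
};

/* Writer side: bump the count on entry and on exit. */
static void model_write_begin(struct seqcount_model *s) { s->seq++; }
static void model_write_end(struct seqcount_model *s)   { s->seq++; }

/* Reader side: snapshot the count with the low bit cleared. */
static unsigned int model_read_begin(const struct seqcount_model *s)
{
	return s->seq & ~1u;
}

/* A reader must retry if the count moved since its snapshot. */
static bool model_read_retry(const struct seqcount_model *s,
			     unsigned int start)
{
	return s->seq != start;
}

/*
 * Returns true if a reader that started before the notification has to
 * retry. @take_write_side models doing the notification under the
 * write-side lock; false models a path that leaves the count untouched.
 */
static bool reader_retries(bool take_write_side)
{
	struct seqcount_model s = { 0 };
	unsigned int start = model_read_begin(&s);

	if (take_write_side) {
		model_write_begin(&s);
		/* the poll notification would happen here */
		model_write_end(&s);
	}
	return model_read_retry(&s, start);
}
```

In this model, `reader_retries(true)` reports a retry even though nothing the reader cares about changed, while `reader_retries(false)` does not. That is the cost the review suggests avoiding for propagation-only updates.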
On 2026/5/29 18:23, Christian Brauner wrote:
> On Fri, May 29, 2026 at 05:54:41PM +0800, Guopeng Zhang wrote:
>> From: Guopeng Zhang <zhangguopeng@kylinos.cn>
>>
>> Changing mount propagation through the legacy mount API changes
>> user-visible mountinfo contents, including the shared: and master:
>> optional fields.
>>
>> The mount_setattr() path already touches the mount namespace after
>> change_mnt_propagation(), so pollers of /proc/<pid>/mountinfo are woken
>> when the namespace event changes.
>>
>> The legacy mount --make-* path also changes propagation through
>> change_mnt_propagation(), and MOVE_MOUNT_SET_GROUP updates the
>> propagation relationship of the target mount. Both paths currently
>> return without touching the affected mount namespace.
>>
>> As a result, userspace polling /proc/<pid>/mountinfo can miss these
>> propagation-only changes even though mountinfo has changed.
>>
>> A simple reproducer that polls /proc/self/mountinfo while changing
>> propagation shows the inconsistency.
>>
>> Before this change:
>>
>>   legacy MS_SHARED: poll ret=0 revents=0x0
>>   mount_setattr MS_SHARED: poll ret=1 revents=0xa
>>
>> After this change:
>>
>>   legacy MS_SHARED: poll ret=1 revents=0xa
>>   mount_setattr MS_SHARED: poll ret=1 revents=0xa
>>
>> Touch the affected mount namespace after successfully changing
>> propagation state in do_change_type() and do_set_group(). Take the
>> vfsmount lock for write around touch_mnt_namespace(), as required by
>> its locking rules.
>>
>> Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
>> ---
>>  fs/namespace.c | 9 +++++++++
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/fs/namespace.c b/fs/namespace.c
>> index 9a66a806a9b8..f871c7bf3bc8 100644
>> --- a/fs/namespace.c
>> +++ b/fs/namespace.c
>> @@ -2908,6 +2908,10 @@ static int do_change_type(const struct path *path, int ms_flags)
>>  	for (m = mnt; m; m = (recurse ? next_mnt(m, mnt) : NULL))
>>  		change_mnt_propagation(m, type);
>>
>> +	lock_mount_hash();
>> +	touch_mnt_namespace(mnt->mnt_ns);
>> +	unlock_mount_hash();
>> +
>>  	return 0;
>>  }
>>
>> @@ -3479,6 +3483,11 @@ static int do_set_group(const struct path *from_path, const struct path *to_path
>>  		list_add(&to->mnt_share, &from->mnt_share);
>>  		set_mnt_shared(to);
>>  	}
>> +
>> +	lock_mount_hash();
>> +	touch_mnt_namespace(to->mnt_ns);
>> +	unlock_mount_hash();
>
> Doing this would cause seqcount readers to retry on mount propagation
> changes when all of them really only care about mount topology changes.
> So this can likely use:
>
> 	guard(mount_locked_reader)();
> 	touch_mnt_namespace(mnt_ns);
>
> Even today, observing an unchanged seqcount across mnt->mnt_flags reads
> doesn't guarantee that it really wasn't changed.

Hi Christian,

Thanks for the review and explanation.

I will send a v2 with the suggested changes.

Thanks,
Guopeng