Updates documentation for do_lock_mount() in fs/namespace.c
to clarify its parameters and return description to fix
warning reported by the kernel test robot.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202506301911.uysRaP8b-lkp@intel.com/
Signed-off-by: Ryan Chung <seokwoo.chung130@gmail.com>
---
fs/namespace.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/namespace.c b/fs/namespace.c
index ddfd4457d338..577fdff9f1a8 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2741,6 +2741,7 @@ static int attach_recursive_mnt(struct mount *source_mnt,
 /**
  * do_lock_mount - lock mount and mountpoint
  * @path: target path
+ * @pinned: on success, holds a pin guarding the mountpoint
  * @beneath: whether the intention is to mount beneath @path
  *
  * Follow the mount stack on @path until the top mount @mnt is found. If
@@ -2769,8 +2770,7 @@ static int attach_recursive_mnt(struct mount *source_mnt,
  * to @mnt->mnt_mp->m_dentry. But if @mnt has been unmounted it will
  * point to @mnt->mnt_root and @mnt->mnt_mp will be NULL.
  *
- * Return: Either the target mountpoint on the top mount or the top
- * mount's mountpoint.
+ * Return: On success, 0 is returned. On failure, err is returned.
  */
 static int do_lock_mount(struct path *path, struct pinned_mountpoint *pinned, bool beneath)
 {
--
2.43.0

On Tue, Aug 19, 2025 at 02:22:35AM +0900, Ryan Chung wrote:
> Updates documentation for do_lock_mount() in fs/namespace.c
> to clarify its parameters and return description to fix
> warning reported by the kernel test robot.
>
> Reported-by: kernel test robot <lkp@intel.com>
> Closes: https://lore.kernel.org/oe-kbuild-all/202506301911.uysRaP8b-lkp@intel.com/
> Signed-off-by: Ryan Chung <seokwoo.chung130@gmail.com>
> ---
> fs/namespace.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index ddfd4457d338..577fdff9f1a8 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2741,6 +2741,7 @@ static int attach_recursive_mnt(struct mount *source_mnt,
>  /**
>   * do_lock_mount - lock mount and mountpoint
>   * @path: target path
> + * @pinned: on success, holds a pin guarding the mountpoint

I'm not sure if 'pin' is suitable here and in any case, that's not the only
problem in that description - take a look at the "Return:" part in there.

The underlying problem is the semantics of the function itself. lock_mount()
assumed that it was called on the result of pathname resolution; the
question is what to do if we race with somebody mounting something on top
of the same location while we had been grabbing namespace_sem?

"Follow through to the root of whatever's been mounted on top, same as
we'd have done if pathname resolution happened slightly later" used to be a
reasonable answer, but these days we have move_mount(2), where we have
	* MOVE_MOUNT_T_EMPTY_PATH combined with empty pathname, which will
have us start with whatever the descriptor is pointing to, mounts or no
mounts. Choosing to treat that as "follow mounts anyway" is not a big deal.
	* MOVE_MOUNT_BENEATH - treated as "follow mounts and slip the damn
thing under the topmost one". Again, OK for non-empty pathname, but... for
empty ones the rationale is weaker.

Alternative would be to treat these races as "act as if we'd won and
the other guy had overmounted ours", i.e. *NOT* follow mounts. Again,
for old syscalls that's fine - if another thread has raced with us and
mounted something on top of the place we want to mount on, it could just
as easily have come *after* we'd completed mount(2) and mounted their
stuff on top of ours. If userland is not fine with such outcome, it needs
to provide serialization between the callers. For move_mount(2)... again,
the only real question is empty to_path case.

Comments?

Note, BTW, that attach_recursive_mnt() used to require dest_mnt/dest_mp to
be on the very top; since 6.16 it treats that as "slip it under whatever's
on top of that" - that's exactly what happens in the 'beneath' case. So the
second alternative is easily doable these days. And it would really
simplify the lock_mount()/do_lock_mount()...
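
For readers following along, the two race policies discussed above can be sketched as a toy userspace model. This is not kernel code: the mount stack is reduced to an array index, and every name in it (`race_policy`, `attach_index`, and so on) is hypothetical. It only shows where a new mount would land in the pile under each policy, assuming the caller resolved to the mount at position `resolved` and other mounts may have been stacked on top of it in the meantime.

```c
#include <stddef.h>

/*
 * Toy model, NOT kernel code. A location is modelled as a stack of
 * mounts: index 0 is the bottom, index depth - 1 is the topmost mount.
 * All names are hypothetical and exist only for illustration.
 */
enum race_policy {
	FOLLOW_TO_TOP,	/* follow mounts up to the top of the pile */
	STAY_PUT,	/* act as if we had won the race: do NOT follow */
};

/*
 * Return the stack index at which the new mount is inserted; mounts
 * at that index and above move up by one.
 *
 * @resolved: index of the mount the caller originally looked up
 * @depth:    number of mounts currently in the stack (resolved < depth)
 * @policy:   which of the two race policies to model
 * @beneath:  non-zero models MOVE_MOUNT_BENEATH
 */
static size_t attach_index(size_t resolved, size_t depth,
			   enum race_policy policy, int beneath)
{
	/* The mount we attach relative to: topmost one, or the one we found. */
	size_t anchor = (policy == FOLLOW_TO_TOP) ? depth - 1 : resolved;

	/* "beneath" slips under the anchor; otherwise go right on top of it. */
	return beneath ? anchor : anchor + 1;
}
```

With no racing overmount (`resolved == depth - 1`) both policies give the same index, which is why the difference only matters in the race and in the empty-pathname cases discussed in this thread.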

On Mon, Aug 18, 2025 at 09:14:28PM +0100, Al Viro wrote:

> Alternative would be to treat these races as "act as if we'd won and
> the other guy had overmounted ours", i.e. *NOT* follow mounts. Again,
> for old syscalls that's fine - if another thread has raced with us and
> mounted something on top of the place we want to mount on, it could just
> as easily have come *after* we'd completed mount(2) and mounted their
> stuff on top of ours. If userland is not fine with such outcome, it needs
> to provide serialization between the callers. For move_mount(2)... again,
> the only real question is empty to_path case.
>
> Comments?

Thinking about it a bit more... Unfortunately, there's another corner
case: "." as mountpoint. That would affect the old syscalls as well
and I'm not sure that there's no userland code that relies upon the
current behaviour.

Background: pathname resolution does *NOT* follow mounts on the starting
point and it does not follow mounts after "."

; mkdir /tmp/foo
; mount -t tmpfs none /tmp/foo
; cd /tmp/foo
; echo under > a
; cat /tmp/foo/a
under
; mount -t tmpfs none /tmp/foo
; cat a
under
; cat /tmp/foo/a
cat: /tmp/foo/a: no such file or directory
; echo under > b
; cat b
under
; cat /tmp/foo/b
cat: /tmp/foo/b: no such file or directory
;

It's been a bad decision (if it can be called that - it's been more
of an accident, AFAICT), but it's decades too late to change it.
And interaction with mount is also fun: mount(2) *DOES* follow mounts
on the end of any pathname, no matter what. So in case when we are
standing in an overmounted directory, ls . will show the contents of
that directory, but mount <something> . will mount on top of whatever's
mounted there.

So the alternative I've mentioned above would change the behaviour of
old syscalls in a corner case that just might be actually used in userland
code - including the scripts run at the boot time, of all things ;-/

IOW, it probably falls under "can't touch that, no matter how much we'd
like to" ;-/ Pity, that...

That leaves the question of MOVE_MOUNT_BENEATH with empty pathname -
do we want a variant that would say "slide precisely under the opened
directory I gave you, no matter what might overmount it"?

At the very least this corner case needs to be documented in move_mount(2)
- behaviour of
	move_mount(_, _, dir_fd, "",
		   MOVE_MOUNT_T_EMPTY_PATH | MOVE_MOUNT_BENEATH)
has two a priori reasonable variants ("slide right under the top of
whatever pile there might be over dir_fd" and "slide right under dir_fd
itself, no matter what pile might be on top of that") and leaving it
unspecified is not good, IMO...
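
As a concrete reference for the call being discussed, here is a minimal sketch of issuing it from userspace. The wrapper name `move_mount_beneath_fd` is hypothetical, and the sketch assumes the source mount is also passed by descriptor (for example one obtained from fsmount(2) or open_tree(2)), hence the extra MOVE_MOUNT_F_EMPTY_PATH. The fallback definitions are there only for older headers; the flag values are the ones in include/uapi/linux/mount.h, and 429 is the syscall number on most architectures (an assumption that does not hold everywhere, e.g. alpha).

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for older headers; values as in include/uapi/linux/mount.h. */
#ifndef SYS_move_mount
#define SYS_move_mount 429	/* assumption: generic syscall numbering */
#endif
#ifndef MOVE_MOUNT_F_EMPTY_PATH
#define MOVE_MOUNT_F_EMPTY_PATH 0x00000004
#endif
#ifndef MOVE_MOUNT_T_EMPTY_PATH
#define MOVE_MOUNT_T_EMPTY_PATH 0x00000040
#endif
#ifndef MOVE_MOUNT_BENEATH
#define MOVE_MOUNT_BENEATH 0x00000200
#endif

/*
 * Hypothetical helper: move the mount referred to by @mnt_fd beneath
 * the location referred to by @dir_fd, using empty pathnames on both
 * sides. Returns 0 on success, -1 with errno set on failure.
 */
static int move_mount_beneath_fd(int mnt_fd, int dir_fd)
{
	return (int)syscall(SYS_move_mount, mnt_fd, "", dir_fd, "",
			    MOVE_MOUNT_F_EMPTY_PATH |
			    MOVE_MOUNT_T_EMPTY_PATH |
			    MOVE_MOUNT_BENEATH);
}
```

Whether the mount ends up under the top of the pile over dir_fd or under dir_fd itself is exactly the question raised above; the sketch does not decide it, it only shows the call whose behaviour needs documenting.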

On Mon, Aug 18, 2025 at 09:56:06PM +0100, Al Viro wrote:
> On Mon, Aug 18, 2025 at 09:14:28PM +0100, Al Viro wrote:
>
> > Alternative would be to treat these races as "act as if we'd won and
> > the other guy had overmounted ours", i.e. *NOT* follow mounts. Again,
> > for old syscalls that's fine - if another thread has raced with us and
> > mounted something on top of the place we want to mount on, it could just
> > as easily have come *after* we'd completed mount(2) and mounted their
> > stuff on top of ours. If userland is not fine with such outcome, it needs
> > to provide serialization between the callers. For move_mount(2)... again,
> > the only real question is empty to_path case.
> >
> > Comments?
>
> Thinking about it a bit more... Unfortunately, there's another corner
> case: "." as mountpoint. That would affect the old syscalls as well
> and I'm not sure that there's no userland code that relies upon the
> current behaviour.
>
> Background: pathname resolution does *NOT* follow mounts on the starting
> point and it does not follow mounts after "."
>
> ; mkdir /tmp/foo
> ; mount -t tmpfs none /tmp/foo
> ; cd /tmp/foo
> ; echo under > a
> ; cat /tmp/foo/a
> under
> ; mount -t tmpfs none /tmp/foo
> ; cat a
> under
> ; cat /tmp/foo/a
> cat: /tmp/foo/a: no such file or directory
> ; echo under > b
> ; cat b
> under
> ; cat /tmp/foo/b
> cat: /tmp/foo/b: no such file or directory
> ;
>
> It's been a bad decision (if it can be called that - it's been more
> of an accident, AFAICT), but it's decades too late to change it.
> And interaction with mount is also fun: mount(2) *DOES* follow mounts
> on the end of any pathname, no matter what. So in case when we are
> standing in an overmounted directory, ls . will show the contents of
> that directory, but mount <something> . will mount on top of whatever's
> mounted there.
>
> So the alternative I've mentioned above would change the behaviour of
> old syscalls in a corner case that just might be actually used in userland
> code - including the scripts run at the boot time, of all things ;-/
>
> IOW, it probably falls under "can't touch that, no matter how much we'd
> like to" ;-/ Pity, that...
>
> That leaves the question of MOVE_MOUNT_BENEATH with empty pathname -
> do we want a variant that would say "slide precisely under the opened
> directory I gave you, no matter what might overmount it"?

Afaict, right now MOVE_MOUNT_BENEATH will take the overmount into
account even for "." just like mount(2) will look up the topmost mount no
matter what. That is what userspace expects. I don't think we need a
variant where "." ignores overmounts for MOVE_MOUNT_BENEATH, and really
not unless someone has a specific use-case for it. If it comes to that
we should probably add a new flag.

>
> At the very least this corner case needs to be documented in move_mount(2)
> - behaviour of
> 	move_mount(_, _, dir_fd, "",
> 		   MOVE_MOUNT_T_EMPTY_PATH | MOVE_MOUNT_BENEATH)
> has two a priori reasonable variants ("slide right under the top of
> whatever pile there might be over dir_fd" and "slide right under dir_fd

Yes, that's what's intended and documented; it's also what I wrote in my
commit messages and what the selftests should test for. I specifically
did not make it deviate from standard mount(2) behavior.

> itself, no matter what pile might be on top of that") and leaving it
> unspecified is not good, IMO...

Sure, Aleksa can pull that into his documentation patches.

On Tue, Aug 19, 2025 at 11:40:14AM +0200, Christian Brauner wrote:
> On Mon, Aug 18, 2025 at 09:56:06PM +0100, Al Viro wrote:
> > On Mon, Aug 18, 2025 at 09:14:28PM +0100, Al Viro wrote:
> >
> > > Alternative would be to treat these races as "act as if we'd won and
> > > the other guy had overmounted ours", i.e. *NOT* follow mounts. Again,
> > > for old syscalls that's fine - if another thread has raced with us and
> > > mounted something on top of the place we want to mount on, it could just
> > > as easily have come *after* we'd completed mount(2) and mounted their
> > > stuff on top of ours. If userland is not fine with such outcome, it needs
> > > to provide serialization between the callers. For move_mount(2)... again,
> > > the only real question is empty to_path case.
> > >
> > > Comments?
> >
> > Thinking about it a bit more... Unfortunately, there's another corner
> > case: "." as mountpoint. That would affect the old syscalls as well
> > and I'm not sure that there's no userland code that relies upon the
> > current behaviour.
> >
> > Background: pathname resolution does *NOT* follow mounts on the starting
> > point and it does not follow mounts after "."
> >
> > ; mkdir /tmp/foo
> > ; mount -t tmpfs none /tmp/foo
> > ; cd /tmp/foo
> > ; echo under > a
> > ; cat /tmp/foo/a
> > under
> > ; mount -t tmpfs none /tmp/foo
> > ; cat a
> > under
> > ; cat /tmp/foo/a
> > cat: /tmp/foo/a: no such file or directory
> > ; echo under > b
> > ; cat b
> > under
> > ; cat /tmp/foo/b
> > cat: /tmp/foo/b: no such file or directory
> > ;
> >
> > It's been a bad decision (if it can be called that - it's been more
> > of an accident, AFAICT), but it's decades too late to change it.
> > And interaction with mount is also fun: mount(2) *DOES* follow mounts
> > on the end of any pathname, no matter what. So in case when we are
> > standing in an overmounted directory, ls . will show the contents of
> > that directory, but mount <something> . will mount on top of whatever's
> > mounted there.
> >
> > So the alternative I've mentioned above would change the behaviour of
> > old syscalls in a corner case that just might be actually used in userland
> > code - including the scripts run at the boot time, of all things ;-/
> >
> > IOW, it probably falls under "can't touch that, no matter how much we'd
> > like to" ;-/ Pity, that...
> >
> > That leaves the question of MOVE_MOUNT_BENEATH with empty pathname -
> > do we want a variant that would say "slide precisely under the opened
> > directory I gave you, no matter what might overmount it"?
>
> Afaict, right now MOVE_MOUNT_BENEATH will take the overmount into
> account even for "." just like mount(2) will look up the topmost mount no
> matter what. That is what userspace expects. I don't think we need a
> variant where "." ignores overmounts for MOVE_MOUNT_BENEATH, and really
> not unless someone has a specific use-case for it. If it comes to that
> we should probably add a new flag.
>
> >
> > At the very least this corner case needs to be documented in move_mount(2)
> > - behaviour of
> > 	move_mount(_, _, dir_fd, "",
> > 		   MOVE_MOUNT_T_EMPTY_PATH | MOVE_MOUNT_BENEATH)
> > has two a priori reasonable variants ("slide right under the top of
> > whatever pile there might be over dir_fd" and "slide right under dir_fd
>
> Yes, that's what's intended and documented; it's also what I wrote in my
> commit messages and what the selftests should test for. I specifically
> did not make it deviate from standard mount(2) behavior.
>
> > itself, no matter what pile might be on top of that") and leaving it
> > unspecified is not good, IMO...
>
> Sure, Aleksa can pull that into his documentation patches.

Hello all,

I am writing to follow up on this RFC patch. The last discussion was a
month ago and it seems like the conversation has stalled.

Thank you.

Best regards,
Ryan Chung