[PATCH] make new mount API honour SB_NOUSER (was Re: [PATCH] block: Avoid mounting the bdev pseudo-filesystem in userspace)

Al Viro posted 1 patch 6 days, 1 hour ago
[PATCH] make new mount API honour SB_NOUSER (was Re: [PATCH] block: Avoid mounting the bdev pseudo-filesystem in userspace)
Posted by Al Viro 6 days, 1 hour ago
one should *not* be allowed to mount one of those, new API or not.

Reported-by: Denis Arefev <arefev@swemel.ru>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
[[ I still want to see the rest of the reproducer - report smells like a missing
d_can_lookup() somewhere, on top of fsmount(2) bug]]
diff --git a/fs/namespace.c b/fs/namespace.c
index fe919abd2f01..17777c837683 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4499,6 +4499,10 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
 	new_mnt = vfs_create_mount(fc);
 	if (IS_ERR(new_mnt))
 		return PTR_ERR(new_mnt);
+	if (new_mnt->mnt_sb->s_flags & SB_NOUSER) {
+		mntput(new_mnt);
+		return -EINVAL;
+	}
 	new_mnt->mnt_flags = mnt_flags;
 
 	new_path.dentry = dget(fc->root);
Re: [PATCH] make new mount API honour SB_NOUSER (was Re: [PATCH] block: Avoid mounting the bdev pseudo-filesystem in userspace)
Posted by Christian Brauner 5 days, 12 hours ago
On Tue, 02 Jun 2026 03:04:44 +0100, Al Viro wrote:
> one should *not* be allowed to mount one of those, new API or not.

Applied to the vfs-7.2.misc branch of the vfs/vfs.git tree.
Patches in the vfs-7.2.misc branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-7.2.misc

[1/1] mount: honour SB_NOUSER in the new mount API 
      https://git.kernel.org/vfs/vfs/c/67d8c452fae1
Re: [PATCH] make new mount API honour SB_NOUSER (was Re: [PATCH] block: Avoid mounting the bdev pseudo-filesystem in userspace)
Posted by Jan Kara 5 days, 18 hours ago
On Tue 02-06-26 03:04:44, Al Viro wrote:
> one should *not* be allowed to mount one of those, new API or not.
> 
> Reported-by: Denis Arefev <arefev@swemel.ru>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Won't it make sense to actually check fc->sb_flags before we call
vfs_create_mount()? Otherwise it looks good to me.

								Honza

> ---
> [[ I still want to see the rest of the reproducer - report smells like a missing
> d_can_lookup() somewhere, on top of fsmount(2) bug]]
> diff --git a/fs/namespace.c b/fs/namespace.c
> index fe919abd2f01..17777c837683 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -4499,6 +4499,10 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
>  	new_mnt = vfs_create_mount(fc);
>  	if (IS_ERR(new_mnt))
>  		return PTR_ERR(new_mnt);
> +	if (new_mnt->mnt_sb->s_flags & SB_NOUSER) {
> +		mntput(new_mnt);
> +		return -EINVAL;
> +	}
>  	new_mnt->mnt_flags = mnt_flags;
>  
>  	new_path.dentry = dget(fc->root);
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
Re: [PATCH] make new mount API honour SB_NOUSER (was Re: [PATCH] block: Avoid mounting the bdev pseudo-filesystem in userspace)
Posted by Al Viro 5 days, 13 hours ago
On Tue, Jun 02, 2026 at 11:11:11AM +0200, Jan Kara wrote:
> On Tue 02-06-26 03:04:44, Al Viro wrote:
> > one should *not* be allowed to mount one of those, new API or not.
> > 
> > Reported-by: Denis Arefev <arefev@swemel.ru>
> > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> 
> Won't it make sense to actually check fc->sb_flags before we call
> vfs_create_mount()? Otherwise it looks good to me.

Interpretation of fc->sb_flags is up to your ->get_tree().  What matters
is ->s_flags in the resulting superblock; that's type-independent and
that's what we ought to check...
Re: [PATCH] make new mount API honour SB_NOUSER (was Re: [PATCH] block: Avoid mounting the bdev pseudo-filesystem in userspace)
Posted by Arefev 5 days, 13 hours ago
02.06.2026 12:11, Jan Kara пишет:
> On Tue 02-06-26 03:04:44, Al Viro wrote:
>> one should *not* be allowed to mount one of those, new API or not.
>>
>> Reported-by: Denis Arefev <arefev@swemel.ru>
>> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> Won't it make sense to actually check fc->sb_flags before we call
> vfs_create_mount()? Otherwise it looks good to me.
>
> 								Honza

Hi all.

The sequence of system calls before the crash could be as follows:

fsopen("bdev", ...)
fsconfig(fd_fs, FSCONFIG_CMD_CREATE, 0,0,0)
fsmount(fd_fs, 0,0)
move_mount(fd_mnt, "", AT_FDCWD, "./file1", 0x46ul)

The system call executed at the time of the cras:

open("/dev/media0", ...);

Simplified stacktrace:

path_openat
|-> link_path_walk
    |-> walk_component
       |-> __lookup_slow
          |-> ld = inode->i_op->lookup(inode, dentry, flags);   <- Oops


Searching for possible solutions in the commit history yielded the 
following result:

commit fd3e007f6c6a0f677e4ee8aca4b9bab8ad6cab9a
commit 1a6e9e76b713d9632783efe78295ed3507fdad64
commit d6f2589ad561aa5fa39f347eca6942668b7560a1

Checking the fc->sb_flags flag before calling vfs_create_mount() is a 
great idea,
if it helps prevent crashes in two more file systems, 'sockfs' and 'pipefs'.

Best regards, Denis.
>
>> ---
>> [[ I still want to see the rest of the reproducer - report smells like a missing
>> d_can_lookup() somewhere, on top of fsmount(2) bug]]
>> diff --git a/fs/namespace.c b/fs/namespace.c
>> index fe919abd2f01..17777c837683 100644
>> --- a/fs/namespace.c
>> +++ b/fs/namespace.c
>> @@ -4499,6 +4499,10 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
>>   	new_mnt = vfs_create_mount(fc);
>>   	if (IS_ERR(new_mnt))
>>   		return PTR_ERR(new_mnt);
>> +	if (new_mnt->mnt_sb->s_flags & SB_NOUSER) {
>> +		mntput(new_mnt);
>> +		return -EINVAL;
>> +	}
>>   	new_mnt->mnt_flags = mnt_flags;
>>   
>>   	new_path.dentry = dget(fc->root);
Re: [PATCH] make new mount API honour SB_NOUSER (was Re: [PATCH] block: Avoid mounting the bdev pseudo-filesystem in userspace)
Posted by Al Viro 5 days, 12 hours ago
On Tue, Jun 02, 2026 at 04:23:21PM +0300, Arefev wrote:

> The sequence of system calls before the crash could be as follows:
> 
> fsopen("bdev", ...)
> fsconfig(fd_fs, FSCONFIG_CMD_CREATE, 0,0,0)
> fsmount(fd_fs, 0,0)
> move_mount(fd_mnt, "", AT_FDCWD, "./file1", 0x46ul)

	Huh?  "file1" being a regular file or was it actually
a directory?  AFAICS, the d_is_dir() mismatch would be rejected
by do_move_mount()...

> The system call executed at the time of the cras:
> 
> open("/dev/media0", ...);
> 
> Simplified stacktrace:
> 
> path_openat
> |-> link_path_walk
>    |-> walk_component
>       |-> __lookup_slow
>          |-> ld = inode->i_op->lookup(inode, dentry, flags);   <- Oops

How the hell does that thing bound on top of "./file1" lead to
resolution of "/dev/media0" walking anywhere near it?  Something's
missing here.

> Checking the fc->sb_flags flag before calling vfs_create_mount() is a great
> idea,
> if it helps prevent crashes in two more file systems, 'sockfs' and 'pipefs'.

Calling vfs_create_mount() is not a problem; refusing to attach
the result if SB_NOUSER has ended up in ->s_flags is the right
thing to do, but I still would like to understand how did this call
of walk_component() manage to evade
                if (unlikely(!d_can_lookup(nd->path.dentry))) {
			if (nd->flags & LOOKUP_RCU) {
				if (!try_to_unlazy(nd))
					return -ECHILD;
			}
			return -ENOTDIR;
		}
on the previous iteration through link_path_walk() or, if it had been
the first one, the corresponding checks at chroot()/chdir()/fchdir() time.

Note that there are very legitimate objects with NULL ->lookup() - every
regular file is like that, obviously, but there also exist ones that look
like directories in mode bits, but still have NULL ->lookup().  See
d_flags_for_inode() and look for DCACHE_AUTODIR_TYPE there.

So whatever scenario has played out, you've got a call of walk_component()
with nd->path.dentry that should have failed d_can_lookup().  That ought
to have been prevented and this prevention would better be much closer
than anything fsmount(2) does.

Don't get me wrong - userland mounting of bdev and friends should not be
allowed, but that's not the only thing that went wrong in the reproducer.
BTW, how easy to trigger it is?  Is that "you need to run for a few months
on a bunch of boxen" or "run this sequence and it'll crash that way"?