An abnormal filesystem mount issue was found during recent testing:
disk_container=$(...kata-runtime...io.kubernets.docker.type=container...)
docker_id=$(...kata-runtime...io.katacontainers.disk_share=
"{"src":"/dev/sdb","dest":"/dev/test"}"...)
${docker} stop "$disk_container"
${docker} exec "$docker_id" mount /dev/test /tmp -->success!!
When "disk_container" is stopped, the created sda/sdb/sdc disks have
already been deleted, but inside "docker_id", /dev/test can still be
mounted successfully. The reason is that runc calls unshare, which
triggers clone_mnt() and increases the "sb->s_active" reference count.
As long as "docker_id" does not exit, the superblock keeps a reference
count. So when mounting, the old superblock is reused in sget_fc(), and
the mount succeeds even though the actual device no longer exists. The
whole process can be simplified as follows:
mkfs.ext4 -F /dev/sdb
mount /dev/sdb /mnt
mknod /dev/test b 8 16 # [sdb 8:16]
echo 1 > /sys/block/sdb/device/delete
mount /dev/test /mnt1 # -> mount success
Therefore, it is necessary to add an extra check. Solve this problem by
checking disk_live() in super_s_dev_test().
Fixes: aca740cecbe5 ("fs: open block device after superblock creation")
Link: https://lore.kernel.org/all/20250717091150.2156842-1-wozizhi@huawei.com/
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
---
fs/super.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/fs/super.c b/fs/super.c
index 80418ca8e215..8030fb519eb5 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1376,8 +1376,16 @@ static int super_s_dev_set(struct super_block *s, struct fs_context *fc)
static int super_s_dev_test(struct super_block *s, struct fs_context *fc)
{
- return !(s->s_iflags & SB_I_RETIRED) &&
- s->s_dev == *(dev_t *)fc->sget_key;
+ if (s->s_iflags & SB_I_RETIRED)
+ return false;
+
+ if (s->s_dev != *(dev_t *)fc->sget_key)
+ return false;
+
+ if (s->s_bdev && !disk_live(s->s_bdev->bd_disk))
+ return false;
+
+ return true;
}
/**
--
2.39.2
On 2025/7/19 10:44, Zizhi Wo wrote:
> A filesystem abnormal mount issue was found during current testing:
[...]
> Therefore, it is necessary to add an extra check. Solve this problem by
> checking disk_live() in super_s_dev_test().

Thanks for all the feedback. I will take some time to prepare a v2
version of the fix.

Thanks,
Zizhi Wo
Hi Zizhi,

kernel test robot noticed the following build errors:

[auto build test ERROR on brauner-vfs/vfs.all]
[also build test ERROR on jack-fs/for_next linus/master v6.16-rc6 next-20250718]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Zizhi-Wo/fs-Add-additional-checks-for-block-devices-during-mount/20250719-105053
base:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git vfs.all
patch link:    https://lore.kernel.org/r/20250719024403.3452285-1-wozizhi%40huawei.com
patch subject: [PATCH] fs: Add additional checks for block devices during mount
config: i386-buildonly-randconfig-001-20250719 (https://download.01.org/0day-ci/archive/20250719/202507192025.N75TF4Gp-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250719/202507192025.N75TF4Gp-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202507192025.N75TF4Gp-lkp@intel.com/

All errors (new ones prefixed by >>):

>> ld.lld: error: undefined symbol: disk_live
   >>> referenced by super.c:1385 (fs/super.c:1385)
   >>> fs/super.o:(super_s_dev_test) in archive vmlinux.a

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
On 2025/7/19 20:32, kernel test robot wrote:
> kernel test robot noticed the following build errors:
[...]
> All errors (new ones prefixed by >>):
>
> >> ld.lld: error: undefined symbol: disk_live
> >>> referenced by super.c:1385 (fs/super.c:1385)
> >>> fs/super.o:(super_s_dev_test) in archive vmlinux.a

Sorry, disk_live() is only declared but not defined when CONFIG_BLOCK is
not set... Perhaps, as hch suggests, it can be verified through GD_DEAD?

Thanks,
Zizhi Wo
On Mon, Jul 21, 2025 at 09:20:27AM +0800, Zizhi Wo wrote:
> Sorry, disk_live() is only declared but not defined when CONFIG_BLOCK is
> not set...

You can just add an if (IS_ENABLED(CONFIG_BLOCK)) check around it.

But the layering here feels wrong. sget_dev and its helpers operate
purely on the dev_t. Anything actually dealing with a block device /
gendisk should be in the helpers that otherwise use it.
On Mon, Jul 21, 2025 at 08:47:12AM +0200, Christoph Hellwig wrote:
> You can just add an if (IS_ENABLED(CONFIG_BLOCK)) check around it.
>
> But the layering here feels wrong. sget_dev and its helpers operate
> purely on the dev_t. Anything actually dealing with a block device /
> gendisk should be in the helpers that otherwise use it.

Either we add a lookup_bdev_alive() variant that checks whether the
inode is still hashed when looking up the dev_t, and we accept that this
deals with the obvious cases and that it is racy...

Or we add lookup_bdev_no_open() that returns the block device with the
reference bumped, paired with lookup_bdev_put_no_open(). Afaiu, that
would prevent deletion. We could even put this behind a
guard(bdev_no_open)(fc->source). The reference count bump shouldn't
matter there. Christoph?
On Wed, Jul 23, 2025 at 02:51:27PM +0200, Christian Brauner wrote:
> Either we add a lookup_bdev_alive() variant that checks whether the
> inode is still hashed when looking up the dev_t and we accept that this
> deals with the obvious cases and accept that this is racy...

I don't think raciness matters here. The block device can die at any
time, and the addition here is just to catch the cases where it might
have already been dead, for a nicer interface.

> Or we add lookup_bdev_no_open() that returns the block device with the
> reference bumped, paired with lookup_bdev_put_no_open(). Afaiu, that
> would prevent deletion. We could even put this behind a
> guard(bdev_no_open)(fc->source). The reference count bump shouldn't
> matter there. Christoph?

Nothing prevents deletion, it will only get delayed until after the
open_mutex critical section. I still think GD_DEAD is the best check
here, as it potentially gets set earlier than unhashing the inode, but
in the end both of the racy checks should be perfectly fine.
On 2025/7/21 14:47, Christoph Hellwig wrote:
> On Mon, Jul 21, 2025 at 09:20:27AM +0800, Zizhi Wo wrote:
>> Sorry, disk_live() is only declared but not defined when CONFIG_BLOCK is
>> not set...
>
> You can just add an if (IS_ENABLED(CONFIG_BLOCK)) check around it.

Yes, adding this check directly is also fine.

> But the layering here feels wrong. sget_dev and its helpers operate
> purely on the dev_t. Anything actually dealing with a block device /
> gendisk should be in the helpers that otherwise use it.

Do you mean performing the check outside of sget_dev()? That is, after
we obtain an existing superblock, we then check whether the block
device exists, and if it doesn't, report the error in the outer layer
(e.g., in get_tree_bdev_flags(), which seems to operate on the bdev
rather than just the dev_t)?

Thanks,
Zizhi Wo
On Sat, Jul 19, 2025 at 10:44:03AM +0800, Zizhi Wo wrote:
> mkfs.ext4 -F /dev/sdb
> mount /dev/sdb /mnt
> mknod /dev/test b 8 16 # [sdb 8:16]
> echo 1 > /sys/block/sdb/device/delete
> mount /dev/test /mnt1 # -> mount success
>
> Therefore, it is necessary to add an extra check. Solve this problem by
> checking disk_live() in super_s_dev_test().

That smells like a wrong approach... You are counting upon the failure
of setup_bdev_super() after the thing is forced on the "no reuse" path,
and that's too convoluted and brittle...
On 2025/7/19 12:17, Al Viro wrote:
> On Sat, Jul 19, 2025 at 10:44:03AM +0800, Zizhi Wo wrote:
>
>> mkfs.ext4 -F /dev/sdb
>> mount /dev/sdb /mnt
>> mknod /dev/test b 8 16 # [sdb 8:16]
>> echo 1 > /sys/block/sdb/device/delete
>> mount /dev/test /mnt1 # -> mount success
>>
>> Therefore, it is necessary to add an extra check. Solve this problem by
>> checking disk_live() in super_s_dev_test().
>
> That smells like a wrong approach... You are counting upon the failure
> of setup_bdev_super() after the thing is forced on the "no reuse" path,
> and that's too convoluted and brittle...

Since this problem can only occur in the superblock reuse scenario, and
the .test function used in sget_fc() for bdev filesystems is
super_s_dev_test(), I considered adding an additional condition check
within that function. I'm wondering if there's a better way to handle
this...

Thanks,
Zizhi Wo