An abnormal filesystem mount issue was found during recent testing:
disk_container=$(...kata-runtime...io.kubernets.docker.type=container...)
docker_id=$(...kata-runtime...io.katacontainers.disk_share=
"{"src":"/dev/sdb","dest":"/dev/test"}"...)
${docker} stop "$disk_container"
${docker} exec "$docker_id" mount /dev/test /tmp -->success!!
When "disk_container" is stopped, the created sda/sdb/sdc disks have
already been deleted, but inside "docker_id", /dev/test can still be
mounted successfully. The reason is that runc calls unshare, which
triggers clone_mnt() and increases the "sb->s_active" reference count.
As long as "docker_id" does not exit, the superblock keeps a reference
count. So when mounting, the old superblock is reused in sget_fc(), and
the mount succeeds even though the actual device no longer exists. The
whole process can be simplified as follows:
mkfs.ext4 -F /dev/sdb
mount /dev/sdb /mnt
mknod /dev/test b 8 16 # [sdb 8:16]
echo 1 > /sys/block/sdb/device/delete
mount /dev/test /mnt1 # -> mount success
Therefore, it is necessary to add an extra check. Solve this problem by
checking disk_live() in super_s_dev_test().
Fixes: aca740cecbe5 ("fs: open block device after superblock creation")
Link: https://lore.kernel.org/all/20250717091150.2156842-1-wozizhi@huawei.com/
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
---
fs/super.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/fs/super.c b/fs/super.c
index 80418ca8e215..8030fb519eb5 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1376,8 +1376,16 @@ static int super_s_dev_set(struct super_block *s, struct fs_context *fc)
static int super_s_dev_test(struct super_block *s, struct fs_context *fc)
{
- return !(s->s_iflags & SB_I_RETIRED) &&
- s->s_dev == *(dev_t *)fc->sget_key;
+ if (s->s_iflags & SB_I_RETIRED)
+ return false;
+
+ if (s->s_dev != *(dev_t *)fc->sget_key)
+ return false;
+
+ if (s->s_bdev && !disk_live(s->s_bdev->bd_disk))
+ return false;
+
+ return true;
}
/**
--
2.39.2
On 2025/7/19 10:44, Zizhi Wo wrote:
> A filesystem abnormal mount issue was found during current testing:
[...]
> Therefore, it is necessary to add an extra check. Solve this problem by
> checking disk_live() in super_s_dev_test().

Thanks for all the feedback. I will take some time to prepare a v2
version of the fix.

Thanks,
Zizhi Wo
Hi Zizhi,

kernel test robot noticed the following build errors:

[auto build test ERROR on brauner-vfs/vfs.all]
[also build test ERROR on jack-fs/for_next linus/master v6.16-rc6 next-20250718]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Zizhi-Wo/fs-Add-additional-checks-for-block-devices-during-mount/20250719-105053
base:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git vfs.all
patch link:    https://lore.kernel.org/r/20250719024403.3452285-1-wozizhi%40huawei.com
patch subject: [PATCH] fs: Add additional checks for block devices during mount
config: i386-buildonly-randconfig-001-20250719 (https://download.01.org/0day-ci/archive/20250719/202507192025.N75TF4Gp-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250719/202507192025.N75TF4Gp-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202507192025.N75TF4Gp-lkp@intel.com/

All errors (new ones prefixed by >>):

>> ld.lld: error: undefined symbol: disk_live
   >>> referenced by super.c:1385 (fs/super.c:1385)
   >>> fs/super.o:(super_s_dev_test) in archive vmlinux.a

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
On 2025/7/19 20:32, kernel test robot wrote:
> kernel test robot noticed the following build errors:
[...]
> All errors (new ones prefixed by >>):
>
> >> ld.lld: error: undefined symbol: disk_live
> >>> referenced by super.c:1385 (fs/super.c:1385)
> >>> fs/super.o:(super_s_dev_test) in archive vmlinux.a

Sorry, disk_live() is only declared but not defined when CONFIG_BLOCK is
not set... Perhaps, as hch suggests, it can be verified through GD_DEAD?

Thanks,
Zizhi Wo
On Mon, Jul 21, 2025 at 09:20:27AM +0800, Zizhi Wo wrote:
> Sorry, disk_live() is only declared but not defined when CONFIG_BLOCK is
> not set...

You can just add an if (IS_ENABLED(CONFIG_BLOCK)) check around it.

But the layering here feels wrong. sget_dev and its helpers operate
purely on the dev_t. Anything actually dealing with a block device /
gendisk should be in the helpers that otherwise use it.
On Mon, Jul 21, 2025 at 08:47:12AM +0200, Christoph Hellwig wrote:
> You can just add an if (IS_ENABLED(CONFIG_BLOCK)) check around it.
>
> But the layering here feels wrong. sget_dev and its helpers operate
> purely on the dev_t. Anything actually dealing with a block device /
> gendisk should be in the helpers that otherwise use it.

Either we add a lookup_bdev_alive() variant that checks whether the
inode is still hashed when looking up the dev_t, and we accept that this
deals with the obvious cases and that it is racy...

Or we add lookup_bdev_no_open() that returns the block device with the
reference bumped, paired with lookup_bdev_put_no_open(). Afaiu, that
would prevent deletion. We could even put this behind a
guard(bdev_no_open)(fc->source). The reference count bump shouldn't
matter there. Christoph?
On Wed, Jul 23, 2025 at 02:51:27PM +0200, Christian Brauner wrote:
> Either we add a lookup_bdev_alive() variant that checks whether the
> inode is still hashed when looking up the dev_t and we accept that this
> deals with the obvious cases and accept that this is racy...

I don't think raciness matters here. The block device can die at any
time, and the addition here is just to catch the cases where it might
have already been dead, for a nicer interface.

> Or we add lookup_bdev_no_open() that returns the block device with the
> reference bumped, paired with lookup_bdev_put_no_open(). Afaiu, that
> would prevent deletion. We could even put this behind a
> guard(bdev_no_open)(fc->source). The reference count bump shouldn't
> matter there. Christoph?

Nothing prevents deletion, it will only get delayed until after the
open_mutex critical section. I still think GD_DEAD is the best check
here, as it potentially gets set earlier than unhashing the inode, but
in the end both of the racy checks should be perfectly fine.
On 2025/7/21 14:47, Christoph Hellwig wrote:
> On Mon, Jul 21, 2025 at 09:20:27AM +0800, Zizhi Wo wrote:
>> Sorry, disk_live() is only declared but not defined when CONFIG_BLOCK is
>> not set...
>
> You can just add an if (IS_ENABLED(CONFIG_BLOCK)) check around it.

Yes, adding this check directly is also fine.

> But the layering here feels wrong. sget_dev and its helpers operate
> purely on the dev_t. Anything actually dealing with a block device /
> gendisk should be in the helpers that otherwise use it.

Do you mean performing the check outside of sget_dev()? That is, after
we obtain an existing superblock, we then check whether the block
device exists, and if it doesn't, report the error in the outer layer
(e.g., in get_tree_bdev_flags(), which seems to operate on the bdev
rather than just the dev_t)?

Thanks,
Zizhi Wo
On Sat, Jul 19, 2025 at 10:44:03AM +0800, Zizhi Wo wrote:
> mkfs.ext4 -F /dev/sdb
> mount /dev/sdb /mnt
> mknod /dev/test b 8 16 # [sdb 8:16]
> echo 1 > /sys/block/sdb/device/delete
> mount /dev/test /mnt1 # -> mount success
>
> Therefore, it is necessary to add an extra check. Solve this problem by
> checking disk_live() in super_s_dev_test().

That smells like a wrong approach... You are counting upon the failure
of setup_bdev_super() after the thing is forced on the "no reuse" path,
and that's too convoluted and brittle...
On 2025/7/19 12:17, Al Viro wrote:
> On Sat, Jul 19, 2025 at 10:44:03AM +0800, Zizhi Wo wrote:
>
>> mkfs.ext4 -F /dev/sdb
>> mount /dev/sdb /mnt
>> mknod /dev/test b 8 16 # [sdb 8:16]
>> echo 1 > /sys/block/sdb/device/delete
>> mount /dev/test /mnt1 # -> mount success
>>
>> Therefore, it is necessary to add an extra check. Solve this problem by
>> checking disk_live() in super_s_dev_test().
>
> That smells like a wrong approach... You are counting upon the failure
> of setup_bdev_super() after the thing is forced on the "no reuse" path,
> and that's too convoluted and brittle...

Since this problem can only occur in the superblock reuse scenario, and
the .test function used in sget_fc() for bdev filesystems is
super_s_dev_test(), I considered adding an additional condition check
within that function. I'm wondering if there's a better way to handle
this...

Thanks,
Zizhi Wo