bdev_mark_dead()'s @surprise == true means the device is already gone.
The filesystem callback fs_bdev_mark_dead() honours this and skips
sync_filesystem(), but the bare block device path (no ->mark_dead op)
lost its !surprise guard when the holder ->mark_dead callback was wired
up (see Fixes), and now calls sync_blockdev() unconditionally, which can
hang forever waiting on writeback that can no longer complete.

syzkaller hit this via nvme_reset_work()'s "I/O queues lost" path:
nvme_mark_namespaces_dead() -> blk_mark_disk_dead() ->
bdev_mark_dead(bdev, true) -> sync_blockdev() blocks in
folio_wait_writeback(), wedging the reset worker and every task waiting
on it.

Skip the sync on surprise removal, matching fs_bdev_mark_dead();
invalidate_bdev() still runs. Orderly removal (surprise == false) is
unchanged.

Found by FuzzNvme (syzkaller with the FEMU fuzzing framework).

Fixes: d8530de5a6e8 ("block: call into the file system for bdev_mark_dead")
Acked-by: Sungwoo Kim <iam@sung-woo.kim>
Acked-by: Dave Tian <daveti@purdue.edu>
Acked-by: Weidong Zhu <weizhu@fiu.edu>
Signed-off-by: Chao Shi <coshi036@gmail.com>
---
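Note for reviewers (not part of the commit message): the control flow
after this patch can be modelled in userspace as below. This is only a
sketch under stated assumptions. Every model_* name is a hypothetical
stand-in, not a kernel API, the bd_holder_lock handling is left out, and
the stand-in for sync_blockdev() just counts calls instead of waiting on
writeback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct model_bdev {
	/* Stand-in for the holder ->mark_dead op; NULL for a bare device. */
	void (*mark_dead)(struct model_bdev *bdev, bool surprise);
	int sync_calls;		/* times the sync_blockdev() stand-in ran */
	int invalidate_calls;	/* times the invalidate_bdev() stand-in ran */
};

static void model_sync_blockdev(struct model_bdev *bdev)
{
	/* The real call can block on writeback; the model only counts. */
	bdev->sync_calls++;
}

static void model_invalidate_bdev(struct model_bdev *bdev)
{
	bdev->invalidate_calls++;
}

/* Mirrors the patched branch structure, with locking omitted. */
static void model_bdev_mark_dead(struct model_bdev *bdev, bool surprise)
{
	if (bdev->mark_dead)
		bdev->mark_dead(bdev, surprise);
	else if (!surprise)
		model_sync_blockdev(bdev);

	/* Invalidation runs for both surprise and orderly removal. */
	model_invalidate_bdev(bdev);
}

/* Exercises both removal modes on a bare device with no holder op. */
static void model_self_check(void)
{
	struct model_bdev gone = { NULL, 0, 0 };
	struct model_bdev orderly = { NULL, 0, 0 };

	model_bdev_mark_dead(&gone, true);	/* surprise removal */
	assert(gone.sync_calls == 0);
	assert(gone.invalidate_calls == 1);

	model_bdev_mark_dead(&orderly, false);	/* orderly removal */
	assert(orderly.sync_calls == 1);
	assert(orderly.invalidate_calls == 1);
}
```

In this model, only the bare-device surprise case changes behaviour: the
sync stand-in is skipped while invalidation still runs once.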
 block/bdev.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/block/bdev.c b/block/bdev.c
index b8fbb9576110..7fc3f5ba22a3 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -1259,7 +1259,13 @@ void bdev_mark_dead(struct block_device *bdev, bool surprise)
 		bdev->bd_holder_ops->mark_dead(bdev, surprise);
 	else {
 		mutex_unlock(&bdev->bd_holder_lock);
-		sync_blockdev(bdev);
+		/*
+		 * On surprise removal the device is already gone; syncing is
+		 * futile and can hang forever waiting on I/O that will never
+		 * complete. Match fs_bdev_mark_dead(), which also skips it.
+		 */
+		if (!surprise)
+			sync_blockdev(bdev);
 	}
 
 	invalidate_bdev(bdev);
--
2.43.0