[PATCH] md/raid5: Fix a deadlock of reshape and suspend

linan666@huaweicloud.com posted 1 patch 1 week ago
From: Li Nan <linan122@huawei.com>

Commit 868bba54a3bc ("md/raid5: fix a deadlock in the case that reshape is
interrupted") fixed a deadlock in raid5 reshape, but a similar hang can
still be hit by the mdadm test 25raid456-reshape-deadlock:

  INFO: task (udev-worker):63822 blocked for more than 122 seconds.
        Not tainted 6.18.0-rc2-g0555b5424915-dirty #153
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  __schedule
  schedule
  schedule_timeout
  wait_woken
  raid5_make_request
  md_handle_request
  md_submit_bio
  [...]
  blkdev_read_iter
  vfs_read
  ksys_read
  __x64_sys_read

The hang is triggered by the following sequence:
1) normal IO waits for reshape to progress
2) the user sets ACTION_FROZEN via ioctl
3) reshape is interrupted and cannot restart
4) the user tries to suspend the array while the active IO is still
   waiting for reshape

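With Kuai's previous fix, such IO is expected to fail in
make_stripe_request(). However, the IO sleeps in wait_woken() with
MAX_SCHEDULE_TIMEOUT, so if no wakeup arrives it never retries and never
reaches that check again. Set a 10 second timeout for wait_woken() so
that the blocked IO wakes up, retries, and fails in the next cycle.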

Signed-off-by: Li Nan <linan122@huawei.com>
---
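Note for reviewers: the idea behind the change can be illustrated with a
small userspace model. This is only a sketch under stated assumptions, not
kernel code: struct sim_array, sim_wait() and sim_blocked_io() are
hypothetical names invented for the illustration, and the real checks in
make_stripe_request() are more involved. It models an IO that is already
asleep waiting for reshape, and shows that an unbounded wait never gets
back to the re-check while a bounded wait does.

```c
#include <limits.h>
#include <stdbool.h>

/* Stands in for MAX_SCHEDULE_TIMEOUT: the waiter only runs again if woken. */
#define SIM_NO_TIMEOUT LONG_MAX
/* Cap on retry passes so this toy model always terminates. */
#define SIM_MAX_PASSES 8

enum sim_result { SIM_IO_DONE, SIM_IO_FAILED, SIM_IO_BLOCKED };

/* Hypothetical, heavily simplified array state; not the kernel's structs. */
struct sim_array {
	bool reshape_done;        /* reshape has passed the region the IO needs */
	bool reshape_interrupted; /* reshape stopped and cannot restart */
	int  wakeups;             /* explicit wake-ups that will still arrive */
};

/* Toy wait_woken(): returns true if the sleeping IO gets to run again. */
static bool sim_wait(struct sim_array *a, long timeout)
{
	if (a->wakeups > 0) {
		a->wakeups--;
		return true;
	}
	/* No waker left: only a finite timeout brings the task back. */
	return timeout != SIM_NO_TIMEOUT;
}

/* Models an IO that is already asleep waiting for reshape to progress. */
enum sim_result sim_blocked_io(struct sim_array *a, long timeout)
{
	for (int pass = 0; pass < SIM_MAX_PASSES; pass++) {
		if (!sim_wait(a, timeout))
			return SIM_IO_BLOCKED;  /* never runs again */
		if (a->reshape_done)
			return SIM_IO_DONE;
		if (a->reshape_interrupted)
			return SIM_IO_FAILED;   /* the re-check fails the IO */
	}
	return SIM_IO_BLOCKED; /* still waiting when the model gives up */
}
```

In this model, an interrupted reshape with no pending wake-ups leaves the
IO blocked when SIM_NO_TIMEOUT is passed, whereas any finite timeout lets
the loop run once more and fail the IO, which in turn would let a suspend
that waits for active IO make progress.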
 drivers/md/raid5.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index cdbc7eba5c54..957e712d2be9 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6185,7 +6185,7 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 			}
 
 			wait_woken(&wait, TASK_UNINTERRUPTIBLE,
-				   MAX_SCHEDULE_TIMEOUT);
+				   msecs_to_jiffies(10000));
 			continue;
 		}
 
-- 
2.39.2