[PATCH -next v2 6/6] md: enhance checking in md_check_recovery()

Yu Kuai posted 6 patches 2 years, 8 months ago
Posted by Yu Kuai 2 years, 8 months ago
From: Yu Kuai <yukuai3@huawei.com>

For md_check_recovery():

1) if 'MD_RECOVERY_RUNNING' is not set, register new sync_thread.
2) if 'MD_RECOVERY_RUNNING' is set:
 a) if 'MD_RECOVERY_DONE' is not set, don't do anything, wait for
   md_do_sync() to be done.
 b) if 'MD_RECOVERY_DONE' is set, unregister sync_thread. Current code
   expects that sync_thread is not NULL, otherwise new sync_thread will
   be registered, which will corrupt the array.

Make sure md_check_recovery() won't register a new sync_thread if
'MD_RECOVERY_RUNNING' is still set, and add a new WARN_ON_ONCE() to catch
the above corruption case.
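
The cases listed above can be sketched as a small user-space model. This is
only an illustration of the intended decision flow, not the kernel code:
the MODEL_* names, the flag encoding and model_check_recovery() are all
hypothetical and do not exist in drivers/md/md.c.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the MD_RECOVERY_* bits (not the kernel values). */
enum {
	MODEL_RECOVERY_RUNNING = 1u << 0,
	MODEL_RECOVERY_DONE    = 1u << 1,
};

/* Outcome of the check; the names are illustrative only. */
enum model_action {
	MODEL_START_SYNC, /* 1)  RUNNING clear: fall through to the start path */
	MODEL_WAIT,       /* 2a) RUNNING set, DONE clear: sync still in progress */
	MODEL_REAP,       /* 2b) RUNNING and DONE set: unregister sync_thread */
	MODEL_WARN,       /* 2b) but sync_thread is NULL: warn and bail out */
};

static enum model_action model_check_recovery(unsigned int flags,
					      bool has_sync_thread)
{
	if (flags & MODEL_RECOVERY_RUNNING) {
		if (!(flags & MODEL_RECOVERY_DONE))
			return MODEL_WAIT;

		/* Never reach the start path while RUNNING is still set. */
		if (!has_sync_thread)
			return MODEL_WARN;

		return MODEL_REAP;
	}

	return MODEL_START_SYNC;
}
```

In this model, RUNNING and DONE both set with no sync_thread yields
MODEL_WARN. As described above, the code before this patch would instead
fall through to the path that can register a new sync_thread.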

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index f90226e6ddf8..9da0fc906bbd 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9397,16 +9397,24 @@ void md_check_recovery(struct mddev *mddev)
 		if (mddev->sb_flags)
 			md_update_sb(mddev, 0);
 
-		if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
-		    !test_bit(MD_RECOVERY_DONE, &mddev->recovery)) {
-			/* resync/recovery still happening */
-			clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
-			goto unlock;
-		}
-		if (mddev->sync_thread) {
+		/*
+		 * Never start a new sync thread if MD_RECOVERY_RUNNING is
+		 * still set.
+		 */
+		if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
+			if (!test_bit(MD_RECOVERY_DONE, &mddev->recovery)) {
+				/* resync/recovery still happening */
+				clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+				goto unlock;
+			}
+
+			if (WARN_ON_ONCE(!mddev->sync_thread))
+				goto unlock;
+
 			md_reap_sync_thread(mddev);
 			goto unlock;
 		}
+
 		/* Set RUNNING before clearing NEEDED to avoid
 		 * any transients in the value of "sync_action".
 		 */
-- 
2.39.2
Re: [dm-devel] [PATCH -next v2 6/6] md: enhance checking in md_check_recovery()
Posted by Xiao Ni 2 years, 8 months ago
On 2023/5/29 at 9:20 PM, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> For md_check_recovery():
>
> 1) if 'MD_RECOVERY_RUNNING' is not set, register new sync_thread.
> 2) if 'MD_RECOVERY_RUNNING' is set:
>   a) if 'MD_RECOVERY_DONE' is not set, don't do anything, wait for
>     md_do_sync() to be done.
>   b) if 'MD_RECOVERY_DONE' is set, unregister sync_thread. Current code
>     expects that sync_thread is not NULL, otherwise new sync_thread will
>     be registered, which will corrupt the array.
>
> Make sure md_check_recovery() won't register a new sync_thread if
> 'MD_RECOVERY_RUNNING' is still set, and add a new WARN_ON_ONCE() to catch
> the above corruption case.
>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>   drivers/md/md.c | 22 +++++++++++++++-------
>   1 file changed, 15 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index f90226e6ddf8..9da0fc906bbd 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -9397,16 +9397,24 @@ void md_check_recovery(struct mddev *mddev)
>   		if (mddev->sb_flags)
>   			md_update_sb(mddev, 0);
>   
> -		if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
> -		    !test_bit(MD_RECOVERY_DONE, &mddev->recovery)) {
> -			/* resync/recovery still happening */
> -			clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> -			goto unlock;
> -		}
> -		if (mddev->sync_thread) {
> +		/*
> +		 * Never start a new sync thread if MD_RECOVERY_RUNNING is
> +		 * still set.
> +		 */
> +		if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
> +			if (!test_bit(MD_RECOVERY_DONE, &mddev->recovery)) {
> +				/* resync/recovery still happening */
> +				clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> +				goto unlock;
> +			}
> +
> +			if (WARN_ON_ONCE(!mddev->sync_thread))
> +				goto unlock;
> +
>   			md_reap_sync_thread(mddev);
>   			goto unlock;
>   		}
> +
>   		/* Set RUNNING before clearing NEEDED to avoid
>   		 * any transients in the value of "sync_action".
>   		 */

This makes the logic clearer.

Reviewed-by: Xiao Ni <xni@redhat.com>