[PATCH -next 1/5] md/raid5: don't allow replacement while reshape is not done

Yu Kuai posted 5 patches 2 years, 9 months ago
Posted by Yu Kuai 2 years, 9 months ago
From: Yu Kuai <yukuai3@huawei.com>

Setting rdev replacement has two conditions, among others:

1) MD_RECOVERY_RUNNING is not set;
2) rdev nr_pending is 0;

If reshape is interrupted (for example, by echoing frozen to sync_action),
then rdev replacement can be set. This is safe at runtime because reshape
always takes priority over resync in md_check_recovery(). However, if the
system reboots, the kernel will complain that it cannot handle concurrent
replacement and reshape, and the array can no longer be assembled.

Fix this problem by not allowing replacement until reshape is done.

Reported-by: Peter Neuwirth <reddunur@online.de>
Link: https://lore.kernel.org/linux-raid/e2f96772-bfbc-f43b-6da1-f520e5164536@online.de/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid5.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index a58507a4345d..bd3b535c0739 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -8378,6 +8378,7 @@ static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev)
 		p = conf->disks + disk;
 		tmp = rdev_mdlock_deref(mddev, p->rdev);
 		if (test_bit(WantReplacement, &tmp->flags) &&
+		    mddev->reshape_position == MaxSector &&
 		    p->replacement == NULL) {
 			clear_bit(In_sync, &rdev->flags);
 			set_bit(Replacement, &rdev->flags);
-- 
2.39.2
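
For readers following the thread, the effect of the added condition can be
sketched in user space. This is only a simplified model, not kernel code:
struct model_array, MAX_SECTOR, model_can_add_replacement() and
model_assemble() are hypothetical names invented for illustration, and the
assembly check is inferred from the complaint described in the commit
message rather than copied from raid5.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for MaxSector: "no reshape in progress". */
#define MAX_SECTOR UINT64_MAX

/* Hypothetical, heavily reduced model of the state the patch looks at. */
struct model_array {
	uint64_t reshape_position; /* MAX_SECTOR once reshape is done */
	bool want_replacement;     /* models WantReplacement on the old rdev */
	bool has_replacement;      /* models p->replacement != NULL */
};

/* Models the patched condition: no replacement while reshape is pending. */
static bool model_can_add_replacement(const struct model_array *a)
{
	return a->want_replacement &&
	       a->reshape_position == MAX_SECTOR &&
	       !a->has_replacement;
}

/*
 * Models the assembly-time failure from the commit message: a replacement
 * combined with an unfinished reshape is rejected (assumed to return an
 * error; -1 is a placeholder, not the kernel's error code).
 */
static int model_assemble(const struct model_array *a)
{
	if (a->has_replacement && a->reshape_position != MAX_SECTOR)
		return -1;
	return 0;
}
```

In this model, an array with reshape_position set to some sector other than
MAX_SECTOR never gets a replacement, so the state that model_assemble()
rejects cannot be created through the add-disk path.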
Re: [PATCH -next 1/5] md/raid5: don't allow replacement while reshape is not done
Posted by Song Liu 2 years, 8 months ago
On Thu, May 11, 2023 at 6:59 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> From: Yu Kuai <yukuai3@huawei.com>
>
> Set rdev replacement has but not only two conditions:
>
> 1) MD_RECOVERY_RUNNING is not set;
> 2) rdev nr_pending is 0;

The above is confusing. I updated it and applied the set to md-next.
Please let me know if it looks good.

Thanks,
Song

Re: [PATCH -next 1/5] md/raid5: don't allow replacement while reshape is not done
Posted by Yu Kuai 2 years, 8 months ago
Hi,

On 2023/05/20 7:33, Song Liu wrote:
> On Thu, May 11, 2023 at 6:59 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Set rdev replacement has but not only two conditions:
>>
>> 1) MD_RECOVERY_RUNNING is not set;
>> 2) rdev nr_pending is 0;
> 
> The above is confusing. I updated it and applied the set to md-next.

By the way, I'm willing to add regression tests for these problems. I have
already sent two other tests and there has been no response yet. Should the
tests wait for the fix patches to be applied before they can make progress?

Thanks,
Kuai

Re: [PATCH -next 1/5] md/raid5: don't allow replacement while reshape is not done
Posted by Song Liu 2 years, 8 months ago
On Sun, May 21, 2023 at 8:46 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2023/05/20 7:33, Song Liu wrote:
> > On Thu, May 11, 2023 at 6:59 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>
> >> From: Yu Kuai <yukuai3@huawei.com>
> >>
> >> Set rdev replacement has but not only two conditions:
> >>
> >> 1) MD_RECOVERY_RUNNING is not set;
> >> 2) rdev nr_pending is 0;
> >
> > The above is confusing. I updated it and applied the set to md-next.
>
> By the way, I'm willing to add regression tests for these problems. I have
> already sent two other tests and there has been no response yet. Should the
> tests wait for the fix patches to be applied before they can make progress?

Jes just had a baby, so there will be long delays with mdadm patches.
I am already using the new tests, so please keep sending them.

Thanks,
Song