From: FengWei Shih <dannyshih@synology.com>
In raid1_reshape(), freeze_array() is called before modifying the r1bio
memory pool (conf->r1bio_pool) and conf->raid_disks, and
unfreeze_array() is called after the update is completed.

However, freeze_array() only waits until nr_sync_pending and
(nr_pending - nr_queued) of all buckets reach zero. When an I/O error
occurs, nr_queued is increased and the corresponding r1bio is queued to
either retry_list or bio_end_io_list. As a result, freeze_array() may
unblock before these r1bios are released.
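
For reference, the condition freeze_array() waits on is roughly the
following (a simplified paraphrase of drivers/md/raid1.c, not verbatim;
details may vary across kernel versions):

    static int get_unqueued_pending(struct r1conf *conf)
    {
            int idx, ret;

            ret = atomic_read(&conf->nr_sync_pending);
            for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
                    ret += atomic_read(&conf->nr_pending[idx]) -
                            atomic_read(&conf->nr_queued[idx]);

            return ret;
    }

    static void freeze_array(struct r1conf *conf, int extra)
    {
            spin_lock_irq(&conf->resync_lock);
            conf->array_frozen = 1;
            /* wait until every pending request is either completed or
             * parked on retry_list/bio_end_io_list (counted in nr_queued)
             */
            wait_event_lock_irq_cmd(conf->wait_barrier,
                                    get_unqueued_pending(conf) == extra,
                                    conf->resync_lock,
                                    flush_pending_writes(conf));
            spin_unlock_irq(&conf->resync_lock);
    }

An r1bio that has been moved to retry_list or bio_end_io_list is counted
in nr_queued, so it no longer holds freeze_array() back even though it is
still allocated.
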
This can lead to a situation where conf->raid_disks and the mempool have
already been updated while queued r1bios, allocated with the old
raid_disks value, are later released. Consequently, free_r1bio() may
access memory out of bounds in put_all_bios() and release r1bios of the
wrong size to the new mempool, potentially corrupting the mempool as
well.
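
To illustrate the out-of-bounds access: put_all_bios() iterates using the
current conf->raid_disks, while the r1bio's bios[] array was sized for the
old value (again a simplified paraphrase of drivers/md/raid1.c, not
verbatim):

    static void put_all_bios(struct r1conf *conf, struct r1bio *r1_bio)
    {
            int i;

            /* conf->raid_disks may already reflect the new layout, while
             * r1_bio->bios[] was allocated for the old, smaller one
             */
            for (i = 0; i < conf->raid_disks * 2; i++) {
                    struct bio **bio = r1_bio->bios + i;

                    if (!BIO_SPECIAL(*bio))
                            bio_put(*bio);
                    *bio = NULL;
            }
    }

When raid_disks grows, the loop walks past the end of the old r1bio, and
mempool_free() afterwards returns that undersized r1bio to the new,
larger pool.
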
Since nr_queued can only be increased by normal I/O when an I/O error
occurs, suspending the array, which quiesces all normal I/O, avoids this
issue.

Note: updating raid_disks via the SET_ARRAY_INFO ioctl already suspends
the array. Therefore, also suspend the array when updating raid_disks
via sysfs to avoid this issue.

Signed-off-by: FengWei Shih <dannyshih@synology.com>
---
v2:
* Suspend array unconditionally when updating raid_disks
* Refine commit message to describe the issue more concretely
---
drivers/md/md.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index e5922a682953..6bcbe1c7483c 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4407,7 +4407,7 @@ raid_disks_store(struct mddev *mddev, const char *buf, size_t len)
if (err < 0)
return err;
- err = mddev_lock(mddev);
+ err = mddev_suspend_and_lock(mddev);
if (err)
return err;
if (mddev->pers)
@@ -4432,7 +4432,7 @@ raid_disks_store(struct mddev *mddev, const char *buf, size_t len)
} else
mddev->raid_disks = n;
out_unlock:
- mddev_unlock(mddev);
+ mddev_unlock_and_resume(mddev);
return err ? err : len;
}
static struct md_sysfs_entry md_raid_disks =
--
2.17.1
Hi FengWei and Kuai

On Fri, Dec 26, 2025 at 6:27 PM dannyshih <dannyshih@synology.com> wrote:
>
> However, freeze_array() only waits until nr_sync_pending and
> (nr_pending - nr_queued) of all buckets reach zero. When an I/O error
> occurs, nr_queued is increased and the corresponding r1bio is queued to
> either retry_list or bio_end_io_list. As a result, freeze_array() may
> unblock before these r1bios are released.

Could you explain more about this? Why can't freeze_array() block when
an I/O error occurs? If an I/O error occurs, the nr_pending count should
be equal to nr_queued, right?

Best Regards
Xiao
Hi Xiao

Xiao Ni wrote on 2025/12/29 at 05:19 PM:
> Could you explain more about this? Why can't freeze_array() block when
> an I/O error occurs? If an I/O error occurs, the nr_pending count should
> be equal to nr_queued, right?
>
> Best Regards
> Xiao

Even though nr_pending == nr_queued, the r1bio is still alive and was
allocated under the old raid_disks value. Allowing freeze_array() to
unblock at this point permits an in-flight r1bio to span across the
reshape.

Assuming raid_disks is changed from 2 to 3, the sequence is roughly:

    normal I/O                         raid1 reshape

    nr_pending++
    r1bio allocated
    (raid_disks == 2)
    /* submit I/O */
                                       echo 3 > /sys/block/mdX/md/raid_disks
                                       raid1_reshape() -> freeze_array()
                                       (waiting for nr_pending == nr_queued)
    I/O error occurs and triggers
    reschedule_retry()
    r1bio queued to retry_list
    nr_queued++  ------------>         freeze_array() unblocks
                                       conf->r1bio_pool is changed
                                       conf->raid_disks is changed
                                       unfreeze_array()
    /* r1bio retry handling */
    free r1bio
    (conf->raid_disks == 3)

Therefore, freeze_array() cannot guarantee that all r1bios allocated
under the old array layout have been fully processed and freed.

Thanks,
FengWei Shih
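
For reference, the error path that parks an r1bio and bumps nr_queued (the
reschedule_retry() step in the timeline above) looks roughly like this; a
simplified paraphrase of drivers/md/raid1.c, not verbatim, and details may
differ between kernel versions:

    static void reschedule_retry(struct r1bio *r1_bio)
    {
            unsigned long flags;
            struct mddev *mddev = r1_bio->mddev;
            struct r1conf *conf = mddev->private;
            int idx;

            idx = sector_to_idx(r1_bio->sector);
            spin_lock_irqsave(&conf->device_lock, flags);
            /* park the failed r1bio for raid1d to handle later */
            list_add(&r1_bio->retry_list, &conf->retry_list);
            atomic_inc(&conf->nr_queued[idx]);
            spin_unlock_irqrestore(&conf->device_lock, flags);

            /* this wake_up() is what allows freeze_array() to re-check
             * its wait condition and proceed
             */
            wake_up(&conf->wait_barrier);
            md_wakeup_thread(mddev->thread);
    }

Once nr_queued catches up with nr_pending, freeze_array() proceeds even
though the parked r1bio has not been freed yet.
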
On Tue, Dec 30, 2025 at 12:08 PM FengWei Shih <dannyshih@synology.com> wrote:
>
> Even though nr_pending == nr_queued, the r1bio is still alive and was
> allocated under the old raid_disks value. Allowing freeze_array() to
> unblock at this point permits an in-flight r1bio to span across the
> reshape.
>
> [...]
>
> Therefore, freeze_array() cannot guarantee that all r1bios allocated
> under the old array layout have been fully processed and freed.

Ah, I see. Thanks very much for the detailed explanation!

Best Regards
Xiao
On 2025/12/26 18:18, dannyshih wrote:
> From: FengWei Shih <dannyshih@synology.com>
>
> In raid1_reshape(), freeze_array() is called before modifying the r1bio
> memory pool (conf->r1bio_pool) and conf->raid_disks, and
> unfreeze_array() is called after the update is completed.
>
> [...]
>
>  drivers/md/md.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Applied to md-6.19

--
Thanks,
Kuai