Hi,
On 2025/10/27 23:04, Kenta Akagi wrote:
> The failfast feature in RAID1 and RAID10 assumes that when md_error() is
> called, the array remains functional because the last rdev neither fails
> nor sets MD_BROKEN.
>
> However, the current implementation can cause the array to lose
> its last in-sync device or be marked as MD_BROKEN, which breaks the
> assumption and can lead to array failure.
>
> To address this issue, a new handler md_cond_error() will be introduced
> to ensure that failfast I/O does not mark the array as broken.
>
> As preparation, this commit adds a helper pers->should_error() to determine
> from outside the personality whether an rdev can fail safely, which is
> needed by md_cond_error().
>
> Signed-off-by: Kenta Akagi <k@mgml.me>
> ---
> drivers/md/md.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index c982598cbf97..01c8182431d1 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -763,6 +763,7 @@ struct md_personality
> * if appropriate, and should abort recovery if needed
> */
> void (*error_handler)(struct mddev *mddev, struct md_rdev *rdev);
> + bool (*should_error)(struct mddev *mddev, struct md_rdev *rdev, struct bio *bio);
I think the name is not quite accurate; perhaps error_handler_check()?
Thanks,
Kuai
> int (*hot_add_disk) (struct mddev *mddev, struct md_rdev *rdev);
> int (*hot_remove_disk) (struct mddev *mddev, struct md_rdev *rdev);
> int (*spare_active) (struct mddev *mddev);
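For illustration only, here is a minimal standalone sketch of how a caller such as md_cond_error() might consult the new hook before falling back to the personality's error_handler(). It is not the kernel code. struct mddev, struct md_rdev and struct bio are heavily simplified, hypothetical stand-ins. The in_sync counter, the failfast flag, the toy_* personality and the md_cond_error() signature used here are all assumptions, since the patch does not define them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct bio { bool failfast; };
struct md_rdev { int id; bool faulty; };
struct mddev;

struct md_personality {
	void (*error_handler)(struct mddev *mddev, struct md_rdev *rdev);
	/* The hook added by this patch (name still under discussion). */
	bool (*should_error)(struct mddev *mddev, struct md_rdev *rdev,
			     struct bio *bio);
};

struct mddev {
	struct md_personality *pers;
	int in_sync;	/* hypothetical: number of in-sync rdevs */
};

/* Simplified md_error(): just delegates to the personality. */
static void md_error(struct mddev *mddev, struct md_rdev *rdev)
{
	mddev->pers->error_handler(mddev, rdev);
}

/*
 * Hypothetical md_cond_error(): only fail @rdev if the personality
 * reports that doing so is safe. Returns true if the rdev was failed.
 */
static bool md_cond_error(struct mddev *mddev, struct md_rdev *rdev,
			  struct bio *bio)
{
	assert(mddev->pers && mddev->pers->error_handler);

	if (mddev->pers->should_error &&
	    !mddev->pers->should_error(mddev, rdev, bio))
		return false;	/* keep the last in-sync device alive */

	md_error(mddev, rdev);
	return true;
}

/* Toy personality: refuse to fail the last in-sync rdev on failfast I/O. */
static bool toy_should_error(struct mddev *mddev, struct md_rdev *rdev,
			     struct bio *bio)
{
	(void)rdev;
	return !(bio->failfast && mddev->in_sync <= 1);
}

static void toy_error_handler(struct mddev *mddev, struct md_rdev *rdev)
{
	rdev->faulty = true;
	mddev->in_sync--;
}
```

With two in-sync devices, a failfast error on the first one fails it normally. A subsequent failfast error on the remaining device is refused, so the array is never left without an in-sync member.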