[v5] Don't set MD_BROKEN on failfast bio failure

[PATCH v5 03/16] md: add pers->should_error() callback

Posted by Kenta Akagi 3 months, 2 weeks ago

The failfast feature in RAID1 and RAID10 assumes that when md_error() is
called, the array remains functional because the last rdev neither fails
nor sets MD_BROKEN.

However, the current implementation can cause the array to lose
its last in-sync device or be marked as MD_BROKEN, which breaks the
assumption and can lead to array failure.

To address this issue, a new handler md_cond_error() will be introduced
to ensure that failfast I/O does not mark the array as broken.

As preparation, this commit adds a helper pers->should_error() to determine
from outside the personality whether an rdev can fail safely, which is
needed by md_cond_error().

Signed-off-by: Kenta Akagi <k@mgml.me>
---
 drivers/md/md.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/md/md.h b/drivers/md/md.h
index c982598cbf97..01c8182431d1 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -763,6 +763,7 @@ struct md_personality
 	 * if appropriate, and should abort recovery if needed
 	 */
 	void (*error_handler)(struct mddev *mddev, struct md_rdev *rdev);
+	bool (*should_error)(struct mddev *mddev, struct md_rdev *rdev, struct bio *bio);
 	int (*hot_add_disk) (struct mddev *mddev, struct md_rdev *rdev);
 	int (*hot_remove_disk) (struct mddev *mddev, struct md_rdev *rdev);
 	int (*spare_active) (struct mddev *mddev);
-- 
2.50.1
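
For readers following the thread, here is a minimal userspace sketch of how a caller outside the personality might consult the new callback before invoking error_handler(). Everything in it is an assumption made for illustration: the struct layouts are simplified stand-ins rather than the kernel definitions, the `failfast`, `in_sync`, `faulty`, `in_sync_disks` and `broken` fields are hypothetical, and this md_cond_error() is only a guess at the shape of the handler the commit message says will be introduced; its real implementation is not part of this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; NOT the real definitions. */
struct bio { bool failfast; };                 /* hypothetical flag */
struct md_rdev { bool in_sync; bool faulty; }; /* hypothetical fields */
struct mddev;

struct md_personality {
	void (*error_handler)(struct mddev *mddev, struct md_rdev *rdev);
	/* Callback added by this patch: may rdev be failed for this bio? */
	bool (*should_error)(struct mddev *mddev, struct md_rdev *rdev, struct bio *bio);
};

struct mddev {
	const struct md_personality *pers;
	int in_sync_disks; /* hypothetical counter of in-sync devices */
	bool broken;       /* stands in for the MD_BROKEN flag */
};

/* Hypothetical error handler: fail rdev, mark the array broken if none remain. */
static void demo_error_handler(struct mddev *mddev, struct md_rdev *rdev)
{
	if (rdev->faulty)
		return;
	rdev->faulty = true;
	if (rdev->in_sync) {
		rdev->in_sync = false;
		mddev->in_sync_disks--;
	}
	if (mddev->in_sync_disks == 0)
		mddev->broken = true;
}

/* Hypothetical policy: a failfast bio must not take down the last in-sync rdev. */
static bool demo_should_error(struct mddev *mddev, struct md_rdev *rdev, struct bio *bio)
{
	if (bio && bio->failfast && rdev->in_sync && mddev->in_sync_disks <= 1)
		return false;
	return true;
}

static const struct md_personality demo_personality = {
	.error_handler = demo_error_handler,
	.should_error  = demo_should_error,
};

/*
 * Assumed shape of md_cond_error(): ask the personality first and call
 * error_handler() only if it agrees. Returns true if rdev was failed.
 */
static bool md_cond_error(struct mddev *mddev, struct md_rdev *rdev, struct bio *bio)
{
	assert(mddev && mddev->pers && mddev->pers->error_handler);
	if (mddev->pers->should_error &&
	    !mddev->pers->should_error(mddev, rdev, bio))
		return false;
	mddev->pers->error_handler(mddev, rdev);
	return true;
}
```

In this sketch a failfast failure on the last in-sync device is refused, so the array is never marked broken by failfast I/O, while a non-failfast failure still goes through error_handler() as before.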

Re: [PATCH v5 03/16] md: add pers->should_error() callback

Posted by Yu Kuai 3 months, 1 week ago

Hi,

On 2025/10/27 23:04, Kenta Akagi wrote:
> The failfast feature in RAID1 and RAID10 assumes that when md_error() is
> called, the array remains functional because the last rdev neither fails
> nor sets MD_BROKEN.
>
> However, the current implementation can cause the array to lose
> its last in-sync device or be marked as MD_BROKEN, which breaks the
> assumption and can lead to array failure.
>
> To address this issue, a new handler md_cond_error() will be introduced
> to ensure that failfast I/O does not mark the array as broken.
>
> As preparation, this commit adds a helper pers->should_error() to determine
> from outside the personality whether an rdev can fail safely, which is
> needed by md_cond_error().
>
> Signed-off-by: Kenta Akagi <k@mgml.me>
> ---
>   drivers/md/md.h | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index c982598cbf97..01c8182431d1 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -763,6 +763,7 @@ struct md_personality
>   	 * if appropriate, and should abort recovery if needed
>   	 */
>   	void (*error_handler)(struct mddev *mddev, struct md_rdev *rdev);
> +	bool (*should_error)(struct mddev *mddev, struct md_rdev *rdev, struct bio *bio);

I think the name is not quite accurate, perhaps error_handler_check()?

Thanks,
Kuai

>   	int (*hot_add_disk) (struct mddev *mddev, struct md_rdev *rdev);
>   	int (*hot_remove_disk) (struct mddev *mddev, struct md_rdev *rdev);
>   	int (*spare_active) (struct mddev *mddev);