Failfast is a feature implemented only for RAID1 and RAID10. It instructs
the block device providing the rdev to return a bio error immediately,
without retrying, if any issue occurs. This allows a problematic rdev
to be detached quickly and minimizes IO latency.
Due to its nature, failfast bios can fail easily, and md must not mark
an essential rdev as Faulty or set MD_BROKEN on the array just because
a failfast bio failed.
When failfast was introduced, RAID1 and RAID10 were designed to continue
operating normally even if md_error was called for the last rdev. However,
with the introduction of MD_BROKEN in RAID1/RAID10
in commit 9631abdbf406 ("md: Set MD_BROKEN for RAID1 and RAID10"), calling
md_error for the last rdev now prevents further writes to the array.
Despite this, the current failfast error handler still assumes that
calling md_error will not break the array.
Normally, this is not an issue because MD_FAILFAST is not set when a bio
is issued to the last rdev. However, if the array is not degraded and a
bio with MD_FAILFAST has been issued, simultaneous failures could
potentially break the array. This is unusual but can happen; for example,
this can occur when using NVMe over TCP if all rdevs depend on
a single Ethernet link.
In other words, this becomes a problem under the following conditions:
Preconditions:
* Failfast is enabled on all rdevs.
* All rdevs are In_sync - This is a requirement for a bio to be submitted
with MD_FAILFAST.
* At least one bio has been submitted but has not yet completed.
Trigger condition:
* All underlying devices of the rdevs return an error for their failfast
bios.
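The cascade above can be sketched with a toy userspace model. This is
illustrative only, not md code: the struct, counter, and function names
below are invented, and the model only captures why, after MD_BROKEN was
introduced, erroring every In_sync rdev's failfast bio at once leaves the
array broken.

```c
#include <stdbool.h>

/* Toy model of the failure cascade described above; not md code.
 * All names here are invented for illustration. */
struct sim_array {
	int nr_working;    /* rdevs still In_sync */
	bool broken;       /* MD_BROKEN: no further writes accepted */
};

/* Pre-patch error path: every failed failfast bio calls md_error(),
 * and since commit 9631abdbf406, failing the last working rdev sets
 * MD_BROKEN on the array. */
static void old_failfast_error(struct sim_array *arr)
{
	if (arr->nr_working > 1)
		arr->nr_working--;
	else
		arr->broken = true;
}

/* All in-flight failfast bios error out at once, e.g. a single
 * Ethernet link carrying every NVMe-over-TCP rdev going down. */
static bool cascade_breaks_array(int nr_rdevs)
{
	struct sim_array arr = { .nr_working = nr_rdevs, .broken = false };

	for (int i = 0; i < nr_rdevs; i++)
		old_failfast_error(&arr);
	return arr.broken;
}
```

With any number of rdevs, simultaneous failfast failures on all of them
end with the array broken in this model, regardless of read or write.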
Whether the bio is a read or a write makes little difference to the
outcome.
In the write case, md_error is invoked on each rdev through its bi_end_io
handler.
In the read case, losing the first rdev triggers a metadata
update. Next, md_super_write, unlike raid1_write_request, issues the bio
with MD_FAILFAST if the rdev supports it, causing the bio to fail
immediately - Before this patchset, LastDev was set only by the failure
path in super_written. Consequently, super_written calls md_error on the
remaining rdev.
Prior to this commit, the following changes were introduced:
* The helper function md_bio_failure_error() that skips the error handler
if a failfast bio targets the last rdev.
* Serialization of md_error() and md_bio_failure_error().
* Setting the LastDev flag for rdevs that must not be lost.
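The intended semantics of md_bio_failure_error() can be inferred from its
call sites in this patch. The following is a hedged userspace sketch, not
the real helper (which lives in the earlier patches of this set): the
flag bits and sim_* names are stand-ins, and the exact conditions in md
may differ.

```c
#include <stdbool.h>

/* Invented stand-ins for rdev/array state; not the real md flag bits. */
#define SIM_FAULTY    (1u << 0)
#define SIM_LAST_DEV  (1u << 1)   /* "LastDev": an rdev that must not be lost */
#define SIM_MD_BROKEN (1u << 0)

struct sim_rdev  { unsigned flags; };
struct sim_mddev { unsigned flags; int nr_working; };

/* Models md_error() after commit 9631abdbf406: failing the last
 * working rdev marks the array broken rather than marking it Faulty. */
static void sim_md_error(struct sim_mddev *mddev, struct sim_rdev *rdev)
{
	if (rdev->flags & SIM_FAULTY)
		return;
	if (mddev->nr_working == 1) {
		mddev->flags |= SIM_MD_BROKEN;
		return;
	}
	rdev->flags |= SIM_FAULTY;
	mddev->nr_working--;
}

/* Sketch of md_bio_failure_error(): when a failed failfast bio targets
 * an essential (LastDev) rdev, skip the error handler entirely;
 * otherwise fall through to md_error(). Returns whether the rdev ended
 * up Faulty, matching how the call sites in this patch use the result. */
static bool sim_md_bio_failure_error(struct sim_mddev *mddev,
				     struct sim_rdev *rdev, bool failfast)
{
	if (failfast && (rdev->flags & SIM_LAST_DEV))
		return false;   /* leave the rdev alone; caller retries */
	sim_md_error(mddev, rdev);
	return rdev->flags & SIM_FAULTY;
}
```

In this model, a failfast failure on a non-essential rdev still fails the
rdev as before, while the same failure on the last essential rdev is
skipped, so the array never transitions to MD_BROKEN on a failfast path.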
This commit uses md_bio_failure_error() instead of md_error() for failfast
bio failures, ensuring that failfast bios do not stop array operations.
Fixes: 9631abdbf406 ("md: Set MD_BROKEN for RAID1 and RAID10")
Signed-off-by: Kenta Akagi <k@mgml.me>
---
drivers/md/md.c | 5 +----
drivers/md/raid1.c | 37 ++++++++++++++++++-------------------
drivers/md/raid10.c | 9 +++++----
3 files changed, 24 insertions(+), 27 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 65fdd9bae8f4..65814bbe9bad 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1004,11 +1004,8 @@ static void super_written(struct bio *bio)
if (bio->bi_status) {
pr_err("md: %s gets error=%d\n", __func__,
blk_status_to_errno(bio->bi_status));
- md_error(mddev, rdev);
- if (!test_bit(Faulty, &rdev->flags)
- && (bio->bi_opf & MD_FAILFAST)) {
+ if (!md_bio_failure_error(mddev, rdev, bio))
set_bit(MD_SB_NEED_REWRITE, &mddev->sb_flags);
- }
}
bio_put(bio);
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 32ad6b102ff7..8fff9dacc6e0 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -470,7 +470,7 @@ static void raid1_end_write_request(struct bio *bio)
(bio->bi_opf & MD_FAILFAST) &&
/* We never try FailFast to WriteMostly devices */
!test_bit(WriteMostly, &rdev->flags)) {
- md_error(r1_bio->mddev, rdev);
+ md_bio_failure_error(r1_bio->mddev, rdev, bio);
}
/*
@@ -2178,8 +2178,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
if (test_bit(FailFast, &rdev->flags)) {
/* Don't try recovering from here - just fail it
* ... unless it is the last working device of course */
- md_error(mddev, rdev);
- if (test_bit(Faulty, &rdev->flags))
+ if (md_bio_failure_error(mddev, rdev, bio))
/* Don't try to read from here, but make sure
* put_buf does it's thing
*/
@@ -2657,9 +2656,8 @@ static void handle_write_finished(struct r1conf *conf, struct r1bio *r1_bio)
static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
{
struct mddev *mddev = conf->mddev;
- struct bio *bio;
+ struct bio *bio, *updated_bio;
struct md_rdev *rdev;
- sector_t sector;
clear_bit(R1BIO_ReadError, &r1_bio->state);
/* we got a read error. Maybe the drive is bad. Maybe just
@@ -2672,29 +2670,30 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
*/
bio = r1_bio->bios[r1_bio->read_disk];
- bio_put(bio);
- r1_bio->bios[r1_bio->read_disk] = NULL;
+ updated_bio = NULL;
rdev = conf->mirrors[r1_bio->read_disk].rdev;
- if (mddev->ro == 0
- && !test_bit(FailFast, &rdev->flags)) {
- freeze_array(conf, 1);
- fix_read_error(conf, r1_bio);
- unfreeze_array(conf);
- } else if (mddev->ro == 0 && test_bit(FailFast, &rdev->flags)) {
- md_error(mddev, rdev);
+ if (mddev->ro == 0) {
+ if (!test_bit(FailFast, &rdev->flags)) {
+ freeze_array(conf, 1);
+ fix_read_error(conf, r1_bio);
+ unfreeze_array(conf);
+ } else {
+ md_bio_failure_error(mddev, rdev, bio);
+ }
} else {
- r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED;
+ updated_bio = IO_BLOCKED;
}
+ bio_put(bio);
+ r1_bio->bios[r1_bio->read_disk] = updated_bio;
+
rdev_dec_pending(rdev, conf->mddev);
- sector = r1_bio->sector;
- bio = r1_bio->master_bio;
/* Reuse the old r1_bio so that the IO_BLOCKED settings are preserved */
r1_bio->state = 0;
- raid1_read_request(mddev, bio, r1_bio->sectors, r1_bio);
- allow_barrier(conf, sector);
+ raid1_read_request(mddev, r1_bio->master_bio, r1_bio->sectors, r1_bio);
+ allow_barrier(conf, r1_bio->sector);
}
static void raid1d(struct md_thread *thread)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index dc4edd4689f8..b73af94a88b0 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -488,7 +488,7 @@ static void raid10_end_write_request(struct bio *bio)
dec_rdev = 0;
if (test_bit(FailFast, &rdev->flags) &&
(bio->bi_opf & MD_FAILFAST)) {
- md_error(rdev->mddev, rdev);
+ md_bio_failure_error(rdev->mddev, rdev, bio);
}
/*
@@ -2443,7 +2443,7 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
continue;
} else if (test_bit(FailFast, &rdev->flags)) {
/* Just give up on this device */
- md_error(rdev->mddev, rdev);
+ md_bio_failure_error(rdev->mddev, rdev, tbio);
continue;
}
/* Ok, we need to write this bio, either to correct an
@@ -2895,8 +2895,9 @@ static void handle_read_error(struct mddev *mddev, struct r10bio *r10_bio)
freeze_array(conf, 1);
fix_read_error(conf, mddev, r10_bio);
unfreeze_array(conf);
- } else
- md_error(mddev, rdev);
+ } else {
+ md_bio_failure_error(mddev, rdev, bio);
+ }
rdev_dec_pending(rdev, mddev);
r10_bio->state = 0;
--
2.50.1
Hi,

On 2025/09/15 11:42, Kenta Akagi wrote:
> @@ -470,7 +470,7 @@ static void raid1_end_write_request(struct bio *bio)
>  		    (bio->bi_opf & MD_FAILFAST) &&
>  		    /* We never try FailFast to WriteMostly devices */
>  		    !test_bit(WriteMostly, &rdev->flags)) {
> -			md_error(r1_bio->mddev, rdev);
> +			md_bio_failure_error(r1_bio->mddev, rdev, bio);
>  		}

Can the following check of Faulty be replaced with the return value?

> @@ -2672,29 +2670,30 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
> +	if (mddev->ro == 0) {
> +		if (!test_bit(FailFast, &rdev->flags)) {
> +			freeze_array(conf, 1);
> +			fix_read_error(conf, r1_bio);
> +			unfreeze_array(conf);
> +		} else {
> +			md_bio_failure_error(mddev, rdev, bio);
> +		}
>  	} else {
> -		r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED;
> +		updated_bio = IO_BLOCKED;
>  	}

I'll suggest a separate patch to clean up the conditions first; it's
better for code review.

BTW, I'd prefer an if-else chain instead of nested if-else; perhaps the
following is better:

	if (mddev->ro != 0) {
		/* read-only */
	} else if (!test_bit(FailFast, &rdev->flags)) {
		/* read-write and failfast is not set */
	} else {
		/* read-write and failfast is set */
	}

> @@ -488,7 +488,7 @@ static void raid10_end_write_request(struct bio *bio)
> -		md_error(rdev->mddev, rdev);
> +		md_bio_failure_error(rdev->mddev, rdev, bio);

Same as raid1, can the following check of Faulty be replaced with the
return value?

And please split this patch for raid1 and raid10.

Thanks,
Kuai
On 2025/09/18 10:26, Yu Kuai wrote:
>> @@ -470,7 +470,7 @@ static void raid1_end_write_request(struct bio *bio)
>> -			md_error(r1_bio->mddev, rdev);
>> +			md_bio_failure_error(r1_bio->mddev, rdev, bio);
>
> Can the following check of Faulty be replaced with the return value?

In the case where raid1_end_write_request is called for a non-failfast
IO, and the rdev has already been marked Faulty by another bio, it must
not retry either. I think it is simpler not to use a return value here.

> I'll suggest a separate patch to clean up the conditions first; it's
> better for code review.

Thank you for your guidance. I will split the commit.

> BTW, I'd prefer an if-else chain instead of nested if-else.

Ah, this looks much more readable. I'll fix it. Thanks.

> Same as raid1, can the following check of Faulty be replaced with the
> return value?

Same as RAID1: non-failfast IO must also be handled, so the Faulty bit
should be checked instead of relying on the return value.

> And please split this patch for raid1 and raid10.

Understood, I will split it.

Thanks,
Akagi
Hi,

On 2025/9/18 23:22, Kenta Akagi wrote:
>>> -			md_error(r1_bio->mddev, rdev);
>>> +			md_bio_failure_error(r1_bio->mddev, rdev, bio);
>> Can the following check of Faulty be replaced with the return value?
> In the case where raid1_end_write_request is called for a non-failfast
> IO, and the rdev has already been marked Faulty by another bio, it must
> not retry either. I think it is simpler not to use a return value here.

You can just add the Faulty check inside md_bio_failure_error() as well,
and both the failfast and WriteMostly checks.

Thanks,
Kuai