[PATCH RFC 4/7] md/raid10: convert read/write to use bio_submit_split()

Yu Kuai posted 7 patches 1 month, 1 week ago
There is a newer version of this series
[PATCH RFC 4/7] md/raid10: convert read/write to use bio_submit_split()
Posted by Yu Kuai 1 month, 1 week ago
From: Yu Kuai <yukuai3@huawei.com>

On the one hand unify bio split code, prepare to fix disordered split
IO; On the other hand fix missing blkcg_bio_issue_init() and
trace_block_split() for split IO.

Noted discard is not handled, because discard is only splited for
unaligned head and tail.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid10.c | 53 ++++++++++++++++++++-------------------------
 drivers/md/raid10.h |  1 +
 2 files changed, 24 insertions(+), 30 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index b60c30bfb6c7..b8777661307b 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -322,10 +322,12 @@ static void raid_end_bio_io(struct r10bio *r10_bio)
 	struct bio *bio = r10_bio->master_bio;
 	struct r10conf *conf = r10_bio->mddev->private;
 
-	if (!test_bit(R10BIO_Uptodate, &r10_bio->state))
-		bio->bi_status = BLK_STS_IOERR;
+	if (!test_and_set_bit(R10BIO_Returned, &r10_bio->state)) {
+		if (!test_bit(R10BIO_Uptodate, &r10_bio->state))
+			bio->bi_status = BLK_STS_IOERR;
+		bio_endio(bio);
+	}
 
-	bio_endio(bio);
 	/*
 	 * Wake up any possible resync thread that waits for the device
 	 * to go idle.
@@ -1154,7 +1156,6 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
 	int slot = r10_bio->read_slot;
 	struct md_rdev *err_rdev = NULL;
 	gfp_t gfp = GFP_NOIO;
-	int error;
 
 	if (slot >= 0 && r10_bio->devs[slot].rdev) {
 		/*
@@ -1203,17 +1204,16 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
 				   rdev->bdev,
 				   (unsigned long long)r10_bio->sector);
 	if (max_sectors < bio_sectors(bio)) {
-		struct bio *split = bio_split(bio, max_sectors,
-					      gfp, &conf->bio_split);
-		if (IS_ERR(split)) {
-			error = PTR_ERR(split);
-			goto err_handle;
-		}
-		bio_chain(split, bio);
 		allow_barrier(conf);
-		submit_bio_noacct(bio);
+		bio = bio_submit_split(bio, max_sectors, &conf->bio_split);
 		wait_barrier(conf, false);
-		bio = split;
+
+		if (!bio) {
+			set_bit(R10BIO_Returned, &r10_bio->state);
+			goto err_handle;
+		}
+
+		bio->bi_opf &= ~REQ_NOMERGE;
 		r10_bio->master_bio = bio;
 		r10_bio->sectors = max_sectors;
 	}
@@ -1239,10 +1239,9 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
 	mddev_trace_remap(mddev, read_bio, r10_bio->sector);
 	submit_bio_noacct(read_bio);
 	return;
+
 err_handle:
 	atomic_dec(&rdev->nr_pending);
-	bio->bi_status = errno_to_blk_status(error);
-	set_bit(R10BIO_Uptodate, &r10_bio->state);
 	raid_end_bio_io(r10_bio);
 }
 
@@ -1351,7 +1350,6 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
 	int i, k;
 	sector_t sectors;
 	int max_sectors;
-	int error;
 
 	if ((mddev_is_clustered(mddev) &&
 	     mddev->cluster_ops->area_resyncing(mddev, WRITE,
@@ -1465,10 +1463,8 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
 				 * complexity of supporting that is not worth
 				 * the benefit.
 				 */
-				if (bio->bi_opf & REQ_ATOMIC) {
-					error = -EIO;
+				if (bio->bi_opf & REQ_ATOMIC)
 					goto err_handle;
-				}
 
 				good_sectors = first_bad - dev_sector;
 				if (good_sectors < max_sectors)
@@ -1489,17 +1485,16 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
 		r10_bio->sectors = max_sectors;
 
 	if (r10_bio->sectors < bio_sectors(bio)) {
-		struct bio *split = bio_split(bio, r10_bio->sectors,
-					      GFP_NOIO, &conf->bio_split);
-		if (IS_ERR(split)) {
-			error = PTR_ERR(split);
-			goto err_handle;
-		}
-		bio_chain(split, bio);
 		allow_barrier(conf);
-		submit_bio_noacct(bio);
+		bio = bio_submit_split(bio, r10_bio->sectors, &conf->bio_split);
 		wait_barrier(conf, false);
-		bio = split;
+
+		if (!bio) {
+			set_bit(R10BIO_Returned, &r10_bio->state);
+			goto err_handle;
+		}
+
+		bio->bi_opf &= ~REQ_NOMERGE;
 		r10_bio->master_bio = bio;
 	}
 
@@ -1531,8 +1526,6 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
 		}
 	}
 
-	bio->bi_status = errno_to_blk_status(error);
-	set_bit(R10BIO_Uptodate, &r10_bio->state);
 	raid_end_bio_io(r10_bio);
 }
 
diff --git a/drivers/md/raid10.h b/drivers/md/raid10.h
index 3f16ad6904a9..cc167e708125 100644
--- a/drivers/md/raid10.h
+++ b/drivers/md/raid10.h
@@ -165,6 +165,7 @@ enum r10bio_state {
  * so that raid10d knows what to do with them.
  */
 	R10BIO_ReadError,
+	R10BIO_Returned,
 /* If a write for this request means we can clear some
  * known-bad-block records, we set this flag.
  */
-- 
2.39.2
Re: [PATCH RFC 4/7] md/raid10: convert read/write to use bio_submit_split()
Posted by Christoph Hellwig 1 month, 1 week ago
>  		allow_barrier(conf);
> +		bio = bio_submit_split(bio, max_sectors, &conf->bio_split);
>  		wait_barrier(conf, false);
> +
> +		if (!bio) {
> +			set_bit(R10BIO_Returned, &r10_bio->state);
> +			goto err_handle;
> +		}

The NULL return should only happen for REQ_NOWAIT here, so maybe
give R10BIO_Returned a more descriptive name?  Also please document
the flag in the header.

Any maybe yhe code wants a splitting helper instead of open coding
setting this flag in multiple places?
Re: [PATCH RFC 4/7] md/raid10: convert read/write to use bio_submit_split()
Posted by Yu Kuai 1 month, 1 week ago
Hi,

在 2025/08/25 18:59, Christoph Hellwig 写道:
>>   		allow_barrier(conf);
>> +		bio = bio_submit_split(bio, max_sectors, &conf->bio_split);
>>   		wait_barrier(conf, false);
>> +
>> +		if (!bio) {
>> +			set_bit(R10BIO_Returned, &r10_bio->state);
>> +			goto err_handle;
>> +		}
> 
> The NULL return should only happen for REQ_NOWAIT here, so maybe
> give R10BIO_Returned a more descriptive name?  Also please document
> the flag in the header.

And also atomic write here, if bio has to split due to badblocks here.
The flag is refer to raid1. I can add cocument for both raid1 and raid10
in this case.

> 
> Any maybe yhe code wants a splitting helper instead of open coding
> setting this flag in multiple places?
> .
> 
Yes.

Thanks,
Kuai

Re: [PATCH RFC 4/7] md/raid10: convert read/write to use bio_submit_split()
Posted by Christoph Hellwig 1 month, 1 week ago
On Tue, Aug 26, 2025 at 09:13:41AM +0800, Yu Kuai wrote:
> > The NULL return should only happen for REQ_NOWAIT here, so maybe
> > give R10BIO_Returned a more descriptive name?  Also please document
> > the flag in the header.
> 
> And also atomic write here, if bio has to split due to badblocks here.
> The flag is refer to raid1. I can add cocument for both raid1 and raid10
> in this case.

Umm, that's actually a red flag.  If a device guarantees atomic behavior
it can't just fail it.  So I think REQ_ATOMIC should be disallowed
for md raid with bad block tracking.

>
Re: [PATCH RFC 4/7] md/raid10: convert read/write to use bio_submit_split()
Posted by Yu Kuai 1 month, 1 week ago
Hi,

在 2025/08/26 15:55, Christoph Hellwig 写道:
> On Tue, Aug 26, 2025 at 09:13:41AM +0800, Yu Kuai wrote:
>>> The NULL return should only happen for REQ_NOWAIT here, so maybe
>>> give R10BIO_Returned a more descriptive name?  Also please document
>>> the flag in the header.
>>
>> And also atomic write here, if bio has to split due to badblocks here.
>> The flag is refer to raid1. I can add cocument for both raid1 and raid10
>> in this case.
> 
> Umm, that's actually a red flag.  If a device guarantees atomic behavior
> it can't just fail it.  So I think REQ_ATOMIC should be disallowed
> for md raid with bad block tracking.
> 

I agree that do not look good, however, John explained while adding this
that user should retry and fallback without REQ_ATOMIC to make things
work as usual.

Thanks,
Kuai

>>
> 
> .
> 

Re: [PATCH RFC 4/7] md/raid10: convert read/write to use bio_submit_split()
Posted by anthony 1 month, 1 week ago
On 26/08/2025 10:14, Yu Kuai wrote:
>> Umm, that's actually a red flag.  If a device guarantees atomic behavior
>> it can't just fail it.  So I think REQ_ATOMIC should be disallowed
>> for md raid with bad block tracking.
>>
> 
> I agree that do not look good, however, John explained while adding this
> that user should retry and fallback without REQ_ATOMIC to make things
> work as usual.

Whether a device promises atomic write is orthogonal to whether that 
write succeeds - it could fail for a whole host of reasons, so why can't 
"this is too big to be atomic" just be another reason for failing?

Yes you want to know *why* the write failed, if you can't pass that 
back, then you have a problem, but if you can pass back the error "too 
big for atomic write" then the caller can sort it out.

That then allows the driver - if it knows the block size of the device - 
to manage atomic writes (in the sense that it can refuse writes that are 
too large), even if the device doesn't claim to support it. It can just 
force the caller to submit small enough blocks.

Cheers,
Wol
Re: [PATCH RFC 4/7] md/raid10: convert read/write to use bio_submit_split()
Posted by Christoph Hellwig 1 month, 1 week ago
On Tue, Aug 26, 2025 at 06:35:10PM +0100, anthony wrote:
> On 26/08/2025 10:14, Yu Kuai wrote:
> > > Umm, that's actually a red flag.  If a device guarantees atomic behavior
> > > it can't just fail it.  So I think REQ_ATOMIC should be disallowed
> > > for md raid with bad block tracking.
> > > 
> > 
> > I agree that do not look good, however, John explained while adding this
> > that user should retry and fallback without REQ_ATOMIC to make things
> > work as usual.
> 
> Whether a device promises atomic write is orthogonal to whether that write
> succeeds - it could fail for a whole host of reasons, so why can't "this is
> too big to be atomic" just be another reason for failing?

Too big to be atomic is a valid failure reason.  But the limit needs
to be documented in the queue limits beforehand.
Re: [PATCH RFC 4/7] md/raid10: convert read/write to use bio_submit_split()
Posted by John Garry 1 month ago
On 27/08/2025 08:31, Christoph Hellwig wrote:
> On Tue, Aug 26, 2025 at 06:35:10PM +0100, anthony wrote:
>> On 26/08/2025 10:14, Yu Kuai wrote:
>>>> Umm, that's actually a red flag.  If a device guarantees atomic behavior
>>>> it can't just fail it.  So I think REQ_ATOMIC should be disallowed
>>>> for md raid with bad block tracking.
>>>>
>>>
>>> I agree that do not look good, however, John explained while adding this
>>> that user should retry and fallback without REQ_ATOMIC to make things
>>> work as usual.
>>
>> Whether a device promises atomic write is orthogonal to whether that write
>> succeeds - it could fail for a whole host of reasons, so why can't "this is
>> too big to be atomic" just be another reason for failing?
> 
> Too big to be atomic is a valid failure reason.  But the limit needs
> to be documented in the queue limits beforehand.
> 
> 

What exactly could need to be documented?

We just report -EIO in this case (when we try to write to a bad blocks 
region with REQ_ATOMIC). In general, for RWF_ATOMIC, we report -EINVAL 
for too large/small a size.

BTW, do we realistically expect atomic writes HW support and bad blocks 
ever to meet?

Thanks,
John

Re: [PATCH RFC 4/7] md/raid10: convert read/write to use bio_submit_split()
Posted by Christoph Hellwig 1 month ago
On Tue, Sep 02, 2025 at 07:18:01AM +0100, John Garry wrote:
> BTW, do we realistically expect atomic writes HW support and bad blocks ever
> to meet?

That's the point I'm trying to make.  bad block tracking is stupid
with modern hardware.  Both SSDs and HDDs are overprovisioned on
physical "blocks", and once they run out fine grained bad block tracking
is not going to help.  І really do not understand why md even tries
to do this bad block tracking, but claiming to support atomic writes
while it does is actively harmful.

Re: [PATCH RFC 4/7] md/raid10: convert read/write to use bio_submit_split()
Posted by John Garry 1 month ago
On 02/09/2025 07:30, Christoph Hellwig wrote:
> On Tue, Sep 02, 2025 at 07:18:01AM +0100, John Garry wrote:
>> BTW, do we realistically expect atomic writes HW support and bad blocks ever
>> to meet?
> 
> That's the point I'm trying to make.  bad block tracking is stupid
> with modern hardware.  Both SSDs and HDDs are overprovisioned on
> physical "blocks", and once they run out fine grained bad block tracking
> is not going to help.  І really do not understand why md even tries
> to do this bad block tracking, 

Just because they can try to deal with bad blocks for some (mirroring) 
personalities, I suppose.

> but claiming to support atomic writes
> while it does is actively harmful.
> 

There does not look to be some switch to turn off bad block support. 
That's from briefly checking raid10.c anyway. Kuai, any thoughts on 
whether we should allow this to be disabled?

Thanks,
John
Re: [PATCH RFC 4/7] md/raid10: convert read/write to use bio_submit_split()
Posted by Yu Kuai 1 month ago
Hi,

在 2025/09/02 14:58, John Garry 写道:
> On 02/09/2025 07:30, Christoph Hellwig wrote:
>> On Tue, Sep 02, 2025 at 07:18:01AM +0100, John Garry wrote:
>>> BTW, do we realistically expect atomic writes HW support and bad 
>>> blocks ever
>>> to meet?
>>
>> That's the point I'm trying to make.  bad block tracking is stupid
>> with modern hardware.  Both SSDs and HDDs are overprovisioned on
>> physical "blocks", and once they run out fine grained bad block tracking
>> is not going to help.  І really do not understand why md even tries
>> to do this bad block tracking, 
> 
> Just because they can try to deal with bad blocks for some (mirroring) 
> personalities, I suppose.

I agree it's useless for enterprise storage, however, for personal
storage, there are lots of users using cost-effective (often aging)
disks, badblocks tracking can reduce the risk of data lost, and
make sure these devices will not become waste.

> 
>> but claiming to support atomic writes
>> while it does is actively harmful.
>>
> 
> There does not look to be some switch to turn off bad block support. 
> That's from briefly checking raid10.c anyway. Kuai, any thoughts on 
> whether we should allow this to be disabled?
> 

I remember that I used to suggest always enable failfast in this case,
and badblocks can be bypassed. Anyway, I think it's good to allow this
to be disabled, it will behave very similar to failfast.

Thanks,
Kuai

> Thanks,
> John
> 
> .
> 

Re: [PATCH RFC 4/7] md/raid10: convert read/write to use bio_submit_split()
Posted by John Garry 1 month ago
On 02/09/2025 09:25, Yu Kuai wrote:
>> There does not look to be some switch to turn off bad block support. 
>> That's from briefly checking raid10.c anyway. Kuai, any thoughts on 
>> whether we should allow this to be disabled?
>>
> 
> I remember that I used to suggest always enable failfast in this case,
> and badblocks can be bypassed. Anyway, I think it's good to allow this
> to be disabled, it will behave very similar to failfast.

ok, I can put this on my TODO list, unless someone else wants it.

Thanks,
John