[PATCH RFC v3 00/15] block: fix disordered IO in the case recursive split
Posted by Yu Kuai 1 month ago
From: Yu Kuai <yukuai3@huawei.com>

Changes in v3:
 - add patch 1,2 to cleanup bio_issue;
 - add patch 3,4 to fix missing processing for split bio first;
 - bypass zoned device in patch 14;
Changes in v2:
 - export a new helper bio_submit_split_bioset() instead of
exporting bio_submit_split() directly;
 - don't set no merge flag in the new helper;
 - add patch 7 and patch 10;
 - add patch 8 to skip bio checks for resubmitting split bio;

patch 1,2 clean up bio_issue;
patch 3,4 fix missing processing for split bio;
patch 5 exports a bio split helper;
patch 6-12 unify bio split code;
patch 13,14 convert the helper to insert the split bio at the head of the
current bio list;
patch 15 is a follow-up cleanup for raid0;

This set is only tested with raid5 for now; see details in patch 9;

Yu Kuai (15):
  block: cleanup bio_issue
  block: add QUEUE_FLAG_BIO_ISSUE
  md: fix missing blktrace bio split events
  blk-crypto: fix missing processing for split bio
  block: factor out a helper bio_submit_split_bioset()
  md/raid0: convert raid0_handle_discard() to use
    bio_submit_split_bioset()
  md/raid1: convert to use bio_submit_split_bioset()
  md/raid10: add a new r10bio flag R10BIO_Returned
  md/raid10: convert read/write to use bio_submit_split_bioset()
  md/raid5: convert to use bio_submit_split_bioset()
  md/md-linear: convert to use bio_submit_split_bioset()
  blk-crypto: convert to use bio_submit_split_bioset()
  block: skip unnecessary checks for split bio
  block: fix disordered IO in the case recursive split
  md/raid0: convert raid0_make_request() to use
    bio_submit_split_bioset()

 block/bio.c                 |  2 +-
 block/blk-cgroup.h          |  5 ++-
 block/blk-core.c            | 35 +++++++++++++++++----
 block/blk-crypto-fallback.c | 15 +++------
 block/blk-iolatency.c       | 15 +++------
 block/blk-merge.c           | 63 ++++++++++++++++++++++++-------------
 block/blk-mq-debugfs.c      |  1 +
 block/blk-throttle.c        |  2 +-
 block/blk.h                 | 45 ++------------------------
 drivers/md/md-linear.c      | 11 ++-----
 drivers/md/raid0.c          | 30 ++++++------------
 drivers/md/raid1.c          | 38 ++++++++--------------
 drivers/md/raid1.h          |  4 ++-
 drivers/md/raid10.c         | 54 ++++++++++++++-----------------
 drivers/md/raid10.h         |  2 ++
 drivers/md/raid5.c          | 10 +++---
 include/linux/blk_types.h   |  7 ++---
 include/linux/blkdev.h      |  3 ++
 18 files changed, 152 insertions(+), 190 deletions(-)

-- 
2.39.2
Re: [PATCH RFC v3 00/15] block: fix disordered IO in the case recursive split
Posted by Bart Van Assche 1 month ago
On 8/31/25 8:32 PM, Yu Kuai wrote:
> This set is only tested with raid5 for now; see details in patch 9;

Does this mean that this patch series doesn't fix reordering caused by
recursive splitting for zoned block devices? A test case that triggers
an I/O error is available here:
https://lore.kernel.org/linux-block/a8a714c7-de3d-4cc9-8c23-38b8dc06f5bb@acm.org/

I have not yet had the time to review this patch series but plan to take
a look soon.

Thanks,

Bart.
Re: [PATCH RFC v3 00/15] block: fix disordered IO in the case recursive split
Posted by Yu Kuai 1 month ago
Hi,

On 2025/09/01 22:09, Bart Van Assche wrote:
> On 8/31/25 8:32 PM, Yu Kuai wrote:
>> This set is only tested with raid5 for now; see details in patch 9;
> 
> Does this mean that this patch series doesn't fix reordering caused by
> recursive splitting for zoned block devices? A test case that triggers
> an I/O error is available here:
> https://lore.kernel.org/linux-block/a8a714c7-de3d-4cc9-8c23-38b8dc06f5bb@acm.org/ 
> 
I'll try this test.

Zoned block devices are bypassed in patch 14 by:

+		if (split && !bdev_is_zoned(bio->bi_bdev))
+			bio_list_add_head(&current->bio_list[0], bio);

If I can find a reproducer for zoned block devices and verify that
recursive split is fixed there as well, I can remove the check for zoned
devices in the next version.

Thanks,
Kuai

> 
> I have not yet had the time to review this patch series but plan to take
> a look soon.
> 
> Thanks,
> 
> Bart.
> .
> 

Re: [PATCH RFC v3 00/15] block: fix disordered IO in the case recursive split
Posted by Yu Kuai 1 month ago
Hi,

On 2025/09/02 9:50, Yu Kuai wrote:
> Hi,
> 
> On 2025/09/01 22:09, Bart Van Assche wrote:
>> On 8/31/25 8:32 PM, Yu Kuai wrote:
>>> This set is only tested with raid5 for now; see details in patch 9;
>>
>> Does this mean that this patch series doesn't fix reordering caused by
>> recursive splitting for zoned block devices? A test case that triggers
>> an I/O error is available here:
>> https://lore.kernel.org/linux-block/a8a714c7-de3d-4cc9-8c23-38b8dc06f5bb@acm.org/ 
>>
> I'll try this test.
> 

This test can't run directly in my VM, so I debugged and modified the
test a bit; the following is the result, from the block_io_start trace
event:

Before this set:

           dd-3014    [000] .N...  1918.939253: block_io_start: 252,2 WS 524288 () 0 + 1024 be,0,4 [dd]
   kworker/0:1H-37    [000] .....  1918.952434: block_io_start: 252,2 WS 524288 () 1024 + 1024 be,0,4 [kworker/0:1H]
           dd-3014    [000] .....  1918.973499: block_io_start: 252,2 WS 524288 () 8192 + 1024 be,0,4 [dd]
   kworker/0:1H-37    [000] .....  1918.984805: block_io_start: 252,2 WS 524288 () 9216 + 1024 be,0,4 [kworker/0:1H]
           dd-3014    [000] .N...  1919.010224: block_io_start: 252,2 WS 524288 () 16384 + 1024 be,0,4 [dd]
   kworker/0:1H-37    [000] .....  1919.021667: block_io_start: 252,2 WS 524288 () 17408 + 1024 be,0,4 [kworker/0:1H]
           dd-3014    [000] .....  1919.053072: block_io_start: 252,2 WS 524288 () 24576 + 1024 be,0,4 [dd]
   kworker/0:1H-37    [000] .....  1919.064781: block_io_start: 252,2 WS 524288 () 25600 + 1024 be,0,4 [kworker/0:1H]
           dd-3014    [000] .N...  1919.100657: block_io_start: 252,2 WS 524288 () 32768 + 1024 be,0,4 [dd]
   kworker/0:1H-37    [000] .....  1919.112999: block_io_start: 252,2 WS 524288 () 33792 + 1024 be,0,4 [kworker/0:1H]
           dd-3014    [000] .....  1919.145032: block_io_start: 252,2 WS 524288 () 40960 + 1024 be,0,4 [dd]
   kworker/0:1H-37    [000] .....  1919.156677: block_io_start: 252,2 WS 524288 () 41984 + 1024 be,0,4 [kworker/0:1H]
           dd-3014    [000] .N...  1919.188287: block_io_start: 252,2 WS 524288 () 49152 + 1024 be,0,4 [dd]
   kworker/0:1H-37    [000] .....  1919.199869: block_io_start: 252,2 WS 524288 () 50176 + 1024 be,0,4 [kworker/0:1H]
           dd-3014    [000] .N...  1919.233467: block_io_start: 252,2 WS 524288 () 57344 + 1024 be,0,4 [dd]
   kworker/0:1H-37    [000] .....  1919.245487: block_io_start: 252,2 WS 524288 () 58368 + 1024 be,0,4 [kworker/0:1H]
           dd-3014    [000] .N...  1919.281146: block_io_start: 252,2 WS 524288 () 65536 + 1024 be,0,4 [dd]
   kworker/0:1H-37    [000] .....  1919.292812: block_io_start: 252,2 WS 524288 () 66560 + 1024 be,0,4 [kworker/0:1H]
           dd-3014    [000] .N...  1919.326543: block_io_start: 252,2 WS 524288 () 73728 + 1024 be,0,4 [dd]
   kworker/0:1H-37    [000] .....  1919.338412: block_io_start: 252,2 WS 524288 () 74752 + 1024 be,0,4 [kworker/0:1H]
           dd-3014    [000] .N...  1919.374312: block_io_start: 252,2 WS 524288 () 81920 + 1024 be,0,4 [dd]
   kworker/0:1H-37    [000] .....  1919.386481: block_io_start: 252,2 WS 524288 () 82944 + 1024 be,0,4 [kworker/0:1H]
           dd-3014    [000] .....  1919.419795: block_io_start: 252,2 WS 524288 () 90112 + 1024 be,0,4 [dd]
   kworker/0:1H-37    [000] .....  1919.431454: block_io_start: 252,2 WS 524288 () 91136 + 1024 be,0,4 [kworker/0:1H]
           dd-3014    [000] .N...  1919.466208: block_io_start: 252,2 WS 524288 () 98304 + 1024 be,0,4 [dd]

We can see that block_io_start is not sequential, and the test reports an
out-of-space failure.

With this set applied and the zoned device check removed:

diff:
diff --git a/block/blk-core.c b/block/blk-core.c
index 6ca3c45f421c..37b5dd396e22 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -746,7 +746,7 @@ void submit_bio_noacct_nocheck(struct bio *bio, bool split)
          * it is active, and then process them after it returned.
          */
         if (current->bio_list) {
-               if (split && !bdev_is_zoned(bio->bi_bdev))
+               if (split)
                         bio_list_add_head(&current->bio_list[0], bio);
                 else
                         bio_list_add(&current->bio_list[0], bio);

result:
            dd-612     [000] .N...    52.856395: block_io_start: 252,2 WS 524288 () 0 + 1024 be,0,4 [dd]
   kworker/0:1H-37     [000] .....    52.869947: block_io_start: 252,2 WS 524288 () 1024 + 1024 be,0,4 [kworker/0:1H]
   kworker/0:1H-37     [000] .....    52.880295: block_io_start: 252,2 WS 524288 () 2048 + 1024 be,0,4 [kworker/0:1H]
   kworker/0:1H-37     [000] .....    52.890541: block_io_start: 252,2 WS 524288 () 3072 + 1024 be,0,4 [kworker/0:1H]
   kworker/0:1H-37     [000] .....    52.900951: block_io_start: 252,2 WS 524288 () 4096 + 1024 be,0,4 [kworker/0:1H]
   kworker/0:1H-37     [000] .....    52.911370: block_io_start: 252,2 WS 524288 () 5120 + 1024 be,0,4 [kworker/0:1H]
   kworker/0:1H-37     [000] .....    52.922160: block_io_start: 252,2 WS 524288 () 6144 + 1024 be,0,4 [kworker/0:1H]
   kworker/0:1H-37     [000] .....    52.932823: block_io_start: 252,2 WS 524288 () 7168 + 1024 be,0,4 [kworker/0:1H]
            dd-612     [000] .N...    52.968469: block_io_start: 252,2 WS 524288 () 8192 + 1024 be,0,4 [dd]
   kworker/0:1H-37     [000] .....    52.980892: block_io_start: 252,2 WS 524288 () 9216 + 1024 be,0,4 [kworker/0:1H]
   kworker/0:1H-37     [000] .....    52.991500: block_io_start: 252,2 WS 524288 () 10240 + 1024 be,0,4 [kworker/0:1H]
   kworker/0:1H-37     [000] .....    53.002088: block_io_start: 252,2 WS 524288 () 11264 + 1024 be,0,4 [kworker/0:1H]
   kworker/0:1H-37     [000] .....    53.012879: block_io_start: 252,2 WS 524288 () 12288 + 1024 be,0,4 [kworker/0:1H]
   kworker/0:1H-37     [000] .....    53.023518: block_io_start: 252,2 WS 524288 () 13312 + 1024 be,0,4 [kworker/0:1H]
   kworker/0:1H-37     [000] .....    53.034365: block_io_start: 252,2 WS 524288 () 14336 + 1024 be,0,4 [kworker/0:1H]
   kworker/0:1H-37     [000] .....    53.045077: block_io_start: 252,2 WS 524288 () 15360 + 1024 be,0,4 [kworker/0:1H]
            dd-612     [000] .N...    53.082148: block_io_start: 252,2 WS 524288 () 16384 + 1024 be,0,4 [dd]

We can see that block_io_start is sequential now.

Thanks,
Kuai

> zoned block device is bypassed in patch 14 by:
> 
> +        if (split && !bdev_is_zoned(bio->bi_bdev))
> +            bio_list_add_head(&current->bio_list[0], bio);
> 
> If I can find a reproducer for zoned block devices and verify that
> recursive split is fixed there as well, I can remove the check for zoned
> devices in the next version.
> 
> Thanks,
> Kuai
> 
>>
>> I have not yet had the time to review this patch series but plan to take
>> a look soon.
>>
>> Thanks,
>>
>> Bart.
>> .
>>
> 
> .
>