[PATCH v2 00/12] Improve Raid5 Lock Contention

Posted by Logan Gunthorpe 4 years ago
Hi,

This is v2 of this series which addresses Christoph's feedback and
fixes some bugs. The first posting is at [1]. A git branch is
available at [2].

--

I've been doing some work trying to improve the bulk write performance
of raid5 on large systems with fast NVMe drives. The bottleneck appears
largely to be lock contention on the hash_lock and device_lock. This
series improves the situation slightly by addressing a couple of
low-hanging-fruit opportunities to take those locks fewer times in the
request path.

Patch 9 adjusts how batching works by keeping a reference to the
previous stripe_head in raid5_make_request(). In most situations,
this removes the need to take the hash_lock in stripe_add_to_batch_list()
which should reduce the number of times the lock is taken by a factor of
about 2.
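
As a rough sketch of the idea (kernel-style pseudocode, not the actual
patch; the last_sh variable and the stripe_hash_locks_hash() helper are
assumptions based on the existing raid5 code):

	/*
	 * In stripe_add_to_batch_list(): if the previous stripe_head
	 * handled by raid5_make_request() is the batch head we need,
	 * reuse it instead of doing a locked hash-table lookup.
	 */
	if (last_sh && head_sector == last_sh->sector) {
		head = last_sh;
		atomic_inc(&head->count);
	} else {
		hash = stripe_hash_locks_hash(conf, head_sector);
		spin_lock_irq(conf->hash_locks + hash);
		head = find_get_stripe(conf, head_sector,
				       conf->generation, hash);
		spin_unlock_irq(conf->hash_locks + hash);
		if (!head)
			return;
	}

Since sequential writes tend to batch against the stripe that was just
processed, the locked lookup should only be needed when a new batch
starts.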

Patch 12 pivots the way raid5_make_request() works. Before the patch, the
code must find the stripe_head for every 4KB page in the request, so each
stripe head must be found once for every data disk. The patch changes this
so that all the data disks can be added to a stripe_head at once and the
number of times the stripe_head must be found (and thus the number of
times the hash_lock is taken) should be reduced by a factor roughly equal
to the number of data disks.
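
In loose pseudocode, the change looks something like this (illustrative
only, not the actual code):

	/* Before: one locked stripe_head lookup per 4KB page */
	for (each 4KB page in the bio) {
		sh = find_get_stripe(...);	/* takes hash_lock */
		add_stripe_bio(sh, bi, dd_idx, ...);
	}

	/* After: one locked lookup per stripe_head, all data disks at once */
	for (each stripe_head covered by the bio) {
		sh = find_get_stripe(...);	/* takes hash_lock once */
		for (each data disk dd_idx in the stripe)
			add_stripe_bio(sh, bi, dd_idx, ...);
	}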

The remaining patches are just cleanup and prep patches for those two
patches.

Doing apples-to-apples testing of this series on a small VM with 5 ram
disks, I saw a bandwidth increase of roughly 14%, and lock contention
on the hash_lock (as reported by lock_stat) reduced by more than a
factor of 5 (though the lock is still significantly contended).

Testing on larger systems with NVMe drives saw similarly small
bandwidth increases, from 3% to 20% depending on the parameters.
Oddly, small arrays had larger gains, likely because they start from
lower bandwidths; I would have expected larger gains with larger
arrays, since even fewer locks should be taken in
raid5_make_request().

Logan

[1] https://lkml.kernel.org/r/20220407164511.8472-1-logang@deltatee.com
[2] https://github.com/sbates130272/linux-p2pmem raid5_lock_cont_v2

--

Changes since v1:
  - Rebased on current md-next branch (190a901246c69d79)
  - Added patch to create a helper for checking if a sector
    is ahead of the reshape (per Christoph)
  - Reworked the __find_stripe() patch to create a find_get_stripe()
    helper (per Christoph)
  - Added more patches to further refactor raid5_make_request() and
    pull most of the loop body into a helper function (per Christoph)
  - A few other minor cleanups (boolean return, dropping casts when
    printing sectors, commit message grammar) as suggested by Christoph.
  - Fixed two uncommon but bad data corruption bugs that were found.

--

Logan Gunthorpe (12):
  md/raid5: Factor out ahead_of_reshape() function
  md/raid5: Refactor raid5_make_request loop
  md/raid5: Move stripe_add_to_batch_list() call out of add_stripe_bio()
  md/raid5: Move common stripe count increment code into __find_stripe()
  md/raid5: Factor out helper from raid5_make_request() loop
  md/raid5: Drop the do_prepare flag in raid5_make_request()
  md/raid5: Move read_seqcount_begin() into make_stripe_request()
  md/raid5: Refactor for loop in raid5_make_request() into while loop
  md/raid5: Keep a reference to last stripe_head for batch
  md/raid5: Refactor add_stripe_bio()
  md/raid5: Check all disks in a stripe_head for reshape progress
  md/raid5: Pivot raid5_make_request()

 drivers/md/raid5.c | 632 +++++++++++++++++++++++++++++----------------
 drivers/md/raid5.h |   1 +
 2 files changed, 410 insertions(+), 223 deletions(-)


base-commit: 190a901246c69d79dadd1ab574022548da612724
--
2.30.2
Re: [PATCH v2 00/12] Improve Raid5 Lock Contention
Posted by Song Liu 4 years ago
On Wed, Apr 20, 2022 at 12:55 PM Logan Gunthorpe <logang@deltatee.com> wrote:
>
> [...]
>

The set looks good to me overall. Thanks everyone for the review and feedback.

Logan, please incorporate feedback and send v3.

Thanks,
Song
Re: [PATCH v2 00/12] Improve Raid5 Lock Contention
Posted by Guoqing Jiang 4 years ago

On 4/21/22 3:54 AM, Logan Gunthorpe wrote:
>
> [...]
>
> Logan Gunthorpe (12):
>    md/raid5: Factor out ahead_of_reshape() function
>    md/raid5: Refactor raid5_make_request loop
>    md/raid5: Move stripe_add_to_batch_list() call out of add_stripe_bio()
>    md/raid5: Move common stripe count increment code into __find_stripe()
>    md/raid5: Factor out helper from raid5_make_request() loop
>    md/raid5: Drop the do_prepare flag in raid5_make_request()
>    md/raid5: Move read_seqcount_begin() into make_stripe_request()
>    md/raid5: Refactor for loop in raid5_make_request() into while loop
>    md/raid5: Keep a reference to last stripe_head for batch
>    md/raid5: Refactor add_stripe_bio()
>    md/raid5: Check all disks in a stripe_head for reshape progress
>    md/raid5: Pivot raid5_make_request()

Generally, I don't object to the cleanup patches, since the code looks
cleaner. But my concern is that some additional function calls are
added to a hot path (raid5_make_request); could the performance be
affected?

And I think patch 9 and patch 12 are helpful for the performance
improvement. Did you measure the performance without those cleanup
patches?

Thanks,
Guoqing
Re: [PATCH v2 00/12] Improve Raid5 Lock Contention
Posted by Logan Gunthorpe 4 years ago

On 2022-04-24 01:53, Guoqing Jiang wrote:
>
> [...]
>
> Generally, I don't object to the cleanup patches, since the code looks
> cleaner. But my concern is that some additional function calls are
> added to a hot path (raid5_make_request); could the performance be
> affected?

There's a bit of logic added to raid5_make_request(), but it is all
local and should be fast, and it reduces the number of calls to the
slow, contended locks.

> And I think patch 9 and patch 12 are helpful for the performance
> improvement. Did you measure the performance without those cleanup
> patches?

Yes, I compared performance with and without this entire series.

Logan
Re: [PATCH v2 00/12] Improve Raid5 Lock Contention
Posted by Xiao Ni 4 years ago
On Thu, Apr 21, 2022 at 3:55 AM Logan Gunthorpe <logang@deltatee.com> wrote:
>
> [...]
>
> Doing apples-to-apples testing of this series on a small VM with 5 ram
> disks, I saw a bandwidth increase of roughly 14%, and lock contention
> on the hash_lock (as reported by lock_stat) reduced by more than a
> factor of 5 (though the lock is still significantly contended).
>
> Testing on larger systems with NVMe drives saw similarly small
> bandwidth increases, from 3% to 20% depending on the parameters.


Hi Logan

Could you share the commands to get the test result (lock contention
and performance)?

Regards
Xiao
Re: [PATCH v2 00/12] Improve Raid5 Lock Contention
Posted by Logan Gunthorpe 4 years ago

On 2022-04-21 02:45, Xiao Ni wrote:
> Could you share the commands to get the test result (lock contention
> and performance)?

Sure. The performance we were focused on was large block writes, so we
set up raid5 instances with varying numbers of disks and ran the
following fio script directly on the md device.

[simple]
filename=/dev/md0
ioengine=libaio
rw=write
direct=1
size=8G
blocksize=2m
iodepth=16
runtime=30s
time_based=1
offset_increment=8G
numjobs=12

(We also played around with tuning this but didn't find substantial
changes once the bottleneck was hit)
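
For anyone reproducing this, the arrays themselves can be created with
a plain mdadm command; a hypothetical 5-disk example with placeholder
device names:

mdadm --create /dev/md0 --level=5 --raid-devices=5 \
    /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1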

We tuned md with parameters like:

echo 4 > /sys/block/md0/md/group_thread_cnt
echo 8192 > /sys/block/md0/md/stripe_cache_size

For lock contention stats, we just used lockstat[1]; roughly like:

echo 1 > /proc/sys/kernel/lock_stat
fio test.fio
echo 0 > /proc/sys/kernel/lock_stat
cat /proc/lock_stat

And compared the before and after.
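
To pull a specific lock out of that output, something like the
following works (the exact class name lock_stat shows for the raid5
hash locks is an assumption and may differ):

grep -B 1 -A 8 hash_lock /proc/lock_stat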

Logan

[1] https://www.kernel.org/doc/html/latest/locking/lockstat.html
Re: [PATCH v2 00/12] Improve Raid5 Lock Contention
Posted by Guoqing Jiang 4 years ago

On 4/22/22 12:02 AM, Logan Gunthorpe wrote:
>
> On 2022-04-21 02:45, Xiao Ni wrote:
>> Could you share the commands to get the test result (lock contention
>> and performance)?
> Sure. The performance we were focused on was large block writes, so we
> set up raid5 instances with varying numbers of disks and ran the
> following fio script directly on the md device.
>
> [simple]
> filename=/dev/md0
> ioengine=libaio
> rw=write
> direct=1
> size=8G
> blocksize=2m
> iodepth=16
> runtime=30s
> time_based=1
> offset_increment=8G
> numjobs=12
> 
> (We also played around with tuning this but didn't find substantial
> changes once the bottleneck was hit)

Nice, I suppose other IO patterns keep the same performance as before.

> We tuned md with parameters like:
>
> echo 4 > /sys/block/md0/md/group_thread_cnt
> echo 8192 > /sys/block/md0/md/stripe_cache_size
>
> For lock contention stats, we just used lockstat[1]; roughly like:
>
> echo 1 > /proc/sys/kernel/lock_stat
> fio test.fio
> echo 0 > /proc/sys/kernel/lock_stat
> cat /proc/lock_stat
>
> And compared the before and after.

Thanks for your effort. Besides the performance test, please try to
run the mdadm test suite to avoid regressions.

Thanks,
Guoqing
Re: [PATCH v2 00/12] Improve Raid5 Lock Contention
Posted by Logan Gunthorpe 4 years ago

On 2022-04-24 02:00, Guoqing Jiang wrote:
>
> [...]
>
> Thanks for your effort. Besides the performance test, please try to
> run the mdadm test suite to avoid regressions.

Yeah, is there any documentation for that? I tried to look into it but
couldn't figure out how it's run.

I do know that lkp-tests has run it on this series as I did get an error
from it. But while I'm pretty sure that error has been resolved, I was
never able to figure out how to run them locally.

Thanks,

Logan
Re: [PATCH v2 00/12] Improve Raid5 Lock Contention
Posted by Xiao Ni 4 years ago
On Mon, Apr 25, 2022 at 11:39 PM Logan Gunthorpe <logang@deltatee.com> wrote:
>
> [...]
>
>
> Yeah, is there any documentation for that? I tried to look into it but
> couldn't figure out how it's run.
>
> I do know that lkp-tests has run it on this series as I did get an error
> from it. But while I'm pretty sure that error has been resolved, I was
> never able to figure out how to run them locally.
>

Hi Logan

You can clone the mdadm repo from
git://git.kernel.org/pub/scm/utils/mdadm/mdadm.git
There you will find a script named 'test' in the top-level directory
(not under the tests directory); the test cases themselves live under
the tests directory.
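
A minimal invocation might look like this (assuming the tests are run
as root from the top of the mdadm tree, after building the mdadm
binary they use):

cd mdadm
make
./test --keep-going --raidtype=raid456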

Regards
Xiao
Re: [PATCH v2 00/12] Improve Raid5 Lock Contention
Posted by Logan Gunthorpe 4 years ago

On 2022-04-25 10:12, Xiao Ni wrote:
>> I do know that lkp-tests has run it on this series as I did get an error
>> from it. But while I'm pretty sure that error has been resolved, I was
>> never able to figure out how to run them locally.
>>
> 
> Hi Logan
> 
> You can clone the mdadm repo from
> git://git.kernel.org/pub/scm/utils/mdadm/mdadm.git
> There you will find a script named 'test' in the top-level directory
> (not under the tests directory); the test cases themselves live under
> the tests directory.

So I've been fighting with this, and it seems there are just a ton of
failures in these tests even without my changes. Running the latest
mdadm master (52c67fcdd6dad) with stock v5.17.5, I see major
brokenness: about 17 of the 44 tests that ran failed. I had to run
with --disable-integrity because those tests seem to hang in an
infinite loop waiting for the md array to go into the U state (even
though it appears idle).

Even though I ran the tests with '--keep-going', the testing stopped
after 07revert-grow reported errors in dmesg -- even though the only
errors printed to dmesg were those of mdadm segfaulting.

Running on md/md-next seems to get a bit further (to
10ddf-create-fail-rebuild) and stops with the same segfaulting issue (or
perhaps the 07 test only randomly fails first -- I haven't run it that
many times). Though most of the tests between these points fail anyway.

My upcoming v3 patches cause no failures that are different from the
md/md-next branch. But it seems these tests have rotted to the point
that they aren't all that useful; or maybe there are a ton of
regressions in the kernel already and nobody was paying much attention.
I have also tried to test certain cases that appear broken in recent
kernels anyway (for example, reducing the number of disks in a raid5
array hangs on the first stripe to be reshaped).

I do have a very rough ad-hoc test suite I've been expanding that is
targeted at testing my specific changes. Testing these changes has
definitely been challenging. In any case, I've published my tests here:

https://github.com/Eideticom/raid5-tests

Logan
Re: [PATCH v2 00/12] Improve Raid5 Lock Contention
Posted by Guoqing Jiang 4 years ago

On 4/29/22 5:22 AM, Logan Gunthorpe wrote:
>
> [...]
>
> So I've been fighting with this, and it seems there are just a ton of
> failures in these tests even without my changes. Running the latest
> mdadm master (52c67fcdd6dad) with stock v5.17.5, I see major
> brokenness: about 17 of the 44 tests that ran failed. I had to run
> with --disable-integrity because those tests seem to hang in an
> infinite loop waiting for the md array to go into the U state (even
> though it appears idle).
>
> Even though I ran the tests with '--keep-going', the testing stopped
> after 07revert-grow reported errors in dmesg -- even though the only
> errors printed to dmesg were those of mdadm segfaulting.
>
> Running on md/md-next seems to get a bit further (to
> 10ddf-create-fail-rebuild) and stops with the same segfaulting issue (or
> perhaps the 07 test only randomly fails first -- I haven't run it that
> many times). Though most of the tests between these points fail anyway.
>
> My upcoming v3 patches cause no failures that are different from the
> md/md-next branch. But it seems these tests have rotted to the point
> that they aren't all that useful; or maybe there are a ton of
> regressions in the kernel already and nobody was paying much attention.

I can't agree with you anymore. I would say some patches were submitted
without enough testing, and then, kernel release after kernel release,
things got worse.

This is also the reason I recommend running the mdadm tests: md raid is
a complex subsystem, and even a simple change could cause a regression.
Considering there are really only a limited number of developers and
reviewers in the community, the chance of causing a regression gets
bigger.

> I have also tried to test certain cases that appear broken in recent
> kernels anyway (for example, reducing the number of disks in a raid5
> array hangs on the first stripe to be reshaped).
>
> I do have a very rough ad-hoc test suite I've been expanding that is
> targeted at testing my specific changes. Testing these changes has
> definitely been challenging. In any case, I've published my tests here:
>
> https://github.com/Eideticom/raid5-tests

If I may, is it possible to submit your tests to mdadm as well, so we
can have one common place that contains enough tests?

Thanks,
Guoqing
Re: [PATCH v2 00/12] Improve Raid5 Lock Contention
Posted by Logan Gunthorpe 4 years ago

On 2022-04-28 18:49, Guoqing Jiang wrote:
> I can't agree with you anymore. I would say some patches were
> submitted without enough testing, and then, kernel release after
> kernel release, things got worse.

I'm not sure where we disagree here. I certainly don't want to introduce
regressions myself. I haven't submitted v3 yet because I've become less
certain that there are no regressions in it. The point of my last email
was to try to explain that I am taking testing seriously.

> This is also the reason I recommend running the mdadm tests: md raid
> is a complex subsystem, and even a simple change could cause a
> regression. Considering there are really only a limited number of
> developers and reviewers in the community, the chance of causing a
> regression gets bigger.

While I'd certainly like to run the mdadm tests, they appear to be very
broken to me -- too broken for me to fix everything; I don't have time
to fix that many issues. It seems I'm not the only one to run into this
problem recently:

https://lore.kernel.org/linux-raid/20220111130635.00001478@linux.intel.com/T/#t

And it's a shame nobody even bothered to remove the unsupported 0.9
metadata tests from the repo as a result of that conversation.

> If I may, is it possible to submit your tests to mdadm as well, so we
> can have one common place that contains enough tests?

I'd certainly consider that if I could run the test suite. One hitch,
though, is that I've found I need to run my tests repeatedly, for
hours, before hitting some rare bugs; running the tests only once is
much easier to pass. It's hard to fully test things like this, with so
many rare retry paths, in a simple regression test.

Logan
Re: [PATCH v2 00/12] Improve Raid5 Lock Contention
Posted by Guoqing Jiang 4 years ago

On 4/30/22 12:01 AM, Logan Gunthorpe wrote:
>
> On 2022-04-28 18:49, Guoqing Jiang wrote:
>> I can't agree with you anymore. I would say some patches were
>> submitted without enough testing, and then, kernel release after
>> kernel release, things got worse.
> I'm not sure where we disagree here. I certainly don't want to introduce
> regressions myself. I haven't submitted v3 yet because I've become less
> certain that there are no regressions in it. The point of my last email
> was to try to explain that I am taking testing seriously.

That is my intention too: no more new regressions.

>> This is also the reason I recommend running the mdadm tests: md raid
>> is a complex subsystem, and even a simple change could cause a
>> regression. Considering there are really only a limited number of
>> developers and reviewers in the community, the chance of causing a
>> regression gets bigger.
> While I'd certainly like to run the mdadm tests, they appear to be very
> broken to me -- too broken for me to fix everything; I don't have time
> to fix that many issues.

I do agree it is not reasonable to ask you to fix them; just compare
the test results with and without your set, so that at least there are
no new failures, as said.

> It seems I'm not the only one to run into this problem recently:
>
> https://lore.kernel.org/linux-raid/20220111130635.00001478@linux.intel.com/T/#t
>
> And it's a shame nobody even bothered to remove the unsupported 0.9
> metadata tests from the repo as a result of that conversation.
>
>> If I may, is it possible to submit your tests to mdadm as well, so
>> we can have one common place that contains enough tests?
> I'd certainly consider that if I could run the test suite. One hitch,
> though, is that I've found I need to run my tests repeatedly, for
> hours, before hitting some rare bugs; running the tests only once is
> much easier to pass. It's hard to fully test things like this, with
> so many rare retry paths, in a simple regression test.

Let's focus on the raid456 tests, given the code only touches raid5.
You can pass arguments like this, FYI:

mdadm> ./test --raidtype=raid456 --dev=loop

Thanks,
Guoqing