[PATCH md-6.12 v2 00/42] md/md-bitmap: introduce bitmap_operations and make structure internal

Yu Kuai posted 42 patches 1 year, 3 months ago
[PATCH md-6.12 v2 00/42] md/md-bitmap: introduce bitmap_operations and make structure internal
Posted by Yu Kuai 1 year, 3 months ago
From: Yu Kuai <yukuai3@huawei.com>

Changes from RFC v1:
 - add patches 1-8 to prevent dereferencing the bitmap directly, and the
 last patch to make the bitmap structure internal;
 - use plain function calls "bitmap_ops->xxx()" directly;

Changes from RFC v2:
 - some coding style fixes;

Changes from v1:
 - add patch 5 to fix __le64 conversion;
 - fix a problem in patch 34;

The background is that the bitmap currently uses a global spin_lock,
which causes lock contention and severe IO performance degradation for
all raid levels.

However, it's impossible to implement a new lock-free bitmap in the
current situation, where md-bitmap exposes its internal implementation
through lots of exported APIs. Hence bitmap_operations is introduced to
describe the bitmap core implementation. A new bitmap can then be
introduced with a new bitmap_operations, and we only need to switch to
the new one during initialization.
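
To illustrate the idea, here is a minimal userspace sketch of an
operations table in front of an opaque bitmap. It is not the code from
this series: every name in it (demo_bitmap, demo_bitmap_ops,
demo_bitmap_stats, simple_*, demo_run) is hypothetical, and the real
struct bitmap_operations has different members and signatures. It only
shows callers going through "bitmap_ops->xxx()" and reading state via a
stats snapshot instead of dereferencing the bitmap.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Callers hold only an opaque pointer; the layout is an implementation detail. */
struct demo_bitmap;

/* Snapshot handed to callers instead of letting them dereference the bitmap. */
struct demo_bitmap_stats {
	unsigned long pages;        /* total pages tracked */
	unsigned long dirty_pages;  /* pages marked dirty so far */
	uint64_t events_cleared;
};

/* Hypothetical operations table: every member name here is illustrative. */
struct demo_bitmap_ops {
	struct demo_bitmap *(*create)(unsigned long pages);
	void (*destroy)(struct demo_bitmap *bm);
	void (*startwrite)(struct demo_bitmap *bm, unsigned long page);
	int (*get_stats)(const struct demo_bitmap *bm,
			 struct demo_bitmap_stats *stats);
};

/* One concrete implementation. In a real split this would sit in its own .c file. */
struct demo_bitmap {
	unsigned long pages;
	uint64_t events_cleared;
	unsigned char *dirty;       /* one byte per page, 1 = dirty */
};

static struct demo_bitmap *simple_create(unsigned long pages)
{
	struct demo_bitmap *bm = calloc(1, sizeof(*bm));

	if (!bm)
		return NULL;
	bm->dirty = calloc(pages, sizeof(*bm->dirty));
	if (!bm->dirty) {
		free(bm);
		return NULL;
	}
	bm->pages = pages;
	return bm;
}

static void simple_destroy(struct demo_bitmap *bm)
{
	if (!bm)
		return;
	free(bm->dirty);
	free(bm);
}

static void simple_startwrite(struct demo_bitmap *bm, unsigned long page)
{
	assert(page < bm->pages);
	bm->dirty[page] = 1;
}

static int simple_get_stats(const struct demo_bitmap *bm,
			    struct demo_bitmap_stats *stats)
{
	unsigned long i;

	if (!bm || !stats)
		return -1;
	stats->pages = bm->pages;
	stats->events_cleared = bm->events_cleared;
	stats->dirty_pages = 0;
	for (i = 0; i < bm->pages; i++)
		stats->dirty_pages += bm->dirty[i];
	return 0;
}

static const struct demo_bitmap_ops simple_ops = {
	.create = simple_create,
	.destroy = simple_destroy,
	.startwrite = simple_startwrite,
	.get_stats = simple_get_stats,
};

/* Chosen once at initialization; another implementation would swap this pointer. */
static const struct demo_bitmap_ops *bitmap_ops = &simple_ops;

/* Caller side: only bitmap_ops->xxx() calls, no field access on the bitmap. */
int demo_run(void)
{
	struct demo_bitmap_stats stats;
	struct demo_bitmap *bm = bitmap_ops->create(8);
	int ret = -1;

	if (!bm)
		return ret;
	bitmap_ops->startwrite(bm, 1);
	bitmap_ops->startwrite(bm, 5);
	bitmap_ops->startwrite(bm, 1);  /* already dirty, count unchanged */
	if (bitmap_ops->get_stats(bm, &stats) == 0)
		ret = (int)stats.dirty_pages;
	bitmap_ops->destroy(bm);
	return ret;
}
```

In this sketch a second implementation (for example a lock-free one)
would only need its own ops table; demo_run() would stay unchanged
apart from the pointer assigned to bitmap_ops at initialization.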

With this we could also build the bitmap as a kernel module, but that's
not our concern for now.

This version was tested with the mdadm tests. A few tests still fail in
my VM; however, it's the tests themselves that need to be fixed, and
we're working on that.

Yu Kuai (42):
  md/raid1: use md_bitmap_wait_behind_writes() in raid1_read_request()
  md/md-bitmap: replace md_bitmap_status() with a new helper
    md_bitmap_get_stats()
  md: use new helper md_bitmap_get_stats() in update_array_info()
  md/md-bitmap: add 'events_cleared' into struct md_bitmap_stats
  md/md-cluster: fix spares warnings for __le64
  md/md-bitmap: add 'sync_size' into struct md_bitmap_stats
  md/md-bitmap: add 'file_pages' into struct md_bitmap_stats
  md/md-bitmap: add 'behind_writes' and 'behind_wait' into struct
    md_bitmap_stats
  md/md-cluster: use helper md_bitmap_get_stats() to get pages in
    resize_bitmaps()
  md/md-bitmap: add a new helper md_bitmap_set_pages()
  md/md-bitmap: introduce struct bitmap_operations
  md/md-bitmap: simplify md_bitmap_create() + md_bitmap_load()
  md/md-bitmap: merge md_bitmap_create() into bitmap_operations
  md/md-bitmap: merge md_bitmap_load() into bitmap_operations
  md/md-bitmap: merge md_bitmap_destroy() into bitmap_operations
  md/md-bitmap: merge md_bitmap_flush() into bitmap_operations
  md/md-bitmap: make md_bitmap_print_sb() internal
  md/md-bitmap: merge md_bitmap_update_sb() into bitmap_operations
  md/md-bitmap: merge md_bitmap_status() into bitmap_operations
  md/md-bitmap: remove md_bitmap_setallbits()
  md/md-bitmap: merge bitmap_write_all() into bitmap_operations
  md/md-bitmap: merge md_bitmap_dirty_bits() into bitmap_operations
  md/md-bitmap: merge md_bitmap_startwrite() into bitmap_operations
  md/md-bitmap: merge md_bitmap_endwrite() into bitmap_operations
  md/md-bitmap: merge md_bitmap_start_sync() into bitmap_operations
  md/md-bitmap: remove the parameter 'aborted' for md_bitmap_end_sync()
  md/md-bitmap: merge md_bitmap_end_sync() into bitmap_operations
  md/md-bitmap: merge md_bitmap_close_sync() into bitmap_operations
  md/md-bitmap: mrege md_bitmap_cond_end_sync() into bitmap_operations
  md/md-bitmap: merge md_bitmap_sync_with_cluster() into
    bitmap_operations
  md/md-bitmap: merge md_bitmap_unplug_async() into md_bitmap_unplug()
  md/md-bitmap: merge bitmap_unplug() into bitmap_operations
  md/md-bitmap: merge md_bitmap_daemon_work() into bitmap_operations
  md/md-bitmap: pass in mddev directly for md_bitmap_resize()
  md/md-bitmap: merge md_bitmap_resize() into bitmap_operations
  md/md-bitmap: merge get_bitmap_from_slot() into bitmap_operations
  md/md-bitmap: merge md_bitmap_copy_from_slot() into struct
    bitmap_operation.
  md/md-bitmap: merge md_bitmap_set_pages() into struct
    bitmap_operations
  md/md-bitmap: merge md_bitmap_free() into bitmap_operations
  md/md-bitmap: merge md_bitmap_wait_behind_writes() into
    bitmap_operations
  md/md-bitmap: merge md_bitmap_enabled() into bitmap_operations
  md/md-bitmap: make in memory structure internal

 drivers/md/dm-raid.c     |   7 +-
 drivers/md/md-bitmap.c   | 568 +++++++++++++++++++++++++++++----------
 drivers/md/md-bitmap.h   | 268 ++++--------------
 drivers/md/md-cluster.c  |  91 ++++---
 drivers/md/md.c          | 155 +++++++----
 drivers/md/md.h          |   3 +-
 drivers/md/raid1-10.c    |   9 +-
 drivers/md/raid1.c       |  78 +++---
 drivers/md/raid10.c      |  73 ++---
 drivers/md/raid5-cache.c |   8 +-
 drivers/md/raid5.c       |  62 ++---
 11 files changed, 760 insertions(+), 562 deletions(-)

-- 
2.39.2
Re: [PATCH md-6.12 v2 00/42] md/md-bitmap: introduce bitmap_operations and make structure internal
Posted by Song Liu 1 year, 3 months ago
On Mon, Aug 26, 2024 at 12:50 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
[...]
>
> And with this we can build bitmap as kernel module, but that's not
> our concern for now.
>
> This version was tested with mdadm tests. There are still few failed
> tests in my VM, however, it's the test itself need to be fixed and
> we're working on it.

Do we have new test failures after this set? If so, which ones?

Thanks,
Song

> Yu Kuai (42):
>   md/raid1: use md_bitmap_wait_behind_writes() in raid1_read_request()
>   md/md-bitmap: replace md_bitmap_status() with a new helper
>     md_bitmap_get_stats()

[...]
Re: [PATCH md-6.12 v2 00/42] md/md-bitmap: introduce bitmap_operations and make structure internal
Posted by Yu Kuai 1 year, 3 months ago
Hi,

On 2024/08/28 4:32, Song Liu wrote:
> On Mon, Aug 26, 2024 at 12:50 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
> [...]
>>
>> And with this we can build bitmap as kernel module, but that's not
>> our concern for now.
>>
>> This version was tested with mdadm tests. There are still few failed
>> tests in my VM, however, it's the test itself need to be fixed and
>> we're working on it.
> 
> Do we have new test failures after this set? If so, which ones?

No, there are new failures.

Thanks,
Kuai

> 
> Thanks,
> Song
> 
>> Yu Kuai (42):
>>    md/raid1: use md_bitmap_wait_behind_writes() in raid1_read_request()
>>    md/md-bitmap: replace md_bitmap_status() with a new helper
>>      md_bitmap_get_stats()
> 
> [...]
> .
> 

Re: [PATCH md-6.12 v2 00/42] md/md-bitmap: introduce bitmap_operations and make structure internal
Posted by Song Liu 1 year, 3 months ago
On Tue, Aug 27, 2024 at 6:15 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2024/08/28 4:32, Song Liu wrote:
> > On Mon, Aug 26, 2024 at 12:50 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>
> > [...]
> >>
> >> And with this we can build bitmap as kernel module, but that's not
> >> our concern for now.
> >>
> >> This version was tested with mdadm tests. There are still few failed
> >> tests in my VM, however, it's the test itself need to be fixed and
> >> we're working on it.
> >
> > Do we have new test failures after this set? If so, which ones?
>
> No, there are new failures.

I assume you meant "there are _no_ new failures".

Applied the set to md-6.12 branch.

Thanks,
Song
Re: [PATCH md-6.12 v2 00/42] md/md-bitmap: introduce bitmap_operations and make structure internal
Posted by Yu Kuai 1 year, 3 months ago
Hi,

On 2024/08/29 7:50, Song Liu wrote:
> On Tue, Aug 27, 2024 at 6:15 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> On 2024/08/28 4:32, Song Liu wrote:
>>> On Mon, Aug 26, 2024 at 12:50 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>>
>>> [...]
>>>>
>>>> And with this we can build bitmap as kernel module, but that's not
>>>> our concern for now.
>>>>
>>>> This version was tested with mdadm tests. There are still few failed
>>>> tests in my VM, however, it's the test itself need to be fixed and
>>>> we're working on it.
>>>
>>> Do we have new test failures after this set? If so, which ones?
>>
>> No, there are new failures.
> 
> I assume you meant "there are _no_ new failures".

Yes.
> 
> Applied the set to md-6.12 branch.

Thanks!
Kuai
> 
> Thanks,
> Song
> .
> 

Re: [PATCH md-6.12 v2 00/42] md/md-bitmap: introduce bitmap_operations and make structure internal
Posted by Xiao Ni 1 year, 3 months ago
On Wed, Aug 28, 2024 at 9:15 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2024/08/28 4:32, Song Liu wrote:
> > On Mon, Aug 26, 2024 at 12:50 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>
> > [...]
> >>
> >> And with this we can build bitmap as kernel module, but that's not
> >> our concern for now.
> >>
> >> This version was tested with mdadm tests. There are still few failed
> >> tests in my VM, however, it's the test itself need to be fixed and
> >> we're working on it.
> >
> > Do we have new test failures after this set? If so, which ones?
>
> No, there are new failures.

Hi all

I suggest running the lvm2 regression tests too. I can't run them myself
right now because I don't have a stable network connection.

Best Regards
Xiao
>
> Thanks,
> Kuai
>
> >
> > Thanks,
> > Song
> >
> >> Yu Kuai (42):
> >>    md/raid1: use md_bitmap_wait_behind_writes() in raid1_read_request()
> >>    md/md-bitmap: replace md_bitmap_status() with a new helper
> >>      md_bitmap_get_stats()
> >
> > [...]
> > .
> >
>
Re: [PATCH md-6.12 v2 00/42] md/md-bitmap: introduce bitmap_operations and make structure internal
Posted by Yu Kuai 1 year, 3 months ago
Hi,

On 2024/08/28 9:31, Xiao Ni wrote:
> On Wed, Aug 28, 2024 at 9:15 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> On 2024/08/28 4:32, Song Liu wrote:
>>> On Mon, Aug 26, 2024 at 12:50 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>>
>>> [...]
>>>>
>>>> And with this we can build bitmap as kernel module, but that's not
>>>> our concern for now.
>>>>
>>>> This version was tested with mdadm tests. There are still few failed
>>>> tests in my VM, however, it's the test itself need to be fixed and
>>>> we're working on it.
>>>
>>> Do we have new test failures after this set? If so, which ones?
>>
>> No, there are new failures.
> 
> Hi all
> 
> I suggest running lvm2 regression tests too. I can't run it myself now
> because I can't get a stable network connection all the time.
> 

Today I ran one round of the lvm2 tests with the following script in my VM:
for t in `ls test/shell`; do
         if cat test/shell/$t | grep raid &> /dev/null; then
                 make check T=shell/$t
         fi
done

And the failed tests are:
[root@fedora lvm2]# cat log1 | grep "###   " | grep failed
###       failed: [ndev-vanilla] shell/lvconvert-raid-reshape-size.sh
###       failed: [ndev-vanilla] shell/lvconvert-repair-raid.sh

Then I reverted this set and ran the above tests again; they still fail.
So I didn't dig into these tests. :)

Thanks,
Kuai

> Best Regards
> Xiao
>>
>> Thanks,
>> Kuai
>>
>>>
>>> Thanks,
>>> Song
>>>
>>>> Yu Kuai (42):
>>>>     md/raid1: use md_bitmap_wait_behind_writes() in raid1_read_request()
>>>>     md/md-bitmap: replace md_bitmap_status() with a new helper
>>>>       md_bitmap_get_stats()
>>>
>>> [...]
>>> .
>>>
>>
> 
> 
> .
>