From: Zheng Qixing <zhengqixing@huawei.com>
In RAID1, some sectors may be marked as bad blocks due to I/O errors.
In certain scenarios, these bad blocks might not be permanent, and
issuing I/Os again could succeed.
To address this situation, a new sync action ('rectify') is introduced
into RAID1, allowing users to actively trigger the repair of existing
bad blocks and clear them from the sysfs bad_blocks list.
When 'rectify' is echoed into /sys/block/md*/md/sync_action, a healthy
disk is selected from the array to read the data, which is then written
to the disk where the bad block is located. If the write request
succeeds, the bad block record is cleared.
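
Roughly, the repair path could look like the sketch below. This is a
minimal illustration assuming the existing md/raid1 structures and
helpers (sync_page_io(), rdev_clear_badblocks()); rectify_one_badblock()
is a hypothetical name, not the function added by the patch, and
locking/RCU around rdev access is omitted for brevity:

/*
 * Hypothetical sketch of the rectify flow: read the bad range from a
 * healthy mirror, rewrite it on the affected device, then clear the
 * bad block record on success.
 */
static int rectify_one_badblock(struct r1conf *conf, struct md_rdev *bad_rdev,
				sector_t sector, int sectors, struct page *page)
{
	struct md_rdev *rdev;
	int i;

	/* Read the data from any working mirror other than the bad one. */
	for (i = 0; i < conf->raid_disks; i++) {
		rdev = conf->mirrors[i].rdev;
		if (!rdev || rdev == bad_rdev || test_bit(Faulty, &rdev->flags))
			continue;
		if (sync_page_io(rdev, sector, sectors << 9, page,
				 REQ_OP_READ, false))
			break;
	}
	if (i == conf->raid_disks)
		return -EIO;		/* no mirror could supply the data */

	/* Rewrite the range on the device holding the bad block. */
	if (!sync_page_io(bad_rdev, sector, sectors << 9, page,
			  REQ_OP_WRITE, false))
		return -EIO;

	/* Write succeeded: drop the stale bad block record. */
	rdev_clear_badblocks(bad_rdev, sector, sectors, 0);
	return 0;
}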
Note:
This patchset depends on [1] from Li Nan, which is currently under
review and not yet merged into md-6.19.
[1] [PATCH v3 00/13] cleanup and bugfix of sync
Link: https://lore.kernel.org/all/20251215030444.1318434-1-linan666@huaweicloud.com/
Zheng Qixing (5):
md: add helpers for requested sync action
md: clear stale sync flags when frozen before sync starts
md: simplify sync action print in status_resync
md: introduce MAX_RAID_DISKS macro to replace magic number
md/raid1: introduce rectify action to repair badblocks
drivers/md/md.c | 184 ++++++++++++++++++++++-----
drivers/md/md.h | 17 +++
drivers/md/raid1.c | 308 ++++++++++++++++++++++++++++++++++++++++++++-
drivers/md/raid1.h | 1 +
4 files changed, 472 insertions(+), 38 deletions(-)
--
2.39.2
On Wed, 31 Dec 2025 15:09:47 +0800
Zheng Qixing <zhengqixing@huaweicloud.com> wrote:
> From: Zheng Qixing <zhengqixing@huawei.com>
>
> In RAID1, some sectors may be marked as bad blocks due to I/O errors.
> In certain scenarios, these bad blocks might not be permanent, and
> issuing I/Os again could succeed.
>
> To address this situation, a new sync action ('rectify') is introduced
> into RAID1, allowing users to actively trigger the repair of existing
> bad blocks and clear them from the sysfs bad_blocks list.
>
> When 'rectify' is echoed into /sys/block/md*/md/sync_action, a healthy
> disk is selected from the array to read the data, which is then written
> to the disk where the bad block is located. If the write request
> succeeds, the bad block record is cleared.
Could you also check here that it reads back successfully, and only then clear?
Otherwise there are cases when the block won't read even after rewriting it.
Side note, on some hardware it might be necessary to rewrite a larger area
around the problematic block, to finally trigger a remap. Not 512B, but at
least the native sector size, which is often 4K.
--
With respect,
Roman
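
A minimal sketch of the read-back idea above, assuming the same
hypothetical helpers as in the cover letter sketch (sync_page_io()
returns non-zero on success; the verification read is issued before the
record is cleared):

/*
 * Illustrative only: after rewriting the range, re-read it and clear
 * the bad block record only if the read succeeds.
 */
static int rectify_verify_and_clear(struct md_rdev *rdev, sector_t sector,
				    int sectors, struct page *page)
{
	if (!sync_page_io(rdev, sector, sectors << 9, page,
			  REQ_OP_READ, false))
		return -EIO;	/* still unreadable: keep the record */

	rdev_clear_badblocks(rdev, sector, sectors, 0);
	return 0;
}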
Hi,
On 2025/12/31 19:11, Roman Mamedov wrote:
> On Wed, 31 Dec 2025 15:09:47 +0800
> Zheng Qixing <zhengqixing@huaweicloud.com> wrote:
>
>> From: Zheng Qixing <zhengqixing@huawei.com>
>>
>> In RAID1, some sectors may be marked as bad blocks due to I/O errors.
>> In certain scenarios, these bad blocks might not be permanent, and
>> issuing I/Os again could succeed.
>>
>> To address this situation, a new sync action ('rectify') is introduced
>> into RAID1, allowing users to actively trigger the repair of existing
>> bad blocks and clear them from the sysfs bad_blocks list.
>>
>> When 'rectify' is echoed into /sys/block/md*/md/sync_action, a healthy
>> disk is selected from the array to read the data, which is then written
>> to the disk where the bad block is located. If the write request
>> succeeds, the bad block record is cleared.
> Could you also check here that it reads back successfully, and only then clear?
>
> Otherwise there are cases when the block won't read even after rewriting it.
Thanks for your suggestions.

I'm a bit worried that reading the data again before clearing the bad
blocks might affect the performance of the bad block repair process.
> Side note, on some hardware it might be necessary to rewrite a larger area
> around the problematic block, to finally trigger a remap. Not 512B, but at
> least the native sector size, which is often 4K.
Are you referring to the case where we have logical 512B sectors but
physical 4K sectors?
I'm not entirely clear on one aspect:
Can a physical 4K block have partial recovery (e.g., one 512B sector
succeeds while the other 7 fail)?
Thanks,
Qixing
On 06/01/2026 at 03:44, Zheng Qixing wrote:
> On 2025/12/31 19:11, Roman Mamedov wrote:
>> On Wed, 31 Dec 2025 15:09:47 +0800
>>
>> Could you also check here that it reads back successfully, and only
>> then clear?
>>
>> Otherwise there are cases when the block won't read even after
>> rewriting it.

I confirm. The rewrite is reported successful but SMART reallocation
attributes did not change and a further read still fails.

> I'm a bit worried that reading the data again before clearing the bad
> blocks might affect the performance of the bad block repair process.

Isn't it more worrying to clear bad blocks while they may still be bad?
Bad blocks should be rare anyway, so performance impact should be low.

>> Side note, on some hardware it might be necessary to rewrite a larger
>> area around the problematic block, to finally trigger a remap. Not
>> 512B, but at least the native sector size, which is often 4K.
>
> Are you referring to the case where we have logical 512B sectors but
> physical 4K sectors?

Yes. Writing a single logical sector implies a read-modify-write of the
whole underlying physical sector and will not complete if the read
fails.

> Can a physical 4K block have partial recovery (e.g., one 512B sector
> succeeds while the other 7 fail)?

Not in my experience. There seems to be a single ECC for the whole
physical sector.
On 2026/1/6 23:36, Pascal Hambourg wrote:
> On 06/01/2026 at 03:44, Zheng Qixing wrote:
>> On 2025/12/31 19:11, Roman Mamedov wrote:
>>> On Wed, 31 Dec 2025 15:09:47 +0800
>>>
>>> Could you also check here that it reads back successfully, and only
>>> then clear?
>>>
>>> Otherwise there are cases when the block won't read even after
>>> rewriting it.
>
> I confirm. The rewrite is reported successful but SMART reallocation
> attributes did not change and a further read still fails.
>
>> I'm a bit worried that reading the data again before clearing the bad
>> blocks might affect the performance of the bad block repair process.
>
> Isn't it more worrying to clear bad blocks while they may still be bad?
> Bad blocks should be rare anyway, so performance impact should be low.
>
>>> Side note, on some hardware it might be necessary to rewrite a larger
>>> area around the problematic block, to finally trigger a remap. Not
>>> 512B, but at least the native sector size, which is often 4K.
>>
>> Are you referring to the case where we have logical 512B sectors but
>> physical 4K sectors?
>
> Yes. Writing a single logical sector implies a read-modify-write of
> the whole underlying physical sector and will not complete if the read
> fails.

That makes sense. I will change it in the next version.

>> Can a physical 4K block have partial recovery (e.g., one 512B sector
>> succeeds while the other 7 fail)?
>
> Not in my experience. There seems to be a single ECC for the whole
> physical sector.

I will try to test with disks that have lbs=512 and pbs=4096. If 512B
IOs can be successfully issued, then the bad block repair logic does
need to consider the minimum repair length and alignment logic.

Thanks,
Qixing
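
For reference, the alignment mentioned above could be done along these
lines. This is an illustrative sketch, not part of the posted series;
rectify_align_range() is a hypothetical helper that simply expands a
512B-sector range to cover whole physical sectors of the underlying
device before the rewrite is issued:

/*
 * Expand a bad-block range, given in 512B sectors, so that it covers
 * whole physical sectors of the underlying device.
 */
static void rectify_align_range(struct md_rdev *rdev, sector_t *sector,
				int *sectors)
{
	unsigned int pbs = bdev_physical_block_size(rdev->bdev); /* bytes */
	sector_t mask = (pbs >> 9) - 1;
	sector_t start = *sector & ~mask;
	sector_t end = (*sector + *sectors + mask) & ~mask;

	*sector = start;
	*sectors = end - start;
}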
On Tue, 6 Jan 2026 10:44:38 +0800
Zheng Qixing <zhengqixing@huaweicloud.com> wrote:

> Are you referring to the case where we have logical 512B sectors but
> physical 4K sectors?

At least that, yes. Such rewriting of bad blocks should happen at least
at the physical sector granularity.

But from my limited experience, the badblock recovery algorithm in hard
drives, in addition to being opaque and proprietary, is also highly
indeterministic and possibly buggy. In one case it took REPEATEDLY
overwriting a full megabyte around a bad block to finally make the
drive remap it. (Maybe less than a megabyte would do, but overwriting
only 4K didn't.)

Of course I understand such endeavors are outside the scope of mdraid,
hence it was just a side note.

--
With respect,
Roman