From: Zheng Qixing <zhengqixing@huawei.com>
In RAID1, some sectors may be marked as bad blocks due to I/O errors.
In certain scenarios, these bad blocks might not be permanent, and
issuing I/Os again could succeed.
To address this situation, a new sync action ('rectify') is introduced
into RAID1, allowing users to actively trigger the repair of existing
bad blocks and clear them from the sysfs bad_blocks list.
When 'rectify' is echoed into /sys/block/md*/md/sync_action, a healthy
disk is selected from the array to read the data, which is then written
to the disk where the bad block is located. If the write request
succeeds, the bad block record is cleared.
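
Roughly, the repair path could look like the sketch below. This is a
minimal illustration assuming the existing md/raid1 structures and
helpers (sync_page_io(), rdev_clear_badblocks()); rectify_one_badblock()
is a hypothetical name, not the function added by the patch, and
locking/RCU around rdev access is omitted for brevity:

/*
 * Hypothetical sketch of the rectify flow: read the bad range from a
 * healthy mirror, rewrite it on the affected device, then clear the
 * bad block record on success.
 */
static int rectify_one_badblock(struct r1conf *conf, struct md_rdev *bad_rdev,
				sector_t sector, int sectors, struct page *page)
{
	struct md_rdev *rdev;
	int i;

	/* Read the data from any working mirror other than the bad one. */
	for (i = 0; i < conf->raid_disks; i++) {
		rdev = conf->mirrors[i].rdev;
		if (!rdev || rdev == bad_rdev || test_bit(Faulty, &rdev->flags))
			continue;
		if (sync_page_io(rdev, sector, sectors << 9, page,
				 REQ_OP_READ, false))
			break;
	}
	if (i == conf->raid_disks)
		return -EIO;		/* no mirror could supply the data */

	/* Rewrite the range on the device holding the bad block. */
	if (!sync_page_io(bad_rdev, sector, sectors << 9, page,
			  REQ_OP_WRITE, false))
		return -EIO;

	/* Write succeeded: drop the stale bad block record. */
	rdev_clear_badblocks(bad_rdev, sector, sectors, 0);
	return 0;
}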
Note:
This patchset depends on [1] from Li Nan, which is currently under
review and not yet merged into md-6.19.
[1] [PATCH v3 00/13] cleanup and bugfix of sync
Link: https://lore.kernel.org/all/20251215030444.1318434-1-linan666@huaweicloud.com/
Zheng Qixing (5):
md: add helpers for requested sync action
md: clear stale sync flags when frozen before sync starts
md: simplify sync action print in status_resync
md: introduce MAX_RAID_DISKS macro to replace magic number
md/raid1: introduce rectify action to repair badblocks
drivers/md/md.c | 184 ++++++++++++++++++++++-----
drivers/md/md.h | 17 +++
drivers/md/raid1.c | 308 ++++++++++++++++++++++++++++++++++++++++++++-
drivers/md/raid1.h | 1 +
4 files changed, 472 insertions(+), 38 deletions(-)
--
2.39.2
On Wed, 31 Dec 2025 15:09:47 +0800
Zheng Qixing <zhengqixing@huaweicloud.com> wrote:
> From: Zheng Qixing <zhengqixing@huawei.com>
>
> In RAID1, some sectors may be marked as bad blocks due to I/O errors.
> In certain scenarios, these bad blocks might not be permanent, and
> issuing I/Os again could succeed.
>
> To address this situation, a new sync action ('rectify') is introduced
> into RAID1, allowing users to actively trigger the repair of existing
> bad blocks and clear them from the sysfs bad_blocks list.
>
> When 'rectify' is echoed into /sys/block/md*/md/sync_action, a healthy
> disk is selected from the array to read the data, which is then written
> to the disk where the bad block is located. If the write request
> succeeds, the bad block record is cleared.
Could you also check here that it reads back successfully, and only then clear?
Otherwise there are cases when the block won't read even after rewriting it.
Side note, on some hardware it might be necessary to rewrite a larger area
around the problematic block, to finally trigger a remap. Not 512B, but at
least the native sector size, which is often 4K.
--
With respect,
Roman
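
A minimal sketch of the read-back idea above, assuming the same
hypothetical helpers as in the cover letter sketch (sync_page_io()
returns non-zero on success; the verification read is issued before the
record is cleared):

/*
 * Illustrative only: after rewriting the range, re-read it and clear
 * the bad block record only if the read succeeds.
 */
static int rectify_verify_and_clear(struct md_rdev *rdev, sector_t sector,
				    int sectors, struct page *page)
{
	if (!sync_page_io(rdev, sector, sectors << 9, page,
			  REQ_OP_READ, false))
		return -EIO;	/* still unreadable: keep the record */

	rdev_clear_badblocks(rdev, sector, sectors, 0);
	return 0;
}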
Hi,
On 2025/12/31 19:11, Roman Mamedov wrote:
> On Wed, 31 Dec 2025 15:09:47 +0800
> Zheng Qixing <zhengqixing@huaweicloud.com> wrote:
>
>> From: Zheng Qixing <zhengqixing@huawei.com>
>>
>> In RAID1, some sectors may be marked as bad blocks due to I/O errors.
>> In certain scenarios, these bad blocks might not be permanent, and
>> issuing I/Os again could succeed.
>>
>> To address this situation, a new sync action ('rectify') is introduced
>> into RAID1, allowing users to actively trigger the repair of existing
>> bad blocks and clear them from the sysfs bad_blocks list.
>>
>> When 'rectify' is echoed into /sys/block/md*/md/sync_action, a healthy
>> disk is selected from the array to read the data, which is then written
>> to the disk where the bad block is located. If the write request
>> succeeds, the bad block record is cleared.
> Could you also check here that it reads back successfully, and only then clear?
>
> Otherwise there are cases when the block won't read even after rewriting it.
Thanks for your suggestions.

I'm a bit worried that reading the data again before clearing the bad
blocks might affect the performance of the bad block repair process.
> Side note, on some hardware it might be necessary to rewrite a larger area
> around the problematic block, to finally trigger a remap. Not 512B, but at
> least the native sector size, which is often 4K.
Are you referring to the case where we have logical 512B sectors but
physical 4K sectors?
I'm not entirely clear on one aspect:
Can a physical 4K block have partial recovery (e.g., one 512B sector
succeeds while the other 7 fail)?
Thanks,
Qixing
On 06/01/2026 at 03:44, Zheng Qixing wrote:
> On 2025/12/31 19:11, Roman Mamedov wrote:
>> On Wed, 31 Dec 2025 15:09:47 +0800
>>
>> Could you also check here that it reads back successfully, and only
>> then clear?
>>
>> Otherwise there are cases when the block won't read even after
>> rewriting it.

I confirm. The rewrite is reported successful but SMART reallocation
attributes did not change and a further read still fails.

> I'm a bit worried that reading the data again before clearing the bad
> blocks might affect the performance of the bad block repair process.

Isn't it more worrying to clear bad blocks while they may still be bad?
Bad blocks should be rare anyway, so performance impact should be low.

>> Side note, on some hardware it might be necessary to rewrite a larger
>> area around the problematic block, to finally trigger a remap. Not
>> 512B, but at least the native sector size, which is often 4K.
>
> Are you referring to the case where we have logical 512B sectors but
> physical 4K sectors?

Yes. Writing a single logical sector implies a read-modify-write of the
whole underlying physical sector and will not complete if the read
fails.

> Can a physical 4K block have partial recovery (e.g., one 512B sector
> succeeds while the other 7 fail)?

Not in my experience. There seems to be a single ECC for the whole
physical sector.
On 2026/1/6 23:36, Pascal Hambourg wrote:
> On 06/01/2026 at 03:44, Zheng Qixing wrote:
>> On 2025/12/31 19:11, Roman Mamedov wrote:
>>> On Wed, 31 Dec 2025 15:09:47 +0800
>>>
>>> Could you also check here that it reads back successfully, and only
>>> then clear?
>>>
>>> Otherwise there are cases when the block won't read even after
>>> rewriting it.
>
> I confirm. The rewrite is reported successful but SMART reallocation
> attributes did not change and a further read still fails.
>
>> I'm a bit worried that reading the data again before clearing the bad
>> blocks might affect the performance of the bad block repair process.
>
> Isn't it more worrying to clear bad blocks while they may still be bad?
> Bad blocks should be rare anyway, so performance impact should be low.
>
>>> Side note, on some hardware it might be necessary to rewrite a larger
>>> area around the problematic block, to finally trigger a remap. Not
>>> 512B, but at least the native sector size, which is often 4K.
>>
>> Are you referring to the case where we have logical 512B sectors but
>> physical 4K sectors?
>
> Yes. Writing a single logical sector implies a read-modify-write of
> the whole underlying physical sector and will not complete if the read
> fails.

That makes sense. I will change it in the next version.

>> Can a physical 4K block have partial recovery (e.g., one 512B sector
>> succeeds while the other 7 fail)?
>
> Not in my experience. There seems to be a single ECC for the whole
> physical sector.

I will try to test with disks that have lbs=512 and pbs=4096. If 512B
IOs can be successfully issued, then the bad block repair logic does
need to consider the minimum repair length and alignment logic.

Thanks,
Qixing
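
For reference, the alignment mentioned above could be done along these
lines. This is an illustrative sketch, not part of the posted series;
rectify_align_range() is a hypothetical helper that simply expands a
512B-sector range to cover whole physical sectors of the underlying
device before the rewrite is issued:

/*
 * Expand a bad-block range, given in 512B sectors, so that it covers
 * whole physical sectors of the underlying device.
 */
static void rectify_align_range(struct md_rdev *rdev, sector_t *sector,
				int *sectors)
{
	unsigned int pbs = bdev_physical_block_size(rdev->bdev); /* bytes */
	sector_t mask = (pbs >> 9) - 1;
	sector_t start = *sector & ~mask;
	sector_t end = (*sector + *sectors + mask) & ~mask;

	*sector = start;
	*sectors = end - start;
}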
On Tue, 6 Jan 2026 10:44:38 +0800
Zheng Qixing <zhengqixing@huaweicloud.com> wrote:

> Are you referring to the case where we have logical 512B sectors but
> physical 4K sectors?

At least that, yes. Such rewriting of bad blocks should happen at least
at the physical sector granularity.

But from my limited experience, the badblock recovery algorithm in hard
drives, in addition to being opaque and proprietary, is also highly
indeterministic and possibly buggy. In one case it took REPEATEDLY
overwriting a full megabyte around a bad block to finally make the
drive remap it. (Maybe less than a megabyte would do, but overwriting
only 4K didn't.)

Of course I understand such endeavors are outside the scope of mdraid,
hence it was just a side note.

--
With respect,
Roman