From: Zheng Qixing <zhengqixing@huawei.com>

Hi,

This is v2 of the series.

# Mechanism

When rectifying badblocks, we issue a single repair write for the bad
range (copying data from a good mirror to the corresponding LBA on the
bad mirror). Once the write completes successfully (bi_status == 0),
the LBA range is cleared from the badblocks table. If the media is
still bad for that LBA, a subsequent read/write will fail again and
the range will be marked bad again. Doing a read-back for every repair
would only prove that the data is readable at that moment; it does not
provide a stronger guarantee against future internal remapping.

# Why use LBS granularity for bad-block repair?

In our RAID1 bad-block repair (rectify) testing on a device reporting
512B logical blocks and 4KiB physical blocks, we issue 512B I/O
directly to the md device and inject an I/O fault. Since the md
badblocks table can only track failures in terms of host-visible LBA
ranges, it is updated at 512B sector granularity (i.e., it records the
failing sector) and does not attempt to infer or expand the entry to a
4KiB physical-block boundary.

Given that the OS has no visibility into the device's internal mapping
from LBAs to physical media (or the FTL), using the logical block size
for recording and repairing bad blocks is the most appropriate choice
from a correctness standpoint. If the underlying media failure is
actually larger than 512B, this is typically reflected by subsequent
failures on adjacent LBAs, at which point the recorded bad range will
naturally grow to cover the affected area.

# Tests

This feature has been tested on a RAID1 built from two 480GB system
disks. It has also been tested under QEMU with a 4-disk RAID1 setup,
with both memory fault injection and I/O fault injection enabled.

In addition, we will add a new test (26raid1-rectify-badblocks) in
mdadm/tests to verify whether `rectify` can effectively repair sectors
recorded in bad_blocks.

# TODO

rectify currently only supports bad-block repair for the RAID1 level.
We will consider extending it to RAID5/10 in follow-up work.

Changes in v2:
- Patch 1: Remove non-essential helpers to reduce indirection.
- Patch 2: Split out a bugfix that was previously included in patch 1.
- Patch 3: Rename the /proc/mdstat action from "recovery" to "recover"
  to match the naming used by action_store() and action_show().
- Patch 4: Add a brief comment for MAX_RAID_DISKS.
- Patch 5: For rectify, reuse handle_sync_write_finished() to handle
  write request completion, removing duplicate completion handling.

Link of v1: https://lore.kernel.org/all/20251231070952.1233903-1-zhengqixing@huaweicloud.com/

Zheng Qixing (5):
  md: add helpers for requested sync action
  md: serialize requested sync actions and clear stale request state
  md: rename mdstat action "recovery" to "recover"
  md: introduce MAX_RAID_DISKS macro to replace magic number
  md/raid1: introduce rectify action to repair badblocks

 drivers/md/md.c    | 152 +++++++++++++++++++------
 drivers/md/md.h    |  21 ++++
 drivers/md/raid1.c | 270 ++++++++++++++++++++++++++++++++++++++++++++-
 drivers/md/raid1.h |   1 +
 4 files changed, 409 insertions(+), 35 deletions(-)

-- 
2.39.2
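[Editor's note: to make the mechanism above concrete, here is a minimal
sketch of the per-range repair step, written against the existing md
helpers sync_page_io() and rdev_clear_badblocks() as found in current
mainline. It is an illustration only, not the patch's implementation:
the series reuses the asynchronous sync-write completion path
(handle_sync_write_finished()) rather than a synchronous loop, and the
function name rectify_one_range() is hypothetical.]

/*
 * Illustrative sketch only: repair one bad range on @bad_rdev by
 * copying data from a healthy mirror @good_rdev, one page at a time,
 * and clear the range from the badblocks table only after the rewrite
 * has completed successfully.  @sector/@sectors are array-relative;
 * the helpers apply rdev->data_offset internally.  Helper names and
 * signatures follow existing drivers/md code; the real rectify path
 * is asynchronous and reuses handle_sync_write_finished().
 */
static int rectify_one_range(struct md_rdev *good_rdev,
			     struct md_rdev *bad_rdev,
			     sector_t sector, int sectors)
{
	struct page *page = alloc_page(GFP_KERNEL);
	int ret = 0;

	if (!page)
		return -ENOMEM;

	while (sectors) {
		/* repair in chunks of at most one page */
		int s = min_t(int, sectors, PAGE_SIZE >> 9);

		/* read good data from the healthy mirror */
		if (!sync_page_io(good_rdev, sector, s << 9, page,
				  REQ_OP_READ, false)) {
			ret = -EIO;
			break;
		}
		/* rewrite the same LBA range on the bad mirror */
		if (!sync_page_io(bad_rdev, sector, s << 9, page,
				  REQ_OP_WRITE, false)) {
			ret = -EIO;
			break;
		}
		/* write completed successfully: drop the range */
		rdev_clear_badblocks(bad_rdev, sector, s, 0);

		sector += s;
		sectors -= s;
	}

	put_page(page);
	return ret;
}

[The property described in the cover letter is preserved: the
badblocks entry is cleared only on a successful write completion, and
no read-back verification is attempted.]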
Just curious, but what kind of devices do you see that have permanent bad blocks at a fixed location that are not fixed by rewriting the sector?
On 03/02/2026 at 08:31, Christoph Hellwig wrote:
> Just curious, but what kind of devices do you see that have
> permanent bad blocks at a fixed location that are not fixed by
> rewriting the sector?

I have seen this with several hard disk drives of various brands, even
though SMART attribute #5 (reallocated sector count) had not reached
the limit.
On Tue, Feb 03, 2026 at 09:08:18AM +0100, Pascal Hambourg wrote:
> On 03/02/2026 at 08:31, Christoph Hellwig wrote:
> > Just curious, but what kind of devices do you see that have
> > permanent bad blocks at a fixed location that are not fixed by
> > rewriting the sector?
>
> I have seen this with several hard disk drives of various brands, even
> though SMART attribute #5 (reallocated sector count) had not reached
> the limit.

Weird. Can you share the models? I'm especially curious if these are
consumer or enterprise drives and of what vintage.
On 03/02/2026 at 17:30, Christoph Hellwig wrote:
> On Tue, Feb 03, 2026 at 09:08:18AM +0100, Pascal Hambourg wrote:
>> On 03/02/2026 at 08:31, Christoph Hellwig wrote:
>>> Just curious, but what kind of devices do you see that have
>>> permanent bad blocks at a fixed location that are not fixed by
>>> rewriting the sector?
>>
>> I have seen this with several hard disk drives of various brands, even
>> though SMART attribute #5 (reallocated sector count) had not reached
>> the limit.
>
> Weird. Can you share the models? I'm especially curious if these
> are consumer or enterprise drives and of what vintage.

I did not keep track of the models and do not remember them; it was a
long time ago. They were mostly hard disk drives from Dell and HP
professional desktop and laptop series, so consumer grade I guess,
manufactured around 2010.
On Tue, Feb 03, 2026 at 09:36:38PM +0100, Pascal Hambourg wrote:
> I did not keep track of the models and do not remember them; it was a
> long time ago. They were mostly hard disk drives from Dell and HP
> professional desktop and laptop series, so consumer grade I guess,
> manufactured around 2010.

Ok, for 15ish-year-old consumer devices I would not be very surprised.
Hi,

On 2026/2/3 15:31, Christoph Hellwig wrote:
> Just curious, but what kind of devices do you see that have
> permanent bad blocks at a fixed location that are not fixed by
> rewriting the sector?

The bad_blocks entries record sectors where I/O failed, which indicates
that the device-internal remapping did not succeed at that time.

`rectify` does not assume a permanently bad or fixed LBA. Its purpose
is to trigger an additional rewrite, giving the underlying device
(e.g. FTL or firmware) another opportunity to perform its own
remapping.
On Tue, Feb 03, 2026 at 04:08:23PM +0800, Zheng Qixing wrote:
> Hi,
>
> On 2026/2/3 15:31, Christoph Hellwig wrote:
> > Just curious, but what kind of devices do you see that have
> > permanent bad blocks at a fixed location that are not fixed by
> > rewriting the sector?
>
> The bad_blocks entries record sectors where I/O failed, which
> indicates that the device-internal remapping did not succeed
> at that time.
>
> `rectify` does not assume a permanently bad or fixed LBA. Its
> purpose is to trigger an additional rewrite, giving the underlying
> device (e.g. FTL or firmware) another opportunity to perform its
> own remapping.

Well, what devices do you see where writes fail, but rewrites fix
them?
On 2026/2/4 0:31, Christoph Hellwig wrote:
> On Tue, Feb 03, 2026 at 04:08:23PM +0800, Zheng Qixing wrote:
>> Hi,
>>
>> On 2026/2/3 15:31, Christoph Hellwig wrote:
>>> Just curious, but what kind of devices do you see that have
>>> permanent bad blocks at a fixed location that are not fixed by
>>> rewriting the sector?
>>
>> The bad_blocks entries record sectors where I/O failed, which
>> indicates that the device-internal remapping did not succeed
>> at that time.
>>
>> `rectify` does not assume a permanently bad or fixed LBA. Its
>> purpose is to trigger an additional rewrite, giving the underlying
>> device (e.g. FTL or firmware) another opportunity to perform its
>> own remapping.
>
> Well, what devices do you see where writes fail, but rewrites
> fix them?

I understand your concerns, but I do not have a concrete example tied
to a specific device model... The intent here is to provide an
additional rewrite opportunity, which allows the write path to be
exercised again and gives the underlying device or stack a chance to
recover or remap the affected range.

For remote storage devices, I/O may fail due to network or transport
issues. If the final attempt fails, MD can record the affected range
in bad_blocks. This behavior is not tied to a specific device model.

For local storage, some controllers may have limitations or corner
cases in their remapping mechanisms. In such cases, a sector that
could potentially be recovered may be marked as bad, leaving no
opportunity for a subsequent successful rewrite.