From: Zheng Qixing <zhengqixing@huawei.com>

Hi,

This is v2 of the series.

# Mechanism

When rectifying badblocks, we issue a single repair write for the bad
range (copying data from a good mirror to the corresponding LBA on the
bad mirror). Once the write completes successfully (bi_status == 0),
the LBA range is cleared from the badblocks table. If the media is
still bad for that LBA, a subsequent read/write will fail again and
the range will be marked bad again. Doing a read-back for every repair
would only prove that the data is readable at that moment; it does not
provide a stronger guarantee against future internal remapping.

# Why use LBS granularity for bad-block repair?

In our RAID1 bad-block repair (rectify) testing on a device reporting
512B logical blocks and 4KiB physical blocks, we issue 512B I/O
directly to the md device and inject an I/O fault. Since the md
badblocks table can only track failures in terms of host-visible LBA
ranges, it is updated at 512B sector granularity (i.e., it records the
failing sector) and does not attempt to infer or expand the entry to a
4KiB physical-block boundary.

Given that the OS has no visibility into the device's internal mapping
from LBAs to physical media (or the FTL), using the logical block size
for recording and repairing bad blocks is the most appropriate choice
from a correctness standpoint. If the underlying media failure is
actually larger than 512B, this is typically reflected by subsequent
failures on adjacent LBAs, at which point the recorded bad range will
naturally grow to cover the affected area.

# Tests

This feature has been tested on a RAID1 built from two 480GB system
disks. It has also been tested under QEMU with a 4-disk RAID1 setup,
with both memory fault injection and I/O fault injection enabled.

In addition, we will add a new test (26raid1-rectify-badblocks) in
mdadm/tests to verify whether `rectify` can effectively repair sectors
recorded in bad_blocks.

# TODO

rectify currently only supports bad-block repair for the RAID1 level.
We will consider extending it to RAID5/10 in follow-up work.

Changes in v2:
- Patch 1: Remove non-essential helpers to reduce indirection.
- Patch 2: Split out a bugfix that was previously included in patch 1.
- Patch 3: Rename the /proc/mdstat action from "recovery" to "recover"
  to match the naming used by action_store() and action_show().
- Patch 4: Add a brief comment for MAX_RAID_DISKS.
- Patch 5: For rectify, reuse handle_sync_write_finished() to handle
  write request completion, removing duplicate completion handling.

Link of v1: https://lore.kernel.org/all/20251231070952.1233903-1-zhengqixing@huaweicloud.com/

Zheng Qixing (5):
  md: add helpers for requested sync action
  md: serialize requested sync actions and clear stale request state
  md: rename mdstat action "recovery" to "recover"
  md: introduce MAX_RAID_DISKS macro to replace magic number
  md/raid1: introduce rectify action to repair badblocks

 drivers/md/md.c    | 152 +++++++++++++++++++------
 drivers/md/md.h    |  21 ++++
 drivers/md/raid1.c | 270 ++++++++++++++++++++++++++++++++++++++++++++-
 drivers/md/raid1.h |   1 +
 4 files changed, 409 insertions(+), 35 deletions(-)

-- 
2.39.2
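[Editor's note: to make the mechanism above concrete, here is a minimal
sketch of the per-range repair step, written against the existing md
helpers sync_page_io() and rdev_clear_badblocks() as found in current
mainline. It is an illustration only, not the patch's implementation:
the series reuses the asynchronous sync-write completion path
(handle_sync_write_finished()) rather than a synchronous loop, and the
function name rectify_one_range() is hypothetical.]

/*
 * Illustrative sketch only: repair one bad range on @bad_rdev by
 * copying data from a healthy mirror @good_rdev, one page at a time,
 * and clear the range from the badblocks table only after the rewrite
 * has completed successfully.  @sector/@sectors are array-relative;
 * the helpers apply rdev->data_offset internally.  Helper names and
 * signatures follow existing drivers/md code; the real rectify path
 * is asynchronous and reuses handle_sync_write_finished().
 */
static int rectify_one_range(struct md_rdev *good_rdev,
			     struct md_rdev *bad_rdev,
			     sector_t sector, int sectors)
{
	struct page *page = alloc_page(GFP_KERNEL);
	int ret = 0;

	if (!page)
		return -ENOMEM;

	while (sectors) {
		/* repair in chunks of at most one page */
		int s = min_t(int, sectors, PAGE_SIZE >> 9);

		/* read good data from the healthy mirror */
		if (!sync_page_io(good_rdev, sector, s << 9, page,
				  REQ_OP_READ, false)) {
			ret = -EIO;
			break;
		}
		/* rewrite the same LBA range on the bad mirror */
		if (!sync_page_io(bad_rdev, sector, s << 9, page,
				  REQ_OP_WRITE, false)) {
			ret = -EIO;
			break;
		}
		/* write completed successfully: drop the range */
		rdev_clear_badblocks(bad_rdev, sector, s, 0);

		sector += s;
		sectors -= s;
	}

	put_page(page);
	return ret;
}

[The property described in the cover letter is preserved: the
badblocks entry is cleared only on a successful write completion, and
no read-back verification is attempted.]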
Just curious, but what kind of devices do you see that have permanent bad blocks at a fixed location that are not fixed by rewriting the sector?
On 03/02/2026 at 08:31, Christoph Hellwig wrote:
> Just curious, but what kind of devices do you see that have
> permanent bad blocks at a fixed location that are not fixed by
> rewriting the sector?

I have seen this with several hard disk drives of various brands, even
though SMART attribute #5 (reallocated sector count) had not reached
the limit.
On Tue, Feb 03, 2026 at 09:08:18AM +0100, Pascal Hambourg wrote:
> On 03/02/2026 at 08:31, Christoph Hellwig wrote:
> > Just curious, but what kind of devices do you see that have
> > permanent bad blocks at a fixed location that are not fixed by
> > rewriting the sector?
>
> I have seen this with several hard disk drives of various brands, even
> though SMART attribute #5 (reallocated sector count) had not reached
> the limit.

Weird. Can you share the models? I'm especially curious if these are
consumer or enterprise drives and of what vintage.
On 03/02/2026 at 17:30, Christoph Hellwig wrote:
> On Tue, Feb 03, 2026 at 09:08:18AM +0100, Pascal Hambourg wrote:
>> On 03/02/2026 at 08:31, Christoph Hellwig wrote:
>>> Just curious, but what kind of devices do you see that have
>>> permanent bad blocks at a fixed location that are not fixed by
>>> rewriting the sector?
>>
>> I have seen this with several hard disk drives of various brands, even
>> though SMART attribute #5 (reallocated sector count) had not reached
>> the limit.
>
> Weird. Can you share the models? I'm especially curious if these
> are consumer or enterprise drives and of what vintage.

I did not keep track of the models and do not remember them; it was a
long time ago. They were mostly hard disk drives from Dell and HP
professional desktop and laptop series, so consumer grade I guess,
manufactured around 2010.
On Tue, Feb 03, 2026 at 09:36:38PM +0100, Pascal Hambourg wrote:
> I did not keep track of the models and do not remember them; it was a
> long time ago. They were mostly hard disk drives from Dell and HP
> professional desktop and laptop series, so consumer grade I guess,
> manufactured around 2010.

Ok, for 15ish-year-old consumer devices I would not be very surprised.
Hi,

On 2026/2/3 15:31, Christoph Hellwig wrote:
> Just curious, but what kind of devices do you see that have
> permanent bad blocks at a fixed location that are not fixed by
> rewriting the sector?

The bad_blocks entries record sectors where I/O failed, which indicates
that the device-internal remapping did not succeed at that time.

`rectify` does not assume a permanently bad or fixed LBA. Its purpose
is to trigger an additional rewrite, giving the underlying device
(e.g. FTL or firmware) another opportunity to perform its own
remapping.
On Tue, Feb 03, 2026 at 04:08:23PM +0800, Zheng Qixing wrote:
> Hi,
>
> On 2026/2/3 15:31, Christoph Hellwig wrote:
> > Just curious, but what kind of devices do you see that have
> > permanent bad blocks at a fixed location that are not fixed by
> > rewriting the sector?
>
> The bad_blocks entries record sectors where I/O failed, which
> indicates that the device-internal remapping did not succeed
> at that time.
>
> `rectify` does not assume a permanently bad or fixed LBA. Its
> purpose is to trigger an additional rewrite, giving the underlying
> device (e.g. FTL or firmware) another opportunity to perform its
> own remapping.

Well, what devices do you see where writes fail, but rewrites fix
them?
On 2026/2/4 0:31, Christoph Hellwig wrote:
> On Tue, Feb 03, 2026 at 04:08:23PM +0800, Zheng Qixing wrote:
>> Hi,
>>
>> On 2026/2/3 15:31, Christoph Hellwig wrote:
>>> Just curious, but what kind of devices do you see that have
>>> permanent bad blocks at a fixed location that are not fixed by
>>> rewriting the sector?
>>
>> The bad_blocks entries record sectors where I/O failed, which
>> indicates that the device-internal remapping did not succeed
>> at that time.
>>
>> `rectify` does not assume a permanently bad or fixed LBA. Its
>> purpose is to trigger an additional rewrite, giving the underlying
>> device (e.g. FTL or firmware) another opportunity to perform its
>> own remapping.
>
> Well, what devices do you see where writes fail, but rewrites
> fix them?

I understand your concerns, but I do not have a concrete example tied
to a specific device model... The intent here is to provide an
additional rewrite opportunity, which allows the write path to be
exercised again and gives the underlying device or stack a chance to
recover or remap the affected range.

For remote storage devices, I/O may fail due to network or transport
issues. If the final attempt fails, MD can record the affected range
in bad_blocks. This behavior is not tied to a specific device model.

For local storage, some controllers may have limitations or corner
cases in their remapping mechanisms. In such cases, a sector that
could potentially be recovered may be marked as bad, leaving no
opportunity for a subsequent successful rewrite.