From nobody Sun Feb 8 09:12:17 2026
From: Kenta Akagi
To: Song Liu, Yu Kuai, Shaohua Li, Mariusz Tkaczyk, Xiao Ni
Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, Kenta Akagi
Subject: [PATCH v6 1/2] md: Don't set MD_BROKEN for RAID1 and RAID10 when using FailFast
Date: Mon, 5 Jan 2026 23:40:24 +0900
Message-ID: <20260105144025.12478-2-k@mgml.me>
X-Mailer: git-send-email 2.50.1
In-Reply-To: <20260105144025.12478-1-k@mgml.me>
References: <20260105144025.12478-1-k@mgml.me>

After commit 9631abdbf406 ("md: Set MD_BROKEN for RAID1 and RAID10"),
if the error handler is called on the last rdev in RAID1 or RAID10,
the MD_BROKEN flag will be set on that mddev.
When MD_BROKEN is set, write bios to the md will result in an I/O error.

This causes a problem when using FailFast.
The current implementation of FailFast expects the array to continue
functioning without issues even after calling md_error for the last
rdev. Furthermore, due to the nature of its functionality, FailFast may
call md_error on all rdevs of the md. Even if retrying I/O on an rdev
would succeed, it first calls md_error before retrying.

To fix this issue, this commit ensures that for RAID1 and RAID10, if the
last In_sync rdev has the FailFast flag set and the mddev's fail_last_dev
is off, the MD_BROKEN flag will not be set on that mddev.

This change impacts userspace. After this commit, if the rdev has the
FailFast flag, the mddev is never marked broken, even if the failing bio
is not FailFast. However, it's unlikely that any setup using FailFast
expects the array to halt when md_error is called on the last rdev.

Since FailFast is only implemented for RAID1 and RAID10,
no changes are needed for other personalities.

Fixes: 9631abdbf406 ("md: Set MD_BROKEN for RAID1 and RAID10")
Suggested-by: Xiao Ni
Signed-off-by: Kenta Akagi
---
 drivers/md/md.c     | 6 ++++--
 drivers/md/raid1.c  | 8 +++++++-
 drivers/md/raid10.c | 8 +++++++-
 3 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 6062e0deb616..f1745f8921fc 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -3050,7 +3050,8 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
 	if (cmd_match(buf, "faulty") && rdev->mddev->pers) {
 		md_error(rdev->mddev, rdev);
 
-		if (test_bit(MD_BROKEN, &rdev->mddev->flags))
+		if (test_bit(MD_BROKEN, &rdev->mddev->flags) ||
+		    !test_bit(Faulty, &rdev->flags))
 			err = -EBUSY;
 		else
 			err = 0;
@@ -7915,7 +7916,8 @@ static int set_disk_faulty(struct mddev *mddev, dev_t dev)
 		err = -ENODEV;
 	else {
 		md_error(mddev, rdev);
-		if (test_bit(MD_BROKEN, &mddev->flags))
+		if (test_bit(MD_BROKEN, &mddev->flags) ||
+		    !test_bit(Faulty, &rdev->flags))
 			err = -EBUSY;
 	}
 	rcu_read_unlock();
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 592a40233004..459b34cd358b 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1745,6 +1745,10 @@ static void raid1_status(struct seq_file *seq, struct mddev *mddev)
  *	- recovery is interrupted.
  *	- &mddev->degraded is bumped.
  *
+ * If the following conditions are met, @mddev never fails:
+ *	- The last In_sync @rdev has the &FailFast flag set.
+ *	- &mddev->fail_last_dev is off.
+ *
  * @rdev is marked as &Faulty excluding case when array is failed and
  * &mddev->fail_last_dev is off.
  */
@@ -1757,7 +1761,9 @@ static void raid1_error(struct mddev *mddev, struct md_rdev *rdev)
 
 	if (test_bit(In_sync, &rdev->flags) &&
 	    (conf->raid_disks - mddev->degraded) == 1) {
-		set_bit(MD_BROKEN, &mddev->flags);
+		if (!test_bit(FailFast, &rdev->flags) ||
+		    mddev->fail_last_dev)
+			set_bit(MD_BROKEN, &mddev->flags);
 
 		if (!mddev->fail_last_dev) {
 			conf->recovery_disabled = mddev->recovery_disabled;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 14dcd5142eb4..b33149aa5b29 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1989,6 +1989,10 @@ static int enough(struct r10conf *conf, int ignore)
  *	- recovery is interrupted.
  *	- &mddev->degraded is bumped.
  *
+ * If the following conditions are met, @mddev never fails:
+ *	- The last In_sync @rdev has the &FailFast flag set.
+ *	- &mddev->fail_last_dev is off.
+ *
  * @rdev is marked as &Faulty excluding case when array is failed and
  * &mddev->fail_last_dev is off.
  */
@@ -2000,7 +2004,9 @@ static void raid10_error(struct mddev *mddev, struct md_rdev *rdev)
 	spin_lock_irqsave(&conf->device_lock, flags);
 
 	if (test_bit(In_sync, &rdev->flags) && !enough(conf, rdev->raid_disk)) {
-		set_bit(MD_BROKEN, &mddev->flags);
+		if (!test_bit(FailFast, &rdev->flags) ||
+		    mddev->fail_last_dev)
+			set_bit(MD_BROKEN, &mddev->flags);
 
 		if (!mddev->fail_last_dev) {
 			spin_unlock_irqrestore(&conf->device_lock, flags);
-- 
2.50.1

From nobody Sun Feb 8 09:12:17 2026
From: Kenta Akagi
To: Song Liu, Yu Kuai, Shaohua Li, Mariusz Tkaczyk, Xiao Ni
Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, Kenta Akagi, Li Nan
Subject: [PATCH v6 2/2] md/raid10: fix failfast read error not rescheduled
Date: Mon, 5 Jan 2026 23:40:25 +0900
Message-ID: <20260105144025.12478-3-k@mgml.me>
X-Mailer: git-send-email 2.50.1
In-Reply-To: <20260105144025.12478-1-k@mgml.me>
References: <20260105144025.12478-1-k@mgml.me>

raid10_end_read_request lacks a path to retry when a FailFast IO fails.
As a result, when FailFast read IOs fail on all rdevs, the upper layer
receives EIO without the read being rescheduled.

Looking at the two commits below, it seems only raid10_end_read_request
lacks the failfast read retry handling, while raid1_end_read_request has it.
In RAID1, the retry works as expected.
* commit 8d3ca83dcf9c ("md/raid10: add failfast handling for reads.")
* commit 2e52d449bcec ("md/raid1: add failfast handling for reads.")

This commit will make the failfast read bio for the last rdev in raid10
retry if it fails.

Fixes: 8d3ca83dcf9c ("md/raid10: add failfast handling for reads.")
Signed-off-by: Kenta Akagi
Reviewed-by: Li Nan
---
 drivers/md/raid10.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index b33149aa5b29..8a254bab52e8 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -401,6 +401,13 @@ static void raid10_end_read_request(struct bio *bio)
 		 * wait for the 'master' bio.
 		 */
 		set_bit(R10BIO_Uptodate, &r10_bio->state);
+	} else if (test_bit(FailFast, &rdev->flags) &&
+		 test_bit(R10BIO_FailFast, &r10_bio->state)) {
+		/*
+		 * This was a fail-fast read so we definitely
+		 * want to retry
+		 */
+		;
 	} else if (!raid1_should_handle_error(bio)) {
 		uptodate = 1;
 	} else {
-- 
2.50.1
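
For readers following patch 1/2, the MD_BROKEN decision described in its
commit message can be sketched as a small userspace model. This is only an
illustrative sketch, not kernel code: struct model_mddev, struct model_rdev
and model_sets_broken() are hypothetical names, and the is_last_working
parameter is an assumption standing in for the personality-specific checks
((conf->raid_disks - mddev->degraded) == 1 in raid1, !enough() in raid10).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the relevant mddev/rdev state. */
struct model_mddev {
	bool fail_last_dev;	/* models mddev->fail_last_dev */
};

struct model_rdev {
	bool in_sync;		/* models the In_sync rdev flag */
	bool failfast;		/* models the FailFast rdev flag */
};

/*
 * Returns true when the error handler would set MD_BROKEN under the
 * rule in patch 1/2: the failing rdev is the last working In_sync
 * device, and either it is not FailFast or fail_last_dev is on.
 */
static bool model_sets_broken(const struct model_mddev *mddev,
			      const struct model_rdev *rdev,
			      bool is_last_working)
{
	if (!rdev->in_sync || !is_last_working)
		return false;

	return !rdev->failfast || mddev->fail_last_dev;
}
```

With this model, a FailFast rdev that is the last working device leaves the
array unbroken while fail_last_dev is off, and a non-FailFast rdev in the
same position marks it broken, as it did before the patch.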