From nobody Sun Feb  8 09:12:15 2026
Subject: [PATCH v7 1/4] md: delete md_redundancy_group when array is becoming inactive
Date: Mon, 27 Oct 2025 15:29:12 +0800
Message-ID: <20251027072915.3014463-2-linan122@huawei.com>
In-Reply-To: <20251027072915.3014463-1-linan122@huawei.com>
References: <20251027072915.3014463-1-linan122@huawei.com>

From: Li Nan

'md_redundancy_group' is created in md_run() and deleted in
del_gendisk(), but the two are not paired. Writing inactive/active to
the sysfs attribute array_state can trigger md_run() multiple times
without a matching del_gendisk(), leading to duplicate creation of the
sysfs group:

  sysfs: cannot create duplicate filename '/devices/virtual/block/md0/md/sync_action'
  Call Trace:
   dump_stack_lvl+0x9f/0x120
   dump_stack+0x14/0x20
   sysfs_warn_dup+0x96/0xc0
   sysfs_add_file_mode_ns+0x19c/0x1b0
   internal_create_group+0x213/0x830
   sysfs_create_group+0x17/0x20
   md_run+0x856/0xe60
   ? __x64_sys_openat+0x23/0x30
   do_md_run+0x26/0x1d0
   array_state_store+0x559/0x760
   md_attr_store+0xc9/0x1e0
   sysfs_kf_write+0x6f/0xa0
   kernfs_fop_write_iter+0x141/0x2a0
   vfs_write+0x1fc/0x5a0
   ksys_write+0x79/0x180
   __x64_sys_write+0x1d/0x30
   x64_sys_call+0x2818/0x2880
   do_syscall_64+0xa9/0x580
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  md: cannot register extra attributes for md0

The group's creation depends on 'pers', so its lifecycle cannot be tied
to the gendisk. Fix the issue by scheduling removal of
'md_redundancy_group' when the array is becoming inactive.

Fixes: 790abe4d77af ("md: remove/add redundancy group only in level change")
Signed-off-by: Li Nan
Reviewed-by: Yu Kuai
---
 drivers/md/md.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index fa13eb02874e..f6fd55a1637b 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6878,6 +6878,10 @@ static int do_md_stop(struct mddev *mddev, int mode)
 	if (!md_is_rdwr(mddev))
 		set_disk_ro(disk, 0);
 
+	if (mode == 2 && mddev->pers->sync_request &&
+	    mddev->to_remove == NULL)
+		mddev->to_remove = &md_redundancy_group;
+
 	__md_stop_writes(mddev);
 	__md_stop(mddev);
 
-- 
2.39.2
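[Editor's note] The commit message above does not include a reproducer; a minimal sketch of the sequence it describes follows (the raid1 level, disk names and disk count are illustrative assumptions, not taken from the patch):

```
mdadm -CR /dev/md0 -l1 -n2 /dev/sd[cd]
# stop the array without removing the gendisk
echo inactive > /sys/block/md0/md/array_state
# re-running the array calls md_run() again, which tries to re-create
# the md_redundancy_group sysfs files that were never removed
echo active > /sys/block/md0/md/array_state
dmesg | tail    # before the fix, expect the duplicate-filename warning
```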
From nobody Sun Feb  8 09:12:15 2026
Subject: [PATCH v7 2/4] md: init bioset in mddev_init
Date: Mon, 27 Oct 2025 15:29:13 +0800
Message-ID: <20251027072915.3014463-3-linan122@huawei.com>
In-Reply-To: <20251027072915.3014463-1-linan122@huawei.com>
References: <20251027072915.3014463-1-linan122@huawei.com>

From: Li Nan

IO operations may be needed before md_run(), such as updating the
metadata after a sysfs write.
Without the biosets initialized, this triggers a NULL pointer
dereference:

  BUG: kernel NULL pointer dereference, address: 0000000000000020
  Call Trace:
   md_update_sb+0x658/0xe00
   new_level_store+0xc5/0x120
   md_attr_store+0xc9/0x1e0
   sysfs_kf_write+0x6f/0xa0
   kernfs_fop_write_iter+0x141/0x2a0
   vfs_write+0x1fc/0x5a0
   ksys_write+0x79/0x180
   __x64_sys_write+0x1d/0x30
   x64_sys_call+0x2818/0x2880
   do_syscall_64+0xa9/0x580
   entry_SYSCALL_64_after_hwframe+0x4b/0x53

Reproducer:
```
mdadm -CR /dev/md0 -l1 -n2 /dev/sd[cd]
echo inactive > /sys/block/md0/md/array_state
echo 10 > /sys/block/md0/md/new_level
```

Fix this by initializing the biosets in mddev_init() and releasing them
in mddev_destroy(), instead of doing so in md_run()/__md_stop().

Fixes: d981ed841930 ("md: Add new_level sysfs interface")
Signed-off-by: Li Nan
---
 drivers/md/md.c | 74 +++++++++++++++++++++++++------------------------
 1 file changed, 38 insertions(+), 36 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index f6fd55a1637b..51f0201e4906 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -730,6 +730,8 @@ static void mddev_clear_bitmap_ops(struct mddev *mddev)
 
 int mddev_init(struct mddev *mddev)
 {
+	int err = 0;
+
 	if (!IS_ENABLED(CONFIG_MD_BITMAP))
 		mddev->bitmap_id = ID_BITMAP_NONE;
 	else
@@ -741,8 +743,26 @@ int mddev_init(struct mddev *mddev)
 
 	if (percpu_ref_init(&mddev->writes_pending, no_op,
 			    PERCPU_REF_ALLOW_REINIT, GFP_KERNEL)) {
-		percpu_ref_exit(&mddev->active_io);
-		return -ENOMEM;
+		err = -ENOMEM;
+		goto exit_acitve_io;
+	}
+
+	if (!bioset_initialized(&mddev->bio_set)) {
+		err = bioset_init(&mddev->bio_set, BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS);
+		if (err)
+			goto exit_writes_pending;
+	}
+	if (!bioset_initialized(&mddev->sync_set)) {
+		err = bioset_init(&mddev->sync_set, BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS);
+		if (err)
+			goto exit_bio_set;
+	}
+
+	if (!bioset_initialized(&mddev->io_clone_set)) {
+		err = bioset_init(&mddev->io_clone_set, BIO_POOL_SIZE,
+				  offsetof(struct md_io_clone, bio_clone), 0);
+		if (err)
+			goto exit_sync_set;
 	}
 
 	/* We want to start with the refcount at zero */
@@ -773,11 +793,24 @@ int mddev_init(struct mddev *mddev)
 	INIT_WORK(&mddev->del_work, mddev_delayed_delete);
 
 	return 0;
+
+exit_sync_set:
+	bioset_exit(&mddev->sync_set);
+exit_bio_set:
+	bioset_exit(&mddev->bio_set);
+exit_writes_pending:
+	percpu_ref_exit(&mddev->writes_pending);
+exit_acitve_io:
+	percpu_ref_exit(&mddev->active_io);
+	return err;
 }
 EXPORT_SYMBOL_GPL(mddev_init);
 
 void mddev_destroy(struct mddev *mddev)
 {
+	bioset_exit(&mddev->bio_set);
+	bioset_exit(&mddev->sync_set);
+	bioset_exit(&mddev->io_clone_set);
 	percpu_ref_exit(&mddev->active_io);
 	percpu_ref_exit(&mddev->writes_pending);
 }
@@ -6393,29 +6426,9 @@ int md_run(struct mddev *mddev)
 		nowait = nowait && bdev_nowait(rdev->bdev);
 	}
 
-	if (!bioset_initialized(&mddev->bio_set)) {
-		err = bioset_init(&mddev->bio_set, BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS);
-		if (err)
-			return err;
-	}
-	if (!bioset_initialized(&mddev->sync_set)) {
-		err = bioset_init(&mddev->sync_set, BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS);
-		if (err)
-			goto exit_bio_set;
-	}
-
-	if (!bioset_initialized(&mddev->io_clone_set)) {
-		err = bioset_init(&mddev->io_clone_set, BIO_POOL_SIZE,
-				  offsetof(struct md_io_clone, bio_clone), 0);
-		if (err)
-			goto exit_sync_set;
-	}
-
 	pers = get_pers(mddev->level, mddev->clevel);
-	if (!pers) {
-		err = -EINVAL;
-		goto abort;
-	}
+	if (!pers)
+		return -EINVAL;
 	if (mddev->level != pers->head.id) {
 		mddev->level = pers->head.id;
 		mddev->new_level = pers->head.id;
@@ -6426,8 +6439,7 @@ int md_run(struct mddev *mddev)
 	    pers->start_reshape == NULL) {
 		/* This personality cannot handle reshaping... */
 		put_pers(pers);
-		err = -EINVAL;
-		goto abort;
+		return -EINVAL;
 	}
 
 	if (pers->sync_request) {
@@ -6554,12 +6566,6 @@ int md_run(struct mddev *mddev)
 	mddev->private = NULL;
 	put_pers(pers);
 	md_bitmap_destroy(mddev);
-abort:
-	bioset_exit(&mddev->io_clone_set);
-exit_sync_set:
-	bioset_exit(&mddev->sync_set);
-exit_bio_set:
-	bioset_exit(&mddev->bio_set);
 	return err;
 }
 EXPORT_SYMBOL_GPL(md_run);
@@ -6784,10 +6790,6 @@ static void __md_stop(struct mddev *mddev)
 	mddev->private = NULL;
 	put_pers(pers);
 	clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
-
-	bioset_exit(&mddev->bio_set);
-	bioset_exit(&mddev->sync_set);
-	bioset_exit(&mddev->io_clone_set);
 }
 
 void md_stop(struct mddev *mddev)
-- 
2.39.2
From nobody Sun Feb  8 09:12:15 2026
Subject: [PATCH v7 3/4] md/raid0: Move queue limit setup before r0conf initialization
Date: Mon, 27 Oct 2025 15:29:14 +0800
Message-ID: <20251027072915.3014463-4-linan122@huawei.com>
In-Reply-To: <20251027072915.3014463-1-linan122@huawei.com>
References: <20251027072915.3014463-1-linan122@huawei.com>

From: Li Nan

Prepare for making the logical block size configurable by moving
raid0_set_limits() before create_strip_zones(). This is safe because
the fields modified in create_strip_zones() do not affect the mddev
configuration, and the rdev modifications made there are not used by
raid0_set_limits(). 'blksize' in create_strip_zones() now fetches the
mddev's logical block size from the queue. This change has no
functional impact until the logical block size becomes configurable.

Signed-off-by: Li Nan
---
 drivers/md/raid0.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index e443e478645a..49477b560cc9 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -68,7 +68,7 @@ static int create_strip_zones(struct mddev *mddev, struct r0conf **private_conf)
 	struct strip_zone *zone;
 	int cnt;
 	struct r0conf *conf = kzalloc(sizeof(*conf), GFP_KERNEL);
-	unsigned blksize = 512;
+	unsigned int blksize = queue_logical_block_size(mddev->gendisk->queue);
 
 	*private_conf = ERR_PTR(-ENOMEM);
 	if (!conf)
@@ -405,6 +405,12 @@ static int raid0_run(struct mddev *mddev)
 	if (md_check_no_bitmap(mddev))
 		return -EINVAL;
 
+	if (!mddev_is_dm(mddev)) {
+		ret = raid0_set_limits(mddev);
+		if (ret)
+			return ret;
+	}
+
 	/* if private is not null, we are here after takeover */
 	if (mddev->private == NULL) {
 		ret = create_strip_zones(mddev, &conf);
@@ -413,11 +419,6 @@ static int raid0_run(struct mddev *mddev)
 		mddev->private = conf;
 	}
 	conf = mddev->private;
-	if (!mddev_is_dm(mddev)) {
-		ret = raid0_set_limits(mddev);
-		if (ret)
-			return ret;
-	}
 
 	/* calculate array device size */
 	md_set_array_sectors(mddev, raid0_size(mddev, 0, 0));
-- 
2.39.2
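[Editor's note] Since the commit message states the reordering has no impact until the logical block size becomes configurable, a quick sanity-check sketch is simply that the exposed value does not change (device names are illustrative assumptions):

```
mdadm -CRq /dev/md0 -l0 -n2 /dev/sd[cd]
# with or without this patch, this still reports the maximum logical
# block size of the member disks
cat /sys/block/md0/queue/logical_block_size
```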
From nobody Sun Feb  8 09:12:15 2026
Subject: [PATCH v7 4/4] md: allow configuring logical block size
Date: Mon, 27 Oct 2025 15:29:15 +0800
Message-ID: <20251027072915.3014463-5-linan122@huawei.com>
In-Reply-To: <20251027072915.3014463-1-linan122@huawei.com>
References: <20251027072915.3014463-1-linan122@huawei.com>

From: Li Nan

Previously, a RAID array used the maximum logical block size (LBS) of
all member disks. Adding a disk with a larger LBS at runtime could
unexpectedly increase the array's LBS, risking corruption of existing
partitions. This can be reproduced by:

```
# LBS of sd[de] is 512 bytes, sdf is 4096 bytes.
mdadm -CRq /dev/md0 -l1 -n3 /dev/sd[de] missing --assume-clean

# LBS is 512
cat /sys/block/md0/queue/logical_block_size

# create partition md0p1
parted -s /dev/md0 mklabel gpt mkpart primary 1MiB 100%
lsblk | grep md0p1

# LBS becomes 4096 after adding sdf
mdadm --add -q /dev/md0 /dev/sdf
cat /sys/block/md0/queue/logical_block_size

# partition lost
partprobe /dev/md0
lsblk | grep md0p1
```

Simply rejecting disks with a larger LBS would be inflexible: in some
scenarios only 512-byte-LBS disks are available at creation time, while
4 KiB-LBS disks may be added to the array later. Making the LBS
configurable is the best way to handle this.
After this patch, the RAID core will:
 - store the LBS in the on-disk metadata
 - add a read-write sysfs attribute 'mdX/logical_block_size'

Future mdadm should support setting the LBS via the metadata field at
RAID creation time and via the new sysfs attribute. Although the kernel
allows runtime LBS changes, users should avoid modifying it after
creating partitions or filesystems to prevent compatibility issues.

Only 1.x metadata supports a configurable LBS. 0.90 metadata
initializes all fields to default values during auto-detect; supporting
it would require more extensive changes, and no such use case has been
observed.

Note that many RAID paths rely on PAGE_SIZE alignment, including
metadata I/O. An LBS larger than PAGE_SIZE would cause metadata
read/write failures, so such a configuration is rejected.

Signed-off-by: Li Nan
---
 Documentation/admin-guide/md.rst |  7 +++
 drivers/md/md.h                  |  1 +
 include/uapi/linux/raid/md_p.h   |  3 +-
 drivers/md/md-linear.c           |  1 +
 drivers/md/md.c                  | 76 ++++++++++++++++++++++++++++++++
 drivers/md/raid0.c               |  1 +
 drivers/md/raid1.c               |  1 +
 drivers/md/raid10.c              |  1 +
 drivers/md/raid5.c               |  1 +
 9 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/md.rst b/Documentation/admin-guide/md.rst
index 1c2eacc94758..0f143acd2db7 100644
--- a/Documentation/admin-guide/md.rst
+++ b/Documentation/admin-guide/md.rst
@@ -238,6 +238,13 @@ All md devices contain:
      the number of devices in a raid4/5/6, or to support external
      metadata formats which mandate such clipping.
 
+  logical_block_size
+     Configure the array's logical block size in bytes. This attribute
+     is only supported for 1.x metadata. The value should be written
+     before starting the array. The final array LBS is the maximum of
+     this value and the LBS of all member devices. Note that the LBS
+     cannot exceed PAGE_SIZE until RAID gains folio support.
+
   reshape_position
      This is either ``none`` or a sector number within the devices of
      the array where ``reshape`` is up to.  If this is set, the three
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 38a7c2fab150..a6b3cb69c28c 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -432,6 +432,7 @@ struct mddev {
 	sector_t			array_sectors;	/* exported array size */
 	int				external_size;	/* size managed
 							 * externally */
+	unsigned int			logical_block_size;
 	__u64				events;
 	/* If the last 'event' was simply a clean->dirty transition, and
 	 * we didn't write it to the spares, then it is safe and simple
diff --git a/include/uapi/linux/raid/md_p.h b/include/uapi/linux/raid/md_p.h
index ac74133a4768..310068bb2a1d 100644
--- a/include/uapi/linux/raid/md_p.h
+++ b/include/uapi/linux/raid/md_p.h
@@ -291,7 +291,8 @@ struct mdp_superblock_1 {
 	__le64	resync_offset;	/* data before this offset (from data_offset) known to be in sync */
 	__le32	sb_csum;	/* checksum up to devs[max_dev] */
 	__le32	max_dev;	/* size of devs[] array to consider */
-	__u8	pad3[64-32];	/* set to 0 when writing */
+	__le32	logical_block_size;	/* same as q->limits->logical_block_size */
+	__u8	pad3[64-36];	/* set to 0 when writing */
 
 	/* device state information. Indexed by dev_number.
 	 * 2 bytes per device
diff --git a/drivers/md/md-linear.c b/drivers/md/md-linear.c
index 7033d982d377..50d4a419a16e 100644
--- a/drivers/md/md-linear.c
+++ b/drivers/md/md-linear.c
@@ -72,6 +72,7 @@ static int linear_set_limits(struct mddev *mddev)
 
 	md_init_stacking_limits(&lim);
 	lim.max_hw_sectors = mddev->chunk_sectors;
+	lim.logical_block_size = mddev->logical_block_size;
 	lim.max_write_zeroes_sectors = mddev->chunk_sectors;
 	lim.max_hw_wzeroes_unmap_sectors = mddev->chunk_sectors;
 	lim.io_min = mddev->chunk_sectors << 9;
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 51f0201e4906..0961bd11f1bc 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1998,6 +1998,7 @@ static int super_1_validate(struct mddev *mddev, struct md_rdev *freshest, struc
 		mddev->layout = le32_to_cpu(sb->layout);
 		mddev->raid_disks = le32_to_cpu(sb->raid_disks);
 		mddev->dev_sectors = le64_to_cpu(sb->size);
+		mddev->logical_block_size = le32_to_cpu(sb->logical_block_size);
 		mddev->events = ev1;
 		mddev->bitmap_info.offset = 0;
 		mddev->bitmap_info.space = 0;
@@ -2207,6 +2208,7 @@ static void super_1_sync(struct mddev *mddev, struct md_rdev *rdev)
 	sb->chunksize = cpu_to_le32(mddev->chunk_sectors);
 	sb->level = cpu_to_le32(mddev->level);
 	sb->layout = cpu_to_le32(mddev->layout);
+	sb->logical_block_size = cpu_to_le32(mddev->logical_block_size);
 	if (test_bit(FailFast, &rdev->flags))
 		sb->devflags |= FailFast1;
 	else
@@ -5935,6 +5937,67 @@ static struct md_sysfs_entry md_serialize_policy =
 __ATTR(serialize_policy, S_IRUGO | S_IWUSR, serialize_policy_show,
        serialize_policy_store);
 
+static int mddev_set_logical_block_size(struct mddev *mddev,
+					unsigned int lbs)
+{
+	int err = 0;
+	struct queue_limits lim;
+
+	if (queue_logical_block_size(mddev->gendisk->queue) >= lbs) {
+		pr_err("%s: Cannot set LBS smaller than mddev LBS %u\n",
+		       mdname(mddev), lbs);
+		return -EINVAL;
+	}
+
+	lim = queue_limits_start_update(mddev->gendisk->queue);
+	lim.logical_block_size = lbs;
+	pr_info("%s: logical_block_size is changed, data may be lost\n",
+		mdname(mddev));
+	err = queue_limits_commit_update(mddev->gendisk->queue, &lim);
+	if (err)
+		return err;
+
+	mddev->logical_block_size = lbs;
+	md_update_sb(mddev, 1);
+	return 0;
+}
+
+static ssize_t
+lbs_show(struct mddev *mddev, char *page)
+{
+	return sprintf(page, "%u\n", mddev->logical_block_size);
+}
+
+static ssize_t
+lbs_store(struct mddev *mddev, const char *buf, size_t len)
+{
+	unsigned int lbs;
+	int err = -EBUSY;
+
+	/* Only 1.x meta supports configurable LBS */
+	if (mddev->major_version == 0)
+		return -EINVAL;
+
+	if (mddev->pers)
+		return -EBUSY;
+
+	err = kstrtouint(buf, 10, &lbs);
+	if (err < 0)
+		return -EINVAL;
+
+	err = mddev_lock(mddev);
+	if (err)
+		goto unlock;
+
+	err = mddev_set_logical_block_size(mddev, lbs);
+
+unlock:
+	mddev_unlock(mddev);
+	return err ?: len;
+}
+
+static struct md_sysfs_entry md_logical_block_size =
+__ATTR(logical_block_size, 0644, lbs_show, lbs_store);
 
 static struct attribute *md_default_attrs[] = {
 	&md_level.attr,
@@ -5957,6 +6020,7 @@ static struct attribute *md_default_attrs[] = {
 	&md_consistency_policy.attr,
 	&md_fail_last_dev.attr,
 	&md_serialize_policy.attr,
+	&md_logical_block_size.attr,
 	NULL,
 };
 
@@ -6087,6 +6151,17 @@ int mddev_stack_rdev_limits(struct mddev *mddev, struct queue_limits *lim,
 		return -EINVAL;
 	}
 
+	/*
+	 * Before RAID adds folio support, the logical_block_size
+	 * should be smaller than the page size.
+	 */
+	if (lim->logical_block_size > PAGE_SIZE) {
+		pr_err("%s: logical_block_size must not larger than PAGE_SIZE\n",
+		       mdname(mddev));
+		return -EINVAL;
+	}
+	mddev->logical_block_size = lim->logical_block_size;
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(mddev_stack_rdev_limits);
@@ -6698,6 +6773,7 @@ static void md_clean(struct mddev *mddev)
 	mddev->chunk_sectors = 0;
 	mddev->ctime = mddev->utime = 0;
 	mddev->layout = 0;
+	mddev->logical_block_size = 0;
 	mddev->max_disks = 0;
 	mddev->events = 0;
 	mddev->can_decrease_events = 0;
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 49477b560cc9..f3b0d91d903d 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -383,6 +383,7 @@ static int raid0_set_limits(struct mddev *mddev)
 	lim.max_hw_sectors = mddev->chunk_sectors;
 	lim.max_write_zeroes_sectors = mddev->chunk_sectors;
 	lim.max_hw_wzeroes_unmap_sectors = mddev->chunk_sectors;
+	lim.logical_block_size = mddev->logical_block_size;
 	lim.io_min = mddev->chunk_sectors << 9;
 	lim.io_opt = lim.io_min * mddev->raid_disks;
 	lim.chunk_sectors = mddev->chunk_sectors;
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 64bfe8ca5b38..167768edaec1 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -3212,6 +3212,7 @@ static int raid1_set_limits(struct mddev *mddev)
 	md_init_stacking_limits(&lim);
 	lim.max_write_zeroes_sectors = 0;
 	lim.max_hw_wzeroes_unmap_sectors = 0;
+	lim.logical_block_size = mddev->logical_block_size;
 	lim.features |= BLK_FEAT_ATOMIC_WRITES;
 	err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
 	if (err)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 6b2d4b7057ae..71bfed3b798d 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4000,6 +4000,7 @@ static int raid10_set_queue_limits(struct mddev *mddev)
 	md_init_stacking_limits(&lim);
 	lim.max_write_zeroes_sectors = 0;
 	lim.max_hw_wzeroes_unmap_sectors = 0;
+	lim.logical_block_size = mddev->logical_block_size;
 	lim.io_min = mddev->chunk_sectors << 9;
 	lim.chunk_sectors = mddev->chunk_sectors;
 	lim.io_opt = lim.io_min * raid10_nr_stripes(conf);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index aa404abf5d17..92473850f381 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7747,6 +7747,7 @@ static int raid5_set_limits(struct mddev *mddev)
 	stripe = roundup_pow_of_two(data_disks * (mddev->chunk_sectors << 9));
 
 	md_init_stacking_limits(&lim);
+	lim.logical_block_size = mddev->logical_block_size;
 	lim.io_min = mddev->chunk_sectors << 9;
 	lim.io_opt = lim.io_min * (conf->raid_disks - conf->max_degraded);
 	lim.features |= BLK_FEAT_RAID_PARTIAL_STRIPES_EXPENSIVE;
-- 
2.39.2
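[Editor's note] A usage sketch for the new sysfs attribute, following the documentation hunk above; the device name, the 4096-byte value, and the assumption that the array already exists but is not running are illustrative, not a validated procedure:

```
# 1.x metadata only; the array must exist but must not be running yet
echo 4096 > /sys/block/md0/md/logical_block_size
cat /sys/block/md0/md/logical_block_size
# once the array is started, the effective LBS is the maximum of this
# value and the logical block size of every member device
cat /sys/block/md0/queue/logical_block_size
```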