From nobody Mon Feb 9 08:09:26 2026
From: linan666@huaweicloud.com
To: corbet@lwn.net, song@kernel.org, yukuai@fnnas.com, linan122@huawei.com, xni@redhat.com, hare@suse.de
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, linan666@huaweicloud.com, yangerkun@huawei.com, yi.zhang@huawei.com
Subject: [PATCH v9 1/5] md: delete md_redundancy_group when array is becoming inactive
Date: Mon, 3 Nov 2025 20:57:53 +0800
Message-Id: <20251103125757.1405796-2-linan666@huaweicloud.com>
In-Reply-To: <20251103125757.1405796-1-linan666@huaweicloud.com>
References: <20251103125757.1405796-1-linan666@huaweicloud.com>
Content-Type: text/plain; charset="utf-8"

From: Li Nan

'md_redundancy_group' is created in md_run() and deleted in del_gendisk(),
but the two are not paired. Writing inactive/active to the sysfs
array_state attribute can trigger md_run() multiple times without an
intervening del_gendisk(), leading to duplicate creation as below:

  sysfs: cannot create duplicate filename '/devices/virtual/block/md0/md/sync_action'
  Call Trace:
   dump_stack_lvl+0x9f/0x120
   dump_stack+0x14/0x20
   sysfs_warn_dup+0x96/0xc0
   sysfs_add_file_mode_ns+0x19c/0x1b0
   internal_create_group+0x213/0x830
   sysfs_create_group+0x17/0x20
   md_run+0x856/0xe60
   ? __x64_sys_openat+0x23/0x30
   do_md_run+0x26/0x1d0
   array_state_store+0x559/0x760
   md_attr_store+0xc9/0x1e0
   sysfs_kf_write+0x6f/0xa0
   kernfs_fop_write_iter+0x141/0x2a0
   vfs_write+0x1fc/0x5a0
   ksys_write+0x79/0x180
   __x64_sys_write+0x1d/0x30
   x64_sys_call+0x2818/0x2880
   do_syscall_64+0xa9/0x580
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  md: cannot register extra attributes for md0

Because its creation depends on 'pers', its lifecycle cannot be aligned
with the gendisk. Fix this by triggering 'md_redundancy_group' deletion
when the array is becoming inactive.
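To make the pairing problem concrete, here is a toy user-space model in C. It is a sketch under stated assumptions, not md code: `struct toy_array`, `toy_run()`, `toy_stop_inactive()` and `toy_del_disk()` are hypothetical stand-ins for md_run(), the "inactive" array_state transition and del_gendisk(), and the duplicate check only loosely mimics the sysfs_warn_dup() failure in the trace above.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model (not kernel API): a registry that refuses to create
 * a group that already exists, like sysfs refusing a duplicate filename.
 */
struct toy_array {
	bool group_present;  /* is the redundancy group registered? */
	bool running;        /* has toy_run() succeeded? */
	bool remove_on_stop; /* fixed behaviour: drop the group when going inactive */
};

/* Models md_run(): returns 0 on success, -1 on a duplicate registration. */
int toy_run(struct toy_array *a)
{
	if (a->group_present)
		return -1; /* "cannot create duplicate filename" */
	a->group_present = true;
	a->running = true;
	return 0;
}

/* Models writing "inactive" to array_state: the array stops, the disk stays. */
void toy_stop_inactive(struct toy_array *a)
{
	a->running = false;
	if (a->remove_on_stop)
		a->group_present = false;
}

/* Models del_gendisk(): the disk and all of its sysfs entries go away. */
void toy_del_disk(struct toy_array *a)
{
	a->running = false;
	a->group_present = false;
}
```

With `remove_on_stop` false, a second `toy_run()` after going inactive returns -1, matching the duplicate-filename warning; with it true (the behaviour this patch aims for), the run/stop cycle can repeat without waiting for the disk to be deleted.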
Fixes: 790abe4d77af ("md: remove/add redundancy group only in level change")
Signed-off-by: Li Nan
Reviewed-by: Xiao Ni
---
 drivers/md/md.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index fa13eb02874e..f6fd55a1637b 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6878,6 +6878,10 @@ static int do_md_stop(struct mddev *mddev, int mode)
 		if (!md_is_rdwr(mddev))
 			set_disk_ro(disk, 0);
 
+		if (mode == 2 && mddev->pers->sync_request &&
+		    mddev->to_remove == NULL)
+			mddev->to_remove = &md_redundancy_group;
+
 		__md_stop_writes(mddev);
 		__md_stop(mddev);
 
-- 
2.39.2

From nobody Mon Feb 9 08:09:26 2026
From: linan666@huaweicloud.com
To: corbet@lwn.net, song@kernel.org, yukuai@fnnas.com, linan122@huawei.com, xni@redhat.com, hare@suse.de
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, linan666@huaweicloud.com, yangerkun@huawei.com, yi.zhang@huawei.com
Subject: [PATCH v9 2/5] md: init bioset in mddev_init
Date: Mon, 3 Nov 2025 20:57:54 +0800
Message-Id: <20251103125757.1405796-3-linan666@huaweicloud.com>
In-Reply-To: <20251103125757.1405796-1-linan666@huaweicloud.com>
References: <20251103125757.1405796-1-linan666@huaweicloud.com>
Content-Type: text/plain; charset="utf-8"

From: Li Nan

I/O may be needed before md_run(), for example to update metadata after a
sysfs write. Without an initialized bioset, this triggers a NULL pointer
dereference as below:

  BUG: kernel NULL pointer dereference, address: 0000000000000020
  Call Trace:
   md_update_sb+0x658/0xe00
   new_level_store+0xc5/0x120
   md_attr_store+0xc9/0x1e0
   sysfs_kf_write+0x6f/0xa0
   kernfs_fop_write_iter+0x141/0x2a0
   vfs_write+0x1fc/0x5a0
   ksys_write+0x79/0x180
   __x64_sys_write+0x1d/0x30
   x64_sys_call+0x2818/0x2880
   do_syscall_64+0xa9/0x580
   entry_SYSCALL_64_after_hwframe+0x4b/0x53

Reproducer:
```
mdadm -CR /dev/md0 -l1 -n2 /dev/sd[cd]
echo inactive > /sys/block/md0/md/array_state
echo 10 > /sys/block/md0/md/new_level
```

mddev_init() is called only once per mddev, so there is no longer any need
to test whether the bioset has already been initialized.
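The reworked mddev_init() follows the common kernel pattern of acquiring resources in order and releasing them in reverse through goto labels. The sketch below models that pattern in user space; it is illustrative only: the resources are plain malloc() allocations, and `toy_init()`, `toy_destroy()`, `fail_at` and `live_count` are hypothetical names used to show that every failure point releases exactly what was already acquired.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of currently allocated toy resources, so leaks are easy to check. */
int live_count = 0;

/* Acquire one resource; fails on purpose when step == fail_at. */
static void *res_get(int step, int fail_at)
{
	void *p;

	if (step == fail_at)
		return NULL; /* simulated allocation failure */
	p = malloc(1);
	if (p)
		live_count++;
	return p;
}

static void res_put(void *p)
{
	if (p) {
		free(p);
		live_count--;
	}
}

/* Hypothetical stand-ins for active_io, writes_pending and the three biosets. */
struct toy_dev {
	void *active_io, *writes_pending, *bio_set, *sync_set, *io_clone_set;
};

/* Returns 0 on success; on failure nothing acquired here is left allocated. */
int toy_init(struct toy_dev *d, int fail_at)
{
	if (!(d->active_io = res_get(0, fail_at)))
		return -1;
	if (!(d->writes_pending = res_get(1, fail_at)))
		goto exit_active_io;
	if (!(d->bio_set = res_get(2, fail_at)))
		goto exit_writes_pending;
	if (!(d->sync_set = res_get(3, fail_at)))
		goto exit_bio_set;
	if (!(d->io_clone_set = res_get(4, fail_at)))
		goto exit_sync_set;
	return 0;

	/* Each label releases its resource, then falls through to older ones. */
exit_sync_set:
	res_put(d->sync_set);
exit_bio_set:
	res_put(d->bio_set);
exit_writes_pending:
	res_put(d->writes_pending);
exit_active_io:
	res_put(d->active_io);
	return -1;
}

/* Only valid after a successful toy_init(), like a destroy/init pairing. */
void toy_destroy(struct toy_dev *d)
{
	res_put(d->io_clone_set);
	res_put(d->sync_set);
	res_put(d->bio_set);
	res_put(d->writes_pending);
	res_put(d->active_io);
}
```

Injecting a failure at each step in turn and checking that `live_count` returns to zero is a cheap way to confirm that the label order mirrors the acquisition order.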
Fixes: d981ed841930 ("md: Add new_level sysfs interface")
Signed-off-by: Li Nan
Reviewed-by: Xiao Ni
---
 drivers/md/md.c | 69 +++++++++++++++++++++++--------------------------
 1 file changed, 33 insertions(+), 36 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index f6fd55a1637b..dffc6a482181 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -730,6 +730,8 @@ static void mddev_clear_bitmap_ops(struct mddev *mddev)
 
 int mddev_init(struct mddev *mddev)
 {
+	int err = 0;
+
 	if (!IS_ENABLED(CONFIG_MD_BITMAP))
 		mddev->bitmap_id = ID_BITMAP_NONE;
 	else
@@ -741,10 +743,23 @@ int mddev_init(struct mddev *mddev)
 
 	if (percpu_ref_init(&mddev->writes_pending, no_op,
 			    PERCPU_REF_ALLOW_REINIT, GFP_KERNEL)) {
-		percpu_ref_exit(&mddev->active_io);
-		return -ENOMEM;
+		err = -ENOMEM;
+		goto exit_acitve_io;
 	}
 
+	err = bioset_init(&mddev->bio_set, BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS);
+	if (err)
+		goto exit_writes_pending;
+
+	err = bioset_init(&mddev->sync_set, BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS);
+	if (err)
+		goto exit_bio_set;
+
+	err = bioset_init(&mddev->io_clone_set, BIO_POOL_SIZE,
+			  offsetof(struct md_io_clone, bio_clone), 0);
+	if (err)
+		goto exit_sync_set;
+
 	/* We want to start with the refcount at zero */
 	percpu_ref_put(&mddev->writes_pending);
 
@@ -773,11 +788,24 @@ int mddev_init(struct mddev *mddev)
 	INIT_WORK(&mddev->del_work, mddev_delayed_delete);
 
 	return 0;
+
+exit_sync_set:
+	bioset_exit(&mddev->sync_set);
+exit_bio_set:
+	bioset_exit(&mddev->bio_set);
+exit_writes_pending:
+	percpu_ref_exit(&mddev->writes_pending);
+exit_acitve_io:
+	percpu_ref_exit(&mddev->active_io);
+	return err;
 }
 EXPORT_SYMBOL_GPL(mddev_init);
 
 void mddev_destroy(struct mddev *mddev)
 {
+	bioset_exit(&mddev->bio_set);
+	bioset_exit(&mddev->sync_set);
+	bioset_exit(&mddev->io_clone_set);
 	percpu_ref_exit(&mddev->active_io);
 	percpu_ref_exit(&mddev->writes_pending);
 }
@@ -6393,29 +6421,9 @@ int md_run(struct mddev *mddev)
 		nowait = nowait && bdev_nowait(rdev->bdev);
 	}
 
-	if (!bioset_initialized(&mddev->bio_set)) {
-		err = bioset_init(&mddev->bio_set, BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS);
-		if (err)
-			return err;
-	}
-	if (!bioset_initialized(&mddev->sync_set)) {
-		err = bioset_init(&mddev->sync_set, BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS);
-		if (err)
-			goto exit_bio_set;
-	}
-
-	if (!bioset_initialized(&mddev->io_clone_set)) {
-		err = bioset_init(&mddev->io_clone_set, BIO_POOL_SIZE,
-				  offsetof(struct md_io_clone, bio_clone), 0);
-		if (err)
-			goto exit_sync_set;
-	}
-
 	pers = get_pers(mddev->level, mddev->clevel);
-	if (!pers) {
-		err = -EINVAL;
-		goto abort;
-	}
+	if (!pers)
+		return -EINVAL;
 	if (mddev->level != pers->head.id) {
 		mddev->level = pers->head.id;
 		mddev->new_level = pers->head.id;
@@ -6426,8 +6434,7 @@ int md_run(struct mddev *mddev)
 	    pers->start_reshape == NULL) {
 		/* This personality cannot handle reshaping... */
 		put_pers(pers);
-		err = -EINVAL;
-		goto abort;
+		return -EINVAL;
 	}
 
 	if (pers->sync_request) {
@@ -6554,12 +6561,6 @@ int md_run(struct mddev *mddev)
 	mddev->private = NULL;
 	put_pers(pers);
 	md_bitmap_destroy(mddev);
-abort:
-	bioset_exit(&mddev->io_clone_set);
-exit_sync_set:
-	bioset_exit(&mddev->sync_set);
-exit_bio_set:
-	bioset_exit(&mddev->bio_set);
 	return err;
 }
 EXPORT_SYMBOL_GPL(md_run);
@@ -6784,10 +6785,6 @@ static void __md_stop(struct mddev *mddev)
 	mddev->private = NULL;
 	put_pers(pers);
 	clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
-
-	bioset_exit(&mddev->bio_set);
-	bioset_exit(&mddev->sync_set);
-	bioset_exit(&mddev->io_clone_set);
 }
 
 void md_stop(struct mddev *mddev)
-- 
2.39.2

From nobody Mon Feb 9 08:09:26 2026
From: linan666@huaweicloud.com
To: corbet@lwn.net, song@kernel.org, yukuai@fnnas.com, linan122@huawei.com, xni@redhat.com, hare@suse.de
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, linan666@huaweicloud.com, yangerkun@huawei.com, yi.zhang@huawei.com
Subject: [PATCH v9 3/5] md/raid0: Move queue limit setup before r0conf initialization
Date: Mon, 3 Nov 2025 20:57:55 +0800
Message-Id: <20251103125757.1405796-4-linan666@huaweicloud.com>
In-Reply-To: <20251103125757.1405796-1-linan666@huaweicloud.com>
References: <20251103125757.1405796-1-linan666@huaweicloud.com>
Content-Type: text/plain; charset="utf-8"

From: Li Nan

Prepare for making the logical block size configurable. This change has no
impact until the logical block size becomes configurable.

Move raid0_set_limits() before create_strip_zones().
This is safe because the fields modified in create_strip_zones() do not
involve mddev configuration, and the rdev modifications there are not used
in raid0_set_limits().

'blksize' in create_strip_zones() fetches the mddev's logical block size,
which is already the maximum across all rdevs, so the later max() can be
removed.

Signed-off-by: Li Nan
Reviewed-by: Xiao Ni
---
 drivers/md/raid0.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index e443e478645a..fbf763401521 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -68,7 +68,7 @@ static int create_strip_zones(struct mddev *mddev, struct r0conf **private_conf)
 	struct strip_zone *zone;
 	int cnt;
 	struct r0conf *conf = kzalloc(sizeof(*conf), GFP_KERNEL);
-	unsigned blksize = 512;
+	unsigned int blksize = queue_logical_block_size(mddev->gendisk->queue);
 
 	*private_conf = ERR_PTR(-ENOMEM);
 	if (!conf)
@@ -84,9 +84,6 @@ static int create_strip_zones(struct mddev *mddev, struct r0conf **private_conf)
 		sector_div(sectors, mddev->chunk_sectors);
 		rdev1->sectors = sectors * mddev->chunk_sectors;
 
-		blksize = max(blksize, queue_logical_block_size(
-				      rdev1->bdev->bd_disk->queue));
-
 		rdev_for_each(rdev2, mddev) {
 			pr_debug("md/raid0:%s: comparing %pg(%llu)"
 				 " with %pg(%llu)\n",
@@ -405,6 +402,12 @@ static int raid0_run(struct mddev *mddev)
 	if (md_check_no_bitmap(mddev))
 		return -EINVAL;
 
+	if (!mddev_is_dm(mddev)) {
+		ret = raid0_set_limits(mddev);
+		if (ret)
+			return ret;
+	}
+
 	/* if private is not null, we are here after takeover */
 	if (mddev->private == NULL) {
 		ret = create_strip_zones(mddev, &conf);
@@ -413,11 +416,6 @@ static int raid0_run(struct mddev *mddev)
 		mddev->private = conf;
 	}
 	conf = mddev->private;
-	if (!mddev_is_dm(mddev)) {
-		ret = raid0_set_limits(mddev);
-		if (ret)
-			return ret;
-	}
 
 	/* calculate array device size */
 	md_set_array_sectors(mddev, raid0_size(mddev, 0, 0));
-- 
2.39.2

From nobody Mon Feb 9 08:09:26 2026
From: linan666@huaweicloud.com
To: corbet@lwn.net, song@kernel.org, yukuai@fnnas.com, linan122@huawei.com, xni@redhat.com, hare@suse.de
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, linan666@huaweicloud.com, yangerkun@huawei.com, yi.zhang@huawei.com
Subject: [PATCH v9 4/5] md: add check_new_feature module parameter
Date: Mon, 3 Nov 2025 20:57:56 +0800
Message-Id: <20251103125757.1405796-5-linan666@huaweicloud.com>
In-Reply-To: <20251103125757.1405796-1-linan666@huaweicloud.com>
References: <20251103125757.1405796-1-linan666@huaweicloud.com>
Content-Type: text/plain; charset="utf-8"

From: Li Nan

RAID checks that pad3 is zero when loading the superblock from disk.
Arrays created with new features that use pad3 may therefore fail to
assemble on old kernels. Add a module parameter, check_new_feature, to
bypass this check.

Signed-off-by: Li Nan
Reviewed-by: Xiao Ni
---
 drivers/md/md.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index dffc6a482181..5921fb245bfa 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -339,6 +339,7 @@ static int start_readonly;
  */
 static bool create_on_open = true;
 static bool legacy_async_del_gendisk = true;
+static bool check_new_feature = true;
 
 /*
  * We have a system wide 'event count' that is incremented
@@ -1850,9 +1851,13 @@ static int super_1_load(struct md_rdev *rdev, struct md_rdev *refdev, int minor_
 	}
 	if (sb->pad0 ||
 	    sb->pad3[0] ||
-	    memcmp(sb->pad3, sb->pad3+1, sizeof(sb->pad3) - sizeof(sb->pad3[1])))
-		/* Some padding is non-zero, might be a new feature */
-		return -EINVAL;
+	    memcmp(sb->pad3, sb->pad3+1, sizeof(sb->pad3) - sizeof(sb->pad3[1]))) {
+		pr_warn("Some padding is non-zero on %pg, might be a new feature\n",
+			rdev->bdev);
+		if (check_new_feature)
+			return -EINVAL;
+		pr_warn("check_new_feature is disabled, data corruption possible\n");
+	}
 
 	rdev->preferred_minor = 0xffff;
 	rdev->data_offset = le64_to_cpu(sb->data_offset);
@@ -10704,6 +10709,7 @@ module_param(start_dirty_degraded, int, S_IRUGO|S_IWUSR);
 module_param_call(new_array, add_named_array, NULL, NULL, S_IWUSR);
 module_param(create_on_open, bool, S_IRUSR|S_IWUSR);
 module_param(legacy_async_del_gendisk, bool, 0600);
+module_param(check_new_feature, bool, 0600);
 
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("MD RAID framework");
-- 
2.39.2

From nobody Mon Feb 9 08:09:26 2026
From: linan666@huaweicloud.com
To: corbet@lwn.net, song@kernel.org, yukuai@fnnas.com, linan122@huawei.com, xni@redhat.com, hare@suse.de
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, linan666@huaweicloud.com, yangerkun@huawei.com, yi.zhang@huawei.com
Subject: [PATCH v9 5/5] md: allow configuring logical block size
Date: Mon, 3 Nov 2025 20:57:57 +0800
Message-Id: <20251103125757.1405796-6-linan666@huaweicloud.com>
In-Reply-To: <20251103125757.1405796-1-linan666@huaweicloud.com>
References: <20251103125757.1405796-1-linan666@huaweicloud.com>
Content-Type: text/plain; charset="utf-8"

From: Li Nan

Previously, a RAID array used the maximum logical block size (LBS) of all
member disks. Adding a disk with a larger LBS at runtime could unexpectedly
increase the array's LBS, risking corruption of existing partitions. This
can be reproduced by:

```
# LBS of sd[de] is 512 bytes, sdf is 4096 bytes.
mdadm -CRq /dev/md0 -l1 -n3 /dev/sd[de] missing --assume-clean

# LBS is 512
cat /sys/block/md0/queue/logical_block_size

# create partition md0p1
parted -s /dev/md0 mklabel gpt mkpart primary 1MiB 100%
lsblk | grep md0p1

# LBS becomes 4096 after adding sdf
mdadm --add -q /dev/md0 /dev/sdf
cat /sys/block/md0/queue/logical_block_size

# partition lost
partprobe /dev/md0
lsblk | grep md0p1
```

Simply rejecting disks with a larger LBS is inflexible. In some scenarios,
only disks with a 512-byte LBS are currently available, but disks with a
4KB LBS may be added to the array later. Making the LBS configurable is the
best way to handle this scenario.

After this patch, the RAID will:
 - store the LBS in disk metadata
 - add a read-write sysfs attribute 'mdX/logical_block_size'

Future mdadm versions should support setting the LBS via the metadata
field during RAID creation and via the new sysfs attribute. Although the
kernel allows runtime LBS changes, users should avoid modifying it after
creating partitions or filesystems to prevent compatibility issues.

Only 1.x metadata supports a configurable LBS. 0.90 metadata initializes
all fields to default values at auto-detect. Supporting 0.90 would require
more extensive changes, and no such use case has been observed.

Note that many RAID paths, including metadata I/O, rely on PAGE_SIZE
alignment. An LBS larger than PAGE_SIZE would cause metadata read/write
failures, so such a configuration should be prevented.
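The sizing rules in this patch (configured LBS combined with member LBS, capped at PAGE_SIZE) can be modelled in a few lines of user-space C. This is only a sketch: `toy_check_lbs()` and `toy_effective_lbs()` are hypothetical, the 4096-byte page size is an assumption (PAGE_SIZE is architecture-dependent), and the power-of-two and 512-byte minimum checks are assumed block-layer constraints rather than something shown in this patch excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed page size for the sketch; the real PAGE_SIZE varies by architecture. */
#define TOY_PAGE_SIZE 4096u

static bool is_pow2(unsigned int x)
{
	return x && !(x & (x - 1));
}

/*
 * Hypothetical validation of a user-requested LBS.
 * Assumed rules: power of two, at least 512 bytes, at most the page size.
 */
int toy_check_lbs(unsigned int lbs)
{
	if (!is_pow2(lbs) || lbs < 512 || lbs > TOY_PAGE_SIZE)
		return -1;
	return 0;
}

/* Effective array LBS: the maximum of the configured value and every member's LBS. */
unsigned int toy_effective_lbs(unsigned int configured,
			       const unsigned int *member_lbs, size_t n)
{
	unsigned int lbs = configured;

	for (size_t i = 0; i < n; i++)
		if (member_lbs[i] > lbs)
			lbs = member_lbs[i];
	return lbs;
}
```

For example, configuring 4096 before the array is first started means that later adding a 4096-byte member no longer changes the array's LBS, which is the partition-loss scenario in the reproducer above.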
Signed-off-by: Li Nan
Reviewed-by: Xiao Ni
---
 Documentation/admin-guide/md.rst | 10 +++++
 drivers/md/md.h                  |  1 +
 include/uapi/linux/raid/md_p.h   |  3 +-
 drivers/md/md-linear.c           |  1 +
 drivers/md/md.c                  | 77 ++++++++++++++++++++++++++++++++
 drivers/md/raid0.c               |  1 +
 drivers/md/raid1.c               |  1 +
 drivers/md/raid10.c              |  1 +
 drivers/md/raid5.c               |  1 +
 9 files changed, 95 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/md.rst b/Documentation/admin-guide/md.rst
index 1c2eacc94758..b7e7081889fe 100644
--- a/Documentation/admin-guide/md.rst
+++ b/Documentation/admin-guide/md.rst
@@ -238,6 +238,16 @@ All md devices contain:
      the number of devices in a raid4/5/6, or to support
      external metadata formats which mandate such clipping.
 
+  logical_block_size
+     Configure the array's logical block size in bytes. This attribute
+     is only supported for 1.x meta. Write the value before starting
+     array. The final array LBS uses the maximum between this
+     configuration and LBS of all combined devices. Note that
+     LBS cannot exceed PAGE_SIZE before RAID supports folio.
+     WARNING: Arrays created on new kernel cannot be assembled at old
+     kernel due to padding check, Set module parameter 'check_new_feature'
+     to false to bypass, but data loss may occur.
+
   reshape_position
      This is either ``none`` or a sector number within the
      devices of the array where ``reshape`` is up to.  If this is set, the three
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 38a7c2fab150..a6b3cb69c28c 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -432,6 +432,7 @@ struct mddev {
 	sector_t			array_sectors; /* exported array size */
 	int				external_size; /* size managed
 							* externally */
+	unsigned int			logical_block_size;
 	__u64				events;
 	/* If the last 'event' was simply a clean->dirty transition, and
 	 * we didn't write it to the spares, then it is safe and simple
diff --git a/include/uapi/linux/raid/md_p.h b/include/uapi/linux/raid/md_p.h
index ac74133a4768..310068bb2a1d 100644
--- a/include/uapi/linux/raid/md_p.h
+++ b/include/uapi/linux/raid/md_p.h
@@ -291,7 +291,8 @@ struct mdp_superblock_1 {
 	__le64	resync_offset;	/* data before this offset (from data_offset) known to be in sync */
 	__le32	sb_csum;	/* checksum up to devs[max_dev] */
 	__le32	max_dev;	/* size of devs[] array to consider */
-	__u8	pad3[64-32];	/* set to 0 when writing */
+	__le32  logical_block_size;	/* same as q->limits->logical_block_size */
+	__u8	pad3[64-36];	/* set to 0 when writing */
 
 	/* device state information. Indexed by dev_number.
 	 * 2 bytes per device
diff --git a/drivers/md/md-linear.c b/drivers/md/md-linear.c
index 7033d982d377..50d4a419a16e 100644
--- a/drivers/md/md-linear.c
+++ b/drivers/md/md-linear.c
@@ -72,6 +72,7 @@ static int linear_set_limits(struct mddev *mddev)
 
 	md_init_stacking_limits(&lim);
 	lim.max_hw_sectors = mddev->chunk_sectors;
+	lim.logical_block_size = mddev->logical_block_size;
 	lim.max_write_zeroes_sectors = mddev->chunk_sectors;
 	lim.max_hw_wzeroes_unmap_sectors = mddev->chunk_sectors;
 	lim.io_min = mddev->chunk_sectors << 9;
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 5921fb245bfa..e5f994c33dfe 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1998,6 +1998,7 @@ static int super_1_validate(struct mddev *mddev, struct md_rdev *freshest, struc
 		mddev->layout = le32_to_cpu(sb->layout);
 		mddev->raid_disks = le32_to_cpu(sb->raid_disks);
 		mddev->dev_sectors = le64_to_cpu(sb->size);
+		mddev->logical_block_size = le32_to_cpu(sb->logical_block_size);
 		mddev->events = ev1;
 		mddev->bitmap_info.offset = 0;
 		mddev->bitmap_info.space = 0;
@@ -2207,6 +2208,7 @@ static void super_1_sync(struct mddev *mddev, struct md_rdev *rdev)
 	sb->chunksize = cpu_to_le32(mddev->chunk_sectors);
 	sb->level = cpu_to_le32(mddev->level);
 	sb->layout = cpu_to_le32(mddev->layout);
+	sb->logical_block_size = cpu_to_le32(mddev->logical_block_size);
 	if (test_bit(FailFast, &rdev->flags))
 		sb->devflags |= FailFast1;
 	else
@@ -5935,6 +5937,68 @@ static struct md_sysfs_entry md_serialize_policy =
 __ATTR(serialize_policy, S_IRUGO | S_IWUSR, serialize_policy_show,
        serialize_policy_store);
 
+static int mddev_set_logical_block_size(struct mddev *mddev,
+				unsigned int lbs)
+{
+	int err = 0;
+	struct queue_limits lim;
+
+	if (queue_logical_block_size(mddev->gendisk->queue) >= lbs) {
+		pr_err("%s: Cannot set LBS smaller than mddev LBS %u\n",
+		       mdname(mddev), lbs);
+		return -EINVAL;
+	}
+
+	lim = queue_limits_start_update(mddev->gendisk->queue);
+	lim.logical_block_size = lbs;
+	pr_info("%s: logical_block_size is changed, data may be lost\n",
+		mdname(mddev));
+	err = queue_limits_commit_update(mddev->gendisk->queue, &lim);
+	if (err)
+		return err;
+
+	mddev->logical_block_size = lbs;
+	/* New lbs will be written to superblock after array is running */
+	set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags);
+	return 0;
+}
+
+static ssize_t
+lbs_show(struct mddev *mddev, char *page)
+{
+	return sprintf(page, "%u\n", mddev->logical_block_size);
+}
+
+static ssize_t
+lbs_store(struct mddev *mddev, const char *buf, size_t len)
+{
+	unsigned int lbs;
+	int err = -EBUSY;
+
+	/* Only 1.x meta supports configurable LBS */
+	if (mddev->major_version == 0)
+		return -EINVAL;
+
+	if (mddev->pers)
+		return -EBUSY;
+
+	err = kstrtouint(buf, 10, &lbs);
+	if (err < 0)
+		return -EINVAL;
+
+	err = mddev_lock(mddev);
+	if (err)
+		goto unlock;
+
+	err = mddev_set_logical_block_size(mddev, lbs);
+
+unlock:
+	mddev_unlock(mddev);
+	return err ?: len;
+}
+
+static struct md_sysfs_entry md_logical_block_size =
+__ATTR(logical_block_size, 0644, lbs_show, lbs_store);
 
 static struct attribute *md_default_attrs[] = {
 	&md_level.attr,
@@ -5957,6 +6021,7 @@ static struct attribute *md_default_attrs[] = {
 	&md_consistency_policy.attr,
 	&md_fail_last_dev.attr,
 	&md_serialize_policy.attr,
+	&md_logical_block_size.attr,
 	NULL,
 };
 
@@ -6087,6 +6152,17 @@ int mddev_stack_rdev_limits(struct mddev *mddev, struct queue_limits *lim,
 		return -EINVAL;
 	}
 
+	/*
+	 * Before RAID adding folio support, the logical_block_size
+	 * should be smaller than the page size.
+	 */
+	if (lim->logical_block_size > PAGE_SIZE) {
+		pr_err("%s: logical_block_size must not larger than PAGE_SIZE\n",
+			mdname(mddev));
+		return -EINVAL;
+	}
+	mddev->logical_block_size = lim->logical_block_size;
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(mddev_stack_rdev_limits);
@@ -6698,6 +6774,7 @@ static void md_clean(struct mddev *mddev)
 	mddev->chunk_sectors = 0;
 	mddev->ctime = mddev->utime = 0;
 	mddev->layout = 0;
+	mddev->logical_block_size = 0;
 	mddev->max_disks = 0;
 	mddev->events = 0;
 	mddev->can_decrease_events = 0;
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index fbf763401521..47aee1b1d4d1 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -380,6 +380,7 @@ static int raid0_set_limits(struct mddev *mddev)
 	lim.max_hw_sectors = mddev->chunk_sectors;
 	lim.max_write_zeroes_sectors = mddev->chunk_sectors;
 	lim.max_hw_wzeroes_unmap_sectors = mddev->chunk_sectors;
+	lim.logical_block_size = mddev->logical_block_size;
 	lim.io_min = mddev->chunk_sectors << 9;
 	lim.io_opt = lim.io_min * mddev->raid_disks;
 	lim.chunk_sectors = mddev->chunk_sectors;
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 64bfe8ca5b38..167768edaec1 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -3212,6 +3212,7 @@ static int raid1_set_limits(struct mddev *mddev)
 	md_init_stacking_limits(&lim);
 	lim.max_write_zeroes_sectors = 0;
 	lim.max_hw_wzeroes_unmap_sectors = 0;
+	lim.logical_block_size = mddev->logical_block_size;
 	lim.features |= BLK_FEAT_ATOMIC_WRITES;
 	err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
 	if (err)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 6b2d4b7057ae..71bfed3b798d 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4000,6 +4000,7 @@ static int raid10_set_queue_limits(struct mddev *mddev)
 	md_init_stacking_limits(&lim);
 	lim.max_write_zeroes_sectors = 0;
 	lim.max_hw_wzeroes_unmap_sectors = 0;
+	lim.logical_block_size = mddev->logical_block_size;
 	lim.io_min = mddev->chunk_sectors << 9;
 	lim.chunk_sectors = mddev->chunk_sectors;
 	lim.io_opt = lim.io_min * raid10_nr_stripes(conf);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index aa404abf5d17..92473850f381 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7747,6 +7747,7 @@ static int raid5_set_limits(struct mddev *mddev)
 	stripe = roundup_pow_of_two(data_disks * (mddev->chunk_sectors << 9));
 
 	md_init_stacking_limits(&lim);
+	lim.logical_block_size = mddev->logical_block_size;
 	lim.io_min = mddev->chunk_sectors << 9;
 	lim.io_opt = lim.io_min * (conf->raid_disks - conf->max_degraded);
 	lim.features |= BLK_FEAT_RAID_PARTIAL_STRIPES_EXPENSIVE;
-- 
2.39.2