From nobody Fri Jun 12 11:37:15 2026
From: "Chen Cheng"
To: "Yu Kuai"
Cc: "Chen Cheng" , ,
Subject: [PATCH v2 1/2] md/raid10: make r10bio_pool use fixed-size objects
Date: Fri, 15 May 2026 17:27:06 +0800
Message-Id: <20260515092707.3436464-2-chencheng@fnnas.com>
In-Reply-To: <20260515092707.3436464-1-chencheng@fnnas.com>
References: <20260515092707.3436464-1-chencheng@fnnas.com>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

From: Chen Cheng

raid10 currently sizes r10bio_pool objects by conf->geo.raid_disks, which
makes regular r10bio objects geometry-dependent. That model breaks down
across reshape.

mempool objects are preallocated and reused, so a reshape that changes
the number of raid disks can leave old r10bio objects in the regular I/O
pool with a devs[] array sized for the previous geometry. After the
geometry switch, those stale objects may be reused or later freed under
the new layout, creating a width mismatch between the reused r10bio and
the current array geometry.

For example, during a 4-disk to 5-disk reshape, an r10bio allocated
before the geometry switch has room for only 4 devs[] entries.
After reshape updates conf->geo.raid_disks to 5, that stale object can be
reused under the new geometry. Code such as __make_request(),
put_all_bios(), and find_bio_disk() may then access devs[] using the new
geometry and step past the end of the old 4-slot object, leading to slab
out-of-bounds accesses.

The root problem is that regular r10bio pool objects are
geometry-dependent, while mempool elements are preallocated and reused
across requests.

Switch r10bio_pool to a fixed-size kmalloc mempool. Its object size is
computed from raid_disks + max(0, delta_disks), so it covers both the old
and the new geometry during a reshape, and regular I/O objects no longer
carry an allocation width tied to whichever geometry was current when
they were allocated. Use the same sizing rule for the standalone r10bio
allocated from r10buf_pool_alloc().

Because reshape updates live array state such as conf->mirrors,
conf->geo, reshape_progress, and reshape_safe, the geometry switch must
happen only after normal I/O has gone fully quiet. raise_barrier() alone
is not strong enough here; freeze_array() additionally marks that an
array freeze is in progress, flushes pending writes, and waits until
in-flight I/O has either completed or been queued.

Freeze the array before switching reshape geometry, rebuild r10bio_pool
for the new width inside that freeze window (only when the array grows),
and switch raid10_quiesce() to use freeze_array()/unfreeze_array() as
well. This keeps new requests from reusing stale-width regular I/O
objects after the geometry change.
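As a rough illustration of the sizing rule in calc_r10bio_pool_disks() and calc_r10bio_size(), the userspace sketch below models only the arithmetic. struct mock_r10bio, struct mock_r10dev, pool_disks() and r10bio_bytes() are hypothetical names; the field list is an assumption meant to mirror struct r10bio in drivers/md/raid10.h on an LP64 target, and this is not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mock of struct r10dev (assumed layout, LP64 target). */
struct mock_r10dev {
	void *bio;
	void *repl_bio;		/* a union with rdev in the kernel */
	uint64_t addr;		/* sector_t in the kernel */
	int devnum;
};

/* Hypothetical mock of struct r10bio: fixed header plus devs[]. */
struct mock_r10bio {
	int remaining;		/* atomic_t in the kernel */
	uint64_t sector;
	int sectors;
	unsigned long state;
	void *mddev;
	void *master_bio;
	int read_slot;
	struct { void *next, *prev; } retry_list;
	struct mock_r10dev devs[];
};

/* Same rule as calc_r10bio_pool_disks(): grow for an expansion,
 * never shrink below raid_disks. */
static unsigned int pool_disks(int raid_disks, int delta_disks)
{
	assert(raid_disks > 0);
	return (unsigned int)(raid_disks + (delta_disks > 0 ? delta_disks : 0));
}

/* Equivalent to offsetof(struct r10bio, devs[n]) for the mock layout. */
static size_t r10bio_bytes(unsigned int n)
{
	return offsetof(struct mock_r10bio, devs) + n * sizeof(struct mock_r10dev);
}
```

Under those assumptions r10bio_bytes(pool_disks(4, 0)) is 200 bytes, consistent with the 200-byte object in the KASAN report of patch 2/2, while r10bio_bytes(pool_disks(4, 1)) is 232 bytes. pool_disks(5, -1) stays at 5, which is why a shrinking reshape can keep the existing, wider pool.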
Signed-off-by: Chen Cheng
---
 drivers/md/raid10.c | 57 +++++++++++++++++++++++++++++++++------------
 drivers/md/raid10.h |  2 +-
 2 files changed, 43 insertions(+), 16 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 39085e7dd6d2..886bbe6b1ebc 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -103,13 +103,28 @@ static inline struct r10bio *get_resync_r10bio(struct bio *bio)
 	return get_resync_pages(bio)->raid_bio;
 }
 
-static void * r10bio_pool_alloc(gfp_t gfp_flags, void *data)
+static inline unsigned int calc_r10bio_pool_disks(struct mddev *mddev)
 {
-	struct r10conf *conf = data;
-	int size = offsetof(struct r10bio, devs[conf->geo.raid_disks]);
+	/* If delta_disks < 0, keeping the larger r10bio->devs[] is fine. */
+	return mddev->raid_disks + max(0, mddev->delta_disks);
+}
+
+static inline int calc_r10bio_size(struct mddev *mddev)
+{
+	return offsetof(struct r10bio, devs[calc_r10bio_pool_disks(mddev)]);
+}
+
+static mempool_t *create_r10bio_pool(struct mddev *mddev)
+{
+	int size = calc_r10bio_size(mddev);
+
+	return mempool_create_kmalloc_pool(NR_RAID_BIOS, size);
+}
+
+static struct r10bio *alloc_r10bio(struct mddev *mddev, gfp_t gfp_flags)
+{
+	int size = calc_r10bio_size(mddev);
 
-	/* allocate a r10bio with room for raid_disks entries in the
-	 * bios array */
 	return kzalloc(size, gfp_flags);
 }
 
@@ -137,7 +152,7 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
 	int nalloc, nalloc_rp;
 	struct resync_pages *rps;
 
-	r10_bio = r10bio_pool_alloc(gfp_flags, conf);
+	r10_bio = alloc_r10bio(conf->mddev, gfp_flags);
 	if (!r10_bio)
 		return NULL;
 
@@ -277,7 +292,7 @@ static void free_r10bio(struct r10bio *r10_bio)
 	struct r10conf *conf = r10_bio->mddev->private;
 
 	put_all_bios(conf, r10_bio);
-	mempool_free(r10_bio, &conf->r10bio_pool);
+	mempool_free(r10_bio, conf->r10bio_pool);
 }
 
 static void put_buf(struct r10bio *r10_bio)
@@ -1531,7 +1546,7 @@ static void __make_request(struct mddev *mddev, struct bio *bio, int sectors)
 	struct r10conf *conf = mddev->private;
 	struct r10bio *r10_bio;
 
-	r10_bio = mempool_alloc(&conf->r10bio_pool, GFP_NOIO);
+	r10_bio = mempool_alloc(conf->r10bio_pool, GFP_NOIO);
 
 	r10_bio->master_bio = bio;
 	r10_bio->sectors = sectors;
@@ -1723,7 +1738,7 @@ static int raid10_handle_discard(struct mddev *mddev, struct bio *bio)
 			(last_stripe_index << geo->chunk_shift);
 
 retry_discard:
-	r10_bio = mempool_alloc(&conf->r10bio_pool, GFP_NOIO);
+	r10_bio = mempool_alloc(conf->r10bio_pool, GFP_NOIO);
 	r10_bio->mddev = mddev;
 	r10_bio->state = 0;
 	r10_bio->sectors = 0;
@@ -3823,7 +3838,7 @@ static void raid10_free_conf(struct r10conf *conf)
 	if (!conf)
 		return;
 
-	mempool_exit(&conf->r10bio_pool);
+	mempool_destroy(conf->r10bio_pool);
 	kfree(conf->mirrors);
 	kfree(conf->mirrors_old);
 	kfree(conf->mirrors_new);
@@ -3870,9 +3885,8 @@ static struct r10conf *setup_conf(struct mddev *mddev)
 
 	conf->geo = geo;
 	conf->copies = copies;
-	err = mempool_init(&conf->r10bio_pool, NR_RAID_BIOS, r10bio_pool_alloc,
-			   rbio_pool_free, conf);
-	if (err)
+	conf->r10bio_pool = create_r10bio_pool(mddev);
+	if (!conf->r10bio_pool)
 		goto out;
 
 	err = bioset_init(&conf->bio_split, BIO_POOL_SIZE, 0, 0);
@@ -4131,9 +4145,9 @@ static void raid10_quiesce(struct mddev *mddev, int quiesce)
 	struct r10conf *conf = mddev->private;
 
 	if (quiesce)
-		raise_barrier(conf, 0);
+		freeze_array(conf, 0);
 	else
-		lower_barrier(conf);
+		unfreeze_array(conf);
 }
 
 static int raid10_resize(struct mddev *mddev, sector_t sectors)
@@ -4365,6 +4379,7 @@ static int raid10_start_reshape(struct mddev *mddev)
 	struct md_rdev *rdev;
 	int spares = 0;
 	int ret;
+	mempool_t *new_pool;
 
 	if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
 		return -EBUSY;
@@ -4400,7 +4415,17 @@ static int raid10_start_reshape(struct mddev *mddev)
 	if (spares < mddev->delta_disks)
 		return -EINVAL;
 
+	freeze_array(conf, 0);
 	conf->offset_diff = min_offset_diff;
+	if (mddev->delta_disks > 0) {
+		new_pool = create_r10bio_pool(mddev);
+		if (!new_pool) {
+			unfreeze_array(conf);
+			return -ENOMEM;
+		}
+		mempool_destroy(conf->r10bio_pool);
+		conf->r10bio_pool = new_pool;
+	}
 	spin_lock_irq(&conf->device_lock);
 	if (conf->mirrors_new) {
 		memcpy(conf->mirrors_new, conf->mirrors,
@@ -4417,6 +4442,7 @@ static int raid10_start_reshape(struct mddev *mddev)
 		sector_t size = raid10_size(mddev, 0, 0);
 		if (size < mddev->array_sectors) {
 			spin_unlock_irq(&conf->device_lock);
+			unfreeze_array(conf);
 			pr_warn("md/raid10:%s: array size must be reduce before number of disks\n",
 				mdname(mddev));
 			return -EINVAL;
@@ -4427,6 +4453,7 @@ static int raid10_start_reshape(struct mddev *mddev)
 	conf->reshape_progress = 0;
 	conf->reshape_safe = conf->reshape_progress;
 	spin_unlock_irq(&conf->device_lock);
+	unfreeze_array(conf);
 
 	if (mddev->delta_disks && mddev->bitmap) {
 		struct mdp_superblock_1 *sb = NULL;
diff --git a/drivers/md/raid10.h b/drivers/md/raid10.h
index ec79d87fb92f..b711626a5db7 100644
--- a/drivers/md/raid10.h
+++ b/drivers/md/raid10.h
@@ -87,7 +87,7 @@ struct r10conf {
 	 */
 	wait_queue_head_t	wait_barrier;
 
-	mempool_t		r10bio_pool;
+	mempool_t		*r10bio_pool;
 	mempool_t		r10buf_pool;
 	struct page		*tmppage;
 	struct bio_set		bio_split;
-- 
2.54.0

From nobody Fri Jun 12 11:37:15 2026
From: "Chen Cheng"
To: "Yu Kuai"
Cc: "Chen Cheng" , ,
Subject: [PATCH v2 2/2] md/raid10: bound reused r10bio devs[] walks by used_nr_devs
Date: Fri, 15 May 2026 17:27:07 +0800
Message-Id: <20260515092707.3436464-3-chencheng@fnnas.com>
In-Reply-To: <20260515092707.3436464-1-chencheng@fnnas.com>
References: <20260515092707.3436464-1-chencheng@fnnas.com>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

From: Chen Cheng

After reshape changes raid_disks, an in-flight r10bio from the old
geometry can still be completed or freed later. In that case, walking
r10_bio->devs[] with the current geometry is unsafe.

A failure was reproduced with a simple write workload while reshaping a
raid10 array from 4 disks to 5 disks, for example:

  mdadm -C /dev/md777 -l10 -n4 /dev/sda /dev/sdb /dev/sdc /dev/sdd
  mkfs.ext4 /dev/md777
  mount /dev/md777 /mnt/test
  fsstress -d /mnt/test -n 24000 -p 8 -l 24 &
  mdadm /dev/md777 --add /dev/sde
  mdadm --grow /dev/md777 --raid-devices=5 \
        --backup-file=/tmp/md-reshape-backup

The sequence above can trigger:

  BUG: KASAN: slab-out-of-bounds in free_r10bio+0x1c4/0x260 [raid10]
  Read of size 8 at addr ffff00008c2dfac8 by task ksoftirqd/0/15
   free_r10bio
   raid_end_bio_io
   one_write_done
   raid10_end_write_request

The buggy object was 200 bytes long, which matches an r10bio with space
for only four devs[] entries. put_all_bios() and find_bio_disk(),
however, walk r10_bio->devs[] using the current conf->geo.raid_disks
value. Once reshape switches conf->geo.raid_disks from 4 to 5, an old
4-slot r10bio can be completed or freed as if it had 5 slots, and the
walk reads devs[4], one slot past the end of the object. The same
stale-width mismatch can also surface during a 5-disk to 4-disk reshape.

Track the number of valid devs[] entries in each reused r10bio with
used_nr_devs.
Initialize it whenever an r10bio is prepared for regular I/O, discard, or
resync/recovery/reshape work, and use it to bound devs[] walks in
put_all_bios() and find_bio_disk().

Signed-off-by: Chen Cheng
---
 drivers/md/raid10.c | 8 ++++++--
 drivers/md/raid10.h | 2 ++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 886bbe6b1ebc..42865d822d95 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -275,7 +275,7 @@ static void put_all_bios(struct r10conf *conf, struct r10bio *r10_bio)
 {
 	int i;
 
-	for (i = 0; i < conf->geo.raid_disks; i++) {
+	for (i = 0; i < r10_bio->used_nr_devs; i++) {
 		struct bio **bio = & r10_bio->devs[i].bio;
 		if (!BIO_SPECIAL(*bio))
 			bio_put(*bio);
@@ -372,7 +372,7 @@ static int find_bio_disk(struct r10conf *conf, struct r10bio *r10_bio,
 	int slot;
 	int repl = 0;
 
-	for (slot = 0; slot < conf->geo.raid_disks; slot++) {
+	for (slot = 0; slot < r10_bio->used_nr_devs; slot++) {
 		if (r10_bio->devs[slot].bio == bio)
 			break;
 		if (r10_bio->devs[slot].repl_bio == bio) {
@@ -1555,6 +1555,7 @@ static void __make_request(struct mddev *mddev, struct bio *bio, int sectors)
 	r10_bio->sector = bio->bi_iter.bi_sector;
 	r10_bio->state = 0;
 	r10_bio->read_slot = -1;
+	r10_bio->used_nr_devs = conf->geo.raid_disks;
 	memset(r10_bio->devs, 0, sizeof(r10_bio->devs[0]) *
 			conf->geo.raid_disks);
 
@@ -1742,6 +1743,7 @@ static int raid10_handle_discard(struct mddev *mddev, struct bio *bio)
 	r10_bio->mddev = mddev;
 	r10_bio->state = 0;
 	r10_bio->sectors = 0;
+	r10_bio->used_nr_devs = geo->raid_disks;
 	memset(r10_bio->devs, 0, sizeof(r10_bio->devs[0]) * geo->raid_disks);
 	wait_blocked_dev(mddev, r10_bio);
 
@@ -3076,6 +3078,8 @@ static struct r10bio *raid10_alloc_init_r10buf(struct r10conf *conf)
 	else
 		nalloc = 2; /* recovery */
 
+	r10bio->used_nr_devs = nalloc;
+
 	for (i = 0; i < nalloc; i++) {
 		bio = r10bio->devs[i].bio;
 		rp = bio->bi_private;
diff --git a/drivers/md/raid10.h b/drivers/md/raid10.h
index b711626a5db7..4751119f9770 100644
--- a/drivers/md/raid10.h
+++ b/drivers/md/raid10.h
@@ -127,6 +127,8 @@ struct r10bio {
 	 * if the IO is in READ direction, then this is where we read
 	 */
 	int			read_slot;
+	/* Used to bound devs[] walks when the object is reused. */
+	unsigned int		used_nr_devs;
 
 	struct list_head	retry_list;
 	/*
-- 
2.54.0
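The bounded walk from patch 2/2 can be sketched in userspace as follows. struct toy_r10bio, toy_alloc() and toy_find_slot() are hypothetical stand-ins for struct r10bio, the r10bio allocation and find_bio_disk(); only the loop bound is modeled, not the kernel's bio handling. Because the bound is read from the object itself, a 4-slot object is never walked past devs[3], whatever the current geometry says.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct r10dev: only the bio pointer matters. */
struct toy_r10dev {
	void *bio;
};

/* Hypothetical stand-in for struct r10bio carrying used_nr_devs. */
struct toy_r10bio {
	unsigned int used_nr_devs;	/* valid devs[] entries in this object */
	struct toy_r10dev devs[];
};

/* Allocate zeroed room for exactly @nr slots, as kzalloc() would. */
static struct toy_r10bio *toy_alloc(unsigned int nr)
{
	struct toy_r10bio *r;

	assert(nr > 0);
	r = calloc(1, sizeof(*r) + nr * sizeof(r->devs[0]));
	if (r)
		r->used_nr_devs = nr;	/* the patch sets this on each (re)use */
	return r;
}

/*
 * Like find_bio_disk() after the patch: bound the walk by the object's
 * own width rather than the current geometry. Returns the slot or -1.
 */
static int toy_find_slot(const struct toy_r10bio *r, const void *bio)
{
	for (unsigned int slot = 0; slot < r->used_nr_devs; slot++)
		if (r->devs[slot].bio == bio)
			return (int)slot;
	return -1;
}
```

For example, after toy_alloc(4), a lookup for a bio that is not attached to any slot inspects exactly four entries and returns -1; a loop bounded by a 5-disk geometry would instead read devs[4], past the end of the allocation.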