From nobody Mon Jun 8 07:22:54 2026
From: "Chen Cheng"
Date: Wed, 3 Jun 2026 11:59:23 +0800
Subject: [PATCH v4 1/3] md: suspend array before raid10 reshape via sync_action
Message-Id: <20260603035925.217847-2-chencheng@fnnas.com>
In-Reply-To: <20260603035925.217847-1-chencheng@fnnas.com>
References: <20260603035925.217847-1-chencheng@fnnas.com>
X-Mailing-List: linux-kernel@vger.kernel.org
X-Mailer: git-send-email 2.54.0
Content-Type: text/plain; charset="utf-8"

From: Chen Cheng

The sync_action=reshape path currently enters mddev_start_reshape() with
reconfig_mutex held but without suspending the array first. For raid10,
that means raid10_start_reshape() has to drop reconfig_mutex and
reacquire the array through mddev_suspend_and_lock_nointr() before it
can safely switch geometry-dependent state.

Use mddev_suspend_and_lock() for ACTION_RESHAPE in action_store(), so
the sysfs reshape path reaches mddev_start_reshape() with the array
already suspended and locked. Other sync_action operations keep using
mddev_lock() unchanged.
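As a rough user-space sketch of the locking pattern described above (not the md code itself), the model below takes a suspend-then-lock path only for the reshape action and remembers that choice, so the release path calls the matching unlock helper. `struct array_model`, `model_lock()`, `model_suspend_and_lock()`, `model_unlock_and_resume()` and `model_action_store()` are hypothetical stand-ins for `struct mddev`, `mddev_lock()`, `mddev_suspend_and_lock()`, `mddev_unlock_and_resume()` and `action_store()`; suspension is modelled as a plain counter, and the `work_busy()`/`retry` loop is left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model, not md code: array_model stands in for
 * struct mddev, and the model_* helpers stand in for mddev_lock(),
 * mddev_unlock(), mddev_suspend_and_lock() and mddev_unlock_and_resume().
 * "Suspended" is modelled as a simple counter.
 */
enum action_model { ACT_IDLE, ACT_RESHAPE };

struct array_model {
	pthread_mutex_t reconfig_mutex;
	int suspend_count;        /* > 0 while regular I/O is quiesced */
	bool reshape_saw_suspend; /* demo-only: state seen by the reshape path */
};

static int model_lock(struct array_model *a)
{
	return pthread_mutex_lock(&a->reconfig_mutex);
}

static void model_unlock(struct array_model *a)
{
	pthread_mutex_unlock(&a->reconfig_mutex);
}

/* Quiesce first, then take the lock; undo the suspend if locking fails. */
static int model_suspend_and_lock(struct array_model *a)
{
	int ret;

	a->suspend_count++;
	ret = pthread_mutex_lock(&a->reconfig_mutex);
	if (ret)
		a->suspend_count--;
	return ret;
}

static void model_unlock_and_resume(struct array_model *a)
{
	assert(a->suspend_count > 0);
	pthread_mutex_unlock(&a->reconfig_mutex);
	a->suspend_count--;
}

/* Only the reshape action pays for suspension; the release must match. */
static int model_action_store(struct array_model *a, enum action_model action)
{
	bool suspended = (action == ACT_RESHAPE);
	int ret = suspended ? model_suspend_and_lock(a) : model_lock(a);

	if (ret)
		return ret;

	if (action == ACT_RESHAPE)
		a->reshape_saw_suspend = a->suspend_count > 0;

	if (suspended)
		model_unlock_and_resume(a);
	else
		model_unlock(a);
	return 0;
}
```

The single `suspended` flag is what keeps acquire and release paired on every exit; in the patch the same flag is consulted both on the `goto retry` path and at the `out:` label.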
Signed-off-by: Chen Cheng
---
 drivers/md/md.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 096bb64e87bd..5bc937e149ac 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5256,30 +5256,39 @@ static int mddev_start_reshape(struct mddev *mddev)
 
 static ssize_t
 action_store(struct mddev *mddev, const char *page, size_t len)
 {
 	int ret;
+	bool suspended = false;
 	enum sync_action action;
 
 	if (!mddev->pers || !mddev->pers->sync_request)
 		return -EINVAL;
 
+	action = md_sync_action_by_name(page);
 retry:
 	if (work_busy(&mddev->sync_work))
 		flush_work(&mddev->sync_work);
 
-	ret = mddev_lock(mddev);
+	if (action == ACTION_RESHAPE) {
+		ret = mddev_suspend_and_lock(mddev);
+		suspended = true;
+	} else {
+		ret = mddev_lock(mddev);
+		suspended = false;
+	}
 	if (ret)
 		return ret;
 
 	if (work_busy(&mddev->sync_work)) {
-		mddev_unlock(mddev);
+		if (suspended)
+			mddev_unlock_and_resume(mddev);
+		else
+			mddev_unlock(mddev);
 		goto retry;
 	}
 
-	action = md_sync_action_by_name(page);
-
 	/* TODO: mdadm rely on "idle" to start sync_thread. */
 	if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
 		switch (action) {
 		case ACTION_FROZEN:
 			md_frozen_sync_thread(mddev);
@@ -5344,11 +5353,14 @@ action_store(struct mddev *mddev, const char *page, size_t len)
 	md_wakeup_thread(mddev->thread);
 	sysfs_notify_dirent_safe(mddev->sysfs_action);
 	ret = len;
 
 out:
-	mddev_unlock(mddev);
+	if (suspended)
+		mddev_unlock_and_resume(mddev);
+	else
+		mddev_unlock(mddev);
 	return ret;
 }
 
 static struct md_sysfs_entry md_scan_mode =
 __ATTR_PREALLOC(sync_action, S_IRUGO|S_IWUSR, action_show, action_store);
-- 
2.54.0

From nobody Mon Jun 8 07:22:54 2026
From: "Chen Cheng"
Date: Wed, 3 Jun 2026 11:59:24 +0800
Subject: [PATCH v4 2/3] md/raid10: make r10bio_pool use fixed-size objects
Message-Id: <20260603035925.217847-3-chencheng@fnnas.com>
In-Reply-To: <20260603035925.217847-1-chencheng@fnnas.com>
References: <20260603035925.217847-1-chencheng@fnnas.com>
X-Mailing-List: linux-kernel@vger.kernel.org
X-Mailer: git-send-email 2.54.0
Content-Type: text/plain; charset="utf-8"

From: Chen Cheng

raid10 currently sizes regular r10bio_pool objects from
conf->geo.raid_disks, which makes the mempool element width depend on
the current geometry. That breaks across reshape.
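To make the width dependence concrete, the sketch below uses a simplified, hypothetical `struct r10bio_model` with a flexible `devs[]` array and computes its byte size for a given number of member disks. It also encodes the sizing rule that `calc_r10bio_pool_disks()` applies in the diff of this patch, `raid_disks + max(0, delta_disks)`. Assuming `mddev->raid_disks` still holds the pre-reshape count when the pool is created at reshape start, that rule yields the larger of the old and new widths. `struct slot_model`, `r10bio_model_size()` and `pool_disks_model()` are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-in for struct r10bio: a fixed header
 * followed by a flexible devs[] array with one entry per member disk.
 */
struct slot_model {
	void *bio;
	void *repl_bio;
	unsigned long long addr;
	int devnum;
};

struct r10bio_model {
	int remaining;
	unsigned long long sector;
	void *master_bio;
	int read_slot;
	struct slot_model devs[];
};

/* Bytes needed for an object with room for ndisks devs[] entries. */
static size_t r10bio_model_size(unsigned int ndisks)
{
	return offsetof(struct r10bio_model, devs) +
	       (size_t)ndisks * sizeof(struct slot_model);
}

/*
 * Width rule modelled on calc_r10bio_pool_disks():
 * raid_disks plus any growth; a shrink keeps the old, larger width.
 */
static unsigned int pool_disks_model(int raid_disks, int delta_disks)
{
	assert(raid_disks > 0);
	return (unsigned int)(raid_disks + (delta_disks > 0 ? delta_disks : 0));
}
```

Under that assumption a 4-disk to 5-disk grow gets pool elements one slot wider than before, while a 5-disk to 4-disk shrink keeps the 5-slot width, so no element is narrower than either geometry during the transition.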
Regular r10bio objects are preallocated and reused, so after a geometry
change the pool may still contain objects allocated for the old width.
A later request under the new geometry can then reuse an r10bio whose
devs[] array is still sized for the previous raid_disks value.

Fix this by backing r10bio_pool with a fixed-size kmalloc mempool sized
for the maximum width needed across the current reshape transition.
Apply the same sizing rule to standalone r10bio objects allocated from
r10buf_pool_alloc().

This removes the geometry-dependent allocation width from regular
r10bio_pool objects and prevents reshape from reusing pool entries that
are too small for the new layout.

Signed-off-by: Chen Cheng
---
 drivers/md/raid10.c | 48 +++++++++++++++++++++++++++++++++------------
 drivers/md/raid10.h |  2 +-
 2 files changed, 36 insertions(+), 14 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index cee5a253a281..5eca34432e63 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -101,17 +101,32 @@ static void end_reshape(struct r10conf *conf);
 static inline struct r10bio *get_resync_r10bio(struct bio *bio)
 {
 	return get_resync_pages(bio)->raid_bio;
 }
 
-static void * r10bio_pool_alloc(gfp_t gfp_flags, void *data)
+static inline unsigned int calc_r10bio_pool_disks(struct mddev *mddev)
 {
-	struct r10conf *conf = data;
-	int size = offsetof(struct r10bio, devs[conf->geo.raid_disks]);
+	/* If delta_disks < 0, use bigger r10bio->devs[] is ok. */
+	return mddev->raid_disks + max(0, mddev->delta_disks);
+}
+
+static inline int calc_r10bio_size(struct mddev *mddev)
+{
+	return offsetof(struct r10bio, devs[calc_r10bio_pool_disks(mddev)]);
+}
+
+static mempool_t *create_r10bio_pool(struct mddev *mddev)
+{
+	int size = calc_r10bio_size(mddev);
+
+	return mempool_create_kmalloc_pool(NR_RAID_BIOS, size);
+}
+
+static struct r10bio *alloc_r10bio(struct mddev *mddev, gfp_t gfp_flags)
+{
+	int size = calc_r10bio_size(mddev);
 
-	/* allocate a r10bio with room for raid_disks entries in the
-	 * bios array */
 	return kzalloc(size, gfp_flags);
 }
 
 #define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9)
 /* amount of memory to reserve for resync requests */
@@ -135,11 +150,11 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
 	struct bio *bio;
 	int j;
 	int nalloc, nalloc_rp;
 	struct resync_pages *rps;
 
-	r10_bio = r10bio_pool_alloc(gfp_flags, conf);
+	r10_bio = alloc_r10bio(conf->mddev, gfp_flags);
 	if (!r10_bio)
 		return NULL;
 
 	if (test_bit(MD_RECOVERY_SYNC, &conf->mddev->recovery) ||
 	    test_bit(MD_RECOVERY_RESHAPE, &conf->mddev->recovery))
@@ -275,11 +290,11 @@ static void put_all_bios(struct r10conf *conf, struct r10bio *r10_bio)
 static void free_r10bio(struct r10bio *r10_bio)
 {
 	struct r10conf *conf = r10_bio->mddev->private;
 
 	put_all_bios(conf, r10_bio);
-	mempool_free(r10_bio, &conf->r10bio_pool);
+	mempool_free(r10_bio, conf->r10bio_pool);
 }
 
 static void put_buf(struct r10bio *r10_bio)
 {
 	struct r10conf *conf = r10_bio->mddev->private;
@@ -1537,11 +1552,11 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
 static void __make_request(struct mddev *mddev, struct bio *bio, int sectors)
 {
 	struct r10conf *conf = mddev->private;
 	struct r10bio *r10_bio;
 
-	r10_bio = mempool_alloc(&conf->r10bio_pool, GFP_NOIO);
+	r10_bio = mempool_alloc(conf->r10bio_pool, GFP_NOIO);
 
 	r10_bio->master_bio = bio;
 	r10_bio->sectors = sectors;
 
 	r10_bio->mddev = mddev;
@@ -1729,11 +1744,11 @@ static int raid10_handle_discard(struct mddev *mddev, struct bio *bio)
 	last_stripe_index *= geo->far_copies;
 	end_disk_offset = (bio_end & geo->chunk_mask) +
 				(last_stripe_index << geo->chunk_shift);
 
 retry_discard:
-	r10_bio = mempool_alloc(&conf->r10bio_pool, GFP_NOIO);
+	r10_bio = mempool_alloc(conf->r10bio_pool, GFP_NOIO);
 	r10_bio->mddev = mddev;
 	r10_bio->state = 0;
 	r10_bio->sectors = 0;
 	r10_bio->read_slot = -1;
 	memset(r10_bio->devs, 0, sizeof(r10_bio->devs[0]) * geo->raid_disks);
@@ -3830,11 +3845,11 @@ static int setup_geo(struct geom *geo, struct mddev *mddev, enum geo_type new)
 static void raid10_free_conf(struct r10conf *conf)
 {
 	if (!conf)
 		return;
 
-	mempool_exit(&conf->r10bio_pool);
+	mempool_destroy(conf->r10bio_pool);
 	kfree(conf->mirrors);
 	kfree(conf->mirrors_old);
 	kfree(conf->mirrors_new);
 	safe_put_page(conf->tmppage);
 	bioset_exit(&conf->bio_split);
@@ -3877,13 +3892,12 @@ static struct r10conf *setup_conf(struct mddev *mddev)
 	if (!conf->tmppage)
 		goto out;
 
 	conf->geo = geo;
 	conf->copies = copies;
-	err = mempool_init(&conf->r10bio_pool, NR_RAID_BIOS, r10bio_pool_alloc,
-			   rbio_pool_free, conf);
-	if (err)
+	conf->r10bio_pool = create_r10bio_pool(mddev);
+	if (!conf->r10bio_pool)
 		goto out;
 
 	err = bioset_init(&conf->bio_split, BIO_POOL_SIZE, 0, 0);
 	if (err)
 		goto out;
@@ -4373,10 +4387,11 @@ static int raid10_start_reshape(struct mddev *mddev)
 	struct geom new;
 	struct r10conf *conf = mddev->private;
 	struct md_rdev *rdev;
 	int spares = 0;
 	int ret;
+	mempool_t *new_pool;
 
 	if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
 		return -EBUSY;
 
 	if (setup_geo(&new, mddev, geo_start) != conf->copies)
@@ -4409,10 +4424,17 @@ static int raid10_start_reshape(struct mddev *mddev)
 
 	if (spares < mddev->delta_disks)
 		return -EINVAL;
 
 	conf->offset_diff = min_offset_diff;
+	if (mddev->delta_disks > 0) {
+		new_pool = create_r10bio_pool(mddev);
+		if (!new_pool)
+			return -ENOMEM;
+		mempool_destroy(conf->r10bio_pool);
+		conf->r10bio_pool = new_pool;
+	}
 	spin_lock_irq(&conf->device_lock);
 	if (conf->mirrors_new) {
 		memcpy(conf->mirrors_new, conf->mirrors,
 		       sizeof(struct raid10_info)*conf->prev.raid_disks);
 		smp_mb();
diff --git a/drivers/md/raid10.h b/drivers/md/raid10.h
index ec79d87fb92f..b711626a5db7 100644
--- a/drivers/md/raid10.h
+++ b/drivers/md/raid10.h
@@ -85,11 +85,11 @@ struct r10conf {
 	int			have_replacement; /* There is at least one
 						   * replacement device.
 						   */
 	wait_queue_head_t	wait_barrier;
 
-	mempool_t		r10bio_pool;
+	mempool_t		*r10bio_pool;
 	mempool_t		r10buf_pool;
 	struct page		*tmppage;
 	struct bio_set		bio_split;
 
 	/* When taking over an array from a different personality, we store
-- 
2.54.0

From nobody Mon Jun 8 07:22:54 2026
From: "Chen Cheng"
Date: Wed, 3 Jun 2026 11:59:25 +0800
Subject: [PATCH v4 3/3] md/raid10: bound reused r10bio devs[] walks by used_nr_devs
Message-Id: <20260603035925.217847-4-chencheng@fnnas.com>
In-Reply-To: <20260603035925.217847-1-chencheng@fnnas.com>
References: <20260603035925.217847-1-chencheng@fnnas.com>
X-Mailing-List: linux-kernel@vger.kernel.org
X-Mailer: git-send-email 2.54.0
Content-Type: text/plain; charset="utf-8"

From: Chen Cheng

After reshape changes raid_disks, an in-flight r10bio from the old
geometry can still be completed or freed later. In that case, using the
current geometry to walk r10_bio->devs[] is unsafe.

A failure was reproduced with a simple write workload while reshaping a
raid10 array from 4 disks to 5 disks. For example:

  mdadm -C /dev/md777 -l10 -n4 /dev/sda /dev/sdb /dev/sdc /dev/sdd
  mkfs.ext4 /dev/md777
  mount /dev/md777 /mnt/test
  fsstress -d /mnt/test -n 24000 -p 8 -l 24 &
  mdadm /dev/md777 --add /dev/sde
  mdadm --grow /dev/md777 --raid-devices=5 \
        --backup-file=/tmp/md-reshape-backup

The sequence above can trigger:

  BUG: KASAN: slab-out-of-bounds in free_r10bio+0x1c4/0x260 [raid10]
  Read of size 8 at addr ffff00008c2dfac8 by task ksoftirqd/0/15
   free_r10bio
   raid_end_bio_io
   one_write_done
   raid10_end_write_request

The buggy object was 200 bytes long, which matches an r10bio with space
for only four devs[] entries. However, put_all_bios() and find_bio_disk()
walk r10_bio->devs[] using the current conf->geo.raid_disks value. Once
reshape switches conf->geo.raid_disks from 4 to 5, an old 4-slot r10bio
can be completed or freed as if it had 5 slots, and the walk overruns
devs[4]. The same stale-width mismatch can also surface during a 5-disk
to 4-disk reshape.

Track the number of valid devs[] entries in each reused r10bio with
used_nr_devs. Initialize it whenever an r10bio is prepared for regular
I/O, discard, or resync/recovery/reshape work, and use it to bound
devs[] walks in put_all_bios() and find_bio_disk().
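The bounding rule described above can be sketched with a reduced, hypothetical model: each object records how many `devs[]` entries it was prepared with, and the put-all walk reads that count from the object instead of from the current geometry. `struct r10bio_walk_model`, `prepare_model()` and `put_all_model()` are illustrative names, not raid10 functions, and the model keeps only the `bio`/`repl_bio` pointers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reduced model: only the fields the walk touches. */
struct dev_model {
	void *bio;
	void *repl_bio;
};

struct r10bio_walk_model {
	unsigned int used_nr_devs; /* valid devs[] entries, set when prepared */
	struct dev_model devs[];
};

static int fake_bio; /* any non-NULL address stands in for a struct bio */

/* Allocate and prepare an object for a geometry with ndisks members. */
static struct r10bio_walk_model *prepare_model(unsigned int ndisks)
{
	struct r10bio_walk_model *r =
		calloc(1, sizeof(*r) + ndisks * sizeof(struct dev_model));

	if (!r)
		return NULL;
	r->used_nr_devs = ndisks;
	for (unsigned int i = 0; i < ndisks; i++)
		r->devs[i].bio = &fake_bio;
	return r;
}

/*
 * Analogue of put_all_bios(): drop every slot's references. The bound is
 * read from the object, not from the (possibly already switched) geometry.
 * Returns the number of slots visited.
 */
static unsigned int put_all_model(struct r10bio_walk_model *r)
{
	unsigned int visited = 0;

	assert(r != NULL);
	for (unsigned int i = 0; i < r->used_nr_devs; i++) {
		r->devs[i].bio = NULL;
		r->devs[i].repl_bio = NULL;
		visited++;
	}
	return visited;
}
```

An object prepared under a 4-disk geometry is therefore walked over exactly 4 entries even after the geometry has switched to 5 disks, which is the situation the KASAN report above describes.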
Signed-off-by: Chen Cheng
---
 drivers/md/raid10.c | 8 ++++++--
 drivers/md/raid10.h | 2 ++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 5eca34432e63..f134b93fd593 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -273,11 +273,11 @@ static void r10buf_pool_free(void *__r10_bio, void *data)
 
 static void put_all_bios(struct r10conf *conf, struct r10bio *r10_bio)
 {
 	int i;
 
-	for (i = 0; i < conf->geo.raid_disks; i++) {
+	for (i = 0; i < r10_bio->used_nr_devs; i++) {
 		struct bio **bio = & r10_bio->devs[i].bio;
 		if (!BIO_SPECIAL(*bio))
 			bio_put(*bio);
 		*bio = NULL;
 		bio = &r10_bio->devs[i].repl_bio;
@@ -370,11 +370,11 @@ static int find_bio_disk(struct r10conf *conf, struct r10bio *r10_bio,
 			 struct bio *bio, int *slotp, int *replp)
 {
 	int slot;
 	int repl = 0;
 
-	for (slot = 0; slot < conf->geo.raid_disks; slot++) {
+	for (slot = 0; slot < r10_bio->used_nr_devs; slot++) {
 		if (r10_bio->devs[slot].bio == bio)
 			break;
 		if (r10_bio->devs[slot].repl_bio == bio) {
 			repl = 1;
 			break;
@@ -1561,10 +1561,11 @@ static void __make_request(struct mddev *mddev, struct bio *bio, int sectors)
 
 	r10_bio->mddev = mddev;
 	r10_bio->sector = bio->bi_iter.bi_sector;
 	r10_bio->state = 0;
 	r10_bio->read_slot = -1;
+	r10_bio->used_nr_devs = conf->geo.raid_disks;
 	memset(r10_bio->devs, 0, sizeof(r10_bio->devs[0]) *
 			conf->geo.raid_disks);
 
 	if (bio_data_dir(bio) == READ)
 		raid10_read_request(mddev, bio, r10_bio);
@@ -1749,10 +1750,11 @@ static int raid10_handle_discard(struct mddev *mddev, struct bio *bio)
 	r10_bio = mempool_alloc(conf->r10bio_pool, GFP_NOIO);
 	r10_bio->mddev = mddev;
 	r10_bio->state = 0;
 	r10_bio->sectors = 0;
 	r10_bio->read_slot = -1;
+	r10_bio->used_nr_devs = geo->raid_disks;
 	memset(r10_bio->devs, 0, sizeof(r10_bio->devs[0]) * geo->raid_disks);
 	wait_blocked_dev(mddev, r10_bio);
 
 	/*
 	 * For far layout it needs more than one r10bio to cover all regions.
@@ -3083,10 +3085,12 @@ static struct r10bio *raid10_alloc_init_r10buf(struct r10conf *conf)
 	    test_bit(MD_RECOVERY_RESHAPE, &conf->mddev->recovery))
 		nalloc = conf->copies; /* resync */
 	else
 		nalloc = 2; /* recovery */
 
+	r10bio->used_nr_devs = nalloc;
+
 	for (i = 0; i < nalloc; i++) {
 		bio = r10bio->devs[i].bio;
 		rp = bio->bi_private;
 		bio_reset(bio, NULL, 0);
 		bio->bi_private = rp;
diff --git a/drivers/md/raid10.h b/drivers/md/raid10.h
index b711626a5db7..4751119f9770 100644
--- a/drivers/md/raid10.h
+++ b/drivers/md/raid10.h
@@ -125,10 +125,12 @@ struct r10bio {
 	struct bio		*master_bio;
 	/*
 	 * if the IO is in READ direction, then this is where we read
 	 */
 	int			read_slot;
+	/* Used to bound devs[] walks when the object is reused. */
+	unsigned int		used_nr_devs;
 
 	struct list_head	retry_list;
 	/*
 	 * if the IO is in WRITE direction, then multiple bios are used,
 	 * one for each copy.
-- 
2.54.0
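The 200-byte figure in the KASAN report of patch 3 can be cross-checked with a little layout arithmetic. The sketch below uses a hypothetical LP64 mirror of `struct r10bio`; the field list is an assumption based on raid10.h (only `master_bio`, `read_slot`, `used_nr_devs` and `retry_list` appear in the hunk above) and may differ between kernel versions or configs. Under that assumption the header before `devs[]` is 72 bytes and each `devs[]` entry is 32 bytes, so a 4-slot object is 72 + 4 * 32 = 200 bytes and `devs[4]` would begin exactly at the end of the allocation. If the object started on the 256-byte boundary ending in `...fa00`, the faulting address `...fac8` would sit at offset 0xc8 = 200, which fits that reading.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical LP64 mirror of struct r10bio. The field list is assumed
 * from raid10.h and may not match every kernel version or config:
 * atomic_t -> int, sector_t -> uint64_t, list_head -> two pointers,
 * and the repl_bio/rdev union -> one pointer.
 */
static_assert(sizeof(uint64_t) == 8, "sector_t stand-in must be 64-bit");

struct list_head_mirror {
	void *next, *prev;
};

struct r10dev_mirror {
	void *bio;
	void *repl_bio;
	uint64_t addr;
	int devnum;
};

/* Assumed layout before patch 3. */
struct r10bio_before {
	int remaining;
	uint64_t sector;
	int sectors;
	unsigned long state;
	void *mddev;
	void *master_bio;
	int read_slot;
	struct list_head_mirror retry_list;
	struct r10dev_mirror devs[];
};

/* Same assumed layout with used_nr_devs placed right after read_slot. */
struct r10bio_after {
	int remaining;
	uint64_t sector;
	int sectors;
	unsigned long state;
	void *mddev;
	void *master_bio;
	int read_slot;
	unsigned int used_nr_devs;
	struct list_head_mirror retry_list;
	struct r10dev_mirror devs[];
};

/* Allocation size for an object with room for ndisks devs[] entries. */
static size_t before_size(unsigned int ndisks)
{
	return offsetof(struct r10bio_before, devs) +
	       ndisks * sizeof(struct r10dev_mirror);
}

static size_t after_size(unsigned int ndisks)
{
	return offsetof(struct r10bio_after, devs) +
	       ndisks * sizeof(struct r10dev_mirror);
}
```

On this assumed layout, `used_nr_devs` lands in the 4-byte padding hole between `read_slot` and `retry_list`, so adding it does not change the object size.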