From: Yu Kuai
To: song@kernel.org
Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org,
	yukuai3@huawei.com, yukuai1@huaweicloud.com, yi.zhang@huawei.com,
	yangerkun@huawei.com
Subject: [PATCH -next v3 4/5] md/raid5: remove rcu protection to access rdev from conf
Date: Sat, 25 Nov 2023 16:16:03 +0800
Message-Id: <20231125081604.3939938-5-yukuai1@huaweicloud.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20231125081604.3939938-1-yukuai1@huaweicloud.com>
References: <20231125081604.3939938-1-yukuai1@huaweicloud.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

From: Yu Kuai

Because it's safe to access rdev from conf:

- If any spinlock is held, because synchronize_rcu() from
  md_kick_rdev_from_array() will prevent 'rdev' from being freed until the
  spinlock is released;
- If 'reconfig_lock' is held, because rdev can't be added to or removed
  from the array;
- If there is normal IO inflight,
because mddev_suspend() will prevent rdev to be added or removed from array; - If there is sync IO inflight, because 'MD_RECOVERY_RUNNING' is checked in remove_and_add_spares(). And these will cover all the scenarios in raid456. Signed-off-by: Yu Kuai --- drivers/md/raid5-cache.c | 11 +-- drivers/md/raid5-ppl.c | 16 +--- drivers/md/raid5.c | 182 +++++++++++++-------------------------- drivers/md/raid5.h | 4 +- 4 files changed, 69 insertions(+), 144 deletions(-) diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c index 6157f5beb9fe..874874fe4fa1 100644 --- a/drivers/md/raid5-cache.c +++ b/drivers/md/raid5-cache.c @@ -1890,28 +1890,22 @@ r5l_recovery_replay_one_stripe(struct r5conf *conf, continue; =20 /* in case device is broken */ - rcu_read_lock(); - rdev =3D rcu_dereference(conf->disks[disk_index].rdev); + rdev =3D conf->disks[disk_index].rdev; if (rdev) { atomic_inc(&rdev->nr_pending); - rcu_read_unlock(); sync_page_io(rdev, sh->sector, PAGE_SIZE, sh->dev[disk_index].page, REQ_OP_WRITE, false); rdev_dec_pending(rdev, rdev->mddev); - rcu_read_lock(); } - rrdev =3D rcu_dereference(conf->disks[disk_index].replacement); + rrdev =3D conf->disks[disk_index].replacement; if (rrdev) { atomic_inc(&rrdev->nr_pending); - rcu_read_unlock(); sync_page_io(rrdev, sh->sector, PAGE_SIZE, sh->dev[disk_index].page, REQ_OP_WRITE, false); rdev_dec_pending(rrdev, rrdev->mddev); - rcu_read_lock(); } - rcu_read_unlock(); } ctx->data_parity_stripes++; out: @@ -2948,7 +2942,6 @@ bool r5c_big_stripe_cached(struct r5conf *conf, secto= r_t sect) if (!log) return false; =20 - WARN_ON_ONCE(!rcu_read_lock_held()); tree_index =3D r5c_tree_index(conf, sect); slot =3D radix_tree_lookup(&log->big_stripe_tree, tree_index); return slot !=3D NULL; diff --git a/drivers/md/raid5-ppl.c b/drivers/md/raid5-ppl.c index eaea57aee602..da4ba736c4f0 100644 --- a/drivers/md/raid5-ppl.c +++ b/drivers/md/raid5-ppl.c @@ -620,11 +620,9 @@ static void ppl_do_flush(struct ppl_io_unit *io) struct md_rdev *rdev; struct block_device *bdev =3D NULL; =20 - rcu_read_lock(); - rdev =3D rcu_dereference(conf->disks[i].rdev); + rdev =3D conf->disks[i].rdev; if (rdev && !test_bit(Faulty, &rdev->flags)) bdev =3D rdev->bdev; - rcu_read_unlock(); =20 if (bdev) { struct bio *bio; @@ -882,9 +880,7 @@ static int ppl_recover_entry(struct ppl_log *log, struc= t ppl_header_entry *e, (unsigned long long)r_sector, dd_idx, (unsigned long long)sector); =20 - /* Array has not started so rcu dereference is safe */ - rdev =3D rcu_dereference_protected( - conf->disks[dd_idx].rdev, 1); + rdev =3D conf->disks[dd_idx].rdev; if (!rdev || (!test_bit(In_sync, &rdev->flags) && sector >=3D rdev->recovery_offset)) { pr_debug("%s:%*s data member disk %d missing\n", @@ -936,9 +932,7 @@ static int ppl_recover_entry(struct ppl_log *log, struc= t ppl_header_entry *e, 0, &disk, &sh); BUG_ON(sh.pd_idx !=3D le32_to_cpu(e->parity_disk)); =20 - /* Array has not started so rcu dereference is safe */ - parity_rdev =3D rcu_dereference_protected( - conf->disks[sh.pd_idx].rdev, 1); + parity_rdev =3D conf->disks[sh.pd_idx].rdev; =20 BUG_ON(parity_rdev->bdev->bd_dev !=3D log->rdev->bdev->bd_dev); pr_debug("%s:%*s write parity at sector %llu, disk %pg\n", @@ -1404,9 +1398,7 @@ int ppl_init_log(struct r5conf *conf) =20 for (i =3D 0; i < ppl_conf->count; i++) { struct ppl_log *log =3D &ppl_conf->child_logs[i]; - /* Array has not started so rcu dereference is safe */ - struct md_rdev *rdev =3D - rcu_dereference_protected(conf->disks[i].rdev, 1); + struct md_rdev *rdev =3D 
conf->disks[i].rdev; =20 mutex_init(&log->io_mutex); spin_lock_init(&log->io_list_lock); diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index fb009e3df132..8056071cf39f 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -694,12 +694,12 @@ int raid5_calc_degraded(struct r5conf *conf) int degraded, degraded2; int i; =20 - rcu_read_lock(); degraded =3D 0; for (i =3D 0; i < conf->previous_raid_disks; i++) { - struct md_rdev *rdev =3D rcu_dereference(conf->disks[i].rdev); + struct md_rdev *rdev =3D READ_ONCE(conf->disks[i].rdev); + if (rdev && test_bit(Faulty, &rdev->flags)) - rdev =3D rcu_dereference(conf->disks[i].replacement); + rdev =3D READ_ONCE(conf->disks[i].replacement); if (!rdev || test_bit(Faulty, &rdev->flags)) degraded++; else if (test_bit(In_sync, &rdev->flags)) @@ -717,15 +717,14 @@ int raid5_calc_degraded(struct r5conf *conf) if (conf->raid_disks >=3D conf->previous_raid_disks) degraded++; } - rcu_read_unlock(); if (conf->raid_disks =3D=3D conf->previous_raid_disks) return degraded; - rcu_read_lock(); degraded2 =3D 0; for (i =3D 0; i < conf->raid_disks; i++) { - struct md_rdev *rdev =3D rcu_dereference(conf->disks[i].rdev); + struct md_rdev *rdev =3D READ_ONCE(conf->disks[i].rdev); + if (rdev && test_bit(Faulty, &rdev->flags)) - rdev =3D rcu_dereference(conf->disks[i].replacement); + rdev =3D READ_ONCE(conf->disks[i].replacement); if (!rdev || test_bit(Faulty, &rdev->flags)) degraded2++; else if (test_bit(In_sync, &rdev->flags)) @@ -739,7 +738,6 @@ int raid5_calc_degraded(struct r5conf *conf) if (conf->raid_disks <=3D conf->previous_raid_disks) degraded2++; } - rcu_read_unlock(); if (degraded2 > degraded) return degraded2; return degraded; @@ -1177,14 +1175,8 @@ static void ops_run_io(struct stripe_head *sh, struc= t stripe_head_state *s) bi =3D &dev->req; rbi =3D &dev->rreq; /* For writing to replacement */ =20 - rcu_read_lock(); - rrdev =3D rcu_dereference(conf->disks[i].replacement); - smp_mb(); /* Ensure that if rrdev is NULL, rdev won't be */ - rdev =3D rcu_dereference(conf->disks[i].rdev); - if (!rdev) { - rdev =3D rrdev; - rrdev =3D NULL; - } + rdev =3D conf->disks[i].rdev; + rrdev =3D conf->disks[i].replacement; if (op_is_write(op)) { if (replace_only) rdev =3D NULL; @@ -1205,7 +1197,6 @@ static void ops_run_io(struct stripe_head *sh, struct= stripe_head_state *s) rrdev =3D NULL; if (rrdev) atomic_inc(&rrdev->nr_pending); - rcu_read_unlock(); =20 /* We have already checked bad blocks for reads. Now * need to check for writes. We never accept write errors @@ -2724,28 +2715,6 @@ static void shrink_stripes(struct r5conf *conf) conf->slab_cache =3D NULL; } =20 -/* - * This helper wraps rcu_dereference_protected() and can be used when - * it is known that the nr_pending of the rdev is elevated. - */ -static struct md_rdev *rdev_pend_deref(struct md_rdev __rcu *rdev) -{ - return rcu_dereference_protected(rdev, - atomic_read(&rcu_access_pointer(rdev)->nr_pending)); -} - -/* - * This helper wraps rcu_dereference_protected() and should be used - * when it is known that the mddev_lock() is held. This is safe - * seeing raid5_remove_disk() has the same lock held. - */ -static struct md_rdev *rdev_mdlock_deref(struct mddev *mddev, - struct md_rdev __rcu *rdev) -{ - return rcu_dereference_protected(rdev, - lockdep_is_held(&mddev->reconfig_mutex)); -} - static void raid5_end_read_request(struct bio * bi) { struct stripe_head *sh =3D bi->bi_private; @@ -2771,9 +2740,9 @@ static void raid5_end_read_request(struct bio * bi) * In that case it moved down to 'rdev'. 
* rdev is not removed until all requests are finished. */ - rdev =3D rdev_pend_deref(conf->disks[i].replacement); + rdev =3D conf->disks[i].replacement; if (!rdev) - rdev =3D rdev_pend_deref(conf->disks[i].rdev); + rdev =3D conf->disks[i].rdev; =20 if (use_new_offset(conf, sh)) s =3D sh->sector + rdev->new_data_offset; @@ -2886,11 +2855,11 @@ static void raid5_end_write_request(struct bio *bi) =20 for (i =3D 0 ; i < disks; i++) { if (bi =3D=3D &sh->dev[i].req) { - rdev =3D rdev_pend_deref(conf->disks[i].rdev); + rdev =3D conf->disks[i].rdev; break; } if (bi =3D=3D &sh->dev[i].rreq) { - rdev =3D rdev_pend_deref(conf->disks[i].replacement); + rdev =3D conf->disks[i].replacement; if (rdev) replacement =3D 1; else @@ -2898,7 +2867,7 @@ static void raid5_end_write_request(struct bio *bi) * replaced it. rdev is not removed * until all requests are finished. */ - rdev =3D rdev_pend_deref(conf->disks[i].rdev); + rdev =3D conf->disks[i].rdev; break; } } @@ -3660,15 +3629,13 @@ handle_failed_stripe(struct r5conf *conf, struct st= ripe_head *sh, int bitmap_end =3D 0; =20 if (test_bit(R5_ReadError, &sh->dev[i].flags)) { - struct md_rdev *rdev; - rcu_read_lock(); - rdev =3D rcu_dereference(conf->disks[i].rdev); + struct md_rdev *rdev =3D conf->disks[i].rdev; + if (rdev && test_bit(In_sync, &rdev->flags) && !test_bit(Faulty, &rdev->flags)) atomic_inc(&rdev->nr_pending); else rdev =3D NULL; - rcu_read_unlock(); if (rdev) { if (!rdev_set_badblocks( rdev, @@ -3786,16 +3753,17 @@ handle_failed_sync(struct r5conf *conf, struct stri= pe_head *sh, /* During recovery devices cannot be removed, so * locking and refcounting of rdevs is not needed */ - rcu_read_lock(); for (i =3D 0; i < conf->raid_disks; i++) { - struct md_rdev *rdev =3D rcu_dereference(conf->disks[i].rdev); + struct md_rdev *rdev =3D conf->disks[i].rdev; + if (rdev && !test_bit(Faulty, &rdev->flags) && !test_bit(In_sync, &rdev->flags) && !rdev_set_badblocks(rdev, sh->sector, RAID5_STRIPE_SECTORS(conf), 0)) abort =3D 1; - rdev =3D rcu_dereference(conf->disks[i].replacement); + rdev =3D conf->disks[i].replacement; + if (rdev && !test_bit(Faulty, &rdev->flags) && !test_bit(In_sync, &rdev->flags) @@ -3803,7 +3771,6 @@ handle_failed_sync(struct r5conf *conf, struct stripe= _head *sh, RAID5_STRIPE_SECTORS(conf), 0)) abort =3D 1; } - rcu_read_unlock(); if (abort) conf->recovery_disabled =3D conf->mddev->recovery_disabled; @@ -3816,15 +3783,13 @@ static int want_replace(struct stripe_head *sh, int= disk_idx) struct md_rdev *rdev; int rv =3D 0; =20 - rcu_read_lock(); - rdev =3D rcu_dereference(sh->raid_conf->disks[disk_idx].replacement); + rdev =3D sh->raid_conf->disks[disk_idx].replacement; if (rdev && !test_bit(Faulty, &rdev->flags) && !test_bit(In_sync, &rdev->flags) && (rdev->recovery_offset <=3D sh->sector || rdev->mddev->recovery_cp <=3D sh->sector)) rv =3D 1; - rcu_read_unlock(); return rv; } =20 @@ -4701,7 +4666,6 @@ static void analyse_stripe(struct stripe_head *sh, st= ruct stripe_head_state *s) s->log_failed =3D r5l_log_disk_error(conf); =20 /* Now to look around and see what can be done */ - rcu_read_lock(); for (i=3Ddisks; i--; ) { struct md_rdev *rdev; sector_t first_bad; @@ -4746,7 +4710,7 @@ static void analyse_stripe(struct stripe_head *sh, st= ruct stripe_head_state *s) /* Prefer to use the replacement for reads, but only * if it is recovered enough and has no bad blocks. 
*/ - rdev =3D rcu_dereference(conf->disks[i].replacement); + rdev =3D conf->disks[i].replacement; if (rdev && !test_bit(Faulty, &rdev->flags) && rdev->recovery_offset >=3D sh->sector + RAID5_STRIPE_SECTORS(conf) && !is_badblock(rdev, sh->sector, RAID5_STRIPE_SECTORS(conf), @@ -4757,7 +4721,7 @@ static void analyse_stripe(struct stripe_head *sh, st= ruct stripe_head_state *s) set_bit(R5_NeedReplace, &dev->flags); else clear_bit(R5_NeedReplace, &dev->flags); - rdev =3D rcu_dereference(conf->disks[i].rdev); + rdev =3D conf->disks[i].rdev; clear_bit(R5_ReadRepl, &dev->flags); } if (rdev && test_bit(Faulty, &rdev->flags)) @@ -4804,8 +4768,8 @@ static void analyse_stripe(struct stripe_head *sh, st= ruct stripe_head_state *s) if (test_bit(R5_WriteError, &dev->flags)) { /* This flag does not apply to '.replacement' * only to .rdev, so make sure to check that*/ - struct md_rdev *rdev2 =3D rcu_dereference( - conf->disks[i].rdev); + struct md_rdev *rdev2 =3D conf->disks[i].rdev; + if (rdev2 =3D=3D rdev) clear_bit(R5_Insync, &dev->flags); if (rdev2 && !test_bit(Faulty, &rdev2->flags)) { @@ -4817,8 +4781,8 @@ static void analyse_stripe(struct stripe_head *sh, st= ruct stripe_head_state *s) if (test_bit(R5_MadeGood, &dev->flags)) { /* This flag does not apply to '.replacement' * only to .rdev, so make sure to check that*/ - struct md_rdev *rdev2 =3D rcu_dereference( - conf->disks[i].rdev); + struct md_rdev *rdev2 =3D conf->disks[i].rdev; + if (rdev2 && !test_bit(Faulty, &rdev2->flags)) { s->handle_bad_blocks =3D 1; atomic_inc(&rdev2->nr_pending); @@ -4826,8 +4790,8 @@ static void analyse_stripe(struct stripe_head *sh, st= ruct stripe_head_state *s) clear_bit(R5_MadeGood, &dev->flags); } if (test_bit(R5_MadeGoodRepl, &dev->flags)) { - struct md_rdev *rdev2 =3D rcu_dereference( - conf->disks[i].replacement); + struct md_rdev *rdev2 =3D conf->disks[i].replacement; + if (rdev2 && !test_bit(Faulty, &rdev2->flags)) { s->handle_bad_blocks =3D 1; atomic_inc(&rdev2->nr_pending); @@ -4848,8 +4812,7 @@ static void analyse_stripe(struct stripe_head *sh, st= ruct stripe_head_state *s) if (rdev && !test_bit(Faulty, &rdev->flags)) do_recovery =3D 1; else if (!rdev) { - rdev =3D rcu_dereference( - conf->disks[i].replacement); + rdev =3D conf->disks[i].replacement; if (rdev && !test_bit(Faulty, &rdev->flags)) do_recovery =3D 1; } @@ -4876,7 +4839,6 @@ static void analyse_stripe(struct stripe_head *sh, st= ruct stripe_head_state *s) else s->replacing =3D 1; } - rcu_read_unlock(); } =20 /* @@ -5333,23 +5295,23 @@ static void handle_stripe(struct stripe_head *sh) struct r5dev *dev =3D &sh->dev[i]; if (test_and_clear_bit(R5_WriteError, &dev->flags)) { /* We own a safe reference to the rdev */ - rdev =3D rdev_pend_deref(conf->disks[i].rdev); + rdev =3D conf->disks[i].rdev; if (!rdev_set_badblocks(rdev, sh->sector, RAID5_STRIPE_SECTORS(conf), 0)) md_error(conf->mddev, rdev); rdev_dec_pending(rdev, conf->mddev); } if (test_and_clear_bit(R5_MadeGood, &dev->flags)) { - rdev =3D rdev_pend_deref(conf->disks[i].rdev); + rdev =3D conf->disks[i].rdev; rdev_clear_badblocks(rdev, sh->sector, RAID5_STRIPE_SECTORS(conf), 0); rdev_dec_pending(rdev, conf->mddev); } if (test_and_clear_bit(R5_MadeGoodRepl, &dev->flags)) { - rdev =3D rdev_pend_deref(conf->disks[i].replacement); + rdev =3D conf->disks[i].replacement; if (!rdev) /* rdev have been moved down */ - rdev =3D rdev_pend_deref(conf->disks[i].rdev); + rdev =3D conf->disks[i].rdev; rdev_clear_badblocks(rdev, sh->sector, RAID5_STRIPE_SECTORS(conf), 0); rdev_dec_pending(rdev, 
conf->mddev); @@ -5508,24 +5470,22 @@ static int raid5_read_one_chunk(struct mddev *mddev= , struct bio *raid_bio) &dd_idx, NULL); end_sector =3D sector + bio_sectors(raid_bio); =20 - rcu_read_lock(); if (r5c_big_stripe_cached(conf, sector)) - goto out_rcu_unlock; + return 0; =20 - rdev =3D rcu_dereference(conf->disks[dd_idx].replacement); + rdev =3D conf->disks[dd_idx].replacement; if (!rdev || test_bit(Faulty, &rdev->flags) || rdev->recovery_offset < end_sector) { - rdev =3D rcu_dereference(conf->disks[dd_idx].rdev); + rdev =3D conf->disks[dd_idx].rdev; if (!rdev) - goto out_rcu_unlock; + return 0; if (test_bit(Faulty, &rdev->flags) || !(test_bit(In_sync, &rdev->flags) || rdev->recovery_offset >=3D end_sector)) - goto out_rcu_unlock; + return 0; } =20 atomic_inc(&rdev->nr_pending); - rcu_read_unlock(); =20 if (is_badblock(rdev, sector, bio_sectors(raid_bio), &first_bad, &bad_sectors)) { @@ -5569,10 +5529,6 @@ static int raid5_read_one_chunk(struct mddev *mddev,= struct bio *raid_bio) raid_bio->bi_iter.bi_sector); submit_bio_noacct(align_bio); return 1; - -out_rcu_unlock: - rcu_read_unlock(); - return 0; } =20 static struct bio *chunk_aligned_read(struct mddev *mddev, struct bio *rai= d_bio) @@ -6575,14 +6531,12 @@ static inline sector_t raid5_sync_request(struct md= dev *mddev, sector_t sector_n * Note in case of > 1 drive failures it's possible we're rebuilding * one drive while leaving another faulty drive in array. */ - rcu_read_lock(); for (i =3D 0; i < conf->raid_disks; i++) { - struct md_rdev *rdev =3D rcu_dereference(conf->disks[i].rdev); + struct md_rdev *rdev =3D conf->disks[i].rdev; =20 if (rdev =3D=3D NULL || test_bit(Faulty, &rdev->flags)) still_degraded =3D 1; } - rcu_read_unlock(); =20 md_bitmap_start_sync(mddev->bitmap, sector_nr, &sync_blocks, still_degrad= ed); =20 @@ -7898,18 +7852,10 @@ static int raid5_run(struct mddev *mddev) =20 for (i =3D 0; i < conf->raid_disks && conf->previous_raid_disks; i++) { - rdev =3D rdev_mdlock_deref(mddev, conf->disks[i].rdev); - if (!rdev && conf->disks[i].replacement) { - /* The replacement is all we have yet */ - rdev =3D rdev_mdlock_deref(mddev, - conf->disks[i].replacement); - conf->disks[i].replacement =3D NULL; - clear_bit(Replacement, &rdev->flags); - rcu_assign_pointer(conf->disks[i].rdev, rdev); - } + rdev =3D conf->disks[i].rdev; if (!rdev) continue; - if (rcu_access_pointer(conf->disks[i].replacement) && + if (conf->disks[i].replacement && conf->reshape_progress !=3D MaxSector) { /* replacements and reshape simply do not mix. */ pr_warn("md: cannot handle concurrent replacement and reshape.\n"); @@ -8093,15 +8039,16 @@ static void raid5_status(struct seq_file *seq, stru= ct mddev *mddev) struct r5conf *conf =3D mddev->private; int i; =20 + lockdep_assert_held(&mddev->lock); + seq_printf(seq, " level %d, %dk chunk, algorithm %d", mddev->level, conf->chunk_sectors / 2, mddev->layout); seq_printf (seq, " [%d/%d] [", conf->raid_disks, conf->raid_disks - mddev= ->degraded); - rcu_read_lock(); for (i =3D 0; i < conf->raid_disks; i++) { - struct md_rdev *rdev =3D rcu_dereference(conf->disks[i].rdev); + struct md_rdev *rdev =3D READ_ONCE(conf->disks[i].rdev); + seq_printf (seq, "%s", rdev && test_bit(In_sync, &rdev->flags) ? 
"U" : "= _"); } - rcu_read_unlock(); seq_printf (seq, "]"); } =20 @@ -8139,9 +8086,8 @@ static int raid5_spare_active(struct mddev *mddev) unsigned long flags; =20 for (i =3D 0; i < conf->raid_disks; i++) { - rdev =3D rdev_mdlock_deref(mddev, conf->disks[i].rdev); - replacement =3D rdev_mdlock_deref(mddev, - conf->disks[i].replacement); + rdev =3D conf->disks[i].rdev; + replacement =3D conf->disks[i].replacement; if (replacement && replacement->recovery_offset =3D=3D MaxSector && !test_bit(Faulty, &replacement->flags) @@ -8180,7 +8126,7 @@ static int raid5_remove_disk(struct mddev *mddev, str= uct md_rdev *rdev) struct r5conf *conf =3D mddev->private; int err =3D 0; int number =3D rdev->raid_disk; - struct md_rdev __rcu **rdevp; + struct md_rdev **rdevp; struct disk_info *p; struct md_rdev *tmp; =20 @@ -8203,9 +8149,9 @@ static int raid5_remove_disk(struct mddev *mddev, str= uct md_rdev *rdev) if (unlikely(number >=3D conf->pool_size)) return 0; p =3D conf->disks + number; - if (rdev =3D=3D rcu_access_pointer(p->rdev)) + if (rdev =3D=3D p->rdev) rdevp =3D &p->rdev; - else if (rdev =3D=3D rcu_access_pointer(p->replacement)) + else if (rdev =3D=3D p->replacement) rdevp =3D &p->replacement; else return 0; @@ -8225,28 +8171,24 @@ static int raid5_remove_disk(struct mddev *mddev, s= truct md_rdev *rdev) if (!test_bit(Faulty, &rdev->flags) && mddev->recovery_disabled !=3D conf->recovery_disabled && !has_failed(conf) && - (!rcu_access_pointer(p->replacement) || - rcu_access_pointer(p->replacement) =3D=3D rdev) && + (!p->replacement || p->replacement =3D=3D rdev) && number < conf->raid_disks) { err =3D -EBUSY; goto abort; } - *rdevp =3D NULL; + WRITE_ONCE(*rdevp, NULL); if (!err) { err =3D log_modify(conf, rdev, false); if (err) goto abort; } =20 - tmp =3D rcu_access_pointer(p->replacement); + tmp =3D p->replacement; if (tmp) { /* We must have just cleared 'rdev' */ - rcu_assign_pointer(p->rdev, tmp); + WRITE_ONCE(p->rdev, tmp); clear_bit(Replacement, &tmp->flags); - smp_mb(); /* Make sure other CPUs may see both as identical - * but will never see neither - if they are careful - */ - rcu_assign_pointer(p->replacement, NULL); + WRITE_ONCE(p->replacement, NULL); =20 if (!err) err =3D log_modify(conf, tmp, true); @@ -8314,7 +8256,7 @@ static int raid5_add_disk(struct mddev *mddev, struct= md_rdev *rdev) rdev->raid_disk =3D disk; if (rdev->saved_raid_disk !=3D disk) conf->fullsync =3D 1; - rcu_assign_pointer(p->rdev, rdev); + WRITE_ONCE(p->rdev, rdev); =20 err =3D log_modify(conf, rdev, true); =20 @@ -8323,7 +8265,7 @@ static int raid5_add_disk(struct mddev *mddev, struct= md_rdev *rdev) } for (disk =3D first; disk <=3D last; disk++) { p =3D conf->disks + disk; - tmp =3D rdev_mdlock_deref(mddev, p->rdev); + tmp =3D p->rdev; if (test_bit(WantReplacement, &tmp->flags) && mddev->reshape_position =3D=3D MaxSector && p->replacement =3D=3D NULL) { @@ -8332,7 +8274,7 @@ static int raid5_add_disk(struct mddev *mddev, struct= md_rdev *rdev) rdev->raid_disk =3D disk; err =3D 0; conf->fullsync =3D 1; - rcu_assign_pointer(p->replacement, rdev); + WRITE_ONCE(p->replacement, rdev); break; } } @@ -8465,7 +8407,7 @@ static int raid5_start_reshape(struct mddev *mddev) if (mddev->recovery_cp < MaxSector) return -EBUSY; for (i =3D 0; i < conf->raid_disks; i++) - if (rdev_mdlock_deref(mddev, conf->disks[i].replacement)) + if (conf->disks[i].replacement) return -EBUSY; =20 rdev_for_each(rdev, mddev) { @@ -8636,12 +8578,10 @@ static void raid5_finish_reshape(struct mddev *mdde= v) for (d =3D conf->raid_disks ; d < 
conf->raid_disks - mddev->delta_disks; d++) { - rdev =3D rdev_mdlock_deref(mddev, - conf->disks[d].rdev); + rdev =3D conf->disks[d].rdev; if (rdev) clear_bit(In_sync, &rdev->flags); - rdev =3D rdev_mdlock_deref(mddev, - conf->disks[d].replacement); + rdev =3D conf->disks[d].replacement; if (rdev) clear_bit(In_sync, &rdev->flags); } diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h index 97a795979a35..9163c8cefb3f 100644 --- a/drivers/md/raid5.h +++ b/drivers/md/raid5.h @@ -473,8 +473,8 @@ enum { */ =20 struct disk_info { - struct md_rdev __rcu *rdev; - struct md_rdev __rcu *replacement; + struct md_rdev *rdev; + struct md_rdev *replacement; struct page *extra_page; /* extra page to use in prexor */ }; =20 --=20 2.39.2
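
A rough userspace analogy of the lifetime rule the commit message relies on
(not kernel code, and not part of the patch; every name here, such as
fake_rdev, fake_conf and grab_rdev, is invented for illustration): once the
remover unpublishes the pointer under the same lock that readers use, and
then waits for the pending count to drain, readers can get away with plain
pointer accesses instead of rcu_read_lock()/rcu_dereference().

/* Illustrative sketch only -- not kernel code, not part of the patch. */
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct fake_rdev {
	atomic_int nr_pending;		/* in-flight IO against this device */
	int id;
};

struct fake_conf {
	pthread_mutex_t lock;		/* stands in for a per-array lock */
	struct fake_rdev *rdev;		/* published pointer, like disks[i].rdev */
};

/* Reader side: pin the device by raising nr_pending under the lock. */
static struct fake_rdev *grab_rdev(struct fake_conf *conf)
{
	struct fake_rdev *rdev;

	pthread_mutex_lock(&conf->lock);
	rdev = conf->rdev;		/* plain load, no rcu_dereference() */
	if (rdev)
		atomic_fetch_add(&rdev->nr_pending, 1);
	pthread_mutex_unlock(&conf->lock);
	return rdev;
}

static void put_rdev(struct fake_rdev *rdev)
{
	atomic_fetch_sub(&rdev->nr_pending, 1);
}

/* Remover side: unpublish under the lock, then wait for users to drain. */
static void remove_rdev(struct fake_conf *conf)
{
	struct fake_rdev *rdev;

	pthread_mutex_lock(&conf->lock);
	rdev = conf->rdev;
	conf->rdev = NULL;		/* new readers can no longer find it */
	pthread_mutex_unlock(&conf->lock);

	if (!rdev)
		return;
	while (atomic_load(&rdev->nr_pending))	/* existing users still hold it */
		sched_yield();
	free(rdev);			/* safe: nobody can reach it any more */
}

int main(void)
{
	struct fake_conf conf = { .lock = PTHREAD_MUTEX_INITIALIZER };
	struct fake_rdev *rdev;

	conf.rdev = calloc(1, sizeof(*conf.rdev));
	conf.rdev->id = 3;

	rdev = grab_rdev(&conf);
	if (rdev) {
		printf("issuing IO against rdev %d\n", rdev->id);
		put_rdev(rdev);
	}
	remove_rdev(&conf);
	return 0;
}

In the real driver these roles are played roughly by the array locks,
reconfig_mutex, rdev->nr_pending and mddev_suspend(); the sketch only shows
why no separate RCU read-side section is needed once those guarantees hold.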