From nobody Thu Dec 18 06:17:44 2025
From: Yu Kuai
To: song@kernel.org
Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai3@huawei.com, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com
Subject: [PATCH -next v2 5/6] md/raid5: remove rcu protection to access rdev from conf
Date: Sat, 21 Oct 2023 18:20:58 +0800
Message-Id: <20231021102059.3198284-6-yukuai1@huaweicloud.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20231021102059.3198284-1-yukuai1@huaweicloud.com>
References: <20231021102059.3198284-1-yukuai1@huaweicloud.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Yu Kuai

Remove the RCU protection, because it is already safe to access rdev
from conf:

 - If any spinlock is held, because synchronize_rcu() in
   md_kick_rdev_from_array() prevents 'rdev' from being freed until the
   spinlock is released;
 - If 'reconfig_mutex' is held, because rdev can't be added to or
   removed from the array;
 - If there is normal IO inflight, because mddev_suspend() prevents
   rdev from being added to or removed from the array;
 - If there is sync IO inflight, because 'MD_RECOVERY_RUNNING' is
   checked in remove_and_add_spares().

These cover all the scenarios in raid456.
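
For illustration only, here is a minimal userspace sketch of the access
pattern the cases above rely on: the I/O path loads the slot once and
pins the device through nr_pending, and the reconfiguration path leaves
the slot alone while that count is non-zero. It is not part of the
patch. The demo_* names are hypothetical stand-ins for struct md_rdev
and struct disk_info, C11 atomics stand in for READ_ONCE()/WRITE_ONCE()
and atomic_t, and the sketch assumes that the locking and suspend rules
listed above already exclude concurrent add/remove, so it models none
of that exclusion itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct md_rdev (assumption, not kernel code). */
struct demo_rdev {
	atomic_int nr_pending;	/* number of in-flight users of this device */
	bool faulty;
};

/* Hypothetical stand-in for struct disk_info: plain pointers, no __rcu. */
struct demo_slot {
	struct demo_rdev *_Atomic rdev;
	struct demo_rdev *_Atomic replacement;
};

/* I/O path: read the slot once, then pin the device before using it. */
static struct demo_rdev *demo_get_rdev(struct demo_slot *slot)
{
	struct demo_rdev *rdev =
		atomic_load_explicit(&slot->rdev, memory_order_relaxed);

	if (!rdev || rdev->faulty)
		return NULL;
	atomic_fetch_add(&rdev->nr_pending, 1);
	return rdev;
}

/* I/O completion: drop the pin taken by demo_get_rdev(). */
static void demo_put_rdev(struct demo_rdev *rdev)
{
	atomic_fetch_sub(&rdev->nr_pending, 1);
}

/*
 * Reconfiguration path: refuse to clear the slot while I/O is pending.
 * The caller is assumed to hold whatever lock serialises add/remove.
 */
static bool demo_remove_rdev(struct demo_slot *slot, struct demo_rdev *rdev)
{
	if (atomic_load(&rdev->nr_pending) != 0)
		return false;
	atomic_store_explicit(&slot->rdev, (struct demo_rdev *)NULL,
			      memory_order_relaxed);
	return true;
}
```

In this model a caller pairs demo_get_rdev() with demo_put_rdev()
around each I/O; demo_remove_rdev() fails while a pin is held and
succeeds once it has been dropped, after which demo_get_rdev() returns
NULL for that slot.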
Signed-off-by: Yu Kuai
---
 drivers/md/raid5-cache.c |  11 +--
 drivers/md/raid5-ppl.c   |  16 +---
 drivers/md/raid5.c       | 182 +++++++++++++--------------------------
 drivers/md/raid5.h       |   4 +-
 4 files changed, 69 insertions(+), 144 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 6157f5beb9fe..874874fe4fa1 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -1890,28 +1890,22 @@ r5l_recovery_replay_one_stripe(struct r5conf *conf,
 			continue;
 
 		/* in case device is broken */
-		rcu_read_lock();
-		rdev = rcu_dereference(conf->disks[disk_index].rdev);
+		rdev = conf->disks[disk_index].rdev;
 		if (rdev) {
 			atomic_inc(&rdev->nr_pending);
-			rcu_read_unlock();
 			sync_page_io(rdev, sh->sector, PAGE_SIZE,
 				     sh->dev[disk_index].page, REQ_OP_WRITE,
 				     false);
 			rdev_dec_pending(rdev, rdev->mddev);
-			rcu_read_lock();
 		}
-		rrdev = rcu_dereference(conf->disks[disk_index].replacement);
+		rrdev = conf->disks[disk_index].replacement;
 		if (rrdev) {
 			atomic_inc(&rrdev->nr_pending);
-			rcu_read_unlock();
 			sync_page_io(rrdev, sh->sector, PAGE_SIZE,
 				     sh->dev[disk_index].page, REQ_OP_WRITE,
 				     false);
 			rdev_dec_pending(rrdev, rrdev->mddev);
-			rcu_read_lock();
 		}
-		rcu_read_unlock();
 	}
 	ctx->data_parity_stripes++;
 out:
@@ -2948,7 +2942,6 @@ bool r5c_big_stripe_cached(struct r5conf *conf, sector_t sect)
 	if (!log)
 		return false;
 
-	WARN_ON_ONCE(!rcu_read_lock_held());
 	tree_index = r5c_tree_index(conf, sect);
 	slot = radix_tree_lookup(&log->big_stripe_tree, tree_index);
 	return slot != NULL;
diff --git a/drivers/md/raid5-ppl.c b/drivers/md/raid5-ppl.c
index eaea57aee602..da4ba736c4f0 100644
--- a/drivers/md/raid5-ppl.c
+++ b/drivers/md/raid5-ppl.c
@@ -620,11 +620,9 @@ static void ppl_do_flush(struct ppl_io_unit *io)
 		struct md_rdev *rdev;
 		struct block_device *bdev = NULL;
 
-		rcu_read_lock();
-		rdev = rcu_dereference(conf->disks[i].rdev);
+		rdev = conf->disks[i].rdev;
 		if (rdev && !test_bit(Faulty, &rdev->flags))
 			bdev = rdev->bdev;
-		rcu_read_unlock();
 
 		if (bdev) {
 			struct bio *bio;
@@ -882,9 +880,7 @@ static int ppl_recover_entry(struct ppl_log *log, struct ppl_header_entry *e,
 				 (unsigned long long)r_sector, dd_idx,
 				 (unsigned long long)sector);
 
-			/* Array has not started so rcu dereference is safe */
-			rdev = rcu_dereference_protected(
-						conf->disks[dd_idx].rdev, 1);
+			rdev = conf->disks[dd_idx].rdev;
 			if (!rdev || (!test_bit(In_sync, &rdev->flags) &&
 				      sector >= rdev->recovery_offset)) {
 				pr_debug("%s:%*s data member disk %d missing\n",
@@ -936,9 +932,7 @@ static int ppl_recover_entry(struct ppl_log *log, struct ppl_header_entry *e,
 				0, &disk, &sh);
 		BUG_ON(sh.pd_idx != le32_to_cpu(e->parity_disk));
 
-		/* Array has not started so rcu dereference is safe */
-		parity_rdev = rcu_dereference_protected(
-					conf->disks[sh.pd_idx].rdev, 1);
+		parity_rdev = conf->disks[sh.pd_idx].rdev;
 
 		BUG_ON(parity_rdev->bdev->bd_dev != log->rdev->bdev->bd_dev);
 		pr_debug("%s:%*s write parity at sector %llu, disk %pg\n",
@@ -1404,9 +1398,7 @@ int ppl_init_log(struct r5conf *conf)
 
 	for (i = 0; i < ppl_conf->count; i++) {
 		struct ppl_log *log = &ppl_conf->child_logs[i];
-		/* Array has not started so rcu dereference is safe */
-		struct md_rdev *rdev =
-			rcu_dereference_protected(conf->disks[i].rdev, 1);
+		struct md_rdev *rdev = conf->disks[i].rdev;
 
 		mutex_init(&log->io_mutex);
 		spin_lock_init(&log->io_list_lock);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index a80be51b4825..ad6d5138a6bd 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -692,12 +692,12 @@ int raid5_calc_degraded(struct r5conf *conf)
 	int degraded, degraded2;
 	int i;
 
-	rcu_read_lock();
 	degraded = 0;
 	for (i = 0; i < conf->previous_raid_disks; i++) {
-		struct md_rdev *rdev = rcu_dereference(conf->disks[i].rdev);
+		struct md_rdev *rdev = READ_ONCE(conf->disks[i].rdev);
+
 		if (rdev && test_bit(Faulty, &rdev->flags))
-			rdev = rcu_dereference(conf->disks[i].replacement);
+			rdev = READ_ONCE(conf->disks[i].replacement);
 		if (!rdev || test_bit(Faulty, &rdev->flags))
 			degraded++;
 		else if (test_bit(In_sync, &rdev->flags))
@@ -715,15 +715,14 @@ int raid5_calc_degraded(struct r5conf *conf)
 			if (conf->raid_disks >= conf->previous_raid_disks)
 				degraded++;
 	}
-	rcu_read_unlock();
 	if (conf->raid_disks == conf->previous_raid_disks)
 		return degraded;
-	rcu_read_lock();
 	degraded2 = 0;
 	for (i = 0; i < conf->raid_disks; i++) {
-		struct md_rdev *rdev = rcu_dereference(conf->disks[i].rdev);
+		struct md_rdev *rdev = READ_ONCE(conf->disks[i].rdev);
+
 		if (rdev && test_bit(Faulty, &rdev->flags))
-			rdev = rcu_dereference(conf->disks[i].replacement);
+			rdev = READ_ONCE(conf->disks[i].replacement);
 		if (!rdev || test_bit(Faulty, &rdev->flags))
 			degraded2++;
 		else if (test_bit(In_sync, &rdev->flags))
@@ -737,7 +736,6 @@ int raid5_calc_degraded(struct r5conf *conf)
 			if (conf->raid_disks <= conf->previous_raid_disks)
 				degraded2++;
 	}
-	rcu_read_unlock();
 	if (degraded2 > degraded)
 		return degraded2;
 	return degraded;
@@ -1175,14 +1173,8 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
 		bi = &dev->req;
 		rbi = &dev->rreq; /* For writing to replacement */
 
-		rcu_read_lock();
-		rrdev = rcu_dereference(conf->disks[i].replacement);
-		smp_mb(); /* Ensure that if rrdev is NULL, rdev won't be */
-		rdev = rcu_dereference(conf->disks[i].rdev);
-		if (!rdev) {
-			rdev = rrdev;
-			rrdev = NULL;
-		}
+		rdev = conf->disks[i].rdev;
+		rrdev = conf->disks[i].replacement;
 		if (op_is_write(op)) {
 			if (replace_only)
 				rdev = NULL;
@@ -1203,7 +1195,6 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
 			rrdev = NULL;
 		if (rrdev)
 			atomic_inc(&rrdev->nr_pending);
-		rcu_read_unlock();
 
 		/* We have already checked bad blocks for reads.  Now
 		 * need to check for writes.  We never accept write errors
@@ -2722,28 +2713,6 @@ static void shrink_stripes(struct r5conf *conf)
 	conf->slab_cache = NULL;
 }
 
-/*
- * This helper wraps rcu_dereference_protected() and can be used when
- * it is known that the nr_pending of the rdev is elevated.
- */
-static struct md_rdev *rdev_pend_deref(struct md_rdev __rcu *rdev)
-{
-	return rcu_dereference_protected(rdev,
-			atomic_read(&rcu_access_pointer(rdev)->nr_pending));
-}
-
-/*
- * This helper wraps rcu_dereference_protected() and should be used
- * when it is known that the mddev_lock() is held.  This is safe
- * seeing raid5_remove_disk() has the same lock held.
- */
-static struct md_rdev *rdev_mdlock_deref(struct mddev *mddev,
-					 struct md_rdev __rcu *rdev)
-{
-	return rcu_dereference_protected(rdev,
-			lockdep_is_held(&mddev->reconfig_mutex));
-}
-
 static void raid5_end_read_request(struct bio * bi)
 {
 	struct stripe_head *sh = bi->bi_private;
@@ -2769,9 +2738,9 @@ static void raid5_end_read_request(struct bio * bi)
 		 * In that case it moved down to 'rdev'.
 		 * rdev is not removed until all requests are finished.
 		 */
-		rdev = rdev_pend_deref(conf->disks[i].replacement);
+		rdev = conf->disks[i].replacement;
 	if (!rdev)
-		rdev = rdev_pend_deref(conf->disks[i].rdev);
+		rdev = conf->disks[i].rdev;
 
 	if (use_new_offset(conf, sh))
 		s = sh->sector + rdev->new_data_offset;
@@ -2884,11 +2853,11 @@ static void raid5_end_write_request(struct bio *bi)
 
 	for (i = 0 ; i < disks; i++) {
 		if (bi == &sh->dev[i].req) {
-			rdev = rdev_pend_deref(conf->disks[i].rdev);
+			rdev = conf->disks[i].rdev;
 			break;
 		}
 		if (bi == &sh->dev[i].rreq) {
-			rdev = rdev_pend_deref(conf->disks[i].replacement);
+			rdev = conf->disks[i].replacement;
 			if (rdev)
 				replacement = 1;
 			else
@@ -2896,7 +2865,7 @@ static void raid5_end_write_request(struct bio *bi)
 				 * replaced it.  rdev is not removed
 				 * until all requests are finished.
 				 */
-				rdev = rdev_pend_deref(conf->disks[i].rdev);
+				rdev = conf->disks[i].rdev;
 			break;
 		}
 	}
@@ -3658,15 +3627,13 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh,
 		int bitmap_end = 0;
 
 		if (test_bit(R5_ReadError, &sh->dev[i].flags)) {
-			struct md_rdev *rdev;
-			rcu_read_lock();
-			rdev = rcu_dereference(conf->disks[i].rdev);
+			struct md_rdev *rdev = conf->disks[i].rdev;
+
 			if (rdev && test_bit(In_sync, &rdev->flags) &&
 			    !test_bit(Faulty, &rdev->flags))
 				atomic_inc(&rdev->nr_pending);
 			else
 				rdev = NULL;
-			rcu_read_unlock();
 			if (rdev) {
 				if (!rdev_set_badblocks(
 					    rdev,
@@ -3784,16 +3751,17 @@ handle_failed_sync(struct r5conf *conf, struct stripe_head *sh,
 		/* During recovery devices cannot be removed, so
 		 * locking and refcounting of rdevs is not needed
 		 */
-		rcu_read_lock();
 		for (i = 0; i < conf->raid_disks; i++) {
-			struct md_rdev *rdev = rcu_dereference(conf->disks[i].rdev);
+			struct md_rdev *rdev = conf->disks[i].rdev;
+
 			if (rdev
 			    && !test_bit(Faulty, &rdev->flags)
 			    && !test_bit(In_sync, &rdev->flags)
 			    && !rdev_set_badblocks(rdev, sh->sector,
 						   RAID5_STRIPE_SECTORS(conf), 0))
 				abort = 1;
-			rdev = rcu_dereference(conf->disks[i].replacement);
+			rdev = conf->disks[i].replacement;
+
 			if (rdev
 			    && !test_bit(Faulty, &rdev->flags)
 			    && !test_bit(In_sync, &rdev->flags)
@@ -3801,7 +3769,6 @@ handle_failed_sync(struct r5conf *conf, struct stripe_head *sh,
 						   RAID5_STRIPE_SECTORS(conf), 0))
 				abort = 1;
 		}
-		rcu_read_unlock();
 		if (abort)
 			conf->recovery_disabled =
 				conf->mddev->recovery_disabled;
@@ -3814,15 +3781,13 @@ static int want_replace(struct stripe_head *sh, int disk_idx)
 	struct md_rdev *rdev;
 	int rv = 0;
 
-	rcu_read_lock();
-	rdev = rcu_dereference(sh->raid_conf->disks[disk_idx].replacement);
+	rdev = sh->raid_conf->disks[disk_idx].replacement;
 	if (rdev
 	    && !test_bit(Faulty, &rdev->flags)
 	    && !test_bit(In_sync, &rdev->flags)
 	    && (rdev->recovery_offset <= sh->sector
 		|| rdev->mddev->recovery_cp <= sh->sector))
 		rv = 1;
-	rcu_read_unlock();
 	return rv;
 }
 
@@ -4699,7 +4664,6 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
 	s->log_failed = r5l_log_disk_error(conf);
 
 	/* Now to look around and see what can be done */
-	rcu_read_lock();
 	for (i=disks; i--; ) {
 		struct md_rdev *rdev;
 		sector_t first_bad;
@@ -4744,7 +4708,7 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
 		/* Prefer to use the replacement for reads, but only
 		 * if it is recovered enough and has no bad blocks.
 		 */
-		rdev = rcu_dereference(conf->disks[i].replacement);
+		rdev = conf->disks[i].replacement;
 		if (rdev && !test_bit(Faulty, &rdev->flags) &&
 		    rdev->recovery_offset >= sh->sector + RAID5_STRIPE_SECTORS(conf) &&
 		    !is_badblock(rdev, sh->sector, RAID5_STRIPE_SECTORS(conf),
@@ -4755,7 +4719,7 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
 				set_bit(R5_NeedReplace, &dev->flags);
 			else
 				clear_bit(R5_NeedReplace, &dev->flags);
-			rdev = rcu_dereference(conf->disks[i].rdev);
+			rdev = conf->disks[i].rdev;
 			clear_bit(R5_ReadRepl, &dev->flags);
 		}
 		if (rdev && test_bit(Faulty, &rdev->flags))
@@ -4802,8 +4766,8 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
 		if (test_bit(R5_WriteError, &dev->flags)) {
 			/* This flag does not apply to '.replacement'
 			 * only to .rdev, so make sure to check that*/
-			struct md_rdev *rdev2 = rcu_dereference(
-				conf->disks[i].rdev);
+			struct md_rdev *rdev2 = conf->disks[i].rdev;
+
 			if (rdev2 == rdev)
 				clear_bit(R5_Insync, &dev->flags);
 			if (rdev2 && !test_bit(Faulty, &rdev2->flags)) {
@@ -4815,8 +4779,8 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
 		if (test_bit(R5_MadeGood, &dev->flags)) {
 			/* This flag does not apply to '.replacement'
 			 * only to .rdev, so make sure to check that*/
-			struct md_rdev *rdev2 = rcu_dereference(
-				conf->disks[i].rdev);
+			struct md_rdev *rdev2 = conf->disks[i].rdev;
+
 			if (rdev2 && !test_bit(Faulty, &rdev2->flags)) {
 				s->handle_bad_blocks = 1;
 				atomic_inc(&rdev2->nr_pending);
@@ -4824,8 +4788,8 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
 				clear_bit(R5_MadeGood, &dev->flags);
 		}
 		if (test_bit(R5_MadeGoodRepl, &dev->flags)) {
-			struct md_rdev *rdev2 = rcu_dereference(
-				conf->disks[i].replacement);
+			struct md_rdev *rdev2 = conf->disks[i].replacement;
+
 			if (rdev2 && !test_bit(Faulty, &rdev2->flags)) {
 				s->handle_bad_blocks = 1;
 				atomic_inc(&rdev2->nr_pending);
@@ -4846,8 +4810,7 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
 			if (rdev && !test_bit(Faulty, &rdev->flags))
 				do_recovery = 1;
 			else if (!rdev) {
-				rdev = rcu_dereference(
-				    conf->disks[i].replacement);
+				rdev = conf->disks[i].replacement;
 				if (rdev && !test_bit(Faulty, &rdev->flags))
 					do_recovery = 1;
 			}
@@ -4874,7 +4837,6 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
 		else
 			s->replacing = 1;
 	}
-	rcu_read_unlock();
 }
 
 /*
@@ -5331,23 +5293,23 @@ static void handle_stripe(struct stripe_head *sh)
 			struct r5dev *dev = &sh->dev[i];
 			if (test_and_clear_bit(R5_WriteError, &dev->flags)) {
 				/* We own a safe reference to the rdev */
-				rdev = rdev_pend_deref(conf->disks[i].rdev);
+				rdev = conf->disks[i].rdev;
 				if (!rdev_set_badblocks(rdev, sh->sector,
 							RAID5_STRIPE_SECTORS(conf), 0))
 					md_error(conf->mddev, rdev);
 				rdev_dec_pending(rdev, conf->mddev);
 			}
 			if (test_and_clear_bit(R5_MadeGood, &dev->flags)) {
-				rdev = rdev_pend_deref(conf->disks[i].rdev);
+				rdev = conf->disks[i].rdev;
 				rdev_clear_badblocks(rdev, sh->sector,
 						     RAID5_STRIPE_SECTORS(conf), 0);
 				rdev_dec_pending(rdev, conf->mddev);
 			}
 			if (test_and_clear_bit(R5_MadeGoodRepl, &dev->flags)) {
-				rdev = rdev_pend_deref(conf->disks[i].replacement);
+				rdev = conf->disks[i].replacement;
 				if (!rdev)
 					/* rdev have been moved down */
-					rdev = rdev_pend_deref(conf->disks[i].rdev);
+					rdev = conf->disks[i].rdev;
 				rdev_clear_badblocks(rdev, sh->sector,
 						     RAID5_STRIPE_SECTORS(conf), 0);
 				rdev_dec_pending(rdev, conf->mddev);
@@ -5506,24 +5468,22 @@ static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio)
 				      &dd_idx, NULL);
 	end_sector = sector + bio_sectors(raid_bio);
 
-	rcu_read_lock();
 	if (r5c_big_stripe_cached(conf, sector))
-		goto out_rcu_unlock;
+		return 0;
 
-	rdev = rcu_dereference(conf->disks[dd_idx].replacement);
+	rdev = conf->disks[dd_idx].replacement;
 	if (!rdev || test_bit(Faulty, &rdev->flags) ||
 	    rdev->recovery_offset < end_sector) {
-		rdev = rcu_dereference(conf->disks[dd_idx].rdev);
+		rdev = conf->disks[dd_idx].rdev;
 		if (!rdev)
-			goto out_rcu_unlock;
+			return 0;
 		if (test_bit(Faulty, &rdev->flags) ||
 		    !(test_bit(In_sync, &rdev->flags) ||
 		      rdev->recovery_offset >= end_sector))
-			goto out_rcu_unlock;
+			return 0;
 	}
 
 	atomic_inc(&rdev->nr_pending);
-	rcu_read_unlock();
 
 	if (is_badblock(rdev, sector, bio_sectors(raid_bio), &first_bad,
 			&bad_sectors)) {
@@ -5567,10 +5527,6 @@ static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio)
 				      raid_bio->bi_iter.bi_sector);
 	submit_bio_noacct(align_bio);
 	return 1;
-
-out_rcu_unlock:
-	rcu_read_unlock();
-	return 0;
 }
 
 static struct bio *chunk_aligned_read(struct mddev *mddev, struct bio *raid_bio)
@@ -6573,14 +6529,12 @@ static inline sector_t raid5_sync_request(struct mddev *mddev, sector_t sector_n
 	 * Note in case of > 1 drive failures it's possible we're rebuilding
 	 * one drive while leaving another faulty drive in array.
 	 */
-	rcu_read_lock();
 	for (i = 0; i < conf->raid_disks; i++) {
-		struct md_rdev *rdev = rcu_dereference(conf->disks[i].rdev);
+		struct md_rdev *rdev = conf->disks[i].rdev;
 
 		if (rdev == NULL || test_bit(Faulty, &rdev->flags))
 			still_degraded = 1;
 	}
-	rcu_read_unlock();
 
 	md_bitmap_start_sync(mddev->bitmap, sector_nr, &sync_blocks, still_degraded);
 
@@ -7898,18 +7852,10 @@ static int raid5_run(struct mddev *mddev)
 
 	for (i = 0; i < conf->raid_disks && conf->previous_raid_disks;
 	     i++) {
-		rdev = rdev_mdlock_deref(mddev, conf->disks[i].rdev);
-		if (!rdev && conf->disks[i].replacement) {
-			/* The replacement is all we have yet */
-			rdev = rdev_mdlock_deref(mddev,
-						 conf->disks[i].replacement);
-			conf->disks[i].replacement = NULL;
-			clear_bit(Replacement, &rdev->flags);
-			rcu_assign_pointer(conf->disks[i].rdev, rdev);
-		}
+		rdev = conf->disks[i].rdev;
 		if (!rdev)
 			continue;
-		if (rcu_access_pointer(conf->disks[i].replacement) &&
+		if (conf->disks[i].replacement &&
 		    conf->reshape_progress != MaxSector) {
 			/* replacements and reshape simply do not mix. */
 			pr_warn("md: cannot handle concurrent replacement and reshape.\n");
@@ -8090,15 +8036,16 @@ static void raid5_status(struct seq_file *seq, struct mddev *mddev)
 	struct r5conf *conf = mddev->private;
 	int i;
 
+	lockdep_assert_held(&mddev->lock);
+
 	seq_printf(seq, " level %d, %dk chunk, algorithm %d", mddev->level,
 		conf->chunk_sectors / 2, mddev->layout);
 	seq_printf (seq, " [%d/%d] [", conf->raid_disks, conf->raid_disks - mddev->degraded);
-	rcu_read_lock();
 	for (i = 0; i < conf->raid_disks; i++) {
-		struct md_rdev *rdev = rcu_dereference(conf->disks[i].rdev);
+		struct md_rdev *rdev = READ_ONCE(conf->disks[i].rdev);
+
 		seq_printf (seq, "%s", rdev && test_bit(In_sync, &rdev->flags) ? "U" : "_");
 	}
-	rcu_read_unlock();
 	seq_printf (seq, "]");
 }
 
@@ -8111,9 +8058,8 @@ static int raid5_spare_active(struct mddev *mddev)
 	unsigned long flags;
 
 	for (i = 0; i < conf->raid_disks; i++) {
-		rdev = rdev_mdlock_deref(mddev, conf->disks[i].rdev);
-		replacement = rdev_mdlock_deref(mddev,
-						conf->disks[i].replacement);
+		rdev = conf->disks[i].rdev;
+		replacement = conf->disks[i].replacement;
 		if (replacement
 		    && replacement->recovery_offset == MaxSector
 		    && !test_bit(Faulty, &replacement->flags)
@@ -8151,7 +8097,7 @@ static int raid5_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 	struct r5conf *conf = mddev->private;
 	int err = 0;
 	int number = rdev->raid_disk;
-	struct md_rdev __rcu **rdevp;
+	struct md_rdev **rdevp;
 	struct disk_info *p;
 	struct md_rdev *tmp;
 
@@ -8173,9 +8119,9 @@ static int raid5_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 	if (unlikely(number >= conf->pool_size))
 		return 0;
 	p = conf->disks + number;
-	if (rdev == rcu_access_pointer(p->rdev))
+	if (rdev == p->rdev)
 		rdevp = &p->rdev;
-	else if (rdev == rcu_access_pointer(p->replacement))
+	else if (rdev == p->replacement)
 		rdevp = &p->replacement;
 	else
 		return 0;
@@ -8195,28 +8141,24 @@ static int raid5_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 	if (!test_bit(Faulty, &rdev->flags) &&
 	    mddev->recovery_disabled != conf->recovery_disabled &&
 	    !has_failed(conf) &&
-	    (!rcu_access_pointer(p->replacement) ||
-	     rcu_access_pointer(p->replacement) == rdev) &&
+	    (!p->replacement || p->replacement == rdev) &&
 	    number < conf->raid_disks) {
 		err = -EBUSY;
 		goto abort;
 	}
-	*rdevp = NULL;
+	WRITE_ONCE(*rdevp, NULL);
 	if (!err) {
 		err = log_modify(conf, rdev, false);
 		if (err)
 			goto abort;
 	}
 
-	tmp = rcu_access_pointer(p->replacement);
+	tmp = p->replacement;
 	if (tmp) {
 		/* We must have just cleared 'rdev' */
-		rcu_assign_pointer(p->rdev, tmp);
+		WRITE_ONCE(p->rdev, tmp);
 		clear_bit(Replacement, &tmp->flags);
-		smp_mb(); /* Make sure other CPUs may see both as identical
-			   * but will never see neither - if they are careful
-			   */
-		rcu_assign_pointer(p->replacement, NULL);
+		WRITE_ONCE(p->replacement, NULL);
 
 		if (!err)
 			err = log_modify(conf, tmp, true);
@@ -8283,7 +8225,7 @@ static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev)
 			rdev->raid_disk = disk;
 			if (rdev->saved_raid_disk != disk)
 				conf->fullsync = 1;
-			rcu_assign_pointer(p->rdev, rdev);
+			WRITE_ONCE(p->rdev, rdev);
 
 			err = log_modify(conf, rdev, true);
 
@@ -8292,7 +8234,7 @@ static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev)
 	}
 	for (disk = first; disk <= last; disk++) {
 		p = conf->disks + disk;
-		tmp = rdev_mdlock_deref(mddev, p->rdev);
+		tmp = p->rdev;
 		if (test_bit(WantReplacement, &tmp->flags) &&
 		    mddev->reshape_position == MaxSector &&
 		    p->replacement == NULL) {
@@ -8301,7 +8243,7 @@ static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev)
 			rdev->raid_disk = disk;
 			err = 0;
 			conf->fullsync = 1;
-			rcu_assign_pointer(p->replacement, rdev);
+			WRITE_ONCE(p->replacement, rdev);
 			break;
 		}
 	}
@@ -8433,7 +8375,7 @@ static int raid5_start_reshape(struct mddev *mddev)
 	if (mddev->recovery_cp < MaxSector)
 		return -EBUSY;
 	for (i = 0; i < conf->raid_disks; i++)
-		if (rdev_mdlock_deref(mddev, conf->disks[i].replacement))
+		if (conf->disks[i].replacement)
 			return -EBUSY;
 
 	rdev_for_each(rdev, mddev) {
@@ -8604,12 +8546,10 @@ static void raid5_finish_reshape(struct mddev *mddev)
 			for (d = conf->raid_disks ;
 			     d < conf->raid_disks - mddev->delta_disks;
 			     d++) {
-				rdev = rdev_mdlock_deref(mddev,
-							 conf->disks[d].rdev);
+				rdev = conf->disks[d].rdev;
 				if (rdev)
 					clear_bit(In_sync, &rdev->flags);
-				rdev = rdev_mdlock_deref(mddev,
-							 conf->disks[d].replacement);
+				rdev = conf->disks[d].replacement;
 				if (rdev)
 					clear_bit(In_sync, &rdev->flags);
 			}
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index 97a795979a35..9163c8cefb3f 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -473,8 +473,8 @@ enum {
  */
 
 struct disk_info {
-	struct md_rdev __rcu *rdev;
-	struct md_rdev __rcu *replacement;
+	struct md_rdev *rdev;
+	struct md_rdev *replacement;
 	struct page *extra_page; /* extra page to use in prexor */
 };
 
-- 
2.39.2