From nobody Thu Dec 18 01:02:10 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 142A0C001DF for ; Sat, 21 Oct 2023 02:25:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233186AbjJUCZm (ORCPT ); Fri, 20 Oct 2023 22:25:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35674 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232568AbjJUCZd (ORCPT ); Fri, 20 Oct 2023 22:25:33 -0400 Received: from dggsgout11.his.huawei.com (unknown [45.249.212.51]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8485810D0; Fri, 20 Oct 2023 19:25:28 -0700 (PDT) Received: from mail02.huawei.com (unknown [172.30.67.153]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4SC4zq0Z19z4f3nJY; Sat, 21 Oct 2023 10:25:11 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP4 (Coremail) with SMTP id gCh0CgAnt9aHNjNlZ+cUDg--.5642S7; Sat, 21 Oct 2023 10:25:14 +0800 (CST) From: Yu Kuai To: song@kernel.org Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai3@huawei.com, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com Subject: [PATCH -next v2 3/6] md/raid1: remove rcu protection to access rdev from conf Date: Sat, 21 Oct 2023 18:20:56 +0800 Message-Id: <20231021102059.3198284-4-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20231021102059.3198284-1-yukuai1@huaweicloud.com> References: <20231021102059.3198284-1-yukuai1@huaweicloud.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgAnt9aHNjNlZ+cUDg--.5642S7 X-Coremail-Antispam: 1UD129KBjvJXoW3WrWrJr43Wr47Aw1DXw17GFg_yoW3Wry3pw 43tas7JF4DX3s0gF1DAayDG3WSyryaqFWxJryfGw4I93s3KrZxtay8Gryaqry5CrZ8Ar15 X3W5K398CFyxKF7anT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUBE14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2jI8I6cxK62vIxIIY0VWUZVW8XwA2048vs2IY02 0E87I2jVAFwI0_JrWl82xGYIkIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2 F7IY1VAKz4vEj48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjx v20xvEc7CjxVAFwI0_Gr1j6F4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2 z280aVCY1x0267AKxVW0oVCq3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0V AKzVAqx4xG6I80ewAv7VC0I7IYx2IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1l Ox8S6xCaFVCjc4AY6r1j6r4UM4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErc IFxwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E14v2 6r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_JF0_Jw1lIxkGc2 Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUJVWUCwCI42IY6xIIjxv20xvEc7CjxVAFwI0_ Gr0_Cr1lIxAIcVCF04k26cxKx2IYs7xG6r1j6r1xMIIF0xvEx4A2jsIE14v26r1j6r4UMI IF0xvEx4A2jsIEc7CjxVAFwI0_Gr0_Gr1UYxBIdaVFxhVjvjDU0xZFpf9x0pRPEf5UUUUU = X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Yu Kuai Because it's safe to accees rdev from conf: - If any spinlock is held, because synchronize_rcu() from md_kick_rdev_from_array() will prevent 'rdev' to be freed until spinlock is released; - If 'reconfig_lock' is held, because rdev can't be added or removed from array; - If there is normal IO inflight, because mddev_suspend() will prevent rdev to be added or removed from array; - If there is sync IO inflight, because 'MD_RECOVERY_RUNNING' is checked in remove_and_add_spares(). And these will cover all the scenarios in raid1. Signed-off-by: Yu Kuai --- drivers/md/raid1.c | 57 +++++++++++++++++----------------------------- 1 file changed, 21 insertions(+), 36 deletions(-) diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index 4348d670439d..5c647036663d 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -609,7 +609,6 @@ static int read_balance(struct r1conf *conf, struct r1b= io *r1_bio, int *max_sect int choose_first; int choose_next_idle; =20 - rcu_read_lock(); /* * Check if we can balance. We can balance on the whole * device if no resync is going on, or below the resync window. @@ -642,7 +641,7 @@ static int read_balance(struct r1conf *conf, struct r1b= io *r1_bio, int *max_sect unsigned int pending; bool nonrot; =20 - rdev =3D rcu_dereference(conf->mirrors[disk].rdev); + rdev =3D conf->mirrors[disk].rdev; if (r1_bio->bios[disk] =3D=3D IO_BLOCKED || rdev =3D=3D NULL || test_bit(Faulty, &rdev->flags)) @@ -773,7 +772,7 @@ static int read_balance(struct r1conf *conf, struct r1b= io *r1_bio, int *max_sect } =20 if (best_disk >=3D 0) { - rdev =3D rcu_dereference(conf->mirrors[best_disk].rdev); + rdev =3D conf->mirrors[best_disk].rdev; if (!rdev) goto retry; atomic_inc(&rdev->nr_pending); @@ -784,7 +783,6 @@ static int read_balance(struct r1conf *conf, struct r1b= io *r1_bio, int *max_sect =20 conf->mirrors[best_disk].next_seq_sect =3D this_sector + sectors; } - rcu_read_unlock(); *max_sectors =3D sectors; =20 return best_disk; @@ -1235,14 +1233,12 @@ static void raid1_read_request(struct mddev *mddev,= struct bio *bio, =20 if (r1bio_existed) { /* Need to get the block device name carefully */ - struct md_rdev *rdev; - rcu_read_lock(); - rdev =3D rcu_dereference(conf->mirrors[r1_bio->read_disk].rdev); + struct md_rdev *rdev =3D conf->mirrors[r1_bio->read_disk].rdev; + if (rdev) snprintf(b, sizeof(b), "%pg", rdev->bdev); else strcpy(b, "???"); - rcu_read_unlock(); } =20 /* @@ -1396,10 +1392,9 @@ static void raid1_write_request(struct mddev *mddev,= struct bio *bio, =20 disks =3D conf->raid_disks * 2; blocked_rdev =3D NULL; - rcu_read_lock(); max_sectors =3D r1_bio->sectors; for (i =3D 0; i < disks; i++) { - struct md_rdev *rdev =3D rcu_dereference(conf->mirrors[i].rdev); + struct md_rdev *rdev =3D conf->mirrors[i].rdev; =20 /* * The write-behind io is only attempted on drives marked as @@ -1465,7 +1460,6 @@ static void raid1_write_request(struct mddev *mddev, = struct bio *bio, } r1_bio->bios[i] =3D bio; } - rcu_read_unlock(); =20 if (unlikely(blocked_rdev)) { /* Wait for this device to become unblocked */ @@ -1617,15 +1611,16 @@ static void raid1_status(struct seq_file *seq, stru= ct mddev *mddev) struct r1conf *conf =3D mddev->private; int i; =20 + lockdep_assert_held(&mddev->lock); + seq_printf(seq, " [%d/%d] [", conf->raid_disks, conf->raid_disks - mddev->degraded); - rcu_read_lock(); for (i =3D 0; i < conf->raid_disks; i++) { - struct md_rdev *rdev =3D rcu_dereference(conf->mirrors[i].rdev); + struct md_rdev *rdev =3D READ_ONCE(conf->mirrors[i].rdev); + seq_printf(seq, "%s", rdev && test_bit(In_sync, &rdev->flags) ? "U" : "_"); } - rcu_read_unlock(); seq_printf(seq, "]"); } =20 @@ -1785,7 +1780,7 @@ static int raid1_add_disk(struct mddev *mddev, struct= md_rdev *rdev) */ if (rdev->saved_raid_disk < 0) conf->fullsync =3D 1; - rcu_assign_pointer(p->rdev, rdev); + WRITE_ONCE(p->rdev, rdev); break; } if (test_bit(WantReplacement, &p->rdev->flags) && @@ -1801,7 +1796,7 @@ static int raid1_add_disk(struct mddev *mddev, struct= md_rdev *rdev) rdev->raid_disk =3D repl_slot; err =3D 0; conf->fullsync =3D 1; - rcu_assign_pointer(p[conf->raid_disks].rdev, rdev); + WRITE_ONCE(p[conf->raid_disks].rdev, rdev); } =20 return err; @@ -1835,7 +1830,7 @@ static int raid1_remove_disk(struct mddev *mddev, str= uct md_rdev *rdev) err =3D -EBUSY; goto abort; } - p->rdev =3D NULL; + WRITE_ONCE(p->rdev, NULL); if (conf->mirrors[conf->raid_disks + number].rdev) { /* We just removed a device that is being replaced. * Move down the replacement. We drain all IO before @@ -1856,7 +1851,7 @@ static int raid1_remove_disk(struct mddev *mddev, str= uct md_rdev *rdev) goto abort; } clear_bit(Replacement, &repl->flags); - p->rdev =3D repl; + WRITE_ONCE(p->rdev, repl); conf->mirrors[conf->raid_disks + number].rdev =3D NULL; unfreeze_array(conf); } @@ -2253,8 +2248,7 @@ static void fix_read_error(struct r1conf *conf, int r= ead_disk, sector_t first_bad; int bad_sectors; =20 - rcu_read_lock(); - rdev =3D rcu_dereference(conf->mirrors[d].rdev); + rdev =3D conf->mirrors[d].rdev; if (rdev && (test_bit(In_sync, &rdev->flags) || (!test_bit(Faulty, &rdev->flags) && @@ -2262,15 +2256,14 @@ static void fix_read_error(struct r1conf *conf, int= read_disk, is_badblock(rdev, sect, s, &first_bad, &bad_sectors) =3D=3D 0) { atomic_inc(&rdev->nr_pending); - rcu_read_unlock(); if (sync_page_io(rdev, sect, s<<9, conf->tmppage, REQ_OP_READ, false)) success =3D 1; rdev_dec_pending(rdev, mddev); if (success) break; - } else - rcu_read_unlock(); + } + d++; if (d =3D=3D conf->raid_disks * 2) d =3D 0; @@ -2289,29 +2282,24 @@ static void fix_read_error(struct r1conf *conf, int= read_disk, if (d=3D=3D0) d =3D conf->raid_disks * 2; d--; - rcu_read_lock(); - rdev =3D rcu_dereference(conf->mirrors[d].rdev); + rdev =3D conf->mirrors[d].rdev; if (rdev && !test_bit(Faulty, &rdev->flags)) { atomic_inc(&rdev->nr_pending); - rcu_read_unlock(); r1_sync_page_io(rdev, sect, s, conf->tmppage, WRITE); rdev_dec_pending(rdev, mddev); - } else - rcu_read_unlock(); + } } d =3D start; while (d !=3D read_disk) { if (d=3D=3D0) d =3D conf->raid_disks * 2; d--; - rcu_read_lock(); - rdev =3D rcu_dereference(conf->mirrors[d].rdev); + rdev =3D conf->mirrors[d].rdev; if (rdev && !test_bit(Faulty, &rdev->flags)) { atomic_inc(&rdev->nr_pending); - rcu_read_unlock(); if (r1_sync_page_io(rdev, sect, s, conf->tmppage, READ)) { atomic_add(s, &rdev->corrected_errors); @@ -2322,8 +2310,7 @@ static void fix_read_error(struct r1conf *conf, int r= ead_disk, rdev->bdev); } rdev_dec_pending(rdev, mddev); - } else - rcu_read_unlock(); + } } sectors -=3D s; sect +=3D s; @@ -2704,7 +2691,6 @@ static sector_t raid1_sync_request(struct mddev *mdde= v, sector_t sector_nr, =20 r1_bio =3D raid1_alloc_init_r1buf(conf); =20 - rcu_read_lock(); /* * If we get a correctably read error during resync or recovery, * we might want to read from a different device. So we @@ -2725,7 +2711,7 @@ static sector_t raid1_sync_request(struct mddev *mdde= v, sector_t sector_nr, struct md_rdev *rdev; bio =3D r1_bio->bios[i]; =20 - rdev =3D rcu_dereference(conf->mirrors[i].rdev); + rdev =3D conf->mirrors[i].rdev; if (rdev =3D=3D NULL || test_bit(Faulty, &rdev->flags)) { if (i < conf->raid_disks) @@ -2783,7 +2769,6 @@ static sector_t raid1_sync_request(struct mddev *mdde= v, sector_t sector_nr, bio->bi_opf |=3D MD_FAILFAST; } } - rcu_read_unlock(); if (disk < 0) disk =3D wonly; r1_bio->read_disk =3D disk; --=20 2.39.2