From nobody Thu Jan 30 18:55:19 2025 Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F28351FC11E; Mon, 27 Jan 2025 08:58:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737968300; cv=none; b=Qp4NMYtoR6UWNbCmKm6iwhsYKOO0Kj5idSIALzIEXpX8ZYEianPznU56ZD4nIzRYC7RNgNFunuRqGUpmFwF9nUmq4oRQ/GdnJg+uyyP8NhpCUBjQNbOAjI8BqzwulpXDfLkXkYeRPQH4ehqE7ONS2XCTzf0OdeVJqUH+b0nlp9o= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737968300; c=relaxed/simple; bh=P+uCVUy09YKWOrNy1GBTqFFjsuQfghM4T4zyRzBy3H8=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Xq6nF/l0vW9uLw2W7+1h0QnrnbZbTJCxIoRYSKt3YebWvJmct09/KkUFWgTpk/hrD+Rc4R6oEAW9DmPbU0oIlcWu0nA99xudTT00kjDAtEilZ2QmerEKxNu2jYspRIBMqlRkDI8L1I4dAY+Yk3s+cyAasuOgxNuUC4M9979HzZk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=none smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.216]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTP id 4YhMkq2fGGz4f3kFl; Mon, 27 Jan 2025 16:57:55 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id 2EC5E1A1445; Mon, 27 Jan 2025 16:58:16 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP4 (Coremail) with SMTP id gCh0CgAXe1+mSpdnShB+CA--.53281S5; Mon, 27 Jan 2025 16:58:16 +0800 (CST) From: Yu Kuai To: stable@vger.kernel.org, gregkh@linuxfoundation.org, song@kernel.org, yukuai3@huawei.com Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com Subject: [PATCH 6.12 1/5] md/md-bitmap: factor behind write counters out from bitmap_{start/end}write() Date: Mon, 27 Jan 2025 16:52:10 +0800 Message-Id: <20250127085214.3197761-2-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20250127085214.3197761-1-yukuai1@huaweicloud.com> References: <20250127085214.3197761-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgAXe1+mSpdnShB+CA--.53281S5 X-Coremail-Antispam: 1UD129KBjvJXoW3Jw4kCFW8Ar48JF48KFWDXFb_yoWfur4kpa 9FqFy3CrW5trW3Xw1DAFyDuF1Fvw1ktr9rtrWfW3sYg3Wqyr90gF4rWFWrKwn8AFy3ZFW3 Zan8trWUCrW2qFUanT9S1TB71UUUUUDqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUBC14x267AKxVW5JVWrJwAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_Jr4l82xGYIkIc2 x26xkF7I0E14v26r1I6r4UM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0 Y4vE2Ix0cI8IcVAFwI0_Xr0_Ar1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJw A2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq3wAS 0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7IYx2 IY67AKxVWUXVWUAwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4UM4x0 Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwCY1x0262kKe7AKxVW8ZVWrXw CF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E14v26r1j 6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_Jw0_GFylIxkGc2Ij64 vIr41lIxAIcVC0I7IYx2IY67AKxVWUCVW8JwCI42IY6xIIjxv20xvEc7CjxVAFwI0_Cr0_ Gr1UMIIF0xvE42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVWUJVW8JwCI42 IY6I8E87Iv6xkF7I0E14v26r4j6r4UJbIYCTnIWIevJa73UjIFyTuYvjTRuGQ6DUUUU X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ Content-Type: text/plain; charset="utf-8" From: Yu Kuai commit 08c50142a128dcb2d7060aa3b4c5db8837f7a46a upstream. behind_write is only used in raid1, prepare to refactor bitmap_{start/end}write(), there are no functional changes. Signed-off-by: Yu Kuai Reviewed-by: Xiao Ni Link: https://lore.kernel.org/r/20250109015145.158868-2-yukuai1@huaweicloud= .com Signed-off-by: Song Liu --- drivers/md/md-bitmap.c | 57 +++++++++++++++++++++++++--------------- drivers/md/md-bitmap.h | 7 +++-- drivers/md/raid1.c | 12 +++++---- drivers/md/raid10.c | 6 ++--- drivers/md/raid5-cache.c | 3 +-- drivers/md/raid5.c | 11 ++++---- 6 files changed, 56 insertions(+), 40 deletions(-) diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c index c3a42dd66ce5..84fb4cc67d5e 100644 --- a/drivers/md/md-bitmap.c +++ b/drivers/md/md-bitmap.c @@ -1671,24 +1671,13 @@ __acquires(bitmap->lock) } =20 static int bitmap_startwrite(struct mddev *mddev, sector_t offset, - unsigned long sectors, bool behind) + unsigned long sectors) { struct bitmap *bitmap =3D mddev->bitmap; =20 if (!bitmap) return 0; =20 - if (behind) { - int bw; - atomic_inc(&bitmap->behind_writes); - bw =3D atomic_read(&bitmap->behind_writes); - if (bw > bitmap->behind_writes_used) - bitmap->behind_writes_used =3D bw; - - pr_debug("inc write-behind count %d/%lu\n", - bw, bitmap->mddev->bitmap_info.max_write_behind); - } - while (sectors) { sector_t blocks; bitmap_counter_t *bmc; @@ -1737,21 +1726,13 @@ static int bitmap_startwrite(struct mddev *mddev, s= ector_t offset, } =20 static void bitmap_endwrite(struct mddev *mddev, sector_t offset, - unsigned long sectors, bool success, bool behind) + unsigned long sectors, bool success) { struct bitmap *bitmap =3D mddev->bitmap; =20 if (!bitmap) return; =20 - if (behind) { - if (atomic_dec_and_test(&bitmap->behind_writes)) - wake_up(&bitmap->behind_wait); - pr_debug("dec write-behind count %d/%lu\n", - atomic_read(&bitmap->behind_writes), - bitmap->mddev->bitmap_info.max_write_behind); - } - while (sectors) { sector_t blocks; unsigned long flags; @@ -2062,6 +2043,37 @@ static void md_bitmap_free(void *data) kfree(bitmap); } =20 +static void bitmap_start_behind_write(struct mddev *mddev) +{ + struct bitmap *bitmap =3D mddev->bitmap; + int bw; + + if (!bitmap) + return; + + atomic_inc(&bitmap->behind_writes); + bw =3D atomic_read(&bitmap->behind_writes); + if (bw > bitmap->behind_writes_used) + bitmap->behind_writes_used =3D bw; + + pr_debug("inc write-behind count %d/%lu\n", + bw, bitmap->mddev->bitmap_info.max_write_behind); +} + +static void bitmap_end_behind_write(struct mddev *mddev) +{ + struct bitmap *bitmap =3D mddev->bitmap; + + if (!bitmap) + return; + + if (atomic_dec_and_test(&bitmap->behind_writes)) + wake_up(&bitmap->behind_wait); + pr_debug("dec write-behind count %d/%lu\n", + atomic_read(&bitmap->behind_writes), + bitmap->mddev->bitmap_info.max_write_behind); +} + static void bitmap_wait_behind_writes(struct mddev *mddev) { struct bitmap *bitmap =3D mddev->bitmap; @@ -2981,6 +2993,9 @@ static struct bitmap_operations bitmap_ops =3D { .dirty_bits =3D bitmap_dirty_bits, .unplug =3D bitmap_unplug, .daemon_work =3D bitmap_daemon_work, + + .start_behind_write =3D bitmap_start_behind_write, + .end_behind_write =3D bitmap_end_behind_write, .wait_behind_writes =3D bitmap_wait_behind_writes, =20 .startwrite =3D bitmap_startwrite, diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h index 662e6fc141a7..e87a1f493d3c 100644 --- a/drivers/md/md-bitmap.h +++ b/drivers/md/md-bitmap.h @@ -84,12 +84,15 @@ struct bitmap_operations { unsigned long e); void (*unplug)(struct mddev *mddev, bool sync); void (*daemon_work)(struct mddev *mddev); + + void (*start_behind_write)(struct mddev *mddev); + void (*end_behind_write)(struct mddev *mddev); void (*wait_behind_writes)(struct mddev *mddev); =20 int (*startwrite)(struct mddev *mddev, sector_t offset, - unsigned long sectors, bool behind); + unsigned long sectors); void (*endwrite)(struct mddev *mddev, sector_t offset, - unsigned long sectors, bool success, bool behind); + unsigned long sectors, bool success); bool (*start_sync)(struct mddev *mddev, sector_t offset, sector_t *blocks, bool degraded); void (*end_sync)(struct mddev *mddev, sector_t offset, sector_t *blocks); diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index 6c9d24203f39..94d50ee8d9b9 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -420,10 +420,11 @@ static void close_write(struct r1bio *r1_bio) r1_bio->behind_master_bio =3D NULL; } =20 + if (test_bit(R1BIO_BehindIO, &r1_bio->state)) + mddev->bitmap_ops->end_behind_write(mddev); /* clear the bitmap if all writes complete successfully */ mddev->bitmap_ops->endwrite(mddev, r1_bio->sector, r1_bio->sectors, - !test_bit(R1BIO_Degraded, &r1_bio->state), - test_bit(R1BIO_BehindIO, &r1_bio->state)); + !test_bit(R1BIO_Degraded, &r1_bio->state)); md_write_end(mddev); } =20 @@ -1611,9 +1612,10 @@ static void raid1_write_request(struct mddev *mddev,= struct bio *bio, stats.behind_writes < max_write_behind) alloc_behind_master_bio(r1_bio, bio); =20 - mddev->bitmap_ops->startwrite( - mddev, r1_bio->sector, r1_bio->sectors, - test_bit(R1BIO_BehindIO, &r1_bio->state)); + if (test_bit(R1BIO_BehindIO, &r1_bio->state)) + mddev->bitmap_ops->start_behind_write(mddev); + mddev->bitmap_ops->startwrite(mddev, r1_bio->sector, + r1_bio->sectors); first_clone =3D 0; } =20 diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index 862b1fb71d86..80873bd65851 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -430,8 +430,7 @@ static void close_write(struct r10bio *r10_bio) =20 /* clear the bitmap if all writes complete successfully */ mddev->bitmap_ops->endwrite(mddev, r10_bio->sector, r10_bio->sectors, - !test_bit(R10BIO_Degraded, &r10_bio->state), - false); + !test_bit(R10BIO_Degraded, &r10_bio->state)); md_write_end(mddev); } =20 @@ -1493,8 +1492,7 @@ static void raid10_write_request(struct mddev *mddev,= struct bio *bio, md_account_bio(mddev, &bio); r10_bio->master_bio =3D bio; atomic_set(&r10_bio->remaining, 1); - mddev->bitmap_ops->startwrite(mddev, r10_bio->sector, r10_bio->sectors, - false); + mddev->bitmap_ops->startwrite(mddev, r10_bio->sector, r10_bio->sectors); =20 for (i =3D 0; i < conf->copies; i++) { if (r10_bio->devs[i].bio) diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c index b4f7b79fd187..4c7ecdd5c1f3 100644 --- a/drivers/md/raid5-cache.c +++ b/drivers/md/raid5-cache.c @@ -315,8 +315,7 @@ void r5c_handle_cached_data_endio(struct r5conf *conf, r5c_return_dev_pending_writes(conf, &sh->dev[i]); conf->mddev->bitmap_ops->endwrite(conf->mddev, sh->sector, RAID5_STRIPE_SECTORS(conf), - !test_bit(STRIPE_DEGRADED, &sh->state), - false); + !test_bit(STRIPE_DEGRADED, &sh->state)); } } } diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 2fa1f270fb1d..d08d64ae7bc0 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -3564,7 +3564,7 @@ static void __add_stripe_bio(struct stripe_head *sh, = struct bio *bi, set_bit(STRIPE_BITMAP_PENDING, &sh->state); spin_unlock_irq(&sh->stripe_lock); conf->mddev->bitmap_ops->startwrite(conf->mddev, sh->sector, - RAID5_STRIPE_SECTORS(conf), false); + RAID5_STRIPE_SECTORS(conf)); spin_lock_irq(&sh->stripe_lock); clear_bit(STRIPE_BITMAP_PENDING, &sh->state); if (!sh->batch_head) { @@ -3665,7 +3665,7 @@ handle_failed_stripe(struct r5conf *conf, struct stri= pe_head *sh, if (bitmap_end) conf->mddev->bitmap_ops->endwrite(conf->mddev, sh->sector, RAID5_STRIPE_SECTORS(conf), - false, false); + false); bitmap_end =3D 0; /* and fail all 'written' */ bi =3D sh->dev[i].written; @@ -3712,7 +3712,7 @@ handle_failed_stripe(struct r5conf *conf, struct stri= pe_head *sh, if (bitmap_end) conf->mddev->bitmap_ops->endwrite(conf->mddev, sh->sector, RAID5_STRIPE_SECTORS(conf), - false, false); + false); /* If we were in the middle of a write the parity block might * still be locked - so just clear all R5_LOCKED flags */ @@ -4063,8 +4063,7 @@ static void handle_stripe_clean_event(struct r5conf *= conf, } conf->mddev->bitmap_ops->endwrite(conf->mddev, sh->sector, RAID5_STRIPE_SECTORS(conf), - !test_bit(STRIPE_DEGRADED, &sh->state), - false); + !test_bit(STRIPE_DEGRADED, &sh->state)); if (head_sh->batch_head) { sh =3D list_first_entry(&sh->batch_list, struct stripe_head, @@ -5788,7 +5787,7 @@ static void make_discard_request(struct mddev *mddev,= struct bio *bi) for (d =3D 0; d < conf->raid_disks - conf->max_degraded; d++) mddev->bitmap_ops->startwrite(mddev, sh->sector, - RAID5_STRIPE_SECTORS(conf), false); + RAID5_STRIPE_SECTORS(conf)); sh->bm_seq =3D conf->seq_flush + 1; set_bit(STRIPE_BIT_DELAY, &sh->state); } --=20 2.39.2 From nobody Thu Jan 30 18:55:19 2025 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C3C541FC7FF; Mon, 27 Jan 2025 08:58:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737968302; cv=none; b=IhX1eOPA+Ru5v/5DMjiOdJMzpbt9X7M5QS6v5AepP6dUsKHtD++Dztsi6K8yZHdlcr2u4oiYPphPM8i4rJjv4SffmyIr6k+SkTp95pXgfeXcmElYOSxSTeiPBGh3VFX22vA1YQP+pmi5WRRLaS4fAmQ73UF+vatMrtEl0Nd1BNQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737968302; c=relaxed/simple; bh=OopqUzPMWaYDUBSDPwv3uuP3M+byAlVD0mZcWUJiofA=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Dnm2F2gPm+MdkzrzkcniMVtWeawWRXueoKYpX8zUuRMPqE6wHgZUiGnu9c65PpR2TC6pVyw+ALVG7k7Gg/oUshjgy0pdG8NROf0pu60rWUmqnYMEzzzT0Ci2hH3G6W5zjoS7E/0qDBw/0MGepi465124fv1Q2OjSagCXDfvHW3k= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=none smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.93.142]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4YhMkx0DTtz4f3jqx; Mon, 27 Jan 2025 16:58:01 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id 976451A07C0; Mon, 27 Jan 2025 16:58:16 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP4 (Coremail) with SMTP id gCh0CgAXe1+mSpdnShB+CA--.53281S6; Mon, 27 Jan 2025 16:58:16 +0800 (CST) From: Yu Kuai To: stable@vger.kernel.org, gregkh@linuxfoundation.org, song@kernel.org, yukuai3@huawei.com Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com Subject: [PATCH 6.12 2/5] md/md-bitmap: remove the last parameter for bimtap_ops->endwrite() Date: Mon, 27 Jan 2025 16:52:11 +0800 Message-Id: <20250127085214.3197761-3-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20250127085214.3197761-1-yukuai1@huaweicloud.com> References: <20250127085214.3197761-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgAXe1+mSpdnShB+CA--.53281S6 X-Coremail-Antispam: 1UD129KBjvAXoW3tFyUAr1rZw4fKFWDtrW3Awb_yoW8JrW3uo Z5Zwn3ua1Fqw1rWFyDtr1ftFW3Was0k3ySy3WfWrs8WrZxXa4Fqr1xGrWfG3yUtr1fZFW3 Zr9Fqr48JF4rAw17n29KB7ZKAUJUUUU5529EdanIXcx71UUUUU7v73VFW2AGmfu7bjvjm3 AaLaJ3UjIYCTnIWjp_UUUYE7AC8VAFwI0_Wr0E3s1l1xkIjI8I6I8E6xAIw20EY4v20xva j40_Wr0E3s1l1IIY67AEw4v_Jr0_Jr4l82xGYIkIc2x26280x7IE14v26r15M28IrcIa0x kI8VCY1x0267AKxVW8JVW5JwA2ocxC64kIII0Yj41l84x0c7CEw4AK67xGY2AK021l84AC jcxK6xIIjxv20xvE14v26ryj6F1UM28EF7xvwVC0I7IYx2IY6xkF7I0E14v26r4UJVWxJr 1l84ACjcxK6I8E87Iv67AKxVW0oVCq3wA2z4x0Y4vEx4A2jsIEc7CjxVAFwI0_GcCE3s1l e2I262IYc4CY6c8Ij28IcVAaY2xG8wAqx4xG64xvF2IEw4CE5I8CrVC2j2WlYx0E2Ix0cI 8IcVAFwI0_Jrv_JF1lYx0Ex4A2jsIE14v26r1j6r4UMcvjeVCFs4IE7xkEbVWUJVW8JwAC jcxG0xvY0x0EwIxGrwACjI8F5VA0II8E6IAqYI8I648v4I1lc7CjxVAaw2AFwI0_GFv_Wr yl42xK82IYc2Ij64vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWU JVWUGwC20s026x8GjcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r1q6r43MIIYrxkI7V AKI48JMIIF0xvE2Ix0cI8IcVAFwI0_JFI_Gr1lIxAIcVC0I7IYx2IY6xkF7I0E14v26F4j 6r4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Jr0_Gr1lIx AIcVC2z280aVCY1x0267AKxVW8JVW8JrUvcSsGvfC2KfnxnUUI43ZEXa7sR_jXHtUUUUU= = X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ Content-Type: text/plain; charset="utf-8" From: Yu Kuai commit 4f0e7d0e03b7b80af84759a9e7cfb0f81ac4adae upstream. For the case that IO failed for one rdev, the bit will be mark as NEEDED in following cases: 1) If badblocks is set and rdev is not faulty; 2) If rdev is faulty; Case 1) is useless because synchronize data to badblocks make no sense. Case 2) can be replaced with mddev->degraded. Also remove R1BIO_Degraded, R10BIO_Degraded and STRIPE_DEGRADED since case 2) no longer use them. Signed-off-by: Yu Kuai Link: https://lore.kernel.org/r/20250109015145.158868-3-yukuai1@huaweicloud= .com Signed-off-by: Song Liu --- drivers/md/md-bitmap.c | 19 ++++++++++--------- drivers/md/md-bitmap.h | 2 +- drivers/md/raid1.c | 26 +++----------------------- drivers/md/raid1.h | 1 - drivers/md/raid10.c | 23 +++-------------------- drivers/md/raid10.h | 1 - drivers/md/raid5-cache.c | 3 +-- drivers/md/raid5.c | 15 +++------------ drivers/md/raid5.h | 1 - 9 files changed, 21 insertions(+), 70 deletions(-) diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c index 84fb4cc67d5e..b40a84b01085 100644 --- a/drivers/md/md-bitmap.c +++ b/drivers/md/md-bitmap.c @@ -1726,7 +1726,7 @@ static int bitmap_startwrite(struct mddev *mddev, sec= tor_t offset, } =20 static void bitmap_endwrite(struct mddev *mddev, sector_t offset, - unsigned long sectors, bool success) + unsigned long sectors) { struct bitmap *bitmap =3D mddev->bitmap; =20 @@ -1745,15 +1745,16 @@ static void bitmap_endwrite(struct mddev *mddev, se= ctor_t offset, return; } =20 - if (success && !bitmap->mddev->degraded && - bitmap->events_cleared < bitmap->mddev->events) { - bitmap->events_cleared =3D bitmap->mddev->events; - bitmap->need_sync =3D 1; - sysfs_notify_dirent_safe(bitmap->sysfs_can_clear); - } - - if (!success && !NEEDED(*bmc)) + if (!bitmap->mddev->degraded) { + if (bitmap->events_cleared < bitmap->mddev->events) { + bitmap->events_cleared =3D bitmap->mddev->events; + bitmap->need_sync =3D 1; + sysfs_notify_dirent_safe( + bitmap->sysfs_can_clear); + } + } else if (!NEEDED(*bmc)) { *bmc |=3D NEEDED_MASK; + } =20 if (COUNTER(*bmc) =3D=3D COUNTER_MAX) wake_up(&bitmap->overflow_wait); diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h index e87a1f493d3c..31c93019c76b 100644 --- a/drivers/md/md-bitmap.h +++ b/drivers/md/md-bitmap.h @@ -92,7 +92,7 @@ struct bitmap_operations { int (*startwrite)(struct mddev *mddev, sector_t offset, unsigned long sectors); void (*endwrite)(struct mddev *mddev, sector_t offset, - unsigned long sectors, bool success); + unsigned long sectors); bool (*start_sync)(struct mddev *mddev, sector_t offset, sector_t *blocks, bool degraded); void (*end_sync)(struct mddev *mddev, sector_t offset, sector_t *blocks); diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index 94d50ee8d9b9..85521579203c 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -423,8 +423,7 @@ static void close_write(struct r1bio *r1_bio) if (test_bit(R1BIO_BehindIO, &r1_bio->state)) mddev->bitmap_ops->end_behind_write(mddev); /* clear the bitmap if all writes complete successfully */ - mddev->bitmap_ops->endwrite(mddev, r1_bio->sector, r1_bio->sectors, - !test_bit(R1BIO_Degraded, &r1_bio->state)); + mddev->bitmap_ops->endwrite(mddev, r1_bio->sector, r1_bio->sectors); md_write_end(mddev); } =20 @@ -481,8 +480,6 @@ static void raid1_end_write_request(struct bio *bio) if (!test_bit(Faulty, &rdev->flags)) set_bit(R1BIO_WriteError, &r1_bio->state); else { - /* Fail the request */ - set_bit(R1BIO_Degraded, &r1_bio->state); /* Finished with this branch */ r1_bio->bios[mirror] =3D NULL; to_put =3D bio; @@ -1493,11 +1490,8 @@ static void raid1_write_request(struct mddev *mddev,= struct bio *bio, break; } r1_bio->bios[i] =3D NULL; - if (!rdev || test_bit(Faulty, &rdev->flags)) { - if (i < conf->raid_disks) - set_bit(R1BIO_Degraded, &r1_bio->state); + if (!rdev || test_bit(Faulty, &rdev->flags)) continue; - } =20 atomic_inc(&rdev->nr_pending); if (test_bit(WriteErrorSeen, &rdev->flags)) { @@ -1523,16 +1517,6 @@ static void raid1_write_request(struct mddev *mddev,= struct bio *bio, */ max_sectors =3D bad_sectors; rdev_dec_pending(rdev, mddev); - /* We don't set R1BIO_Degraded as that - * only applies if the disk is - * missing, so it might be re-added, - * and we want to know to recover this - * chunk. - * In this case the device is here, - * and the fact that this chunk is not - * in-sync is recorded in the bad - * block log - */ continue; } if (is_bad) { @@ -2569,12 +2553,10 @@ static void handle_write_finished(struct r1conf *co= nf, struct r1bio *r1_bio) * errors. */ fail =3D true; - if (!narrow_write_error(r1_bio, m)) { + if (!narrow_write_error(r1_bio, m)) md_error(conf->mddev, conf->mirrors[m].rdev); /* an I/O failed, we can't clear the bitmap */ - set_bit(R1BIO_Degraded, &r1_bio->state); - } rdev_dec_pending(conf->mirrors[m].rdev, conf->mddev); } @@ -2665,8 +2647,6 @@ static void raid1d(struct md_thread *thread) list_del(&r1_bio->retry_list); idx =3D sector_to_idx(r1_bio->sector); atomic_dec(&conf->nr_queued[idx]); - if (mddev->degraded) - set_bit(R1BIO_Degraded, &r1_bio->state); if (test_bit(R1BIO_WriteError, &r1_bio->state)) close_write(r1_bio); raid_end_bio_io(r1_bio); diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h index 5300cbaa58a4..33f318fcc268 100644 --- a/drivers/md/raid1.h +++ b/drivers/md/raid1.h @@ -188,7 +188,6 @@ struct r1bio { enum r1bio_state { R1BIO_Uptodate, R1BIO_IsSync, - R1BIO_Degraded, R1BIO_BehindIO, /* Set ReadError on bios that experience a readerror so that * raid1d knows what to do with them. diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index 80873bd65851..f6aa56a7d400 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -429,8 +429,7 @@ static void close_write(struct r10bio *r10_bio) struct mddev *mddev =3D r10_bio->mddev; =20 /* clear the bitmap if all writes complete successfully */ - mddev->bitmap_ops->endwrite(mddev, r10_bio->sector, r10_bio->sectors, - !test_bit(R10BIO_Degraded, &r10_bio->state)); + mddev->bitmap_ops->endwrite(mddev, r10_bio->sector, r10_bio->sectors); md_write_end(mddev); } =20 @@ -500,7 +499,6 @@ static void raid10_end_write_request(struct bio *bio) set_bit(R10BIO_WriteError, &r10_bio->state); else { /* Fail the request */ - set_bit(R10BIO_Degraded, &r10_bio->state); r10_bio->devs[slot].bio =3D NULL; to_put =3D bio; dec_rdev =3D 1; @@ -1429,10 +1427,8 @@ static void raid10_write_request(struct mddev *mddev= , struct bio *bio, r10_bio->devs[i].bio =3D NULL; r10_bio->devs[i].repl_bio =3D NULL; =20 - if (!rdev && !rrdev) { - set_bit(R10BIO_Degraded, &r10_bio->state); + if (!rdev && !rrdev) continue; - } if (rdev && test_bit(WriteErrorSeen, &rdev->flags)) { sector_t first_bad; sector_t dev_sector =3D r10_bio->devs[i].addr; @@ -1449,14 +1445,6 @@ static void raid10_write_request(struct mddev *mddev= , struct bio *bio, * to other devices yet */ max_sectors =3D bad_sectors; - /* We don't set R10BIO_Degraded as that - * only applies if the disk is missing, - * so it might be re-added, and we want to - * know to recover this chunk. - * In this case the device is here, and the - * fact that this chunk is not in-sync is - * recorded in the bad block log. - */ continue; } if (is_bad) { @@ -2908,11 +2896,8 @@ static void handle_write_completed(struct r10conf *c= onf, struct r10bio *r10_bio) rdev_dec_pending(rdev, conf->mddev); } else if (bio !=3D NULL && bio->bi_status) { fail =3D true; - if (!narrow_write_error(r10_bio, m)) { + if (!narrow_write_error(r10_bio, m)) md_error(conf->mddev, rdev); - set_bit(R10BIO_Degraded, - &r10_bio->state); - } rdev_dec_pending(rdev, conf->mddev); } bio =3D r10_bio->devs[m].repl_bio; @@ -2971,8 +2956,6 @@ static void raid10d(struct md_thread *thread) r10_bio =3D list_first_entry(&tmp, struct r10bio, retry_list); list_del(&r10_bio->retry_list); - if (mddev->degraded) - set_bit(R10BIO_Degraded, &r10_bio->state); =20 if (test_bit(R10BIO_WriteError, &r10_bio->state)) diff --git a/drivers/md/raid10.h b/drivers/md/raid10.h index 2e75e88d0802..3f16ad6904a9 100644 --- a/drivers/md/raid10.h +++ b/drivers/md/raid10.h @@ -161,7 +161,6 @@ enum r10bio_state { R10BIO_IsSync, R10BIO_IsRecover, R10BIO_IsReshape, - R10BIO_Degraded, /* Set ReadError on bios that experience a read error * so that raid10d knows what to do with them. */ diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c index 4c7ecdd5c1f3..ba4f9577c737 100644 --- a/drivers/md/raid5-cache.c +++ b/drivers/md/raid5-cache.c @@ -314,8 +314,7 @@ void r5c_handle_cached_data_endio(struct r5conf *conf, set_bit(R5_UPTODATE, &sh->dev[i].flags); r5c_return_dev_pending_writes(conf, &sh->dev[i]); conf->mddev->bitmap_ops->endwrite(conf->mddev, - sh->sector, RAID5_STRIPE_SECTORS(conf), - !test_bit(STRIPE_DEGRADED, &sh->state)); + sh->sector, RAID5_STRIPE_SECTORS(conf)); } } } diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index d08d64ae7bc0..4482e1b1e4e3 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -1345,8 +1345,6 @@ static void ops_run_io(struct stripe_head *sh, struct= stripe_head_state *s) submit_bio_noacct(rbi); } if (!rdev && !rrdev) { - if (op_is_write(op)) - set_bit(STRIPE_DEGRADED, &sh->state); pr_debug("skip op %d on disc %d for sector %llu\n", bi->bi_opf, i, (unsigned long long)sh->sector); clear_bit(R5_LOCKED, &sh->dev[i].flags); @@ -2884,7 +2882,6 @@ static void raid5_end_write_request(struct bio *bi) set_bit(R5_MadeGoodRepl, &sh->dev[i].flags); } else { if (bi->bi_status) { - set_bit(STRIPE_DEGRADED, &sh->state); set_bit(WriteErrorSeen, &rdev->flags); set_bit(R5_WriteError, &sh->dev[i].flags); if (!test_and_set_bit(WantReplacement, &rdev->flags)) @@ -3664,8 +3661,7 @@ handle_failed_stripe(struct r5conf *conf, struct stri= pe_head *sh, } if (bitmap_end) conf->mddev->bitmap_ops->endwrite(conf->mddev, - sh->sector, RAID5_STRIPE_SECTORS(conf), - false); + sh->sector, RAID5_STRIPE_SECTORS(conf)); bitmap_end =3D 0; /* and fail all 'written' */ bi =3D sh->dev[i].written; @@ -3711,8 +3707,7 @@ handle_failed_stripe(struct r5conf *conf, struct stri= pe_head *sh, } if (bitmap_end) conf->mddev->bitmap_ops->endwrite(conf->mddev, - sh->sector, RAID5_STRIPE_SECTORS(conf), - false); + sh->sector, RAID5_STRIPE_SECTORS(conf)); /* If we were in the middle of a write the parity block might * still be locked - so just clear all R5_LOCKED flags */ @@ -4062,8 +4057,7 @@ static void handle_stripe_clean_event(struct r5conf *= conf, wbi =3D wbi2; } conf->mddev->bitmap_ops->endwrite(conf->mddev, - sh->sector, RAID5_STRIPE_SECTORS(conf), - !test_bit(STRIPE_DEGRADED, &sh->state)); + sh->sector, RAID5_STRIPE_SECTORS(conf)); if (head_sh->batch_head) { sh =3D list_first_entry(&sh->batch_list, struct stripe_head, @@ -4340,7 +4334,6 @@ static void handle_parity_checks5(struct r5conf *conf= , struct stripe_head *sh, s->locked++; set_bit(R5_Wantwrite, &dev->flags); =20 - clear_bit(STRIPE_DEGRADED, &sh->state); set_bit(STRIPE_INSYNC, &sh->state); break; case check_state_run: @@ -4497,7 +4490,6 @@ static void handle_parity_checks6(struct r5conf *conf= , struct stripe_head *sh, clear_bit(R5_Wantwrite, &dev->flags); s->locked--; } - clear_bit(STRIPE_DEGRADED, &sh->state); =20 set_bit(STRIPE_INSYNC, &sh->state); break; @@ -4900,7 +4892,6 @@ static void break_stripe_batch_list(struct stripe_hea= d *head_sh, =20 set_mask_bits(&sh->state, ~(STRIPE_EXPAND_SYNC_FLAGS | (1 << STRIPE_PREREAD_ACTIVE) | - (1 << STRIPE_DEGRADED) | (1 << STRIPE_ON_UNPLUG_LIST)), head_sh->state & (1 << STRIPE_INSYNC)); =20 diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h index 896ecfc4afa6..538843123d92 100644 --- a/drivers/md/raid5.h +++ b/drivers/md/raid5.h @@ -358,7 +358,6 @@ enum { STRIPE_REPLACED, STRIPE_PREREAD_ACTIVE, STRIPE_DELAYED, - STRIPE_DEGRADED, STRIPE_BIT_DELAY, STRIPE_EXPANDING, STRIPE_EXPAND_SOURCE, --=20 2.39.2 From nobody Thu Jan 30 18:55:19 2025 Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E82B11FCCF9; Mon, 27 Jan 2025 08:58:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737968301; cv=none; b=XuYNR6HJoT3gMUXTSnZVuV2s8bt1NTCjBdcI3R+BrUN+fnxldzV+ulQAvjKBcl5V1+null2g3Vl6GI2bB8Foxt9sjLrZnqHmW1aCf7vo6n1qBqqO5eBiCYM4QvwLDfDaeKd5wf3a2Pc+FBAg8uWyqVSxA9BazhK9clx4hZWjoXY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737968301; c=relaxed/simple; bh=y/LXVW6oH4u4C6MXkjImW0HbM2iucpRfObOtzgX0T2w=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=oNsxaBtExPVRUlSfnNffvIwzNAhiB+nQ6RknramCwFJh9bYdeQvB/x1dpse9to4aNulEfllX+hxKvhd47KJjMasrFub3MbI06jknPIwrTHvdIFa2BxkUFpl+6MGBioe7nJy0S5LbQjUrZlinGkXY1qUzGtE+qw3+O354J4HCW9A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.93.142]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTP id 4YhMkr1hH1z4f3jXb; Mon, 27 Jan 2025 16:57:56 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id 0AF791A0D8C; Mon, 27 Jan 2025 16:58:17 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP4 (Coremail) with SMTP id gCh0CgAXe1+mSpdnShB+CA--.53281S7; Mon, 27 Jan 2025 16:58:16 +0800 (CST) From: Yu Kuai To: stable@vger.kernel.org, gregkh@linuxfoundation.org, song@kernel.org, yukuai3@huawei.com Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com Subject: [PATCH 6.12 3/5] md: add a new callback pers->bitmap_sector() Date: Mon, 27 Jan 2025 16:52:12 +0800 Message-Id: <20250127085214.3197761-4-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20250127085214.3197761-1-yukuai1@huaweicloud.com> References: <20250127085214.3197761-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgAXe1+mSpdnShB+CA--.53281S7 X-Coremail-Antispam: 1UD129KBjvdXoW7GF47Cry8KF4UJFW5uw4Durg_yoWDArc_Cr nIqryfGFn8Grn5tr18ur1SvrWqywn3WF4DWFy7KFyfZF95t34fArWvkFyrJw1fXF95u3sx JryUXrs5ZFnrJjkaLaAFLSUrUUUUjb8apTn2vfkv8UJUUUU8Yxn0WfASr-VFAUDa7-sFnT 9fnUUIcSsGvfJTRUUUb6xFF20E14v26rWj6s0DM7CY07I20VC2zVCF04k26cxKx2IYs7xG 6rWj6s0DM7CIcVAFz4kK6r1j6r18M28IrcIa0xkI8VA2jI8067AKxVWUWwA2048vs2IY02 0Ec7CjxVAFwI0_Xr0E3s1l8cAvFVAK0II2c7xJM28CjxkF64kEwVA0rcxSw2x7M28EF7xv wVC0I7IYx2IY67AKxVW5JVW7JwA2z4x0Y4vE2Ix0cI8IcVCY1x0267AKxVW8Jr0_Cr1UM2 8EF7xvwVC2z280aVAFwI0_GcCE3s1l84ACjcxK6I8E87Iv6xkF7I0E14v26rxl6s0DM2AI xVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4xI64kE6c02F40Ex7xfMcIj6xIIjxv20x vE14v26r1Y6r17McIj6I8E87Iv67AKxVWUJVW8JwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xv r2IYc2Ij64vIr41lF7I21c0EjII2zVCS5cI20VAGYxC7MxkF7I0En4kS14v26r1q6r43Mx AIw28IcxkI7VAKI48JMxC20s026xCaFVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_ Jr4lx2IqxVCjr7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVWUtVW8ZwCIc40Y0x0EwI xGrwCI42IY6xIIjxv20xvE14v26r1I6r4UMIIF0xvE2Ix0cI8IcVCY1x0267AKxVWxJVW8 Jr1lIxAIcVCF04k26cxKx2IYs7xG6r1j6r1xMIIF0xvEx4A2jsIE14v26r1j6r4UMIIF0x vEx4A2jsIEc7CjxVAFwI0_Gr0_Gr1UYxBIdaVFxhVjvjDU0xZFpf9x0JULBMNUUUUU= X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ Content-Type: text/plain; charset="utf-8" From: Yu Kuai commit 0c984a283a3ea3f10bebecd6c57c1d41b2e4f518 upstream. This callback will be used in raid5 to convert io ranges from array to bitmap. Signed-off-by: Yu Kuai Reviewed-by: Xiao Ni Link: https://lore.kernel.org/r/20250109015145.158868-4-yukuai1@huaweicloud= .com Signed-off-by: Song Liu --- drivers/md/md.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/md/md.h b/drivers/md/md.h index 5d2e6bd58e4d..6836d6684cc1 100644 --- a/drivers/md/md.h +++ b/drivers/md/md.h @@ -746,6 +746,9 @@ struct md_personality void *(*takeover) (struct mddev *mddev); /* Changes the consistency policy of an active array. */ int (*change_consistency_policy)(struct mddev *mddev, const char *buf); + /* convert io ranges from array to bitmap */ + void (*bitmap_sector)(struct mddev *mddev, sector_t *offset, + unsigned long *sectors); }; =20 struct md_sysfs_entry { --=20 2.39.2 From nobody Thu Jan 30 18:55:19 2025 Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5CCD31FBEB2; Mon, 27 Jan 2025 08:58:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737968302; cv=none; b=nVK6LrMnDaMwxTwFYNSq7/QCShFOWOUlgthKTbzfLKP/w6+Sg6oKiZ5lBtgtBJ+Py4riFKcWpKT3zHQEeJAM0TxPGAfnRHInzKvJQFRKDkbQsYKAf4b3qQp+7QR8tsg0Gj7fjetxmX2VBCNnlzKHGX3MDO3ubJsvCVPnQWVs4Ms= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737968302; c=relaxed/simple; bh=bqK8/2W54ozkCNWlbMzlicFfBQII9J/DaLScdZ9tgXo=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=I6icCQH8Z0ViXop2uPZYp4x5zOJgT33eeQI5nQ0kS0INU8O2ikS0dPzQyS759aznqW++Viv+0k2uvoHKQDDWlbtaYH6ISjDqzEHIS+ZjJ+U4s9XDyQRE1CRvQQ+dp2xS+pRaMTTSq3/0Lp0kvDjLUwUredTCN66wnQeur8iftvQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.216]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTP id 4YhMkr4bDDz4f3jR6; Mon, 27 Jan 2025 16:57:56 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id 6EFF71A14CE; Mon, 27 Jan 2025 16:58:17 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP4 (Coremail) with SMTP id gCh0CgAXe1+mSpdnShB+CA--.53281S8; Mon, 27 Jan 2025 16:58:17 +0800 (CST) From: Yu Kuai To: stable@vger.kernel.org, gregkh@linuxfoundation.org, song@kernel.org, yukuai3@huawei.com Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com Subject: [PATCH 6.12 4/5] md/raid5: implement pers->bitmap_sector() Date: Mon, 27 Jan 2025 16:52:13 +0800 Message-Id: <20250127085214.3197761-5-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20250127085214.3197761-1-yukuai1@huaweicloud.com> References: <20250127085214.3197761-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgAXe1+mSpdnShB+CA--.53281S8 X-Coremail-Antispam: 1UD129KBjvJXoWxWry5Kr1xCFWDZw47Ar45trb_yoWrJw4kpa nFvrWagrWYqrnxWwsxJw1kuF4rta93Cw47JFW3WwsYg3W7GrWkZ3W8t3W5Zr1UCrWfJr90 yw15AFW8CF4qgFDanT9S1TB71UUUUUDqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUPj14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_Xr0_Ar1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUXVWUAwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwCY1x0262kKe7AKxVW8ZV WrXwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E14v2 6r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_Jw0_GFylIxkGc2 Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVW8JVW5JwCI42IY6xIIjxv20xvEc7CjxVAFwI0_ Cr0_Gr1UMIIF0xvE42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVW8JVWxJw CI42IY6I8E87Iv6xkF7I0E14v26r4j6r4UJbIYCTnIWIevJa73UjIFyTuYvjTR_nYFDUUU U X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ Content-Type: text/plain; charset="utf-8" From: Yu Kuai commit 9c89f604476cf15c31fbbdb043cff7fbf1dbe0cb upstream. Bitmap is used for the whole array for raid1/raid10, hence IO for the array can be used directly for bitmap. However, bitmap is used for underlying disks for raid5, hence IO for the array can't be used directly for bitmap. Implement pers->bitmap_sector() for raid5 to convert IO ranges from the array to the underlying disks. Signed-off-by: Yu Kuai Link: https://lore.kernel.org/r/20250109015145.158868-5-yukuai1@huaweicloud= .com Signed-off-by: Song Liu --- drivers/md/raid5.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 51 insertions(+) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 4482e1b1e4e3..7049960f4357 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -5919,6 +5919,54 @@ static enum reshape_loc get_reshape_loc(struct mddev= *mddev, return LOC_BEHIND_RESHAPE; } =20 +static void raid5_bitmap_sector(struct mddev *mddev, sector_t *offset, + unsigned long *sectors) +{ + struct r5conf *conf =3D mddev->private; + sector_t start =3D *offset; + sector_t end =3D start + *sectors; + sector_t prev_start =3D start; + sector_t prev_end =3D end; + int sectors_per_chunk; + enum reshape_loc loc; + int dd_idx; + + sectors_per_chunk =3D conf->chunk_sectors * + (conf->raid_disks - conf->max_degraded); + start =3D round_down(start, sectors_per_chunk); + end =3D round_up(end, sectors_per_chunk); + + start =3D raid5_compute_sector(conf, start, 0, &dd_idx, NULL); + end =3D raid5_compute_sector(conf, end, 0, &dd_idx, NULL); + + /* + * For LOC_INSIDE_RESHAPE, this IO will wait for reshape to make + * progress, hence it's the same as LOC_BEHIND_RESHAPE. + */ + loc =3D get_reshape_loc(mddev, conf, prev_start); + if (likely(loc !=3D LOC_AHEAD_OF_RESHAPE)) { + *offset =3D start; + *sectors =3D end - start; + return; + } + + sectors_per_chunk =3D conf->prev_chunk_sectors * + (conf->previous_raid_disks - conf->max_degraded); + prev_start =3D round_down(prev_start, sectors_per_chunk); + prev_end =3D round_down(prev_end, sectors_per_chunk); + + prev_start =3D raid5_compute_sector(conf, prev_start, 1, &dd_idx, NULL); + prev_end =3D raid5_compute_sector(conf, prev_end, 1, &dd_idx, NULL); + + /* + * for LOC_AHEAD_OF_RESHAPE, reshape can make progress before this IO + * is handled in make_stripe_request(), we can't know this here hence + * we set bits for both. + */ + *offset =3D min(start, prev_start); + *sectors =3D max(end, prev_end) - *offset; +} + static enum stripe_result make_stripe_request(struct mddev *mddev, struct r5conf *conf, struct stripe_request_ctx *ctx, sector_t logical_sector, struct bio *bi) @@ -8967,6 +9015,7 @@ static struct md_personality raid6_personality =3D .takeover =3D raid6_takeover, .change_consistency_policy =3D raid5_change_consistency_policy, .prepare_suspend =3D raid5_prepare_suspend, + .bitmap_sector =3D raid5_bitmap_sector, }; static struct md_personality raid5_personality =3D { @@ -8992,6 +9041,7 @@ static struct md_personality raid5_personality =3D .takeover =3D raid5_takeover, .change_consistency_policy =3D raid5_change_consistency_policy, .prepare_suspend =3D raid5_prepare_suspend, + .bitmap_sector =3D raid5_bitmap_sector, }; =20 static struct md_personality raid4_personality =3D @@ -9018,6 +9068,7 @@ static struct md_personality raid4_personality =3D .takeover =3D raid4_takeover, .change_consistency_policy =3D raid5_change_consistency_policy, .prepare_suspend =3D raid5_prepare_suspend, + .bitmap_sector =3D raid5_bitmap_sector, }; =20 static int __init raid5_init(void) --=20 2.39.2 From nobody Thu Jan 30 18:55:19 2025 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D7AE41D540; Mon, 27 Jan 2025 08:58:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737968303; cv=none; b=JYXVgY8+1fr/WJjxS/ju038xWAWuQHSWFLBKdOBSXaaftLJpAS1IL5ZxG3AjMaM9F0ZCl17UPwwFO1OMj6fZMg24LsgkOtPL72Zuvq3ZTDQIIVcT47CMw+GK2XbomtKLOg8f4kQbP23lxveh/zK9BrbE/RcLi5VnTz3RDSh1iQc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737968303; c=relaxed/simple; bh=W7rfPm8GCH9StSJPDSg7q1S6uCCgksgRkxhAfkG74qU=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version:Content-Type; b=cB2K+XUqlj7YVwiZHnYKHQfqlpl60b+DR9C6YTPxkg9gZo5WgaFg0oXzYFcQDA1fic2vUleYVHgXEXupZH6aPdGJAOTieooctXG3HKodlM8KsN9t4GMJAlM8UXmLk0+l/o+9Xb5EQDaDbMuB1/bp7aJwwjmNZuNpiPq7U7Dy6fo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.93.142]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4YhMkq5bCVz4f3lVX; Mon, 27 Jan 2025 16:57:55 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id D639B1A0D8C; Mon, 27 Jan 2025 16:58:17 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP4 (Coremail) with SMTP id gCh0CgAXe1+mSpdnShB+CA--.53281S9; Mon, 27 Jan 2025 16:58:17 +0800 (CST) From: Yu Kuai To: stable@vger.kernel.org, gregkh@linuxfoundation.org, song@kernel.org, yukuai3@huawei.com Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com Subject: [PATCH 6.12 5/5] md/md-bitmap: move bitmap_{start, end}write to md upper layer Date: Mon, 27 Jan 2025 16:52:14 +0800 Message-Id: <20250127085214.3197761-6-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20250127085214.3197761-1-yukuai1@huaweicloud.com> References: <20250127085214.3197761-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgAXe1+mSpdnShB+CA--.53281S9 X-Coremail-Antispam: 1UD129KBjvAXoW3ZrWrtr15XF48Gr4Utr4fXwb_yoW8Gr1rCo Z7ZFy5Xrn5Kr48X34rAr45JFW3WryDJry5A345GryDWF9rXwnYqw1IkrW3Jr1jqr13XF43 Ar17J3WUJr1UJr1Dn29KB7ZKAUJUUUU5529EdanIXcx71UUUUU7v73VFW2AGmfu7bjvjm3 AaLaJ3UjIYCTnIWjp_UUUOb7AC8VAFwI0_Wr0E3s1l1xkIjI8I6I8E6xAIw20EY4v20xva j40_Wr0E3s1l1IIY67AEw4v_Jr0_Jr4l82xGYIkIc2x26280x7IE14v26r126s0DM28Irc Ia0xkI8VCY1x0267AKxVW5JVCq3wA2ocxC64kIII0Yj41l84x0c7CEw4AK67xGY2AK021l 84ACjcxK6xIIjxv20xvE14v26ryj6F1UM28EF7xvwVC0I7IYx2IY6xkF7I0E14v26r4UJV WxJr1l84ACjcxK6I8E87Iv67AKxVW0oVCq3wA2z4x0Y4vEx4A2jsIEc7CjxVAFwI0_GcCE 3s1le2I262IYc4CY6c8Ij28IcVAaY2xG8wAqx4xG64xvF2IEw4CE5I8CrVC2j2WlYx0E2I x0cI8IcVAFwI0_Jrv_JF1lYx0Ex4A2jsIE14v26r1j6r4UMcvjeVCFs4IE7xkEbVWUJVW8 JwACjcxG0xvY0x0EwIxGrwACjI8F5VA0II8E6IAqYI8I648v4I1lc7CjxVAaw2AFwI0_GF v_Wryl42xK82IYc2Ij64vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AK xVWUJVWUGwC20s026x8GjcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r1q6r43MIIYrx kI7VAKI48JMIIF0xvE2Ix0cI8IcVAFwI0_Gr0_Xr1lIxAIcVC0I7IYx2IY6xkF7I0E14v2 6r4UJVWxJr1lIxAIcVCF04k26cxKx2IYs7xG6r1j6r1xMIIF0xvEx4A2jsIE14v26r4j6F 4UMIIF0xvEx4A2jsIEc7CjxVAFwI0_Gr1j6F4UJbIYCTnIWIevJa73UjIFyTuYvjTR_nYF DUUUU X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ From: Yu Kuai commit cd5fc653381811f1e0ba65f5d169918cab61476f upstream. There are two BUG reports that raid5 will hang at bitmap_startwrite([1],[2]), root cause is that bitmap start write and end write is unbalanced, it's not quite clear where, and while reviewing raid5 code, it's found that bitmap operations can be optimized. For example, for a 4 disks raid5, with chunksize=3D8k, if user issue a IO (0 + 48k) to the array: =E2=94=8C=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2= =94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94= =80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80= =E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2= =94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94= =80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80= =E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2= =94=80=E2=94=80=E2=94=80=E2=94=90 =E2=94=82chunk 0 =E2=94= =82 =E2=94=82 =E2=94=8C=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94= =80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=AC=E2=94=80= =E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2= =94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=AC=E2=94=80=E2=94=80=E2=94=80=E2=94= =80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80= =E2=94=80=E2=94=AC=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2= =94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=BC =E2=94=82 sh0 =E2=94=82A0: 0 + 4k =E2=94=82A1: 8k + 4k =E2=94=82A2: 16k = + 4k =E2=94=82A3: P =E2=94=82 =E2=94=82 =E2=94=BC=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94= =80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=BC=E2=94=80= =E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2= =94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=BC=E2=94=80=E2=94=80=E2=94=80=E2=94= =80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80= =E2=94=80=E2=94=BC=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2= =94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=BC =E2=94=82 sh1 =E2=94=82B0: 4k + 4k =E2=94=82B1: 12k + 4k =E2=94=82B2: 20k = + 4k =E2=94=82B3: P =E2=94=82 =E2=94=BC=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=B4=E2= =94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94= =80=E2=94=80=E2=94=80=E2=94=80=E2=94=B4=E2=94=80=E2=94=80=E2=94=80=E2=94=80= =E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2= =94=80=E2=94=B4=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94= =80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=B4=E2=94=80= =E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2= =94=80=E2=94=80=E2=94=80=E2=94=BC =E2=94=82chunk 1 =E2=94= =82 =E2=94=82 =E2=94=8C=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94= =80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=AC=E2=94=80= =E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2= =94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=AC=E2=94=80=E2=94=80=E2=94=80=E2=94= =80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80= =E2=94=80=E2=94=AC=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2= =94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=A4 =E2=94=82 sh2 =E2=94=82C0: 24k + 4k=E2=94=82C1: 32k + 4k =E2=94=82C2: P = =E2=94=82C3: 40k + 4k=E2=94=82 =E2=94=82 =E2=94=BC=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94= =80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=BC=E2=94=80= =E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2= =94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=BC=E2=94=80=E2=94=80=E2=94=80=E2=94= =80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80= =E2=94=80=E2=94=BC=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2= =94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=BC =E2=94=82 sh3 =E2=94=82D0: 28k + 4k=E2=94=82D1: 36k + 4k =E2=94=82D2: P = =E2=94=82D3: 44k + 4k=E2=94=82 =E2=94=94=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=B4=E2= =94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94= =80=E2=94=80=E2=94=80=E2=94=80=E2=94=B4=E2=94=80=E2=94=80=E2=94=80=E2=94=80= =E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2= =94=80=E2=94=B4=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94= =80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=B4=E2=94=80= =E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2= =94=80=E2=94=80=E2=94=80=E2=94=98 Before this patch, 4 stripe head will be used, and each sh will attach bio for 3 disks, and each attached bio will trigger bitmap_startwrite() once, which means total 12 times. - 3 times (0 + 4k), for (A0, A1 and A2) - 3 times (4 + 4k), for (B0, B1 and B2) - 3 times (8 + 4k), for (C0, C1 and C3) - 3 times (12 + 4k), for (D0, D1 and D3) After this patch, md upper layer will calculate that IO range (0 + 48k) is corresponding to the bitmap (0 + 16k), and call bitmap_startwrite() just once. Noted that this patch will align bitmap ranges to the chunks, for example, if user issue a IO (0 + 4k) to array: - Before this patch, 1 time (0 + 4k), for A0; - After this patch, 1 time (0 + 8k) for chunk 0; Usually, one bitmap bit will represent more than one disk chunk, and this doesn't have any difference. And even if user really created a array that one chunk contain multiple bits, the overhead is that more data will be recovered after power failure. Also remove STRIPE_BITMAP_PENDING since it's not used anymore. [1] https://lore.kernel.org/all/CAJpMwyjmHQLvm6zg1cmQErttNNQPDAAXPKM3xgTjMh= bfts986Q@mail.gmail.com/ [2] https://lore.kernel.org/all/ADF7D720-5764-4AF3-B68E-1845988737AA@flying= circus.io/ Signed-off-by: Yu Kuai Link: https://lore.kernel.org/r/20250109015145.158868-6-yukuai1@huaweicloud= .com Signed-off-by: Song Liu --- drivers/md/md.c | 29 +++++++++++++++++++++++ drivers/md/md.h | 2 ++ drivers/md/raid1.c | 4 ---- drivers/md/raid10.c | 3 --- drivers/md/raid5-cache.c | 2 -- drivers/md/raid5.c | 50 +++++----------------------------------- drivers/md/raid5.h | 3 --- 7 files changed, 37 insertions(+), 56 deletions(-) diff --git a/drivers/md/md.c b/drivers/md/md.c index 67108c397c5a..4f79583f593d 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -8745,12 +8745,32 @@ void md_submit_discard_bio(struct mddev *mddev, str= uct md_rdev *rdev, } EXPORT_SYMBOL_GPL(md_submit_discard_bio); =20 +static void md_bitmap_start(struct mddev *mddev, + struct md_io_clone *md_io_clone) +{ + if (mddev->pers->bitmap_sector) + mddev->pers->bitmap_sector(mddev, &md_io_clone->offset, + &md_io_clone->sectors); + + mddev->bitmap_ops->startwrite(mddev, md_io_clone->offset, + md_io_clone->sectors); +} + +static void md_bitmap_end(struct mddev *mddev, struct md_io_clone *md_io_c= lone) +{ + mddev->bitmap_ops->endwrite(mddev, md_io_clone->offset, + md_io_clone->sectors); +} + static void md_end_clone_io(struct bio *bio) { struct md_io_clone *md_io_clone =3D bio->bi_private; struct bio *orig_bio =3D md_io_clone->orig_bio; struct mddev *mddev =3D md_io_clone->mddev; =20 + if (bio_data_dir(orig_bio) =3D=3D WRITE && mddev->bitmap) + md_bitmap_end(mddev, md_io_clone); + if (bio->bi_status && !orig_bio->bi_status) orig_bio->bi_status =3D bio->bi_status; =20 @@ -8775,6 +8795,12 @@ static void md_clone_bio(struct mddev *mddev, struct= bio **bio) if (blk_queue_io_stat(bdev->bd_disk->queue)) md_io_clone->start_time =3D bio_start_io_acct(*bio); =20 + if (bio_data_dir(*bio) =3D=3D WRITE && mddev->bitmap) { + md_io_clone->offset =3D (*bio)->bi_iter.bi_sector; + md_io_clone->sectors =3D bio_sectors(*bio); + md_bitmap_start(mddev, md_io_clone); + } + clone->bi_end_io =3D md_end_clone_io; clone->bi_private =3D md_io_clone; *bio =3D clone; @@ -8793,6 +8819,9 @@ void md_free_cloned_bio(struct bio *bio) struct bio *orig_bio =3D md_io_clone->orig_bio; struct mddev *mddev =3D md_io_clone->mddev; =20 + if (bio_data_dir(orig_bio) =3D=3D WRITE && mddev->bitmap) + md_bitmap_end(mddev, md_io_clone); + if (bio->bi_status && !orig_bio->bi_status) orig_bio->bi_status =3D bio->bi_status; =20 diff --git a/drivers/md/md.h b/drivers/md/md.h index 6836d6684cc1..8826dce9717d 100644 --- a/drivers/md/md.h +++ b/drivers/md/md.h @@ -831,6 +831,8 @@ struct md_io_clone { struct mddev *mddev; struct bio *orig_bio; unsigned long start_time; + sector_t offset; + unsigned long sectors; struct bio bio_clone; }; =20 diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index 85521579203c..d83fe3b3abc0 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -422,8 +422,6 @@ static void close_write(struct r1bio *r1_bio) =20 if (test_bit(R1BIO_BehindIO, &r1_bio->state)) mddev->bitmap_ops->end_behind_write(mddev); - /* clear the bitmap if all writes complete successfully */ - mddev->bitmap_ops->endwrite(mddev, r1_bio->sector, r1_bio->sectors); md_write_end(mddev); } =20 @@ -1598,8 +1596,6 @@ static void raid1_write_request(struct mddev *mddev, = struct bio *bio, =20 if (test_bit(R1BIO_BehindIO, &r1_bio->state)) mddev->bitmap_ops->start_behind_write(mddev); - mddev->bitmap_ops->startwrite(mddev, r1_bio->sector, - r1_bio->sectors); first_clone =3D 0; } =20 diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index f6aa56a7d400..daf42acc4fb6 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -428,8 +428,6 @@ static void close_write(struct r10bio *r10_bio) { struct mddev *mddev =3D r10_bio->mddev; =20 - /* clear the bitmap if all writes complete successfully */ - mddev->bitmap_ops->endwrite(mddev, r10_bio->sector, r10_bio->sectors); md_write_end(mddev); } =20 @@ -1480,7 +1478,6 @@ static void raid10_write_request(struct mddev *mddev,= struct bio *bio, md_account_bio(mddev, &bio); r10_bio->master_bio =3D bio; atomic_set(&r10_bio->remaining, 1); - mddev->bitmap_ops->startwrite(mddev, r10_bio->sector, r10_bio->sectors); =20 for (i =3D 0; i < conf->copies; i++) { if (r10_bio->devs[i].bio) diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c index ba4f9577c737..011246e16a99 100644 --- a/drivers/md/raid5-cache.c +++ b/drivers/md/raid5-cache.c @@ -313,8 +313,6 @@ void r5c_handle_cached_data_endio(struct r5conf *conf, if (sh->dev[i].written) { set_bit(R5_UPTODATE, &sh->dev[i].flags); r5c_return_dev_pending_writes(conf, &sh->dev[i]); - conf->mddev->bitmap_ops->endwrite(conf->mddev, - sh->sector, RAID5_STRIPE_SECTORS(conf)); } } } diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 7049960f4357..39e7596e78c0 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -906,8 +906,7 @@ static bool stripe_can_batch(struct stripe_head *sh) if (raid5_has_log(conf) || raid5_has_ppl(conf)) return false; return test_bit(STRIPE_BATCH_READY, &sh->state) && - !test_bit(STRIPE_BITMAP_PENDING, &sh->state) && - is_full_stripe_write(sh); + is_full_stripe_write(sh); } =20 /* we only do back search */ @@ -3545,29 +3544,9 @@ static void __add_stripe_bio(struct stripe_head *sh,= struct bio *bi, (*bip)->bi_iter.bi_sector, sh->sector, dd_idx, sh->dev[dd_idx].sector); =20 - if (conf->mddev->bitmap && firstwrite) { - /* Cannot hold spinlock over bitmap_startwrite, - * but must ensure this isn't added to a batch until - * we have added to the bitmap and set bm_seq. - * So set STRIPE_BITMAP_PENDING to prevent - * batching. - * If multiple __add_stripe_bio() calls race here they - * much all set STRIPE_BITMAP_PENDING. So only the first one - * to complete "bitmap_startwrite" gets to set - * STRIPE_BIT_DELAY. This is important as once a stripe - * is added to a batch, STRIPE_BIT_DELAY cannot be changed - * any more. - */ - set_bit(STRIPE_BITMAP_PENDING, &sh->state); - spin_unlock_irq(&sh->stripe_lock); - conf->mddev->bitmap_ops->startwrite(conf->mddev, sh->sector, - RAID5_STRIPE_SECTORS(conf)); - spin_lock_irq(&sh->stripe_lock); - clear_bit(STRIPE_BITMAP_PENDING, &sh->state); - if (!sh->batch_head) { - sh->bm_seq =3D conf->seq_flush+1; - set_bit(STRIPE_BIT_DELAY, &sh->state); - } + if (conf->mddev->bitmap && firstwrite && !sh->batch_head) { + sh->bm_seq =3D conf->seq_flush+1; + set_bit(STRIPE_BIT_DELAY, &sh->state); } } =20 @@ -3618,7 +3597,6 @@ handle_failed_stripe(struct r5conf *conf, struct stri= pe_head *sh, BUG_ON(sh->batch_head); for (i =3D disks; i--; ) { struct bio *bi; - int bitmap_end =3D 0; =20 if (test_bit(R5_ReadError, &sh->dev[i].flags)) { struct md_rdev *rdev =3D conf->disks[i].rdev; @@ -3643,8 +3621,6 @@ handle_failed_stripe(struct r5conf *conf, struct stri= pe_head *sh, sh->dev[i].towrite =3D NULL; sh->overwrite_disks =3D 0; spin_unlock_irq(&sh->stripe_lock); - if (bi) - bitmap_end =3D 1; =20 log_stripe_write_finished(sh); =20 @@ -3659,10 +3635,6 @@ handle_failed_stripe(struct r5conf *conf, struct str= ipe_head *sh, bio_io_error(bi); bi =3D nextbi; } - if (bitmap_end) - conf->mddev->bitmap_ops->endwrite(conf->mddev, - sh->sector, RAID5_STRIPE_SECTORS(conf)); - bitmap_end =3D 0; /* and fail all 'written' */ bi =3D sh->dev[i].written; sh->dev[i].written =3D NULL; @@ -3671,7 +3643,6 @@ handle_failed_stripe(struct r5conf *conf, struct stri= pe_head *sh, sh->dev[i].page =3D sh->dev[i].orig_page; } =20 - if (bi) bitmap_end =3D 1; while (bi && bi->bi_iter.bi_sector < sh->dev[i].sector + RAID5_STRIPE_SECTORS(conf)) { struct bio *bi2 =3D r5_next_bio(conf, bi, sh->dev[i].sector); @@ -3705,9 +3676,6 @@ handle_failed_stripe(struct r5conf *conf, struct stri= pe_head *sh, bi =3D nextbi; } } - if (bitmap_end) - conf->mddev->bitmap_ops->endwrite(conf->mddev, - sh->sector, RAID5_STRIPE_SECTORS(conf)); /* If we were in the middle of a write the parity block might * still be locked - so just clear all R5_LOCKED flags */ @@ -4056,8 +4024,7 @@ static void handle_stripe_clean_event(struct r5conf *= conf, bio_endio(wbi); wbi =3D wbi2; } - conf->mddev->bitmap_ops->endwrite(conf->mddev, - sh->sector, RAID5_STRIPE_SECTORS(conf)); + if (head_sh->batch_head) { sh =3D list_first_entry(&sh->batch_list, struct stripe_head, @@ -4883,8 +4850,7 @@ static void break_stripe_batch_list(struct stripe_hea= d *head_sh, (1 << STRIPE_COMPUTE_RUN) | (1 << STRIPE_DISCARD) | (1 << STRIPE_BATCH_READY) | - (1 << STRIPE_BATCH_ERR) | - (1 << STRIPE_BITMAP_PENDING)), + (1 << STRIPE_BATCH_ERR)), "stripe state: %lx\n", sh->state); WARN_ONCE(head_sh->state & ((1 << STRIPE_DISCARD) | (1 << STRIPE_REPLACED)), @@ -5775,10 +5741,6 @@ static void make_discard_request(struct mddev *mddev= , struct bio *bi) } spin_unlock_irq(&sh->stripe_lock); if (conf->mddev->bitmap) { - for (d =3D 0; d < conf->raid_disks - conf->max_degraded; - d++) - mddev->bitmap_ops->startwrite(mddev, sh->sector, - RAID5_STRIPE_SECTORS(conf)); sh->bm_seq =3D conf->seq_flush + 1; set_bit(STRIPE_BIT_DELAY, &sh->state); } diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h index 538843123d92..2e42c1641049 100644 --- a/drivers/md/raid5.h +++ b/drivers/md/raid5.h @@ -371,9 +371,6 @@ enum { STRIPE_ON_RELEASE_LIST, STRIPE_BATCH_READY, STRIPE_BATCH_ERR, - STRIPE_BITMAP_PENDING, /* Being added to bitmap, don't add - * to batch yet. - */ STRIPE_LOG_TRAPPED, /* trapped into log (see raid5-cache.c) * this bit is used in two scenarios: * --=20 2.39.2