From nobody Fri Oct 3 14:29:32 2025 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 657F22877DE; Fri, 29 Aug 2025 08:13:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756455202; cv=none; b=jlYoKvqW1w3/6AD53Vfk1pKf3DpHFiO0rf3FbBQdbe1EmC5WSUrXl0m3J1RFKv5lp3d2cQtfC9ANItRIXFCtUGnmFke+tLJWz+2r3pfVRP/VMunA02AeX7fK93SbOLxVyefDEXFGFVLPO4PIGd73BMOI9hCpXr+C9CStHToiC3E= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756455202; c=relaxed/simple; bh=xmOLclWHLymNJPq2KBcCgkTRP1hfYxI+YF9sV8VqEr0=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=ZvHKYbmGxi+I5NSzk324TMWqPfFHc//ab3B/bMXGDrtG4/AJHFegqFrru2bdP/Q/ddmn1KNsHm9w9UMlA38iAim4vHSsYWS+5Zi1tFFOzNGznI732hegRZguMz5zAxM0zKRReCKYZqBGHIxER8mwbS2XAV41bSlGmMbXCZjQfqo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.235]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTPS id 4cCrcS508vzYQvd2; Fri, 29 Aug 2025 16:13:12 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id 389811A1272; Fri, 29 Aug 2025 16:13:11 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP4 (Coremail) with SMTP id gCh0CgB3wY0RYbFohAO2Ag--.45648S5; Fri, 29 Aug 2025 16:13:10 +0800 (CST) From: Yu Kuai To: 
hch@infradead.org, xni@redhat.com, colyli@kernel.org, linan122@huawei.com, corbet@lwn.net, agk@redhat.com, snitzer@kernel.org, mpatocka@redhat.com, song@kernel.org, yukuai3@huawei.com, hare@suse.de Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, dm-devel@lists.linux.dev, linux-raid@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com, johnny.chenyi@huawei.com, hailan@yukuai.org.cn Subject: [PATCH v7 md-6.18 01/11] md: add a new parameter 'offset' to md_super_write() Date: Fri, 29 Aug 2025 16:04:16 +0800 Message-Id: <20250829080426.1441678-2-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20250829080426.1441678-1-yukuai1@huaweicloud.com> References: <20250829080426.1441678-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgB3wY0RYbFohAO2Ag--.45648S5 X-Coremail-Antispam: 1UD129KBjvJXoW3Jr4kXF1DAFWDKr43KrWxZwb_yoW7CF4Dpa yIvFyfJrWayrW2qw17JFWDua4Fq34DKrZ7Kry3C34xu3W7KrykKF15XFy8Xr98uF9xCFs8 Xw4jkFW7uF1IgrJanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUmvb4IE77IF4wAFF20E14v26rWj6s0DM7CY07I20VC2zVCF04k2 6cxKx2IYs7xG6rWj6s0DM7CIcVAFz4kK6r1j6r18M28IrcIa0xkI8VA2jI8067AKxVWUGw A2048vs2IY020Ec7CjxVAFwI0_Gr0_Xr1l8cAvFVAK0II2c7xJM28CjxkF64kEwVA0rcxS w2x7M28EF7xvwVC0I7IYx2IY67AKxVWDJVCq3wA2z4x0Y4vE2Ix0cI8IcVCY1x0267AKxV W8Jr0_Cr1UM28EF7xvwVC2z280aVAFwI0_GcCE3s1l84ACjcxK6I8E87Iv6xkF7I0E14v2 6rxl6s0DM2AIxVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4xI64kE6c02F40Ex7xfMc Ij6xIIjxv20xvE14v26r1j6r18McIj6I8E87Iv67AKxVWUJVW8JwAm72CE4IkC6x0Yz7v_ Jr0_Gr1lF7xvr2IYc2Ij64vIr41lF7I21c0EjII2zVCS5cI20VAGYxC7M4IIrI8v6xkF7I 0E8cxan2IY04v7MxkF7I0En4kS14v26r4a6rW5MxAIw28IcxkI7VAKI48JMxC20s026xCa FVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2IqxVCjr7xvwVAFwI0_JrI_Jr Wlx4CE17CEb7AF67AKxVW8ZVWrXwCIc40Y0x0EwIxGrwCI42IY6xIIjxv20xvE14v26r1j 
6r1xMIIF0xvE2Ix0cI8IcVCY1x0267AKxVW8JVWxJwCI42IY6xAIw20EY4v20xvaj40_Jr 0_JF4lIxAIcVC2z280aVAFwI0_Jr0_Gr1lIxAIcVC2z280aVCY1x0267AKxVW8JVW8JrUv cSsGvfC2KfnxnUUI43ZEXa7sRMv31JUUUUU== X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ Content-Type: text/plain; charset="utf-8" From: Yu Kuai The parameter is always set to 0 for now, following patches will use this helper to write llbitmap to underlying disks, allow writing dirty sectors instead of the whole page. Also rename md_super_write to md_write_metadata since there is nothing super-block specific. Signed-off-by: Yu Kuai Reviewed-by: Xiao Ni Reviewed-by: Christoph Hellwig Reviewed-by: Hannes Reinecke Reviewed-by: Li Nan --- drivers/md/md-bitmap.c | 3 ++- drivers/md/md.c | 52 +++++++++++++++++++++++++----------------- drivers/md/md.h | 5 ++-- 3 files changed, 36 insertions(+), 24 deletions(-) diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c index 5f62f2fd8f3f..b157119de123 100644 --- a/drivers/md/md-bitmap.c +++ b/drivers/md/md-bitmap.c @@ -485,7 +485,8 @@ static int __write_sb_page(struct md_rdev *rdev, struct= bitmap *bitmap, return -EINVAL; } =20 - md_super_write(mddev, rdev, sboff + ps, (int)min(size, bitmap_limit), pag= e); + md_write_metadata(mddev, rdev, sboff + ps, (int)min(size, bitmap_limit), + page, 0); return 0; } =20 diff --git a/drivers/md/md.c b/drivers/md/md.c index 61a659820779..74f876497c09 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -1038,15 +1038,26 @@ static void super_written(struct bio *bio) wake_up(&mddev->sb_wait); } =20 -void md_super_write(struct mddev *mddev, struct md_rdev *rdev, - sector_t sector, int size, struct page *page) -{ - /* write first size bytes of page to sector of rdev - * Increment mddev->pending_writes before returning - * and decrement it on completion, waking up sb_wait - * if zero is reached. 
- * If an error occurred, call md_error - */ +/** + * md_write_metadata - write metadata to underlying disk, including + * array superblock, badblocks, bitmap superblock and bitmap bits. + * @mddev: the array to write + * @rdev: the underlying disk to write + * @sector: the offset to @rdev + * @size: the length of the metadata + * @page: the metadata + * @offset: the offset to @page + * + * Write @size bytes of @page start from @offset, to @sector of @rdev, Inc= rement + * mddev->pending_writes before returning, and decrement it on completion, + * waking up sb_wait. Caller must call md_super_wait() after issuing io to= all + * rdev. If an error occurred, md_error() will be called, and the @rdev wi= ll be + * kicked out from @mddev. + */ +void md_write_metadata(struct mddev *mddev, struct md_rdev *rdev, + sector_t sector, int size, struct page *page, + unsigned int offset) +{ struct bio *bio; =20 if (!page) @@ -1064,7 +1075,7 @@ void md_super_write(struct mddev *mddev, struct md_rd= ev *rdev, atomic_inc(&rdev->nr_pending); =20 bio->bi_iter.bi_sector =3D sector; - __bio_add_page(bio, page, size, 0); + __bio_add_page(bio, page, size, offset); bio->bi_private =3D rdev; bio->bi_end_io =3D super_written; =20 @@ -1674,8 +1685,8 @@ super_90_rdev_size_change(struct md_rdev *rdev, secto= r_t num_sectors) if ((u64)num_sectors >=3D (2ULL << 32) && rdev->mddev->level >=3D 1) num_sectors =3D (sector_t)(2ULL << 32) - 2; do { - md_super_write(rdev->mddev, rdev, rdev->sb_start, rdev->sb_size, - rdev->sb_page); + md_write_metadata(rdev->mddev, rdev, rdev->sb_start, + rdev->sb_size, rdev->sb_page, 0); } while (md_super_wait(rdev->mddev) < 0); return num_sectors; } @@ -2323,8 +2334,8 @@ super_1_rdev_size_change(struct md_rdev *rdev, sector= _t num_sectors) sb->super_offset =3D cpu_to_le64(rdev->sb_start); sb->sb_csum =3D calc_sb_1_csum(sb); do { - md_super_write(rdev->mddev, rdev, rdev->sb_start, rdev->sb_size, - rdev->sb_page); + md_write_metadata(rdev->mddev, rdev, rdev->sb_start, + 
rdev->sb_size, rdev->sb_page, 0); } while (md_super_wait(rdev->mddev) < 0); return num_sectors; =20 @@ -2833,18 +2844,17 @@ void md_update_sb(struct mddev *mddev, int force_ch= ange) continue; /* no noise on spare devices */ =20 if (!test_bit(Faulty, &rdev->flags)) { - md_super_write(mddev,rdev, - rdev->sb_start, rdev->sb_size, - rdev->sb_page); + md_write_metadata(mddev, rdev, rdev->sb_start, + rdev->sb_size, rdev->sb_page, 0); pr_debug("md: (write) %pg's sb offset: %llu\n", rdev->bdev, (unsigned long long)rdev->sb_start); rdev->sb_events =3D mddev->events; if (rdev->badblocks.size) { - md_super_write(mddev, rdev, - rdev->badblocks.sector, - rdev->badblocks.size << 9, - rdev->bb_page); + md_write_metadata(mddev, rdev, + rdev->badblocks.sector, + rdev->badblocks.size << 9, + rdev->bb_page, 0); rdev->badblocks.size =3D 0; } =20 diff --git a/drivers/md/md.h b/drivers/md/md.h index 081152c8de1f..cadd9bc99938 100644 --- a/drivers/md/md.h +++ b/drivers/md/md.h @@ -908,8 +908,9 @@ void md_account_bio(struct mddev *mddev, struct bio **b= io); void md_free_cloned_bio(struct bio *bio); =20 extern bool __must_check md_flush_request(struct mddev *mddev, struct bio = *bio); -extern void md_super_write(struct mddev *mddev, struct md_rdev *rdev, - sector_t sector, int size, struct page *page); +void md_write_metadata(struct mddev *mddev, struct md_rdev *rdev, + sector_t sector, int size, struct page *page, + unsigned int offset); extern int md_super_wait(struct mddev *mddev); extern int sync_page_io(struct md_rdev *rdev, sector_t sector, int size, struct page *page, blk_opf_t opf, bool metadata_op); --=20 2.39.2 From nobody Fri Oct 3 14:29:32 2025 Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A521928724E; Fri, 29 Aug 2025 08:13:19 +0000 (UTC) Authentication-Results: 
smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756455202; cv=none; b=UAuauOMjXF0Mwo/e8fx8ad6RBvmBhucR6jmu9+vyJwa8XgyOBEKn9CBeyT9l7sOncOPCU7VmCAeH0LK6NhQ3iTLXeDYEdFeodP65GmbaXGSSPJ4Q3ecnHXYYlXrFPT24eLK6Q7FpuMRHLYtaJHXGA1yFW1dKSaeDVXF00B4uOO4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756455202; c=relaxed/simple; bh=RGBZynnur4cyQENYoo2wIGzEb5x7LlBsw6RCRwlfddk=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=djC1Hvfl1NxQ0JIu4p/OAxER1mctrmfPj9Oru3vgR6CTqPtoap9XgEkpkkkM2Cy33A9W3ErNxVZdC6/fflW3dSCk1E1JiN6wXcVR/5PWi1cJkPEqRZvXD/7EaDmKaHObL7kQ3F/o4k8S0QyORjWNiFXTR2jjq11Hbh9cmJIDY9s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.216]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTPS id 4cCrcS1YZWzKHMqQ; Fri, 29 Aug 2025 16:13:12 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id E8A101A1DDA; Fri, 29 Aug 2025 16:13:11 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP4 (Coremail) with SMTP id gCh0CgB3wY0RYbFohAO2Ag--.45648S6; Fri, 29 Aug 2025 16:13:11 +0800 (CST) From: Yu Kuai To: hch@infradead.org, xni@redhat.com, colyli@kernel.org, linan122@huawei.com, corbet@lwn.net, agk@redhat.com, snitzer@kernel.org, mpatocka@redhat.com, song@kernel.org, yukuai3@huawei.com, hare@suse.de Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, dm-devel@lists.linux.dev, linux-raid@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, 
yangerkun@huawei.com, johnny.chenyi@huawei.com, hailan@yukuai.org.cn Subject: [PATCH v7 md-6.18 02/11] md: factor out a helper raid_is_456() Date: Fri, 29 Aug 2025 16:04:17 +0800 Message-Id: <20250829080426.1441678-3-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20250829080426.1441678-1-yukuai1@huaweicloud.com> References: <20250829080426.1441678-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgB3wY0RYbFohAO2Ag--.45648S6 X-Coremail-Antispam: 1UD129KBjvJXoW7Kr47Kr4xtr4fZw1xCF4UJwb_yoW8Ar4fpa 1fXFy3ZryUXFW3tw1DX3WkZa4Fgw1ftryqyrWxZ395XF1UJr1DKF1SqFZ2qryDWayrAFsI qa1jyr48C3W0gw7anT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUmY14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_Jryl82xGYIkIc2 x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0 Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJw A2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq3wAS 0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7IYx2 IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4UM4x0 Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2kIc2 xKxwCY1x0262kKe7AKxVW8ZVWrXwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWU JVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67 kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUJVWUCwCI42IY 6xIIjxv20xvEc7CjxVAFwI0_Cr0_Gr1UMIIF0xvE42xK8VAvwI8IcIk0rVWUJVWUCwCI42 IY6I8E87Iv67AKxVWUJVW8JwCI42IY6I8E87Iv6xkF7I0E14v26r4j6r4UJbIYCTnIWIev Ja73UjIFyTuYvjTRNiSHDUUUU X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ Content-Type: text/plain; charset="utf-8" From: Yu Kuai There are no functional changes, the helper will be used by llbitmap in following patches. 
Signed-off-by: Yu Kuai Reviewed-by: Xiao Ni Reviewed-by: Christoph Hellwig Reviewed-by: Hannes Reinecke Reviewed-by: Li Nan --- drivers/md/md.c | 9 +-------- drivers/md/md.h | 6 ++++++ 2 files changed, 7 insertions(+), 8 deletions(-) diff --git a/drivers/md/md.c b/drivers/md/md.c index 74f876497c09..86cf97c0a77b 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -9121,19 +9121,12 @@ static sector_t md_sync_position(struct mddev *mdde= v, enum sync_action action) =20 static bool sync_io_within_limit(struct mddev *mddev) { - int io_sectors; - /* * For raid456, sync IO is stripe(4k) per IO, for other levels, it's * RESYNC_PAGES(64k) per IO. */ - if (mddev->level =3D=3D 4 || mddev->level =3D=3D 5 || mddev->level =3D=3D= 6) - io_sectors =3D 8; - else - io_sectors =3D 128; - return atomic_read(&mddev->recovery_active) < - io_sectors * sync_io_depth(mddev); + (raid_is_456(mddev) ? 8 : 128) * sync_io_depth(mddev); } =20 #define SYNC_MARKS 10 diff --git a/drivers/md/md.h b/drivers/md/md.h index cadd9bc99938..5ef73109d14d 100644 --- a/drivers/md/md.h +++ b/drivers/md/md.h @@ -1033,6 +1033,12 @@ static inline bool mddev_is_dm(struct mddev *mddev) return !mddev->gendisk; } =20 +static inline bool raid_is_456(struct mddev *mddev) +{ + return mddev->level =3D=3D ID_RAID4 || mddev->level =3D=3D ID_RAID5 || + mddev->level =3D=3D ID_RAID6; +} + static inline void mddev_trace_remap(struct mddev *mddev, struct bio *bio, sector_t sector) { --=20 2.39.2 From nobody Fri Oct 3 14:29:32 2025 Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 72BB32BE646; Fri, 29 Aug 2025 08:13:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756455204; cv=none; 
b=taGlpI+upGutqeaihBIdtXYe4OH3YHYtIrOsvVdb1zIljsnAyt7lBGtPAIP9Kdsv46cI4IfGt+zRLY1yenvQb2iiL5GrHxvQIgFUXRcBc382HBlb9gVJi2TX1KfNf6MF7M2HHnotfQwK2O2UnV2Bw5rhCetOxqe3UA/TWjZQ2nc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756455204; c=relaxed/simple; bh=qo4RfyaMJBGGkJDhlgeQr8RuNLowAOnWEKcrHR+D4CE=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=TB1G/ciP6xTRewOBQ5mebHhfnboQK1HD559YxpKHEhiRZndE/eKjSoFjzf4lMFa9ZyMTcyM1ZkYl7jyvHOi4Fag43dOeu5WanjR7+FMCIh2KO+3dwxGsVtOjLSH1xCe5TKdi6gHJZK7lWlpHkBneoOtOITIYBzaQafvOp9B9c5s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.235]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTPS id 4cCrcS6sjdzKHMqQ; Fri, 29 Aug 2025 16:13:12 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id A69EE1A1272; Fri, 29 Aug 2025 16:13:12 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP4 (Coremail) with SMTP id gCh0CgB3wY0RYbFohAO2Ag--.45648S7; Fri, 29 Aug 2025 16:13:12 +0800 (CST) From: Yu Kuai To: hch@infradead.org, xni@redhat.com, colyli@kernel.org, linan122@huawei.com, corbet@lwn.net, agk@redhat.com, snitzer@kernel.org, mpatocka@redhat.com, song@kernel.org, yukuai3@huawei.com, hare@suse.de Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, dm-devel@lists.linux.dev, linux-raid@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com, johnny.chenyi@huawei.com, hailan@yukuai.org.cn Subject: [PATCH v7 md-6.18 03/11] md/md-bitmap: support discard for bitmap ops Date: Fri, 
29 Aug 2025 16:04:18 +0800 Message-Id: <20250829080426.1441678-4-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20250829080426.1441678-1-yukuai1@huaweicloud.com> References: <20250829080426.1441678-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgB3wY0RYbFohAO2Ag--.45648S7 X-Coremail-Antispam: 1UD129KBjvJXoWxXrW3uFW8KFW5trWDGF13CFg_yoWrWw48pF ZFqFy3GrW5XF4Yga47Aa4q9Fyrt34ktrZrtFW7W345WFyIkF9xCF4Fga4qyw1DGFy3CFsx Z3WFkr15Cr18XrJanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUmF14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JrWl82xGYIkIc2 x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0 Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJw A2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq3wAS 0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7IYx2 IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4UM4x0 Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2kIc2 xKxwCY1x0262kKe7AKxVW8ZVWrXwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWU JVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67 kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUJVWUCwCI42IY 6xIIjxv20xvEc7CjxVAFwI0_Cr0_Gr1UMIIF0xvE42xK8VAvwI8IcIk0rVWUJVWUCwCI42 IY6I8E87Iv67AKxVWUJVW8JwCI42IY6I8E87Iv6xkF7I0E14v26r4UJVWxJrUvcSsGvfC2 KfnxnUUI43ZEXa7sRiuWl3UUUUU== X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ Content-Type: text/plain; charset="utf-8" From: Yu Kuai Add two new methods, {start, end}_discard, to bitmap_ops and a new field 'rw' to struct md_io_clone to handle discard IO, in preparation for supporting the new md bitmap. Since all bitmap functions that handle write IO share the same signature, also add a typedef to make the code cleaner. 
Signed-off-by: Yu Kuai Reviewed-by: Xiao Ni Reviewed-by: Hannes Reinecke Reviewed-by: Li Nan --- drivers/md/md-bitmap.c | 3 +++ drivers/md/md-bitmap.h | 12 ++++++++---- drivers/md/md.c | 15 +++++++++++---- drivers/md/md.h | 1 + 4 files changed, 23 insertions(+), 8 deletions(-) diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c index b157119de123..dc050ff94d5b 100644 --- a/drivers/md/md-bitmap.c +++ b/drivers/md/md-bitmap.c @@ -3005,6 +3005,9 @@ static struct bitmap_operations bitmap_ops =3D { =20 .start_write =3D bitmap_start_write, .end_write =3D bitmap_end_write, + .start_discard =3D bitmap_start_write, + .end_discard =3D bitmap_end_write, + .start_sync =3D bitmap_start_sync, .end_sync =3D bitmap_end_sync, .cond_end_sync =3D bitmap_cond_end_sync, diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h index 42f91755a341..8616ced49077 100644 --- a/drivers/md/md-bitmap.h +++ b/drivers/md/md-bitmap.h @@ -61,6 +61,9 @@ struct md_bitmap_stats { struct file *file; }; =20 +typedef void (md_bitmap_fn)(struct mddev *mddev, sector_t offset, + unsigned long sectors); + struct bitmap_operations { struct md_submodule_head head; =20 @@ -81,10 +84,11 @@ struct bitmap_operations { void (*end_behind_write)(struct mddev *mddev); void (*wait_behind_writes)(struct mddev *mddev); =20 - void (*start_write)(struct mddev *mddev, sector_t offset, - unsigned long sectors); - void (*end_write)(struct mddev *mddev, sector_t offset, - unsigned long sectors); + md_bitmap_fn *start_write; + md_bitmap_fn *end_write; + md_bitmap_fn *start_discard; + md_bitmap_fn *end_discard; + bool (*start_sync)(struct mddev *mddev, sector_t offset, sector_t *blocks, bool degraded); void (*end_sync)(struct mddev *mddev, sector_t offset, sector_t *blocks); diff --git a/drivers/md/md.c b/drivers/md/md.c index 86cf97c0a77b..2e088196d42c 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -8933,18 +8933,24 @@ EXPORT_SYMBOL_GPL(md_submit_discard_bio); static void md_bitmap_start(struct mddev *mddev, 
struct md_io_clone *md_io_clone) { + md_bitmap_fn *fn =3D unlikely(md_io_clone->rw =3D=3D STAT_DISCARD) ? + mddev->bitmap_ops->start_discard : + mddev->bitmap_ops->start_write; + if (mddev->pers->bitmap_sector) mddev->pers->bitmap_sector(mddev, &md_io_clone->offset, &md_io_clone->sectors); =20 - mddev->bitmap_ops->start_write(mddev, md_io_clone->offset, - md_io_clone->sectors); + fn(mddev, md_io_clone->offset, md_io_clone->sectors); } =20 static void md_bitmap_end(struct mddev *mddev, struct md_io_clone *md_io_c= lone) { - mddev->bitmap_ops->end_write(mddev, md_io_clone->offset, - md_io_clone->sectors); + md_bitmap_fn *fn =3D unlikely(md_io_clone->rw =3D=3D STAT_DISCARD) ? + mddev->bitmap_ops->end_discard : + mddev->bitmap_ops->end_write; + + fn(mddev, md_io_clone->offset, md_io_clone->sectors); } =20 static void md_end_clone_io(struct bio *bio) @@ -8983,6 +8989,7 @@ static void md_clone_bio(struct mddev *mddev, struct = bio **bio) if (bio_data_dir(*bio) =3D=3D WRITE && md_bitmap_enabled(mddev, false)) { md_io_clone->offset =3D (*bio)->bi_iter.bi_sector; md_io_clone->sectors =3D bio_sectors(*bio); + md_io_clone->rw =3D op_stat_group(bio_op(*bio)); md_bitmap_start(mddev, md_io_clone); } =20 diff --git a/drivers/md/md.h b/drivers/md/md.h index 5ef73109d14d..1b767b5320cf 100644 --- a/drivers/md/md.h +++ b/drivers/md/md.h @@ -872,6 +872,7 @@ struct md_io_clone { unsigned long start_time; sector_t offset; unsigned long sectors; + enum stat_group rw; struct bio bio_clone; }; =20 --=20 2.39.2 From nobody Fri Oct 3 14:29:32 2025 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BF96A2BE658; Fri, 29 Aug 2025 08:13:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; 
t=1756455203; cv=none; b=N2TeGjV6j4JUPSXJabcpn+RE2vRqffhXyxI8nqy/yO27yKqjFfrRaTBSexFTvvDaUB1TXFsLNsVxNNe2cTGXO2BzWpKfuUoTdy/UgZ6swkWEVRZokvEzkhJbKzh8YCVpyFDECocUmd+LyZwc+rkDKX4EU98al02VOTbvq9Jz6WM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756455203; c=relaxed/simple; bh=ZtzsUkYNyzaXjcn8Y1j15McSVbxGbmH59ttzdQKMFto=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=iQzJyHwpgTm/juHRJaQu2CBFJgYNe+OPX3IuhM5osQlzjMrmfp6wIxtiPGMQLs2zvR0Gda9sYnZP8hxrTnEUnUY/WrnbvbzzbkAcYFIUJ3tRpJTCWI5aPqwdk0ipDznXQshh/AwvNzbjP/Be/RrO25dzmJAU1cerHcQSacJ4l+w= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.93.142]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTPS id 4cCrcV6HbMzYQvXX; Fri, 29 Aug 2025 16:13:14 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id 632A71A11D7; Fri, 29 Aug 2025 16:13:13 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP4 (Coremail) with SMTP id gCh0CgB3wY0RYbFohAO2Ag--.45648S8; Fri, 29 Aug 2025 16:13:13 +0800 (CST) From: Yu Kuai To: hch@infradead.org, xni@redhat.com, colyli@kernel.org, linan122@huawei.com, corbet@lwn.net, agk@redhat.com, snitzer@kernel.org, mpatocka@redhat.com, song@kernel.org, yukuai3@huawei.com, hare@suse.de Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, dm-devel@lists.linux.dev, linux-raid@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com, johnny.chenyi@huawei.com, hailan@yukuai.org.cn Subject: [PATCH v7 md-6.18 04/11] md: add a new mddev field 
'bitmap_id' Date: Fri, 29 Aug 2025 16:04:19 +0800 Message-Id: <20250829080426.1441678-5-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20250829080426.1441678-1-yukuai1@huaweicloud.com> References: <20250829080426.1441678-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgB3wY0RYbFohAO2Ag--.45648S8 X-Coremail-Antispam: 1UD129KBjvJXoWxuF4rGF1fWFWxGr17Kr18Grg_yoW5Xryfpa yxXa4fCFWrXFZ2qw43GasruFnYgwn2yFZFgrWfJ34rWFn8WrZ8WF4Fg3Wjqr1DG3WxXFnr u3W5tr48ury8ZF7anT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUma14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCY1x0262kKe7AKxVW8ZVWrXwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkE bVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67 AF67kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUJVWUCwCI 42IY6xIIjxv20xvEc7CjxVAFwI0_Cr0_Gr1UMIIF0xvE42xK8VAvwI8IcIk0rVWUJVWUCw CI42IY6I8E87Iv67AKxVWUJVW8JwCI42IY6I8E87Iv6xkF7I0E14v26r4UJVWxJrUvcSsG vfC2KfnxnUUI43ZEXa7sRiHUDtUUUUU== X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ Content-Type: text/plain; charset="utf-8" From: Yu Kuai Prepare to store the bitmap id selected by user, also refactor mddev_set_bitmap_ops a bit in case the value is invalid. 
Signed-off-by: Yu Kuai Reviewed-by: Hannes Reinecke Reviewed-by: Li Nan Reviewed-by: Xiao Ni --- drivers/md/md.c | 37 +++++++++++++++++++++++++++++++------ drivers/md/md.h | 2 ++ 2 files changed, 33 insertions(+), 6 deletions(-) diff --git a/drivers/md/md.c b/drivers/md/md.c index 2e088196d42c..82c84bdabe79 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -676,13 +676,33 @@ static void active_io_release(struct percpu_ref *ref) =20 static void no_op(struct percpu_ref *r) {} =20 -static void mddev_set_bitmap_ops(struct mddev *mddev, enum md_submodule_id= id) +static bool mddev_set_bitmap_ops(struct mddev *mddev) { + struct md_submodule_head *head; + + if (mddev->bitmap_id =3D=3D ID_BITMAP_NONE) + return true; + xa_lock(&md_submodule); - mddev->bitmap_ops =3D xa_load(&md_submodule, id); + head =3D xa_load(&md_submodule, mddev->bitmap_id); + + if (!head) { + pr_warn("md: can't find bitmap id %d\n", mddev->bitmap_id); + goto err; + } + + if (head->type !=3D MD_BITMAP) { + pr_warn("md: invalid bitmap id %d\n", mddev->bitmap_id); + goto err; + } + + mddev->bitmap_ops =3D (void *)head; xa_unlock(&md_submodule); - if (!mddev->bitmap_ops) - pr_warn_once("md: can't find bitmap id %d\n", id); + return true; + +err: + xa_unlock(&md_submodule); + return false; } =20 static void mddev_clear_bitmap_ops(struct mddev *mddev) @@ -692,8 +712,13 @@ static void mddev_clear_bitmap_ops(struct mddev *mddev) =20 int mddev_init(struct mddev *mddev) { - /* TODO: support more versions */ - mddev_set_bitmap_ops(mddev, ID_BITMAP); + if (!IS_ENABLED(CONFIG_MD_BITMAP)) { + mddev->bitmap_id =3D ID_BITMAP_NONE; + } else { + mddev->bitmap_id =3D ID_BITMAP; + if (!mddev_set_bitmap_ops(mddev)) + return -EINVAL; + } =20 if (percpu_ref_init(&mddev->active_io, active_io_release, PERCPU_REF_ALLOW_REINIT, GFP_KERNEL)) { diff --git a/drivers/md/md.h b/drivers/md/md.h index 1b767b5320cf..4fa5a3e68a0c 100644 --- a/drivers/md/md.h +++ b/drivers/md/md.h @@ -40,6 +40,7 @@ enum md_submodule_id { ID_CLUSTER, 
ID_BITMAP, ID_LLBITMAP, /* TODO */ + ID_BITMAP_NONE, }; =20 struct md_submodule_head { @@ -565,6 +566,7 @@ struct mddev { struct percpu_ref writes_pending; int sync_checkers; /* # of threads checking writes_pending */ =20 + enum md_submodule_id bitmap_id; void *bitmap; /* the bitmap for the device */ struct bitmap_operations *bitmap_ops; struct { --=20 2.39.2 From nobody Fri Oct 3 14:29:32 2025 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 91D4F2BEFEE; Fri, 29 Aug 2025 08:13:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756455204; cv=none; b=NZ1bVi0cTUe8ED96H6XDkKvOtvNC/so9zbfoT+nQrjHLVns1ijjo6krty7vimZAvgQj/JLN+uLXXnuqDA52Hy3rAxDYpRAwi8ycvUZC6AzVPWlaZmlxa+tHt/r9VBhape0jOECxqQUe7OQHgASnif+qVs8JsQrALXOL3pDjLsyo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756455204; c=relaxed/simple; bh=M5Y9WPsTPOlhH/LEZk1KfLpI+CUDIfIo56iJZ1L59OQ=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=ojwLytyu6FxDsh4XAwlGHMWHZtxDpi+NcBQiR385NrvulmWDSMqDfPDi1On41aOq80mVxrEKETlcK2ARLj0xVTFhhrv70QBPcEqQDjq1g5TDSrOqMJObRSaRCgqspOpe8QdSvOsGOOY62uoamKS5R3Dh0Q7RmbeFcRJpL4Z6tSk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.235]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTPS id 4cCrcW4t2gzYQvd2; Fri, 29 Aug 2025 
16:13:15 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id 3140C1A12BA; Fri, 29 Aug 2025 16:13:14 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP4 (Coremail) with SMTP id gCh0CgB3wY0RYbFohAO2Ag--.45648S9; Fri, 29 Aug 2025 16:13:13 +0800 (CST) From: Yu Kuai To: hch@infradead.org, xni@redhat.com, colyli@kernel.org, linan122@huawei.com, corbet@lwn.net, agk@redhat.com, snitzer@kernel.org, mpatocka@redhat.com, song@kernel.org, yukuai3@huawei.com, hare@suse.de Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, dm-devel@lists.linux.dev, linux-raid@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com, johnny.chenyi@huawei.com, hailan@yukuai.org.cn Subject: [PATCH v7 md-6.18 05/11] md/md-bitmap: add a new sysfs api bitmap_type Date: Fri, 29 Aug 2025 16:04:20 +0800 Message-Id: <20250829080426.1441678-6-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20250829080426.1441678-1-yukuai1@huaweicloud.com> References: <20250829080426.1441678-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgB3wY0RYbFohAO2Ag--.45648S9 X-Coremail-Antispam: 1UD129KBjvJXoWxKryDWr1rJr1rAry8Gr1fXrb_yoW7ZFWkpF WxKryayrZ5ArsxXr17J3WDuF1SqrWvy39xt3sa93sYkry5WrnxAFyfK3WrtwnrCr95CF47 ZFs8JFWUWryjvF7anT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUmS14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 
IYx2IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCY1x0262kKe7AKxVW8ZVWrXwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkE bVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67 AF67kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUCVW8JwCI 42IY6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF 4lIxAIcVC2z280aVAFwI0_Jr0_Gr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBI daVFxhVjvjDU0xZFpf9x0pRQJ5wUUUUU= X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ Content-Type: text/plain; charset="utf-8" From: Yu Kuai This API will be used by mdadm to set bitmap_type while creating a new array or assembling an array, in preparation for adding a new bitmap. Currently available options are: cat /sys/block/md0/md/bitmap_type none [bitmap] Signed-off-by: Yu Kuai Reviewed-by: Hannes Reinecke Reviewed-by: Xiao Ni Reviewed-by: Li Nan --- Documentation/admin-guide/md.rst | 73 ++++++++++++++++------------ drivers/md/md.c | 81 ++++++++++++++++++++++++++++++++ 2 files changed, 124 insertions(+), 30 deletions(-) diff --git a/Documentation/admin-guide/md.rst b/Documentation/admin-guide/m= d.rst index 4ff2cc291d18..356d2a344f08 100644 --- a/Documentation/admin-guide/md.rst +++ b/Documentation/admin-guide/md.rst @@ -347,6 +347,49 @@ All md devices contain: active-idle like active, but no writes have been seen for a while (safe_mode_= delay). =20 + consistency_policy + This indicates how the array maintains consistency in case of unexpec= ted + shutdown. It can be: + + none + Array has no redundancy information, e.g. raid0, linear. + + resync + Full resync is performed and all redundancy is regenerated when the + array is started after unclean shutdown. + + bitmap + Resync assisted by a write-intent bitmap. + + journal + For raid4/5/6, journal device is used to log transactions and replay + after unclean shutdown. 
+ + ppl + For raid5 only, Partial Parity Log is used to close the write hole = and + eliminate resync. + + The accepted values when writing to this file are ``ppl`` and ``resyn= c``, + used to enable and disable PPL. + + uuid + This indicates the UUID of the array in the following format: + xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + + bitmap_type + [RW] When read, this file will display the current and available + bitmap for this array. The currently active bitmap will be enclosed + in [] brackets. Writing a bitmap name or ID to this file will switch + control of this array to that new bitmap. Note that writing a new + bitmap for an already created array is forbidden. + + none + No bitmap + bitmap + The default internal bitmap + +If bitmap_type is bitmap, then the md device will also contain: + bitmap/location This indicates where the write-intent bitmap for the array is stored. @@ -401,36 +444,6 @@ All md devices contain: once the array becomes non-degraded, and this fact has been recorded in the metadata. =20 - consistency_policy - This indicates how the array maintains consistency in case of unexpec= ted - shutdown. It can be: - - none - Array has no redundancy information, e.g. raid0, linear. - - resync - Full resync is performed and all redundancy is regenerated when the - array is started after unclean shutdown. - - bitmap - Resync assisted by a write-intent bitmap. - - journal - For raid4/5/6, journal device is used to log transactions and replay - after unclean shutdown. - - ppl - For raid5 only, Partial Parity Log is used to close the write hole = and - eliminate resync. - - The accepted values when writing to this file are ``ppl`` and ``resyn= c``, - used to enable and disable PPL. 
- - uuid - This indicates the UUID of the array in the following format: - xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx - - As component devices are added to an md array, they appear in the ``md`` directory as new directories named:: =20 diff --git a/drivers/md/md.c b/drivers/md/md.c index 82c84bdabe79..aeae0d4854dc 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -4207,6 +4207,86 @@ new_level_store(struct mddev *mddev, const char *buf= , size_t len) static struct md_sysfs_entry md_new_level =3D __ATTR(new_level, 0664, new_level_show, new_level_store); =20 +static ssize_t +bitmap_type_show(struct mddev *mddev, char *page) +{ + struct md_submodule_head *head; + unsigned long i; + ssize_t len =3D 0; + + if (mddev->bitmap_id =3D=3D ID_BITMAP_NONE) + len +=3D sprintf(page + len, "[none] "); + else + len +=3D sprintf(page + len, "none "); + + xa_lock(&md_submodule); + xa_for_each(&md_submodule, i, head) { + if (head->type !=3D MD_BITMAP) + continue; + + if (mddev->bitmap_id =3D=3D head->id) + len +=3D sprintf(page + len, "[%s] ", head->name); + else + len +=3D sprintf(page + len, "%s ", head->name); + } + xa_unlock(&md_submodule); + + len +=3D sprintf(page + len, "\n"); + return len; +} + +static ssize_t +bitmap_type_store(struct mddev *mddev, const char *buf, size_t len) +{ + struct md_submodule_head *head; + enum md_submodule_id id; + unsigned long i; + int err =3D 0; + + xa_lock(&md_submodule); + + if (mddev->bitmap_ops) { + err =3D -EBUSY; + goto out; + } + + if (cmd_match(buf, "none")) { + mddev->bitmap_id =3D ID_BITMAP_NONE; + goto out; + } + + xa_for_each(&md_submodule, i, head) { + if (head->type =3D=3D MD_BITMAP && cmd_match(buf, head->name)) { + mddev->bitmap_id =3D head->id; + goto out; + } + } + + err =3D kstrtoint(buf, 10, &id); + if (err) + goto out; + + if (id =3D=3D ID_BITMAP_NONE) { + mddev->bitmap_id =3D id; + goto out; + } + + head =3D xa_load(&md_submodule, id); + if (head && head->type =3D=3D MD_BITMAP) { + mddev->bitmap_id =3D id; + goto out; + } + + 
err =3D -ENOENT; + +out: + xa_unlock(&md_submodule); + return err ? err : len; +} + +static struct md_sysfs_entry md_bitmap_type =3D +__ATTR(bitmap_type, 0664, bitmap_type_show, bitmap_type_store); + static ssize_t layout_show(struct mddev *mddev, char *page) { @@ -5813,6 +5893,7 @@ __ATTR(serialize_policy, S_IRUGO | S_IWUSR, serialize= _policy_show, static struct attribute *md_default_attrs[] =3D { &md_level.attr, &md_new_level.attr, + &md_bitmap_type.attr, &md_layout.attr, &md_raid_disks.attr, &md_uuid.attr, --=20 2.39.2 From nobody Fri Oct 3 14:29:32 2025 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7115A2C11D2; Fri, 29 Aug 2025 08:13:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756455207; cv=none; b=SLqOWCntJqFYEWCMsFhg7Bce/XP5pdqv+ghZtW4VaehoQrMoW/KePuUpQZoHLEzKWcKb3fIL+mZe9C3+84q7FSO2LHXk4eVTAMRIP/9ZRvBMWlSnvMooY3UwLrpckbvjd3m3tWDoWtFlrHJXspeW5Gh2JafDtc5dsRqk7FtB4jM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756455207; c=relaxed/simple; bh=xZZnpBSVoXz+dHyRlLf67QhW0eoF9zkzKtaJSK7ZTGk=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=A9P422BEhmBkr5traU+BpvV+rvrL/UMpbz7SKansQ1EDyfEwmSaoN3HrVvbEVsBuoCjyqhjFkEeue58IUE2qOycT3DfkgbV8hhkyAPTHV7VcCjLWx36xwMKGx51dvf2BJGgdZa68dxT7nE0bxrJyNdGLFsjpkD4fpNQ6ivKCtqM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass 
smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.93.142]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTPS id 4cCrcX2sCpzYQvWg; Fri, 29 Aug 2025 16:13:16 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id E02131A164A; Fri, 29 Aug 2025 16:13:14 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP4 (Coremail) with SMTP id gCh0CgB3wY0RYbFohAO2Ag--.45648S10; Fri, 29 Aug 2025 16:13:14 +0800 (CST) From: Yu Kuai To: hch@infradead.org, xni@redhat.com, colyli@kernel.org, linan122@huawei.com, corbet@lwn.net, agk@redhat.com, snitzer@kernel.org, mpatocka@redhat.com, song@kernel.org, yukuai3@huawei.com, hare@suse.de Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, dm-devel@lists.linux.dev, linux-raid@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com, johnny.chenyi@huawei.com, hailan@yukuai.org.cn Subject: [PATCH v7 md-6.18 06/11] md/md-bitmap: delay registration of bitmap_ops until creating bitmap Date: Fri, 29 Aug 2025 16:04:21 +0800 Message-Id: <20250829080426.1441678-7-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20250829080426.1441678-1-yukuai1@huaweicloud.com> References: <20250829080426.1441678-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgB3wY0RYbFohAO2Ag--.45648S10 X-Coremail-Antispam: 1UD129KBjvJXoW3Xw1fAr4xJr17Jw4UCw4Uurg_yoW3WFy5p3 yft3Z5Kr4rJrZIgw47XFyv9F1rXFn7tr9xtryxXw15Grn7JrnxXa1rWF1Utw18Ga48ZFs8 Zw45tr48Gr13uF7anT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUmS14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 
z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCY1x0262kKe7AKxVW8ZVWrXwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkE bVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67 AF67kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUCVW8JwCI 42IY6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF 4lIxAIcVC2z280aVAFwI0_Jr0_Gr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBI daVFxhVjvjDU0xZFpf9x0pRQJ5wUUUUU= X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ Content-Type: text/plain; charset="utf-8" From: Yu Kuai Currently bitmap_ops is registered while allocating mddev; this is fine when there is only one bitmap_ops. Delay setting bitmap_ops until the bitmap is created, so that the user can choose which bitmap to use before running the array. Link: https://lore.kernel.org/linux-raid/20250721171557.34587-7-yukuai@kern= el.org Signed-off-by: Yu Kuai Reviewed-by: Hannes Reinecke Reviewed-by: Li Nan Reviewed-by: Xiao Ni --- Documentation/admin-guide/md.rst | 3 ++ drivers/md/md.c | 90 +++++++++++++++++++------------- 2 files changed, 56 insertions(+), 37 deletions(-) diff --git a/Documentation/admin-guide/md.rst b/Documentation/admin-guide/m= d.rst index 356d2a344f08..001363f81850 100644 --- a/Documentation/admin-guide/md.rst +++ b/Documentation/admin-guide/md.rst @@ -388,6 +388,9 @@ All md devices contain: bitmap The default internal bitmap =20 +If bitmap_type is not none, then additional bitmap attributes bitmap/xxx or +llbitmap/xxx will be created after md device KOBJ_CHANGE event. 
+ If bitmap_type is bitmap, then the md device will also contain: =20 bitmap/location diff --git a/drivers/md/md.c b/drivers/md/md.c index aeae0d4854dc..6560bd89d0a2 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -678,9 +678,11 @@ static void no_op(struct percpu_ref *r) {} =20 static bool mddev_set_bitmap_ops(struct mddev *mddev) { + struct bitmap_operations *old =3D mddev->bitmap_ops; struct md_submodule_head *head; =20 - if (mddev->bitmap_id =3D=3D ID_BITMAP_NONE) + if (mddev->bitmap_id =3D=3D ID_BITMAP_NONE || + (old && old->head.id =3D=3D mddev->bitmap_id)) return true; =20 xa_lock(&md_submodule); @@ -698,6 +700,18 @@ static bool mddev_set_bitmap_ops(struct mddev *mddev) =20 mddev->bitmap_ops =3D (void *)head; xa_unlock(&md_submodule); + + if (!mddev_is_dm(mddev) && mddev->bitmap_ops->group) { + if (sysfs_create_group(&mddev->kobj, mddev->bitmap_ops->group)) + pr_warn("md: cannot register extra bitmap attributes for %s\n", + mdname(mddev)); + else + /* + * Inform user with KOBJ_CHANGE about new bitmap + * attributes. 
+ */ + kobject_uevent(&mddev->kobj, KOBJ_CHANGE); + } return true; =20 err: @@ -707,28 +721,26 @@ static bool mddev_set_bitmap_ops(struct mddev *mddev) =20 static void mddev_clear_bitmap_ops(struct mddev *mddev) { + if (!mddev_is_dm(mddev) && mddev->bitmap_ops && + mddev->bitmap_ops->group) + sysfs_remove_group(&mddev->kobj, mddev->bitmap_ops->group); + mddev->bitmap_ops =3D NULL; } =20 int mddev_init(struct mddev *mddev) { - if (!IS_ENABLED(CONFIG_MD_BITMAP)) { + if (!IS_ENABLED(CONFIG_MD_BITMAP)) mddev->bitmap_id =3D ID_BITMAP_NONE; - } else { + else mddev->bitmap_id =3D ID_BITMAP; - if (!mddev_set_bitmap_ops(mddev)) - return -EINVAL; - } =20 if (percpu_ref_init(&mddev->active_io, active_io_release, - PERCPU_REF_ALLOW_REINIT, GFP_KERNEL)) { - mddev_clear_bitmap_ops(mddev); + PERCPU_REF_ALLOW_REINIT, GFP_KERNEL)) return -ENOMEM; - } =20 if (percpu_ref_init(&mddev->writes_pending, no_op, PERCPU_REF_ALLOW_REINIT, GFP_KERNEL)) { - mddev_clear_bitmap_ops(mddev); percpu_ref_exit(&mddev->active_io); return -ENOMEM; } @@ -766,7 +778,6 @@ EXPORT_SYMBOL_GPL(mddev_init); =20 void mddev_destroy(struct mddev *mddev) { - mddev_clear_bitmap_ops(mddev); percpu_ref_exit(&mddev->active_io); percpu_ref_exit(&mddev->writes_pending); } @@ -6196,11 +6207,6 @@ struct mddev *md_alloc(dev_t dev, char *name) return ERR_PTR(error); } =20 - if (md_bitmap_registered(mddev) && mddev->bitmap_ops->group) - if (sysfs_create_group(&mddev->kobj, mddev->bitmap_ops->group)) - pr_warn("md: cannot register extra bitmap attributes for %s\n", - mdname(mddev)); - kobject_uevent(&mddev->kobj, KOBJ_ADD); mddev->sysfs_state =3D sysfs_get_dirent_safe(mddev->kobj.sd, "array_state= "); mddev->sysfs_level =3D sysfs_get_dirent_safe(mddev->kobj.sd, "level"); @@ -6279,6 +6285,26 @@ static void md_safemode_timeout(struct timer_list *t) =20 static int start_dirty_degraded; =20 +static int md_bitmap_create(struct mddev *mddev) +{ + if (mddev->bitmap_id =3D=3D ID_BITMAP_NONE) + return -EINVAL; + + if 
(!mddev_set_bitmap_ops(mddev)) + return -ENOENT; + + return mddev->bitmap_ops->create(mddev); +} + +static void md_bitmap_destroy(struct mddev *mddev) +{ + if (!md_bitmap_registered(mddev)) + return; + + mddev->bitmap_ops->destroy(mddev); + mddev_clear_bitmap_ops(mddev); +} + int md_run(struct mddev *mddev) { int err; @@ -6443,9 +6469,9 @@ int md_run(struct mddev *mddev) (unsigned long long)pers->size(mddev, 0, 0) / 2); err =3D -EINVAL; } - if (err =3D=3D 0 && pers->sync_request && md_bitmap_registered(mddev) && + if (err =3D=3D 0 && pers->sync_request && (mddev->bitmap_info.file || mddev->bitmap_info.offset)) { - err =3D mddev->bitmap_ops->create(mddev); + err =3D md_bitmap_create(mddev); if (err) pr_warn("%s: failed to create bitmap (%d)\n", mdname(mddev), err); @@ -6518,8 +6544,7 @@ int md_run(struct mddev *mddev) pers->free(mddev, mddev->private); mddev->private =3D NULL; put_pers(pers); - if (md_bitmap_registered(mddev)) - mddev->bitmap_ops->destroy(mddev); + md_bitmap_destroy(mddev); abort: bioset_exit(&mddev->io_clone_set); exit_sync_set: @@ -6542,7 +6567,7 @@ int do_md_run(struct mddev *mddev) if (md_bitmap_registered(mddev)) { err =3D mddev->bitmap_ops->load(mddev); if (err) { - mddev->bitmap_ops->destroy(mddev); + md_bitmap_destroy(mddev); goto out; } } @@ -6740,8 +6765,7 @@ static void __md_stop(struct mddev *mddev) { struct md_personality *pers =3D mddev->pers; =20 - if (md_bitmap_registered(mddev)) - mddev->bitmap_ops->destroy(mddev); + md_bitmap_destroy(mddev); mddev_detach(mddev); spin_lock(&mddev->lock); mddev->pers =3D NULL; @@ -7518,16 +7542,16 @@ static int set_bitmap_file(struct mddev *mddev, int= fd) err =3D 0; if (mddev->pers) { if (fd >=3D 0) { - err =3D mddev->bitmap_ops->create(mddev); + err =3D md_bitmap_create(mddev); if (!err) err =3D mddev->bitmap_ops->load(mddev); =20 if (err) { - mddev->bitmap_ops->destroy(mddev); + md_bitmap_destroy(mddev); fd =3D -1; } } else if (fd < 0) { - mddev->bitmap_ops->destroy(mddev); + 
md_bitmap_destroy(mddev); } } =20 @@ -7812,14 +7836,6 @@ static int update_array_info(struct mddev *mddev, md= u_array_info_t *info) rv =3D update_raid_disks(mddev, info->raid_disks); =20 if ((state ^ info->state) & (1<pers->quiesce =3D=3D NULL || mddev->thread =3D=3D NULL) { rv =3D -EINVAL; goto err; @@ -7842,12 +7858,12 @@ static int update_array_info(struct mddev *mddev, m= du_array_info_t *info) mddev->bitmap_info.default_offset; mddev->bitmap_info.space =3D mddev->bitmap_info.default_space; - rv =3D mddev->bitmap_ops->create(mddev); + rv =3D md_bitmap_create(mddev); if (!rv) rv =3D mddev->bitmap_ops->load(mddev); =20 if (rv) - mddev->bitmap_ops->destroy(mddev); + md_bitmap_destroy(mddev); } else { struct md_bitmap_stats stats; =20 @@ -7873,7 +7889,7 @@ static int update_array_info(struct mddev *mddev, mdu= _array_info_t *info) put_cluster_ops(mddev); mddev->safemode_delay =3D DEFAULT_SAFEMODE_DELAY; } - mddev->bitmap_ops->destroy(mddev); + md_bitmap_destroy(mddev); mddev->bitmap_info.offset =3D 0; } } --=20 2.39.2 From nobody Fri Oct 3 14:29:32 2025 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CE3152C21C9; Fri, 29 Aug 2025 08:13:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756455205; cv=none; b=Lh07lGc57Zf5dG2Goi4+ZBen/n4J0Eo8+9TGTfV08EiM86y0svQ1YrgxZVZ24JxBKOBDLB2MYz0uswSWmps3ZPtX+133zOqsfgGBqZRkpZe7Pyghw1wheYVmUFCkQ34dI7B63tZlwcS8NF2jHSu+NVhIWRCwEvNUPaPtADbpXgw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756455205; c=relaxed/simple; bh=zKkxTiTvjfz4SNEWg1YdxRreefbnu1jpQsfD/fKdC9s=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; 
b=BMsYogNbmVgbe0VkLLzxmrnmbt43htNQZfqhgZRZzzPMnpZjaUuy964A3yx5+IBVxGEgBZgJ306Iew+cbiF1Ul99WsIoFhifY8ElqeJZKjOOtND4dDYCY5I4lnNsdnGkmooTyhYAUEQkova9CEoqTpDLWioVg6zhmy0vWnQHYCE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.93.142]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTPS id 4cCrcY17gwzYQvgh; Fri, 29 Aug 2025 16:13:17 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id A4CA21A166D; Fri, 29 Aug 2025 16:13:15 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP4 (Coremail) with SMTP id gCh0CgB3wY0RYbFohAO2Ag--.45648S11; Fri, 29 Aug 2025 16:13:15 +0800 (CST) From: Yu Kuai To: hch@infradead.org, xni@redhat.com, colyli@kernel.org, linan122@huawei.com, corbet@lwn.net, agk@redhat.com, snitzer@kernel.org, mpatocka@redhat.com, song@kernel.org, yukuai3@huawei.com, hare@suse.de Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, dm-devel@lists.linux.dev, linux-raid@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com, johnny.chenyi@huawei.com, hailan@yukuai.org.cn Subject: [PATCH v7 md-6.18 07/11] md/md-bitmap: add a new method skip_sync_blocks() in bitmap_operations Date: Fri, 29 Aug 2025 16:04:22 +0800 Message-Id: <20250829080426.1441678-8-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20250829080426.1441678-1-yukuai1@huaweicloud.com> References: <20250829080426.1441678-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 
Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgB3wY0RYbFohAO2Ag--.45648S11 X-Coremail-Antispam: 1UD129KBjvJXoW7AryxtF15Xr1DZw4kJw1kKrg_yoW8Cw43pa 97JFy3Cry5Xr45Z3W7XFyDuFyFv34ktFy7tFWxu34rWr97JrnxGF4Yga40qa4DCF13AFsx Z3W5ArWrZF1Iqw7anT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUmS14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCY1x0262kKe7AKxVW8ZVWrXwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkE bVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67 AF67kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUCVW8JwCI 42IY6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF 4lIxAIcVC2z280aVAFwI0_Jr0_Gr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBI daVFxhVjvjDU0xZFpf9x0pRQJ5wUUUUU= X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ Content-Type: text/plain; charset="utf-8" From: Yu Kuai This method is used to check whether blocks can be skipped before calling into pers->sync_request(). llbitmap will use this method to skip resync for unwritten/clean data blocks, and to skip recovery/check/repair for unwritten data blocks. 
Signed-off-by: Yu Kuai Reviewed-by: Christoph Hellwig Reviewed-by: Xiao Ni Reviewed-by: Hannes Reinecke Reviewed-by: Li Nan --- drivers/md/md-bitmap.h | 1 + drivers/md/md.c | 7 +++++++ 2 files changed, 8 insertions(+) diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h index 8616ced49077..95453696c68e 100644 --- a/drivers/md/md-bitmap.h +++ b/drivers/md/md-bitmap.h @@ -89,6 +89,7 @@ struct 
bitmap_operations { md_bitmap_fn *start_discard; md_bitmap_fn *end_discard; =20 + sector_t (*skip_sync_blocks)(struct mddev *mddev, sector_t offset); bool (*start_sync)(struct mddev *mddev, sector_t offset, sector_t *blocks, bool degraded); void (*end_sync)(struct mddev *mddev, sector_t offset, sector_t *blocks); diff --git a/drivers/md/md.c b/drivers/md/md.c index 6560bd89d0a2..7196e7f6b2a4 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -9460,6 +9460,12 @@ void md_do_sync(struct md_thread *thread) if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) break; =20 + if (mddev->bitmap_ops && mddev->bitmap_ops->skip_sync_blocks) { + sectors =3D mddev->bitmap_ops->skip_sync_blocks(mddev, j); + if (sectors) + goto update; + } + sectors =3D mddev->pers->sync_request(mddev, j, max_sectors, &skipped); if (sectors =3D=3D 0) { @@ -9475,6 +9481,7 @@ void md_do_sync(struct md_thread *thread) if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) break; =20 +update: j +=3D sectors; if (j > max_sectors) /* when skipping, extra large numbers can be returned. 
*/ --=20 2.39.2 From nobody Fri Oct 3 14:29:32 2025 Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B5CD52C3272; Fri, 29 Aug 2025 08:13:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756455206; cv=none; b=T/G5M8w4aeEgrtFd8jNv3HVKPhp/P01/SdNuG/7btQVkcGb0hn8vg/g/Q12Qhc+8rd9slYMPXaptJZZoayTQ4nWTnmqgXePuJWkYOD1SgWmPZxi4115yn8g1unVuXL3ea6soGi5LoSQs4oCaYpO+TOjqUmNmrzLdGgbFXpmOGnk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756455206; c=relaxed/simple; bh=Fw9O4rb+oqsolIGn5yCfjkcLO2+Wkqxnh+0O3jhUmi8=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=EN2WpX0w+l3L942qEfxHKM5ywSUuPosASfbIqGqKHh26NAATslKwi/BAzDNcf6yhbmLTui61B0brvwGZGciCuWQUnMXHYVAs0GFNhuQ6yH0gb0lgaRsjYNTHhepp/ljxCTVjUV0Y+5rVAK1JycYdCcZywneK11cVpz9TJ+1U9yI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.93.142]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTPS id 4cCrcX4wCQzKHNHt; Fri, 29 Aug 2025 16:13:16 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id 623C11A1679; Fri, 29 Aug 2025 16:13:16 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP4 (Coremail) with SMTP id gCh0CgB3wY0RYbFohAO2Ag--.45648S12; Fri, 29 Aug 2025 16:13:16 +0800 (CST) From: Yu 
Kuai To: hch@infradead.org, xni@redhat.com, colyli@kernel.org, linan122@huawei.com, corbet@lwn.net, agk@redhat.com, snitzer@kernel.org, mpatocka@redhat.com, song@kernel.org, yukuai3@huawei.com, hare@suse.de Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, dm-devel@lists.linux.dev, linux-raid@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com, johnny.chenyi@huawei.com, hailan@yukuai.org.cn Subject: [PATCH v7 md-6.18 08/11] md/md-bitmap: add a new method blocks_synced() in bitmap_operations Date: Fri, 29 Aug 2025 16:04:23 +0800 Message-Id: <20250829080426.1441678-9-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20250829080426.1441678-1-yukuai1@huaweicloud.com> References: <20250829080426.1441678-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgB3wY0RYbFohAO2Ag--.45648S12 X-Coremail-Antispam: 1UD129KBjvJXoWxCF48ZFWxJr1kWr15AFW7XFb_yoW5uFWfp3 9rXasxG3yYgrZ3XFy7Z3yDuFyFv34DXryUtFyfuw1ruF9Ygwn8WF4Sga4jyFy5Ga4rZFy3 Zwn8trW5Cr10qrJanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUmS14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCY1x0262kKe7AKxVW8ZVWrXwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkE bVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67 
AF67kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUCVW8JwCI 42IY6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF 4lIxAIcVC2z280aVAFwI0_Gr0_Cr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBI daVFxhVjvjDU0xZFpf9x0pRQJ5wUUUUU= X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ Content-Type: text/plain; charset="utf-8" From: Yu Kuai Currently, raid456 must perform a whole-array initial recovery to build the initial xor data, so that later IO to the array does not have to read all the blocks on the underlying disks. This initial recovery affects IO performance a lot, and with today's huge disks it can take a long time. Hence llbitmap will support lazy initial recovery in following patches. This method is used to check whether data blocks are synced; if they are not, IO still has to read all blocks for raid456. Signed-off-by: Yu Kuai --- drivers/md/md-bitmap.h | 1 + drivers/md/raid5.c | 15 +++++++++++---- 2 files changed, 12 insertions(+), 4 deletions(-) diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h index 95453696c68e..5f41724cbcd8 100644 --- a/drivers/md/md-bitmap.h +++ b/drivers/md/md-bitmap.h @@ -90,6 +90,7 @@ struct bitmap_operations { md_bitmap_fn *end_discard; =20 sector_t (*skip_sync_blocks)(struct mddev *mddev, sector_t offset); + bool (*blocks_synced)(struct mddev *mddev, sector_t offset); bool (*start_sync)(struct mddev *mddev, sector_t offset, sector_t *blocks, bool degraded); void (*end_sync)(struct mddev *mddev, sector_t offset, sector_t *blocks); diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 5285e72341a2..672ab226e43c 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -4097,7 +4097,8 @@ static int handle_stripe_dirtying(struct r5conf *conf, int disks) { int rmw =3D 0, rcw =3D 0, i; - sector_t resync_offset =3D conf->mddev->resync_offset; + struct mddev *mddev =3D conf->mddev; + sector_t resync_offset =3D mddev->resync_offset; =20 /* Check whether resync is now happening or should start. 
* If yes, then the array is dirty (after unclean shutdown or @@ -4116,6 +4117,12 @@ static int handle_stripe_dirtying(struct r5conf *con= f, pr_debug("force RCW rmw_level=3D%u, resync_offset=3D%llu sh->sector=3D%l= lu\n", conf->rmw_level, (unsigned long long)resync_offset, (unsigned long long)sh->sector); + } else if (mddev->bitmap_ops && mddev->bitmap_ops->blocks_synced && + !mddev->bitmap_ops->blocks_synced(mddev, sh->sector)) { + /* The initial recovery is not done, must read everything */ + rcw =3D 1; rmw =3D 2; + pr_debug("force RCW by lazy recovery, sh->sector=3D%llu\n", + sh->sector); } else for (i =3D disks; i--; ) { /* would I have to read this buffer for read_modify_write */ struct r5dev *dev =3D &sh->dev[i]; @@ -4148,7 +4155,7 @@ static int handle_stripe_dirtying(struct r5conf *conf, set_bit(STRIPE_HANDLE, &sh->state); if ((rmw < rcw || (rmw =3D=3D rcw && conf->rmw_level =3D=3D PARITY_PREFER= _RMW)) && rmw > 0) { /* prefer read-modify-write, but need to get some data */ - mddev_add_trace_msg(conf->mddev, "raid5 rmw %llu %d", + mddev_add_trace_msg(mddev, "raid5 rmw %llu %d", sh->sector, rmw); =20 for (i =3D disks; i--; ) { @@ -4227,8 +4234,8 @@ static int handle_stripe_dirtying(struct r5conf *conf, set_bit(STRIPE_DELAYED, &sh->state); } } - if (rcw && !mddev_is_dm(conf->mddev)) - blk_add_trace_msg(conf->mddev->gendisk->queue, + if (rcw && !mddev_is_dm(mddev)) + blk_add_trace_msg(mddev->gendisk->queue, "raid5 rcw %llu %d %d %d", (unsigned long long)sh->sector, rcw, qread, test_bit(STRIPE_DELAYED, &sh->state)); --=20 2.39.2 From nobody Fri Oct 3 14:29:32 2025 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CEE672C21CD; Fri, 29 Aug 2025 08:13:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; 
a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756455206; cv=none; b=Wv46CFfhHZ5zARCiAhYakJDJgJ1qeQN2iPe3jl2WfGIAf8p/ATZOlF6NbN/szJZwm2OkDTlO+IzWCnJM7s7yhlO4I+DkbcV0fMcDYGxELQmfrlOTjiYtzfRwTfNGpeNjEW6VT4RH9zyTfKklSGVY50yUJDYvSUMKbcmTZvWGXxE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756455206; c=relaxed/simple; bh=b0E9QQ+KD2k/UPoJgRbRysxykR2WPDllt/B2Y0Glm1c=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=keU6bfVAtbHOPPIyp3X4zOy/zlvreisshQFJtklrk5DsNEbKLeB6afsppt6hGaMxTpFW0sEh4ErIgBYhL/LConDoXttD9c5+l7IgCw9n1qa5et2Bq0aJq+OsK219jZ1XEfq3LKt0/bO++5PzuR/WFCTylrnUmrDDYFy4mnl18ps= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.93.142]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTPS id 4cCrcZ4M6szYQvj2; Fri, 29 Aug 2025 16:13:18 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id 20F181A16A8; Fri, 29 Aug 2025 16:13:17 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP4 (Coremail) with SMTP id gCh0CgB3wY0RYbFohAO2Ag--.45648S13; Fri, 29 Aug 2025 16:13:16 +0800 (CST) From: Yu Kuai To: hch@infradead.org, xni@redhat.com, colyli@kernel.org, linan122@huawei.com, corbet@lwn.net, agk@redhat.com, snitzer@kernel.org, mpatocka@redhat.com, song@kernel.org, yukuai3@huawei.com, hare@suse.de Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, dm-devel@lists.linux.dev, linux-raid@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com, johnny.chenyi@huawei.com, hailan@yukuai.org.cn Subject: 
[PATCH v7 md-6.18 09/11] md: add a new recovery_flag MD_RECOVERY_LAZY_RECOVER Date: Fri, 29 Aug 2025 16:04:24 +0800 Message-Id: <20250829080426.1441678-10-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20250829080426.1441678-1-yukuai1@huaweicloud.com> References: <20250829080426.1441678-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgB3wY0RYbFohAO2Ag--.45648S13 X-Coremail-Antispam: 1UD129KBjvJXoW3Jr4kXF4xGr4kGFW8tFWxtFb_yoW7Aw47pa yIyF98Cr4DJFWfZrZrt3WDWFWrZw18KrWqyFyfW3ykJF98trnxZF1UWFy3JrWDJa9ava12 qw1DJFW7uF15uw7anT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUmS14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCY1x0262kKe7AKxVW8ZVWrXwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkE bVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67 AF67kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUCVW8JwCI 42IY6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF 4lIxAIcVC2z280aVAFwI0_Gr0_Cr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBI daVFxhVjvjDU0xZFpf9x0pRQJ5wUUUUU= X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ Content-Type: text/plain; charset="utf-8" From: Yu Kuai This flag is used by llbitmap in later patches to skip raid456 initial recover and delay building initial xor data to first write. 
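
As a rough userspace model of the rdev selection this patch adds to md.c, here is a minimal sketch. struct model_rdev and select_lazy_recover() are hypothetical stand-ins, not kernel types or APIs. The sketch assumes the stricter reading of the helper's comment: nothing is picked unless every active member is in sync, so it may differ from the kernel helper when a faulty member is found.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct md_rdev; not the kernel type. */
struct model_rdev {
	int raid_disk;	/* slot in the array, < 0 for spares */
	bool faulty;
	bool in_sync;
};

/*
 * Assumption: pick a member only if every active member is healthy and
 * in sync. Returns the array index of the member with the highest
 * raid_disk and clears its in_sync flag, or returns -1 and changes nothing.
 */
static int select_lazy_recover(struct model_rdev *rdevs, size_t n)
{
	int pick = -1;
	size_t i;

	assert(rdevs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		/* spares take no part in the selection */
		if (rdevs[i].raid_disk < 0)
			continue;

		/* any unhealthy member rules out lazy recovery */
		if (rdevs[i].faulty || !rdevs[i].in_sync)
			return -1;

		if (pick < 0 || rdevs[pick].raid_disk < rdevs[i].raid_disk)
			pick = (int)i;
	}

	/* the chosen member is rebuilt from the others to create the xor data */
	if (pick >= 0)
		rdevs[pick].in_sync = false;

	return pick;
}
```

In this model the caller would restart recovery from sector 0 whenever the function returns a non-negative index, which mirrors how md_sync_position() sets start to 0 in the diff below.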
Signed-off-by: Yu Kuai
---
 drivers/md/md.c    | 47 +++++++++++++++++++++++++++++++++++++++++++++-
 drivers/md/md.h    |  2 ++
 drivers/md/raid5.c | 19 +++++++++++++++----
 3 files changed, 63 insertions(+), 5 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 7196e7f6b2a4..199843356449 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9199,6 +9199,39 @@ static sector_t md_sync_max_sectors(struct mddev *mddev,
 	}
 }
 
+/*
+ * If lazy recovery is requested and all rdevs are in sync, select the rdev with
+ * the highest index to perform recovery to build initial xor data, this is the
+ * same as old bitmap.
+ */
+static bool mddev_select_lazy_recover_rdev(struct mddev *mddev)
+{
+	struct md_rdev *recover_rdev = NULL;
+	struct md_rdev *rdev;
+	bool ret = false;
+
+	rcu_read_lock();
+	rdev_for_each_rcu(rdev, mddev) {
+		if (rdev->raid_disk < 0)
+			continue;
+
+		if (test_bit(Faulty, &rdev->flags) ||
+		    !test_bit(In_sync, &rdev->flags))
+			break;
+
+		if (!recover_rdev || recover_rdev->raid_disk < rdev->raid_disk)
+			recover_rdev = rdev;
+	}
+
+	if (recover_rdev) {
+		clear_bit(In_sync, &recover_rdev->flags);
+		ret = true;
+	}
+
+	rcu_read_unlock();
+	return ret;
+}
+
 static sector_t md_sync_position(struct mddev *mddev, enum sync_action action)
 {
 	sector_t start = 0;
@@ -9230,6 +9263,14 @@ static sector_t md_sync_position(struct mddev *mddev, enum sync_action action)
 			start = rdev->recovery_offset;
 		rcu_read_unlock();
 
+		/*
+		 * If there are no spares, and raid456 lazy initial recover is
+		 * requested.
+		 */
+		if (test_bit(MD_RECOVERY_LAZY_RECOVER, &mddev->recovery) &&
+		    start == MaxSector && mddev_select_lazy_recover_rdev(mddev))
+			start = 0;
+
 		/* If there is a bitmap, we need to make sure all
 		 * writes that started before we added a spare
 		 * complete before we start doing a recovery.
@@ -9791,6 +9832,7 @@ static bool md_choose_sync_action(struct mddev *mddev= , int *spares) =20 set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery); clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery); + clear_bit(MD_RECOVERY_LAZY_RECOVER, &mddev->recovery); return true; } =20 @@ -9799,6 +9841,7 @@ static bool md_choose_sync_action(struct mddev *mddev= , int *spares) remove_spares(mddev, NULL); set_bit(MD_RECOVERY_SYNC, &mddev->recovery); clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery); + clear_bit(MD_RECOVERY_LAZY_RECOVER, &mddev->recovery); return true; } =20 @@ -9808,7 +9851,7 @@ static bool md_choose_sync_action(struct mddev *mddev= , int *spares) * re-add. */ *spares =3D remove_and_add_spares(mddev, NULL); - if (*spares) { + if (*spares || test_bit(MD_RECOVERY_LAZY_RECOVER, &mddev->recovery)) { clear_bit(MD_RECOVERY_SYNC, &mddev->recovery); clear_bit(MD_RECOVERY_CHECK, &mddev->recovery); clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery); @@ -10021,6 +10064,7 @@ void md_check_recovery(struct mddev *mddev) } =20 clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery); + clear_bit(MD_RECOVERY_LAZY_RECOVER, &mddev->recovery); clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery); clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags); =20 @@ -10131,6 +10175,7 @@ void md_reap_sync_thread(struct mddev *mddev) clear_bit(MD_RECOVERY_RESHAPE, &mddev->recovery); clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery); clear_bit(MD_RECOVERY_CHECK, &mddev->recovery); + clear_bit(MD_RECOVERY_LAZY_RECOVER, &mddev->recovery); /* * We call mddev->cluster_ops->update_size here because sync_size could * be changed by md_update_sb, and MD_RECOVERY_RESHAPE is cleared, diff --git a/drivers/md/md.h b/drivers/md/md.h index 4fa5a3e68a0c..7b6357879a84 100644 --- a/drivers/md/md.h +++ b/drivers/md/md.h @@ -667,6 +667,8 @@ enum recovery_flags { MD_RECOVERY_RESHAPE, /* remote node is running resync thread */ MD_RESYNCING_REMOTE, + /* raid456 lazy initial recover */ + MD_RECOVERY_LAZY_RECOVER, }; =20 enum md_ro_state 
{ diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 672ab226e43c..5112658ef5f6 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -4705,10 +4705,21 @@ static void analyse_stripe(struct stripe_head *sh, = struct stripe_head_state *s) } } else if (test_bit(In_sync, &rdev->flags)) set_bit(R5_Insync, &dev->flags); - else if (sh->sector + RAID5_STRIPE_SECTORS(conf) <=3D rdev->recovery_off= set) - /* in sync if before recovery_offset */ - set_bit(R5_Insync, &dev->flags); - else if (test_bit(R5_UPTODATE, &dev->flags) && + else if (sh->sector + RAID5_STRIPE_SECTORS(conf) <=3D + rdev->recovery_offset) { + /* + * in sync if: + * - normal IO, or + * - resync IO that is not lazy recovery + * + * For lazy recovery, we have to mark the rdev without + * In_sync as failed, to build initial xor data. + */ + if (!test_bit(STRIPE_SYNCING, &sh->state) || + !test_bit(MD_RECOVERY_LAZY_RECOVER, + &conf->mddev->recovery)) + set_bit(R5_Insync, &dev->flags); + } else if (test_bit(R5_UPTODATE, &dev->flags) && test_bit(R5_Expanded, &dev->flags)) /* If we've reshaped into here, we assume it is Insync. 
* We will shortly update recovery_offset to make --=20 2.39.2 From nobody Fri Oct 3 14:29:32 2025 Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A52BA2BE636; Fri, 29 Aug 2025 08:13:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756455201; cv=none; b=cyesx7sNIwMG05Bb0xi5bPg7BFAEhwvR6hj0xOLJGC1Abk/P1YpPmd4K+KKEjl0HiD3ME/ruWfpIGWxhZKb6BFQqjt3KJrSeGMeY3Zal0jWn3cg8iefLPzKGDGh45j7dn6uBhrPqBrTYfVNM0FtCrDrw5tYHtQwF2QK5ISLoBVs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756455201; c=relaxed/simple; bh=dpUl4hCBK7//dMfJfKDAztTA1OHzY5NA+c4xSwXqoTk=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=jvHJZbQMEef0WoNRrxDZO6sWOQOVbhfB08U47hAF/N4fXqEu+KdsMX1gdg/i9D5oEeG4r9bcSU62gmHduuqMyyMM6ypFsX2aMILcOWo1zXGGXWH+DrnJe6O8nIj6UBeoty1KMk6K420NkF27oBjuUp2QD3yNAG/StUwp9g+1410= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.216]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTPS id 4cCrcZ0r2RzKHNL8; Fri, 29 Aug 2025 16:13:18 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id CFB6E1A1E56; Fri, 29 Aug 2025 16:13:17 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP4 (Coremail) with SMTP id gCh0CgB3wY0RYbFohAO2Ag--.45648S14; 
Fri, 29 Aug 2025 16:13:17 +0800 (CST) From: Yu Kuai To: hch@infradead.org, xni@redhat.com, colyli@kernel.org, linan122@huawei.com, corbet@lwn.net, agk@redhat.com, snitzer@kernel.org, mpatocka@redhat.com, song@kernel.org, yukuai3@huawei.com, hare@suse.de Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, dm-devel@lists.linux.dev, linux-raid@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com, johnny.chenyi@huawei.com, hailan@yukuai.org.cn Subject: [PATCH v7 md-6.18 10/11] md/md-bitmap: make method bitmap_ops->daemon_work optional Date: Fri, 29 Aug 2025 16:04:25 +0800 Message-Id: <20250829080426.1441678-11-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20250829080426.1441678-1-yukuai1@huaweicloud.com> References: <20250829080426.1441678-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgB3wY0RYbFohAO2Ag--.45648S14 X-Coremail-Antispam: 1UD129KBjvJXoWrtFyUGryxCry7KF47Zw4xZwb_yoW8JrWrpa 9xWw15ZrWUAaya93W7XFykuFyF9ayktFWqyFWxAw13Was8Wrn8Gr4fKFyqyr98Cr1F9Fnx AF1FvryrJ3W8trJanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUmS14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCY1x0262kKe7AKxVW8ZVWrXwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkE bVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67 
AF67kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVW8JVW5JwCI 42IY6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF 4lIxAIcVC2z280aVAFwI0_Gr0_Cr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBI daVFxhVjvjDU0xZFpf9x0pRQJ5wUUUUU=
X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/
Content-Type: text/plain; charset="utf-8"

From: Yu Kuai

daemon_work() is called by the daemon thread. On the one hand, the daemon
thread has no strict wake-up time; on the other hand, too much work is put
on the daemon thread, such as handling sync IO, handling failed or special
normal IO, handling recovery, and so on. Hence the daemon thread may be too
busy to clear dirty bits in time.

Make bitmap_ops->daemon_work() optional; following patches will use a
separate async work to clear dirty bits for the new bitmap.

Signed-off-by: Yu Kuai
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: Li Nan
---
 drivers/md/md.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 199843356449..3a3a3fdecfbd 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9997,7 +9997,7 @@ static void unregister_sync_thread(struct mddev *mddev)
  */
 void md_check_recovery(struct mddev *mddev)
 {
-	if (md_bitmap_enabled(mddev, false))
+	if (md_bitmap_enabled(mddev, false) && mddev->bitmap_ops->daemon_work)
 		mddev->bitmap_ops->daemon_work(mddev);
 
 	if (signal_pending(current)) {
-- 
2.39.2

From nobody Fri Oct 3 14:29:32 2025
Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 934D02D29C7; Fri, 29 Aug 2025 08:13:24 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756455209; cv=none;
b=eZsP9nYaKqJ4YvI2VCv11gKhGHTw0a8GRJdoi7XUQJ9XBr1Bp2N++zpU3jc92Z6Ir/HbulgBwU12W0flS1dDZCUV7GUireIal9nJC6QvQsOuqGrq99tboqhDSDT/kSo9+FHpAsoh6Jg7ScYK3GCqRd1aYNIFSOQQcvheFdB2Tco= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756455209; c=relaxed/simple; bh=N5dqVElnRSwlx+6LjzVJEDD9VsBcdwkI8qNEXKdawkg=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=JoR8Z1uZmXrCPES5eCXCIMgMi69QaTxthmfBH3v+suDgYtjJ09PtdSTY4SKt7snQ6287QJdlG3hLF/SV5SB+7PQZsjt0x/jzuGHFdAA9SgachvjfFXAZ0JqLejpWDKrKGI7DFUcOCdsvuBDdER+rSmURQMHNvY2IEsqCxFz/Msg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.93.142]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTPS id 4cCrcc18k3zYQvg5; Fri, 29 Aug 2025 16:13:20 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.128]) by mail.maildlp.com (Postfix) with ESMTP id A236B1A1249; Fri, 29 Aug 2025 16:13:18 +0800 (CST) Received: from huaweicloud.com (unknown [10.175.104.67]) by APP4 (Coremail) with SMTP id gCh0CgB3wY0RYbFohAO2Ag--.45648S15; Fri, 29 Aug 2025 16:13:18 +0800 (CST) From: Yu Kuai To: hch@infradead.org, xni@redhat.com, colyli@kernel.org, linan122@huawei.com, corbet@lwn.net, agk@redhat.com, snitzer@kernel.org, mpatocka@redhat.com, song@kernel.org, yukuai3@huawei.com, hare@suse.de Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, dm-devel@lists.linux.dev, linux-raid@vger.kernel.org, yukuai1@huaweicloud.com, yi.zhang@huawei.com, yangerkun@huawei.com, johnny.chenyi@huawei.com, hailan@yukuai.org.cn Subject: [PATCH v7 md-6.18 11/11] md/md-llbitmap: introduce new lockless bitmap Date: Fri, 
29 Aug 2025 16:04:26 +0800 Message-Id: <20250829080426.1441678-12-yukuai1@huaweicloud.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20250829080426.1441678-1-yukuai1@huaweicloud.com> References: <20250829080426.1441678-1-yukuai1@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: gCh0CgB3wY0RYbFohAO2Ag--.45648S15 X-Coremail-Antispam: 1UD129KBjvAXoWDCry7GrW3Jw48Xw47uF1DGFg_yoW7AF4UWo WfuryUJw48Xrn8WrykAr1YkFy3Ww18Kwn0y34YkFn8WF4DX3W0v343GrW3Gr1DtrW5ur42 qF92qr4rXFs7GFWfn29KB7ZKAUJUUUU8529EdanIXcx71UUUUU7v73VFW2AGmfu7bjvjm3 AaLaJ3UjIYCTnIWjp_UUUOV7AC8VAFwI0_Wr0E3s1l1xkIjI8I6I8E6xAIw20EY4v20xva j40_Wr0E3s1l1IIY67AEw4v_Jr0_Jr4l82xGYIkIc2x26280x7IE14v26r126s0DM28Irc Ia0xkI8VCY1x0267AKxVW5JVCq3wA2ocxC64kIII0Yj41l84x0c7CEw4AK67xGY2AK021l 84ACjcxK6xIIjxv20xvE14v26w1j6s0DM28EF7xvwVC0I7IYx2IY6xkF7I0E14v26r4UJV WxJr1l84ACjcxK6I8E87Iv67AKxVW0oVCq3wA2z4x0Y4vEx4A2jsIEc7CjxVAFwI0_GcCE 3s1le2I262IYc4CY6c8Ij28IcVAaY2xG8wAqx4xG64xvF2IEw4CE5I8CrVC2j2WlYx0E2I x0cI8IcVAFwI0_Jr0_Jr4lYx0Ex4A2jsIE14v26r1j6r4UMcvjeVCFs4IE7xkEbVWUJVW8 JwACjcxG0xvY0x0EwIxGrwACjI8F5VA0II8E6IAqYI8I648v4I1lFIxGxcIEc7CjxVA2Y2 ka0xkIwI1lc7CjxVAaw2AFwI0_GFv_Wryl42xK82IYc2Ij64vIr41l4I8I3I0E4IkC6x0Y z7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s026x8GjcxK67AKxVWUGVWUWwC2zV AF1VAY17CE14v26r4a6rW5MIIYrxkI7VAKI48JMIIF0xvE2Ix0cI8IcVAFwI0_Gr0_Xr1l IxAIcVC0I7IYx2IY6xkF7I0E14v26r4UJVWxJr1lIxAIcVCF04k26cxKx2IYs7xG6r1j6r 1xMIIF0xvEx4A2jsIE14v26r4j6F4UMIIF0xvEx4A2jsIEc7CjxVAFwI0_Gr1j6F4UJbIY CTnIWIevJa73UjIFyTuYvjTRNdb1DUUUU X-CM-SenderInfo: 51xn3trlr6x35dzhxuhorxvhhfrp/ Content-Type: text/plain; charset="utf-8" From: Yu Kuai Redundant data is used to enhance data fault tolerance, and the storage method for redundant data vary depending on the RAID levels. And it's important to maintain the consistency of redundant data. 
Bitmap is used to record which data blocks have been synchronized and which
ones need to be resynchronized or recovered. Each bit in the bitmap
represents a segment of data in the array. When a bit is set, it indicates
that the multiple redundant copies of that data segment may not be
consistent. Data synchronization can be performed based on the bitmap after
power failure or readding a disk. If there is no bitmap, a full disk
synchronization is required.

Due to known performance issues with md-bitmap and the unreasonable
implementations:

 - self-managed IO submitting like filemap_write_page();
 - global spin_lock

I have decided not to continue optimizing based on the current bitmap
implementation. This new bitmap is designed without locking in the IO fast
path and can be used with fast disks.

For designs and details, see the comments in drivers/md/md-llbitmap.c.

Signed-off-by: Yu Kuai
---
 Documentation/admin-guide/md.rst |   20 +
 drivers/md/Kconfig               |   11 +
 drivers/md/Makefile              |    1 +
 drivers/md/md-bitmap.c           |    9 -
 drivers/md/md-bitmap.h           |   31 +-
 drivers/md/md-llbitmap.c         | 1625 ++++++++++++++++++++++++++++++
 drivers/md/md.c                  |    6 +
 drivers/md/md.h                  |    4 +-
 8 files changed, 1695 insertions(+), 12 deletions(-)
 create mode 100644 drivers/md/md-llbitmap.c

diff --git a/Documentation/admin-guide/md.rst b/Documentation/admin-guide/md.rst
index 001363f81850..1c2eacc94758 100644
--- a/Documentation/admin-guide/md.rst
+++ b/Documentation/admin-guide/md.rst
@@ -387,6 +387,8 @@ All md devices contain:
          No bitmap
      bitmap
          The default internal bitmap
+     llbitmap
+         The lockless internal bitmap
 
      If bitmap_type is not none, then additional bitmap attributes bitmap/xxx or
      llbitmap/xxx will be created after md device KOBJ_CHANGE event.
@@ -447,6 +449,24 @@ If bitmap_type is bitmap, then the md device will also contain:
      once the array becomes non-degraded, and this fact has been
      recorded in the metadata.
 
+If bitmap_type is llbitmap, then the md device will also contain:
+
+  llbitmap/bits
+     This is read-only. It shows the status of the bitmap bits as the number
+     of bits in each state.
+
+  llbitmap/metadata
+     This is read-only. It shows the bitmap metadata, including chunksize,
+     chunkshift, chunks, offset and daemon_sleep.
+
+  llbitmap/daemon_sleep
+     This is read-write: the interval in seconds at which the daemon function
+     is triggered to clear dirty bits.
+
+  llbitmap/barrier_idle
+     This is read-write: the time in seconds after which the dirty bits in a
+     page are cleared if the page has not been accessed.
+
 As component devices are added to an md array, they appear in the ``md``
 directory as new directories named::
 
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index f913579e731c..07c19b2182ca 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -52,6 +52,17 @@ config MD_BITMAP
 
 	  If unsure, say Y.
 
+config MD_LLBITMAP
+	bool "MD RAID lockless bitmap support"
+	depends on BLK_DEV_MD
+	help
+	  If you say Y here, support for the lockless write intent bitmap will
+	  be enabled.
+
+	  Note, this is an experimental feature.
+
+	  If unsure, say N.
+ config MD_AUTODETECT bool "Autodetect RAID arrays during kernel boot" depends on BLK_DEV_MD=3Dy diff --git a/drivers/md/Makefile b/drivers/md/Makefile index 2e18147a9c40..5a51b3408b70 100644 --- a/drivers/md/Makefile +++ b/drivers/md/Makefile @@ -29,6 +29,7 @@ dm-zoned-y +=3D dm-zoned-target.o dm-zoned-metadata.o dm-= zoned-reclaim.o =20 md-mod-y +=3D md.o md-mod-$(CONFIG_MD_BITMAP) +=3D md-bitmap.o +md-mod-$(CONFIG_MD_LLBITMAP) +=3D md-llbitmap.o raid456-y +=3D raid5.o raid5-cache.o raid5-ppl.o linear-y +=3D md-linear.o =20 diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c index dc050ff94d5b..84b7e2af6dba 100644 --- a/drivers/md/md-bitmap.c +++ b/drivers/md/md-bitmap.c @@ -34,15 +34,6 @@ #include "md-bitmap.h" #include "md-cluster.h" =20 -#define BITMAP_MAJOR_LO 3 -/* version 4 insists the bitmap is in little-endian order - * with version 3, it is host-endian which is non-portable - * Version 5 is currently set only for clustered devices - */ -#define BITMAP_MAJOR_HI 4 -#define BITMAP_MAJOR_CLUSTERED 5 -#define BITMAP_MAJOR_HOSTENDIAN 3 - /* * in-memory bitmap: * diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h index 5f41724cbcd8..b42a28fa83a0 100644 --- a/drivers/md/md-bitmap.h +++ b/drivers/md/md-bitmap.h @@ -9,10 +9,26 @@ =20 #define BITMAP_MAGIC 0x6d746962 =20 +/* + * version 3 is host-endian order, this is deprecated and not used for new + * array + */ +#define BITMAP_MAJOR_LO 3 +#define BITMAP_MAJOR_HOSTENDIAN 3 +/* version 4 is little-endian order, the default value */ +#define BITMAP_MAJOR_HI 4 +/* version 5 is only used for cluster */ +#define BITMAP_MAJOR_CLUSTERED 5 +/* version 6 is only used for lockless bitmap */ +#define BITMAP_MAJOR_LOCKLESS 6 + /* use these for bitmap->flags and bitmap->sb->state bit-fields */ enum bitmap_state { - BITMAP_STALE =3D 1, /* the bitmap file is out of date or had -EIO */ + BITMAP_STALE =3D 1, /* the bitmap file is out of date or had -EIO */ BITMAP_WRITE_ERROR =3D 2, /* A write error has occurred 
*/ + BITMAP_FIRST_USE =3D 3, /* llbitmap is just created */ + BITMAP_CLEAN =3D 4, /* llbitmap is created with assume_clean */ + BITMAP_DAEMON_BUSY =3D 5, /* llbitmap daemon is not finished after daemon= _sleep */ BITMAP_HOSTENDIAN =3D15, }; =20 @@ -166,4 +182,17 @@ static inline void md_bitmap_exit(void) } #endif =20 +#ifdef CONFIG_MD_LLBITMAP +int md_llbitmap_init(void); +void md_llbitmap_exit(void); +#else +static inline int md_llbitmap_init(void) +{ + return 0; +} +static inline void md_llbitmap_exit(void) +{ +} +#endif + #endif diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c new file mode 100644 index 000000000000..6da3d99cdbdd --- /dev/null +++ b/drivers/md/md-llbitmap.c @@ -0,0 +1,1625 @@ +// SPDX-License-Identifier: GPL-2.0-or-later + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "md.h" +#include "md-bitmap.h" + +/* + * #### Background + * + * Redundant data is used to enhance data fault tolerance, and the storage + * methods for redundant data vary depending on the RAID levels. And it's + * important to maintain the consistency of redundant data. + * + * Bitmap is used to record which data blocks have been synchronized and w= hich + * ones need to be resynchronized or recovered. Each bit in the bitmap + * represents a segment of data in the array. When a bit is set, it indica= tes + * that the multiple redundant copies of that data segment may not be + * consistent. Data synchronization can be performed based on the bitmap a= fter + * power failure or readding a disk. If there is no bitmap, a full disk + * synchronization is required. 
+ *
+ * #### Key Features
+ *
+ *  - IO fastpath is lockless: if the user issues lots of write IO to the same
+ *  bitmap bit in a short time, only the first write has the additional overhead
+ *  of updating the bitmap bit; there is no additional overhead for later writes;
+ *  - support resyncing or recovering only written data, meaning that when
+ *  creating a new array or replacing a disk with a new one, there is no need to
+ *  do a full disk resync/recovery;
+ *
+ * #### Key Concept
+ *
+ * ##### State Machine
+ *
+ * Each bit is one byte and holds one of 6 different states, see llbitmap_state.
+ * 8 different actions in total can change the state, see llbitmap_action:
+ *
+ * llbitmap state machine: transitions between states
+ *
+ * |           | Startwrite | Startsync | Endsync | Abortsync |
+ * | --------- | ---------- | --------- | ------- | --------- |
+ * | Unwritten | Dirty      | x         | x       | x         |
+ * | Clean     | Dirty      | x         | x       | x         |
+ * | Dirty     | x          | x         | x       | x         |
+ * | NeedSync  | x          | Syncing   | x       | x         |
+ * | Syncing   | x          | Syncing   | Dirty   | NeedSync  |
+ *
+ * |           | Reload   | Daemon | Discard   | Stale     |
+ * | --------- | -------- | ------ | --------- | --------- |
+ * | Unwritten | x        | x      | x         | x         |
+ * | Clean     | x        | x      | Unwritten | NeedSync  |
+ * | Dirty     | NeedSync | Clean  | Unwritten | NeedSync  |
+ * | NeedSync  | x        | x      | Unwritten | x         |
+ * | Syncing   | NeedSync | x      | Unwritten | NeedSync  |
+ *
+ * Typical scenarios:
+ *
+ * 1) Create new array
+ * All bits will be set to Unwritten by default; if --assume-clean is set,
+ * all bits will be set to Clean instead.
+ *
+ * 2) write data, raid1/raid10 have a full copy of the data, while raid456
+ * doesn't and relies on xor data
+ *
+ * 2.1) write new data to raid1/raid10:
+ * Unwritten --Startwrite--> Dirty
+ *
+ * 2.2) write new data to raid456:
+ * Unwritten --Startwrite--> NeedSync
+ *
+ * Because the initial recover for raid456 is skipped, the xor data is not built
+ * yet. The bit must be set to NeedSync first, and after the lazy initial recover
+ * is finished, the bit will finally be set to Dirty (see 5.1 and 5.4);
+ *
+ * 2.3) overwrite
+ * Clean --Startwrite--> Dirty
+ *
+ * 3) daemon, if the array is not degraded:
+ * Dirty --Daemon--> Clean
+ *
+ * 4) discard
+ * {Clean, Dirty, NeedSync, Syncing} --Discard--> Unwritten
+ *
+ * 5) resync and recover
+ *
+ * 5.1) common process
+ * NeedSync --Startsync--> Syncing --Endsync--> Dirty --Daemon--> Clean
+ *
+ * 5.2) resync after power failure
+ * Dirty --Reload--> NeedSync
+ *
+ * 5.3) recover while replacing with a new disk
+ * By default, the old bitmap framework will recover all data, and llbitmap
+ * handles this with a new helper, see llbitmap_skip_sync_blocks:
+ *
+ * skip recovery for bits other than Dirty or Clean;
+ *
+ * 5.4) lazy initial recover for raid5:
+ * By default, the old bitmap framework only allows a new recovery when there
+ * are spares (new disks). A new recovery flag, MD_RECOVERY_LAZY_RECOVER, is
+ * added to perform raid456 lazy recovery for set bits (from 2.2).
+ *
+ * 6. special handling for degraded array:
+ *
+ * - Dirty bits will never be cleared, the daemon will just do nothing, so that
+ *   if a disk is readded, Clean bits can be skipped during recovery;
+ * - Dirty bits will be converted to Syncing on Startwrite, to do data recovery
+ *   for newly added disks;
+ * - New writes will convert bits to NeedSync directly;
+ *
+ * ##### Bitmap IO
+ *
+ * ##### Chunksize
+ *
+ * The default bitmap size is 128k, including the 1k bitmap super block, and
+ * the default size of the data segment each bit represents (chunksize) is 64k.
+ * The chunksize is doubled repeatedly while the total number of bits is not
+ * less than 127k (see llbitmap_init).
+ *
+ * ##### READ
+ *
+ * While creating the bitmap, all pages will be allocated and read for llbitmap;
+ * there won't be any reads afterwards.
+ *
+ * ##### WRITE
+ *
+ * WRITE IO is divided into blocks of the array's logical_block_size, and the
+ * dirty state of each block is tracked independently, for example:
+ *
+ * each page is 4k and contains 8 blocks; each block is 512 bytes with 512 bits;
+ *
+ * | page0 | page1 | ... | page31 |
+ * |       |
+ * |        \-----------------------\
+ * |                                |
+ * | block0 | block1 | ... | block7 |
+ * |        |
+ * |         \-----------------\
+ * |                           |
+ * | bit0 | bit1 | ... | bit511 |
+ *
+ * From the IO path, if one bit is changed to Dirty or NeedSync, the
+ * corresponding subpage will be marked dirty, and that block must be written
+ * before the IO is issued. This behaviour affects IO performance. To reduce the
+ * impact, if multiple bits are changed in the same block in a short time, all
+ * bits in this block will be changed to Dirty/NeedSync, so that there won't be
+ * any overhead until the daemon clears dirty bits.
+ *
+ * ##### Dirty Bits synchronization
+ *
+ * The IO fast path sets bits to dirty, and those dirty bits will be cleared by
+ * the daemon after IO is done. llbitmap_page_ctl is used to synchronize between
+ * the IO path and the daemon;
+ *
+ * IO path:
+ *  1) try to grab a reference; on success, set the expire time 5s ahead and
+ *  return; 2) on failure to grab a reference, wait for the daemon to finish
+ *  clearing dirty bits;
+ *
+ * Daemon (Daemon will be woken up every daemon_sleep seconds):
+ * For each page:
+ *  1) check if the page has expired; if not, skip it; for an expired page:
+ *  2) suspend the page and wait for inflight write IO to be done;
+ *  3) change dirty page to clean;
+ *  4) resume the page;
+ */
+
+#define BITMAP_DATA_OFFSET 1024
+
+/* 64k is the max IO size of sync IO for raid1/raid10 */
+#define MIN_CHUNK_SIZE (64 * 2)
+
+/* By default, daemon will be woken up every 30s */
+#define DEFAULT_DAEMON_SLEEP 30
+
+/*
+ * Dirtied bits that have not been accessed for more than 5s will be cleared
+ * by daemon.
+ */
+#define DEFAULT_BARRIER_IDLE 5
+
+enum llbitmap_state {
+	/* No valid data, init state after assembling the array */
+	BitUnwritten = 0,
+	/* data is consistent */
+	BitClean,
+	/* data will be consistent after IO is done, set directly for writes */
+	BitDirty,
+	/*
+	 * data needs to be resynchronized:
+	 * 1) set directly for writes if array is degraded, prevent full disk
+	 * synchronization after readding a disk;
+	 * 2) reassemble the array after power failure, and dirty bits are
+	 * found after reloading the bitmap;
+	 * 3) set for first write for raid5, to build initial xor data lazily
+	 */
+	BitNeedSync,
+	/* data is synchronizing */
+	BitSyncing,
+	BitStateCount,
+	BitNone = 0xff,
+};
+
+enum llbitmap_action {
+	/* User writes new data, this is the only action from IO fast path */
+	BitmapActionStartwrite = 0,
+	/* Start recovery */
+	BitmapActionStartsync,
+	/* Finish recovery */
+	BitmapActionEndsync,
+	/* Failed recovery */
+	BitmapActionAbortsync,
+	/* Reassemble the array */
+	BitmapActionReload,
+	/* Daemon thread is trying to clear dirty bits */
+	BitmapActionDaemon,
+	/* Data is deleted */
+ BitmapActionDiscard, + /* + * Bitmap is stale, mark all bits in addition to BitUnwritten to + * BitNeedSync. + */ + BitmapActionStale, + BitmapActionCount, + /* Init state is BitUnwritten */ + BitmapActionInit, +}; + +enum llbitmap_page_state { + LLPageFlush =3D 0, + LLPageDirty, +}; + +struct llbitmap_page_ctl { + char *state; + struct page *page; + unsigned long expire; + unsigned long flags; + wait_queue_head_t wait; + struct percpu_ref active; + /* Per block size dirty state, maximum 64k page / 1 sector =3D 128 */ + unsigned long dirty[]; +}; + +struct llbitmap { + struct mddev *mddev; + struct llbitmap_page_ctl **pctl; + + unsigned int nr_pages; + unsigned int io_size; + unsigned int blocks_per_page; + + /* shift of one chunk */ + unsigned long chunkshift; + /* size of one chunk in sector */ + unsigned long chunksize; + /* total number of chunks */ + unsigned long chunks; + unsigned long last_end_sync; + /* + * time in seconds that dirty bits will be cleared if the page is not + * accessed. 
+ */ + unsigned long barrier_idle; + /* fires on first BitDirty state */ + struct timer_list pending_timer; + struct work_struct daemon_work; + + unsigned long flags; + __u64 events_cleared; + + /* for slow disks */ + atomic_t behind_writes; + wait_queue_head_t behind_wait; +}; + +struct llbitmap_unplug_work { + struct work_struct work; + struct llbitmap *llbitmap; + struct completion *done; +}; + +static struct workqueue_struct *md_llbitmap_io_wq; +static struct workqueue_struct *md_llbitmap_unplug_wq; + +static char state_machine[BitStateCount][BitmapActionCount] =3D { + [BitUnwritten] =3D { + [BitmapActionStartwrite] =3D BitDirty, + [BitmapActionStartsync] =3D BitNone, + [BitmapActionEndsync] =3D BitNone, + [BitmapActionAbortsync] =3D BitNone, + [BitmapActionReload] =3D BitNone, + [BitmapActionDaemon] =3D BitNone, + [BitmapActionDiscard] =3D BitNone, + [BitmapActionStale] =3D BitNone, + }, + [BitClean] =3D { + [BitmapActionStartwrite] =3D BitDirty, + [BitmapActionStartsync] =3D BitNone, + [BitmapActionEndsync] =3D BitNone, + [BitmapActionAbortsync] =3D BitNone, + [BitmapActionReload] =3D BitNone, + [BitmapActionDaemon] =3D BitNone, + [BitmapActionDiscard] =3D BitUnwritten, + [BitmapActionStale] =3D BitNeedSync, + }, + [BitDirty] =3D { + [BitmapActionStartwrite] =3D BitNone, + [BitmapActionStartsync] =3D BitNone, + [BitmapActionEndsync] =3D BitNone, + [BitmapActionAbortsync] =3D BitNone, + [BitmapActionReload] =3D BitNeedSync, + [BitmapActionDaemon] =3D BitClean, + [BitmapActionDiscard] =3D BitUnwritten, + [BitmapActionStale] =3D BitNeedSync, + }, + [BitNeedSync] =3D { + [BitmapActionStartwrite] =3D BitNone, + [BitmapActionStartsync] =3D BitSyncing, + [BitmapActionEndsync] =3D BitNone, + [BitmapActionAbortsync] =3D BitNone, + [BitmapActionReload] =3D BitNone, + [BitmapActionDaemon] =3D BitNone, + [BitmapActionDiscard] =3D BitUnwritten, + [BitmapActionStale] =3D BitNone, + }, + [BitSyncing] =3D { + [BitmapActionStartwrite] =3D BitNone, + [BitmapActionStartsync] 
=3D BitSyncing, + [BitmapActionEndsync] =3D BitDirty, + [BitmapActionAbortsync] =3D BitNeedSync, + [BitmapActionReload] =3D BitNeedSync, + [BitmapActionDaemon] =3D BitNone, + [BitmapActionDiscard] =3D BitUnwritten, + [BitmapActionStale] =3D BitNeedSync, + }, +}; + +static void __llbitmap_flush(struct mddev *mddev); + +static enum llbitmap_state llbitmap_read(struct llbitmap *llbitmap, loff_t= pos) +{ + unsigned int idx; + unsigned int offset; + + pos +=3D BITMAP_DATA_OFFSET; + idx =3D pos >> PAGE_SHIFT; + offset =3D offset_in_page(pos); + + return llbitmap->pctl[idx]->state[offset]; +} + +/* set all the bits in the subpage as dirty */ +static void llbitmap_infect_dirty_bits(struct llbitmap *llbitmap, + struct llbitmap_page_ctl *pctl, + unsigned int block) +{ + bool level_456 =3D raid_is_456(llbitmap->mddev); + unsigned int io_size =3D llbitmap->io_size; + int pos; + + for (pos =3D block * io_size; pos < (block + 1) * io_size; pos++) { + switch (pctl->state[pos]) { + case BitUnwritten: + pctl->state[pos] =3D level_456 ? BitNeedSync : BitDirty; + break; + case BitClean: + pctl->state[pos] =3D BitDirty; + break; + }; + } +} + +static void llbitmap_set_page_dirty(struct llbitmap *llbitmap, int idx, + int offset) +{ + struct llbitmap_page_ctl *pctl =3D llbitmap->pctl[idx]; + unsigned int io_size =3D llbitmap->io_size; + int block =3D offset / io_size; + int pos; + + if (!test_bit(LLPageDirty, &pctl->flags)) + set_bit(LLPageDirty, &pctl->flags); + + /* + * For degraded array, dirty bits will never be cleared, and we must + * resync all the dirty bits, hence skip infect new dirty bits to + * prevent resync unnecessary data. + */ + if (llbitmap->mddev->degraded) { + set_bit(block, pctl->dirty); + return; + } + + /* + * The subpage usually contains a total of 512 bits. If any single bit + * within the subpage is marked as dirty, the entire sector will be + * written. 
To avoid impacting write performance, when multiple bits + * within the same sector are modified within llbitmap->barrier_idle, + * all bits in the sector will be collectively marked as dirty at once. + */ + if (test_and_set_bit(block, pctl->dirty)) { + llbitmap_infect_dirty_bits(llbitmap, pctl, block); + return; + } + + for (pos =3D block * io_size; pos < (block + 1) * io_size; pos++) { + if (pos =3D=3D offset) + continue; + if (pctl->state[pos] =3D=3D BitDirty || + pctl->state[pos] =3D=3D BitNeedSync) { + llbitmap_infect_dirty_bits(llbitmap, pctl, block); + return; + } + } +} + +static void llbitmap_write(struct llbitmap *llbitmap, enum llbitmap_state = state, + loff_t pos) +{ + unsigned int idx; + unsigned int bit; + + pos +=3D BITMAP_DATA_OFFSET; + idx =3D pos >> PAGE_SHIFT; + bit =3D offset_in_page(pos); + + llbitmap->pctl[idx]->state[bit] =3D state; + if (state =3D=3D BitDirty || state =3D=3D BitNeedSync) + llbitmap_set_page_dirty(llbitmap, idx, bit); +} + +static struct page *llbitmap_read_page(struct llbitmap *llbitmap, int idx) +{ + struct mddev *mddev =3D llbitmap->mddev; + struct page *page =3D NULL; + struct md_rdev *rdev; + + if (llbitmap->pctl && llbitmap->pctl[idx]) + page =3D llbitmap->pctl[idx]->page; + if (page) + return page; + + page =3D alloc_page(GFP_KERNEL | __GFP_ZERO); + if (!page) + return ERR_PTR(-ENOMEM); + + rdev_for_each(rdev, mddev) { + sector_t sector; + + if (rdev->raid_disk < 0 || test_bit(Faulty, &rdev->flags)) + continue; + + sector =3D mddev->bitmap_info.offset + + (idx << PAGE_SECTORS_SHIFT); + + if (sync_page_io(rdev, sector, PAGE_SIZE, page, REQ_OP_READ, + true)) + return page; + + md_error(mddev, rdev); + } + + __free_page(page); + return ERR_PTR(-EIO); +} + +static void llbitmap_write_page(struct llbitmap *llbitmap, int idx) +{ + struct page *page =3D llbitmap->pctl[idx]->page; + struct mddev *mddev =3D llbitmap->mddev; + struct md_rdev *rdev; + int block; + + for (block =3D 0; block < llbitmap->blocks_per_page; block++) { 
+ struct llbitmap_page_ctl *pctl =3D llbitmap->pctl[idx]; + + if (!test_and_clear_bit(block, pctl->dirty)) + continue; + + rdev_for_each(rdev, mddev) { + sector_t sector; + sector_t bit_sector =3D llbitmap->io_size >> SECTOR_SHIFT; + + if (rdev->raid_disk < 0 || test_bit(Faulty, &rdev->flags)) + continue; + + sector =3D mddev->bitmap_info.offset + rdev->sb_start + + (idx << PAGE_SECTORS_SHIFT) + + block * bit_sector; + md_write_metadata(mddev, rdev, sector, + llbitmap->io_size, page, + block * llbitmap->io_size); + } + } +} + +static void active_release(struct percpu_ref *ref) +{ + struct llbitmap_page_ctl *pctl =3D + container_of(ref, struct llbitmap_page_ctl, active); + + wake_up(&pctl->wait); +} + +static void llbitmap_free_pages(struct llbitmap *llbitmap) +{ + int i; + + if (!llbitmap->pctl) + return; + + for (i =3D 0; i < llbitmap->nr_pages; i++) { + struct llbitmap_page_ctl *pctl =3D llbitmap->pctl[i]; + + if (!pctl || !pctl->page) + break; + + __free_page(pctl->page); + percpu_ref_exit(&pctl->active); + } + + kfree(llbitmap->pctl[0]); + kfree(llbitmap->pctl); + llbitmap->pctl =3D NULL; +} + +static int llbitmap_cache_pages(struct llbitmap *llbitmap) +{ + struct llbitmap_page_ctl *pctl; + unsigned int nr_pages =3D DIV_ROUND_UP(llbitmap->chunks + + BITMAP_DATA_OFFSET, PAGE_SIZE); + unsigned int size =3D struct_size(pctl, dirty, BITS_TO_LONGS( + llbitmap->blocks_per_page)); + int i; + + llbitmap->pctl =3D kmalloc_array(nr_pages, sizeof(void *), + GFP_KERNEL | __GFP_ZERO); + if (!llbitmap->pctl) + return -ENOMEM; + + size =3D round_up(size, cache_line_size()); + pctl =3D kmalloc_array(nr_pages, size, GFP_KERNEL | __GFP_ZERO); + if (!pctl) { + kfree(llbitmap->pctl); + return -ENOMEM; + } + + llbitmap->nr_pages =3D nr_pages; + + for (i =3D 0; i < nr_pages; i++, pctl =3D (void *)pctl + size) { + struct page *page =3D llbitmap_read_page(llbitmap, i); + + llbitmap->pctl[i] =3D pctl; + + if (IS_ERR(page)) { + llbitmap_free_pages(llbitmap); + return PTR_ERR(page); + } 
+ + if (percpu_ref_init(&pctl->active, active_release, + PERCPU_REF_ALLOW_REINIT, GFP_KERNEL)) { + __free_page(page); + llbitmap_free_pages(llbitmap); + return -ENOMEM; + } + + pctl->page =3D page; + pctl->state =3D page_address(page); + init_waitqueue_head(&pctl->wait); + } + + return 0; +} + +static void llbitmap_init_state(struct llbitmap *llbitmap) +{ + enum llbitmap_state state =3D BitUnwritten; + unsigned long i; + + if (test_and_clear_bit(BITMAP_CLEAN, &llbitmap->flags)) + state =3D BitClean; + + for (i =3D 0; i < llbitmap->chunks; i++) + llbitmap_write(llbitmap, state, i); +} + +/* The return value is only used from resync, where @start =3D=3D @end. */ +static enum llbitmap_state llbitmap_state_machine(struct llbitmap *llbitma= p, + unsigned long start, + unsigned long end, + enum llbitmap_action action) +{ + struct mddev *mddev =3D llbitmap->mddev; + enum llbitmap_state state =3D BitNone; + bool level_456 =3D raid_is_456(llbitmap->mddev); + bool need_resync =3D false; + bool need_recovery =3D false; + + if (test_bit(BITMAP_WRITE_ERROR, &llbitmap->flags)) + return BitNone; + + if (action =3D=3D BitmapActionInit) { + llbitmap_init_state(llbitmap); + return BitNone; + } + + while (start <=3D end) { + enum llbitmap_state c =3D llbitmap_read(llbitmap, start); + + if (c < 0 || c >=3D BitStateCount) { + pr_err("%s: invalid bit %lu state %d action %d, forcing resync\n", + __func__, start, c, action); + state =3D BitNeedSync; + goto write_bitmap; + } + + if (c =3D=3D BitNeedSync) + need_resync =3D !mddev->degraded; + + state =3D state_machine[c][action]; + if (state =3D=3D BitNone) { + start++; + continue; + } + +write_bitmap: + if (unlikely(mddev->degraded)) { + /* For degraded array, mark new data as need sync. 
*/ + if (state =3D=3D BitDirty && + action =3D=3D BitmapActionStartwrite) + state =3D BitNeedSync; + /* + * For degraded array, resync not unwritten data, noted + * if array is still degraded after resync is done, all + * new data will still be dirty until array is clean. + */ + else if (state !=3D BitUnwritten && + action =3D=3D BitmapActionStartsync) + state =3D BitSyncing; + } else if (c =3D=3D BitUnwritten && state =3D=3D BitDirty && + action =3D=3D BitmapActionStartwrite && level_456) { + /* Delay raid456 initial recovery to first write. */ + state =3D BitNeedSync; + } + + llbitmap_write(llbitmap, state, start); + + if (state =3D=3D BitNeedSync) + need_resync =3D !mddev->degraded; + else if (state =3D=3D BitDirty && + !timer_pending(&llbitmap->pending_timer)) + mod_timer(&llbitmap->pending_timer, + jiffies + mddev->bitmap_info.daemon_sleep * HZ); + + start++; + } + + if (need_resync && level_456) + need_recovery =3D true; + + if (need_recovery) { + set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); + set_bit(MD_RECOVERY_LAZY_RECOVER, &mddev->recovery); + md_wakeup_thread(mddev->thread); + } else if (need_resync) { + set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); + set_bit(MD_RECOVERY_SYNC, &mddev->recovery); + md_wakeup_thread(mddev->thread); + } + + return state; +} + +static void llbitmap_raise_barrier(struct llbitmap *llbitmap, int page_idx) +{ + struct llbitmap_page_ctl *pctl =3D llbitmap->pctl[page_idx]; + +retry: + if (likely(percpu_ref_tryget_live(&pctl->active))) { + WRITE_ONCE(pctl->expire, jiffies + llbitmap->barrier_idle * HZ); + return; + } + + wait_event(pctl->wait, !percpu_ref_is_dying(&pctl->active)); + goto retry; +} + +static void llbitmap_release_barrier(struct llbitmap *llbitmap, int page_i= dx) +{ + struct llbitmap_page_ctl *pctl =3D llbitmap->pctl[page_idx]; + + percpu_ref_put(&pctl->active); +} + +static int llbitmap_suspend_timeout(struct llbitmap *llbitmap, int page_id= x) +{ + struct llbitmap_page_ctl *pctl =3D llbitmap->pctl[page_idx]; + 
+ percpu_ref_kill(&pctl->active); + + if (!wait_event_timeout(pctl->wait, percpu_ref_is_zero(&pctl->active), + llbitmap->mddev->bitmap_info.daemon_sleep * HZ)) + return -ETIMEDOUT; + + return 0; +} + +static void llbitmap_resume(struct llbitmap *llbitmap, int page_idx) +{ + struct llbitmap_page_ctl *pctl =3D llbitmap->pctl[page_idx]; + + pctl->expire =3D LONG_MAX; + percpu_ref_resurrect(&pctl->active); + wake_up(&pctl->wait); +} + +static int llbitmap_check_support(struct mddev *mddev) +{ + if (test_bit(MD_HAS_JOURNAL, &mddev->flags)) { + pr_notice("md/llbitmap: %s: array with journal cannot have bitmap\n", + mdname(mddev)); + return -EBUSY; + } + + if (mddev->bitmap_info.space =3D=3D 0) { + if (mddev->bitmap_info.default_space =3D=3D 0) { + pr_notice("md/llbitmap: %s: no space for bitmap\n", + mdname(mddev)); + return -ENOSPC; + } + } + + if (!mddev->persistent) { + pr_notice("md/llbitmap: %s: array must be persistent\n", + mdname(mddev)); + return -EOPNOTSUPP; + } + + if (mddev->bitmap_info.file) { + pr_notice("md/llbitmap: %s: doesn't support bitmap file\n", + mdname(mddev)); + return -EOPNOTSUPP; + } + + if (mddev->bitmap_info.external) { + pr_notice("md/llbitmap: %s: doesn't support external metadata\n", + mdname(mddev)); + return -EOPNOTSUPP; + } + + if (mddev_is_dm(mddev)) { + pr_notice("md/llbitmap: %s: doesn't support dm-raid\n", + mdname(mddev)); + return -EOPNOTSUPP; + } + + return 0; +} + +static int llbitmap_init(struct llbitmap *llbitmap) +{ + struct mddev *mddev =3D llbitmap->mddev; + sector_t blocks =3D mddev->resync_max_sectors; + unsigned long chunksize =3D MIN_CHUNK_SIZE; + unsigned long chunks =3D DIV_ROUND_UP(blocks, chunksize); + unsigned long space =3D mddev->bitmap_info.space << SECTOR_SHIFT; + int ret; + + while (chunks > space) { + chunksize =3D chunksize << 1; + chunks =3D DIV_ROUND_UP(blocks, chunksize); + } + + llbitmap->barrier_idle =3D DEFAULT_BARRIER_IDLE; + llbitmap->chunkshift =3D ffz(~chunksize); + llbitmap->chunksize =3D 
chunksize; + llbitmap->chunks =3D chunks; + mddev->bitmap_info.daemon_sleep =3D DEFAULT_DAEMON_SLEEP; + + ret =3D llbitmap_cache_pages(llbitmap); + if (ret) + return ret; + + llbitmap_state_machine(llbitmap, 0, llbitmap->chunks - 1, + BitmapActionInit); + /* flush initial llbitmap to disk */ + __llbitmap_flush(mddev); + + return 0; +} + +static int llbitmap_read_sb(struct llbitmap *llbitmap) +{ + struct mddev *mddev =3D llbitmap->mddev; + unsigned long daemon_sleep; + unsigned long chunksize; + unsigned long events; + struct page *sb_page; + bitmap_super_t *sb; + int ret =3D -EINVAL; + + if (!mddev->bitmap_info.offset) { + pr_err("md/llbitmap: %s: no super block found", mdname(mddev)); + return -EINVAL; + } + + sb_page =3D llbitmap_read_page(llbitmap, 0); + if (IS_ERR(sb_page)) { + pr_err("md/llbitmap: %s: read super block failed", + mdname(mddev)); + return -EIO; + } + + sb =3D kmap_local_page(sb_page); + if (sb->magic !=3D cpu_to_le32(BITMAP_MAGIC)) { + pr_err("md/llbitmap: %s: invalid super block magic number", + mdname(mddev)); + goto out_put_page; + } + + if (sb->version !=3D cpu_to_le32(BITMAP_MAJOR_LOCKLESS)) { + pr_err("md/llbitmap: %s: invalid super block version", + mdname(mddev)); + goto out_put_page; + } + + if (memcmp(sb->uuid, mddev->uuid, 16)) { + pr_err("md/llbitmap: %s: bitmap superblock UUID mismatch\n", + mdname(mddev)); + goto out_put_page; + } + + if (mddev->bitmap_info.space =3D=3D 0) { + int room =3D le32_to_cpu(sb->sectors_reserved); + + if (room) + mddev->bitmap_info.space =3D room; + else + mddev->bitmap_info.space =3D mddev->bitmap_info.default_space; + } + llbitmap->flags =3D le32_to_cpu(sb->state); + if (test_and_clear_bit(BITMAP_FIRST_USE, &llbitmap->flags)) { + ret =3D llbitmap_init(llbitmap); + goto out_put_page; + } + + chunksize =3D le32_to_cpu(sb->chunksize); + if (!is_power_of_2(chunksize)) { + pr_err("md/llbitmap: %s: chunksize not a power of 2", + mdname(mddev)); + goto out_put_page; + } + + if (chunksize < 
DIV_ROUND_UP(mddev->resync_max_sectors, + mddev->bitmap_info.space << SECTOR_SHIFT)) { + pr_err("md/llbitmap: %s: chunksize too small %lu < %llu / %lu", + mdname(mddev), chunksize, mddev->resync_max_sectors, + mddev->bitmap_info.space); + goto out_put_page; + } + + daemon_sleep =3D le32_to_cpu(sb->daemon_sleep); + if (daemon_sleep < 1 || daemon_sleep > MAX_SCHEDULE_TIMEOUT / HZ) { + pr_err("md/llbitmap: %s: daemon sleep %lu period out of range", + mdname(mddev), daemon_sleep); + goto out_put_page; + } + + events =3D le64_to_cpu(sb->events); + if (events < mddev->events) { + pr_warn("md/llbitmap: %s: bitmap file is out of date (%lu < %llu) -- for= cing full recovery", + mdname(mddev), events, mddev->events); + set_bit(BITMAP_STALE, &llbitmap->flags); + } + + sb->sync_size =3D cpu_to_le64(mddev->resync_max_sectors); + mddev->bitmap_info.chunksize =3D chunksize; + mddev->bitmap_info.daemon_sleep =3D daemon_sleep; + + llbitmap->barrier_idle =3D DEFAULT_BARRIER_IDLE; + llbitmap->chunksize =3D chunksize; + llbitmap->chunks =3D DIV_ROUND_UP(mddev->resync_max_sectors, chunksize); + llbitmap->chunkshift =3D ffz(~chunksize); + ret =3D llbitmap_cache_pages(llbitmap); + +out_put_page: + __free_page(sb_page); + kunmap_local(sb); + return ret; +} + +static void llbitmap_pending_timer_fn(struct timer_list *pending_timer) +{ + struct llbitmap *llbitmap =3D + container_of(pending_timer, struct llbitmap, pending_timer); + + if (work_busy(&llbitmap->daemon_work)) { + pr_warn("md/llbitmap: %s daemon_work not finished in %lu seconds\n", + mdname(llbitmap->mddev), + llbitmap->mddev->bitmap_info.daemon_sleep); + set_bit(BITMAP_DAEMON_BUSY, &llbitmap->flags); + return; + } + + queue_work(md_llbitmap_io_wq, &llbitmap->daemon_work); +} + +static void md_llbitmap_daemon_fn(struct work_struct *work) +{ + struct llbitmap *llbitmap =3D + container_of(work, struct llbitmap, daemon_work); + unsigned long start; + unsigned long end; + bool restart; + int idx; + + if (llbitmap->mddev->degraded) +
return; +retry: + start =3D 0; + end =3D min(llbitmap->chunks, PAGE_SIZE - BITMAP_DATA_OFFSET) - 1; + restart =3D false; + + for (idx =3D 0; idx < llbitmap->nr_pages; idx++) { + struct llbitmap_page_ctl *pctl =3D llbitmap->pctl[idx]; + + if (idx > 0) { + start =3D end + 1; + end =3D min(end + PAGE_SIZE, llbitmap->chunks - 1); + } + + if (!test_bit(LLPageFlush, &pctl->flags) && + time_before(jiffies, pctl->expire)) { + restart =3D true; + continue; + } + + if (llbitmap_suspend_timeout(llbitmap, idx) < 0) { + pr_warn("md/llbitmap: %s: %s waiting for page %d timeout\n", + mdname(llbitmap->mddev), __func__, idx); + continue; + } + + llbitmap_state_machine(llbitmap, start, end, BitmapActionDaemon); + llbitmap_resume(llbitmap, idx); + } + + /* + * If the daemon took a long time to finish, retry to prevent missing + * clearing dirty bits. + */ + if (test_and_clear_bit(BITMAP_DAEMON_BUSY, &llbitmap->flags)) + goto retry; + + /* If some page is dirty but not expired, setup timer again */ + if (restart) + mod_timer(&llbitmap->pending_timer, + jiffies + llbitmap->mddev->bitmap_info.daemon_sleep * HZ); +} + +static int llbitmap_create(struct mddev *mddev) +{ + struct llbitmap *llbitmap; + int ret; + + ret =3D llbitmap_check_support(mddev); + if (ret) + return ret; + + llbitmap =3D kzalloc(sizeof(*llbitmap), GFP_KERNEL); + if (!llbitmap) + return -ENOMEM; + + llbitmap->mddev =3D mddev; + llbitmap->io_size =3D bdev_logical_block_size(mddev->gendisk->part0); + llbitmap->blocks_per_page =3D PAGE_SIZE / llbitmap->io_size; + + timer_setup(&llbitmap->pending_timer, llbitmap_pending_timer_fn, 0); + INIT_WORK(&llbitmap->daemon_work, md_llbitmap_daemon_fn); + atomic_set(&llbitmap->behind_writes, 0); + init_waitqueue_head(&llbitmap->behind_wait); + + mutex_lock(&mddev->bitmap_info.mutex); + mddev->bitmap =3D llbitmap; + ret =3D llbitmap_read_sb(llbitmap); + mutex_unlock(&mddev->bitmap_info.mutex); + if (ret) { + kfree(llbitmap); + mddev->bitmap =3D NULL; + } + + return ret; +} + +static 
int llbitmap_resize(struct mddev *mddev, sector_t blocks, int chunk= size) +{ + struct llbitmap *llbitmap =3D mddev->bitmap; + unsigned long chunks; + + if (chunksize =3D=3D 0) + chunksize =3D llbitmap->chunksize; + + /* If there is enough space, leave the chunksize unchanged. */ + chunks =3D DIV_ROUND_UP(blocks, chunksize); + while (chunks > mddev->bitmap_info.space << SECTOR_SHIFT) { + chunksize =3D chunksize << 1; + chunks =3D DIV_ROUND_UP(blocks, chunksize); + } + + llbitmap->chunkshift =3D ffz(~chunksize); + llbitmap->chunksize =3D chunksize; + llbitmap->chunks =3D chunks; + + return 0; +} + +static int llbitmap_load(struct mddev *mddev) +{ + enum llbitmap_action action =3D BitmapActionReload; + struct llbitmap *llbitmap =3D mddev->bitmap; + + if (test_and_clear_bit(BITMAP_STALE, &llbitmap->flags)) + action =3D BitmapActionStale; + + llbitmap_state_machine(llbitmap, 0, llbitmap->chunks - 1, action); + return 0; +} + +static void llbitmap_destroy(struct mddev *mddev) +{ + struct llbitmap *llbitmap =3D mddev->bitmap; + + if (!llbitmap) + return; + + mutex_lock(&mddev->bitmap_info.mutex); + + timer_delete_sync(&llbitmap->pending_timer); + flush_workqueue(md_llbitmap_io_wq); + flush_workqueue(md_llbitmap_unplug_wq); + + mddev->bitmap =3D NULL; + llbitmap_free_pages(llbitmap); + kfree(llbitmap); + mutex_unlock(&mddev->bitmap_info.mutex); +} + +static void llbitmap_start_write(struct mddev *mddev, sector_t offset, + unsigned long sectors) +{ + struct llbitmap *llbitmap =3D mddev->bitmap; + unsigned long start =3D offset >> llbitmap->chunkshift; + unsigned long end =3D (offset + sectors - 1) >> llbitmap->chunkshift; + int page_start =3D (start + BITMAP_DATA_OFFSET) >> PAGE_SHIFT; + int page_end =3D (end + BITMAP_DATA_OFFSET) >> PAGE_SHIFT; + + llbitmap_state_machine(llbitmap, start, end, BitmapActionStartwrite); + + while (page_start <=3D page_end) { + llbitmap_raise_barrier(llbitmap, page_start); + page_start++; + } +} + +static void llbitmap_end_write(struct mddev 
*mddev, sector_t offset, + unsigned long sectors) +{ + struct llbitmap *llbitmap =3D mddev->bitmap; + unsigned long start =3D offset >> llbitmap->chunkshift; + unsigned long end =3D (offset + sectors - 1) >> llbitmap->chunkshift; + int page_start =3D (start + BITMAP_DATA_OFFSET) >> PAGE_SHIFT; + int page_end =3D (end + BITMAP_DATA_OFFSET) >> PAGE_SHIFT; + + while (page_start <=3D page_end) { + llbitmap_release_barrier(llbitmap, page_start); + page_start++; + } +} + +static void llbitmap_start_discard(struct mddev *mddev, sector_t offset, + unsigned long sectors) +{ + struct llbitmap *llbitmap =3D mddev->bitmap; + unsigned long start =3D DIV_ROUND_UP(offset, llbitmap->chunksize); + unsigned long end =3D (offset + sectors - 1) >> llbitmap->chunkshift; + int page_start =3D (start + BITMAP_DATA_OFFSET) >> PAGE_SHIFT; + int page_end =3D (end + BITMAP_DATA_OFFSET) >> PAGE_SHIFT; + + llbitmap_state_machine(llbitmap, start, end, BitmapActionDiscard); + + while (page_start <=3D page_end) { + llbitmap_raise_barrier(llbitmap, page_start); + page_start++; + } +} + +static void llbitmap_end_discard(struct mddev *mddev, sector_t offset, + unsigned long sectors) +{ + struct llbitmap *llbitmap =3D mddev->bitmap; + unsigned long start =3D DIV_ROUND_UP(offset, llbitmap->chunksize); + unsigned long end =3D (offset + sectors - 1) >> llbitmap->chunkshift; + int page_start =3D (start + BITMAP_DATA_OFFSET) >> PAGE_SHIFT; + int page_end =3D (end + BITMAP_DATA_OFFSET) >> PAGE_SHIFT; + + while (page_start <=3D page_end) { + llbitmap_release_barrier(llbitmap, page_start); + page_start++; + } +} + +static void llbitmap_unplug_fn(struct work_struct *work) +{ + struct llbitmap_unplug_work *unplug_work =3D + container_of(work, struct llbitmap_unplug_work, work); + struct llbitmap *llbitmap =3D unplug_work->llbitmap; + struct blk_plug plug; + int i; + + blk_start_plug(&plug); + + for (i =3D 0; i < llbitmap->nr_pages; i++) { + if (!test_bit(LLPageDirty, &llbitmap->pctl[i]->flags) || + 
!test_and_clear_bit(LLPageDirty, &llbitmap->pctl[i]->flags)) + continue; + + llbitmap_write_page(llbitmap, i); + } + + blk_finish_plug(&plug); + md_super_wait(llbitmap->mddev); + complete(unplug_work->done); +} + +static bool llbitmap_dirty(struct llbitmap *llbitmap) +{ + int i; + + for (i =3D 0; i < llbitmap->nr_pages; i++) + if (test_bit(LLPageDirty, &llbitmap->pctl[i]->flags)) + return true; + + return false; +} + +static void llbitmap_unplug(struct mddev *mddev, bool sync) +{ + DECLARE_COMPLETION_ONSTACK(done); + struct llbitmap *llbitmap =3D mddev->bitmap; + struct llbitmap_unplug_work unplug_work =3D { + .llbitmap =3D llbitmap, + .done =3D &done, + }; + + if (!llbitmap_dirty(llbitmap)) + return; + + /* + * Issue new bitmap IO under submit_bio() context will deadlock: + * - the bio will wait for bitmap bio to be done, before it can be + * issued; + * - bitmap bio will be added to current->bio_list and wait for this + * bio to be issued; + */ + INIT_WORK_ONSTACK(&unplug_work.work, llbitmap_unplug_fn); + queue_work(md_llbitmap_unplug_wq, &unplug_work.work); + wait_for_completion(&done); + destroy_work_on_stack(&unplug_work.work); +} + +/* + * Force to write all bitmap pages to disk, called when stopping the array= , or + * every daemon_sleep seconds when sync_thread is running. 
+ */ +static void __llbitmap_flush(struct mddev *mddev) +{ + struct llbitmap *llbitmap =3D mddev->bitmap; + struct blk_plug plug; + int i; + + blk_start_plug(&plug); + for (i =3D 0; i < llbitmap->nr_pages; i++) { + struct llbitmap_page_ctl *pctl =3D llbitmap->pctl[i]; + + /* mark all blocks as dirty */ + set_bit(LLPageDirty, &pctl->flags); + bitmap_fill(pctl->dirty, llbitmap->blocks_per_page); + llbitmap_write_page(llbitmap, i); + } + blk_finish_plug(&plug); + md_super_wait(llbitmap->mddev); +} + +static void llbitmap_flush(struct mddev *mddev) +{ + struct llbitmap *llbitmap =3D mddev->bitmap; + int i; + + for (i =3D 0; i < llbitmap->nr_pages; i++) + set_bit(LLPageFlush, &llbitmap->pctl[i]->flags); + + timer_delete_sync(&llbitmap->pending_timer); + queue_work(md_llbitmap_io_wq, &llbitmap->daemon_work); + flush_work(&llbitmap->daemon_work); + + __llbitmap_flush(mddev); +} + +/* This is used for raid5 lazy initial recovery */ +static bool llbitmap_blocks_synced(struct mddev *mddev, sector_t offset) +{ + struct llbitmap *llbitmap =3D mddev->bitmap; + unsigned long p =3D offset >> llbitmap->chunkshift; + enum llbitmap_state c =3D llbitmap_read(llbitmap, p); + + return c =3D=3D BitClean || c =3D=3D BitDirty; +} + +static sector_t llbitmap_skip_sync_blocks(struct mddev *mddev, sector_t of= fset) +{ + struct llbitmap *llbitmap =3D mddev->bitmap; + unsigned long p =3D offset >> llbitmap->chunkshift; + int blocks =3D llbitmap->chunksize - (offset & (llbitmap->chunksize - 1)); + enum llbitmap_state c =3D llbitmap_read(llbitmap, p); + + /* always skip unwritten blocks */ + if (c =3D=3D BitUnwritten) + return blocks; + + /* For degraded array, don't skip */ + if (mddev->degraded) + return 0; + + /* For resync also skip clean/dirty blocks */ + if ((c =3D=3D BitClean || c =3D=3D BitDirty) && + test_bit(MD_RECOVERY_SYNC, &mddev->recovery) && + !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) + return blocks; + + return 0; +} + +static bool llbitmap_start_sync(struct mddev 
*mddev, sector_t offset, + sector_t *blocks, bool degraded) +{ + struct llbitmap *llbitmap =3D mddev->bitmap; + unsigned long p =3D offset >> llbitmap->chunkshift; + + /* + * Handle one bit at a time, this is much simpler. And it doesn't matter + * if md_do_sync() loop more times. + */ + *blocks =3D llbitmap->chunksize - (offset & (llbitmap->chunksize - 1)); + return llbitmap_state_machine(llbitmap, p, p, + BitmapActionStartsync) =3D=3D BitSyncing; +} + +/* Something is wrong, sync_thread stop at @offset */ +static void llbitmap_end_sync(struct mddev *mddev, sector_t offset, + sector_t *blocks) +{ + struct llbitmap *llbitmap =3D mddev->bitmap; + unsigned long p =3D offset >> llbitmap->chunkshift; + + *blocks =3D llbitmap->chunksize - (offset & (llbitmap->chunksize - 1)); + llbitmap_state_machine(llbitmap, p, llbitmap->chunks - 1, + BitmapActionAbortsync); +} + +/* A full sync_thread is finished */ +static void llbitmap_close_sync(struct mddev *mddev) +{ + struct llbitmap *llbitmap =3D mddev->bitmap; + int i; + + for (i =3D 0; i < llbitmap->nr_pages; i++) { + struct llbitmap_page_ctl *pctl =3D llbitmap->pctl[i]; + + /* let daemon_fn clear dirty bits immediately */ + WRITE_ONCE(pctl->expire, jiffies); + } + + llbitmap_state_machine(llbitmap, 0, llbitmap->chunks - 1, + BitmapActionEndsync); +} + +/* + * sync_thread have reached @sector, update metadata every daemon_sleep se= conds, + * just in case sync_thread have to restart after power failure. 
+ */ +static void llbitmap_cond_end_sync(struct mddev *mddev, sector_t sector, + bool force) +{ + struct llbitmap *llbitmap =3D mddev->bitmap; + + if (sector =3D=3D 0) { + llbitmap->last_end_sync =3D jiffies; + return; + } + + if (time_before(jiffies, llbitmap->last_end_sync + + HZ * mddev->bitmap_info.daemon_sleep)) + return; + + wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active)); + + mddev->curr_resync_completed =3D sector; + set_bit(MD_SB_CHANGE_CLEAN, &mddev->sb_flags); + llbitmap_state_machine(llbitmap, 0, sector >> llbitmap->chunkshift, + BitmapActionEndsync); + __llbitmap_flush(mddev); + + llbitmap->last_end_sync =3D jiffies; + sysfs_notify_dirent_safe(mddev->sysfs_completed); +} + +static bool llbitmap_enabled(void *data, bool flush) +{ + struct llbitmap *llbitmap =3D data; + + return llbitmap && !test_bit(BITMAP_WRITE_ERROR, &llbitmap->flags); +} + +static void llbitmap_dirty_bits(struct mddev *mddev, unsigned long s, + unsigned long e) +{ + llbitmap_state_machine(mddev->bitmap, s, e, BitmapActionStartwrite); +} + +static void llbitmap_write_sb(struct llbitmap *llbitmap) +{ + int nr_blocks =3D DIV_ROUND_UP(BITMAP_DATA_OFFSET, llbitmap->io_size); + + bitmap_fill(llbitmap->pctl[0]->dirty, nr_blocks); + llbitmap_write_page(llbitmap, 0); + md_super_wait(llbitmap->mddev); +} + +static void llbitmap_update_sb(void *data) +{ + struct llbitmap *llbitmap =3D data; + struct mddev *mddev =3D llbitmap->mddev; + struct page *sb_page; + bitmap_super_t *sb; + + if (test_bit(BITMAP_WRITE_ERROR, &llbitmap->flags)) + return; + + sb_page =3D llbitmap_read_page(llbitmap, 0); + if (IS_ERR(sb_page)) { + pr_err("%s: %s: read super block failed", __func__, + mdname(mddev)); + set_bit(BITMAP_WRITE_ERROR, &llbitmap->flags); + return; + } + + if (mddev->events < llbitmap->events_cleared) + llbitmap->events_cleared =3D mddev->events; + + sb =3D kmap_local_page(sb_page); + sb->events =3D cpu_to_le64(mddev->events); + sb->state =3D cpu_to_le32(llbitmap->flags); + 
sb->chunksize =3D cpu_to_le32(llbitmap->chunksize); + sb->sync_size =3D cpu_to_le64(mddev->resync_max_sectors); + sb->events_cleared =3D cpu_to_le64(llbitmap->events_cleared); + sb->sectors_reserved =3D cpu_to_le32(mddev->bitmap_info.space); + sb->daemon_sleep =3D cpu_to_le32(mddev->bitmap_info.daemon_sleep); + + kunmap_local(sb); + llbitmap_write_sb(llbitmap); +} + +static int llbitmap_get_stats(void *data, struct md_bitmap_stats *stats) +{ + struct llbitmap *llbitmap =3D data; + + memset(stats, 0, sizeof(*stats)); + + stats->missing_pages =3D 0; + stats->pages =3D llbitmap->nr_pages; + stats->file_pages =3D llbitmap->nr_pages; + + stats->behind_writes =3D atomic_read(&llbitmap->behind_writes); + stats->behind_wait =3D wq_has_sleeper(&llbitmap->behind_wait); + stats->events_cleared =3D llbitmap->events_cleared; + + return 0; +} + +/* just flag all pages as needing to be written */ +static void llbitmap_write_all(struct mddev *mddev) +{ + int i; + struct llbitmap *llbitmap =3D mddev->bitmap; + + for (i =3D 0; i < llbitmap->nr_pages; i++) { + struct llbitmap_page_ctl *pctl =3D llbitmap->pctl[i]; + + set_bit(LLPageDirty, &pctl->flags); + bitmap_fill(pctl->dirty, llbitmap->blocks_per_page); + } +} + +static void llbitmap_start_behind_write(struct mddev *mddev) +{ + struct llbitmap *llbitmap =3D mddev->bitmap; + + atomic_inc(&llbitmap->behind_writes); +} + +static void llbitmap_end_behind_write(struct mddev *mddev) +{ + struct llbitmap *llbitmap =3D mddev->bitmap; + + if (atomic_dec_and_test(&llbitmap->behind_writes)) + wake_up(&llbitmap->behind_wait); +} + +static void llbitmap_wait_behind_writes(struct mddev *mddev) +{ + struct llbitmap *llbitmap =3D mddev->bitmap; + + if (!llbitmap) + return; + + wait_event(llbitmap->behind_wait, + atomic_read(&llbitmap->behind_writes) =3D=3D 0); + +} + +static ssize_t bits_show(struct mddev *mddev, char *page) +{ + struct llbitmap *llbitmap; + int bits[BitStateCount] =3D {0}; + loff_t start =3D 0; + + 
mutex_lock(&mddev->bitmap_info.mutex);
+	llbitmap = mddev->bitmap;
+	if (!llbitmap || !llbitmap->pctl) {
+		mutex_unlock(&mddev->bitmap_info.mutex);
+		return sprintf(page, "no bitmap\n");
+	}
+
+	if (test_bit(BITMAP_WRITE_ERROR, &llbitmap->flags)) {
+		mutex_unlock(&mddev->bitmap_info.mutex);
+		return sprintf(page, "bitmap io error\n");
+	}
+
+	while (start < llbitmap->chunks) {
+		enum llbitmap_state c = llbitmap_read(llbitmap, start);
+
+		if (c < 0 || c >= BitStateCount)
+			pr_err("%s: invalid bit %llu state %d\n",
+			       __func__, start, c);
+		else
+			bits[c]++;
+		start++;
+	}
+
+	mutex_unlock(&mddev->bitmap_info.mutex);
+	return sprintf(page, "unwritten %d\nclean %d\ndirty %d\nneed sync %d\nsyncing %d\n",
+		       bits[BitUnwritten], bits[BitClean], bits[BitDirty],
+		       bits[BitNeedSync], bits[BitSyncing]);
+}
+
+static struct md_sysfs_entry llbitmap_bits = __ATTR_RO(bits);
+
+static ssize_t metadata_show(struct mddev *mddev, char *page)
+{
+	struct llbitmap *llbitmap;
+	ssize_t ret;
+
+	mutex_lock(&mddev->bitmap_info.mutex);
+	llbitmap = mddev->bitmap;
+	if (!llbitmap) {
+		mutex_unlock(&mddev->bitmap_info.mutex);
+		return sprintf(page, "no bitmap\n");
+	}
+
+	ret = sprintf(page, "chunksize %lu\nchunkshift %lu\nchunks %lu\noffset %llu\ndaemon_sleep %lu\n",
+		      llbitmap->chunksize, llbitmap->chunkshift,
+		      llbitmap->chunks, mddev->bitmap_info.offset,
+		      llbitmap->mddev->bitmap_info.daemon_sleep);
+	mutex_unlock(&mddev->bitmap_info.mutex);
+
+	return ret;
+}
+
+static struct md_sysfs_entry llbitmap_metadata = __ATTR_RO(metadata);
+
+static ssize_t
+daemon_sleep_show(struct mddev *mddev, char *page)
+{
+	return sprintf(page, "%lu\n", mddev->bitmap_info.daemon_sleep);
+}
+
+static ssize_t
+daemon_sleep_store(struct mddev *mddev, const char *buf, size_t len)
+{
+	unsigned long timeout;
+	int rv = kstrtoul(buf, 10, &timeout);
+
+	if (rv)
+		return rv;
+
+	mddev->bitmap_info.daemon_sleep = timeout;
+	return len;
+}
+
+static struct md_sysfs_entry llbitmap_daemon_sleep = __ATTR_RW(daemon_sleep);
+
+static ssize_t
+barrier_idle_show(struct mddev *mddev, char *page)
+{
+	struct llbitmap *llbitmap = mddev->bitmap;
+
+	return sprintf(page, "%lu\n", llbitmap->barrier_idle);
+}
+
+static ssize_t
+barrier_idle_store(struct mddev *mddev, const char *buf, size_t len)
+{
+	struct llbitmap *llbitmap = mddev->bitmap;
+	unsigned long timeout;
+	int rv = kstrtoul(buf, 10, &timeout);
+
+	if (rv)
+		return rv;
+
+	llbitmap->barrier_idle = timeout;
+	return len;
+}
+
+static struct md_sysfs_entry llbitmap_barrier_idle = __ATTR_RW(barrier_idle);
+
+static struct attribute *md_llbitmap_attrs[] = {
+	&llbitmap_bits.attr,
+	&llbitmap_metadata.attr,
+	&llbitmap_daemon_sleep.attr,
+	&llbitmap_barrier_idle.attr,
+	NULL
+};
+
+static struct attribute_group md_llbitmap_group = {
+	.name = "llbitmap",
+	.attrs = md_llbitmap_attrs,
+};
+
+static struct bitmap_operations llbitmap_ops = {
+	.head = {
+		.type = MD_BITMAP,
+		.id = ID_LLBITMAP,
+		.name = "llbitmap",
+	},
+
+	.enabled = llbitmap_enabled,
+	.create = llbitmap_create,
+	.resize = llbitmap_resize,
+	.load = llbitmap_load,
+	.destroy = llbitmap_destroy,
+
+	.start_write = llbitmap_start_write,
+	.end_write = llbitmap_end_write,
+	.start_discard = llbitmap_start_discard,
+	.end_discard = llbitmap_end_discard,
+	.unplug = llbitmap_unplug,
+	.flush = llbitmap_flush,
+
+	.start_behind_write = llbitmap_start_behind_write,
+	.end_behind_write = llbitmap_end_behind_write,
+	.wait_behind_writes = llbitmap_wait_behind_writes,
+
+	.blocks_synced = llbitmap_blocks_synced,
+	.skip_sync_blocks = llbitmap_skip_sync_blocks,
+	.start_sync = llbitmap_start_sync,
+	.end_sync = llbitmap_end_sync,
+	.close_sync = llbitmap_close_sync,
+	.cond_end_sync = llbitmap_cond_end_sync,
+
+	.update_sb = llbitmap_update_sb,
+	.get_stats = llbitmap_get_stats,
+	.dirty_bits = llbitmap_dirty_bits,
+	.write_all = llbitmap_write_all,
+
+	.group = &md_llbitmap_group,
+};
+
+int md_llbitmap_init(void)
+{
+	md_llbitmap_io_wq = alloc_workqueue("md_llbitmap_io",
+					 WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
+	if (!md_llbitmap_io_wq)
+		return -ENOMEM;
+
+	md_llbitmap_unplug_wq = alloc_workqueue("md_llbitmap_unplug",
+					 WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
+	if (!md_llbitmap_unplug_wq) {
+		destroy_workqueue(md_llbitmap_io_wq);
+		md_llbitmap_io_wq = NULL;
+		return -ENOMEM;
+	}
+
+	return register_md_submodule(&llbitmap_ops.head);
+}
+
+void md_llbitmap_exit(void)
+{
+	destroy_workqueue(md_llbitmap_io_wq);
+	md_llbitmap_io_wq = NULL;
+	destroy_workqueue(md_llbitmap_unplug_wq);
+	md_llbitmap_unplug_wq = NULL;
+	unregister_md_submodule(&llbitmap_ops.head);
+}
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 3a3a3fdecfbd..722c76b4fade 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -10328,6 +10328,10 @@ static int __init md_init(void)
 	if (ret)
 		return ret;
 
+	ret = md_llbitmap_init();
+	if (ret)
+		goto err_bitmap;
+
 	ret = -ENOMEM;
 	md_wq = alloc_workqueue("md", WQ_MEM_RECLAIM, 0);
 	if (!md_wq)
@@ -10359,6 +10363,8 @@ static int __init md_init(void)
 err_misc_wq:
 	destroy_workqueue(md_wq);
 err_wq:
+	md_llbitmap_exit();
+err_bitmap:
 	md_bitmap_exit();
 	return ret;
 }
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 7b6357879a84..1979c2d4fe89 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -26,7 +26,7 @@
 enum md_submodule_type {
 	MD_PERSONALITY = 0,
 	MD_CLUSTER,
-	MD_BITMAP, /* TODO */
+	MD_BITMAP,
 };
 
 enum md_submodule_id {
@@ -39,7 +39,7 @@ enum md_submodule_id {
 	ID_RAID10 = 10,
 	ID_CLUSTER,
 	ID_BITMAP,
-	ID_LLBITMAP, /* TODO */
+	ID_LLBITMAP,
 	ID_BITMAP_NONE,
 };
 
-- 
2.39.2