From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Qu Wenruo, David Sterba, Sasha Levin
Subject: [PATCH 5.19 0535/1157] btrfs: update stripe_sectors::uptodate in steal_rbio
Date: Mon, 15 Aug 2022 19:58:11 +0200
Message-Id: <20220815180501.056583969@linuxfoundation.org>
In-Reply-To: <20220815180439.416659447@linuxfoundation.org>
References: <20220815180439.416659447@linuxfoundation.org>

From: Qu Wenruo

[ Upstream commit 4d10046613333508d31fe926c545c8c0b620508a ]

[BUG]
With added debugging, it turns out that the following write sequence causes extra reads which are unnecessary:

  # xfs_io -f -s -c "pwrite -b 32k 0 32k" -c "pwrite -b 32k 32k 32k" \
           -c "pwrite -b 32k 64k 32k" -c "pwrite -b 32k 96k 32k" \
           $mnt/file

The debug messages look like this (btrfs header skipped):

  partial rmw, full stripe=389152768 opf=0x0 devid=3 type=1 offset=32768 physical=323059712 len=32768
  partial rmw, full stripe=389152768 opf=0x0 devid=1 type=2 offset=0 physical=67174400 len=65536
  full stripe rmw, full stripe=389152768 opf=0x1 devid=3 type=1 offset=0 physical=323026944 len=32768
  full stripe rmw, full stripe=389152768 opf=0x1 devid=2 type=-1 offset=0 physical=323026944 len=32768
  partial rmw, full stripe=298844160 opf=0x0 devid=1 type=1 offset=32768 physical=22052864 len=32768
  partial rmw, full stripe=298844160 opf=0x0 devid=2 type=2 offset=0 physical=277872640 len=65536
  full stripe rmw, full stripe=298844160 opf=0x1 devid=1 type=1 offset=0 physical=22020096 len=32768
  full stripe rmw, full stripe=298844160 opf=0x1 devid=3 type=-1 offset=0 physical=277872640 len=32768
  partial rmw, full stripe=389152768 opf=0x0 devid=3 type=1 offset=0 physical=323026944 len=32768
  partial rmw, full stripe=389152768 opf=0x0 devid=1 type=2 offset=0 physical=67174400 len=65536
  ^^^^ Still a partial read, even though 389152768 was already cached by the first write.

  full stripe rmw, full stripe=389152768 opf=0x1 devid=3 type=1 offset=32768 physical=323059712 len=32768
  full stripe rmw, full stripe=389152768 opf=0x1 devid=2 type=-1 offset=32768 physical=323059712 len=32768
  partial rmw, full stripe=298844160 opf=0x0 devid=1 type=1 offset=0 physical=22020096 len=32768
  partial rmw, full stripe=298844160 opf=0x0 devid=2 type=2 offset=0 physical=277872640 len=65536
  ^^^^ Still a partial read for 298844160.

  full stripe rmw, full stripe=298844160 opf=0x1 devid=1 type=1 offset=32768 physical=22052864 len=32768
  full stripe rmw, full stripe=298844160 opf=0x1 devid=3 type=-1 offset=32768 physical=277905408 len=32768

This means every 32K write, even when it falls in the same full stripe, still triggers a read for the previously cached data. This causes extra RAID56 IO and makes the btrfs raid56 cache useless.

[CAUSE]
Commit d4e28d9b5f04 ("btrfs: raid56: make steal_rbio() subpage compatible") made steal_rbio() subpage compatible, but the conversion missed one thing.

We no longer rely on PageUptodate(rbio->stripe_pages[i]) but on rbio->stripe_sectors[i].uptodate to determine whether a sector is uptodate.

Previously, switching the page pointer was all that was needed, as the PageUptodate flag stays bound to the page. Now we have to manually mark the involved sectors uptodate, or raid56_rmw_stripe() will later find the stolen sectors not uptodate and assemble read bios for them, wasting IO.

[FIX]
Fix the bug by also updating rbio->stripe_sectors[].uptodate in steal_rbio().

With this fixed, the same write pattern no longer leads to the unnecessary reads:

  partial rmw, full stripe=389152768 opf=0x0 devid=3 type=1 offset=32768 physical=323059712 len=32768
  partial rmw, full stripe=389152768 opf=0x0 devid=1 type=2 offset=0 physical=67174400 len=65536
  full stripe rmw, full stripe=389152768 opf=0x1 devid=3 type=1 offset=0 physical=323026944 len=32768
  full stripe rmw, full stripe=389152768 opf=0x1 devid=2 type=-1 offset=0 physical=323026944 len=32768
  partial rmw, full stripe=298844160 opf=0x0 devid=1 type=1 offset=32768 physical=22052864 len=32768
  partial rmw, full stripe=298844160 opf=0x0 devid=2 type=2 offset=0 physical=277872640 len=65536
  full stripe rmw, full stripe=298844160 opf=0x1 devid=1 type=1 offset=0 physical=22020096 len=32768
  full stripe rmw, full stripe=298844160 opf=0x1 devid=3 type=-1 offset=0 physical=277872640 len=32768
  ^^^ No more partial reads, we go directly into the write path.

  full stripe rmw, full stripe=389152768 opf=0x1 devid=3 type=1 offset=32768 physical=323059712 len=32768
  full stripe rmw, full stripe=389152768 opf=0x1 devid=2 type=-1 offset=32768 physical=323059712 len=32768
  full stripe rmw, full stripe=298844160 opf=0x1 devid=1 type=1 offset=32768 physical=22052864 len=32768
  full stripe rmw, full stripe=298844160 opf=0x1 devid=3 type=-1 offset=32768 physical=277905408 len=32768
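For context on the per-sector bookkeeping involved: before stealing a page, steal_rbio() checks full_page_sectors_uptodate() on the source rbio, which only reads the per-sector uptodate bits; the bug was that the corresponding bits on the destination rbio were never set after the page pointer moved, which is what the new steal_rbio_page() helper in the hunk below fixes. A rough sketch of that check, assuming the 5.19 layout of struct btrfs_raid_bio (the exact body in fs/btrfs/raid56.c may differ in details such as assertions):

static bool full_page_sectors_uptodate(struct btrfs_raid_bio *rbio,
				       unsigned int page_nr)
{
	/* Number of sectors (and thus uptodate bits) one stripe page covers. */
	const u32 sectorsize = rbio->bioc->fs_info->sectorsize;
	const u32 sectors_per_page = PAGE_SIZE / sectorsize;
	unsigned int i;

	/* A page only counts as uptodate if every sector backed by it is. */
	for (i = sectors_per_page * page_nr;
	     i < sectors_per_page * page_nr + sectors_per_page; i++) {
		if (!rbio->stripe_sectors[i].uptodate)
			return false;
	}
	return true;
}

Since steal_rbio() only steals pages that pass this check on the source rbio, marking the same range of dest->stripe_sectors[] uptodate in steal_rbio_page() keeps the destination's bookkeeping consistent with the page it just received.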
Fixes: d4e28d9b5f04 ("btrfs: raid56: make steal_rbio() subpage compatible")
Signed-off-by: Qu Wenruo
Reviewed-by: David Sterba
Signed-off-by: David Sterba
Signed-off-by: Sasha Levin
---
 fs/btrfs/raid56.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index a5b623ee6fac..13e0bb0479e6 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -347,6 +347,24 @@ static void index_stripe_sectors(struct btrfs_raid_bio *rbio)
 	}
 }
 
+static void steal_rbio_page(struct btrfs_raid_bio *src,
+			    struct btrfs_raid_bio *dest, int page_nr)
+{
+	const u32 sectorsize = src->bioc->fs_info->sectorsize;
+	const u32 sectors_per_page = PAGE_SIZE / sectorsize;
+	int i;
+
+	if (dest->stripe_pages[page_nr])
+		__free_page(dest->stripe_pages[page_nr]);
+	dest->stripe_pages[page_nr] = src->stripe_pages[page_nr];
+	src->stripe_pages[page_nr] = NULL;
+
+	/* Also update the sector->uptodate bits. */
+	for (i = sectors_per_page * page_nr;
+	     i < sectors_per_page * page_nr + sectors_per_page; i++)
+		dest->stripe_sectors[i].uptodate = true;
+}
+
 /*
  * Stealing an rbio means taking all the uptodate pages from the stripe array
  * in the source rbio and putting them into the destination rbio.
@@ -358,7 +376,6 @@ static void steal_rbio(struct btrfs_raid_bio *src, struct btrfs_raid_bio *dest)
 {
 	int i;
 	struct page *s;
-	struct page *d;
 
 	if (!test_bit(RBIO_CACHE_READY_BIT, &src->flags))
 		return;
@@ -368,12 +385,7 @@ static void steal_rbio(struct btrfs_raid_bio *src, struct btrfs_raid_bio *dest)
 		if (!s || !full_page_sectors_uptodate(src, i))
 			continue;
 
-		d = dest->stripe_pages[i];
-		if (d)
-			__free_page(d);
-
-		dest->stripe_pages[i] = s;
-		src->stripe_pages[i] = NULL;
+		steal_rbio_page(src, dest, i);
 	}
 	index_stripe_sectors(dest);
 	index_stripe_sectors(src);
-- 
2.35.1