From: Logan Gunthorpe <logang@deltatee.com>
To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu <song@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>, Guoqing Jiang <guoqing.jiang@linux.dev>, Stephen Bates <sbates@raithlin.com>, Martin Oliveira <Martin.Oliveira@eideticom.com>, David Sloan <David.Sloan@eideticom.com>, Logan Gunthorpe <logang@deltatee.com>, Christoph Hellwig <hch@lst.de>
Date: Wed, 20 Apr 2022 13:54:14 -0600
Message-Id: <20220420195425.34911-2-logang@deltatee.com>
In-Reply-To: <20220420195425.34911-1-logang@deltatee.com>
Subject: [PATCH v2 01/12] md/raid5: Factor out ahead_of_reshape() function

There are a few uses of an ugly ternary operator in raid5_make_request()
to check whether a sector is ahead of the reshape front. Factor this out
into a simple helper called ahead_of_reshape().

This also appears to fix the first bio_wouldblock_error() check, whose
comparison operators did not match the equivalent check further down
that leads to a schedule(). Besides this, no functional changes
intended.
Suggested-by: Christoph Hellwig
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig
Reviewed-by: Paul Menzel
---
 drivers/md/raid5.c | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 7f7d1546b9ba..97b23c18402b 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5787,6 +5787,15 @@ static void make_discard_request(struct mddev *mddev, struct bio *bi)
 	bio_endio(bi);
 }
 
+static bool ahead_of_reshape(struct mddev *mddev, sector_t sector,
+			     sector_t reshape_sector)
+{
+	if (mddev->reshape_backwards)
+		return sector < reshape_sector;
+	else
+		return sector >= reshape_sector;
+}
+
 static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 {
 	struct r5conf *conf = mddev->private;
@@ -5843,9 +5852,8 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 	/* Bail out if conflicts with reshape and REQ_NOWAIT is set */
 	if ((bi->bi_opf & REQ_NOWAIT) &&
 	    (conf->reshape_progress != MaxSector) &&
-	    (mddev->reshape_backwards
-	     ? (logical_sector > conf->reshape_progress && logical_sector <= conf->reshape_safe)
-	     : (logical_sector >= conf->reshape_safe && logical_sector < conf->reshape_progress))) {
+	    !ahead_of_reshape(mddev, logical_sector, conf->reshape_progress) &&
+	    ahead_of_reshape(mddev, logical_sector, conf->reshape_safe)) {
 		bio_wouldblock_error(bi);
 		if (rw == WRITE)
 			md_write_end(mddev);
@@ -5874,14 +5882,12 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 		 * to check again.
 		 */
 		spin_lock_irq(&conf->device_lock);
-		if (mddev->reshape_backwards
-		    ? logical_sector < conf->reshape_progress
-		    : logical_sector >= conf->reshape_progress) {
+		if (ahead_of_reshape(mddev, logical_sector,
+				     conf->reshape_progress)) {
 			previous = 1;
 		} else {
-			if (mddev->reshape_backwards
-			    ? logical_sector < conf->reshape_safe
-			    : logical_sector >= conf->reshape_safe) {
+			if (ahead_of_reshape(mddev, logical_sector,
+					     conf->reshape_safe)) {
 				spin_unlock_irq(&conf->device_lock);
 				schedule();
 				do_prepare = true;
@@ -5912,9 +5918,8 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 		 */
 		int must_retry = 0;
 		spin_lock_irq(&conf->device_lock);
-		if (mddev->reshape_backwards
-		    ? logical_sector >= conf->reshape_progress
-		    : logical_sector < conf->reshape_progress)
+		if (!ahead_of_reshape(mddev, logical_sector,
+				      conf->reshape_progress))
 			/* mismatch, need to try again */
 			must_retry = 1;
 		spin_unlock_irq(&conf->device_lock);
-- 
2.30.2
From: Logan Gunthorpe <logang@deltatee.com>
To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu <song@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>, Guoqing Jiang <guoqing.jiang@linux.dev>, Stephen Bates <sbates@raithlin.com>, Martin Oliveira <Martin.Oliveira@eideticom.com>, David Sloan <David.Sloan@eideticom.com>, Logan Gunthorpe <logang@deltatee.com>
Date: Wed, 20 Apr 2022 13:54:15 -0600
Message-Id: <20220420195425.34911-3-logang@deltatee.com>
In-Reply-To: <20220420195425.34911-1-logang@deltatee.com>
Subject: [PATCH v2 02/12] md/raid5: Refactor raid5_make_request loop

Break immediately if raid5_get_active_stripe() returns NULL, and
deindent the rest of the loop. Annotate the check with unlikely().

This makes the code easier to read and reduces the indentation level.

No functional changes intended.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig
---
 drivers/md/raid5.c | 109 +++++++++++++++++++++++----------------------
 1 file changed, 55 insertions(+), 54 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 97b23c18402b..cda6857e6207 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5906,68 +5906,69 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 
 		sh = raid5_get_active_stripe(conf, new_sector, previous,
 				       (bi->bi_opf & REQ_RAHEAD), 0);
-		if (sh) {
-			if (unlikely(previous)) {
-				/* expansion might have moved on while waiting for a
-				 * stripe, so we must do the range check again.
-				 * Expansion could still move past after this
-				 * test, but as we are holding a reference to
-				 * 'sh', we know that if that happens,
-				 * STRIPE_EXPANDING will get set and the expansion
-				 * won't proceed until we finish with the stripe.
-				 */
-				int must_retry = 0;
-				spin_lock_irq(&conf->device_lock);
-				if (!ahead_of_reshape(mddev, logical_sector,
-						      conf->reshape_progress))
-					/* mismatch, need to try again */
-					must_retry = 1;
-				spin_unlock_irq(&conf->device_lock);
-				if (must_retry) {
-					raid5_release_stripe(sh);
-					schedule();
-					do_prepare = true;
-					goto retry;
-				}
-			}
-			if (read_seqcount_retry(&conf->gen_lock, seq)) {
-				/* Might have got the wrong stripe_head
-				 * by accident
-				 */
-				raid5_release_stripe(sh);
-				goto retry;
-			}
+		if (unlikely(!sh)) {
+			/* cannot get stripe, just give-up */
+			bi->bi_status = BLK_STS_IOERR;
+			break;
+		}
 
-			if (test_bit(STRIPE_EXPANDING, &sh->state) ||
-			    !add_stripe_bio(sh, bi, dd_idx, rw, previous)) {
-				/* Stripe is busy expanding or
-				 * add failed due to overlap. Flush everything
-				 * and wait a while
-				 */
-				md_wakeup_thread(mddev->thread);
+		if (unlikely(previous)) {
+			/* expansion might have moved on while waiting for a
+			 * stripe, so we must do the range check again.
+			 * Expansion could still move past after this
+			 * test, but as we are holding a reference to
+			 * 'sh', we know that if that happens,
+			 * STRIPE_EXPANDING will get set and the expansion
+			 * won't proceed until we finish with the stripe.
+			 */
+			int must_retry = 0;
+			spin_lock_irq(&conf->device_lock);
+			if (!ahead_of_reshape(mddev, logical_sector,
+					      conf->reshape_progress))
+				/* mismatch, need to try again */
+				must_retry = 1;
+			spin_unlock_irq(&conf->device_lock);
+			if (must_retry) {
 				raid5_release_stripe(sh);
 				schedule();
 				do_prepare = true;
 				goto retry;
 			}
-			if (do_flush) {
-				set_bit(STRIPE_R5C_PREFLUSH, &sh->state);
-				/* we only need flush for one stripe */
-				do_flush = false;
-			}
+		}
 
-			set_bit(STRIPE_HANDLE, &sh->state);
-			clear_bit(STRIPE_DELAYED, &sh->state);
-			if ((!sh->batch_head || sh == sh->batch_head) &&
-			    (bi->bi_opf & REQ_SYNC) &&
-			    !test_and_set_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
-				atomic_inc(&conf->preread_active_stripes);
-			release_stripe_plug(mddev, sh);
-		} else {
-			/* cannot get stripe for read-ahead, just give-up */
-			bi->bi_status = BLK_STS_IOERR;
-			break;
+		if (read_seqcount_retry(&conf->gen_lock, seq)) {
+			/* Might have got the wrong stripe_head by accident */
+			raid5_release_stripe(sh);
+			goto retry;
+		}
+
+		if (test_bit(STRIPE_EXPANDING, &sh->state) ||
+		    !add_stripe_bio(sh, bi, dd_idx, rw, previous)) {
+			/*
+			 * Stripe is busy expanding or add failed due to
+			 * overlap. Flush everything and wait a while.
+			 */
+			md_wakeup_thread(mddev->thread);
+			raid5_release_stripe(sh);
+			schedule();
+			do_prepare = true;
+			goto retry;
 		}
+
+		if (do_flush) {
+			set_bit(STRIPE_R5C_PREFLUSH, &sh->state);
+			/* we only need flush for one stripe */
+			do_flush = false;
+		}
+
+		set_bit(STRIPE_HANDLE, &sh->state);
+		clear_bit(STRIPE_DELAYED, &sh->state);
+		if ((!sh->batch_head || sh == sh->batch_head) &&
+		    (bi->bi_opf & REQ_SYNC) &&
+		    !test_and_set_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
+			atomic_inc(&conf->preread_active_stripes);
+
+		release_stripe_plug(mddev, sh);
 	}
 	finish_wait(&conf->wait_for_overlap, &w);
 
-- 
2.30.2
From: Logan Gunthorpe <logang@deltatee.com>
To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu <song@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>, Guoqing Jiang <guoqing.jiang@linux.dev>, Stephen Bates <sbates@raithlin.com>, Martin Oliveira <Martin.Oliveira@eideticom.com>, David Sloan <David.Sloan@eideticom.com>, Logan Gunthorpe <logang@deltatee.com>, Christoph Hellwig <hch@lst.de>
Date: Wed, 20 Apr 2022 13:54:16 -0600
Message-Id: <20220420195425.34911-4-logang@deltatee.com>
In-Reply-To: <20220420195425.34911-1-logang@deltatee.com>
Subject: [PATCH v2 03/12] md/raid5: Move stripe_add_to_batch_list() call out of add_stripe_bio()

stripe_add_to_batch_list() is better done in the loop in
raid5_make_request() than inside add_stripe_bio(). This is clearer and
allows for storing the batch_head state outside the loop in a
subsequent patch.

The call to add_stripe_bio() in retry_aligned_read() is for reads, and
batching only applies to writes, so batching cannot happen at that call
site.

No functional changes intended.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig
Reviewed-by: Guoqing Jiang
---
 drivers/md/raid5.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index cda6857e6207..8e1ece5ce984 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3534,8 +3534,6 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx,
 	}
 	spin_unlock_irq(&sh->stripe_lock);
 
-	if (stripe_can_batch(sh))
-		stripe_add_to_batch_list(conf, sh);
 	return 1;
 
 overlap:
@@ -5955,6 +5953,9 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 			goto retry;
 		}
 
+		if (stripe_can_batch(sh))
+			stripe_add_to_batch_list(conf, sh);
+
 		if (do_flush) {
 			set_bit(STRIPE_R5C_PREFLUSH, &sh->state);
 			/* we only need flush for one stripe */
-- 
2.30.2
From: Logan Gunthorpe <logang@deltatee.com>
To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu <song@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>, Guoqing Jiang <guoqing.jiang@linux.dev>, Stephen Bates <sbates@raithlin.com>, Martin Oliveira <Martin.Oliveira@eideticom.com>, David Sloan <David.Sloan@eideticom.com>, Logan Gunthorpe <logang@deltatee.com>
Date: Wed, 20 Apr 2022 13:54:17 -0600
Message-Id: <20220420195425.34911-5-logang@deltatee.com>
In-Reply-To: <20220420195425.34911-1-logang@deltatee.com>
Subject: [PATCH v2 04/12] md/raid5: Move common stripe count increment code into __find_stripe()

Both uses of __find_stripe() require a fairly complicated dance to
increment the reference count. Move this into a common
find_get_stripe() helper.

No functional changes intended.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Guoqing Jiang
Reviewed-by: Christoph Hellwig
---
 drivers/md/raid5.c | 133 ++++++++++++++++++++++-----------------------
 1 file changed, 65 insertions(+), 68 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 8e1ece5ce984..a0946af5b1ac 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -612,7 +612,7 @@ static void init_stripe(struct stripe_head *sh, sector_t sector, int previous)
 }
 
 static struct stripe_head *__find_stripe(struct r5conf *conf, sector_t sector,
-					 short generation)
+					 short generation, int hash)
 {
 	struct stripe_head *sh;
 
@@ -624,6 +624,49 @@ static struct stripe_head *__find_stripe(struct r5conf *conf, sector_t sector,
 	return NULL;
 }
 
+static struct stripe_head *find_get_stripe(struct r5conf *conf,
+		sector_t sector, short generation, int hash)
+{
+	int inc_empty_inactive_list_flag;
+	struct stripe_head *sh;
+
+	sh = __find_stripe(conf, sector, generation, hash);
+	if (!sh)
+		return NULL;
+
+	if (atomic_inc_not_zero(&sh->count))
+		return sh;
+
+	/*
+	 * Slow path. The reference count is zero which means the stripe must
+	 * be on a list (sh->lru). Must remove the stripe from the list that
+	 * references it with the device_lock held.
+	 */
+
+	spin_lock(&conf->device_lock);
+	if (!atomic_read(&sh->count)) {
+		if (!test_bit(STRIPE_HANDLE, &sh->state))
+			atomic_inc(&conf->active_stripes);
+		BUG_ON(list_empty(&sh->lru) &&
+		       !test_bit(STRIPE_EXPANDING, &sh->state));
+		inc_empty_inactive_list_flag = 0;
+		if (!list_empty(conf->inactive_list + hash))
+			inc_empty_inactive_list_flag = 1;
+		list_del_init(&sh->lru);
+		if (list_empty(conf->inactive_list + hash) &&
+		    inc_empty_inactive_list_flag)
+			atomic_inc(&conf->empty_inactive_list_nr);
+		if (sh->group) {
+			sh->group->stripes_cnt--;
+			sh->group = NULL;
+		}
+	}
+	atomic_inc(&sh->count);
+	spin_unlock(&conf->device_lock);
+
+	return sh;
+}
+
 /*
  * Need to check if array has failed when deciding whether to:
  *  - start an array
@@ -716,7 +759,6 @@ raid5_get_active_stripe(struct r5conf *conf, sector_t sector,
 {
 	struct stripe_head *sh;
 	int hash = stripe_hash_locks_hash(conf, sector);
-	int inc_empty_inactive_list_flag;
 
 	pr_debug("get_stripe, sector %llu\n", (unsigned long long)sector);
 
@@ -726,57 +768,34 @@ raid5_get_active_stripe(struct r5conf *conf, sector_t sector,
 		wait_event_lock_irq(conf->wait_for_quiescent,
 				    conf->quiesce == 0 || noquiesce,
 				    *(conf->hash_locks + hash));
-		sh = __find_stripe(conf, sector, conf->generation - previous);
-		if (!sh) {
-			if (!test_bit(R5_INACTIVE_BLOCKED, &conf->cache_state)) {
-				sh = get_free_stripe(conf, hash);
-				if (!sh && !test_bit(R5_DID_ALLOC,
-						     &conf->cache_state))
-					set_bit(R5_ALLOC_MORE,
-						&conf->cache_state);
-			}
-			if (noblock && sh == NULL)
-				break;
+		sh = find_get_stripe(conf, sector, conf->generation - previous,
+				     hash);
+		if (sh)
+			break;
 
-			r5c_check_stripe_cache_usage(conf);
-			if (!sh) {
-				set_bit(R5_INACTIVE_BLOCKED,
-					&conf->cache_state);
-				r5l_wake_reclaim(conf->log, 0);
-				wait_event_lock_irq(
-					conf->wait_for_stripe,
+		if (!test_bit(R5_INACTIVE_BLOCKED, &conf->cache_state)) {
+			sh = get_free_stripe(conf, hash);
+			if (!sh && !test_bit(R5_DID_ALLOC, &conf->cache_state))
+				set_bit(R5_ALLOC_MORE, &conf->cache_state);
+		}
+		if (noblock && !sh)
+			break;
+
+		r5c_check_stripe_cache_usage(conf);
+		if (!sh) {
+			set_bit(R5_INACTIVE_BLOCKED, &conf->cache_state);
+			r5l_wake_reclaim(conf->log, 0);
+			wait_event_lock_irq(conf->wait_for_stripe,
 				!list_empty(conf->inactive_list + hash) &&
 				(atomic_read(&conf->active_stripes)
 				 < (conf->max_nr_stripes * 3 / 4)
				 || !test_bit(R5_INACTIVE_BLOCKED,
					      &conf->cache_state)),
 				*(conf->hash_locks + hash));
-				clear_bit(R5_INACTIVE_BLOCKED,
-					  &conf->cache_state);
-			} else {
-				init_stripe(sh, sector, previous);
-				atomic_inc(&sh->count);
-			}
-		} else if (!atomic_inc_not_zero(&sh->count)) {
-			spin_lock(&conf->device_lock);
-			if (!atomic_read(&sh->count)) {
-				if (!test_bit(STRIPE_HANDLE, &sh->state))
-					atomic_inc(&conf->active_stripes);
-				BUG_ON(list_empty(&sh->lru) &&
-				       !test_bit(STRIPE_EXPANDING, &sh->state));
-				inc_empty_inactive_list_flag = 0;
-				if (!list_empty(conf->inactive_list + hash))
-					inc_empty_inactive_list_flag = 1;
-				list_del_init(&sh->lru);
-				if (list_empty(conf->inactive_list + hash) && inc_empty_inactive_list_flag)
-					atomic_inc(&conf->empty_inactive_list_nr);
-				if (sh->group) {
-					sh->group->stripes_cnt--;
-					sh->group = NULL;
-				}
-			}
+			clear_bit(R5_INACTIVE_BLOCKED, &conf->cache_state);
+		} else {
+			init_stripe(sh, sector, previous);
 			atomic_inc(&sh->count);
-			spin_unlock(&conf->device_lock);
 		}
 	} while (sh == NULL);
 
@@ -830,7 +849,6 @@ static void stripe_add_to_batch_list(struct r5conf *conf, struct stripe_head *sh
 	sector_t head_sector, tmp_sec;
 	int hash;
 	int dd_idx;
-	int inc_empty_inactive_list_flag;
 
 	/* Don't cross chunks, so stripe pd_idx/qd_idx is the same */
 	tmp_sec = sh->sector;
@@ -840,28 +858,7 @@ static void stripe_add_to_batch_list(struct r5conf *conf, struct stripe_head *sh
 
 	hash = stripe_hash_locks_hash(conf, head_sector);
 	spin_lock_irq(conf->hash_locks + hash);
-	head = __find_stripe(conf, head_sector, conf->generation);
-	if (head && !atomic_inc_not_zero(&head->count)) {
-		spin_lock(&conf->device_lock);
-		if (!atomic_read(&head->count)) {
-			if (!test_bit(STRIPE_HANDLE, &head->state))
-				atomic_inc(&conf->active_stripes);
-			BUG_ON(list_empty(&head->lru) &&
-			       !test_bit(STRIPE_EXPANDING, &head->state));
-			inc_empty_inactive_list_flag = 0;
-			if (!list_empty(conf->inactive_list + hash))
-				inc_empty_inactive_list_flag = 1;
-			list_del_init(&head->lru);
-			if (list_empty(conf->inactive_list + hash) && inc_empty_inactive_list_flag)
-				atomic_inc(&conf->empty_inactive_list_nr);
-			if (head->group) {
-				head->group->stripes_cnt--;
-				head->group = NULL;
-			}
-		}
-		atomic_inc(&head->count);
-		spin_unlock(&conf->device_lock);
-	}
+	head = find_get_stripe(conf, head_sector, conf->generation, hash);
 	spin_unlock_irq(conf->hash_locks + hash);
 
 	if (!head)
-- 
2.30.2
From: Logan Gunthorpe <logang@deltatee.com>
To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu <song@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>, Guoqing Jiang <guoqing.jiang@linux.dev>, Stephen Bates <sbates@raithlin.com>, Martin Oliveira <Martin.Oliveira@eideticom.com>, David Sloan <David.Sloan@eideticom.com>, Logan Gunthorpe <logang@deltatee.com>
Date: Wed, 20 Apr 2022 13:54:18 -0600
Message-Id: <20220420195425.34911-6-logang@deltatee.com>
In-Reply-To: <20220420195425.34911-1-logang@deltatee.com>
Subject: [PATCH v2 05/12] md/raid5: Factor out helper from raid5_make_request() loop

Factor out the inner loop of raid5_make_request() into its own helper
called make_stripe_request(). The helper returns a number of statuses:
SUCCESS, RETRY, SCHEDULE_AND_RETRY and FAIL. This makes the code a bit
easier to understand and allows the SCHEDULE_AND_RETRY path to be made
common.

A context structure is added to contain do_flush. It will be used more
in subsequent patches for state that needs to be kept outside the loop.

No functional changes intended. This will be cleaned up in subsequent
patches to untangle the gen_lock and do_prepare logic further.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/md/raid5.c | 225 +++++++++++++++++++++++++--------------------
 1 file changed, 125 insertions(+), 100 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index a0946af5b1ac..5a7334ba0997 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5791,17 +5791,131 @@ static bool ahead_of_reshape(struct mddev *mddev, sector_t sector,
 	return sector >= reshape_sector;
 }
 
+enum stripe_result {
+	STRIPE_SUCCESS = 0,
+	STRIPE_RETRY,
+	STRIPE_SCHEDULE_AND_RETRY,
+	STRIPE_FAIL,
+};
+
+struct stripe_request_ctx {
+	bool do_flush;
+};
+
+static enum stripe_result make_stripe_request(struct mddev *mddev,
+		struct r5conf *conf, struct stripe_request_ctx *ctx,
+		sector_t logical_sector, struct bio *bi, int seq)
+{
+	const int rw = bio_data_dir(bi);
+	struct stripe_head *sh;
+	sector_t new_sector;
+	int previous = 0;
+	int dd_idx;
+
+	if (unlikely(conf->reshape_progress != MaxSector)) {
+		/* spinlock is needed as reshape_progress may be
+		 * 64bit on a 32bit platform, and so it might be
+		 * possible to see a half-updated value
+		 * Of course reshape_progress could change after
+		 * the lock is dropped, so once we get a reference
+		 * to the stripe that we think it is, we will have
+		 * to check again.
+		 */
+		spin_lock_irq(&conf->device_lock);
+		if (ahead_of_reshape(mddev, logical_sector,
+				     conf->reshape_progress)) {
+			previous = 1;
+		} else {
+			if (ahead_of_reshape(mddev, logical_sector,
+					     conf->reshape_safe)) {
+				spin_unlock_irq(&conf->device_lock);
+				return STRIPE_SCHEDULE_AND_RETRY;
+			}
+		}
+		spin_unlock_irq(&conf->device_lock);
+	}
+
+	new_sector = raid5_compute_sector(conf, logical_sector, previous,
+					  &dd_idx, NULL);
+	pr_debug("raid456: %s, sector %llu logical %llu\n", __func__,
+		 new_sector, logical_sector);
+
+	sh = raid5_get_active_stripe(conf, new_sector, previous,
+				     (bi->bi_opf & REQ_RAHEAD), 0);
+	if (unlikely(!sh)) {
+		/* cannot get stripe, just give-up */
+		bi->bi_status = BLK_STS_IOERR;
+		return STRIPE_FAIL;
+	}
+
+	if (unlikely(previous)) {
+		/* expansion might have moved on while waiting for a
+		 * stripe, so we must do the range check again.
+		 * Expansion could still move past after this
+		 * test, but as we are holding a reference to
+		 * 'sh', we know that if that happens,
+		 * STRIPE_EXPANDING will get set and the expansion
+		 * won't proceed until we finish with the stripe.
+		 */
+		int must_retry = 0;
+		spin_lock_irq(&conf->device_lock);
+		if (!ahead_of_reshape(mddev, logical_sector,
+				      conf->reshape_progress))
+			/* mismatch, need to try again */
+			must_retry = 1;
+		spin_unlock_irq(&conf->device_lock);
+		if (must_retry) {
+			raid5_release_stripe(sh);
+			return STRIPE_SCHEDULE_AND_RETRY;
+		}
+	}
+
+	if (read_seqcount_retry(&conf->gen_lock, seq)) {
+		/* Might have got the wrong stripe_head by accident */
+		raid5_release_stripe(sh);
+		return STRIPE_RETRY;
+	}
+
+	if (test_bit(STRIPE_EXPANDING, &sh->state) ||
+	    !add_stripe_bio(sh, bi, dd_idx, rw, previous)) {
+		/*
+		 * Stripe is busy expanding or add failed due to
+		 * overlap. Flush everything and wait a while.
+		 */
+		md_wakeup_thread(mddev->thread);
+		raid5_release_stripe(sh);
+		return STRIPE_SCHEDULE_AND_RETRY;
+	}
+
+	if (stripe_can_batch(sh))
+		stripe_add_to_batch_list(conf, sh);
+
+	if (ctx->do_flush) {
+		set_bit(STRIPE_R5C_PREFLUSH, &sh->state);
+		/* we only need flush for one stripe */
+		ctx->do_flush = false;
+	}
+
+	set_bit(STRIPE_HANDLE, &sh->state);
+	clear_bit(STRIPE_DELAYED, &sh->state);
+	if ((!sh->batch_head || sh == sh->batch_head) &&
+	    (bi->bi_opf & REQ_SYNC) &&
+	    !test_and_set_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
+		atomic_inc(&conf->preread_active_stripes);
+
+	release_stripe_plug(mddev, sh);
+	return STRIPE_SUCCESS;
+}
+
 static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 {
 	struct r5conf *conf = mddev->private;
-	int dd_idx;
-	sector_t new_sector;
 	sector_t logical_sector, last_sector;
-	struct stripe_head *sh;
+	struct stripe_request_ctx ctx = {};
 	const int rw = bio_data_dir(bi);
+	enum stripe_result res;
 	DEFINE_WAIT(w);
 	bool do_prepare;
-	bool do_flush = false;
 
 	if (unlikely(bi->bi_opf & REQ_PREFLUSH)) {
 		int ret = log_handle_flush_request(conf, bi);
@@ -5817,7 +5931,7 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 		 * if r5l_handle_flush_request() didn't clear REQ_PREFLUSH,
 		 * we need to flush journal device
 		 */
-		do_flush = bi->bi_opf & REQ_PREFLUSH;
+		ctx.do_flush = bi->bi_opf & REQ_PREFLUSH;
 	}
 
 	if (!md_write_start(mddev, bi))
@@ -5857,117 +5971,28 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 	md_account_bio(mddev, &bi);
 	prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
 	for (; logical_sector < last_sector; logical_sector += RAID5_STRIPE_SECTORS(conf)) {
-		int previous;
 		int seq;
 
 		do_prepare = false;
 	retry:
 		seq = read_seqcount_begin(&conf->gen_lock);
-		previous = 0;
 		if (do_prepare)
 			prepare_to_wait(&conf->wait_for_overlap, &w,
 				TASK_UNINTERRUPTIBLE);
-		if (unlikely(conf->reshape_progress != MaxSector)) {
-			/* spinlock is needed as reshape_progress may be
-			 * 64bit on a 32bit platform, and so it might be
-			 * possible to see a half-updated value
-			 * Of course reshape_progress could change after
-			 * the lock is dropped, so once we get a reference
-			 * to the stripe that we think it is, we will have
-			 * to check again.
-			 */
-			spin_lock_irq(&conf->device_lock);
-			if (ahead_of_reshape(mddev, logical_sector,
-					     conf->reshape_progress)) {
-				previous = 1;
-			} else {
-				if (ahead_of_reshape(mddev, logical_sector,
-						     conf->reshape_safe)) {
-					spin_unlock_irq(&conf->device_lock);
-					schedule();
-					do_prepare = true;
-					goto retry;
-				}
-			}
-			spin_unlock_irq(&conf->device_lock);
-		}
-
-		new_sector = raid5_compute_sector(conf, logical_sector,
-						  previous,
-						  &dd_idx, NULL);
-		pr_debug("raid456: raid5_make_request, sector %llu logical %llu\n",
-			(unsigned long long)new_sector,
-			(unsigned long long)logical_sector);
 
-		sh = raid5_get_active_stripe(conf, new_sector, previous,
-				       (bi->bi_opf & REQ_RAHEAD), 0);
-		if (unlikely(!sh)) {
-			/* cannot get stripe, just give-up */
-			bi->bi_status = BLK_STS_IOERR;
+		res = make_stripe_request(mddev, conf, &ctx, logical_sector,
+					  bi, seq);
+		if (res == STRIPE_FAIL) {
 			break;
-		}
-
-		if (unlikely(previous)) {
-			/* expansion might have moved on while waiting for a
-			 * stripe, so we must do the range check again.
-			 * Expansion could still move past after this
-			 * test, but as we are holding a reference to
-			 * 'sh', we know that if that happens,
-			 * STRIPE_EXPANDING will get set and the expansion
-			 * won't proceed until we finish with the stripe.
- */ - int must_retry =3D 0; - spin_lock_irq(&conf->device_lock); - if (!ahead_of_reshape(mddev, logical_sector, - conf->reshape_progress)) - /* mismatch, need to try again */ - must_retry =3D 1; - spin_unlock_irq(&conf->device_lock); - if (must_retry) { - raid5_release_stripe(sh); - schedule(); - do_prepare =3D true; - goto retry; - } - } - - if (read_seqcount_retry(&conf->gen_lock, seq)) { - /* Might have got the wrong stripe_head by accident */ - raid5_release_stripe(sh); + } else if (res =3D=3D STRIPE_RETRY) { goto retry; - } - - if (test_bit(STRIPE_EXPANDING, &sh->state) || - !add_stripe_bio(sh, bi, dd_idx, rw, previous)) { - /* - * Stripe is busy expanding or add failed due to - * overlap. Flush everything and wait a while. - */ - md_wakeup_thread(mddev->thread); - raid5_release_stripe(sh); + } else if (res =3D=3D STRIPE_SCHEDULE_AND_RETRY) { schedule(); do_prepare =3D true; goto retry; } - - if (stripe_can_batch(sh)) - stripe_add_to_batch_list(conf, sh); - - if (do_flush) { - set_bit(STRIPE_R5C_PREFLUSH, &sh->state); - /* we only need flush for one stripe */ - do_flush =3D false; - } - - set_bit(STRIPE_HANDLE, &sh->state); - clear_bit(STRIPE_DELAYED, &sh->state); - if ((!sh->batch_head || sh =3D=3D sh->batch_head) && - (bi->bi_opf & REQ_SYNC) && - !test_and_set_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) - atomic_inc(&conf->preread_active_stripes); - - release_stripe_plug(mddev, sh); } + finish_wait(&conf->wait_for_overlap, &w); =20 if (rw =3D=3D WRITE) --=20 2.30.2 From nobody Mon May 11 00:07:17 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2879BC433F5 for ; Wed, 20 Apr 2022 19:55:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1381957AbiDTT6k (ORCPT ); Wed, 20 Apr 2022 15:58:40 -0400 Received: from lindbergh.monkeyblade.net 
From: Logan Gunthorpe
To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu
Cc: Christoph Hellwig, Guoqing Jiang, Stephen Bates, Martin Oliveira, David Sloan, Logan Gunthorpe
Date: Wed, 20 Apr 2022 13:54:19 -0600
Message-Id: <20220420195425.34911-7-logang@deltatee.com>
Subject: [PATCH v2 06/12] md/raid5: Drop the do_prepare flag in raid5_make_request()

prepare_to_wait() can reasonably be called after schedule() instead of setting a flag and preparing in the next loop iteration.

This means that prepare_to_wait() will be called before read_seqcount_begin(), but there shouldn't be any reason that the order matters here. On the first iteration of the loop prepare_to_wait() is already called first.

Signed-off-by: Logan Gunthorpe
Reviewed-by: Christoph Hellwig
Reviewed-by: Guoqing Jiang
---
 drivers/md/raid5.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 5a7334ba0997..b9f618356446 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5915,7 +5915,6 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 	const int rw = bio_data_dir(bi);
 	enum stripe_result res;
 	DEFINE_WAIT(w);
-	bool do_prepare;
 
 	if (unlikely(bi->bi_opf & REQ_PREFLUSH)) {
 		int ret = log_handle_flush_request(conf, bi);
@@ -5973,12 +5972,8 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 	for (; logical_sector < last_sector; logical_sector += RAID5_STRIPE_SECTORS(conf)) {
 		int seq;
 
-		do_prepare = false;
 	retry:
 		seq = read_seqcount_begin(&conf->gen_lock);
-		if (do_prepare)
-			prepare_to_wait(&conf->wait_for_overlap, &w,
-					TASK_UNINTERRUPTIBLE);
 
 		res = make_stripe_request(mddev, conf, &ctx, logical_sector,
 					  bi, seq);
@@ -5988,7 +5983,8 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 			goto retry;
 		} else if (res == STRIPE_SCHEDULE_AND_RETRY) {
 			schedule();
-			do_prepare = true;
+			prepare_to_wait(&conf->wait_for_overlap, &w,
+					TASK_UNINTERRUPTIBLE);
 			goto retry;
 		}
 	}
-- 
2.30.2

From nobody Mon May 11 00:07:17 2026
From: Logan Gunthorpe
To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu
Cc: Christoph Hellwig, Guoqing Jiang, Stephen Bates, Martin Oliveira, David Sloan, Logan Gunthorpe
Date: Wed, 20 Apr 2022 13:54:20 -0600
Message-Id: <20220420195425.34911-8-logang@deltatee.com>
Subject: [PATCH v2 07/12] md/raid5: Move read_seqcount_begin() into make_stripe_request()

Now that prepare_to_wait() isn't in the way, move read_seqcount_begin() into make_stripe_request().

No functional changes intended.
Signed-off-by: Logan Gunthorpe
Reviewed-by: Christoph Hellwig
Reviewed-by: Guoqing Jiang
---
 drivers/md/raid5.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index b9f618356446..1bce9075e165 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5804,13 +5804,15 @@ struct stripe_request_ctx {
 
 static enum stripe_result make_stripe_request(struct mddev *mddev,
 		struct r5conf *conf, struct stripe_request_ctx *ctx,
-		sector_t logical_sector, struct bio *bi, int seq)
+		sector_t logical_sector, struct bio *bi)
 {
 	const int rw = bio_data_dir(bi);
 	struct stripe_head *sh;
 	sector_t new_sector;
 	int previous = 0;
-	int dd_idx;
+	int seq, dd_idx;
+
+	seq = read_seqcount_begin(&conf->gen_lock);
 
 	if (unlikely(conf->reshape_progress != MaxSector)) {
 		/* spinlock is needed as reshape_progress may be
@@ -5970,13 +5972,9 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 	md_account_bio(mddev, &bi);
 	prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
 	for (; logical_sector < last_sector; logical_sector += RAID5_STRIPE_SECTORS(conf)) {
-		int seq;
-
 	retry:
-		seq = read_seqcount_begin(&conf->gen_lock);
-
 		res = make_stripe_request(mddev, conf, &ctx, logical_sector,
-					  bi, seq);
+					  bi);
 		if (res == STRIPE_FAIL) {
 			break;
 		} else if (res == STRIPE_RETRY) {
-- 
2.30.2

From nobody Mon May 11 00:07:17 2026
From: Logan Gunthorpe
To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu
Cc: Christoph Hellwig, Guoqing Jiang, Stephen Bates, Martin Oliveira, David Sloan, Logan Gunthorpe
Date: Wed, 20 Apr 2022 13:54:21 -0600
Message-Id: <20220420195425.34911-9-logang@deltatee.com>
Subject: [PATCH v2 08/12] md/raid5: Refactor for loop in raid5_make_request() into while loop

The for loop with its retry label can be more cleanly expressed as a while loop by moving the logical_sector increment into the success path.

No functional changes intended.

Signed-off-by: Logan Gunthorpe
Reviewed-by: Christoph Hellwig
---
 drivers/md/raid5.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 1bce9075e165..0c250cc3bfff 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5971,20 +5971,21 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 	}
 	md_account_bio(mddev, &bi);
 	prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
-	for (; logical_sector < last_sector; logical_sector += RAID5_STRIPE_SECTORS(conf)) {
-	retry:
+	while (logical_sector < last_sector) {
 		res = make_stripe_request(mddev, conf, &ctx, logical_sector,
 					  bi);
 		if (res == STRIPE_FAIL) {
 			break;
 		} else if (res == STRIPE_RETRY) {
-			goto retry;
+			continue;
 		} else if (res == STRIPE_SCHEDULE_AND_RETRY) {
 			schedule();
 			prepare_to_wait(&conf->wait_for_overlap, &w,
 					TASK_UNINTERRUPTIBLE);
-			goto retry;
+			continue;
 		}
+
+		logical_sector += RAID5_STRIPE_SECTORS(conf);
 	}
 
 	finish_wait(&conf->wait_for_overlap, &w);
-- 
2.30.2

From nobody Mon May 11 00:07:17 2026
From: Logan Gunthorpe
To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu
Cc: Christoph Hellwig, Guoqing Jiang, Stephen Bates, Martin Oliveira, David Sloan, Logan Gunthorpe
Date: Wed, 20 Apr 2022 13:54:22 -0600
Message-Id: <20220420195425.34911-10-logang@deltatee.com>
Subject: [PATCH v2 09/12] md/raid5: Keep a reference to last stripe_head for batch

When batching, every stripe head has to find the previous stripe head to add to the batch list. This involves taking the hash lock, which is highly contended during IO.

Instead of finding the previous stripe_head each time, store a reference to the previous stripe_head in a pointer so that it doesn't require taking the contended lock another time.

The reference to the previous stripe must be released before scheduling and waiting for work to get done. Otherwise, it can hold up raid5_activate_delayed() and deadlock.
Signed-off-by: Logan Gunthorpe
Reviewed-by: Christoph Hellwig
---
 drivers/md/raid5.c | 51 +++++++++++++++++++++++++++++++++++-----------
 1 file changed, 39 insertions(+), 12 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 0c250cc3bfff..28ea7b9b6ab6 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -843,7 +843,8 @@ static bool stripe_can_batch(struct stripe_head *sh)
 }
 
 /* we only do back search */
-static void stripe_add_to_batch_list(struct r5conf *conf, struct stripe_head *sh)
+static void stripe_add_to_batch_list(struct r5conf *conf,
+		struct stripe_head *sh, struct stripe_head *last_sh)
 {
 	struct stripe_head *head;
 	sector_t head_sector, tmp_sec;
@@ -856,15 +857,20 @@ static void stripe_add_to_batch_list(struct r5conf *conf, struct stripe_head *sh
 		return;
 	head_sector = sh->sector - RAID5_STRIPE_SECTORS(conf);
 
-	hash = stripe_hash_locks_hash(conf, head_sector);
-	spin_lock_irq(conf->hash_locks + hash);
-	head = find_get_stripe(conf, head_sector, conf->generation, hash);
-	spin_unlock_irq(conf->hash_locks + hash);
-
-	if (!head)
-		return;
-	if (!stripe_can_batch(head))
-		goto out;
+	if (last_sh && head_sector == last_sh->sector) {
+		head = last_sh;
+		atomic_inc(&head->count);
+	} else {
+		hash = stripe_hash_locks_hash(conf, head_sector);
+		spin_lock_irq(conf->hash_locks + hash);
+		head = find_get_stripe(conf, head_sector, conf->generation,
+				       hash);
+		spin_unlock_irq(conf->hash_locks + hash);
+		if (!head)
+			return;
+		if (!stripe_can_batch(head))
+			goto out;
+	}
 
 	lock_two_stripes(head, sh);
 	/* clear_batch_ready clear the flag */
@@ -5800,6 +5806,7 @@ enum stripe_result {
 
 struct stripe_request_ctx {
 	bool do_flush;
+	struct stripe_head *batch_last;
 };
 
 static enum stripe_result make_stripe_request(struct mddev *mddev,
@@ -5889,8 +5896,13 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
 		return STRIPE_SCHEDULE_AND_RETRY;
 	}
 
-	if (stripe_can_batch(sh))
-		stripe_add_to_batch_list(conf, sh);
+	if (stripe_can_batch(sh)) {
+		stripe_add_to_batch_list(conf, sh, ctx->batch_last);
+		if (ctx->batch_last)
+			raid5_release_stripe(ctx->batch_last);
+		atomic_inc(&sh->count);
+		ctx->batch_last = sh;
+	}
 
 	if (ctx->do_flush) {
 		set_bit(STRIPE_R5C_PREFLUSH, &sh->state);
@@ -5979,6 +5991,18 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 		} else if (res == STRIPE_RETRY) {
 			continue;
 		} else if (res == STRIPE_SCHEDULE_AND_RETRY) {
+			/*
+			 * Must release the reference to batch_last before
+			 * scheduling and waiting for work to be done,
+			 * otherwise the batch_last stripe head could prevent
+			 * raid5_activate_delayed() from making progress
+			 * and thus deadlocking.
+			 */
+			if (ctx.batch_last) {
+				raid5_release_stripe(ctx.batch_last);
+				ctx.batch_last = NULL;
+			}
+
 			schedule();
 			prepare_to_wait(&conf->wait_for_overlap, &w,
 					TASK_UNINTERRUPTIBLE);
@@ -5990,6 +6014,9 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 
 	finish_wait(&conf->wait_for_overlap, &w);
 
+	if (ctx.batch_last)
+		raid5_release_stripe(ctx.batch_last);
+
 	if (rw == WRITE)
 		md_write_end(mddev);
 	bio_endio(bi);
-- 
2.30.2

From nobody Mon May 11 00:07:17 2026
From: Logan Gunthorpe
To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu
Cc: Christoph Hellwig, Guoqing Jiang, Stephen Bates, Martin Oliveira, David Sloan, Logan Gunthorpe
Date: Wed, 20 Apr 2022 13:54:23 -0600
Message-Id: <20220420195425.34911-11-logang@deltatee.com>
Subject: [PATCH v2 10/12] md/raid5: Refactor add_stripe_bio()
Factor out two helper functions from add_stripe_bio(): one to check for overlap (stripe_bio_overlaps()), and one to actually add the bio to the stripe (__add_stripe_bio()). The latter function will always succeed.

This will be useful in the next patch so that overlap can be checked for multiple disks before adding any bios.

Signed-off-by: Logan Gunthorpe
Reviewed-by: Christoph Hellwig
---
 drivers/md/raid5.c | 86 ++++++++++++++++++++++++++++++----------------
 1 file changed, 56 insertions(+), 30 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 28ea7b9b6ab6..1fa82d8fa89e 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3418,39 +3418,32 @@ schedule_reconstruction(struct stripe_head *sh, struct stripe_head_state *s,
 			 s->locked, s->ops_request);
 }
 
-/*
- * Each stripe/dev can have one or more bion attached.
- * toread/towrite point to the first in a chain.
- * The bi_next chain must be in order.
- */
-static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx,
-			  int forwrite, int previous)
+static bool stripe_bio_overlaps(struct stripe_head *sh, struct bio *bi,
+				int dd_idx, int forwrite)
 {
-	struct bio **bip;
 	struct r5conf *conf = sh->raid_conf;
-	int firstwrite = 0;
+	struct bio **bip;
 
-	pr_debug("adding bi b#%llu to stripe s#%llu\n",
-		 (unsigned long long)bi->bi_iter.bi_sector,
-		 (unsigned long long)sh->sector);
+	pr_debug("checking bi b#%llu to stripe s#%llu\n",
+		 bi->bi_iter.bi_sector, sh->sector);
 
-	spin_lock_irq(&sh->stripe_lock);
 	/* Don't allow new IO added to stripes in batch list */
 	if (sh->batch_head)
-		goto overlap;
-	if (forwrite) {
+		return true;
+
+	if (forwrite)
 		bip = &sh->dev[dd_idx].towrite;
-		if (*bip == NULL)
-			firstwrite = 1;
-	} else
+	else
 		bip = &sh->dev[dd_idx].toread;
+
 	while (*bip && (*bip)->bi_iter.bi_sector < bi->bi_iter.bi_sector) {
 		if (bio_end_sector(*bip) > bi->bi_iter.bi_sector)
-			goto overlap;
-		bip = & (*bip)->bi_next;
+			return true;
+		bip = &(*bip)->bi_next;
 	}
+
 	if (*bip && (*bip)->bi_iter.bi_sector < bio_end_sector(bi))
-		goto overlap;
+		return true;
 
 	if (forwrite && raid5_has_ppl(conf)) {
 		/*
@@ -3479,9 +3472,30 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx,
 		}
 
 		if (first + conf->chunk_sectors * (count - 1) != last)
-			goto overlap;
+			return true;
 	}
 
+	return false;
+}
+
+static void __add_stripe_bio(struct stripe_head *sh, struct bio *bi,
+			     int dd_idx, int forwrite, int previous)
+{
+	struct r5conf *conf = sh->raid_conf;
+	struct bio **bip;
+	int firstwrite = 0;
+
+	if (forwrite) {
+		bip = &sh->dev[dd_idx].towrite;
+		if (!*bip)
+			firstwrite = 1;
+	} else {
+		bip = &sh->dev[dd_idx].toread;
+	}
+
+	while (*bip && (*bip)->bi_iter.bi_sector < bi->bi_iter.bi_sector)
+		bip = &(*bip)->bi_next;
+
 	if (!forwrite || previous)
 		clear_bit(STRIPE_BATCH_READY, &sh->state);
 
@@ -3508,8 +3522,7 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx,
 	}
 
 	pr_debug("added bi b#%llu to stripe s#%llu, disk %d.\n",
-		 (unsigned long long)(*bip)->bi_iter.bi_sector,
-		 (unsigned long long)sh->sector, dd_idx);
+		 (*bip)->bi_iter.bi_sector, sh->sector, dd_idx);
 
 	if (conf->mddev->bitmap && firstwrite) {
 		/* Cannot hold spinlock over bitmap_startwrite,
@@ -3517,7 +3530,7 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx,
 		 * we have added to the bitmap and set bm_seq.
 		 * So set STRIPE_BITMAP_PENDING to prevent
 		 * batching.
-		 * If multiple add_stripe_bio() calls race here they
+		 * If multiple __add_stripe_bio() calls race here they
 		 * much all set STRIPE_BITMAP_PENDING. So only the first one
 		 * to complete "bitmap_startwrite" gets to set
 		 * STRIPE_BIT_DELAY. This is important as once a stripe
@@ -3535,14 +3548,27 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx,
 			set_bit(STRIPE_BIT_DELAY, &sh->state);
 		}
 	}
-	spin_unlock_irq(&sh->stripe_lock);
+}
 
-	return 1;
+/*
+ * Each stripe/dev can have one or more bios attached.
+ * toread/towrite point to the first in a chain.
+ * The bi_next chain must be in order.
+ */
+static bool add_stripe_bio(struct stripe_head *sh, struct bio *bi,
+			   int dd_idx, int forwrite, int previous)
+{
+	spin_lock_irq(&sh->stripe_lock);
 
-overlap:
-	set_bit(R5_Overlap, &sh->dev[dd_idx].flags);
+	if (stripe_bio_overlaps(sh, bi, dd_idx, forwrite)) {
+		set_bit(R5_Overlap, &sh->dev[dd_idx].flags);
+		spin_unlock_irq(&sh->stripe_lock);
+		return false;
+	}
+
+	__add_stripe_bio(sh, bi, dd_idx, forwrite, previous);
 	spin_unlock_irq(&sh->stripe_lock);
-	return 0;
+	return true;
 }
 
 static void end_reshape(struct r5conf *conf);
-- 
2.30.2

From nobody Mon May 11 00:07:17 2026
From: Logan Gunthorpe
To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu
Cc: Christoph Hellwig, Guoqing Jiang, Stephen Bates, Martin Oliveira,
 David Sloan, Logan Gunthorpe
Date: Wed, 20 Apr 2022 13:54:24 -0600
Message-Id: <20220420195425.34911-12-logang@deltatee.com>
In-Reply-To: <20220420195425.34911-1-logang@deltatee.com>
Subject: [PATCH v2 11/12] md/raid5: Check all disks in a stripe_head for
 reshape progress

When testing whether a previous stripe has had reshape expand past it, use
the earliest or latest logical sector of all the disks in that stripe head.
This will allow adding multiple disks at a time in a subsequent patch.

To make this cleaner, refactor the check into a helper function called
stripe_ahead_of_reshape().
Signed-off-by: Logan Gunthorpe
Reviewed-by: Christoph Hellwig
---
 drivers/md/raid5.c | 55 ++++++++++++++++++++++++++++++++++------------
 1 file changed, 41 insertions(+), 14 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 1fa82d8fa89e..40a25c4b80bd 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5823,6 +5823,42 @@ static bool ahead_of_reshape(struct mddev *mddev, sector_t sector,
 	return sector >= reshape_sector;
 }
 
+static bool range_ahead_of_reshape(struct mddev *mddev, sector_t min,
+		sector_t max, sector_t reshape_sector)
+{
+	if (mddev->reshape_backwards)
+		return max < reshape_sector;
+	else
+		return min >= reshape_sector;
+}
+
+static bool stripe_ahead_of_reshape(struct mddev *mddev, struct r5conf *conf,
+		struct stripe_head *sh)
+{
+	sector_t max_sector = 0, min_sector = MaxSector;
+	bool ret = false;
+	int dd_idx;
+
+	for (dd_idx = 0; dd_idx < sh->disks; dd_idx++) {
+		if (dd_idx == sh->pd_idx)
+			continue;
+
+		min_sector = min(min_sector, sh->dev[dd_idx].sector);
+		max_sector = max(max_sector, sh->dev[dd_idx].sector);
+	}
+
+	spin_lock_irq(&conf->device_lock);
+
+	if (!range_ahead_of_reshape(mddev, min_sector, max_sector,
+				    conf->reshape_progress))
+		/* mismatch, need to try again */
+		ret = true;
+
+	spin_unlock_irq(&conf->device_lock);
+
+	return ret;
+}
+
 enum stripe_result {
 	STRIPE_SUCCESS = 0,
 	STRIPE_RETRY,
@@ -5883,26 +5919,17 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
 		return STRIPE_FAIL;
 	}
 
-	if (unlikely(previous)) {
-		/* expansion might have moved on while waiting for a
-		 * stripe, so we must do the range check again.
+	if (unlikely(previous) &&
+	    stripe_ahead_of_reshape(mddev, conf, sh)) {
+		/* Expansion moved on while waiting for a stripe.
 		 * Expansion could still move past after this
 		 * test, but as we are holding a reference to
 		 * 'sh', we know that if that happens,
 		 * STRIPE_EXPANDING will get set and the expansion
 		 * won't proceed until we finish with the stripe.
 		 */
-		int must_retry = 0;
-		spin_lock_irq(&conf->device_lock);
-		if (!ahead_of_reshape(mddev, logical_sector,
-				      conf->reshape_progress))
-			/* mismatch, need to try again */
-			must_retry = 1;
-		spin_unlock_irq(&conf->device_lock);
-		if (must_retry) {
-			raid5_release_stripe(sh);
-			return STRIPE_SCHEDULE_AND_RETRY;
-		}
+		raid5_release_stripe(sh);
+		return STRIPE_SCHEDULE_AND_RETRY;
 	}
 
 	if (read_seqcount_retry(&conf->gen_lock, seq)) {
-- 
2.30.2
From: Logan Gunthorpe
To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu
Cc: Christoph Hellwig, Guoqing Jiang, Stephen Bates, Martin Oliveira,
 David Sloan, Logan Gunthorpe
Date: Wed, 20 Apr 2022 13:54:25 -0600
Message-Id: <20220420195425.34911-13-logang@deltatee.com>
In-Reply-To: <20220420195425.34911-1-logang@deltatee.com>
Subject: [PATCH v2 12/12] md/raid5: Pivot raid5_make_request()

raid5_make_request() loops through every page in the request, finds the
appropriate stripe and adds the bio for that page in the disk. This causes
a great deal of contention on the hash_lock, since the lock for that hash
must be taken for every single page.
The number of times the lock is taken can be reduced by pivoting
raid5_make_request() so that it loops through every stripe and then loops
through every disk in that stripe to see if the bio must be added. This
reduces the number of times the lock must be taken by a factor equal to
the number of data disks.

To accomplish this, store the minimum and maximum disk sector that has
already been finished, and continue to the next logical sector if the
disk sector has already been done. Then add an add_all_stripe_bios()
helper to check all the bios for overlap and add them all if none of
them overlap.

Signed-off-by: Logan Gunthorpe
---
 drivers/md/raid5.c | 92 +++++++++++++++++++++++++++++++++++++++++++---
 drivers/md/raid5.h |  1 +
 2 files changed, 88 insertions(+), 5 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 40a25c4b80bd..f86866cb15be 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3571,6 +3571,48 @@ static bool add_stripe_bio(struct stripe_head *sh, struct bio *bi,
 	return true;
 }
 
+static int add_all_stripe_bios(struct stripe_head *sh, struct bio *bi,
+		sector_t first_logical_sector, sector_t last_sector,
+		int forwrite, int previous)
+{
+	int dd_idx;
+	int ret = 1;
+
+	spin_lock_irq(&sh->stripe_lock);
+
+	for (dd_idx = 0; dd_idx < sh->disks; dd_idx++) {
+		struct r5dev *dev = &sh->dev[dd_idx];
+
+		clear_bit(R5_BioReady, &dev->flags);
+
+		if (dd_idx == sh->pd_idx)
+			continue;
+
+		if (dev->sector < first_logical_sector ||
+		    dev->sector >= last_sector)
+			continue;
+
+		if (stripe_bio_overlaps(sh, bi, dd_idx, forwrite)) {
+			set_bit(R5_Overlap, &dev->flags);
+			ret = 0;
+			continue;
+		}
+
+		set_bit(R5_BioReady, &dev->flags);
+	}
+
+	if (!ret)
+		goto out;
+
+	for (dd_idx = 0; dd_idx < sh->disks; dd_idx++)
+		if (test_bit(R5_BioReady, &sh->dev[dd_idx].flags))
+			__add_stripe_bio(sh, bi, dd_idx, forwrite, previous);
+
+out:
+	spin_unlock_irq(&sh->stripe_lock);
+	return ret;
+}
+
 static void end_reshape(struct r5conf *conf);
 
 static void stripe_set_idx(sector_t stripe, struct r5conf *conf, int previous,
@@ -5869,6 +5911,10 @@ enum stripe_result {
 struct stripe_request_ctx {
 	bool do_flush;
 	struct stripe_head *batch_last;
+	sector_t disk_sector_done;
+	sector_t start_disk_sector;
+	bool first_wrap;
+	sector_t last_sector;
 };
 
 static enum stripe_result make_stripe_request(struct mddev *mddev,
@@ -5908,6 +5954,36 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
 
 	new_sector = raid5_compute_sector(conf, logical_sector, previous,
 					  &dd_idx, NULL);
+
+	/*
+	 * This is a tricky algorithm to figure out which stripe_heads have
+	 * already been visited and exit early if the stripe_head has
+	 * already been done (since all disks are added to a stripe_head
+	 * once in add_all_stripe_bios()).
+	 *
+	 * To start with, the disk sector of the last stripe that has been
+	 * completed is stored in ctx->disk_sector_done. If the new_sector is
+	 * less than this value, the stripe_head has already been done.
+	 *
+	 * There's one issue with this: if the request starts in the middle of
+	 * a chunk, all the stripe heads before the starting offset will be
+	 * missed. To account for this, set the first_wrap boolean to true
+	 * if new_sector is less than the starting sector. Clear the
+	 * boolean once the start sector is hit for the second time.
+	 * When first_wrap is set, ignore disk_sector_done.
+	 */
+	if (ctx->start_disk_sector == MaxSector) {
+		ctx->start_disk_sector = new_sector;
+	} else if (new_sector < ctx->start_disk_sector) {
+		ctx->first_wrap = true;
+	} else if (new_sector == ctx->start_disk_sector) {
+		ctx->first_wrap = false;
+		ctx->start_disk_sector = 0;
+		return STRIPE_SUCCESS;
+	} else if (!ctx->first_wrap && new_sector <= ctx->disk_sector_done) {
+		return STRIPE_SUCCESS;
+	}
+
 	pr_debug("raid456: %s, sector %llu logical %llu\n", __func__,
 		 new_sector, logical_sector);
 
@@ -5939,7 +6015,8 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
 	}
 
 	if (test_bit(STRIPE_EXPANDING, &sh->state) ||
-	    !add_stripe_bio(sh, bi, dd_idx, rw, previous)) {
+	    !add_all_stripe_bios(sh, bi, logical_sector, ctx->last_sector, rw,
+				 previous)) {
 		/*
 		 * Stripe is busy expanding or add failed due to
 		 * overlap. Flush everything and wait a while.
@@ -5949,6 +6026,9 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
 		return STRIPE_SCHEDULE_AND_RETRY;
 	}
 
+	if (new_sector > ctx->disk_sector_done)
+		ctx->disk_sector_done = new_sector;
+
 	if (stripe_can_batch(sh)) {
 		stripe_add_to_batch_list(conf, sh, ctx->batch_last);
 		if (ctx->batch_last)
@@ -5977,8 +6057,10 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
 static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 {
 	struct r5conf *conf = mddev->private;
-	sector_t logical_sector, last_sector;
-	struct stripe_request_ctx ctx = {};
+	sector_t logical_sector;
+	struct stripe_request_ctx ctx = {
+		.start_disk_sector = MaxSector,
+	};
 	const int rw = bio_data_dir(bi);
 	enum stripe_result res;
 	DEFINE_WAIT(w);
@@ -6021,7 +6103,7 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 	}
 
 	logical_sector = bi->bi_iter.bi_sector & ~((sector_t)RAID5_STRIPE_SECTORS(conf)-1);
-	last_sector = bio_end_sector(bi);
+	ctx.last_sector = bio_end_sector(bi);
 	bi->bi_next = NULL;
 
 	/* Bail out if conflicts with reshape and REQ_NOWAIT is set */
@@ -6036,7 +6118,7 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 	}
 	md_account_bio(mddev, &bi);
 	prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
-	while (logical_sector < last_sector) {
+	while (logical_sector < ctx.last_sector) {
 		res = make_stripe_request(mddev, conf, &ctx, logical_sector,
 					  bi);
 		if (res == STRIPE_FAIL) {
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index 638d29863503..e73b58844f83 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -308,6 +308,7 @@ enum r5dev_flags {
 	R5_Wantwrite,
 	R5_Overlap,	/* There is a pending overlapping request
 			 * on this block */
+	R5_BioReady,	/* The current bio can be added to this disk */
 	R5_ReadNoMerge,	/* prevent bio from merging in block-layer */
 	R5_ReadError,	/* seen a read error here recently */
 	R5_ReWrite,	/* have tried to over-write the readerror */
-- 
2.30.2