From nobody Thu May 14 07:12:52 2026
From: Logan Gunthorpe <logang@deltatee.com>
To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu <song@kernel.org>
Cc: Shaohua Li <shli@kernel.org>, Guoqing Jiang <guoqing.jiang@linux.dev>, Stephen Bates <sbates@raithlin.com>, Martin Oliveira <Martin.Oliveira@eideticom.com>, David Sloan <David.Sloan@eideticom.com>, Logan Gunthorpe <logang@deltatee.com>
Date: Thu, 7 Apr 2022 10:45:04 -0600
Message-Id: <20220407164511.8472-2-logang@deltatee.com>
In-Reply-To: <20220407164511.8472-1-logang@deltatee.com>
References: <20220407164511.8472-1-logang@deltatee.com>
Subject: [PATCH v1 1/8] md/raid5: Refactor raid5_make_request loop

Break immediately if raid5_get_active_stripe() returns NULL and
de-indent the rest of the loop. Annotate this check with unlikely().

This makes the code easier to read and reduces the indentation level.

No functional changes intended.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig
---
 drivers/md/raid5.c | 111 +++++++++++++++++++++++----------------------
 1 file changed, 56 insertions(+), 55 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 351d341a1ffa..b794253efd15 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5868,69 +5868,70 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 
 		sh = raid5_get_active_stripe(conf, new_sector, previous,
 				       (bi->bi_opf & REQ_RAHEAD), 0);
-		if (sh) {
-			if (unlikely(previous)) {
-				/* expansion might have moved on while waiting for a
-				 * stripe, so we must do the range check again.
-				 * Expansion could still move past after this
-				 * test, but as we are holding a reference to
-				 * 'sh', we know that if that happens,
-				 * STRIPE_EXPANDING will get set and the expansion
-				 * won't proceed until we finish with the stripe.
-				 */
-				int must_retry = 0;
-				spin_lock_irq(&conf->device_lock);
-				if (mddev->reshape_backwards
-				    ? logical_sector >= conf->reshape_progress
-				    : logical_sector < conf->reshape_progress)
-					/* mismatch, need to try again */
-					must_retry = 1;
-				spin_unlock_irq(&conf->device_lock);
-				if (must_retry) {
-					raid5_release_stripe(sh);
-					schedule();
-					do_prepare = true;
-					goto retry;
-				}
-			}
-			if (read_seqcount_retry(&conf->gen_lock, seq)) {
-				/* Might have got the wrong stripe_head
-				 * by accident
-				 */
-				raid5_release_stripe(sh);
-				goto retry;
-			}
+		if (unlikely(!sh)) {
+			/* cannot get stripe, just give-up */
+			bi->bi_status = BLK_STS_IOERR;
+			break;
+		}
 
-			if (test_bit(STRIPE_EXPANDING, &sh->state) ||
-			    !add_stripe_bio(sh, bi, dd_idx, rw, previous)) {
-				/* Stripe is busy expanding or
-				 * add failed due to overlap. Flush everything
-				 * and wait a while
-				 */
-				md_wakeup_thread(mddev->thread);
+		if (unlikely(previous)) {
+			/*
+			 * Expansion might have moved on while waiting for a
+			 * stripe, so we must do the range check again.
+			 * Expansion could still move past after this
+			 * test, but as we are holding a reference to
+			 * 'sh', we know that if that happens,
+			 * STRIPE_EXPANDING will get set and the expansion
+			 * won't proceed until we finish with the stripe.
+			 */
+			int must_retry = 0;
+			spin_lock_irq(&conf->device_lock);
+			if (mddev->reshape_backwards
+			    ? logical_sector >= conf->reshape_progress
+			    : logical_sector < conf->reshape_progress)
+				/* mismatch, need to try again */
+				must_retry = 1;
+			spin_unlock_irq(&conf->device_lock);
+			if (must_retry) {
 				raid5_release_stripe(sh);
 				schedule();
 				do_prepare = true;
 				goto retry;
 			}
-			if (do_flush) {
-				set_bit(STRIPE_R5C_PREFLUSH, &sh->state);
-				/* we only need flush for one stripe */
-				do_flush = false;
-			}
+		}
+		if (read_seqcount_retry(&conf->gen_lock, seq)) {
+			/* Might have got the wrong stripe_head by accident */
+			raid5_release_stripe(sh);
+			goto retry;
+		}
 
-			set_bit(STRIPE_HANDLE, &sh->state);
-			clear_bit(STRIPE_DELAYED, &sh->state);
-			if ((!sh->batch_head || sh == sh->batch_head) &&
-			    (bi->bi_opf & REQ_SYNC) &&
-			    !test_and_set_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
-				atomic_inc(&conf->preread_active_stripes);
-			release_stripe_plug(mddev, sh);
-		} else {
-			/* cannot get stripe for read-ahead, just give-up */
-			bi->bi_status = BLK_STS_IOERR;
-			break;
+		if (test_bit(STRIPE_EXPANDING, &sh->state) ||
+		    !add_stripe_bio(sh, bi, dd_idx, rw, previous)) {
+			/*
+			 * Stripe is busy expanding or add failed due to
+			 * overlap. Flush everything and wait a while.
+			 */
+			md_wakeup_thread(mddev->thread);
+			raid5_release_stripe(sh);
+			schedule();
+			do_prepare = true;
+			goto retry;
 		}
+
+		if (do_flush) {
+			set_bit(STRIPE_R5C_PREFLUSH, &sh->state);
+			/* we only need flush for one stripe */
+			do_flush = false;
+		}
+
+		set_bit(STRIPE_HANDLE, &sh->state);
+		clear_bit(STRIPE_DELAYED, &sh->state);
+		if ((!sh->batch_head || sh == sh->batch_head) &&
+		    (bi->bi_opf & REQ_SYNC) &&
+		    !test_and_set_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
+			atomic_inc(&conf->preread_active_stripes);
+
+		release_stripe_plug(mddev, sh);
 	}
 	finish_wait(&conf->wait_for_overlap, &w);
 
-- 
2.30.2

From nobody Thu May 14 07:12:52 2026
From: Logan Gunthorpe <logang@deltatee.com>
To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu <song@kernel.org>
Cc: Shaohua Li <shli@kernel.org>, Guoqing Jiang <guoqing.jiang@linux.dev>, Stephen Bates <sbates@raithlin.com>, Martin Oliveira <Martin.Oliveira@eideticom.com>, David Sloan <David.Sloan@eideticom.com>, Logan Gunthorpe <logang@deltatee.com>
Date: Thu, 7 Apr 2022 10:45:05 -0600
Message-Id: <20220407164511.8472-3-logang@deltatee.com>
In-Reply-To: <20220407164511.8472-1-logang@deltatee.com>
References: <20220407164511.8472-1-logang@deltatee.com>
Subject: [PATCH v1 2/8] md/raid5: Move stripe_add_to_batch_list() call out of add_stripe_bio()

stripe_add_to_batch_list() is better done in the loop in make_request
instead of inside add_stripe_bio(). This is clearer and allows for
storing the batch_head state outside the loop in a subsequent patch.

The call to add_stripe_bio() in retry_aligned_read() is for a read
only, and thus would not have added to a batch list anyway.

No functional changes intended.
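The shape of this refactoring can be sketched in plain userspace C, independent of the kernel code: a helper that both validates and performs a side effect is narrowed so the caller drives the side effect, which later lets the caller cache state across loop iterations. All names here (`add_item`, `batch_register`, `submit_all`) are illustrative, not from raid5.c.

```c
#include <assert.h>
#include <stdbool.h>

static int items[8];
static int n_items;
static int n_batched;

/* Narrow contract after the refactor: only add, or fail. */
static bool add_item(int v)
{
	if (n_items >= 8)
		return false;
	items[n_items++] = v;
	return true;
}

/* Side effect hoisted out of add_item() into the caller. */
static void batch_register(int v)
{
	(void)v;
	n_batched++;
}

/* The caller's loop now decides when to batch, mirroring how
 * raid5_make_request() calls stripe_add_to_batch_list() itself. */
static int submit_all(const int *vals, int n)
{
	int ok = 0;

	for (int i = 0; i < n; i++) {
		if (!add_item(vals[i]))
			break;
		batch_register(vals[i]);
		ok++;
	}
	return ok;
}
```

Because the helper no longer hides the batching, a caller that never wants batching (the analogue of retry_aligned_read()) simply skips the `batch_register()` call.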
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig
---
 drivers/md/raid5.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index b794253efd15..e3c75b3b8923 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3504,8 +3504,6 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx,
 	}
 	spin_unlock_irq(&sh->stripe_lock);
 
-	if (stripe_can_batch(sh))
-		stripe_add_to_batch_list(conf, sh);
 	return 1;
 
 overlap:
@@ -5918,6 +5916,9 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 			goto retry;
 		}
 
+		if (stripe_can_batch(sh))
+			stripe_add_to_batch_list(conf, sh);
+
 		if (do_flush) {
 			set_bit(STRIPE_R5C_PREFLUSH, &sh->state);
 			/* we only need flush for one stripe */
-- 
2.30.2

From nobody Thu May 14 07:12:52 2026
From: Logan Gunthorpe <logang@deltatee.com>
To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu <song@kernel.org>
Cc: Shaohua Li <shli@kernel.org>, Guoqing Jiang <guoqing.jiang@linux.dev>, Stephen Bates <sbates@raithlin.com>, Martin Oliveira <Martin.Oliveira@eideticom.com>, David Sloan <David.Sloan@eideticom.com>, Logan Gunthorpe <logang@deltatee.com>
Date: Thu, 7 Apr 2022 10:45:06 -0600
Message-Id: <20220407164511.8472-4-logang@deltatee.com>
In-Reply-To: <20220407164511.8472-1-logang@deltatee.com>
References: <20220407164511.8472-1-logang@deltatee.com>
Subject: [PATCH v1 3/8] md/raid5: Move common stripe count increment code into __find_stripe()

Both uses of __find_stripe() require a fairly complicated dance to
increment the reference count. Move this into a common place in
__find_stripe().
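The "complicated dance" being centralized is the atomic_inc_not_zero() lookup pattern: take a reference only if the count is still non-zero; a zero count means the object is parked (on the inactive list) and must go through the locked slow path before its count is bumped. A minimal userspace sketch of that fast-path check, using C11 atomics with hypothetical names (`struct obj`, `ref_get_not_zero`):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct obj {
	atomic_int count;	/* 0 means: parked on an inactive list */
};

/* Userspace analogue of atomic_inc_not_zero(): increment the count
 * only if it is currently non-zero. Returns false when the caller
 * must fall back to a locked slow path (re-activate, then inc). */
static bool ref_get_not_zero(struct obj *o)
{
	int c = atomic_load(&o->count);

	while (c != 0) {
		/* On failure the CAS reloads c, so just loop and retry. */
		if (atomic_compare_exchange_weak(&o->count, &c, c + 1))
			return true;
	}
	return false;
}
```

The point of the patch is that this check and its slow path now live once inside `__find_stripe()` rather than being duplicated at both call sites.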
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/md/raid5.c | 118 +++++++++++++++++++--------------------------
 1 file changed, 49 insertions(+), 69 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index e3c75b3b8923..be01c4515f0e 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -605,16 +605,42 @@ static void init_stripe(struct stripe_head *sh, sector_t sector, int previous)
 }
 
 static struct stripe_head *__find_stripe(struct r5conf *conf, sector_t sector,
-					 short generation)
+					 short generation, int hash)
 {
+	int inc_empty_inactive_list_flag;
 	struct stripe_head *sh;
 
 	pr_debug("__find_stripe, sector %llu\n", (unsigned long long)sector);
 	hlist_for_each_entry(sh, stripe_hash(conf, sector), hash)
 		if (sh->sector == sector && sh->generation == generation)
-			return sh;
+			goto found;
 	pr_debug("__stripe %llu not in cache\n", (unsigned long long)sector);
 	return NULL;
+
+found:
+	if (!atomic_inc_not_zero(&sh->count)) {
+		spin_lock(&conf->device_lock);
+		if (!atomic_read(&sh->count)) {
+			if (!test_bit(STRIPE_HANDLE, &sh->state))
+				atomic_inc(&conf->active_stripes);
+			BUG_ON(list_empty(&sh->lru) &&
+			       !test_bit(STRIPE_EXPANDING, &sh->state));
+			inc_empty_inactive_list_flag = 0;
+			if (!list_empty(conf->inactive_list + hash))
+				inc_empty_inactive_list_flag = 1;
+			list_del_init(&sh->lru);
+			if (list_empty(conf->inactive_list + hash) &&
+			    inc_empty_inactive_list_flag)
+				atomic_inc(&conf->empty_inactive_list_nr);
+			if (sh->group) {
+				sh->group->stripes_cnt--;
+				sh->group = NULL;
+			}
+		}
+		atomic_inc(&sh->count);
+		spin_unlock(&conf->device_lock);
+	}
+	return sh;
 }
 
 /*
@@ -705,7 +731,6 @@ raid5_get_active_stripe(struct r5conf *conf, sector_t sector,
 {
 	struct stripe_head *sh;
 	int hash = stripe_hash_locks_hash(conf, sector);
-	int inc_empty_inactive_list_flag;
 
 	pr_debug("get_stripe, sector %llu\n", (unsigned long long)sector);
 
@@ -715,57 +740,34 @@ raid5_get_active_stripe(struct r5conf *conf, sector_t sector,
 		wait_event_lock_irq(conf->wait_for_quiescent,
 				    conf->quiesce == 0 || noquiesce,
 				    *(conf->hash_locks + hash));
-		sh = __find_stripe(conf, sector, conf->generation - previous);
-		if (!sh) {
-			if (!test_bit(R5_INACTIVE_BLOCKED, &conf->cache_state)) {
-				sh = get_free_stripe(conf, hash);
-				if (!sh && !test_bit(R5_DID_ALLOC,
-						     &conf->cache_state))
-					set_bit(R5_ALLOC_MORE,
-						&conf->cache_state);
-			}
-			if (noblock && sh == NULL)
-				break;
+		sh = __find_stripe(conf, sector, conf->generation - previous,
+				   hash);
+		if (sh)
+			break;
 
-			r5c_check_stripe_cache_usage(conf);
-			if (!sh) {
-				set_bit(R5_INACTIVE_BLOCKED,
-					&conf->cache_state);
-				r5l_wake_reclaim(conf->log, 0);
-				wait_event_lock_irq(
-					conf->wait_for_stripe,
+		if (!test_bit(R5_INACTIVE_BLOCKED, &conf->cache_state)) {
+			sh = get_free_stripe(conf, hash);
+			if (!sh && !test_bit(R5_DID_ALLOC, &conf->cache_state))
+				set_bit(R5_ALLOC_MORE, &conf->cache_state);
+		}
+		if (noblock && !sh)
+			break;
+
+		r5c_check_stripe_cache_usage(conf);
+		if (!sh) {
+			set_bit(R5_INACTIVE_BLOCKED, &conf->cache_state);
+			r5l_wake_reclaim(conf->log, 0);
+			wait_event_lock_irq(conf->wait_for_stripe,
 					!list_empty(conf->inactive_list + hash) &&
 					(atomic_read(&conf->active_stripes)
 						< (conf->max_nr_stripes * 3 / 4)
					 || !test_bit(R5_INACTIVE_BLOCKED,
						      &conf->cache_state)),
					*(conf->hash_locks + hash));
-				clear_bit(R5_INACTIVE_BLOCKED,
-					  &conf->cache_state);
-			} else {
-				init_stripe(sh, sector, previous);
-				atomic_inc(&sh->count);
-			}
-		} else if (!atomic_inc_not_zero(&sh->count)) {
-			spin_lock(&conf->device_lock);
-			if (!atomic_read(&sh->count)) {
-				if (!test_bit(STRIPE_HANDLE, &sh->state))
-					atomic_inc(&conf->active_stripes);
-				BUG_ON(list_empty(&sh->lru) &&
-				       !test_bit(STRIPE_EXPANDING, &sh->state));
-				inc_empty_inactive_list_flag = 0;
-				if (!list_empty(conf->inactive_list + hash))
-					inc_empty_inactive_list_flag = 1;
-				list_del_init(&sh->lru);
-				if (list_empty(conf->inactive_list + hash) && inc_empty_inactive_list_flag)
-					atomic_inc(&conf->empty_inactive_list_nr);
-				if (sh->group) {
-					sh->group->stripes_cnt--;
-					sh->group = NULL;
-				}
-			}
+			clear_bit(R5_INACTIVE_BLOCKED, &conf->cache_state);
+		} else {
+			init_stripe(sh, sector, previous);
 			atomic_inc(&sh->count);
-			spin_unlock(&conf->device_lock);
 		}
 	} while (sh == NULL);
 
@@ -819,7 +821,6 @@ static void stripe_add_to_batch_list(struct r5conf *conf, struct stripe_head *sh)
 	sector_t head_sector, tmp_sec;
 	int hash;
 	int dd_idx;
-	int inc_empty_inactive_list_flag;
 
 	/* Don't cross chunks, so stripe pd_idx/qd_idx is the same */
 	tmp_sec = sh->sector;
@@ -829,28 +830,7 @@ static void stripe_add_to_batch_list(struct r5conf *conf, struct stripe_head *sh)
 
 	hash = stripe_hash_locks_hash(conf, head_sector);
 	spin_lock_irq(conf->hash_locks + hash);
-	head = __find_stripe(conf, head_sector, conf->generation);
-	if (head && !atomic_inc_not_zero(&head->count)) {
-		spin_lock(&conf->device_lock);
-		if (!atomic_read(&head->count)) {
-			if (!test_bit(STRIPE_HANDLE, &head->state))
-				atomic_inc(&conf->active_stripes);
-			BUG_ON(list_empty(&head->lru) &&
-			       !test_bit(STRIPE_EXPANDING, &head->state));
-			inc_empty_inactive_list_flag = 0;
-			if (!list_empty(conf->inactive_list + hash))
-				inc_empty_inactive_list_flag = 1;
-			list_del_init(&head->lru);
-			if (list_empty(conf->inactive_list + hash) && inc_empty_inactive_list_flag)
-				atomic_inc(&conf->empty_inactive_list_nr);
-			if (head->group) {
-				head->group->stripes_cnt--;
-				head->group = NULL;
-			}
-		}
-		atomic_inc(&head->count);
-		spin_unlock(&conf->device_lock);
-	}
+	head = __find_stripe(conf, head_sector, conf->generation, hash);
 	spin_unlock_irq(conf->hash_locks + hash);
 
 	if (!head)
-- 
2.30.2

From nobody Thu May 14 07:12:52 2026
From: Logan Gunthorpe <logang@deltatee.com>
To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu <song@kernel.org>
Cc: Shaohua Li <shli@kernel.org>, Guoqing Jiang <guoqing.jiang@linux.dev>, Stephen Bates <sbates@raithlin.com>, Martin Oliveira <Martin.Oliveira@eideticom.com>, David Sloan <David.Sloan@eideticom.com>, Logan Gunthorpe <logang@deltatee.com>
Date: Thu, 7 Apr 2022 10:45:07 -0600
Message-Id: <20220407164511.8472-5-logang@deltatee.com>
In-Reply-To: <20220407164511.8472-1-logang@deltatee.com>
References: <20220407164511.8472-1-logang@deltatee.com>
Subject: [PATCH v1 4/8] md/raid5: Make common label for schedule/retry in raid5_make_request()

Clean up the code to create a common label for the schedule,
prepare_to_wait() and retry path. This drops the do_prepare boolean.

This requires moving the prepare_to_wait() above the
read_seqcount_begin() call on the retry path; however, there is no
apparent ordering requirement between these two calls.

This should be easier to read than following the extra do_prepare
boolean, and the label will also be used in a subsequent patch to add
more code common to all schedule() calls.
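The control-flow change described above, replacing a boolean that is re-checked at the top of the loop with one label that gathers everything every retry site must do, can be sketched in a small standalone C program. The names (`try_fast_path`, `wait_for_resource`, `do_request`) are illustrative stand-ins, not kernel APIs:

```c
#include <assert.h>

static int attempts;

/* Pretend resource acquisition that fails on the first two tries. */
static int try_fast_path(void)
{
	return ++attempts >= 3;
}

/* Stand-in for the schedule() + prepare_to_wait() pair. */
static void wait_for_resource(void)
{
}

/* One wait_and_retry label replaces a "do_prepare"-style boolean:
 * every failure site jumps to the same spot, which performs the
 * common wait work and then retries from the top. */
static int do_request(void)
{
retry:
	if (!try_fast_path())
		goto wait_and_retry;
	return attempts;

wait_and_retry:
	wait_for_resource();
	goto retry;
}
```

With several failure sites, the win is that new "work to do before every retry" is added in exactly one place, which is what the next patch in the series relies on.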
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/md/raid5.c | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index be01c4515f0e..f963ffb35484 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5742,7 +5742,6 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 	struct stripe_head *sh;
 	const int rw = bio_data_dir(bi);
 	DEFINE_WAIT(w);
-	bool do_prepare;
 	bool do_flush = false;
 
 	if (unlikely(bi->bi_opf & REQ_PREFLUSH)) {
@@ -5803,13 +5802,9 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 		int previous;
 		int seq;
 
-		do_prepare = false;
 	retry:
 		seq = read_seqcount_begin(&conf->gen_lock);
 		previous = 0;
-		if (do_prepare)
-			prepare_to_wait(&conf->wait_for_overlap, &w,
-				TASK_UNINTERRUPTIBLE);
 		if (unlikely(conf->reshape_progress != MaxSector)) {
 			/* spinlock is needed as reshape_progress may be
 			 * 64bit on a 32bit platform, and so it might be
@@ -5829,9 +5824,7 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 			    ? logical_sector < conf->reshape_safe
 			    : logical_sector >= conf->reshape_safe) {
 				spin_unlock_irq(&conf->device_lock);
-				schedule();
-				do_prepare = true;
-				goto retry;
+				goto schedule_and_retry;
 			}
 		}
 		spin_unlock_irq(&conf->device_lock);
@@ -5872,9 +5865,7 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 			spin_unlock_irq(&conf->device_lock);
 			if (must_retry) {
 				raid5_release_stripe(sh);
-				schedule();
-				do_prepare = true;
-				goto retry;
+				goto schedule_and_retry;
 			}
 		}
 		if (read_seqcount_retry(&conf->gen_lock, seq)) {
@@ -5891,9 +5882,7 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 			 */
 			md_wakeup_thread(mddev->thread);
 			raid5_release_stripe(sh);
-			schedule();
-			do_prepare = true;
-			goto retry;
+			goto schedule_and_retry;
 		}
 
 		if (stripe_can_batch(sh))
@@ -5913,6 +5902,13 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 			atomic_inc(&conf->preread_active_stripes);
 
 		release_stripe_plug(mddev, sh);
+		continue;
+
+schedule_and_retry:
+		schedule();
+		prepare_to_wait(&conf->wait_for_overlap, &w,
+				TASK_UNINTERRUPTIBLE);
+		goto retry;
 	}
 	finish_wait(&conf->wait_for_overlap, &w);
 
-- 
2.30.2

From nobody Thu May 14 07:12:52 2026
From: Logan Gunthorpe <logang@deltatee.com>
To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu <song@kernel.org>
Cc: Shaohua Li <shli@kernel.org>, Guoqing Jiang <guoqing.jiang@linux.dev>, Stephen Bates <sbates@raithlin.com>, Martin Oliveira <Martin.Oliveira@eideticom.com>, David Sloan <David.Sloan@eideticom.com>, Logan Gunthorpe <logang@deltatee.com>
Date: Thu, 7 Apr 2022 10:45:08 -0600
Message-Id: <20220407164511.8472-6-logang@deltatee.com>
In-Reply-To: <20220407164511.8472-1-logang@deltatee.com>
References: <20220407164511.8472-1-logang@deltatee.com>
Subject: [PATCH v1 5/8] md/raid5: Keep a reference to last stripe_head for batch
When batching, every stripe head has to find the previous stripe head
to add to the batch list. This involves taking the hash lock, which is
highly contended during IO.

Instead of finding the previous stripe_head each time, store a
reference to the previous stripe_head in a pointer so that the
contended lock does not have to be taken another time.

The reference to the previous stripe must be released before
scheduling and waiting for work to get done. Otherwise, it can hold up
raid5_activate_delayed() and deadlock.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/md/raid5.c | 51 +++++++++++++++++++++++++++++++++++-----------
 1 file changed, 39 insertions(+), 12 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index f963ffb35484..b852b6439898 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -815,7 +815,8 @@ static bool stripe_can_batch(struct stripe_head *sh)
 }
 
 /* we only do back search */
-static void stripe_add_to_batch_list(struct r5conf *conf, struct stripe_head *sh)
+static void stripe_add_to_batch_list(struct r5conf *conf,
+		struct stripe_head *sh, struct stripe_head *last_sh)
 {
 	struct stripe_head *head;
 	sector_t head_sector, tmp_sec;
@@ -828,15 +829,20 @@ static void stripe_add_to_batch_list(struct r5conf *conf, struct stripe_head *sh)
 		return;
 	head_sector = sh->sector - RAID5_STRIPE_SECTORS(conf);
 
-	hash = stripe_hash_locks_hash(conf, head_sector);
-	spin_lock_irq(conf->hash_locks + hash);
-	head = __find_stripe(conf, head_sector, conf->generation, hash);
-	spin_unlock_irq(conf->hash_locks + hash);
-
-	if (!head)
-		return;
-	if (!stripe_can_batch(head))
-		goto out;
+	if (last_sh && head_sector == last_sh->sector) {
+		head = last_sh;
+		atomic_inc(&head->count);
+	} else {
+		hash = stripe_hash_locks_hash(conf, head_sector);
+		spin_lock_irq(conf->hash_locks + hash);
+		head = __find_stripe(conf, head_sector, conf->generation,
+				     hash);
+		spin_unlock_irq(conf->hash_locks + hash);
+		if (!head)
+			return;
+		if (!stripe_can_batch(head))
+			goto out;
+	}
 
 	lock_two_stripes(head, sh);
 	/* clear_batch_ready clear the flag */
@@ -5735,6 +5741,7 @@ static void make_discard_request(struct mddev *mddev, struct bio *bi)
 
 static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 {
+	struct stripe_head *batch_last = NULL;
 	struct r5conf *conf = mddev->private;
 	int dd_idx;
 	sector_t new_sector;
@@ -5885,8 +5892,13 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 			goto schedule_and_retry;
 		}
 
-		if (stripe_can_batch(sh))
-			stripe_add_to_batch_list(conf, sh);
+		if (stripe_can_batch(sh)) {
+			stripe_add_to_batch_list(conf, sh, batch_last);
+			if (batch_last)
+				raid5_release_stripe(batch_last);
+			atomic_inc(&sh->count);
+			batch_last = sh;
+		}
 
 		if (do_flush) {
 			set_bit(STRIPE_R5C_PREFLUSH, &sh->state);
@@ -5905,6 +5917,18 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 		continue;
 
 schedule_and_retry:
+		/*
+		 * Must release the reference to batch_last before
+		 * scheduling and waiting for work to be done, otherwise
+		 * the batch_last stripe head could prevent
+		 * raid5_activate_delayed() from making progress
+		 * and thus deadlocking.
+		 */
+		if (batch_last) {
+			raid5_release_stripe(batch_last);
+			batch_last = NULL;
+		}
+
 		schedule();
 		prepare_to_wait(&conf->wait_for_overlap, &w,
 				TASK_UNINTERRUPTIBLE);
@@ -5912,6 +5936,9 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 	}
 	finish_wait(&conf->wait_for_overlap, &w);
 
+	if (batch_last)
+		raid5_release_stripe(batch_last);
+
 	if (rw == WRITE)
 		md_write_end(mddev);
 	bio_endio(bi);
-- 
2.30.2

From nobody Thu May 14 07:12:52 2026
TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1ncVG2-002CHn-F5; Thu, 07 Apr 2022 10:45:19 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.94.2) (envelope-from ) id 1ncVG0-0002Dq-O8; Thu, 07 Apr 2022 10:45:16 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu Cc: Shaohua Li , Guoqing Jiang , Stephen Bates , Martin Oliveira , David Sloan , Logan Gunthorpe Date: Thu, 7 Apr 2022 10:45:09 -0600 Message-Id: <20220407164511.8472-7-logang@deltatee.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220407164511.8472-1-logang@deltatee.com> References: <20220407164511.8472-1-logang@deltatee.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, song@kernel.org, shli@kernel.org, guoqing.jiang@linux.dev, sbates@raithlin.com, Martin.Oliveira@eideticom.com, David.Sloan@eideticom.com, logang@deltatee.com X-SA-Exim-Mail-From: gunthorp@deltatee.com Subject: [PATCH v1 6/8] md/raid5: Refactor add_stripe_bio() X-SA-Exim-Version: 4.2.1 (built Sat, 13 Feb 2021 17:57:42 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Factor out two helper functions from add_stripe_bio(): one to check for overlap (stripe_bio_overlaps()), and one to actually add the bio to the stripe (__add_stripe_bio()). The latter function will always succeed. 
This will be useful in the next patch so that overlap can be checked for multiple disks before adding any of them. Signed-off-by: Logan Gunthorpe --- drivers/md/raid5.c | 79 ++++++++++++++++++++++++++++++---------------- 1 file changed, 52 insertions(+), 27 deletions(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index b852b6439898..52227dd91e89 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -3371,39 +3371,33 @@ schedule_reconstruction(struct stripe_head *sh, struct stripe_head_state *s, s->locked, s->ops_request); } -/* - * Each stripe/dev can have one or more bion attached. - * toread/towrite point to the first in a chain. - * The bi_next chain must be in order. - */ -static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx, - int forwrite, int previous) +static int stripe_bio_overlaps(struct stripe_head *sh, struct bio *bi, + int dd_idx, int forwrite) { - struct bio **bip; struct r5conf *conf = sh->raid_conf; - int firstwrite=0; + struct bio **bip; - pr_debug("adding bi b#%llu to stripe s#%llu\n", - (unsigned long long)bi->bi_iter.bi_sector, - (unsigned long long)sh->sector); + pr_debug("checking bi b#%llu to stripe s#%llu\n", + (unsigned long long)bi->bi_iter.bi_sector, + (unsigned long long)sh->sector); - spin_lock_irq(&sh->stripe_lock); /* Don't allow new IO added to stripes in batch list */ if (sh->batch_head) - goto overlap; - if (forwrite) { + return 1; + + if (forwrite) bip = &sh->dev[dd_idx].towrite; - if (*bip == NULL) - firstwrite = 1; - } else + else bip = &sh->dev[dd_idx].toread; + while (*bip && (*bip)->bi_iter.bi_sector < bi->bi_iter.bi_sector) { if (bio_end_sector(*bip) > bi->bi_iter.bi_sector) - goto overlap; + return 1; bip = & (*bip)->bi_next; } + if (*bip && (*bip)->bi_iter.bi_sector < bio_end_sector(bi)) - goto overlap; + return 1; if (forwrite && raid5_has_ppl(conf)) { /* @@ -3432,7 +3426,25 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx,
} if (first + conf->chunk_sectors * (count - 1) != last) - goto overlap; + return 1; + } + + return 0; +} + +static void __add_stripe_bio(struct stripe_head *sh, struct bio *bi, + int dd_idx, int forwrite, int previous) +{ + struct r5conf *conf = sh->raid_conf; + struct bio **bip; + int firstwrite = 0; + + if (forwrite) { + bip = &sh->dev[dd_idx].towrite; + if (!*bip) + firstwrite = 1; + } else { + bip = &sh->dev[dd_idx].toread; } if (!forwrite || previous) @@ -3470,7 +3482,7 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx, * we have added to the bitmap and set bm_seq. * So set STRIPE_BITMAP_PENDING to prevent * batching. - * If multiple add_stripe_bio() calls race here they + * If multiple __add_stripe_bio() calls race here they * much all set STRIPE_BITMAP_PENDING. So only the first one * to complete "bitmap_startwrite" gets to set * STRIPE_BIT_DELAY. This is important as once a stripe @@ -3488,14 +3500,27 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx, set_bit(STRIPE_BIT_DELAY, &sh->state); } } - spin_unlock_irq(&sh->stripe_lock); +} - return 1; +/* + * Each stripe/dev can have one or more bion attached. + * toread/towrite point to the first in a chain. + * The bi_next chain must be in order.
+ */ +static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, + int dd_idx, int forwrite, int previous) +{ + spin_lock_irq(&sh->stripe_lock); - overlap: - set_bit(R5_Overlap, &sh->dev[dd_idx].flags); + if (stripe_bio_overlaps(sh, bi, dd_idx, forwrite)) { + set_bit(R5_Overlap, &sh->dev[dd_idx].flags); + spin_unlock_irq(&sh->stripe_lock); + return 0; + } + + __add_stripe_bio(sh, bi, dd_idx, forwrite, previous); spin_unlock_irq(&sh->stripe_lock); - return 0; + return 1; } static void end_reshape(struct r5conf *conf); -- 2.30.2

From nobody Thu May 14 07:12:52 2026
From: Logan Gunthorpe
To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu
Cc: Shaohua Li , Guoqing Jiang , Stephen Bates , Martin Oliveira , David Sloan , Logan Gunthorpe
Date: Thu, 7 Apr 2022 10:45:10 -0600
Message-Id: <20220407164511.8472-8-logang@deltatee.com>
In-Reply-To: <20220407164511.8472-1-logang@deltatee.com>
Subject: [PATCH v1 7/8] md/raid5: Check all disks in a stripe_head for reshape progress

When testing if a previous stripe has had reshape expand past it, use the earliest or latest logical sector in all the disks for that stripe head. This will allow adding multiple disks at a time in a subsequent patch. To make this cleaner, refactor the check into a helper function called stripe_ahead_of_reshape().
Signed-off-by: Logan Gunthorpe --- drivers/md/raid5.c | 47 +++++++++++++++++++++++++++++++--------------- 1 file changed, 32 insertions(+), 15 deletions(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 52227dd91e89..1ddce09970fa 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -5764,6 +5764,33 @@ static void make_discard_request(struct mddev *mddev, struct bio *bi) bio_endio(bi); } +static bool stripe_ahead_of_reshape(struct mddev *mddev, struct r5conf *conf, + struct stripe_head *sh) +{ + sector_t max_sector = 0, min_sector = MaxSector; + int dd_idx, ret = 0; + + for (dd_idx = 0; dd_idx < sh->disks; dd_idx++) { + if (dd_idx == sh->pd_idx) + continue; + + min_sector = min(min_sector, sh->dev[dd_idx].sector); + max_sector = max(max_sector, sh->dev[dd_idx].sector); + } + + spin_lock_irq(&conf->device_lock); + + if (mddev->reshape_backwards + ? max_sector >= conf->reshape_progress + : min_sector < conf->reshape_progress) + /* mismatch, need to try again */ + ret = 1; + + spin_unlock_irq(&conf->device_lock); + + return ret; +} + static bool raid5_make_request(struct mddev *mddev, struct bio * bi) { struct stripe_head *batch_last = NULL; @@ -5877,28 +5904,18 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi) break; } - if (unlikely(previous)) { + if (unlikely(previous) && + stripe_ahead_of_reshape(mddev, conf, sh)) { /* - * Expansion might have moved on while waiting for a - * stripe, so we must do the range check again. + * Expansion moved on while waiting for a stripe. * Expansion could still move past after this * test, but as we are holding a reference to * 'sh', we know that if that happens, * STRIPE_EXPANDING will get set and the expansion * won't proceed until we finish with the stripe. */ - int must_retry = 0; - spin_lock_irq(&conf->device_lock); - if (mddev->reshape_backwards - ?
logical_sector >= conf->reshape_progress - : logical_sector < conf->reshape_progress) - /* mismatch, need to try again */ - must_retry = 1; - spin_unlock_irq(&conf->device_lock); - if (must_retry) { - raid5_release_stripe(sh); - goto schedule_and_retry; - } + raid5_release_stripe(sh); + goto schedule_and_retry; } if (read_seqcount_retry(&conf->gen_lock, seq)) { /* Might have got the wrong stripe_head by accident */ -- 2.30.2

From nobody Thu May 14 07:12:52 2026
From: Logan Gunthorpe
To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, Song Liu
Cc: Shaohua Li , Guoqing Jiang , Stephen Bates , Martin Oliveira , David Sloan , Logan Gunthorpe
Date: Thu, 7 Apr 2022 10:45:11 -0600
Message-Id: <20220407164511.8472-9-logang@deltatee.com>
In-Reply-To: <20220407164511.8472-1-logang@deltatee.com>
Subject: [PATCH v1 8/8] md/raid5: Pivot raid5_make_request()

raid5_make_request() loops through every page in the request, finds the appropriate stripe and adds the bio for that page to the correct disk. This causes a great deal of contention on the hash_lock, since the lock for that hash must be taken for every single page. The number of times the lock is taken can be reduced by pivoting raid5_make_request() so that it loops through every stripe and then loops through every disk in that stripe to see if the bio must be added. This reduces the number of times the lock must be taken by a factor equal to the number of data disks.
To accomplish this, store the disk sector that has currently been finished and continue to the next logical sector if the disk sector has already been done. Then add an add_all_stripe_bios() helper to check all the bios for overlap and add them all if none of them overlap. Signed-off-by: Logan Gunthorpe --- drivers/md/raid5.c | 53 ++++++++++++++++++++++++++++++++++++++++++++-- drivers/md/raid5.h | 1 + 2 files changed, 52 insertions(+), 2 deletions(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 1ddce09970fa..6b098819f7db 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -3523,6 +3523,48 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, return 1; } +static int add_all_stripe_bios(struct stripe_head *sh, struct bio *bi, + sector_t first_logical_sector, sector_t last_sector, + int forwrite, int previous) +{ + int dd_idx; + int ret = 1; + + spin_lock_irq(&sh->stripe_lock); + + for (dd_idx = 0; dd_idx < sh->disks; dd_idx++) { + struct r5dev *dev = &sh->dev[dd_idx]; + + clear_bit(R5_BioReady, &dev->flags); + + if (dd_idx == sh->pd_idx) + continue; + + if (dev->sector < first_logical_sector || + dev->sector >= last_sector) + continue; + + if (stripe_bio_overlaps(sh, bi, dd_idx, forwrite)) { + set_bit(R5_Overlap, &dev->flags); + ret = 0; + continue; + } + + set_bit(R5_BioReady, &dev->flags); + } + + if (!ret) + goto out; + + for (dd_idx = 0; dd_idx < sh->disks; dd_idx++) + if (test_bit(R5_BioReady, &sh->dev[dd_idx].flags)) + __add_stripe_bio(sh, bi, dd_idx, forwrite, previous); + +out: + spin_unlock_irq(&sh->stripe_lock); + return ret; +} + static void end_reshape(struct r5conf *conf); static void stripe_set_idx(sector_t stripe, struct r5conf *conf, int previous, @@ -5796,7 +5838,7 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi) struct stripe_head *batch_last = NULL; struct r5conf *conf = mddev->private; int dd_idx; - sector_t new_sector; + sector_t new_sector, disk_sector =
MaxSector; sector_t logical_sector, last_sector; struct stripe_head *sh; const int rw = bio_data_dir(bi); @@ -5855,6 +5897,7 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi) md_write_end(mddev); return true; } + md_account_bio(mddev, &bi); prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE); for (; logical_sector < last_sector; logical_sector += RAID5_STRIPE_SECTORS(conf)) { @@ -5892,6 +5935,9 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi) new_sector = raid5_compute_sector(conf, logical_sector, previous, &dd_idx, NULL); + if (disk_sector != MaxSector && new_sector <= disk_sector) + continue; + pr_debug("raid456: raid5_make_request, sector %llu logical %llu\n", (unsigned long long)new_sector, (unsigned long long)logical_sector); @@ -5924,7 +5970,8 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi) } if (test_bit(STRIPE_EXPANDING, &sh->state) || - !add_stripe_bio(sh, bi, dd_idx, rw, previous)) { + !add_all_stripe_bios(sh, bi, logical_sector, + last_sector, rw, previous)) { /* * Stripe is busy expanding or add failed due to * overlap. Flush everything and wait a while. @@ -5934,6 +5981,8 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi) goto schedule_and_retry; } + disk_sector = new_sector; + if (stripe_can_batch(sh)) { stripe_add_to_batch_list(conf, sh, batch_last); if (batch_last) diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h index 9e8486a9e445..6f1ef9c75504 100644 --- a/drivers/md/raid5.h +++ b/drivers/md/raid5.h @@ -308,6 +308,7 @@ enum r5dev_flags { R5_Wantwrite, R5_Overlap, /* There is a pending overlapping request * on this block */ + R5_BioReady, /* The current bio can be added to this disk */ R5_ReadNoMerge, /* prevent bio from merging in block-layer */ R5_ReadError, /* seen a read error here recently */ R5_ReWrite, /* have tried to over-write the readerror */ -- 2.30.2