From nobody Sun Sep 14 08:25:04 2025
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2D85BC27C76 for ; Wed, 25 Jan 2023 13:45:20 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235657AbjAYNpT (ORCPT ); Wed, 25 Jan 2023 08:45:19 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40236 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234997AbjAYNpQ (ORCPT ); Wed, 25 Jan 2023 08:45:16 -0500
Received: from outbound-smtp57.blacknight.com (outbound-smtp57.blacknight.com [46.22.136.241]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5B0B958642 for ; Wed, 25 Jan 2023 05:44:59 -0800 (PST)
Received: from mail.blacknight.com (pemlinmail05.blacknight.ie [81.17.254.26]) by outbound-smtp57.blacknight.com (Postfix) with ESMTPS id D07F3FAB20 for ; Wed, 25 Jan 2023 13:44:57 +0000 (GMT)
Received: (qmail 20634 invoked from network); 25 Jan 2023 13:44:57 -0000
Received: from unknown (HELO morpheus.112glenside.lan) (mgorman@techsingularity.net@[84.203.198.246]) by 81.17.254.9 with ESMTPA; 25 Jan 2023 13:44:57 -0000
From: Mel Gorman
To: Vlastimil Babka
Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko, Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML, Mel Gorman
Subject: [PATCH 1/4] mm, compaction: Rename compact_control->rescan to finish_pageblock
Date: Wed, 25 Jan 2023 13:44:31 +0000
Message-Id: <20230125134434.18017-2-mgorman@techsingularity.net>
X-Mailer: git-send-email 2.35.3
In-Reply-To: <20230125134434.18017-1-mgorman@techsingularity.net>
References: <20230125134434.18017-1-mgorman@techsingularity.net>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

The rescan field was not well named albeit accurate at the time. Rename
the field to finish_pageblock to indicate that the remainder of the
pageblock should be scanned regardless of COMPACT_CLUSTER_MAX. The intent
is that pageblocks with transient failures get marked for skipping to
avoid revisiting the same pageblock.

Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
---
 mm/compaction.c | 24 ++++++++++++------------
 mm/internal.h   |  6 +++++-
 2 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index ca1603524bbe..c018b0e65720 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1102,12 +1102,12 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 
 		/*
 		 * Avoid isolating too much unless this block is being
-		 * rescanned (e.g. dirty/writeback pages, parallel allocation)
+		 * fully scanned (e.g. dirty/writeback pages, parallel allocation)
 		 * or a lock is contended. For contention, isolate quickly to
 		 * potentially remove one source of contention.
 		 */
 		if (cc->nr_migratepages >= COMPACT_CLUSTER_MAX &&
-		    !cc->rescan && !cc->contended) {
+		    !cc->finish_pageblock && !cc->contended) {
 			++low_pfn;
 			break;
 		}
@@ -1172,14 +1172,14 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	}
 
 	/*
-	 * Updated the cached scanner pfn once the pageblock has been scanned
+	 * Update the cached scanner pfn once the pageblock has been scanned.
 	 * Pages will either be migrated in which case there is no point
 	 * scanning in the near future or migration failed in which case the
 	 * failure reason may persist. The block is marked for skipping if
 	 * there were no pages isolated in the block or if the block is
 	 * rescanned twice in a row.
 	 */
-	if (low_pfn == end_pfn && (!nr_isolated || cc->rescan)) {
+	if (low_pfn == end_pfn && (!nr_isolated || cc->finish_pageblock)) {
 		if (valid_page && !skip_updated)
 			set_pageblock_skip(valid_page);
 		update_cached_migrate(cc, low_pfn);
@@ -2374,17 +2374,17 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 		unsigned long iteration_start_pfn = cc->migrate_pfn;
 
 		/*
-		 * Avoid multiple rescans which can happen if a page cannot be
-		 * isolated (dirty/writeback in async mode) or if the migrated
-		 * pages are being allocated before the pageblock is cleared.
-		 * The first rescan will capture the entire pageblock for
-		 * migration. If it fails, it'll be marked skip and scanning
-		 * will proceed as normal.
+		 * Avoid multiple rescans of the same pageblock which can
+		 * happen if a page cannot be isolated (dirty/writeback in
+		 * async mode) or if the migrated pages are being allocated
+		 * before the pageblock is cleared. The first rescan will
+		 * capture the entire pageblock for migration. If it fails,
+		 * it'll be marked skip and scanning will proceed as normal.
 		 */
-		cc->rescan = false;
+		cc->finish_pageblock = false;
 		if (pageblock_start_pfn(last_migrated_pfn) ==
 		    pageblock_start_pfn(iteration_start_pfn)) {
-			cc->rescan = true;
+			cc->finish_pageblock = true;
 		}
 
 		switch (isolate_migratepages(cc)) {
diff --git a/mm/internal.h b/mm/internal.h
index bcf75a8b032d..21466d0ab22f 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -422,7 +422,11 @@ struct compact_control {
 	bool proactive_compaction;	/* kcompactd proactive compaction */
 	bool whole_zone;		/* Whole zone should/has been scanned */
 	bool contended;			/* Signal lock contention */
-	bool rescan;			/* Rescanning the same pageblock */
+	bool finish_pageblock;		/* Scan the remainder of a pageblock. Used
+					 * when there are potentially transient
+					 * isolation or migration failures to
+					 * ensure forward progress.
+					 */
 	bool alloc_contig;		/* alloc_contig_range allocation */
 };
 
-- 
2.35.3

From nobody Sun Sep 14 08:25:04 2025
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E2110C27C76 for ; Wed, 25 Jan 2023 13:45:23 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235668AbjAYNpW (ORCPT ); Wed, 25 Jan 2023 08:45:22 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40346 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234742AbjAYNpS (ORCPT ); Wed, 25 Jan 2023 08:45:18 -0500
Received: from outbound-smtp41.blacknight.com (outbound-smtp41.blacknight.com [46.22.139.224]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1D9F814239 for ; Wed, 25 Jan 2023 05:45:09 -0800 (PST)
Received: from mail.blacknight.com (pemlinmail05.blacknight.ie [81.17.254.26]) by outbound-smtp41.blacknight.com (Postfix) with ESMTPS id 7E3781F90 for ; Wed, 25 Jan 2023 13:45:08 +0000 (GMT)
Received: (qmail 21346 invoked from network); 25 Jan 2023 13:45:08 -0000
Received: from unknown (HELO morpheus.112glenside.lan) (mgorman@techsingularity.net@[84.203.198.246]) by 81.17.254.9 with ESMTPA; 25 Jan 2023 13:45:07 -0000
From: Mel Gorman
To: Vlastimil Babka
Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko, Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML, Mel Gorman
Subject: [PATCH 2/4] mm, compaction: Check if a page has been captured before draining PCP pages
Date: Wed, 25 Jan 2023 13:44:32 +0000
Message-Id: <20230125134434.18017-3-mgorman@techsingularity.net>
X-Mailer: git-send-email 2.35.3
In-Reply-To: <20230125134434.18017-1-mgorman@techsingularity.net>
References: <20230125134434.18017-1-mgorman@techsingularity.net>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

If a page has been captured then draining is unnecessary so check first
for a captured page.

Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
---
 mm/compaction.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index c018b0e65720..28711a21a8a2 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2441,6 +2441,12 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 			}
 		}
 
+		/* Stop if a page has been captured */
+		if (capc && capc->page) {
+			ret = COMPACT_SUCCESS;
+			break;
+		}
+
 check_drain:
 		/*
 		 * Has the migration scanner moved away from the previous
@@ -2459,12 +2465,6 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 				last_migrated_pfn = 0;
 			}
 		}
-
-		/* Stop if a page has been captured */
-		if (capc && capc->page) {
-			ret = COMPACT_SUCCESS;
-			break;
-		}
 	}
 
 out:
-- 
2.35.3

From nobody Sun Sep 14 08:25:04 2025
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4830FC54E94 for ; Wed, 25 Jan 2023 13:45:31 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235555AbjAYNpa (ORCPT ); Wed, 25 Jan 2023 08:45:30 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40446 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235651AbjAYNpV (ORCPT ); Wed, 25 Jan 2023 08:45:21 -0500
Received: from outbound-smtp28.blacknight.com (outbound-smtp28.blacknight.com [81.17.249.11]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5292913503 for ; Wed, 25 Jan 2023 05:45:20 -0800 (PST)
Received: from mail.blacknight.com (pemlinmail05.blacknight.ie [81.17.254.26]) by outbound-smtp28.blacknight.com (Postfix) with
	ESMTPS id CA76246026 for ; Wed, 25 Jan 2023 13:45:18 +0000 (GMT)
Received: (qmail 21935 invoked from network); 25 Jan 2023 13:45:18 -0000
Received: from unknown (HELO morpheus.112glenside.lan) (mgorman@techsingularity.net@[84.203.198.246]) by 81.17.254.9 with ESMTPA; 25 Jan 2023 13:45:18 -0000
From: Mel Gorman
To: Vlastimil Babka
Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko, Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML, Mel Gorman
Subject: [PATCH 3/4] mm, compaction: Finish scanning the current pageblock if requested
Date: Wed, 25 Jan 2023 13:44:33 +0000
Message-Id: <20230125134434.18017-4-mgorman@techsingularity.net>
X-Mailer: git-send-email 2.35.3
In-Reply-To: <20230125134434.18017-1-mgorman@techsingularity.net>
References: <20230125134434.18017-1-mgorman@techsingularity.net>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

cc->finish_pageblock is set when the current pageblock should be
rescanned but fast_find_migrateblock can select an alternative
block. Disable fast_find_migrateblock when the current pageblock
scan should be completed.

Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
---
 mm/compaction.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
index 28711a21a8a2..4b3a0238879c 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1762,6 +1762,13 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc)
 	if (cc->ignore_skip_hint)
 		return pfn;
 
+	/*
+	 * If the pageblock should be finished then do not select a different
+	 * pageblock.
+	 */
+	if (cc->finish_pageblock)
+		return pfn;
+
 	/*
 	 * If the migrate_pfn is not at the start of a zone or the start
 	 * of a pageblock then assume this is a continuation of a previous
-- 
2.35.3

From nobody Sun Sep 14 08:25:04 2025
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D8422C54E94 for ; Wed, 25 Jan 2023 13:45:47 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235458AbjAYNpq (ORCPT ); Wed, 25 Jan 2023 08:45:46 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41124 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235404AbjAYNpl (ORCPT ); Wed, 25 Jan 2023 08:45:41 -0500
Received: from outbound-smtp26.blacknight.com (outbound-smtp26.blacknight.com [81.17.249.194]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 669823AA0 for ; Wed, 25 Jan 2023 05:45:30 -0800 (PST)
Received: from mail.blacknight.com (pemlinmail05.blacknight.ie [81.17.254.26]) by outbound-smtp26.blacknight.com (Postfix) with ESMTPS id ED3771E038 for ; Wed, 25 Jan 2023 13:45:28 +0000 (GMT)
Received: (qmail 22502 invoked from network); 25 Jan 2023 13:45:28 -0000
Received: from unknown (HELO morpheus.112glenside.lan) (mgorman@techsingularity.net@[84.203.198.246]) by 81.17.254.9 with ESMTPA; 25 Jan 2023 13:45:28 -0000
From: Mel Gorman
To: Vlastimil Babka
Cc: Andrew Morton, Jiri Slaby, Maxim Levitsky, Michal Hocko, Pedro Falcato, Paolo Bonzini, Chuyi Zhou, Linux-MM, LKML, Mel Gorman
Subject: [PATCH 4/4] mm, compaction: Finish pageblocks on complete migration failure
Date: Wed, 25 Jan 2023 13:44:34 +0000
Message-Id: <20230125134434.18017-5-mgorman@techsingularity.net>
X-Mailer: git-send-email 2.35.3
In-Reply-To: <20230125134434.18017-1-mgorman@techsingularity.net>
References:
 <20230125134434.18017-1-mgorman@techsingularity.net>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Commit 7efc3b726103 ("mm/compaction: fix set skip in
fast_find_migrateblock") addressed an issue where a pageblock selected by
fast_find_migrateblock() was ignored. Unfortunately, the same fix
resulted in numerous reports of khugepaged or kcompactd stalling for
long periods of time or consuming 100% of CPU.

Tracing showed that there was a lot of rescanning between a small subset
of pageblocks because the conditions for marking the block skip are not
met. The scan is not reaching the end of the pageblock because enough
pages were isolated but none were migrated successfully. Eventually it
circles back to the same block.

Pageblock skip tracking tries to minimise both latency and excessive
scanning but tracking exactly when a block is fully scanned requires an
excessive amount of state. This patch forcibly rescans a pageblock when
all isolated pages fail to migrate even though it could be for transient
reasons such as a page being dirty or under writeback. This will
sometimes migrate too many pages but pageblocks will be marked skip and
forward progress will be made.

"Usemem" from the mmtests configuration
workload-usemem-stress-numa-compact was used to stress compaction. The
compaction trace events were recorded using a 6.2-rc5 kernel that
includes commit 7efc3b726103 and the count of unique ranges was
measured. The top 5 ranges were

   3076 range=(0x10ca00-0x10cc00)
   3076 range=(0x110a00-0x110c00)
   3098 range=(0x13b600-0x13b800)
   3104 range=(0x141c00-0x141e00)
  11424 range=(0x11b600-0x11b800)

While this workload is very different from those in the bug reports, the
pattern of the same subset of blocks being repeatedly scanned is
observed. At one point, *only* range=(0x11b600-0x11b800) was scanned for
2 seconds.
14 seconds passed between the first migration-related event and the
last.

With the series applied including this patch, the top 5 ranges were

      1 range=(0x11607e-0x116200)
      1 range=(0x116200-0x116278)
      1 range=(0x116278-0x116400)
      1 range=(0x116400-0x116424)
      1 range=(0x116424-0x116600)

Only unique ranges were scanned and the time between the first and last
migration-related events was 0.11 milliseconds.

Fixes: 7efc3b726103 ("mm/compaction: fix set skip in fast_find_migrateblock")
Signed-off-by: Mel Gorman
---
 mm/compaction.c | 30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 4b3a0238879c..937ec2f05f2c 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2394,6 +2394,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 			cc->finish_pageblock = true;
 		}
 
+rescan:
 		switch (isolate_migratepages(cc)) {
 		case ISOLATE_ABORT:
 			ret = COMPACT_CONTENDED;
@@ -2436,15 +2437,28 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 				goto out;
 			}
 			/*
-			 * We failed to migrate at least one page in the current
-			 * order-aligned block, so skip the rest of it.
+			 * If an ASYNC or SYNC_LIGHT fails to migrate a page
+			 * within the current order-aligned block, scan the
+			 * remainder of the pageblock. This will mark the
+			 * pageblock "skip" to avoid rescanning in the near
+			 * future. This will isolate more pages than necessary
+			 * for the request but avoid loops due to
+			 * fast_find_migrateblock revisiting blocks that were
+			 * recently partially scanned.
 			 */
-			if (cc->direct_compaction &&
-						(cc->mode == MIGRATE_ASYNC)) {
-				cc->migrate_pfn = block_end_pfn(
-						cc->migrate_pfn - 1, cc->order);
-				/* Draining pcplists is useless in this case */
-				last_migrated_pfn = 0;
+			if (cc->direct_compaction && !cc->finish_pageblock &&
+						(cc->mode < MIGRATE_SYNC)) {
+				cc->finish_pageblock = true;
+
+				/*
+				 * Draining pcplists does not help THP if
+				 * any page failed to migrate. Even after
+				 * drain, the pageblock will not be free.
+				 */
+				if (cc->order == COMPACTION_HPAGE_ORDER)
+					last_migrated_pfn = 0;
+
+				goto rescan;
 			}
 		}
 
-- 
2.35.3
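
Note on patch 4/4: the rescan path added to compact_zone() can be
illustrated with a small user-space model. This is a sketch only, not
kernel code. The names toy_control, toy_isolate(), toy_compact_once(),
TOY_CLUSTER_MAX and TOY_BLOCK_PAGES are hypothetical and exist only for
this illustration. It assumes a single pageblock in which every page can
be isolated, and it leaves out the direct_compaction, migration mode and
THP order checks that the real hunk performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CLUSTER_MAX 32	/* stands in for COMPACT_CLUSTER_MAX */
#define TOY_BLOCK_PAGES 512	/* pages in the single toy pageblock */

/* Hypothetical, heavily reduced stand-in for struct compact_control. */
struct toy_control {
	size_t migrate_pfn;	/* next page the scanner will visit */
	bool finish_pageblock;	/* ignore the batch cap, scan to block end */
	bool skip;		/* toy version of the pageblock skip bit */
};

/*
 * Isolate pages starting at migrate_pfn. The scan stops once a full
 * batch is isolated unless finish_pageblock is set. The block is marked
 * skip only when the scan reached the end of the block and either
 * nothing was isolated or the block was being finished.
 */
static size_t toy_isolate(struct toy_control *cc)
{
	size_t isolated = 0;

	while (cc->migrate_pfn < TOY_BLOCK_PAGES) {
		if (isolated >= TOY_CLUSTER_MAX && !cc->finish_pageblock)
			break;
		cc->migrate_pfn++;
		isolated++;
	}

	if (cc->migrate_pfn == TOY_BLOCK_PAGES &&
	    (isolated == 0 || cc->finish_pageblock))
		cc->skip = true;

	return isolated;
}

/*
 * One compaction attempt on the toy block. If migration of the isolated
 * batch fails and the block is not already being finished, set
 * finish_pageblock and rescan so the remainder of the block is scanned.
 * Returns the number of isolation passes that were needed.
 */
static int toy_compact_once(struct toy_control *cc, bool migration_succeeds)
{
	int passes = 0;

rescan:
	passes++;
	size_t isolated = toy_isolate(cc);

	assert(isolated <= TOY_BLOCK_PAGES);

	if (!migration_succeeds && !cc->finish_pageblock) {
		cc->finish_pageblock = true;
		goto rescan;
	}

	return passes;
}
```

Starting from a zeroed toy_control, toy_compact_once(&cc, false) takes
two passes: the first stops at the batch cap, the second runs to the end
of the block and sets the toy skip flag, so the block is not revisited.
With migration succeeding it takes one pass and leaves the flag clear.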