From: Mel Gorman
To: Linux-MM
Cc: Andrew Morton, Michal Hocko, NeilBrown, Thierry Reding, Matthew Wilcox,
    Vlastimil Babka, LKML, Mel Gorman
Subject: [PATCH 1/7] mm/page_alloc: Rename ALLOC_HIGH to ALLOC_MIN_RESERVE
Date: Mon, 9 Jan 2023 15:16:25 +0000
Message-Id: <20230109151631.24923-2-mgorman@techsingularity.net>
In-Reply-To: <20230109151631.24923-1-mgorman@techsingularity.net>
References: <20230109151631.24923-1-mgorman@techsingularity.net>

__GFP_HIGH aliases to ALLOC_HIGH but the name does not really hint what it
means. As ALLOC_HIGH is internal to the allocator, rename it to
ALLOC_MIN_RESERVE to document that the min reserves can be depleted.

Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
Acked-by: Michal Hocko
---
 mm/internal.h   | 4 +++-
 mm/page_alloc.c | 8 ++++----
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index bcf75a8b032d..403e4386626d 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -736,7 +736,9 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #endif
 
 #define ALLOC_HARDER		0x10 /* try to alloc harder */
-#define ALLOC_HIGH		0x20 /* __GFP_HIGH set */
+#define ALLOC_MIN_RESERVE	0x20 /* __GFP_HIGH set. Allow access to 50%
+				      * of the min watermark.
+				      */
 #define ALLOC_CPUSET		0x40 /* check for correct cpuset */
 #define ALLOC_CMA		0x80 /* allow allocations from CMA areas */
 #ifdef CONFIG_ZONE_DMA32
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0745aedebb37..244c1e675dc8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3976,7 +3976,7 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 	/* free_pages may go negative - that's OK */
 	free_pages -= __zone_watermark_unusable_free(z, order, alloc_flags);
 
-	if (alloc_flags & ALLOC_HIGH)
+	if (alloc_flags & ALLOC_MIN_RESERVE)
 		min -= min / 2;
 
 	if (unlikely(alloc_harder)) {
@@ -4818,18 +4818,18 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 	unsigned int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
 
 	/*
-	 * __GFP_HIGH is assumed to be the same as ALLOC_HIGH
+	 * __GFP_HIGH is assumed to be the same as ALLOC_MIN_RESERVE
 	 * and __GFP_KSWAPD_RECLAIM is assumed to be the same as ALLOC_KSWAPD
 	 * to save two branches.
 	 */
-	BUILD_BUG_ON(__GFP_HIGH != (__force gfp_t) ALLOC_HIGH);
+	BUILD_BUG_ON(__GFP_HIGH != (__force gfp_t) ALLOC_MIN_RESERVE);
 	BUILD_BUG_ON(__GFP_KSWAPD_RECLAIM != (__force gfp_t) ALLOC_KSWAPD);
 
 	/*
 	 * The caller may dip into page reserves a bit more if the caller
 	 * cannot run direct reclaim, or if the caller has realtime scheduling
 	 * policy or is asking for __GFP_HIGH memory. GFP_ATOMIC requests will
-	 * set both ALLOC_HARDER (__GFP_ATOMIC) and ALLOC_HIGH (__GFP_HIGH).
+	 * set both ALLOC_HARDER (__GFP_ATOMIC) and ALLOC_MIN_RESERVE(__GFP_HIGH).
 	 */
 	alloc_flags |= (__force int)
 		(gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM));
-- 
2.35.3

From: Mel Gorman
To: Linux-MM
Cc: Andrew Morton, Michal Hocko, NeilBrown, Thierry Reding, Matthew Wilcox,
    Vlastimil Babka, LKML, Mel Gorman
Subject: [PATCH 2/7] mm/page_alloc: Treat RT tasks similar to __GFP_HIGH
Date: Mon, 9 Jan 2023 15:16:26 +0000
Message-Id: <20230109151631.24923-3-mgorman@techsingularity.net>
In-Reply-To: <20230109151631.24923-1-mgorman@techsingularity.net>
References: <20230109151631.24923-1-mgorman@techsingularity.net>
charset="utf-8" RT tasks are allowed to dip below the min reserve but ALLOC_HARDER is typically combined with ALLOC_MIN_RESERVE so RT tasks are a little unusual. While there is some justification for allowing RT tasks access to memory reserves, there is a strong chance that a RT task that is also under memory pressure is at risk of missing deadlines anyway. Relax how much reserves an RT task can access by treating it the same as __GFP_HIGH allocations. Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka Acked-by: Michal Hocko --- mm/page_alloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 244c1e675dc8..0040b4e00913 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -4847,7 +4847,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask) */ alloc_flags &=3D ~ALLOC_CPUSET; } else if (unlikely(rt_task(current)) && in_task()) - alloc_flags |=3D ALLOC_HARDER; + alloc_flags |=3D ALLOC_MIN_RESERVE; =20 alloc_flags =3D gfp_to_alloc_flags_cma(gfp_mask, alloc_flags); =20 --=20 2.35.3 From nobody Mon Sep 15 23:03:04 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4727BC678D6 for ; Mon, 9 Jan 2023 15:19:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237152AbjAIPS3 (ORCPT ); Mon, 9 Jan 2023 10:18:29 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33230 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237180AbjAIPRz (ORCPT ); Mon, 9 Jan 2023 10:17:55 -0500 Received: from outbound-smtp17.blacknight.com (outbound-smtp17.blacknight.com [46.22.139.234]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AA63E1A1 for ; Mon, 9 Jan 2023 07:17:16 -0800 (PST) Received: from mail.blacknight.com (pemlinmail04.blacknight.ie [81.17.254.17]) by outbound-smtp17.blacknight.com (Postfix) with ESMTPS id 37BFB1C40FC for ; Mon, 9 Jan 2023 15:17:15 +0000 (GMT) Received: (qmail 17642 invoked from network); 9 Jan 2023 15:17:15 -0000 Received: from unknown (HELO morpheus.112glenside.lan) (mgorman@techsingularity.net@[84.203.198.246]) by 81.17.254.9 with ESMTPA; 9 Jan 2023 15:17:15 -0000 From: Mel Gorman To: Linux-MM Cc: Andrew Morton , Michal Hocko , NeilBrown , Thierry Reding , Matthew Wilcox , Vlastimil Babka , LKML , Mel Gorman Subject: [PATCH 3/7] mm/page_alloc: Explicitly record high-order atomic allocations in alloc_flags Date: Mon, 9 Jan 2023 15:16:27 +0000 Message-Id: <20230109151631.24923-4-mgorman@techsingularity.net> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20230109151631.24923-1-mgorman@techsingularity.net> References: <20230109151631.24923-1-mgorman@techsingularity.net> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" A high-order ALLOC_HARDER allocation is assumed to be atomic. While that is accurate, it changes later in the series. In preparation, explicitly record high-order atomic allocations in gfp_to_alloc_flags(). There is a slight functional change in that OOM handling avoids using high-order reserve until it has to. 

Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
---
 mm/internal.h   |  1 +
 mm/page_alloc.c | 29 +++++++++++++++++++++++------
 2 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 403e4386626d..178484d9fd94 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -746,6 +746,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #else
 #define ALLOC_NOFRAGMENT	  0x0
 #endif
+#define ALLOC_HIGHATOMIC	0x200 /* Allows access to MIGRATE_HIGHATOMIC */
 #define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
 
 enum ttu_flags;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0040b4e00913..0ef4f3236a5a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3706,10 +3706,20 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
 			 * reserved for high-order atomic allocation, so order-0
 			 * request should skip it.
 			 */
-			if (order > 0 && alloc_flags & ALLOC_HARDER)
+			if (alloc_flags & ALLOC_HIGHATOMIC)
 				page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
 			if (!page) {
 				page = __rmqueue(zone, order, migratetype, alloc_flags);
+
+				/*
+				 * If the allocation fails, allow OOM handling access
+				 * to HIGHATOMIC reserves as failing now is worse than
+				 * failing a high-order atomic allocation in the
+				 * future.
+				 */
+				if (!page && (alloc_flags & ALLOC_OOM))
+					page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
+
 				if (!page) {
 					spin_unlock_irqrestore(&zone->lock, flags);
 					return NULL;
@@ -4023,8 +4033,10 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 				return true;
 			}
 #endif
-		if (alloc_harder && !free_area_empty(area, MIGRATE_HIGHATOMIC))
+		if ((alloc_flags & (ALLOC_HIGHATOMIC|ALLOC_OOM)) &&
+		    !free_area_empty(area, MIGRATE_HIGHATOMIC)) {
 			return true;
+		}
 	}
 	return false;
 }
@@ -4286,7 +4298,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 	 * If this is a high-order atomic allocation then check
 	 * if the pageblock should be reserved for the future
 	 */
-	if (unlikely(order && (alloc_flags & ALLOC_HARDER)))
+	if (unlikely(alloc_flags & ALLOC_HIGHATOMIC))
 		reserve_highatomic_pageblock(page, zone, order);
 
 	return page;
@@ -4813,7 +4825,7 @@ static void wake_all_kswapds(unsigned int order, gfp_t gfp_mask,
 }
 
 static inline unsigned int
-gfp_to_alloc_flags(gfp_t gfp_mask)
+gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
 {
 	unsigned int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
 
@@ -4839,8 +4851,13 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 		 * Not worth trying to allocate harder for __GFP_NOMEMALLOC even
 		 * if it can't schedule.
 		 */
-		if (!(gfp_mask & __GFP_NOMEMALLOC))
+		if (!(gfp_mask & __GFP_NOMEMALLOC)) {
 			alloc_flags |= ALLOC_HARDER;
+
+			if (order > 0)
+				alloc_flags |= ALLOC_HIGHATOMIC;
+		}
+
 		/*
 		 * Ignore cpuset mems for GFP_ATOMIC rather than fail, see the
 		 * comment for __cpuset_node_allowed().
@@ -5048,7 +5065,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 * kswapd needs to be woken up, and to avoid the cost of setting up
 	 * alloc_flags precisely. So we do that now.
 	 */
-	alloc_flags = gfp_to_alloc_flags(gfp_mask);
+	alloc_flags = gfp_to_alloc_flags(gfp_mask, order);
 
 	/*
 	 * We need to recalculate the starting point for the zonelist iterator
-- 
2.35.3

From: Mel Gorman
To: Linux-MM
Cc: Andrew Morton, Michal Hocko, NeilBrown, Thierry Reding, Matthew Wilcox,
    Vlastimil Babka, LKML, Mel Gorman
Subject: [PATCH 4/7] mm/page_alloc: Explicitly define what alloc flags deplete min reserves
Date: Mon, 9 Jan 2023 15:16:28 +0000
Message-Id: <20230109151631.24923-5-mgorman@techsingularity.net>
In-Reply-To: <20230109151631.24923-1-mgorman@techsingularity.net>
References: <20230109151631.24923-1-mgorman@techsingularity.net>

As there are more ALLOC_ flags that affect reserves, define what flags
affect reserves and clarify the effect of each flag.

Signed-off-by: Mel Gorman
Acked-by: Michal Hocko
Acked-by: Vlastimil Babka
---
 mm/internal.h   |  3 +++
 mm/page_alloc.c | 34 ++++++++++++++++++++++------------
 2 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 178484d9fd94..8706d46863df 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -749,6 +749,9 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_HIGHATOMIC	0x200 /* Allows access to MIGRATE_HIGHATOMIC */
 #define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
 
+/* Flags that allow allocations below the min watermark. */
+#define ALLOC_RESERVES (ALLOC_HARDER|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
+
 enum ttu_flags;
 struct tlbflush_unmap_batch;
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0ef4f3236a5a..6f41b84a97ac 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3949,15 +3949,14 @@ ALLOW_ERROR_INJECTION(should_fail_alloc_page, TRUE);
 static inline long __zone_watermark_unusable_free(struct zone *z,
 				unsigned int order, unsigned int alloc_flags)
 {
-	const bool alloc_harder = (alloc_flags & (ALLOC_HARDER|ALLOC_OOM));
 	long unusable_free = (1 << order) - 1;
 
 	/*
-	 * If the caller does not have rights to ALLOC_HARDER then subtract
-	 * the high-atomic reserves. This will over-estimate the size of the
-	 * atomic reserve but it avoids a search.
+	 * If the caller does not have rights to reserves below the min
+	 * watermark then subtract the high-atomic reserves. This will
+	 * over-estimate the size of the atomic reserve but it avoids a search.
 	 */
-	if (likely(!alloc_harder))
+	if (likely(!(alloc_flags & ALLOC_RESERVES)))
 		unusable_free += z->nr_reserved_highatomic;
 
 #ifdef CONFIG_CMA
@@ -3981,25 +3980,36 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 {
 	long min = mark;
 	int o;
-	const bool alloc_harder = (alloc_flags & (ALLOC_HARDER|ALLOC_OOM));
 
 	/* free_pages may go negative - that's OK */
 	free_pages -= __zone_watermark_unusable_free(z, order, alloc_flags);
 
-	if (alloc_flags & ALLOC_MIN_RESERVE)
-		min -= min / 2;
+	if (unlikely(alloc_flags & ALLOC_RESERVES)) {
+		/*
+		 * __GFP_HIGH allows access to 50% of the min reserve as well
+		 * as OOM.
+		 */
+		if (alloc_flags & ALLOC_MIN_RESERVE)
+			min -= min / 2;
 
-	if (unlikely(alloc_harder)) {
 		/*
-		 * OOM victims can try even harder than normal ALLOC_HARDER
+		 * Non-blocking allocations can access some of the reserve
+		 * with more access if also __GFP_HIGH. The reasoning is that
+		 * a non-blocking caller may incur a more severe penalty
+		 * if it cannot get memory quickly, particularly if it's
+		 * also __GFP_HIGH.
+		 */
+		if (alloc_flags & ALLOC_HARDER)
+			min -= min / 4;
+
+		/*
+		 * OOM victims can try even harder than the normal reserve
 		 * users on the grounds that it's definitely going to be in
 		 * the exit path shortly and free memory. Any allocation it
 		 * makes during the free path will be small and short-lived.
 		 */
 		if (alloc_flags & ALLOC_OOM)
 			min -= min / 2;
-		else
-			min -= min / 4;
 	}
 
 	/*
-- 
2.35.3

From: Mel Gorman
To: Linux-MM
Cc: Andrew Morton, Michal Hocko, NeilBrown, Thierry Reding, Matthew Wilcox,
    Vlastimil Babka, LKML, Mel Gorman
Subject: [PATCH 5/7] mm/page_alloc.c: Allow __GFP_NOFAIL requests deeper access to reserves
Date: Mon, 9 Jan 2023 15:16:29 +0000
Message-Id: <20230109151631.24923-6-mgorman@techsingularity.net>
In-Reply-To: <20230109151631.24923-1-mgorman@techsingularity.net>
References: <20230109151631.24923-1-mgorman@techsingularity.net>

Currently __GFP_NOFAIL allocations without any other flags can access 25%
of the reserves but these requests imply that the system cannot make
forward progress until the allocation succeeds. Allow __GFP_NOFAIL access
to 75% of the min reserve.
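
Purely as an illustration, the arithmetic that the previous patch added to
__zone_watermark_ok() can be reproduced in a small standalone program. The
flag values mirror mm/internal.h; the function below models only the
watermark reduction, not the rest of the kernel code:

	#include <stdio.h>

	#define ALLOC_HARDER		0x10
	#define ALLOC_MIN_RESERVE	0x20

	/* __GFP_HIGH (ALLOC_MIN_RESERVE) halves the cutoff; a non-blocking
	 * caller (ALLOC_HARDER) then removes a further quarter of what is
	 * left.
	 */
	static long effective_min(long min, unsigned int alloc_flags)
	{
		if (alloc_flags & ALLOC_MIN_RESERVE)
			min -= min / 2;
		if (alloc_flags & ALLOC_HARDER)
			min -= min / 4;
		return min;
	}

	int main(void)
	{
		long min = 1024;	/* pretend min watermark in pages */

		printf("no reserve flags:          %ld\n", effective_min(min, 0));
		printf("ALLOC_MIN_RESERVE:         %ld\n",
		       effective_min(min, ALLOC_MIN_RESERVE));
		printf("ALLOC_MIN_RESERVE|HARDER:  %ld\n",
		       effective_min(min, ALLOC_MIN_RESERVE | ALLOC_HARDER));
		return 0;
	}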

Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6f41b84a97ac..d2df78f5baa2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5308,7 +5308,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		 * could deplete whole memory reserves which would just make
 		 * the situation worse
 		 */
-		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_HARDER, ac);
+		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_MIN_RESERVE|ALLOC_HARDER, ac);
 		if (page)
 			goto got_pg;
 
-- 
2.35.3

From: Mel Gorman
To: Linux-MM
Cc: Andrew Morton, Michal Hocko, NeilBrown, Thierry Reding, Matthew Wilcox,
    Vlastimil Babka, LKML, Mel Gorman
Subject: [PATCH 6/7] mm/page_alloc: Give GFP_ATOMIC and non-blocking allocations access to reserves
Date: Mon, 9 Jan 2023 15:16:30 +0000
Message-Id: <20230109151631.24923-7-mgorman@techsingularity.net>
In-Reply-To: <20230109151631.24923-1-mgorman@techsingularity.net>
References: <20230109151631.24923-1-mgorman@techsingularity.net>

Explicit GFP_ATOMIC allocations get flagged ALLOC_HARDER which is a bit
vague. In preparation for removing __GFP_ATOMIC, give GFP_ATOMIC and other
non-blocking allocation requests equal access to reserves. Rename
ALLOC_HARDER to ALLOC_NON_BLOCK to make it more clear what the flag means.

Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
---
 mm/internal.h   |  7 +++++--
 mm/page_alloc.c | 23 +++++++++++++----------
 2 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 8706d46863df..23a37588073a 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -735,7 +735,10 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_OOM		ALLOC_NO_WATERMARKS
 #endif
 
-#define ALLOC_HARDER		 0x10 /* try to alloc harder */
+#define ALLOC_NON_BLOCK		 0x10 /* Caller cannot block. Allow access
+				       * to 25% of the min watermark or
+				       * 62.5% if __GFP_HIGH is set.
+				       */
 #define ALLOC_MIN_RESERVE	 0x20 /* __GFP_HIGH set. Allow access to 50%
 				       * of the min watermark.
 				       */
@@ -750,7 +753,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
 
 /* Flags that allow allocations below the min watermark. */
-#define ALLOC_RESERVES (ALLOC_HARDER|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
+#define ALLOC_RESERVES (ALLOC_NON_BLOCK|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
 
 enum ttu_flags;
 struct tlbflush_unmap_batch;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d2df78f5baa2..2217bab2dbb2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3999,7 +3999,7 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 		 * if it cannot get memory quickly, particularly if it's
 		 * also __GFP_HIGH.
 		 */
-		if (alloc_flags & ALLOC_HARDER)
+		if (alloc_flags & ALLOC_NON_BLOCK)
 			min -= min / 4;
 
 		/*
@@ -4851,28 +4851,30 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
 	 * The caller may dip into page reserves a bit more if the caller
 	 * cannot run direct reclaim, or if the caller has realtime scheduling
 	 * policy or is asking for __GFP_HIGH memory. GFP_ATOMIC requests will
-	 * set both ALLOC_HARDER (__GFP_ATOMIC) and ALLOC_MIN_RESERVE(__GFP_HIGH).
+	 * set both ALLOC_NON_BLOCK and ALLOC_MIN_RESERVE(__GFP_HIGH).
 	 */
 	alloc_flags |= (__force int)
 		(gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM));
 
-	if (gfp_mask & __GFP_ATOMIC) {
+	if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) {
 		/*
 		 * Not worth trying to allocate harder for __GFP_NOMEMALLOC even
 		 * if it can't schedule.
 		 */
 		if (!(gfp_mask & __GFP_NOMEMALLOC)) {
-			alloc_flags |= ALLOC_HARDER;
+			alloc_flags |= ALLOC_NON_BLOCK;
 
 			if (order > 0)
 				alloc_flags |= ALLOC_HIGHATOMIC;
 		}
 
 		/*
-		 * Ignore cpuset mems for GFP_ATOMIC rather than fail, see the
-		 * comment for __cpuset_node_allowed().
+		 * Ignore cpuset mems for non-blocking __GFP_HIGH (probably
+		 * GFP_ATOMIC) rather than fail, see the comment for
+		 * __cpuset_node_allowed().
 		 */
-		alloc_flags &= ~ALLOC_CPUSET;
+		if (alloc_flags & ALLOC_MIN_RESERVE)
+			alloc_flags &= ~ALLOC_CPUSET;
 	} else if (unlikely(rt_task(current)) && in_task())
 		alloc_flags |= ALLOC_MIN_RESERVE;
 
@@ -5304,11 +5306,12 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 
 		/*
 		 * Help non-failing allocations by giving them access to memory
-		 * reserves but do not use ALLOC_NO_WATERMARKS because this
+		 * reserves normally used for high priority non-blocking
+		 * allocations but do not use ALLOC_NO_WATERMARKS because this
 		 * could deplete whole memory reserves which would just make
-		 * the situation worse
+		 * the situation worse.
 		 */
-		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_MIN_RESERVE|ALLOC_HARDER, ac);
+		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_MIN_RESERVE|ALLOC_NON_BLOCK, ac);
 		if (page)
 			goto got_pg;
 
-- 
2.35.3

From: Mel Gorman
To: Linux-MM
Cc: Andrew Morton, Michal Hocko, NeilBrown, Thierry Reding, Matthew Wilcox,
    Vlastimil Babka, LKML, Mel Gorman
Subject: [PATCH 7/7] mm: discard __GFP_ATOMIC
Date: Mon, 9 Jan 2023 15:16:31 +0000
Message-Id: <20230109151631.24923-8-mgorman@techsingularity.net>
In-Reply-To: <20230109151631.24923-1-mgorman@techsingularity.net>
References: <20230109151631.24923-1-mgorman@techsingularity.net>

From: NeilBrown

__GFP_ATOMIC serves little purpose. Its main effect is to set ALLOC_HARDER
which adds a few little boosts to increase the chance of an allocation
succeeding, one of which is to lower the watermark at which it will
succeed.

It is *always* paired with __GFP_HIGH which sets ALLOC_HIGH which also
adjusts this watermark. It is probable that other users of __GFP_HIGH
should benefit from the other little bonuses that __GFP_ATOMIC gets.

__GFP_ATOMIC also gives a warning if used with __GFP_DIRECT_RECLAIM.
There is little point to this. We already get a might_sleep() warning if
__GFP_DIRECT_RECLAIM is set.

__GFP_ATOMIC allows the "watermark_boost" to be side-stepped. It is
probable that testing ALLOC_HARDER is a better fit here.

__GFP_ATOMIC is used by tegra-smmu.c to check if the allocation might
sleep. This should test __GFP_DIRECT_RECLAIM instead.

This patch:
 - removes __GFP_ATOMIC
 - allows __GFP_HIGH allocations to ignore watermark boosting as well
   as GFP_ATOMIC requests.
 - makes other adjustments as suggested by the above.

The net result is no change to GFP_ATOMIC allocations. Other allocations
that use __GFP_HIGH will benefit from a few different extra privileges.
This affects:
   xen, dm, md, ntfs3
   the vermillion frame buffer
   hibernation
   ksm
   swap
all of which likely produce more benefit than cost if these selected
allocations are more likely to succeed quickly.
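
For illustration only (standalone stand-in values rather than the kernel
headers), the point that matters for callers such as tegra-smmu is that
"may this allocation sleep" is answered by __GFP_DIRECT_RECLAIM, the bit
that gfpflags_allow_blocking() tests, and that remains clear for
GFP_ATOMIC once __GFP_ATOMIC is gone:

	#include <stdio.h>
	#include <stdbool.h>

	/* Stand-in values; the real definitions live in
	 * include/linux/gfp_types.h.
	 */
	#define __GFP_HIGH		0x20u
	#define __GFP_DIRECT_RECLAIM	0x400u
	#define __GFP_KSWAPD_RECLAIM	0x800u

	#define GFP_ATOMIC	(__GFP_HIGH | __GFP_KSWAPD_RECLAIM)	/* after this patch */
	#define GFP_NOWAIT	(__GFP_KSWAPD_RECLAIM)

	/* Same test as the kernel's gfpflags_allow_blocking() helper. */
	static bool allow_blocking(unsigned int gfp)
	{
		return gfp & __GFP_DIRECT_RECLAIM;
	}

	int main(void)
	{
		printf("GFP_ATOMIC may sleep: %d\n", allow_blocking(GFP_ATOMIC));	/* 0 */
		printf("GFP_NOWAIT may sleep: %d\n", allow_blocking(GFP_NOWAIT));	/* 0 */
		printf("__GFP_DIRECT_RECLAIM may sleep: %d\n",
		       allow_blocking(__GFP_DIRECT_RECLAIM));			/* 1 */
		return 0;
	}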

[mgorman: Minor adjustments to rework on top of a series]
Link: https://lkml.kernel.org/r/163712397076.13692.4727608274002939094@noble.neil.brown.name
Signed-off-by: NeilBrown
Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
Acked-by: Michal Hocko
---
 Documentation/mm/balance.rst   |  2 +-
 drivers/iommu/tegra-smmu.c     |  4 ++--
 include/linux/gfp_types.h      | 12 ++++--------
 include/trace/events/mmflags.h |  1 -
 lib/test_printf.c              |  8 ++++----
 mm/internal.h                  |  2 +-
 mm/page_alloc.c                | 13 +++----------
 tools/perf/builtin-kmem.c      |  1 -
 8 files changed, 15 insertions(+), 28 deletions(-)

diff --git a/Documentation/mm/balance.rst b/Documentation/mm/balance.rst
index 6a1fadf3e173..e38e9d83c1c7 100644
--- a/Documentation/mm/balance.rst
+++ b/Documentation/mm/balance.rst
@@ -6,7 +6,7 @@ Memory Balancing
 
 Started Jan 2000 by Kanoj Sarcar
 
-Memory balancing is needed for !__GFP_ATOMIC and !__GFP_KSWAPD_RECLAIM as
+Memory balancing is needed for !__GFP_HIGH and !__GFP_KSWAPD_RECLAIM as
 well as for non __GFP_IO allocations.
 
 The first reason why a caller may avoid reclaim is that the caller can not
diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 5b1af40221ec..af8d0e685260 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -671,12 +671,12 @@ static struct page *as_get_pde_page(struct tegra_smmu_as *as,
 	 * allocate page in a sleeping context if GFP flags permit. Hence
 	 * spinlock needs to be unlocked and re-locked after allocation.
 	 */
-	if (!(gfp & __GFP_ATOMIC))
+	if (gfpflags_allow_blocking(gfp))
 		spin_unlock_irqrestore(&as->lock, *flags);
 
 	page = alloc_page(gfp | __GFP_DMA | __GFP_ZERO);
 
-	if (!(gfp & __GFP_ATOMIC))
+	if (gfpflags_allow_blocking(gfp))
 		spin_lock_irqsave(&as->lock, *flags);
 
 	/*
diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
index d88c46ca82e1..5088637fe5c2 100644
--- a/include/linux/gfp_types.h
+++ b/include/linux/gfp_types.h
@@ -31,7 +31,7 @@ typedef unsigned int __bitwise gfp_t;
 #define ___GFP_IO		0x40u
 #define ___GFP_FS		0x80u
 #define ___GFP_ZERO		0x100u
-#define ___GFP_ATOMIC		0x200u
+/* 0x200u unused */
 #define ___GFP_DIRECT_RECLAIM	0x400u
 #define ___GFP_KSWAPD_RECLAIM	0x800u
 #define ___GFP_WRITE		0x1000u
@@ -116,11 +116,8 @@ typedef unsigned int __bitwise gfp_t;
 *
 * %__GFP_HIGH indicates that the caller is high-priority and that granting
 * the request is necessary before the system can make forward progress.
- * For example, creating an IO context to clean pages.
- *
- * %__GFP_ATOMIC indicates that the caller cannot reclaim or sleep and is
- * high priority. Users are typically interrupt handlers. This may be
- * used in conjunction with %__GFP_HIGH
+ * For example creating an IO context to clean pages and requests
+ * from atomic context.
 *
 * %__GFP_MEMALLOC allows access to all memory. This should only be used when
 * the caller guarantees the allocation will allow more memory to be freed
@@ -135,7 +132,6 @@ typedef unsigned int __bitwise gfp_t;
 * %__GFP_NOMEMALLOC is used to explicitly forbid access to emergency reserves.
 * This takes precedence over the %__GFP_MEMALLOC flag if both are set.
 */
-#define __GFP_ATOMIC	((__force gfp_t)___GFP_ATOMIC)
 #define __GFP_HIGH	((__force gfp_t)___GFP_HIGH)
 #define __GFP_MEMALLOC	((__force gfp_t)___GFP_MEMALLOC)
 #define __GFP_NOMEMALLOC	((__force gfp_t)___GFP_NOMEMALLOC)
@@ -329,7 +325,7 @@ typedef unsigned int __bitwise gfp_t;
 * version does not attempt reclaim/compaction at all and is by default used
 * in page fault path, while the non-light is used by khugepaged.
 */
-#define GFP_ATOMIC	(__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM)
+#define GFP_ATOMIC	(__GFP_HIGH|__GFP_KSWAPD_RECLAIM)
 #define GFP_KERNEL	(__GFP_RECLAIM | __GFP_IO | __GFP_FS)
 #define GFP_KERNEL_ACCOUNT (GFP_KERNEL | __GFP_ACCOUNT)
 #define GFP_NOWAIT	(__GFP_KSWAPD_RECLAIM)
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index 412b5a46374c..9db52bc4ce19 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -31,7 +31,6 @@
 	gfpflag_string(__GFP_HIGHMEM),		\
 	gfpflag_string(GFP_DMA32),		\
 	gfpflag_string(__GFP_HIGH),		\
-	gfpflag_string(__GFP_ATOMIC),		\
 	gfpflag_string(__GFP_IO),		\
 	gfpflag_string(__GFP_FS),		\
 	gfpflag_string(__GFP_NOWARN),		\
diff --git a/lib/test_printf.c b/lib/test_printf.c
index d34dc636b81c..46b4e6c414a3 100644
--- a/lib/test_printf.c
+++ b/lib/test_printf.c
@@ -674,17 +674,17 @@ flags(void)
 	gfp = GFP_ATOMIC|__GFP_DMA;
 	test("GFP_ATOMIC|GFP_DMA", "%pGg", &gfp);
 
-	gfp = __GFP_ATOMIC;
-	test("__GFP_ATOMIC", "%pGg", &gfp);
+	gfp = __GFP_HIGH;
+	test("__GFP_HIGH", "%pGg", &gfp);
 
 	/* Any flags not translated by the table should remain numeric */
 	gfp = ~__GFP_BITS_MASK;
 	snprintf(cmp_buffer, BUF_SIZE, "%#lx", (unsigned long) gfp);
 	test(cmp_buffer, "%pGg", &gfp);
 
-	snprintf(cmp_buffer, BUF_SIZE, "__GFP_ATOMIC|%#lx",
+	snprintf(cmp_buffer, BUF_SIZE, "__GFP_HIGH|%#lx",
 		 (unsigned long) gfp);
-	gfp |= __GFP_ATOMIC;
+	gfp |= __GFP_HIGH;
 	test(cmp_buffer, "%pGg", &gfp);
 
 	kfree(cmp_buffer);
diff --git a/mm/internal.h b/mm/internal.h
index 23a37588073a..71b1111427f3 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -24,7 +24,7 @@ struct folio_batch;
 #define GFP_RECLAIM_MASK (__GFP_RECLAIM|__GFP_HIGH|__GFP_IO|__GFP_FS|\
 			__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NOFAIL|\
 			__GFP_NORETRY|__GFP_MEMALLOC|__GFP_NOMEMALLOC|\
-			__GFP_ATOMIC|__GFP_NOLOCKDEP)
+			__GFP_NOLOCKDEP)
 
 /* The GFP flags allowed during early boot */
 #define GFP_BOOT_MASK (__GFP_BITS_MASK & ~(__GFP_RECLAIM|__GFP_IO|__GFP_FS))
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2217bab2dbb2..7244ab522028 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4086,13 +4086,14 @@ static inline bool zone_watermark_fast(struct zone *z, unsigned int order,
 	if (__zone_watermark_ok(z, order, mark, highest_zoneidx, alloc_flags,
 				free_pages))
 		return true;
+
 	/*
-	 * Ignore watermark boosting for GFP_ATOMIC order-0 allocations
+	 * Ignore watermark boosting for __GFP_HIGH order-0 allocations
 	 * when checking the min watermark. The min watermark is the
 	 * point where boosting is ignored so that kswapd is woken up
 	 * when below the low watermark.
 	 */
-	if (unlikely(!order && (gfp_mask & __GFP_ATOMIC) && z->watermark_boost
+	if (unlikely(!order && (alloc_flags & ALLOC_MIN_RESERVE) && z->watermark_boost
 		&& ((alloc_flags & ALLOC_WMARK_MASK) == WMARK_MIN))) {
 		mark = z->_watermark[WMARK_MIN];
 		return __zone_watermark_ok(z, order, mark, highest_zoneidx,
@@ -5057,14 +5058,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	unsigned int zonelist_iter_cookie;
 	int reserve_flags;
 
-	/*
-	 * We also sanity check to catch abuse of atomic reserves being used by
-	 * callers that are not in atomic context.
-	 */
-	if (WARN_ON_ONCE((gfp_mask & (__GFP_ATOMIC|__GFP_DIRECT_RECLAIM)) ==
-			 (__GFP_ATOMIC|__GFP_DIRECT_RECLAIM)))
-		gfp_mask &= ~__GFP_ATOMIC;
-
 restart:
 	compaction_retries = 0;
 	no_progress_loops = 0;
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index e20656c431a4..173d407dce92 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -641,7 +641,6 @@ static const struct {
 	{ "__GFP_HIGHMEM",		"HM" },
 	{ "GFP_DMA32",			"D32" },
 	{ "__GFP_HIGH",			"H" },
-	{ "__GFP_ATOMIC",		"_A" },
 	{ "__GFP_IO",			"I" },
 	{ "__GFP_FS",			"F" },
 	{ "__GFP_NOWARN",		"NWR" },
-- 
2.35.3