From nobody Sat Feb 7 07:29:35 2026
From: Mel Gorman <mgorman@techsingularity.net>
To: Andrew Morton
Cc: Michal Hocko, NeilBrown, Thierry Reding, Matthew Wilcox,
    Vlastimil Babka, Linux-MM, LKML, Mel Gorman
Subject: [PATCH 1/6] mm/page_alloc: Rename ALLOC_HIGH to ALLOC_MIN_RESERVE
Date: Fri, 13 Jan 2023 11:12:12 +0000
Message-Id: <20230113111217.14134-2-mgorman@techsingularity.net>
In-Reply-To: <20230113111217.14134-1-mgorman@techsingularity.net>
References: <20230113111217.14134-1-mgorman@techsingularity.net>

__GFP_HIGH aliases to ALLOC_HIGH but the name does not really hint what
it means. As ALLOC_HIGH is internal to the allocator, rename it to
ALLOC_MIN_RESERVE to document that the min reserves can be depleted.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka
Acked-by: Michal Hocko
---
 mm/internal.h   | 4 +++-
 mm/page_alloc.c | 8 ++++----
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index bcf75a8b032d..403e4386626d 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -736,7 +736,9 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #endif
 
 #define ALLOC_HARDER            0x10 /* try to alloc harder */
-#define ALLOC_HIGH              0x20 /* __GFP_HIGH set */
+#define ALLOC_MIN_RESERVE       0x20 /* __GFP_HIGH set. Allow access to 50%
+                                      * of the min watermark.
+                                      */
 #define ALLOC_CPUSET            0x40 /* check for correct cpuset */
 #define ALLOC_CMA               0x80 /* allow allocations from CMA areas */
 #ifdef CONFIG_ZONE_DMA32
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0745aedebb37..244c1e675dc8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3976,7 +3976,7 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
        /* free_pages may go negative - that's OK */
        free_pages -= __zone_watermark_unusable_free(z, order, alloc_flags);
 
-       if (alloc_flags & ALLOC_HIGH)
+       if (alloc_flags & ALLOC_MIN_RESERVE)
                min -= min / 2;
 
        if (unlikely(alloc_harder)) {
@@ -4818,18 +4818,18 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
        unsigned int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
 
        /*
-        * __GFP_HIGH is assumed to be the same as ALLOC_HIGH
+        * __GFP_HIGH is assumed to be the same as ALLOC_MIN_RESERVE
         * and __GFP_KSWAPD_RECLAIM is assumed to be the same as ALLOC_KSWAPD
         * to save two branches.
         */
-       BUILD_BUG_ON(__GFP_HIGH != (__force gfp_t) ALLOC_HIGH);
+       BUILD_BUG_ON(__GFP_HIGH != (__force gfp_t) ALLOC_MIN_RESERVE);
        BUILD_BUG_ON(__GFP_KSWAPD_RECLAIM != (__force gfp_t) ALLOC_KSWAPD);
 
        /*
         * The caller may dip into page reserves a bit more if the caller
         * cannot run direct reclaim, or if the caller has realtime scheduling
         * policy or is asking for __GFP_HIGH memory. GFP_ATOMIC requests will
-        * set both ALLOC_HARDER (__GFP_ATOMIC) and ALLOC_HIGH (__GFP_HIGH).
+        * set both ALLOC_HARDER (__GFP_ATOMIC) and ALLOC_MIN_RESERVE (__GFP_HIGH).
         */
        alloc_flags |= (__force int)
                (gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM));
-- 
2.35.3
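[The practical effect of the renamed flag is easiest to see in isolation.
The following sketch is ordinary userspace C, not kernel code:
watermark_ok() is an invented stand-in for __zone_watermark_ok() and
reproduces only the min/2 discount from the hunk above.]

#include <stdbool.h>
#include <stdio.h>

#define ALLOC_MIN_RESERVE 0x20  /* same bit value as mm/internal.h above */

/* Invented stand-in for __zone_watermark_ok(): a request passes if the
 * zone would stay above its (possibly discounted) min watermark. */
static bool watermark_ok(long free_pages, long min, unsigned int alloc_flags)
{
        if (alloc_flags & ALLOC_MIN_RESERVE)
                min -= min / 2; /* __GFP_HIGH may consume 50% of the reserve */

        return free_pages > min;
}

int main(void)
{
        /* 100 free pages against a min watermark of 128: only the
         * ALLOC_MIN_RESERVE request is allowed to proceed. */
        printf("normal request:    %d\n", watermark_ok(100, 128, 0));
        printf("ALLOC_MIN_RESERVE: %d\n", watermark_ok(100, 128, ALLOC_MIN_RESERVE));
        return 0;
}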
From nobody Sat Feb 7 07:29:35 2026
From: Mel Gorman <mgorman@techsingularity.net>
To: Andrew Morton
Cc: Michal Hocko, NeilBrown, Thierry Reding, Matthew Wilcox,
    Vlastimil Babka, Linux-MM, LKML, Mel Gorman
Subject: [PATCH 2/6] mm/page_alloc: Treat RT tasks similar to __GFP_HIGH
Date: Fri, 13 Jan 2023 11:12:13 +0000
Message-Id: <20230113111217.14134-3-mgorman@techsingularity.net>
In-Reply-To: <20230113111217.14134-1-mgorman@techsingularity.net>
References: <20230113111217.14134-1-mgorman@techsingularity.net>

RT tasks are allowed to dip below the min reserve, but ALLOC_HARDER is
typically combined with ALLOC_MIN_RESERVE, so RT tasks are a little
unusual. While there is some justification for allowing RT tasks access
to memory reserves, there is a strong chance that an RT task that is
also under memory pressure is at risk of missing deadlines anyway.
Relax how much of the reserves an RT task can access by treating it the
same as __GFP_HIGH allocations.

Note that in a future kernel release the RT special casing will be
removed. Hard realtime tasks should be locking down resources in
advance and ensuring enough memory is available. Even a soft-realtime
task like audio or video live decoding, which cannot jitter, should be
allocating both memory and any disk space required up-front before the
recording starts instead of relying on reserves. At best, reserve
access will only delay the problem by a very short interval.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka
Acked-by: Michal Hocko
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 244c1e675dc8..0040b4e00913 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4847,7 +4847,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
         */
                alloc_flags &= ~ALLOC_CPUSET;
        } else if (unlikely(rt_task(current)) && in_task())
-               alloc_flags |= ALLOC_HARDER;
+               alloc_flags |= ALLOC_MIN_RESERVE;
 
        alloc_flags = gfp_to_alloc_flags_cma(gfp_mask, alloc_flags);
 
-- 
2.35.3
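[To make the change concrete, here is a minimal before/after model of
the watermark an RT task must clear. It is illustrative userspace C
assuming the flag bits and discounts from patch 1; OOM handling and the
high-atomic reserve are omitted for brevity.]

#include <stdio.h>

/* Discounts as implemented in __zone_watermark_ok() at this point in
 * the series; flag bits mirror mm/internal.h. */
#define ALLOC_HARDER      0x10
#define ALLOC_MIN_RESERVE 0x20

static long effective_min(long min, unsigned int alloc_flags)
{
        if (alloc_flags & ALLOC_MIN_RESERVE)
                min -= min / 2;
        if (alloc_flags & ALLOC_HARDER)
                min -= min / 4;
        return min;
}

int main(void)
{
        long min = 128;

        /* Before: RT tasks got ALLOC_HARDER and had to stay above 75%
         * of the min watermark. After: they get ALLOC_MIN_RESERVE and
         * may dip to 50%, the same as __GFP_HIGH callers. */
        printf("RT before (ALLOC_HARDER):     %ld\n",
               effective_min(min, ALLOC_HARDER));      /* 96 */
        printf("RT after (ALLOC_MIN_RESERVE): %ld\n",
               effective_min(min, ALLOC_MIN_RESERVE)); /* 64 */
        return 0;
}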
From nobody Sat Feb 7 07:29:35 2026
From: Mel Gorman <mgorman@techsingularity.net>
To: Andrew Morton
Cc: Michal Hocko, NeilBrown, Thierry Reding, Matthew Wilcox,
    Vlastimil Babka, Linux-MM, LKML, Mel Gorman
Subject: [PATCH 3/6] mm/page_alloc: Explicitly record high-order atomic allocations in alloc_flags
Date: Fri, 13 Jan 2023 11:12:14 +0000
Message-Id: <20230113111217.14134-4-mgorman@techsingularity.net>
In-Reply-To: <20230113111217.14134-1-mgorman@techsingularity.net>
References: <20230113111217.14134-1-mgorman@techsingularity.net>

A high-order ALLOC_HARDER allocation is assumed to be atomic. While
that assumption is currently accurate, it will change later in the
series. In preparation, explicitly record high-order atomic allocations
in gfp_to_alloc_flags().

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka
Acked-by: Michal Hocko
---
 mm/internal.h   |  1 +
 mm/page_alloc.c | 29 +++++++++++++++++++++++------
 2 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 403e4386626d..178484d9fd94 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -746,6 +746,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #else
 #define ALLOC_NOFRAGMENT        0x0
 #endif
+#define ALLOC_HIGHATOMIC        0x200 /* Allows access to MIGRATE_HIGHATOMIC */
 #define ALLOC_KSWAPD            0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
 
 enum ttu_flags;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0040b4e00913..0ef4f3236a5a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3706,10 +3706,20 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
         * reserved for high-order atomic allocation, so order-0
         * request should skip it.
         */
-       if (order > 0 && alloc_flags & ALLOC_HARDER)
+       if (alloc_flags & ALLOC_HIGHATOMIC)
                page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
        if (!page) {
                page = __rmqueue(zone, order, migratetype, alloc_flags);
+
+               /*
+                * If the allocation fails, allow OOM handling access
+                * to HIGHATOMIC reserves as failing now is worse than
+                * failing a high-order atomic allocation in the
+                * future.
+                */
+               if (!page && (alloc_flags & ALLOC_OOM))
+                       page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
+
                if (!page) {
                        spin_unlock_irqrestore(&zone->lock, flags);
                        return NULL;
@@ -4023,8 +4033,10 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
                        return true;
                }
 #endif
-               if (alloc_harder && !free_area_empty(area, MIGRATE_HIGHATOMIC))
+               if ((alloc_flags & (ALLOC_HIGHATOMIC|ALLOC_OOM)) &&
+                   !free_area_empty(area, MIGRATE_HIGHATOMIC)) {
                        return true;
+               }
        }
        return false;
 }
@@ -4286,7 +4298,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
                         * If this is a high-order atomic allocation then check
                         * if the pageblock should be reserved for the future
                         */
-                       if (unlikely(order && (alloc_flags & ALLOC_HARDER)))
+                       if (unlikely(alloc_flags & ALLOC_HIGHATOMIC))
                                reserve_highatomic_pageblock(page, zone, order);
 
                        return page;
@@ -4813,7 +4825,7 @@ static void wake_all_kswapds(unsigned int order, gfp_t gfp_mask,
 }
 
 static inline unsigned int
-gfp_to_alloc_flags(gfp_t gfp_mask)
+gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
 {
        unsigned int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
 
@@ -4839,8 +4851,13 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
         * Not worth trying to allocate harder for __GFP_NOMEMALLOC even
         * if it can't schedule.
         */
-       if (!(gfp_mask & __GFP_NOMEMALLOC))
+       if (!(gfp_mask & __GFP_NOMEMALLOC)) {
                alloc_flags |= ALLOC_HARDER;
+
+               if (order > 0)
+                       alloc_flags |= ALLOC_HIGHATOMIC;
+       }
+
        /*
         * Ignore cpuset mems for GFP_ATOMIC rather than fail, see the
         * comment for __cpuset_node_allowed().
@@ -5048,7 +5065,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
         * kswapd needs to be woken up, and to avoid the cost of setting up
         * alloc_flags precisely. So we do that now.
         */
-       alloc_flags = gfp_to_alloc_flags(gfp_mask);
+       alloc_flags = gfp_to_alloc_flags(gfp_mask, order);
 
        /*
         * We need to recalculate the starting point for the zonelist iterator
-- 
2.35.3
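[A small sketch of the flag derivation this patch introduces may help.
It is illustrative userspace C; atomic_alloc_flags() is an invented
stand-in for the relevant branch of gfp_to_alloc_flags().]

#include <stdbool.h>
#include <stdio.h>

#define ALLOC_HARDER     0x10
#define ALLOC_HIGHATOMIC 0x200  /* same bit as the mm/internal.h hunk */

/* Model of the gfp_to_alloc_flags() change above: the "harder" and
 * "highatomic" decisions are now recorded separately, and only
 * high-order (order > 0) atomic requests are tagged for the
 * MIGRATE_HIGHATOMIC reserve. */
static unsigned int atomic_alloc_flags(unsigned int order, bool nomemalloc)
{
        unsigned int alloc_flags = 0;

        if (!nomemalloc) {
                alloc_flags |= ALLOC_HARDER;
                if (order > 0)
                        alloc_flags |= ALLOC_HIGHATOMIC;
        }
        return alloc_flags;
}

int main(void)
{
        printf("order-0 atomic: 0x%x\n", atomic_alloc_flags(0, false)); /* 0x10 */
        printf("order-3 atomic: 0x%x\n", atomic_alloc_flags(3, false)); /* 0x210 */
        return 0;
}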
From nobody Sat Feb 7 07:29:35 2026
From: Mel Gorman <mgorman@techsingularity.net>
To: Andrew Morton
Cc: Michal Hocko, NeilBrown, Thierry Reding, Matthew Wilcox,
    Vlastimil Babka, Linux-MM, LKML, Mel Gorman
Subject: [PATCH 4/6] mm/page_alloc: Explicitly define what alloc flags deplete min reserves
Date: Fri, 13 Jan 2023 11:12:15 +0000
Message-Id: <20230113111217.14134-5-mgorman@techsingularity.net>
In-Reply-To: <20230113111217.14134-1-mgorman@techsingularity.net>
References: <20230113111217.14134-1-mgorman@techsingularity.net>

As more ALLOC_ flags now affect the min reserves, define which flags
affect them and clarify the effect of each flag.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka
Acked-by: Michal Hocko
---
 mm/internal.h   |  3 +++
 mm/page_alloc.c | 34 ++++++++++++++++++++++------------
 2 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 178484d9fd94..8706d46863df 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -749,6 +749,9 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_HIGHATOMIC        0x200 /* Allows access to MIGRATE_HIGHATOMIC */
 #define ALLOC_KSWAPD            0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
 
+/* Flags that allow allocations below the min watermark. */
+#define ALLOC_RESERVES (ALLOC_HARDER|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
+
 enum ttu_flags;
 struct tlbflush_unmap_batch;
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0ef4f3236a5a..6f41b84a97ac 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3949,15 +3949,14 @@ ALLOW_ERROR_INJECTION(should_fail_alloc_page, TRUE);
 static inline long __zone_watermark_unusable_free(struct zone *z,
                                unsigned int order, unsigned int alloc_flags)
 {
-       const bool alloc_harder = (alloc_flags & (ALLOC_HARDER|ALLOC_OOM));
        long unusable_free = (1 << order) - 1;
 
        /*
-        * If the caller does not have rights to ALLOC_HARDER then subtract
-        * the high-atomic reserves. This will over-estimate the size of the
-        * atomic reserve but it avoids a search.
+        * If the caller does not have rights to reserves below the min
+        * watermark then subtract the high-atomic reserves. This will
+        * over-estimate the size of the atomic reserve but it avoids a search.
         */
-       if (likely(!alloc_harder))
+       if (likely(!(alloc_flags & ALLOC_RESERVES)))
                unusable_free += z->nr_reserved_highatomic;
 
 #ifdef CONFIG_CMA
@@ -3981,25 +3980,36 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 {
        long min = mark;
        int o;
-       const bool alloc_harder = (alloc_flags & (ALLOC_HARDER|ALLOC_OOM));
 
        /* free_pages may go negative - that's OK */
        free_pages -= __zone_watermark_unusable_free(z, order, alloc_flags);
 
-       if (alloc_flags & ALLOC_MIN_RESERVE)
-               min -= min / 2;
+       if (unlikely(alloc_flags & ALLOC_RESERVES)) {
+               /*
+                * __GFP_HIGH allows access to 50% of the min reserve as well
+                * as OOM.
+                */
+               if (alloc_flags & ALLOC_MIN_RESERVE)
+                       min -= min / 2;
 
-       if (unlikely(alloc_harder)) {
                /*
-                * OOM victims can try even harder than normal ALLOC_HARDER
+                * Non-blocking allocations can access some of the reserve
+                * with more access if also __GFP_HIGH. The reasoning is that
+                * a non-blocking caller may incur a more severe penalty
+                * if it cannot get memory quickly, particularly if it's
+                * also __GFP_HIGH.
+                */
+               if (alloc_flags & ALLOC_HARDER)
+                       min -= min / 4;
+
+               /*
+                * OOM victims can try even harder than the normal reserve
                 * users on the grounds that it's definitely going to be in
                 * the exit path shortly and free memory. Any allocation it
                 * makes during the free path will be small and short-lived.
                 */
                if (alloc_flags & ALLOC_OOM)
                        min -= min / 2;
-               else
-                       min -= min / 4;
        }
 
        /*
-- 
2.35.3
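[The cumulative discounts gathered under ALLOC_RESERVES can be worked
through numerically. This sketch is illustrative userspace C mirroring
the __zone_watermark_ok() hunk above; ALLOC_OOM's value of 0x400 is an
assumption, since the real definition depends on kernel configuration.]

#include <stdio.h>

#define ALLOC_HARDER      0x10
#define ALLOC_MIN_RESERVE 0x20
#define ALLOC_HIGHATOMIC  0x200
#define ALLOC_OOM         0x400 /* illustrative value only */
#define ALLOC_RESERVES \
        (ALLOC_HARDER|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)

/* The discount sequence from the __zone_watermark_ok() hunk above. */
static long effective_min(long min, unsigned int alloc_flags)
{
        if (alloc_flags & ALLOC_RESERVES) {
                if (alloc_flags & ALLOC_MIN_RESERVE)
                        min -= min / 2; /* __GFP_HIGH: dip to 50% */
                if (alloc_flags & ALLOC_HARDER)
                        min -= min / 4; /* a further 25% of the remainder */
                if (alloc_flags & ALLOC_OOM)
                        min -= min / 2; /* OOM victims dig deepest */
        }
        return min;
}

int main(void)
{
        long min = 128;

        printf("GFP_ATOMIC-style: %ld\n",
               effective_min(min, ALLOC_MIN_RESERVE|ALLOC_HARDER)); /* 48 */
        printf("OOM victim:       %ld\n",
               effective_min(min, ALLOC_OOM));                      /* 64 */
        return 0;
}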
From nobody Sat Feb 7 07:29:35 2026
From: Mel Gorman <mgorman@techsingularity.net>
To: Andrew Morton
Cc: Michal Hocko, NeilBrown, Thierry Reding, Matthew Wilcox,
    Vlastimil Babka, Linux-MM, LKML, Mel Gorman
Subject: [PATCH 5/6] mm/page_alloc: Explicitly define how __GFP_HIGH non-blocking allocations access reserves
Date: Fri, 13 Jan 2023 11:12:16 +0000
Message-Id: <20230113111217.14134-6-mgorman@techsingularity.net>
In-Reply-To: <20230113111217.14134-1-mgorman@techsingularity.net>
References: <20230113111217.14134-1-mgorman@techsingularity.net>

GFP_ATOMIC allocations get flagged ALLOC_HARDER, which is a vague
description. In preparation for the removal of __GFP_ATOMIC, redefine
__GFP_ATOMIC to simply mean non-blocking and rename ALLOC_HARDER to
ALLOC_NON_BLOCK accordingly. __GFP_HIGH is required for access to
reserves, but non-blocking is granted more access. For example,
GFP_NOWAIT is non-blocking but has no special access to reserves. A
__GFP_NOFAIL blocking allocation is granted access similar to
__GFP_HIGH if the only alternative is an OOM kill.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Michal Hocko
Acked-by: Vlastimil Babka
---
 mm/internal.h   |  7 +++++--
 mm/page_alloc.c | 44 ++++++++++++++++++++++++--------------------
 2 files changed, 29 insertions(+), 22 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 8706d46863df..23a37588073a 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -735,7 +735,10 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_OOM               ALLOC_NO_WATERMARKS
 #endif
 
-#define ALLOC_HARDER            0x10 /* try to alloc harder */
+#define ALLOC_NON_BLOCK         0x10 /* Caller cannot block. Allow access
+                                      * to 25% of the min watermark or
+                                      * 62.5% if __GFP_HIGH is set.
+                                      */
 #define ALLOC_MIN_RESERVE       0x20 /* __GFP_HIGH set. Allow access to 50%
                                       * of the min watermark.
                                       */
@@ -750,7 +753,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_KSWAPD            0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */
 
 /* Flags that allow allocations below the min watermark. */
-#define ALLOC_RESERVES (ALLOC_HARDER|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
+#define ALLOC_RESERVES (ALLOC_NON_BLOCK|ALLOC_MIN_RESERVE|ALLOC_HIGHATOMIC|ALLOC_OOM)
 
 enum ttu_flags;
 struct tlbflush_unmap_batch;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6f41b84a97ac..b9ae0ba0a2ab 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3989,18 +3989,19 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
                 * __GFP_HIGH allows access to 50% of the min reserve as well
                 * as OOM.
                 */
-               if (alloc_flags & ALLOC_MIN_RESERVE)
+               if (alloc_flags & ALLOC_MIN_RESERVE) {
                        min -= min / 2;
 
-               /*
-                * Non-blocking allocations can access some of the reserve
-                * with more access if also __GFP_HIGH. The reasoning is that
-                * a non-blocking caller may incur a more severe penalty
-                * if it cannot get memory quickly, particularly if it's
-                * also __GFP_HIGH.
-                */
-               if (alloc_flags & ALLOC_HARDER)
-                       min -= min / 4;
+                       /*
+                        * Non-blocking allocations (e.g. GFP_ATOMIC) can
+                        * access more reserves than just __GFP_HIGH. Other
+                        * non-blocking allocation requests such as GFP_NOWAIT
+                        * or (GFP_KERNEL & ~__GFP_DIRECT_RECLAIM) do not get
+                        * access to the min reserve.
+                        */
+                       if (alloc_flags & ALLOC_NON_BLOCK)
+                               min -= min / 4;
+               }
 
                /*
                 * OOM victims can try even harder than the normal reserve
@@ -4851,28 +4852,30 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
         * The caller may dip into page reserves a bit more if the caller
         * cannot run direct reclaim, or if the caller has realtime scheduling
         * policy or is asking for __GFP_HIGH memory. GFP_ATOMIC requests will
-        * set both ALLOC_HARDER (__GFP_ATOMIC) and ALLOC_MIN_RESERVE (__GFP_HIGH).
+        * set both ALLOC_NON_BLOCK and ALLOC_MIN_RESERVE (__GFP_HIGH).
         */
        alloc_flags |= (__force int)
                (gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM));
 
-       if (gfp_mask & __GFP_ATOMIC) {
+       if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) {
                /*
                 * Not worth trying to allocate harder for __GFP_NOMEMALLOC even
                 * if it can't schedule.
                 */
                if (!(gfp_mask & __GFP_NOMEMALLOC)) {
-                       alloc_flags |= ALLOC_HARDER;
+                       alloc_flags |= ALLOC_NON_BLOCK;
 
                        if (order > 0)
                                alloc_flags |= ALLOC_HIGHATOMIC;
                }
 
                /*
-                * Ignore cpuset mems for GFP_ATOMIC rather than fail, see the
-                * comment for __cpuset_node_allowed().
+                * Ignore cpuset mems for non-blocking __GFP_HIGH (probably
+                * GFP_ATOMIC) rather than fail, see the comment for
+                * __cpuset_node_allowed().
                 */
-               alloc_flags &= ~ALLOC_CPUSET;
+               if (alloc_flags & ALLOC_MIN_RESERVE)
+                       alloc_flags &= ~ALLOC_CPUSET;
        } else if (unlikely(rt_task(current)) && in_task())
                alloc_flags |= ALLOC_MIN_RESERVE;
 
@@ -5303,12 +5306,13 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
                        WARN_ON_ONCE_GFP(costly_order, gfp_mask);
 
                        /*
-                        * Help non-failing allocations by giving them access to memory
-                        * reserves but do not use ALLOC_NO_WATERMARKS because this
+                        * Help non-failing allocations by giving some access to memory
+                        * reserves normally used for high priority non-blocking
+                        * allocations but do not use ALLOC_NO_WATERMARKS because this
                         * could deplete whole memory reserves which would just make
-                        * the situation worse
+                        * the situation worse.
                         */
-                       page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_HARDER, ac);
+                       page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_MIN_RESERVE, ac);
                        if (page)
                                goto got_pg;
 
-- 
2.35.3
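[The nesting introduced here changes who gets reserve access, which the
following illustrative userspace C makes explicit; effective_min() is
again an invented stand-in for the watermark arithmetic above.]

#include <stdio.h>

#define ALLOC_NON_BLOCK   0x10  /* renamed from ALLOC_HARDER */
#define ALLOC_MIN_RESERVE 0x20

/* After this patch the non-blocking discount applies only on top of
 * ALLOC_MIN_RESERVE: plain GFP_NOWAIT gets no reserve access, while
 * GFP_ATOMIC (__GFP_HIGH plus non-blocking) may dip to 37.5% of min,
 * i.e. it can consume 62.5% of the min reserve. */
static long effective_min(long min, unsigned int alloc_flags)
{
        if (alloc_flags & ALLOC_MIN_RESERVE) {
                min -= min / 2;
                if (alloc_flags & ALLOC_NON_BLOCK)
                        min -= min / 4;
        }
        return min;
}

int main(void)
{
        long min = 128;

        printf("GFP_NOWAIT-style: %ld\n",
               effective_min(min, ALLOC_NON_BLOCK));                   /* 128 */
        printf("__GFP_HIGH:       %ld\n",
               effective_min(min, ALLOC_MIN_RESERVE));                 /* 64 */
        printf("GFP_ATOMIC-style: %ld\n",
               effective_min(min, ALLOC_MIN_RESERVE|ALLOC_NON_BLOCK)); /* 48 */
        return 0;
}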
From nobody Sat Feb 7 07:29:35 2026
From: Mel Gorman <mgorman@techsingularity.net>
To: Andrew Morton
Cc: Michal Hocko, NeilBrown, Thierry Reding, Matthew Wilcox,
    Vlastimil Babka, Linux-MM, LKML, Mel Gorman
Subject: [PATCH 6/6] mm: discard __GFP_ATOMIC
Date: Fri, 13 Jan 2023 11:12:17 +0000
Message-Id: <20230113111217.14134-7-mgorman@techsingularity.net>
In-Reply-To: <20230113111217.14134-1-mgorman@techsingularity.net>
References: <20230113111217.14134-1-mgorman@techsingularity.net>

From: NeilBrown

__GFP_ATOMIC serves little purpose. Its main effect is to set
ALLOC_HARDER, which adds a few little boosts to increase the chance of
an allocation succeeding, one of which is to lower the water-mark at
which it will succeed.

It is *always* paired with __GFP_HIGH, which sets ALLOC_HIGH and also
adjusts this watermark. It is probable that other users of __GFP_HIGH
should benefit from the other little bonuses that __GFP_ATOMIC gets.

__GFP_ATOMIC also gives a warning if used with __GFP_DIRECT_RECLAIM.
There is little point to this. We already get a might_sleep() warning
if __GFP_DIRECT_RECLAIM is set.

__GFP_ATOMIC allows the "watermark_boost" to be side-stepped. It is
probable that testing ALLOC_HARDER is a better fit here.

__GFP_ATOMIC is used by tegra-smmu.c to check if the allocation might
sleep. This should test __GFP_DIRECT_RECLAIM instead.

This patch:
 - removes __GFP_ATOMIC
 - allows __GFP_HIGH allocations to ignore watermark boosting as well
   as GFP_ATOMIC requests
 - makes other adjustments as suggested by the above

The net result is no change to GFP_ATOMIC allocations. Other
allocations that use __GFP_HIGH will benefit from a few different
extra privileges. This affects:
 - xen, dm, md, ntfs3
 - the vermillion frame buffer
 - hibernation
 - ksm
 - swap
all of which likely produce more benefit than cost if these selected
allocations are more likely to succeed quickly.

[mgorman: Minor adjustments to rework on top of a series]
Link: https://lkml.kernel.org/r/163712397076.13692.4727608274002939094@noble.neil.brown.name
Signed-off-by: NeilBrown
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka
Acked-by: Michal Hocko
---
 Documentation/mm/balance.rst   |  2 +-
 drivers/iommu/tegra-smmu.c     |  4 ++--
 include/linux/gfp_types.h      | 12 ++++--------
 include/trace/events/mmflags.h |  1 -
 lib/test_printf.c              |  8 ++++----
 mm/internal.h                  |  2 +-
 mm/page_alloc.c                | 13 +++----------
 tools/perf/builtin-kmem.c      |  1 -
 8 files changed, 15 insertions(+), 28 deletions(-)

diff --git a/Documentation/mm/balance.rst b/Documentation/mm/balance.rst
index 6a1fadf3e173..e38e9d83c1c7 100644
--- a/Documentation/mm/balance.rst
+++ b/Documentation/mm/balance.rst
@@ -6,7 +6,7 @@ Memory Balancing
 
 Started Jan 2000 by Kanoj Sarcar
 
-Memory balancing is needed for !__GFP_ATOMIC and !__GFP_KSWAPD_RECLAIM as
+Memory balancing is needed for !__GFP_HIGH and !__GFP_KSWAPD_RECLAIM as
 well as for non __GFP_IO allocations.
 
 The first reason why a caller may avoid reclaim is that the caller can not
diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 5b1af40221ec..af8d0e685260 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -671,12 +671,12 @@ static struct page *as_get_pde_page(struct tegra_smmu_as *as,
         * allocate page in a sleeping context if GFP flags permit. Hence
         * spinlock needs to be unlocked and re-locked after allocation.
         */
-       if (!(gfp & __GFP_ATOMIC))
+       if (gfpflags_allow_blocking(gfp))
                spin_unlock_irqrestore(&as->lock, *flags);
 
        page = alloc_page(gfp | __GFP_DMA | __GFP_ZERO);
 
-       if (!(gfp & __GFP_ATOMIC))
+       if (gfpflags_allow_blocking(gfp))
                spin_lock_irqsave(&as->lock, *flags);
 
        /*
diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
index d88c46ca82e1..5088637fe5c2 100644
--- a/include/linux/gfp_types.h
+++ b/include/linux/gfp_types.h
@@ -31,7 +31,7 @@ typedef unsigned int __bitwise gfp_t;
 #define ___GFP_IO               0x40u
 #define ___GFP_FS               0x80u
 #define ___GFP_ZERO             0x100u
-#define ___GFP_ATOMIC           0x200u
+/* 0x200u unused */
 #define ___GFP_DIRECT_RECLAIM   0x400u
 #define ___GFP_KSWAPD_RECLAIM   0x800u
 #define ___GFP_WRITE            0x1000u
@@ -116,11 +116,8 @@ typedef unsigned int __bitwise gfp_t;
 *
 * %__GFP_HIGH indicates that the caller is high-priority and that granting
 * the request is necessary before the system can make forward progress.
- * For example, creating an IO context to clean pages.
- *
- * %__GFP_ATOMIC indicates that the caller cannot reclaim or sleep and is
- * high priority. Users are typically interrupt handlers. This may be
- * used in conjunction with %__GFP_HIGH
+ * For example, creating an IO context to clean pages and requests
+ * from atomic context.
 *
 * %__GFP_MEMALLOC allows access to all memory. This should only be used when
 * the caller guarantees the allocation will allow more memory to be freed
@@ -135,7 +132,6 @@ typedef unsigned int __bitwise gfp_t;
 * %__GFP_NOMEMALLOC is used to explicitly forbid access to emergency reserves.
 * This takes precedence over the %__GFP_MEMALLOC flag if both are set.
 */
-#define __GFP_ATOMIC ((__force gfp_t)___GFP_ATOMIC)
 #define __GFP_HIGH      ((__force gfp_t)___GFP_HIGH)
 #define __GFP_MEMALLOC  ((__force gfp_t)___GFP_MEMALLOC)
 #define __GFP_NOMEMALLOC ((__force gfp_t)___GFP_NOMEMALLOC)
@@ -329,7 +325,7 @@ typedef unsigned int __bitwise gfp_t;
 * version does not attempt reclaim/compaction at all and is by default used
 * in page fault path, while the non-light is used by khugepaged.
 */
-#define GFP_ATOMIC      (__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM)
+#define GFP_ATOMIC      (__GFP_HIGH|__GFP_KSWAPD_RECLAIM)
 #define GFP_KERNEL      (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
 #define GFP_KERNEL_ACCOUNT (GFP_KERNEL | __GFP_ACCOUNT)
 #define GFP_NOWAIT      (__GFP_KSWAPD_RECLAIM)
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index 412b5a46374c..9db52bc4ce19 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -31,7 +31,6 @@
        gfpflag_string(__GFP_HIGHMEM),          \
        gfpflag_string(GFP_DMA32),              \
        gfpflag_string(__GFP_HIGH),             \
-       gfpflag_string(__GFP_ATOMIC),           \
        gfpflag_string(__GFP_IO),               \
        gfpflag_string(__GFP_FS),               \
        gfpflag_string(__GFP_NOWARN),           \
diff --git a/lib/test_printf.c b/lib/test_printf.c
index d34dc636b81c..46b4e6c414a3 100644
--- a/lib/test_printf.c
+++ b/lib/test_printf.c
@@ -674,17 +674,17 @@ flags(void)
        gfp = GFP_ATOMIC|__GFP_DMA;
        test("GFP_ATOMIC|GFP_DMA", "%pGg", &gfp);
 
-       gfp = __GFP_ATOMIC;
-       test("__GFP_ATOMIC", "%pGg", &gfp);
+       gfp = __GFP_HIGH;
+       test("__GFP_HIGH", "%pGg", &gfp);
 
        /* Any flags not translated by the table should remain numeric */
        gfp = ~__GFP_BITS_MASK;
        snprintf(cmp_buffer, BUF_SIZE, "%#lx", (unsigned long) gfp);
        test(cmp_buffer, "%pGg", &gfp);
 
-       snprintf(cmp_buffer, BUF_SIZE, "__GFP_ATOMIC|%#lx",
+       snprintf(cmp_buffer, BUF_SIZE, "__GFP_HIGH|%#lx",
                 (unsigned long) gfp);
-       gfp |= __GFP_ATOMIC;
+       gfp |= __GFP_HIGH;
        test(cmp_buffer, "%pGg", &gfp);
 
        kfree(cmp_buffer);
diff --git a/mm/internal.h b/mm/internal.h
index 23a37588073a..71b1111427f3 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -24,7 +24,7 @@ struct folio_batch;
 #define GFP_RECLAIM_MASK (__GFP_RECLAIM|__GFP_HIGH|__GFP_IO|__GFP_FS|\
                        __GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NOFAIL|\
                        __GFP_NORETRY|__GFP_MEMALLOC|__GFP_NOMEMALLOC|\
-                       __GFP_ATOMIC|__GFP_NOLOCKDEP)
+                       __GFP_NOLOCKDEP)
 
 /* The GFP flags allowed during early boot */
 #define GFP_BOOT_MASK (__GFP_BITS_MASK & ~(__GFP_RECLAIM|__GFP_IO|__GFP_FS))
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b9ae0ba0a2ab..78ffebc4798b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4087,13 +4087,14 @@ static inline bool zone_watermark_fast(struct zone *z, unsigned int order,
        if (__zone_watermark_ok(z, order, mark, highest_zoneidx, alloc_flags,
                                free_pages))
                return true;
+
        /*
-        * Ignore watermark boosting for GFP_ATOMIC order-0 allocations
+        * Ignore watermark boosting for __GFP_HIGH order-0 allocations
         * when checking the min watermark. The min watermark is the
         * point where boosting is ignored so that kswapd is woken up
         * when below the low watermark.
         */
-       if (unlikely(!order && (gfp_mask & __GFP_ATOMIC) && z->watermark_boost
+       if (unlikely(!order && (alloc_flags & ALLOC_MIN_RESERVE) && z->watermark_boost
                     && ((alloc_flags & ALLOC_WMARK_MASK) == WMARK_MIN))) {
                mark = z->_watermark[WMARK_MIN];
                return __zone_watermark_ok(z, order, mark, highest_zoneidx,
@@ -5058,14 +5059,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
        unsigned int zonelist_iter_cookie;
        int reserve_flags;
 
-       /*
-        * We also sanity check to catch abuse of atomic reserves being used by
-        * callers that are not in atomic context.
-        */
-       if (WARN_ON_ONCE((gfp_mask & (__GFP_ATOMIC|__GFP_DIRECT_RECLAIM)) ==
-                        (__GFP_ATOMIC|__GFP_DIRECT_RECLAIM)))
-               gfp_mask &= ~__GFP_ATOMIC;
-
 restart:
        compaction_retries = 0;
        no_progress_loops = 0;
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index e20656c431a4..173d407dce92 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -641,7 +641,6 @@ static const struct {
        { "__GFP_HIGHMEM",              "HM" },
        { "GFP_DMA32",                  "D32" },
        { "__GFP_HIGH",                 "H" },
-       { "__GFP_ATOMIC",               "_A" },
        { "__GFP_IO",                   "I" },
        { "__GFP_FS",                   "F" },
        { "__GFP_NOWARN",               "NWR" },
-- 
2.35.3
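[With __GFP_ATOMIC gone, callers that used it as a "may this sleep?"
test switch to gfpflags_allow_blocking(), as the tegra-smmu hunk does.
Below is a standalone userspace sketch of that idiom; the bit values
come from the gfp_types.h hunks above, with __GFP_HIGH's 0x20u assumed
and GFP_KERNEL trimmed to its reclaim bits for brevity.]

#include <stdio.h>

#define __GFP_HIGH           0x20u  /* assumed; not shown in the diff */
#define __GFP_DIRECT_RECLAIM 0x400u
#define __GFP_KSWAPD_RECLAIM 0x800u

#define GFP_ATOMIC (__GFP_HIGH | __GFP_KSWAPD_RECLAIM)
#define GFP_NOWAIT (__GFP_KSWAPD_RECLAIM)
#define GFP_KERNEL (__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)

/* The replacement idiom: ask whether the request may enter direct
 * reclaim (and hence sleep) instead of testing a priority bit. */
static int gfpflags_allow_blocking(unsigned int gfp_flags)
{
        return !!(gfp_flags & __GFP_DIRECT_RECLAIM);
}

int main(void)
{
        printf("GFP_ATOMIC may block: %d\n", gfpflags_allow_blocking(GFP_ATOMIC)); /* 0 */
        printf("GFP_NOWAIT may block: %d\n", gfpflags_allow_blocking(GFP_NOWAIT)); /* 0 */
        printf("GFP_KERNEL may block: %d\n", gfpflags_allow_blocking(GFP_KERNEL)); /* 1 */
        return 0;
}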