From nobody Thu Dec 18 23:23:23 2025
From: Vlastimil Babka
Date: Tue, 16 Dec 2025 16:54:21 +0100
Subject: [PATCH RFC 1/2] mm, page_alloc, thp: prevent reclaim for __GFP_THISNODE THP allocations
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Message-Id: <20251216-thp-thisnode-tweak-v1-1-0e499d13d2eb@suse.cz>
References: <20251216-thp-thisnode-tweak-v1-0-0e499d13d2eb@suse.cz>
In-Reply-To: <20251216-thp-thisnode-tweak-v1-0-0e499d13d2eb@suse.cz>
To: Andrew Morton, Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
 Johannes Weiner, Zi Yan, David Rientjes, David Hildenbrand,
 Lorenzo Stoakes, "Liam R. Howlett", Mike Rapoport, Joshua Hahn,
 Pedro Falcato
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Vlastimil Babka
X-Mailer: b4 0.14.3

Since commit cc638f329ef6 ("mm, thp: tweak reclaim/compaction effort of
local-only and all-node
allocations"), THP page fault allocations have settled on the following
scheme (from the commit log):

1. local node only THP allocation with no reclaim, just compaction.
2. for madvised VMA's or when synchronous compaction is enabled always - THP
   allocation from any node with effort determined by global defrag setting
   and VMA madvise
3. fallback to base pages on any node

Recent customer reports however revealed we have a gap in step 1 above.
What we have seen is excessive reclaim due to THP page faults on a NUMA
node that's close to its high watermark, while other nodes have plenty
of free memory.

The problem with step 1 is that it promises no reclaim after the
compaction attempt, however reclaim is only avoided for certain
compaction outcomes (deferred, or skipped due to insufficient free base
pages), and not e.g. when compaction is actually performed but fails (we
did see compact_fail vmstat counter increasing).

THP page faults can therefore exhibit a zone_reclaim_mode-like behavior,
which is not the intention.

Thus add a check for __GFP_THISNODE that corresponds to this exact
situation and prevents continuing with reclaim/compaction once the
initial compaction attempt isn't successful in allocating the page.

Note that commit cc638f329ef6 has not introduced this over-reclaim
possibility; it appears to have existed in some form since commit
2f0799a0ffc0 ("mm, thp: restore node-local hugepage allocations").
Followup commits b39d0ee2632d ("mm, page_alloc: avoid expensive reclaim
when compaction may not succeed") and cc638f329ef6 have moved in the
right direction, but left the abovementioned gap.

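For illustration only (this is not part of the patch), the checks made
after the initial compaction attempt can be modelled in standalone C.
Every identifier prefixed with model_ or MODEL_ is hypothetical and the
flag values are not the kernel's. The sketch assumes a costly-order
allocation whose first direct compaction attempt returned no page.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the outcomes of the first compaction attempt. */
enum model_compact_result {
	MODEL_COMPACT_SKIPPED,	/* too few free base pages to even start */
	MODEL_COMPACT_DEFERRED,	/* failed recently, so attempts are backed off */
	MODEL_COMPACT_FAILED,	/* compaction ran but did not produce a page */
};

/* Hypothetical flag bits; the values are NOT the kernel's. */
#define MODEL_GFP_NORETRY	(1u << 0)
#define MODEL_GFP_THISNODE	(1u << 1)

/*
 * Model of the checks made for a costly-order allocation after the first
 * direct compaction attempt returned no page. Returns true for "goto nopage"
 * and false for "continue with direct reclaim and compaction".
 * with_thisnode_check == false models the behavior before this patch.
 */
static bool model_fail_after_compaction(unsigned int gfp,
					enum model_compact_result result,
					bool with_thisnode_check)
{
	if (!(gfp & MODEL_GFP_NORETRY))
		return false;	/* these checks only cover __GFP_NORETRY */

	if (result == MODEL_COMPACT_SKIPPED ||
	    result == MODEL_COMPACT_DEFERRED)
		return true;	/* reclaim was already avoided here */

	if (with_thisnode_check && (gfp & MODEL_GFP_THISNODE))
		return true;	/* the gap closed by the new check */

	return false;
}

/* Usage check for the reported case: local-only THP fault, compact_fail. */
static void model_check_reported_case(void)
{
	unsigned int gfp = MODEL_GFP_NORETRY | MODEL_GFP_THISNODE;

	/* Without the check the allocation goes on to reclaim on the node. */
	assert(!model_fail_after_compaction(gfp, MODEL_COMPACT_FAILED, false));
	/* With the check it fails and the caller can fall back to other nodes. */
	assert(model_fail_after_compaction(gfp, MODEL_COMPACT_FAILED, true));
}
```

In this model the only input whose result changes is the combination of
__GFP_NORETRY, __GFP_THISNODE and a compaction run that actually failed,
which is the case described in the reports above.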
Fixes: 2f0799a0ffc0 ("mm, thp: restore node-local hugepage allocations")
Signed-off-by: Vlastimil Babka
Acked-by: Johannes Weiner
Acked-by: Michal Hocko
Acked-by: Pedro Falcato
---
 mm/page_alloc.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 822e05f1a964..e6fd1213328b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4788,6 +4788,20 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 			    compact_result == COMPACT_DEFERRED)
 				goto nopage;
 
+			/*
+			 * THP page faults may attempt local node only first,
+			 * but are then allowed to only compact, not reclaim,
+			 * see alloc_pages_mpol()
+			 *
+			 * compaction can fail for other reasons than those
+			 * checked above and we don't want such THP allocations
+			 * to put reclaim pressure on a single node in a
+			 * situation where other nodes might have plenty of
+			 * available memory
+			 */
+			if (gfp_mask & __GFP_THISNODE)
+				goto nopage;
+
 			/*
 			 * Looks like reclaim/compaction is worth trying, but
 			 * sync compaction could be very expensive, so keep
-- 
2.52.0

From nobody Thu Dec 18 23:23:23 2025
From: Vlastimil Babka
Date: Tue, 16 Dec 2025 16:54:22 +0100
Subject: [PATCH RFC 2/2] mm, page_alloc: fail costly __GFP_NORETRY allocations faster
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Message-Id: <20251216-thp-thisnode-tweak-v1-2-0e499d13d2eb@suse.cz>
References: <20251216-thp-thisnode-tweak-v1-0-0e499d13d2eb@suse.cz>
In-Reply-To: <20251216-thp-thisnode-tweak-v1-0-0e499d13d2eb@suse.cz>
To: Andrew Morton, Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
 Johannes Weiner, Zi Yan, David Rientjes, David Hildenbrand,
 Lorenzo Stoakes, "Liam R. Howlett", Mike Rapoport, Joshua Hahn,
 Pedro Falcato
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Vlastimil Babka
X-Mailer: b4 0.14.3

For allocations that are of costly order and __GFP_NORETRY (and can
perform compaction) we attempt direct compaction first. If that fails,
we continue with a single round of direct reclaim+compaction (as for
other __GFP_NORETRY allocations, except the compaction is of lower
priority), with two exceptions that fail immediately:

- __GFP_THISNODE is specified, to prevent zone_reclaim_mode-like
  behavior for e.g. THP page faults

- compaction failed because it was deferred (i.e. has been failing
  recently so further attempts are not done for a while) or skipped,
  which means there are insufficient free base pages to defragment to
  begin with

Upon closer inspection, the second condition rests on somewhat flawed
reasoning. If there are not enough base pages and reclaim could create
them, we instead fail. When there are enough base pages and compaction
has already run and failed, we proceed and hope that reclaim and the
subsequent compaction attempt will succeed. But it's unclear why they
should and whether it will be as inexpensive as intended.

It might therefore make more sense to just fail unconditionally after
the initial compaction attempt, so do that instead. Costly allocations
that do want the reclaim/compaction to happen at least once can omit
__GFP_NORETRY, or even specify __GFP_RETRY_MAYFAIL for more than one
attempt.

There is a slight potential unfairness in that costly __GFP_NORETRY
allocations that can't perform direct compaction (i.e. lack __GFP_IO)
will still be allowed to direct reclaim, while those that can direct
compact will now never attempt direct reclaim. However, in cases of
memory pressure causing compaction to be skipped due to insufficient
base pages, direct reclaim was already not done before, so there should
be no functional regressions from this change.

Signed-off-by: Vlastimil Babka
Acked-by: Michal Hocko
---
 include/linux/gfp_types.h |  2 ++
 mm/page_alloc.c           | 47 +++--------------------------------------------
 2 files changed, 5 insertions(+), 44 deletions(-)

diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
index 3de43b12209e..051311fdbdb1 100644
--- a/include/linux/gfp_types.h
+++ b/include/linux/gfp_types.h
@@ -218,6 +218,8 @@ enum {
  * caller must handle the failure which is quite likely to happen under
  * heavy memory pressure. The flag is suitable when failure can easily be
  * handled at small cost, such as reduced throughput.
+ * For costly orders, only memory compaction can be attempted with no reclaim
+ * under some conditions.
  *
  * %__GFP_RETRY_MAYFAIL: The VM implementation will retry memory reclaim
  * procedures that have previously failed if there is some indication
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e6fd1213328b..2671cbbd6375 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4763,52 +4763,11 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 			goto got_pg;
 
 		/*
-		 * Checks for costly allocations with __GFP_NORETRY, which
-		 * includes some THP page fault allocations
+		 * Compaction didn't succeed and we were told not to try hard,
+		 * so fail now.
 		 */
 		if (costly_order && (gfp_mask & __GFP_NORETRY)) {
-			/*
-			 * If allocating entire pageblock(s) and compaction
-			 * failed because all zones are below low watermarks
-			 * or is prohibited because it recently failed at this
-			 * order, fail immediately unless the allocator has
-			 * requested compaction and reclaim retry.
-			 *
-			 * Reclaim is
-			 *  - potentially very expensive because zones are far
-			 *    below their low watermarks or this is part of very
-			 *    bursty high order allocations,
-			 *  - not guaranteed to help because isolate_freepages()
-			 *    may not iterate over freed pages as part of its
-			 *    linear scan, and
-			 *  - unlikely to make entire pageblocks free on its
-			 *    own.
-			 */
-			if (compact_result == COMPACT_SKIPPED ||
-			    compact_result == COMPACT_DEFERRED)
-				goto nopage;
-
-			/*
-			 * THP page faults may attempt local node only first,
-			 * but are then allowed to only compact, not reclaim,
-			 * see alloc_pages_mpol()
-			 *
-			 * compaction can fail for other reasons than those
-			 * checked above and we don't want such THP allocations
-			 * to put reclaim pressure on a single node in a
-			 * situation where other nodes might have plenty of
-			 * available memory
-			 */
-			if (gfp_mask & __GFP_THISNODE)
-				goto nopage;
-
-			/*
-			 * Looks like reclaim/compaction is worth trying, but
-			 * sync compaction could be very expensive, so keep
-			 * using async compaction.
-			 */
-			compact_priority = INIT_COMPACT_PRIORITY;
-		}
+			goto nopage;
 	}
 
 retry:
-- 
2.52.0
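
As a reading aid for patch 2/2 (not part of the series), the behavior
change can be modelled in standalone C. All model_ and MODEL_ names are
hypothetical. The sketch assumes that direct reclaim is permitted and
that the initial compaction attempt, if it was made, returned no page.
It only answers whether the slowpath fails at that point or goes on to
direct reclaim and compaction.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the outcomes of the first compaction attempt. */
enum model_compact_result {
	MODEL_COMPACT_SKIPPED,
	MODEL_COMPACT_DEFERRED,
	MODEL_COMPACT_FAILED,
};

enum model_step {
	MODEL_STEP_FAIL,	/* goto nopage */
	MODEL_STEP_RECLAIM,	/* continue with direct reclaim + compaction */
};

/* Hypothetical summary of the properties the checks look at. */
struct model_request {
	bool costly_order;
	bool noretry;		/* __GFP_NORETRY */
	bool thisnode;		/* __GFP_THISNODE */
	bool can_compact;	/* direct compaction allowed */
};

/* Behavior before patch 2/2, with patch 1/2 applied. */
static enum model_step model_step_before(const struct model_request *rq,
					 enum model_compact_result result)
{
	if (!rq->can_compact)
		return MODEL_STEP_RECLAIM;	/* no initial compaction */

	if (rq->costly_order && rq->noretry) {
		if (result == MODEL_COMPACT_SKIPPED ||
		    result == MODEL_COMPACT_DEFERRED)
			return MODEL_STEP_FAIL;
		if (rq->thisnode)
			return MODEL_STEP_FAIL;
	}
	return MODEL_STEP_RECLAIM;
}

/* Behavior after patch 2/2: outcome and __GFP_THISNODE no longer matter. */
static enum model_step model_step_after(const struct model_request *rq)
{
	if (!rq->can_compact)
		return MODEL_STEP_RECLAIM;	/* the "unfairness" case */
	if (rq->costly_order && rq->noretry)
		return MODEL_STEP_FAIL;
	return MODEL_STEP_RECLAIM;
}

/* Usage check: the one combination whose result changes in this model. */
static void model_check_behavior_change(void)
{
	struct model_request rq = {
		.costly_order = true, .noretry = true,
		.thisnode = false, .can_compact = true,
	};

	assert(model_step_before(&rq, MODEL_COMPACT_FAILED) == MODEL_STEP_RECLAIM);
	assert(model_step_after(&rq) == MODEL_STEP_FAIL);

	/* Skipped or deferred compaction already failed before the patch. */
	assert(model_step_before(&rq, MODEL_COMPACT_SKIPPED) == MODEL_STEP_FAIL);
	assert(model_step_before(&rq, MODEL_COMPACT_DEFERRED) == MODEL_STEP_FAIL);
}
```

Under these assumptions, a caller that still wants one round of
reclaim/compaction corresponds to noretry == false, for which both
functions return MODEL_STEP_RECLAIM.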