From: Vlastimil Babka
Date: Tue, 06 Jan 2026 12:52:36 +0100
Subject: [PATCH mm-unstable v3 1/3] mm/page_alloc: ignore the exact initial compaction result

For allocations that are of costly order and __GFP_NORETRY (and can perform
compaction) we attempt direct compaction first. If that fails, we continue
with a single round of direct reclaim+compaction (as for other __GFP_NORETRY
allocations, except the compaction is of lower priority), with two exceptions
that fail immediately:

- __GFP_THISNODE is specified, to prevent zone_reclaim_mode-like behavior
  for e.g. THP page faults

- compaction failed because it was deferred (i.e. has been failing recently
  so further attempts are not done for a while) or skipped, which means there
  are insufficient free base pages to defragment to begin with

Upon closer inspection, the second condition has somewhat flawed reasoning.
If there are not enough base pages and reclaim could create them, we instead
fail.
When there are enough base pages and compaction has already run and failed,
we proceed and hope that reclaim and the subsequent compaction attempt will
succeed. But it's unclear why they should, and whether it will be as
inexpensive as intended.

It might therefore make more sense to just fail unconditionally after the
initial compaction attempt. However, that would change the semantics of
__GFP_NORETRY, which is to attempt reclaim at least once.

Alternatively, we can remove the compaction result checks and proceed with
the single reclaim and (lower priority) compaction attempt, leaving only the
__GFP_THISNODE exception for failing immediately.

Signed-off-by: Vlastimil Babka
Acked-by: Michal Hocko
---
 mm/page_alloc.c | 34 ++++++----------------------------
 1 file changed, 6 insertions(+), 28 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ac8a12076b00..b06b1cb01e0e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4805,44 +4805,22 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		 * includes some THP page fault allocations
 		 */
 		if (costly_order && (gfp_mask & __GFP_NORETRY)) {
-			/*
-			 * If allocating entire pageblock(s) and compaction
-			 * failed because all zones are below low watermarks
-			 * or is prohibited because it recently failed at this
-			 * order, fail immediately unless the allocator has
-			 * requested compaction and reclaim retry.
-			 *
-			 * Reclaim is
-			 *  - potentially very expensive because zones are far
-			 *    below their low watermarks or this is part of very
-			 *    bursty high order allocations,
-			 *  - not guaranteed to help because isolate_freepages()
-			 *    may not iterate over freed pages as part of its
-			 *    linear scan, and
-			 *  - unlikely to make entire pageblocks free on its
-			 *    own.
-			 */
-			if (compact_result == COMPACT_SKIPPED ||
-			    compact_result == COMPACT_DEFERRED)
-				goto nopage;
-
 			/*
 			 * THP page faults may attempt local node only first,
 			 * but are then allowed to only compact, not reclaim,
 			 * see alloc_pages_mpol().
 			 *
-			 * Compaction can fail for other reasons than those
-			 * checked above and we don't want such THP allocations
-			 * to put reclaim pressure on a single node in a
-			 * situation where other nodes might have plenty of
-			 * available memory.
+			 * Compaction has failed above and we don't want such
+			 * THP allocations to put reclaim pressure on a single
+			 * node in a situation where other nodes might have
+			 * plenty of available memory.
 			 */
 			if (gfp_mask & __GFP_THISNODE)
 				goto nopage;
 
 			/*
-			 * Looks like reclaim/compaction is worth trying, but
-			 * sync compaction could be very expensive, so keep
+			 * Proceed with single round of reclaim/compaction, but
+			 * since sync compaction could be very expensive, keep
 			 * using async compaction.
 			 */
 			compact_priority = INIT_COMPACT_PRIORITY;
-- 
2.52.0

From: Vlastimil Babka
Date: Tue, 06 Jan 2026 12:52:37 +0100
Subject: [PATCH mm-unstable v3 2/3] mm/page_alloc: refactor the initial compaction handling

The initial direct compaction done in some cases in __alloc_pages_slowpath()
stands out from the main retry loop of reclaim + compaction.
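The "some cases" are the ones selected by the compact_first condition in the diff of this patch. As a rough, stand-alone illustration only, the C sketch below evaluates that condition for a few request shapes. The model_* names and the reduced migratetype enum are hypothetical stand-ins for kernel types; PAGE_ALLOC_COSTLY_ORDER is given the kernel's value of 3, and can_compact is assumed to already include the can_direct_reclaim check, as it does after this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Same value as the kernel's PAGE_ALLOC_COSTLY_ORDER (include/linux/mmzone.h). */
#define PAGE_ALLOC_COSTLY_ORDER 3

/* Hypothetical, reduced stand-in for the kernel's migratetypes. */
enum model_migratetype {
	MODEL_MIGRATE_UNMOVABLE,
	MODEL_MIGRATE_MOVABLE,
	MODEL_MIGRATE_RECLAIMABLE,
};

/*
 * Sketch of the compact_first decision: compact before reclaiming for
 * costly orders, and for any non-zero order that is not movable.
 * can_compact is assumed to already fold in can_direct_reclaim.
 */
static bool model_compact_first(bool can_compact, unsigned int order,
				enum model_migratetype mt)
{
	bool costly_order = order > PAGE_ALLOC_COSTLY_ORDER;

	return can_compact &&
	       (costly_order || (order > 0 && mt != MODEL_MIGRATE_MOVABLE));
}

/* A few representative requests and the path the sketch predicts. */
static void model_compact_first_examples(void)
{
	/* order-9 movable, e.g. a 2 MiB THP with 4 KiB base pages */
	assert(model_compact_first(true, 9, MODEL_MIGRATE_MOVABLE));
	/* small movable high-order request: reclaim first */
	assert(!model_compact_first(true, 2, MODEL_MIGRATE_MOVABLE));
	/* small unmovable high-order request: compact first */
	assert(model_compact_first(true, 2, MODEL_MIGRATE_UNMOVABLE));
	/* order-0 never takes the compaction-first path */
	assert(!model_compact_first(true, 0, MODEL_MIGRATE_UNMOVABLE));
	/* compaction not allowed at all */
	assert(!model_compact_first(false, 9, MODEL_MIGRATE_MOVABLE));
}
```

The pre-patch condition also excluded callers for which gfp_pfmemalloc_allowed() is true. The sketch omits that check because, after the refactoring, the reserve (ALLOC_NO_WATERMARKS) attempt at the top of the retry loop presumably already runs before the first compaction attempt.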

We can simplify this by instead skipping the initial reclaim attempt via a
new local variable compact_first, and handle compact_priority as necessary
to match the original behavior.

No functional change intended.

Suggested-by: Johannes Weiner
Signed-off-by: Vlastimil Babka
Reviewed-by: Joshua Hahn
Acked-by: Michal Hocko
---
 include/linux/gfp.h |   8 ++++-
 mm/page_alloc.c     | 100 +++++++++++++++++++++++++---------------------------
 2 files changed, 55 insertions(+), 53 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index aa45989f410d..6ecf6dda93e0 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -407,9 +407,15 @@ extern gfp_t gfp_allowed_mask;
 /* Returns true if the gfp_mask allows use of ALLOC_NO_WATERMARK */
 bool gfp_pfmemalloc_allowed(gfp_t gfp_mask);
 
+/* A helper for checking if gfp includes all the specified flags */
+static inline bool gfp_has_flags(gfp_t gfp, gfp_t flags)
+{
+	return (gfp & flags) == flags;
+}
+
 static inline bool gfp_has_io_fs(gfp_t gfp)
 {
-	return (gfp & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS);
+	return gfp_has_flags(gfp, __GFP_IO | __GFP_FS);
 }
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b06b1cb01e0e..3b2579c5716f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4702,7 +4702,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 						struct alloc_context *ac)
 {
 	bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM;
-	bool can_compact = gfp_compaction_allowed(gfp_mask);
+	bool can_compact = can_direct_reclaim && gfp_compaction_allowed(gfp_mask);
 	bool nofail = gfp_mask & __GFP_NOFAIL;
 	const bool costly_order = order > PAGE_ALLOC_COSTLY_ORDER;
 	struct page *page = NULL;
@@ -4715,6 +4715,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	unsigned int cpuset_mems_cookie;
 	unsigned int zonelist_iter_cookie;
 	int reserve_flags;
+	bool compact_first = false;
 
 	if (unlikely(nofail)) {
 		/*
@@ -4738,6 +4739,19 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	cpuset_mems_cookie = read_mems_allowed_begin();
 	zonelist_iter_cookie = zonelist_iter_begin();
 
+	/*
+	 * For costly allocations, try direct compaction first, as it's likely
+	 * that we have enough base pages and don't need to reclaim. For non-
+	 * movable high-order allocations, do that as well, as compaction will
+	 * try prevent permanent fragmentation by migrating from blocks of the
+	 * same migratetype.
+	 */
+	if (can_compact && (costly_order || (order > 0 &&
+					     ac->migratetype != MIGRATE_MOVABLE))) {
+		compact_first = true;
+		compact_priority = INIT_COMPACT_PRIORITY;
+	}
+
 	/*
 	 * The fast path uses conservative alloc_flags to succeed only until
 	 * kswapd needs to be woken up, and to avoid the cost of setting up
@@ -4780,53 +4794,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	if (page)
 		goto got_pg;
 
-	/*
-	 * For costly allocations, try direct compaction first, as it's likely
-	 * that we have enough base pages and don't need to reclaim. For non-
-	 * movable high-order allocations, do that as well, as compaction will
-	 * try prevent permanent fragmentation by migrating from blocks of the
-	 * same migratetype.
-	 * Don't try this for allocations that are allowed to ignore
-	 * watermarks, as the ALLOC_NO_WATERMARKS attempt didn't yet happen.
-	 */
-	if (can_direct_reclaim && can_compact &&
-			(costly_order ||
-			   (order > 0 && ac->migratetype != MIGRATE_MOVABLE))
-			&& !gfp_pfmemalloc_allowed(gfp_mask)) {
-		page = __alloc_pages_direct_compact(gfp_mask, order,
-						alloc_flags, ac,
-						INIT_COMPACT_PRIORITY,
-						&compact_result);
-		if (page)
-			goto got_pg;
-
-		/*
-		 * Checks for costly allocations with __GFP_NORETRY, which
-		 * includes some THP page fault allocations
-		 */
-		if (costly_order && (gfp_mask & __GFP_NORETRY)) {
-			/*
-			 * THP page faults may attempt local node only first,
-			 * but are then allowed to only compact, not reclaim,
-			 * see alloc_pages_mpol().
-			 *
-			 * Compaction has failed above and we don't want such
-			 * THP allocations to put reclaim pressure on a single
-			 * node in a situation where other nodes might have
-			 * plenty of available memory.
-			 */
-			if (gfp_mask & __GFP_THISNODE)
-				goto nopage;
-
-			/*
-			 * Proceed with single round of reclaim/compaction, but
-			 * since sync compaction could be very expensive, keep
-			 * using async compaction.
-			 */
-			compact_priority = INIT_COMPACT_PRIORITY;
-		}
-	}
-
 retry:
 	/*
 	 * Deal with possible cpuset update races or zonelist updates to avoid
@@ -4870,10 +4837,12 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		goto nopage;
 
 	/* Try direct reclaim and then allocating */
-	page = __alloc_pages_direct_reclaim(gfp_mask, order, alloc_flags, ac,
-							&did_some_progress);
-	if (page)
-		goto got_pg;
+	if (!compact_first) {
+		page = __alloc_pages_direct_reclaim(gfp_mask, order, alloc_flags,
+						    ac, &did_some_progress);
+		if (page)
+			goto got_pg;
+	}
 
 	/* Try direct compaction and then allocating */
 	page = __alloc_pages_direct_compact(gfp_mask, order, alloc_flags, ac,
@@ -4881,6 +4850,33 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	if (page)
 		goto got_pg;
 
+	if (compact_first) {
+		/*
+		 * THP page faults may attempt local node only first, but are
+		 * then allowed to only compact, not reclaim, see
+		 * alloc_pages_mpol().
+		 *
+		 * Compaction has failed above and we don't want such THP
+		 * allocations to put reclaim pressure on a single node in a
+		 * situation where other nodes might have plenty of available
+		 * memory.
+		 */
+		if (gfp_has_flags(gfp_mask, __GFP_NORETRY | __GFP_THISNODE))
+			goto nopage;
+
+		/*
+		 * For the initial compaction attempt we have lowered its
+		 * priority. Restore it for further retries, if those are
+		 * allowed. With __GFP_NORETRY there will be a single round of
+		 * reclaim and compaction with the lowered priority.
+		 */
+		if (!(gfp_mask & __GFP_NORETRY))
+			compact_priority = DEF_COMPACT_PRIORITY;
+
+		compact_first = false;
+		goto retry;
+	}
+
 	/* Do not loop if specifically requested */
 	if (gfp_mask & __GFP_NORETRY)
 		goto nopage;

From: Vlastimil Babka
Date: Tue, 06 Jan 2026 12:52:38 +0100
Subject: [PATCH mm-unstable v3 3/3] mm/page_alloc: simplify __alloc_pages_slowpath() flow

The actions done before entering the main retry loop include waking up
kswapds and an allocation attempt with the precise alloc_flags. Then in
the loop we keep waking up kswapds, and we retry the allocation with flags
potentially further adjusted by being allowed to use reserves (due to e.g.
becoming an OOM killer victim).

We can adjust the retry loop to keep only one instance of waking up kswapds
and of the allocation attempt. Introduce the can_retry_reserves variable
for retrying once when we become eligible for reserves. It is still useful
not to evaluate reserve_flags immediately for the first allocation attempt,
because it's better to first try to succeed in a non-preferred zone above
the min watermark before allocating immediately from the preferred zone
below the min watermark.

Additionally, move the cpuset update checks introduced by e05741fb10c3
("mm/page_alloc.c: avoid infinite retries caused by cpuset race") further
down the retry loop. It's enough to do the checks only before reaching any
potentially infinite 'goto retry;' loop.

There should be no meaningful functional changes. Changing the exact
moments at which the retry for reserves and cpuset updates are checked
should not result in different outcomes, modulo races with concurrent
allocator activity.

Signed-off-by: Vlastimil Babka
Acked-by: Michal Hocko
---
 mm/page_alloc.c | 41 +++++++++++++++++++++++------------------
 1 file changed, 23 insertions(+), 18 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3b2579c5716f..c02564042618 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4716,6 +4716,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	unsigned int zonelist_iter_cookie;
 	int reserve_flags;
 	bool compact_first = false;
+	bool can_retry_reserves = true;
 
 	if (unlikely(nofail)) {
 		/*
@@ -4783,6 +4784,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		goto nopage;
 	}
 
+retry:
+	/* Ensure kswapd doesn't accidentally go to sleep as long as we loop */
 	if (alloc_flags & ALLOC_KSWAPD)
 		wake_all_kswapds(order, gfp_mask, ac);
 
@@ -4794,19 +4797,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	if (page)
 		goto got_pg;
 
-retry:
-	/*
-	 * Deal with possible cpuset update races or zonelist updates to avoid
-	 * infinite retries.
-	 */
-	if (check_retry_cpuset(cpuset_mems_cookie, ac) ||
-	    check_retry_zonelist(zonelist_iter_cookie))
-		goto restart;
-
-	/* Ensure kswapd doesn't accidentally go to sleep as long as we loop */
-	if (alloc_flags & ALLOC_KSWAPD)
-		wake_all_kswapds(order, gfp_mask, ac);
-
 	reserve_flags = __gfp_pfmemalloc_flags(gfp_mask);
 	if (reserve_flags)
 		alloc_flags = gfp_to_alloc_flags_cma(gfp_mask, reserve_flags) |
@@ -4821,12 +4811,18 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		ac->nodemask = NULL;
 		ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
 					ac->highest_zoneidx, ac->nodemask);
-	}
 
-	/* Attempt with potentially adjusted zonelist and alloc_flags */
-	page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
-	if (page)
-		goto got_pg;
+		/*
+		 * The first time we adjust anything due to being allowed to
+		 * ignore memory policies or watermarks, retry immediately. This
+		 * allows us to keep the first allocation attempt optimistic so
+		 * it can succeed in a zone that is still above watermarks.
+		 */
+		if (can_retry_reserves) {
+			can_retry_reserves = false;
+			goto retry;
+		}
+	}
 
 	/* Caller is not willing to reclaim, we can't balance anything */
 	if (!can_direct_reclaim)
@@ -4889,6 +4885,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 			     !(gfp_mask & __GFP_RETRY_MAYFAIL)))
 		goto nopage;
 
+	/*
+	 * Deal with possible cpuset update races or zonelist updates to avoid
+	 * infinite retries. No "goto retry;" can be placed above this check
+	 * unless it can execute just once.
+	 */
+	if (check_retry_cpuset(cpuset_mems_cookie, ac) ||
+	    check_retry_zonelist(zonelist_iter_cookie))
+		goto restart;
+
 	if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
 				 did_some_progress > 0, &no_progress_loops))
 		goto retry;
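Taken together, patches 1/3 and 2/3 leave a single decision point for a request that took the compact_first path and whose initial compaction attempt failed: the exact compaction result no longer matters, only the gfp flags do. The sketch below restates that decision in isolation, reusing the same all-flags test as gfp_has_flags(). The model_* names and flag bit values are hypothetical, not the kernel's __GFP_* definitions, and the sketch ignores the other exits of the retry loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the kernel's __GFP_* values differ. */
typedef unsigned int model_gfp_t;
#define MODEL_GFP_NORETRY	0x1u
#define MODEL_GFP_THISNODE	0x2u

/* Same logic as gfp_has_flags(): true only if *all* of flags are set. */
static inline bool model_gfp_has_flags(model_gfp_t gfp, model_gfp_t flags)
{
	return (gfp & flags) == flags;
}

enum model_next_step {
	MODEL_FAIL,		/* goto nopage */
	MODEL_ONE_ROUND,	/* one reclaim + compaction round, lowered priority */
	MODEL_FULL_RETRY,	/* priority restored, normal retry logic applies */
};

/* Decision taken once the initial (compact_first) compaction has failed. */
static enum model_next_step model_after_initial_compaction(model_gfp_t gfp)
{
	if (model_gfp_has_flags(gfp, MODEL_GFP_NORETRY | MODEL_GFP_THISNODE))
		return MODEL_FAIL;
	if (gfp & MODEL_GFP_NORETRY)
		return MODEL_ONE_ROUND;
	return MODEL_FULL_RETRY;
}

static void model_after_initial_compaction_examples(void)
{
	/* node-local, no-retry attempt (the THP fault case): fail at once */
	assert(model_after_initial_compaction(MODEL_GFP_NORETRY |
					      MODEL_GFP_THISNODE) == MODEL_FAIL);
	assert(model_after_initial_compaction(MODEL_GFP_NORETRY) ==
	       MODEL_ONE_ROUND);
	/* __GFP_THISNODE alone does not short-circuit */
	assert(model_after_initial_compaction(MODEL_GFP_THISNODE) ==
	       MODEL_FULL_RETRY);
	/* "all" rather than "any": a single matching bit is not enough */
	assert(!model_gfp_has_flags(MODEL_GFP_NORETRY,
				    MODEL_GFP_NORETRY | MODEL_GFP_THISNODE));
}
```

Using an "all flags" helper rather than a plain bitwise AND is what keeps the combined __GFP_NORETRY | __GFP_THISNODE check from firing when only one of the two flags is set.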