From: Johannes Weiner
To: linux-mm@kvack.org
Cc: Kaiyang Zhao, Mel Gorman, Vlastimil Babka, David Rientjes, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [RFC PATCH 01/26] block: bdev: blockdev page cache is movable
Date: Tue, 18 Apr 2023 15:12:48 -0400
Message-Id: <20230418191313.268131-2-hannes@cmpxchg.org>
In-Reply-To: <20230418191313.268131-1-hannes@cmpxchg.org>
References: <20230418191313.268131-1-hannes@cmpxchg.org>

While inspecting page blocks for the type of pages in them, I noticed
a large number of blockdev cache pages in unmovable blocks. However,
these pages are actually on the LRU, and their mapping has a
.migrate_folio callback; they can be reclaimed and compacted as
necessary. Put them into movable blocks, so they don't cause
pollution, and subsequent proliferation, of unmovable blocks.

Signed-off-by: Johannes Weiner
---
 block/bdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/bdev.c b/block/bdev.c
index edc110d90df4..6abe4766d073 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -488,7 +488,7 @@ struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
 	inode->i_mode = S_IFBLK;
 	inode->i_rdev = 0;
 	inode->i_data.a_ops = &def_blk_aops;
-	mapping_set_gfp_mask(&inode->i_data, GFP_USER);
+	mapping_set_gfp_mask(&inode->i_data, GFP_USER|__GFP_MOVABLE);
 
 	bdev = I_BDEV(inode);
 	mutex_init(&bdev->bd_fsfreeze_mutex);
-- 
2.39.2
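
For reference, the one-liner works because the allocator derives the
freelist migratetype from the mobility bits in the GFP mask of each
page cache allocation. Below is a simplified, standalone sketch of
that lookup; the constant values are assumptions copied from
include/linux/gfp.h of this era, they are not part of the patch:

  #include <stdio.h>

  /* Assumed values, mirroring include/linux/gfp.h and mmzone.h. */
  #define __GFP_MOVABLE      0x08u
  #define __GFP_RECLAIMABLE  0x10u
  #define GFP_MOVABLE_MASK   (__GFP_MOVABLE | __GFP_RECLAIMABLE)
  #define GFP_MOVABLE_SHIFT  3

  enum { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RECLAIMABLE };

  /* Simplified gfp_migratetype(): the mobility bits select the freelist. */
  static int gfp_migratetype(unsigned int gfp_flags)
  {
  	return (gfp_flags & GFP_MOVABLE_MASK) >> GFP_MOVABLE_SHIFT;
  }

  int main(void)
  {
  	unsigned int gfp_user = 0;	/* stand-in: GFP_USER has no mobility bits */

  	printf("GFP_USER               -> migratetype %d (unmovable)\n",
  	       gfp_migratetype(gfp_user));
  	printf("GFP_USER|__GFP_MOVABLE -> migratetype %d (movable)\n",
  	       gfp_migratetype(gfp_user | __GFP_MOVABLE));
  	return 0;
  }

With the mask change above, blockdev cache allocations land on the
movable freelists and therefore in movable pageblocks.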

From: Johannes Weiner
To: linux-mm@kvack.org
Cc: Kaiyang Zhao, Mel Gorman, Vlastimil Babka, David Rientjes, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [RFC PATCH 02/26] mm: compaction: avoid GFP_NOFS deadlocks
Date: Tue, 18 Apr 2023 15:12:49 -0400
Message-Id: <20230418191313.268131-3-hannes@cmpxchg.org>
In-Reply-To: <20230418191313.268131-1-hannes@cmpxchg.org>
References: <20230418191313.268131-1-hannes@cmpxchg.org>

During stress testing, two deadlock scenarios were observed:

1. One GFP_NOFS allocation was sleeping on too_many_isolated(), and
   all CPUs were busy with compactors that appeared to be spinning on
   buffer locks.

   Give GFP_NOFS compactors additional isolation headroom, the same
   way we do during reclaim, to eliminate this deadlock scenario.

2. In a more pernicious scenario, the GFP_NOFS allocation was
   busy-spinning in compaction, but seemingly never making progress.
   Upon closer inspection, memory was dominated by file pages, which
   the fs compactor isn't allowed to touch. The remaining anon pages
   didn't have the contiguity to satisfy the request.

   Allow GFP_NOFS allocations to bypass watermarks when compaction
   failed at the highest priority.

While these deadlocks were encountered only in tests with the
subsequent patches (which put a lot more demand on compaction), in
theory these problems already exist in the code today. Fix them now.

Signed-off-by: Johannes Weiner
---
 mm/compaction.c | 15 +++++++++++++--
 mm/page_alloc.c | 10 +++++++++-
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 8238e83385a7..84db84e8fd3a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -745,8 +745,9 @@ isolate_freepages_range(struct compact_control *cc,
 }
 
 /* Similar to reclaim, but different enough that they don't share logic */
-static bool too_many_isolated(pg_data_t *pgdat)
+static bool too_many_isolated(struct compact_control *cc)
 {
+	pg_data_t *pgdat = cc->zone->zone_pgdat;
 	bool too_many;
 
 	unsigned long active, inactive, isolated;
@@ -758,6 +759,16 @@ static bool too_many_isolated(pg_data_t *pgdat)
 	isolated = node_page_state(pgdat, NR_ISOLATED_FILE) +
 		node_page_state(pgdat, NR_ISOLATED_ANON);
 
+	/*
+	 * GFP_NOFS callers are allowed to isolate more pages, so they
+	 * won't get blocked by normal direct-reclaimers, forming a
+	 * circular deadlock. GFP_NOIO won't get here.
+	 */
+	if (cc->gfp_mask & __GFP_FS) {
+		inactive >>= 3;
+		active >>= 3;
+	}
+
 	too_many = isolated > (inactive + active) / 2;
 	if (!too_many)
 		wake_throttle_isolated(pgdat);
@@ -806,7 +817,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	 * list by either parallel reclaimers or compaction. If there are,
	 * delay for some time until fewer pages are isolated
	 */
-	while (unlikely(too_many_isolated(pgdat))) {
+	while (unlikely(too_many_isolated(cc))) {
 		/* stop isolation if there are still pages not migrated */
 		if (cc->nr_migratepages)
 			return -EAGAIN;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3bb3484563ed..ac03571e0532 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4508,8 +4508,16 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 	prep_new_page(page, order, gfp_mask, alloc_flags);
 
 	/* Try get a page from the freelist if available */
-	if (!page)
+	if (!page) {
+		/*
+		 * It's possible that the only migration sources are
+		 * file pages, and the GFP_NOFS stack is holding up
+		 * other compactors. Use reserves to avoid deadlock.
+		 */
+		if (prio == MIN_COMPACT_PRIORITY && !(gfp_mask & __GFP_FS))
+			alloc_flags |= ALLOC_NO_WATERMARKS;
 		page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
+	}
 
 	if (page) {
 		struct zone *zone = page_zone(page);
-- 
2.39.2
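
The throttling half of the fix is pure LRU arithmetic: scaling
inactive/active down for __GFP_FS callers lowers their isolation
budget, which leaves headroom for GFP_NOFS compactors. A standalone
sketch of that arithmetic, with made-up example numbers rather than
anything from the patch:

  #include <stdbool.h>
  #include <stdio.h>

  /* Illustrative userspace sketch of the too_many_isolated() headroom logic. */
  static bool too_many_isolated(unsigned long inactive, unsigned long active,
  			      unsigned long isolated, bool gfp_fs)
  {
  	if (gfp_fs) {		/* regular compactor: shrink its budget */
  		inactive >>= 3;
  		active >>= 3;
  	}
  	return isolated > (inactive + active) / 2;
  }

  int main(void)
  {
  	unsigned long inactive = 160000, active = 160000, isolated = 30000;

  	/* __GFP_FS caller throttles at ~1/16 of the LRU: 20000 pages here. */
  	printf("GFP_KERNEL compactor throttled: %d\n",
  	       too_many_isolated(inactive, active, isolated, true));
  	/* GFP_NOFS caller keeps the full 1/2 budget: 160000 pages here. */
  	printf("GFP_NOFS compactor throttled:   %d\n",
  	       too_many_isolated(inactive, active, isolated, false));
  	return 0;
  }

With 320000 LRU pages and 30000 already isolated, the regular
compactor backs off while the GFP_NOFS one may keep isolating, so the
filesystem-holding task is never the one left waiting on everyone else.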

From: Johannes Weiner
To: linux-mm@kvack.org
Cc: Kaiyang Zhao, Mel Gorman, Vlastimil Babka, David Rientjes, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [RFC PATCH 03/26] mm: make pageblock_order 2M per default
Date: Tue, 18 Apr 2023 15:12:50 -0400
Message-Id: <20230418191313.268131-4-hannes@cmpxchg.org>
In-Reply-To: <20230418191313.268131-1-hannes@cmpxchg.org>
References: <20230418191313.268131-1-hannes@cmpxchg.org>

pageblock_order can be of various sizes, depending on configuration,
but the default is MAX_ORDER-1. Given 4k pages, that comes out to 4M.
This is a large chunk for the allocator/reclaim/compaction to try to
keep grouped per migratetype. It's also unnecessary as the majority
of higher order allocations - THP and slab - are smaller than that.

Before subsequent patches increase the effort that goes into
maintaining migratetype isolation, it's important to first set the
defrag block size to what's likely to have common consumers.

Signed-off-by: Johannes Weiner
---
 include/linux/pageblock-flags.h | 4 ++--
 mm/page_alloc.c                 | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/pageblock-flags.h b/include/linux/pageblock-flags.h
index 5f1ae07d724b..05b6811f8cee 100644
--- a/include/linux/pageblock-flags.h
+++ b/include/linux/pageblock-flags.h
@@ -47,8 +47,8 @@ extern unsigned int pageblock_order;
 
 #else /* CONFIG_HUGETLB_PAGE */
 
-/* If huge pages are not used, group by MAX_ORDER_NR_PAGES */
-#define pageblock_order		(MAX_ORDER-1)
+/* Manage fragmentation at the 2M level */
+#define pageblock_order		ilog2(2U << (20 - PAGE_SHIFT))
 
 #endif /* CONFIG_HUGETLB_PAGE */
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ac03571e0532..5e04a69f6a26 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7634,7 +7634,7 @@ static inline void setup_usemap(struct zone *zone) {}
 /* Initialise the number of pages represented by NR_PAGEBLOCK_BITS */
 void __init set_pageblock_order(void)
 {
-	unsigned int order = MAX_ORDER - 1;
+	unsigned int order = ilog2(2U << (20 - PAGE_SHIFT));
 
 	/* Check that pageblock_nr_pages has not already been setup */
 	if (pageblock_order)
-- 
2.39.2
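
Both hunks encode the same arithmetic: 2MB expressed in pages, then
log2 of that page count. A quick standalone check of what the
expression evaluates to for a few page sizes (the ilog2() here is a
plain loop standing in for the kernel helper):

  #include <stdio.h>

  /* Minimal stand-in for the kernel's ilog2() on a power of two. */
  static unsigned int ilog2(unsigned int n)
  {
  	unsigned int log = 0;

  	while (n > 1) {
  		n >>= 1;
  		log++;
  	}
  	return log;
  }

  int main(void)
  {
  	unsigned int page_shifts[] = { 12, 14, 16 };	/* 4k, 16k, 64k pages */

  	for (int i = 0; i < 3; i++) {
  		unsigned int shift = page_shifts[i];
  		unsigned int order = ilog2(2U << (20 - shift));

  		printf("PAGE_SHIFT=%u: pageblock_order=%u (%u pages = 2MB)\n",
  		       shift, order, 1U << order);
  	}
  	return 0;
  }

On 4k pages this gives order 9 (512 pages), i.e. a 2MB defrag block
instead of the previous 4MB MAX_ORDER-1 block.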

From: Johannes Weiner
To: linux-mm@kvack.org
Cc: Kaiyang Zhao, Mel Gorman, Vlastimil Babka, David Rientjes, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [RFC PATCH 04/26] mm: page_isolation: write proper kerneldoc
Date: Tue, 18 Apr 2023 15:12:51 -0400
Message-Id: <20230418191313.268131-5-hannes@cmpxchg.org>
In-Reply-To: <20230418191313.268131-1-hannes@cmpxchg.org>
References: <20230418191313.268131-1-hannes@cmpxchg.org>

And remove the incorrect header comments.

Signed-off-by: Johannes Weiner
---
 include/linux/page-isolation.h | 24 ++++++------------------
 mm/page_isolation.c            | 29 ++++++++++++++++++++++-----
 2 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 5456b7be38ae..0ab089e89db4 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -37,24 +37,12 @@ void set_pageblock_migratetype(struct page *page, int migratetype);
 int move_freepages_block(struct zone *zone, struct page *page,
 				int migratetype, int *num_movable);
 
-/*
- * Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE.
- */
-int
-start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
-			 int migratetype, int flags, gfp_t gfp_flags);
-
-/*
- * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
- * target range is [start_pfn, end_pfn)
- */
-void
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
-			int migratetype);
-
-/*
- * Test all pages in [start_pfn, end_pfn) are isolated or not.
- */
+int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			     int migratetype, int flags, gfp_t gfp_flags);
+
+void undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
+			     int migratetype);
+
 int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
 			int isol_flags);
 
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 47fbc1696466..b67800f7f6b1 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -481,8 +481,7 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
 }
 
 /**
- * start_isolate_page_range() - make page-allocation-type of range of pages to
- * be MIGRATE_ISOLATE.
+ * start_isolate_page_range() - mark page range MIGRATE_ISOLATE
  * @start_pfn:		The lower PFN of the range to be isolated.
 * @end_pfn:		The upper PFN of the range to be isolated.
 * @migratetype:	Migrate type to set in error recovery.
@@ -571,8 +570,14 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 	return 0;
 }
 
-/*
- * Make isolated pages available again.
+/**
+ * undo_isolate_page_range - undo effects of start_isolate_page_range()
+ * @start_pfn:		The lower PFN of the isolated range
+ * @end_pfn:		The upper PFN of the isolated range
+ * @migratetype:	New migrate type to set on the range
+ *
+ * This finds every MIGRATE_ISOLATE page block in the given range
+ * and switches it to @migratetype.
 */
 void undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 			int migratetype)
@@ -631,7 +636,21 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
 	return pfn;
 }
 
-/* Caller should ensure that requested range is in a single zone */
+/**
+ * test_pages_isolated - check if pageblocks in range are isolated
+ * @start_pfn:		The first PFN of the isolated range
+ * @end_pfn:		The first PFN *after* the isolated range
+ * @isol_flags:		Testing mode flags
+ *
+ * This tests if all in the specified range are free.
+ *
+ * If %MEMORY_OFFLINE is specified in @flags, it will consider
+ * poisoned and offlined pages free as well.
+ *
+ * Caller must ensure the requested range doesn't span zones.
+ *
+ * Returns 0 if true, -EBUSY if one or more pages are in use.
+ */
 int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
 			int isol_flags)
 {
-- 
2.39.2

From: Johannes Weiner
To: linux-mm@kvack.org
Cc: Kaiyang Zhao, Mel Gorman, Vlastimil Babka, David Rientjes, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [RFC PATCH 05/26] mm: page_alloc: per-migratetype pcplist for THPs
Date: Tue, 18 Apr 2023 15:12:52 -0400
Message-Id: <20230418191313.268131-6-hannes@cmpxchg.org>
In-Reply-To: <20230418191313.268131-1-hannes@cmpxchg.org>
References: <20230418191313.268131-1-hannes@cmpxchg.org>

Right now, there is only one pcplist for THP allocations. However,
while most THPs are movable, the huge zero page is not. This means a
movable THP allocation can grab an unmovable block from the pcplist,
and a subsequent THP split, partial free, and reallocation of the
remainder will mix movable and unmovable pages in the block.

While this isn't a huge source of block pollution in practice, it
happens often enough to trigger debug warnings fairly quickly under
load. In the interest of tightening up pageblock hygiene, make the
THP pcplists fully migratetype-aware, just like the lower order ones.

Signed-off-by: Johannes Weiner
---
 include/linux/mmzone.h | 8 +++-----
 mm/page_alloc.c        | 4 ++--
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index cd28a100d9e4..53e55882a4e7 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -552,13 +552,11 @@ enum zone_watermarks {
 };
 
 /*
- * One per migratetype for each PAGE_ALLOC_COSTLY_ORDER. One additional list
- * for THP which will usually be GFP_MOVABLE. Even if it is another type,
- * it should not contribute to serious fragmentation causing THP allocation
- * failures.
+ * One per migratetype for each PAGE_ALLOC_COSTLY_ORDER. One additional set
+ * for THP (usually GFP_MOVABLE, but with exception of the huge zero page.)
 */
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-#define NR_PCP_THP 1
+#define NR_PCP_THP MIGRATE_PCPTYPES
 #else
 #define NR_PCP_THP 0
 #endif
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5e04a69f6a26..d3d01019ce77 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -710,7 +710,7 @@ static inline unsigned int order_to_pindex(int migratetype, int order)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	if (order > PAGE_ALLOC_COSTLY_ORDER) {
 		VM_BUG_ON(order != pageblock_order);
-		return NR_LOWORDER_PCP_LISTS;
+		return NR_LOWORDER_PCP_LISTS + migratetype;
 	}
 #else
 	VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER);
@@ -724,7 +724,7 @@ static inline int pindex_to_order(unsigned int pindex)
 	int order = pindex / MIGRATE_PCPTYPES;
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	if (pindex == NR_LOWORDER_PCP_LISTS)
+	if (pindex >= NR_LOWORDER_PCP_LISTS)
 		order = pageblock_order;
 #else
 	VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER);
-- 
2.39.2
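
The index math the two hunks change can be looked at in isolation: low
orders already get one pcplist per pcp migratetype, and the THP order
now gets the same treatment instead of a single shared slot. A
standalone sketch, using constants assumed from a 4k x86-64 build
(MIGRATE_PCPTYPES=3, PAGE_ALLOC_COSTLY_ORDER=3, pageblock order 9 -
these values are not taken from the patch):

  #include <stdio.h>

  /* Assumed constants mirroring include/linux/mmzone.h for a 4k build. */
  #define MIGRATE_PCPTYPES	3	/* unmovable, movable, reclaimable */
  #define PAGE_ALLOC_COSTLY_ORDER	3
  #define PAGEBLOCK_ORDER		9
  #define NR_LOWORDER_PCP_LISTS	(MIGRATE_PCPTYPES * (PAGE_ALLOC_COSTLY_ORDER + 1))

  /* Sketch of order_to_pindex() after this patch: THPs get per-type lists. */
  static unsigned int order_to_pindex(int migratetype, int order)
  {
  	if (order > PAGE_ALLOC_COSTLY_ORDER)
  		return NR_LOWORDER_PCP_LISTS + migratetype; /* was: one shared slot */
  	return (MIGRATE_PCPTYPES * order) + migratetype;
  }

  int main(void)
  {
  	printf("order 0, movable   -> pcplist %u\n", order_to_pindex(1, 0));
  	printf("order 3, unmovable -> pcplist %u\n", order_to_pindex(0, 3));
  	printf("THP, movable       -> pcplist %u\n", order_to_pindex(1, PAGEBLOCK_ORDER));
  	printf("THP, unmovable     -> pcplist %u\n", order_to_pindex(0, PAGEBLOCK_ORDER));
  	return 0;
  }

A movable THP and the unmovable huge zero page now hash to different
pcplists, so one can no longer hand its block to the other.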

From: Johannes Weiner
To: linux-mm@kvack.org
Cc: Kaiyang Zhao, Mel Gorman, Vlastimil Babka, David Rientjes, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [RFC PATCH 06/26] mm: page_alloc: consolidate free page accounting
Date: Tue, 18 Apr 2023 15:12:53 -0400
Message-Id: <20230418191313.268131-7-hannes@cmpxchg.org>
In-Reply-To: <20230418191313.268131-1-hannes@cmpxchg.org>
References: <20230418191313.268131-1-hannes@cmpxchg.org>

Free page accounting currently happens a bit too high up the call
stack, where it has to deal with guard pages, compaction capturing,
block stealing and even page isolation. This is subtle and fragile,
and makes it difficult to hack on the code.

Push the accounting down to where pages enter and leave the physical
freelists, where all these higher-level exceptions are of no concern.

Signed-off-by: Johannes Weiner
---
 include/linux/page-isolation.h |   2 +-
 include/linux/vmstat.h         |   8 --
 mm/internal.h                  |   5 --
 mm/page_alloc.c                | 153 +++++++++++++++++----------------
 mm/page_isolation.c            |  13 ++-
 5 files changed, 83 insertions(+), 98 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 0ab089e89db4..b519fffb3dee 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -35,7 +35,7 @@ static inline bool is_migrate_isolate(int migratetype)
 
 void set_pageblock_migratetype(struct page *page, int migratetype);
 int move_freepages_block(struct zone *zone, struct page *page,
-			 int migratetype, int *num_movable);
+			 int old_mt, int new_mt, int *num_movable);
 
 int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 			     int migratetype, int flags, gfp_t gfp_flags);
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 19cf5b6892ce..219ccf3f91cd 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -481,14 +481,6 @@ static inline void node_stat_sub_folio(struct folio *folio,
 	mod_node_page_state(folio_pgdat(folio), item, -folio_nr_pages(folio));
 }
 
-static inline void __mod_zone_freepage_state(struct zone *zone, int nr_pages,
-					     int migratetype)
-{
-	__mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages);
-	if (is_migrate_cma(migratetype))
-		__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, nr_pages);
-}
-
 extern const char * const vmstat_text[];
 
 static inline const char *zone_stat_name(enum zone_stat_item item)
diff --git a/mm/internal.h b/mm/internal.h
index bcf75a8b032d..024affd4e4b5 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -781,11 +781,6 @@ static inline bool is_migrate_highatomic(enum migratetype migratetype)
 	return migratetype == MIGRATE_HIGHATOMIC;
 }
 
-static inline bool is_migrate_highatomic_page(struct page *page)
-{
-	return get_pageblock_migratetype(page) == MIGRATE_HIGHATOMIC;
-}
-
 void setup_zone_pageset(struct zone *zone);
 
 struct migration_target_control {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d3d01019ce77..b9366c002334 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -843,7 +843,7 @@ static int __init debug_guardpage_minorder_setup(char *buf)
 early_param("debug_guardpage_minorder", debug_guardpage_minorder_setup);
 
 static inline bool set_page_guard(struct zone *zone, struct page *page,
-				  unsigned int order, int migratetype)
+				  unsigned int order)
 {
 	if (!debug_guardpage_enabled())
 		return false;
@@ -854,15 +854,12 @@ static inline bool set_page_guard(struct zone *zone, struct page *page,
 	__SetPageGuard(page);
 	INIT_LIST_HEAD(&page->buddy_list);
 	set_page_private(page, order);
-	/* Guard pages are not available for any usage */
-	if (!is_migrate_isolate(migratetype))
-		__mod_zone_freepage_state(zone, -(1 << order), migratetype);
 
 	return true;
 }
 
 static inline void clear_page_guard(struct zone *zone, struct page *page,
-				    unsigned int order, int migratetype)
+				    unsigned int order)
 {
 	if (!debug_guardpage_enabled())
 		return;
@@ -870,14 +867,12 @@ static inline void clear_page_guard(struct zone *zone, struct page *page,
 	__ClearPageGuard(page);
 
 	set_page_private(page, 0);
-	if (!is_migrate_isolate(migratetype))
-		__mod_zone_freepage_state(zone, (1 << order), migratetype);
 }
 #else
 static inline bool set_page_guard(struct zone *zone, struct page *page,
-			unsigned int order, int migratetype) { return false; }
+			unsigned int order) { return false; }
 static inline void clear_page_guard(struct zone *zone, struct page *page,
-			unsigned int order, int migratetype) {}
+			unsigned int order) {}
 #endif
 
 /*
@@ -994,24 +989,36 @@ compaction_capture(struct capture_control *capc, struct page *page,
 }
 #endif /* CONFIG_COMPACTION */
 
-/* Used for pages not on another list */
-static inline void add_to_free_list(struct page *page, struct zone *zone,
-				    unsigned int order, int migratetype)
+static inline void account_freepages(struct page *page, struct zone *zone,
+				     int nr_pages, int migratetype)
 {
-	struct free_area *area = &zone->free_area[order];
+	if (is_migrate_isolate(migratetype))
+		return;
 
-	list_add(&page->buddy_list, &area->free_list[migratetype]);
-	area->nr_free++;
+	__mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages);
+
+	if (is_migrate_cma(migratetype))
+		__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, nr_pages);
 }
 
 /* Used for pages not on another list */
-static inline void add_to_free_list_tail(struct page *page, struct zone *zone,
-					 unsigned int order, int migratetype)
+static inline void add_to_free_list(struct page *page, struct zone *zone,
+				    unsigned int order, int migratetype,
+				    bool tail)
 {
 	struct free_area *area = &zone->free_area[order];
 
-	list_add_tail(&page->buddy_list, &area->free_list[migratetype]);
+	VM_WARN_ONCE(get_pageblock_migratetype(page) != migratetype,
+		     "page type is %lu, passed migratetype is %d (nr=%d)\n",
+		     get_pageblock_migratetype(page), migratetype, 1 << order);
+
+	if (tail)
+		list_add_tail(&page->buddy_list, &area->free_list[migratetype]);
+	else
+		list_add(&page->buddy_list, &area->free_list[migratetype]);
 	area->nr_free++;
+
+	account_freepages(page, zone, 1 << order, migratetype);
 }
 
 /*
@@ -1020,16 +1027,23 @@ static inline void add_to_free_list_tail(struct page *page, struct zone *zone,
  * allocation again (e.g., optimization for memory onlining).
  */
 static inline void move_to_free_list(struct page *page, struct zone *zone,
-				     unsigned int order, int migratetype)
+				     unsigned int order, int old_mt, int new_mt)
 {
 	struct free_area *area = &zone->free_area[order];
 
-	list_move_tail(&page->buddy_list, &area->free_list[migratetype]);
+	list_move_tail(&page->buddy_list, &area->free_list[new_mt]);
+
+	account_freepages(page, zone, -(1 << order), old_mt);
+	account_freepages(page, zone, 1 << order, new_mt);
 }
 
 static inline void del_page_from_free_list(struct page *page, struct zone *zone,
-					   unsigned int order)
+					   unsigned int order, int migratetype)
 {
+	VM_WARN_ONCE(get_pageblock_migratetype(page) != migratetype,
+		     "page type is %lu, passed migratetype is %d (nr=%d)\n",
+		     get_pageblock_migratetype(page), migratetype, 1 << order);
+
 	/* clear reported state and update reported page count */
 	if (page_reported(page))
 		__ClearPageReported(page);
@@ -1038,6 +1052,8 @@ static inline void del_page_from_free_list(struct page *page, struct zone *zone,
 	__ClearPageBuddy(page);
 	set_page_private(page, 0);
 	zone->free_area[order].nr_free--;
+
+	account_freepages(page, zone, -(1 << order), migratetype);
 }
 
 /*
@@ -1104,23 +1120,21 @@ static inline void __free_one_page(struct page *page,
 	VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP, page);
 
 	VM_BUG_ON(migratetype == -1);
-	if (likely(!is_migrate_isolate(migratetype)))
-		__mod_zone_freepage_state(zone, 1 << order, migratetype);
-
 	VM_BUG_ON_PAGE(pfn & ((1 << order) - 1), page);
 	VM_BUG_ON_PAGE(bad_range(zone, page), page);
 
 	while (order < MAX_ORDER - 1) {
-		if (compaction_capture(capc, page, order, migratetype)) {
-			__mod_zone_freepage_state(zone, -(1 << order),
-						  migratetype);
+		int buddy_mt;
+
+		if (compaction_capture(capc, page, order, migratetype))
 			return;
-		}
 
 		buddy = find_buddy_page_pfn(page, pfn, order, &buddy_pfn);
 		if (!buddy)
 			goto done_merging;
 
+		buddy_mt = get_pageblock_migratetype(buddy);
+
 		if (unlikely(order >= pageblock_order)) {
 			/*
 			 * We want to prevent merge between freepages on pageblock
@@ -1128,8 +1142,6 @@ static inline void __free_one_page(struct page *page,
 			 * pageblock isolation could cause incorrect freepage or CMA
 			 * accounting or HIGHATOMIC accounting.
 			 */
-			int buddy_mt = get_pageblock_migratetype(buddy);
-
 			if (migratetype != buddy_mt &&
 			    (!migratetype_is_mergeable(migratetype) ||
 			     !migratetype_is_mergeable(buddy_mt)))
@@ -1141,9 +1153,9 @@ static inline void __free_one_page(struct page *page,
 		 * merge with it and move up one order.
 		 */
 		if (page_is_guard(buddy))
-			clear_page_guard(zone, buddy, order, migratetype);
+			clear_page_guard(zone, buddy, order);
 		else
-			del_page_from_free_list(buddy, zone, order);
+			del_page_from_free_list(buddy, zone, order, buddy_mt);
 		combined_pfn = buddy_pfn & pfn;
 		page = page + (combined_pfn - pfn);
 		pfn = combined_pfn;
@@ -1160,10 +1172,7 @@ static inline void __free_one_page(struct page *page,
 	else
 		to_tail = buddy_merge_likely(pfn, buddy_pfn, page, order);
 
-	if (to_tail)
-		add_to_free_list_tail(page, zone, order, migratetype);
-	else
-		add_to_free_list(page, zone, order, migratetype);
+	add_to_free_list(page, zone, order, migratetype, to_tail);
 
 	/* Notify page reporting subsystem of freed page */
 	if (!(fpi_flags & FPI_SKIP_REPORT_NOTIFY))
@@ -1205,10 +1214,8 @@ int split_free_page(struct page *free_page,
 	}
 
 	mt = get_pageblock_migratetype(free_page);
-	if (likely(!is_migrate_isolate(mt)))
-		__mod_zone_freepage_state(zone, -(1UL << order), mt);
+	del_page_from_free_list(free_page, zone, order, mt);
 
-	del_page_from_free_list(free_page, zone, order);
 	for (pfn = free_page_pfn;
 	     pfn < free_page_pfn + (1UL << order);) {
 		int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
@@ -2352,10 +2359,10 @@ static inline void expand(struct zone *zone, struct page *page,
 		 * Corresponding page table entries will not be touched,
 		 * pages will stay not present in virtual address space
 		 */
-		if (set_page_guard(zone, &page[size], high, migratetype))
+		if (set_page_guard(zone, &page[size], high))
 			continue;
 
-		add_to_free_list(&page[size], zone, high, migratetype);
+		add_to_free_list(&page[size], zone, high, migratetype, false);
 		set_buddy_order(&page[size], high);
 	}
 }
@@ -2559,11 +2566,15 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
 
 	/* Find a page of the appropriate size in the preferred list */
 	for (current_order = order; current_order < MAX_ORDER; ++current_order) {
+		int actual_mt;
+
 		area = &(zone->free_area[current_order]);
 		page = get_page_from_free_area(area, migratetype);
 		if (!page)
 			continue;
-		del_page_from_free_list(page, zone, current_order);
+		/* move_freepages_block() may strand types on wrong list */
+		actual_mt = get_pageblock_migratetype(page);
+		del_page_from_free_list(page, zone, current_order, actual_mt);
 		expand(zone, page, order, current_order, migratetype);
 		set_pcppage_migratetype(page, migratetype);
 		trace_mm_page_alloc_zone_locked(page, order, migratetype,
@@ -2606,7 +2617,7 @@ static inline struct page *__rmqueue_cma_fallback(struct zone *zone,
 */
 static int move_freepages(struct zone *zone,
			  unsigned long start_pfn, unsigned long end_pfn,
-			  int migratetype, int *num_movable)
+			  int old_mt, int new_mt, int *num_movable)
 {
 	struct page *page;
 	unsigned long pfn;
@@ -2633,7 +2644,7 @@ static int move_freepages(struct zone *zone,
 		VM_BUG_ON_PAGE(page_zone(page) != zone, page);
 
 		order = buddy_order(page);
-		move_to_free_list(page, zone, order, migratetype);
+		move_to_free_list(page, zone, order, old_mt, new_mt);
 		pfn += 1 << order;
 		pages_moved += 1 << order;
 	}
@@ -2642,7 +2653,7 @@ static int move_freepages(struct zone *zone,
 }
 
 int move_freepages_block(struct zone *zone, struct page *page,
-			 int migratetype, int *num_movable)
+			 int old_mt, int new_mt, int *num_movable)
 {
 	unsigned long start_pfn, end_pfn, pfn;
 
@@ -2659,8 +2670,8 @@ int move_freepages_block(struct zone *zone, struct page *page,
 	if (!zone_spans_pfn(zone, end_pfn))
 		return 0;
 
-	return move_freepages(zone, start_pfn, end_pfn, migratetype,
-				num_movable);
+	return move_freepages(zone, start_pfn, end_pfn,
+			      old_mt, new_mt, num_movable);
 }
 
 static void change_pageblock_range(struct page *pageblock_page,
@@ -2786,8 +2797,9 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
 	if (!whole_block)
 		goto single_page;
 
-	free_pages = move_freepages_block(zone, page, start_type,
-					  &movable_pages);
+	free_pages = move_freepages_block(zone, page, old_block_type,
+					  start_type, &movable_pages);
+
 	/*
 	 * Determine how many pages are compatible with our allocation.
 	 * For movable allocation, it's the number of movable pages which
@@ -2825,7 +2837,8 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
 		return;
 
 single_page:
-	move_to_free_list(page, zone, current_order, start_type);
+	move_to_free_list(page, zone, current_order,
+			  old_block_type, start_type);
 }
 
 /*
@@ -2895,7 +2908,7 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
 	if (migratetype_is_mergeable(mt)) {
 		zone->nr_reserved_highatomic += pageblock_nr_pages;
 		set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
-		move_freepages_block(zone, page, MIGRATE_HIGHATOMIC, NULL);
+		move_freepages_block(zone, page, mt, MIGRATE_HIGHATOMIC, NULL);
 	}
 
 out_unlock:
@@ -2935,11 +2948,13 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
 	spin_lock_irqsave(&zone->lock, flags);
 	for (order = 0; order < MAX_ORDER; order++) {
 		struct free_area *area = &(zone->free_area[order]);
+		int mt;
 
 		page = get_page_from_free_area(area, MIGRATE_HIGHATOMIC);
 		if (!page)
 			continue;
 
+		mt = get_pageblock_migratetype(page);
 		/*
 		 * In page freeing path, migratetype change is racy so
 		 * we can counter several free pages in a pageblock
@@ -2947,7 +2962,7 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
 		 * from highatomic to ac->migratetype. So we should
 		 * adjust the count once.
 		 */
-		if (is_migrate_highatomic_page(page)) {
+		if (is_migrate_highatomic(mt)) {
 			/*
 			 * It should never happen but changes to
 			 * locking could inadvertently allow a per-cpu
@@ -2970,8 +2985,8 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
 			 * may increase.
 			 */
 			set_pageblock_migratetype(page, ac->migratetype);
-			ret = move_freepages_block(zone, page, ac->migratetype,
-						   NULL);
+			ret = move_freepages_block(zone, page, mt,
						   ac->migratetype, NULL);
 			if (ret) {
 				spin_unlock_irqrestore(&zone->lock, flags);
 				return ret;
@@ -3142,18 +3157,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 		 */
 		list_add_tail(&page->pcp_list, list);
 		allocated++;
-		if (is_migrate_cma(get_pcppage_migratetype(page)))
-			__mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
-					      -(1 << order));
 	}
-
-	/*
-	 * i pages were removed from the buddy list even if some leak due
-	 * to check_pcp_refill failing so adjust NR_FREE_PAGES based
-	 * on i. Do not confuse with 'allocated' which is the number of
-	 * pages added to the pcp list.
-	 */
-	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
 	spin_unlock_irqrestore(&zone->lock, flags);
 	return allocated;
 }
@@ -3615,11 +3619,9 @@ int __isolate_free_page(struct page *page, unsigned int order)
 		watermark = zone->_watermark[WMARK_MIN] + (1UL << order);
 		if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA))
 			return 0;
-
-		__mod_zone_freepage_state(zone, -(1UL << order), mt);
 	}
 
-	del_page_from_free_list(page, zone, order);
+	del_page_from_free_list(page, zone, order, mt);
 
 	/*
 	 * Set the pageblock if the isolated page is at least half of a
@@ -3715,8 +3717,6 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
 				return NULL;
 			}
 		}
-		__mod_zone_freepage_state(zone, -(1 << order),
-					  get_pcppage_migratetype(page));
 		spin_unlock_irqrestore(&zone->lock, flags);
 	} while (check_new_pages(page, order));
 
@@ -9578,8 +9578,9 @@ void __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
 
 		BUG_ON(page_count(page));
 		BUG_ON(!PageBuddy(page));
+		VM_WARN_ON(get_pageblock_migratetype(page) != MIGRATE_ISOLATE);
 		order = buddy_order(page);
-		del_page_from_free_list(page, zone, order);
+		del_page_from_free_list(page, zone, order, MIGRATE_ISOLATE);
 		pfn += (1 << order);
 	}
 	spin_unlock_irqrestore(&zone->lock, flags);
@@ -9630,11 +9631,12 @@ static void break_down_buddy_pages(struct zone *zone, struct page *page,
 			current_buddy = page + size;
 		}
 
-		if (set_page_guard(zone, current_buddy, high, migratetype))
+		if (set_page_guard(zone, current_buddy, high))
 			continue;
 
 		if (current_buddy != target) {
-			add_to_free_list(current_buddy, zone, high, migratetype);
+			add_to_free_list(current_buddy, zone, high,
+					 migratetype, false);
 			set_buddy_order(current_buddy, high);
 			page = next_page;
 		}
@@ -9662,12 +9664,11 @@ bool take_page_off_buddy(struct page *page)
 			int migratetype = get_pfnblock_migratetype(page_head,
								   pfn_head);
 
-			del_page_from_free_list(page_head, zone, page_order);
+			del_page_from_free_list(page_head, zone, page_order,
+						migratetype);
 			break_down_buddy_pages(zone, page_head, page, 0,
						page_order, migratetype);
 			SetPageHWPoisonTakenOff(page);
-			if (!is_migrate_isolate(migratetype))
-				__mod_zone_freepage_state(zone, -1, migratetype);
 			ret = true;
 			break;
 		}
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index b67800f7f6b1..e119a37ac661 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -183,10 +183,8 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
 
 	set_pageblock_migratetype(page, MIGRATE_ISOLATE);
 	zone->nr_isolate_pageblock++;
-	nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE,
-					NULL);
-
-	__mod_zone_freepage_state(zone, -nr_pages, mt);
+	nr_pages = move_freepages_block(zone, page, mt,
+					MIGRATE_ISOLATE, NULL);
 	spin_unlock_irqrestore(&zone->lock, flags);
 	return 0;
 }
@@ -251,10 +249,9 @@ static void unset_migratetype_isolate(struct page *page, int migratetype)
 	 * onlining - just onlined memory won't immediately be considered for
 	 * allocation.
 	 */
-	if (!isolated_page) {
-		nr_pages = move_freepages_block(zone, page, migratetype, NULL);
-		__mod_zone_freepage_state(zone, nr_pages, migratetype);
-	}
+	if (!isolated_page)
+		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE,
+						migratetype, NULL);
 	set_pageblock_migratetype(page, migratetype);
 	if (isolated_page)
 		__putback_isolated_page(page, order, migratetype);
-- 
2.39.2
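
The structural idea of the patch is easy to see in miniature: if the
counters are only ever touched inside the helpers that link and
unlink pages from a freelist, then merging, stealing, capturing and
isolation cannot each get the accounting wrong on their own. A toy,
userspace-only sketch of that invariant (not kernel code, no locking):

  #include <stdio.h>

  /* Toy free list: the count is maintained only at the enter/leave points. */
  struct freelist {
  	long nr_free_pages;	/* analogous to NR_FREE_PAGES */
  };

  static void account_freepages(struct freelist *fl, long nr_pages)
  {
  	fl->nr_free_pages += nr_pages;
  }

  /* Every path that frees a block goes through here... */
  static void add_to_free_list(struct freelist *fl, int order)
  {
  	/* linking the page into the list would go here */
  	account_freepages(fl, 1L << order);
  }

  /* ...and every path that takes one out goes through here. */
  static void del_from_free_list(struct freelist *fl, int order)
  {
  	/* unlinking the page from the list would go here */
  	account_freepages(fl, -(1L << order));
  }

  int main(void)
  {
  	struct freelist fl = { 0 };

  	add_to_free_list(&fl, 9);	/* free a 2MB block   */
  	add_to_free_list(&fl, 0);	/* free a single page */
  	del_from_free_list(&fl, 0);	/* allocate it again  */
  	printf("free pages: %ld\n", fl.nr_free_pages);	/* 512 */
  	return 0;
  }

In the real patch the same pairing is enforced by routing all freelist
manipulation through add_to_free_list(), move_to_free_list() and
del_page_from_free_list(), each of which calls account_freepages().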

From: Johannes Weiner
To: linux-mm@kvack.org
Cc: Kaiyang Zhao, Mel Gorman, Vlastimil Babka, David Rientjes, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [RFC PATCH 07/26] mm: page_alloc: move capture_control to the page allocator
Date: Tue, 18 Apr 2023 15:12:54 -0400
Message-Id: <20230418191313.268131-8-hannes@cmpxchg.org>
In-Reply-To: <20230418191313.268131-1-hannes@cmpxchg.org>
References: <20230418191313.268131-1-hannes@cmpxchg.org>

Compaction capturing is really a component of the page allocator.
Later patches will also disconnect allocation request size from the
compaction size, so move the setup of the capturing parameters to the
"request domain", i.e. the page allocator. No functional change.

Signed-off-by: Johannes Weiner
Acked-by: Mel Gorman
---
 include/linux/compaction.h |  3 ++-
 mm/compaction.c            | 33 ++++++++++-----------------------
 mm/page_alloc.c            | 25 ++++++++++++++++++++++---
 3 files changed, 34 insertions(+), 27 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 52a9ff65faee..06eeb2e25833 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -56,6 +56,7 @@ enum compact_result {
 };
 
 struct alloc_context; /* in mm/internal.h */
+struct capture_control; /* in mm/internal.h */
 
 /*
  * Number of free order-0 pages that should be available above given watermark
@@ -94,7 +95,7 @@ extern int fragmentation_index(struct zone *zone, unsigned int order);
 extern enum compact_result try_to_compact_pages(gfp_t gfp_mask,
 		unsigned int order, unsigned int alloc_flags,
 		const struct alloc_context *ac, enum compact_priority prio,
-		struct page **page);
+		struct capture_control *capc);
 extern void reset_isolation_suitable(pg_data_t *pgdat);
 extern enum compact_result compaction_suitable(struct zone *zone, int order,
 		unsigned int alloc_flags, int highest_zoneidx);
diff --git a/mm/compaction.c b/mm/compaction.c
index 84db84e8fd3a..a2280001eea3 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2510,7 +2510,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 static enum compact_result compact_zone_order(struct zone *zone, int order,
 		gfp_t gfp_mask, enum compact_priority prio,
 		unsigned int alloc_flags, int highest_zoneidx,
-		struct page **capture)
+		struct capture_control *capc)
 {
 	enum compact_result ret;
 	struct compact_control cc = {
@@ -2527,38 +2527,25 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
 		.ignore_skip_hint = (prio == MIN_COMPACT_PRIORITY),
 		.ignore_block_suitable = (prio == MIN_COMPACT_PRIORITY)
 	};
-	struct capture_control capc = {
-		.cc = &cc,
-		.page = NULL,
-	};
 
-	/*
-	 * Make sure the structs are really initialized before we expose the
-	 * capture control, in case we are interrupted and the interrupt handler
-	 * frees a page.
-	 */
+	/* See the comment in __alloc_pages_direct_compact() */
 	barrier();
-	WRITE_ONCE(current->capture_control, &capc);
+	WRITE_ONCE(capc->cc, &cc);
 
-	ret = compact_zone(&cc, &capc);
+	ret = compact_zone(&cc, capc);
+
+	WRITE_ONCE(capc->cc, NULL);
 
 	VM_BUG_ON(!list_empty(&cc.freepages));
 	VM_BUG_ON(!list_empty(&cc.migratepages));
 
-	/*
-	 * Make sure we hide capture control first before we read the captured
-	 * page pointer, otherwise an interrupt could free and capture a page
-	 * and we would leak it.
- */ - WRITE_ONCE(current->capture_control, NULL); - *capture =3D READ_ONCE(capc.page); /* * Technically, it is also possible that compaction is skipped but * the page is still captured out of luck(IRQ came and freed the page). * Returning COMPACT_SUCCESS in such cases helps in properly accounting * the COMPACT[STALL|FAIL] when compaction is skipped. */ - if (*capture) + if (capc->page) ret =3D COMPACT_SUCCESS; =20 return ret; @@ -2573,13 +2560,13 @@ int sysctl_extfrag_threshold =3D 500; * @alloc_flags: The allocation flags of the current allocation * @ac: The context of current allocation * @prio: Determines how hard direct compaction should try to succeed - * @capture: Pointer to free page created by compaction will be stored here + * @capc: The context for capturing pages during freeing * * This is the main entry point for direct page compaction. */ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int orde= r, unsigned int alloc_flags, const struct alloc_context *ac, - enum compact_priority prio, struct page **capture) + enum compact_priority prio, struct capture_control *capc) { int may_perform_io =3D (__force int)(gfp_mask & __GFP_IO); struct zoneref *z; @@ -2607,7 +2594,7 @@ enum compact_result try_to_compact_pages(gfp_t gfp_ma= sk, unsigned int order, } =20 status =3D compact_zone_order(zone, order, gfp_mask, prio, - alloc_flags, ac->highest_zoneidx, capture); + alloc_flags, ac->highest_zoneidx, capc); rc =3D max(status, rc); =20 /* The allocation should succeed, stop compacting */ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index b9366c002334..4d20513c83be 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -944,7 +944,7 @@ static inline struct capture_control *task_capc(struct = zone *zone) { struct capture_control *capc =3D current->capture_control; =20 - return unlikely(capc) && + return unlikely(capc && capc->cc) && !(current->flags & PF_KTHREAD) && !capc->page && capc->cc->zone =3D=3D zone ? capc : NULL; @@ -4480,22 +4480,41 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsign= ed int order, struct page *page =3D NULL; unsigned long pflags; unsigned int noreclaim_flag; + struct capture_control capc =3D { + .page =3D NULL, + }; =20 if (!order) return NULL; =20 + /* + * Make sure the structs are really initialized before we expose the + * capture control, in case we are interrupted and the interrupt handler + * frees a page. + */ + barrier(); + WRITE_ONCE(current->capture_control, &capc); + psi_memstall_enter(&pflags); delayacct_compact_start(); noreclaim_flag =3D memalloc_noreclaim_save(); =20 *compact_result =3D try_to_compact_pages(gfp_mask, order, alloc_flags, ac, - prio, &page); + prio, &capc); =20 memalloc_noreclaim_restore(noreclaim_flag); psi_memstall_leave(&pflags); delayacct_compact_end(); =20 - if (*compact_result =3D=3D COMPACT_SKIPPED) + /* + * Make sure we hide capture control first before we read the captured + * page pointer, otherwise an interrupt could free and capture a page + * and we would leak it. 
+ */ + WRITE_ONCE(current->capture_control, NULL); + page =3D READ_ONCE(capc.page); + + if (!page && *compact_result =3D=3D COMPACT_SKIPPED) return NULL; /* * At least in one zone compaction wasn't deferred or skipped, so let's --=20 2.39.2 From nobody Wed Dec 17 15:36:54 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B7EBDC7EE22 for ; Tue, 18 Apr 2023 19:14:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232931AbjDRTOL (ORCPT ); Tue, 18 Apr 2023 15:14:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57536 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232653AbjDRTNx (ORCPT ); Tue, 18 Apr 2023 15:13:53 -0400 Received: from mail-qv1-xf2a.google.com (mail-qv1-xf2a.google.com [IPv6:2607:f8b0:4864:20::f2a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 90D89C67A for ; Tue, 18 Apr 2023 12:13:29 -0700 (PDT) Received: by mail-qv1-xf2a.google.com with SMTP id oo30so15744241qvb.12 for ; Tue, 18 Apr 2023 12:13:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20221208.gappssmtp.com; s=20221208; t=1681845208; x=1684437208; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=+Fg7akH6DcyqQI9O6hhnfSSKo6TLLXotHGx/ELhTsVs=; b=duezlBdfFiEByRb2UhLuZ0Hbbz17B9Qr/emODLH8mZ2CGMjptqHGa11W+QzdSjQ/7F NrpNC/GaqkJEOfmZT8IQhhmClT+39ZedrNCcu8ABafH+q5j+YOBKEgdjdxPGtGKwAbo+ P2TEzDioWeN451luG7HEJwPMt/rOzqsoER6GOlwhz2qDDjhKRGqWoYpeyDSp91a0bKPf 25r1qZ+NsVNcaUBJQLFKjSOaPj15VqfhoKvKXLa7AyFqKWsmjO1Nkx11EK8FkuJho1GV PlZj02YYGkSrKmCNJW7AgCuXuYglqYWYf5+G3gp8PSNESKrtKbSuyF4B4khcnFe0QCvs SdMw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1681845208; x=1684437208; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=+Fg7akH6DcyqQI9O6hhnfSSKo6TLLXotHGx/ELhTsVs=; b=egV5nnUsJlZ9RFvsWQVswQn0i+d8wpt00UMCRE15Yed4SZ3qNNP2yC7Qblz4tmnKUD lug33365g4PGqhta1/Dt+I2au1CxWaPnnGqTsaqWAu30BJOFckt7HiKl7HwWjG0iaRYv 5j4C0GLxqrfiD4YlXTkVIuZ8H8CHoIjueeOhqjWv4UQCzq06+VVzV4Uzvtw6WNVFmly9 w6/I7PZT8iEQQUwlJCX9Jp+Dwz+gaMUo0fEpuAoMgsYqPRIuRuTjg5apOiDR0sMoqHSu PZcDZAOW86u7NNBOv6zULIesbjpMOzUy2reg9jNnFXzIiLu/tCmA/iyWhyovT2oAuTJ2 2KqA== X-Gm-Message-State: AAQBX9d6LIqj59EBejqt5to21lVDnfJ5I/hXCrtQiIziwUapx5eIiX6j E/h8gQC5I0papJ7zfCp/pQidkg== X-Google-Smtp-Source: AKy350aBP+4ihxJVgdmom0s8xvFh5tp4l/UPrJZnbm1VerN0bSQuYJNYtHePgve1c8xb8nyNL5IrOA== X-Received: by 2002:a05:6214:252a:b0:5ef:8004:e0b4 with SMTP id gg10-20020a056214252a00b005ef8004e0b4mr11443725qvb.48.1681845208456; Tue, 18 Apr 2023 12:13:28 -0700 (PDT) Received: from localhost ([2620:10d:c091:400::5:e646]) by smtp.gmail.com with ESMTPSA id c2-20020a0ceb42000000b005dd8b9345e3sm3924742qvq.123.2023.04.18.12.13.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 18 Apr 2023 12:13:28 -0700 (PDT) From: Johannes Weiner To: linux-mm@kvack.org Cc: Kaiyang Zhao , Mel Gorman , Vlastimil Babka , David Rientjes , linux-kernel@vger.kernel.org, kernel-team@fb.com Subject: [RFC PATCH 08/26] mm: page_alloc: claim blocks during compaction capturing Date: Tue, 18 Apr 2023 15:12:55 -0400 Message-Id: 
<20230418191313.268131-9-hannes@cmpxchg.org> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230418191313.268131-1-hannes@cmpxchg.org> References: <20230418191313.268131-1-hannes@cmpxchg.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" When capturing a whole block, update the migratetype accordingly. For example, a THP allocation might capture an unmovable block. If the THP gets split and partially freed later, the remainder should group up with movable allocations. Signed-off-by: Johannes Weiner --- mm/internal.h | 1 + mm/page_alloc.c | 42 ++++++++++++++++++++++++------------------ 2 files changed, 25 insertions(+), 18 deletions(-) diff --git a/mm/internal.h b/mm/internal.h index 024affd4e4b5..39f65a463631 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -432,6 +432,7 @@ struct compact_control { */ struct capture_control { struct compact_control *cc; + int migratetype; struct page *page; }; =20 diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 4d20513c83be..8e5996f8b4b4 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -615,6 +615,17 @@ void set_pageblock_migratetype(struct page *page, int = migratetype) page_to_pfn(page), MIGRATETYPE_MASK); } =20 +static void change_pageblock_range(struct page *pageblock_page, + int start_order, int migratetype) +{ + int nr_pageblocks =3D 1 << (start_order - pageblock_order); + + while (nr_pageblocks--) { + set_pageblock_migratetype(pageblock_page, migratetype); + pageblock_page +=3D pageblock_nr_pages; + } +} + #ifdef CONFIG_DEBUG_VM static int page_outside_zone_boundaries(struct zone *zone, struct page *pa= ge) { @@ -962,14 +973,19 @@ compaction_capture(struct capture_control *capc, stru= ct page *page, is_migrate_isolate(migratetype)) return false; =20 - /* - * Do not let lower order allocations pollute a movable pageblock. - * This might let an unmovable request use a reclaimable pageblock - * and vice-versa but no more than normal fallback logic which can - * have trouble finding a high-order free page. - */ - if (order < pageblock_order && migratetype =3D=3D MIGRATE_MOVABLE) + if (order >=3D pageblock_order) { + migratetype =3D capc->migratetype; + change_pageblock_range(page, order, migratetype); + } else if (migratetype =3D=3D MIGRATE_MOVABLE) { + /* + * Do not let lower order allocations pollute a + * movable pageblock. This might let an unmovable + * request use a reclaimable pageblock and vice-versa + * but no more than normal fallback logic which can + * have trouble finding a high-order free page. 
+ */ return false; + } =20 capc->page =3D page; return true; @@ -2674,17 +2690,6 @@ int move_freepages_block(struct zone *zone, struct p= age *page, old_mt, new_mt, num_movable); } =20 -static void change_pageblock_range(struct page *pageblock_page, - int start_order, int migratetype) -{ - int nr_pageblocks =3D 1 << (start_order - pageblock_order); - - while (nr_pageblocks--) { - set_pageblock_migratetype(pageblock_page, migratetype); - pageblock_page +=3D pageblock_nr_pages; - } -} - /* * When we are falling back to another migratetype during allocation, try = to * steal extra free pages from the same pageblocks to satisfy further @@ -4481,6 +4486,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned= int order, unsigned long pflags; unsigned int noreclaim_flag; struct capture_control capc =3D { + .migratetype =3D ac->migratetype, .page =3D NULL, }; =20 --=20 2.39.2 From nobody Wed Dec 17 15:36:54 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5B0FCC6FD18 for ; Tue, 18 Apr 2023 19:15:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231513AbjDRTOZ (ORCPT ); Tue, 18 Apr 2023 15:14:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57580 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232708AbjDRTNz (ORCPT ); Tue, 18 Apr 2023 15:13:55 -0400 Received: from mail-qv1-xf34.google.com (mail-qv1-xf34.google.com [IPv6:2607:f8b0:4864:20::f34]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 869EAC65A for ; Tue, 18 Apr 2023 12:13:30 -0700 (PDT) Received: by mail-qv1-xf34.google.com with SMTP id dd8so17376152qvb.13 for ; Tue, 18 Apr 2023 12:13:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20221208.gappssmtp.com; s=20221208; t=1681845209; x=1684437209; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=3ziCYFLP0gIvrTMKVGAOENHj/570OA5tLu/LhfprcHs=; b=tASioUHSg0O8JLU9lmpT2o9qieQ5UPbSCjZ2FFf4DPmc+sm1lG5E2sefokRjX82PWv /naa0xXtkW6heFV0tkRw9s2v57ltr8jqTBis+P8WeOGGhuTUTbP/NaZa2KY0naFlxm+B W0CWp0uB41dkHuQ1/kpuzy+c28wZgenF79DWz7OZV3qE+8W75G/CgO9SPnK2fL9a320t yXKyc58RVcAaAikakCIJxYNwp+SzurktP4fqmo2U9wFIB3ZelkMdhT468zPi5iN0Uonq qlshkMrXESVN82SilYYT0M1b22r6Yho1HwmEKjRb0CLDZMxgSBPEuAYI8Xtu54okMtlA dd3w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1681845209; x=1684437209; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=3ziCYFLP0gIvrTMKVGAOENHj/570OA5tLu/LhfprcHs=; b=cBMt7pYOKWpCq5hkZndrgGT5rsQcIj7aiHTxElqk08Gn3VfEfe5mglVMMBcmBS36vS ikusUusewfwZyhkytx/trnBKCm71+DiEEb4xamilOgFVb5F55yXTxwm2fj/tWviZGUbw 1pRLy7AJ487bha5RWru6lMf1XvhIOaawlCdbD0yE2ZP5Rm22cyhePvoZaYsOrR32C1/o Uxy7PQUSGCooSQ6bLLH1z3ZSFkYrJCxqbSfezr2tCa7FKZ6xOtyq9Fek1DOxMTbXvS1+ Sq2ih8z6qkEQQK0DBnaARSxf3HYLAT/YpyMZOQRgwPS/25fUZc9hBUfzUne0K0XvkPhy SKaA== X-Gm-Message-State: AAQBX9eIimlX1NO0BZiUtI0YcFHXhSQ0TN4sfQzjUS3EshIu7uDzS+dq yKR0TkodogOZT8UagBOSph8/aw== X-Google-Smtp-Source: AKy350ZOi+88hEvm9GEFxPHg078nMJgdO2hJcrbQjNkq+nM1p74G7tq0XzGJcBNJQ1SbVH8WMthykg== X-Received: by 2002:a05:6214:f23:b0:5ef:77c4:4540 with SMTP id 
iw3-20020a0562140f2300b005ef77c44540mr16271521qvb.27.1681845209552; Tue, 18 Apr 2023 12:13:29 -0700 (PDT) Received: from localhost ([2620:10d:c091:400::5:e646]) by smtp.gmail.com with ESMTPSA id dr3-20020a05621408e300b005dd8b9345dbsm3884921qvb.115.2023.04.18.12.13.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 18 Apr 2023 12:13:29 -0700 (PDT) From: Johannes Weiner To: linux-mm@kvack.org Cc: Kaiyang Zhao , Mel Gorman , Vlastimil Babka , David Rientjes , linux-kernel@vger.kernel.org, kernel-team@fb.com Subject: [RFC PATCH 09/26] mm: page_alloc: move expand() above compaction_capture() Date: Tue, 18 Apr 2023 15:12:56 -0400 Message-Id: <20230418191313.268131-10-hannes@cmpxchg.org> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230418191313.268131-1-hannes@cmpxchg.org> References: <20230418191313.268131-1-hannes@cmpxchg.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" The next patch will allow compaction to capture from larger-than-requested page blocks and free the remainder. Rearrange the code in advance to make the diff more readable. No functional change. Signed-off-by: Johannes Weiner --- mm/page_alloc.c | 186 ++++++++++++++++++++++++------------------------ 1 file changed, 93 insertions(+), 93 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 8e5996f8b4b4..cd86f80d7bbe 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -950,61 +950,6 @@ static inline void set_buddy_order(struct page *page, = unsigned int order) __SetPageBuddy(page); } =20 -#ifdef CONFIG_COMPACTION -static inline struct capture_control *task_capc(struct zone *zone) -{ - struct capture_control *capc =3D current->capture_control; - - return unlikely(capc && capc->cc) && - !(current->flags & PF_KTHREAD) && - !capc->page && - capc->cc->zone =3D=3D zone ? capc : NULL; -} - -static inline bool -compaction_capture(struct capture_control *capc, struct page *page, - int order, int migratetype) -{ - if (!capc || order !=3D capc->cc->order) - return false; - - /* Do not accidentally pollute CMA or isolated regions*/ - if (is_migrate_cma(migratetype) || - is_migrate_isolate(migratetype)) - return false; - - if (order >=3D pageblock_order) { - migratetype =3D capc->migratetype; - change_pageblock_range(page, order, migratetype); - } else if (migratetype =3D=3D MIGRATE_MOVABLE) { - /* - * Do not let lower order allocations pollute a - * movable pageblock. This might let an unmovable - * request use a reclaimable pageblock and vice-versa - * but no more than normal fallback logic which can - * have trouble finding a high-order free page. - */ - return false; - } - - capc->page =3D page; - return true; -} - -#else -static inline struct capture_control *task_capc(struct zone *zone) -{ - return NULL; -} - -static inline bool -compaction_capture(struct capture_control *capc, struct page *page, - int order, int migratetype) -{ - return false; -} -#endif /* CONFIG_COMPACTION */ - static inline void account_freepages(struct page *page, struct zone *zone, int nr_pages, int migratetype) { @@ -1072,6 +1017,99 @@ static inline void del_page_from_free_list(struct pa= ge *page, struct zone *zone, account_freepages(page, zone, -(1 << order), migratetype); } =20 +/* + * The order of subdivision here is critical for the IO subsystem. + * Please do not alter this order without good reasons and regression + * testing. 
Specifically, as large blocks of memory are subdivided, + * the order in which smaller blocks are delivered depends on the order + * they're subdivided in this function. This is the primary factor + * influencing the order in which pages are delivered to the IO + * subsystem according to empirical testing, and this is also justified + * by considering the behavior of a buddy system containing a single + * large block of memory acted on by a series of small allocations. + * This behavior is a critical factor in sglist merging's success. + * + * -- nyc + */ +static inline void expand(struct zone *zone, struct page *page, + int low, int high, int migratetype) +{ + unsigned long size =3D 1 << high; + + while (high > low) { + high--; + size >>=3D 1; + VM_BUG_ON_PAGE(bad_range(zone, &page[size]), &page[size]); + + /* + * Mark as guard pages (or page), that will allow to + * merge back to allocator when buddy will be freed. + * Corresponding page table entries will not be touched, + * pages will stay not present in virtual address space + */ + if (set_page_guard(zone, &page[size], high)) + continue; + + add_to_free_list(&page[size], zone, high, migratetype, false); + set_buddy_order(&page[size], high); + } +} + +#ifdef CONFIG_COMPACTION +static inline struct capture_control *task_capc(struct zone *zone) +{ + struct capture_control *capc =3D current->capture_control; + + return unlikely(capc && capc->cc) && + !(current->flags & PF_KTHREAD) && + !capc->page && + capc->cc->zone =3D=3D zone ? capc : NULL; +} + +static inline bool +compaction_capture(struct capture_control *capc, struct page *page, + int order, int migratetype) +{ + if (!capc || order !=3D capc->cc->order) + return false; + + /* Do not accidentally pollute CMA or isolated regions*/ + if (is_migrate_cma(migratetype) || + is_migrate_isolate(migratetype)) + return false; + + if (order >=3D pageblock_order) { + migratetype =3D capc->migratetype; + change_pageblock_range(page, order, migratetype); + } else if (migratetype =3D=3D MIGRATE_MOVABLE) { + /* + * Do not let lower order allocations pollute a + * movable pageblock. This might let an unmovable + * request use a reclaimable pageblock and vice-versa + * but no more than normal fallback logic which can + * have trouble finding a high-order free page. + */ + return false; + } + + capc->page =3D page; + return true; +} + +#else +static inline struct capture_control *task_capc(struct zone *zone) +{ + return NULL; +} + +static inline bool +compaction_capture(struct capture_control *capc, struct page *page, + int order, int migratetype) +{ + return false; +} +#endif /* CONFIG_COMPACTION */ + /* * If this is not the largest possible page, check if the buddy * of the next-highest order is free. If it is, it's possible @@ -2345,44 +2383,6 @@ void __init init_cma_reserved_pageblock(struct page = *page) } #endif =20 -/* - * The order of subdivision here is critical for the IO subsystem. - * Please do not alter this order without good reasons and regression - * testing. Specifically, as large blocks of memory are subdivided, - * the order in which smaller blocks are delivered depends on the order - * they're subdivided in this function. This is the primary factor - * influencing the order in which pages are delivered to the IO - * subsystem according to empirical testing, and this is also justified - * by considering the behavior of a buddy system containing a single - * large block of memory acted on by a series of small allocations. 
- * This behavior is a critical factor in sglist merging's success. - * - * -- nyc - */ -static inline void expand(struct zone *zone, struct page *page, - int low, int high, int migratetype) -{ - unsigned long size =3D 1 << high; - - while (high > low) { - high--; - size >>=3D 1; - VM_BUG_ON_PAGE(bad_range(zone, &page[size]), &page[size]); - - /* - * Mark as guard pages (or page), that will allow to - * merge back to allocator when buddy will be freed. - * Corresponding page table entries will not be touched, - * pages will stay not present in virtual address space - */ - if (set_page_guard(zone, &page[size], high)) - continue; - - add_to_free_list(&page[size], zone, high, migratetype, false); - set_buddy_order(&page[size], high); - } -} - static void check_new_page_bad(struct page *page) { if (unlikely(page->flags & __PG_HWPOISON)) { --=20 2.39.2 From nobody Wed Dec 17 15:36:54 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 09F2AC77B75 for ; Tue, 18 Apr 2023 19:15:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232438AbjDRTO3 (ORCPT ); Tue, 18 Apr 2023 15:14:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57744 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232833AbjDRTOB (ORCPT ); Tue, 18 Apr 2023 15:14:01 -0400 Received: from mail-qv1-xf2d.google.com (mail-qv1-xf2d.google.com [IPv6:2607:f8b0:4864:20::f2d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 959679018 for ; Tue, 18 Apr 2023 12:13:31 -0700 (PDT) Received: by mail-qv1-xf2d.google.com with SMTP id oj8so1343059qvb.11 for ; Tue, 18 Apr 2023 12:13:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20221208.gappssmtp.com; s=20221208; t=1681845210; x=1684437210; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=W40OmA9FJpZ9KM8b83NoKn4oLbbGXwdSocdEiJSdSOk=; b=wn0fopvGsbR3i6PvjhCvPLQ1N5JWQrSjYBsgbrLuF6738z7mGxzLK6h6jbm6MmLsmJ pF8n/O/stCPiQEof58dQGBFCvd4zD48BHZayas7tdYm+iNpqPr5K40ekJd/L+fqSbnWx dF76tgKVdT587CaFRK8QZjlJ0JQitDXtdsiSxO7LH78byqCngXvJYUfGQXVnbLa3i4e/ Kwi9t5fN+cwPv3JGfkRVdEAeRey7ve3x28vGryxxL8HGqMYxVgzfDZIdBf6OLivjGG2u 1lZUigMFLGf6JwZyC57zeYywVprLLuC3qi6qYNpVO4TuaZDQqSJWnTgv3OsLRhreN2Jg PPfg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1681845210; x=1684437210; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=W40OmA9FJpZ9KM8b83NoKn4oLbbGXwdSocdEiJSdSOk=; b=XdLX64PmwEyHaW4xdnrksQtUAIkUFjb94ZYXuos9Au341K5prLWSYQWyFO44J58llf Al/uBIo+y/WfhmjO6Tux4vNk2w0Nlz1KTDjPvBTDHmhWobti+V+8EUS+xw68QDJZlVvF TmmQsbWnKiwNvydldf2O1SY43VtMcK8El6a82ct9S7h6Fkh1TxmlKeq4TTlFoqVgn9wV KIdyIsb6a4bI6qxPlhPgKha5MoUSEO7qmdXVWj1w6kzIFHK+yLNdZ3hIcwscfK/BUHI5 fr3gmJM50oxm1BlEc4+Vwscm0IHeWnSz8k+vYde7NaF8nrTYrrAd0ht9jVfalUQAAyS+ hMow== X-Gm-Message-State: AAQBX9dcgaP8GDkmJ6vJ5/TO6Y3SG1OVC3TdY6B7DNMS8en/ZWlt2Vs4 I/MVlbKiiXMK7oTlp1C7jKsqUA== X-Google-Smtp-Source: AKy350ZdXxpY1FsQYWCci5zoHDWqXfr/R4sK4OR7K8k8+oSMxZVw0C1MOwCqOi9gSXpDKZHblUbJiQ== X-Received: by 2002:ad4:4ea4:0:b0:5ea:6a2a:140a with SMTP id ed4-20020ad44ea4000000b005ea6a2a140amr26931362qvb.16.1681845210696; Tue, 18 
Apr 2023 12:13:30 -0700 (PDT) Received: from localhost ([2620:10d:c091:400::5:e646]) by smtp.gmail.com with ESMTPSA id r1-20020ac87941000000b003df7d7bbc8csm4249673qtt.75.2023.04.18.12.13.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 18 Apr 2023 12:13:30 -0700 (PDT) From: Johannes Weiner To: linux-mm@kvack.org Cc: Kaiyang Zhao , Mel Gorman , Vlastimil Babka , David Rientjes , linux-kernel@vger.kernel.org, kernel-team@fb.com Subject: [RFC PATCH 10/26] mm: page_alloc: allow compaction capturing from larger blocks Date: Tue, 18 Apr 2023 15:12:57 -0400 Message-Id: <20230418191313.268131-11-hannes@cmpxchg.org> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230418191313.268131-1-hannes@cmpxchg.org> References: <20230418191313.268131-1-hannes@cmpxchg.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Currently, capturing only works on matching orders and matching migratetypes. However, if capturing is initially skipped on the migratetype, it's possible that merging continues up to a full pageblock, in which case the migratetype is up for grabs again. Allow capturing to grab smaller chunks from claimed pageblocks, and expand the remainder of the block back onto the freelists. Signed-off-by: Johannes Weiner --- mm/page_alloc.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index cd86f80d7bbe..5ebfcf18537b 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1067,10 +1067,10 @@ static inline struct capture_control *task_capc(str= uct zone *zone) } =20 static inline bool -compaction_capture(struct capture_control *capc, struct page *page, - int order, int migratetype) +compaction_capture(struct zone *zone, struct page *page, int order, + int migratetype, struct capture_control *capc) { - if (!capc || order !=3D capc->cc->order) + if (!capc || order < capc->cc->order) return false; =20 /* Do not accidentally pollute CMA or isolated regions*/ @@ -1092,6 +1092,9 @@ compaction_capture(struct capture_control *capc, stru= ct page *page, return false; } =20 + if (order > capc->cc->order) + expand(zone, page, capc->cc->order, order, migratetype); + capc->page =3D page; return true; } @@ -1103,8 +1106,8 @@ static inline struct capture_control *task_capc(struc= t zone *zone) } =20 static inline bool -compaction_capture(struct capture_control *capc, struct page *page, - int order, int migratetype) +compaction_capture(struct zone *zone, struct page *page, int order, + int migratetype, struct capture_control *capc) { return false; } @@ -1180,7 +1183,7 @@ static inline void __free_one_page(struct page *page, while (order < MAX_ORDER - 1) { int buddy_mt; =20 - if (compaction_capture(capc, page, order, migratetype)) + if (compaction_capture(zone, page, order, migratetype, capc)) return; =20 buddy =3D find_buddy_page_pfn(page, pfn, order, &buddy_pfn); --=20 2.39.2 From nobody Wed Dec 17 15:36:54 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1A20CC77B7D for ; Tue, 18 Apr 2023 19:15:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232649AbjDRTOe (ORCPT ); Tue, 18 Apr 2023 15:14:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57802 "EHLO 
lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232844AbjDRTOC (ORCPT ); Tue, 18 Apr 2023 15:14:02 -0400 Received: from mail-qv1-xf36.google.com (mail-qv1-xf36.google.com [IPv6:2607:f8b0:4864:20::f36]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DB2B3C147 for ; Tue, 18 Apr 2023 12:13:32 -0700 (PDT) Received: by mail-qv1-xf36.google.com with SMTP id js7so10389891qvb.5 for ; Tue, 18 Apr 2023 12:13:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20221208.gappssmtp.com; s=20221208; t=1681845212; x=1684437212; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=5cDejj0JwuwHnUJP/oEtDQ+Dx04Y5ly69JDrx/A+L3g=; b=CMPWhKkMWeLbM1Rq+FTNlaR0dSWXr97zWbAFsGnhIYGtP+CzQS0HVHu9IoQvtoACE8 mhYIBqbKGjtBs+LtkNOHMbMHPLnpfnolTnLtMYSphbh3QM/YHVg9xGMX5JifyMhJRDZN RvXl9lXXUKC1e1qgby2qHSWtJc2hSDPS51QzKdNqBwfuWtGPigSB5OT3E52N3lqpNtBI r4M586z/x6k8gKidXuSXIy+fp2eMvjQvShdWfKbhEZC/kyJxnvrBCpaNZNxoThlYVR5D o6a1X3kolTsoZyqQip7R4cFUcH4uGzvAAHQyusAJPnGmI3LEgBvst3Dp6eXqmeXOlajY VOcg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1681845212; x=1684437212; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=5cDejj0JwuwHnUJP/oEtDQ+Dx04Y5ly69JDrx/A+L3g=; b=O0oHWUg91uV9IZq/Ab7O6hbVAqnnhd+iy/FN4SV857gcU9eyDUrGwRkAZW5FanZMO0 ioiXWVFA9eOtEAHUd0zDcN57VdC/M10awR+QU7c6+zEt+0yE22pTj/JPS7k7F59sNk83 ocFyG6oqlQYo0aBEkTg9AqnPPTgS3vjuxxLTzN9IggseETy7ZXxkLZ338tKIc/+1vo8B HxDU00YOZ4vQh2wHqos9VsR0b9XekuBFGhuIh6VqdHpt1/METTTowS/SNpsKj3F1slKE lN8MVckmpTkcShxFy6VTWIv/SqU3dg9BEZLhLPAk2TKW6KyLTG6NPRPTz8ObUO3V61Jj Gtwg== X-Gm-Message-State: AAQBX9cEpFCMqNOJ9Qsz33AdZgHXqoJrf/SyHmbi6XeZTDjhx1QZkPbE yDfXQ1ZqlU16/+Zg+LcRx3V62g== X-Google-Smtp-Source: AKy350aMgOd3m7oKqMvaJHVobJC6wjWKHn3Vg6kPYlU6Q91voRGhTKGXzatfaK4cnRY78f9Dp31ahQ== X-Received: by 2002:a05:6214:e81:b0:5ee:b788:2f48 with SMTP id hf1-20020a0562140e8100b005eeb7882f48mr26312276qvb.31.1681845211935; Tue, 18 Apr 2023 12:13:31 -0700 (PDT) Received: from localhost ([2620:10d:c091:400::5:e646]) by smtp.gmail.com with ESMTPSA id i2-20020a0cf942000000b005ef529dc39esm3619309qvo.108.2023.04.18.12.13.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 18 Apr 2023 12:13:31 -0700 (PDT) From: Johannes Weiner To: linux-mm@kvack.org Cc: Kaiyang Zhao , Mel Gorman , Vlastimil Babka , David Rientjes , linux-kernel@vger.kernel.org, kernel-team@fb.com Subject: [RFC PATCH 11/26] mm: page_alloc: introduce MIGRATE_FREE Date: Tue, 18 Apr 2023 15:12:58 -0400 Message-Id: <20230418191313.268131-12-hannes@cmpxchg.org> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230418191313.268131-1-hannes@cmpxchg.org> References: <20230418191313.268131-1-hannes@cmpxchg.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" To cut down on type mixing, put empty pageblocks on separate freelists and make them the first fallback preference before stealing space from incompatible blocks. The neutral block designation will also be handy in subsequent patches that: simplify compaction; add per-mt freelist counts and make compaction_suitable() more precise; and ultimately make pageblocks the basis of free memory management. 
Signed-off-by: Johannes Weiner --- include/linux/mmzone.h | 3 +- mm/memory_hotplug.c | 4 +-- mm/page_alloc.c | 63 +++++++++++++++++++++++++++++++----------- 3 files changed, 51 insertions(+), 19 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 53e55882a4e7..20542e5a0a43 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -45,6 +45,7 @@ enum migratetype { MIGRATE_RECLAIMABLE, MIGRATE_PCPTYPES, /* the number of types on the pcp lists */ MIGRATE_HIGHATOMIC =3D MIGRATE_PCPTYPES, + MIGRATE_FREE, #ifdef CONFIG_CMA /* * MIGRATE_CMA migration type is designed to mimic the way @@ -88,7 +89,7 @@ static inline bool is_migrate_movable(int mt) */ static inline bool migratetype_is_mergeable(int mt) { - return mt < MIGRATE_PCPTYPES; + return mt < MIGRATE_PCPTYPES || mt =3D=3D MIGRATE_FREE; } =20 #define for_each_migratetype_order(order, type) \ diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index fd40f7e9f176..d7b9f0e70b58 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1129,7 +1129,7 @@ int __ref online_pages(unsigned long pfn, unsigned lo= ng nr_pages, build_all_zonelists(NULL); =20 /* Basic onlining is complete, allow allocation of onlined pages. */ - undo_isolate_page_range(pfn, pfn + nr_pages, MIGRATE_MOVABLE); + undo_isolate_page_range(pfn, pfn + nr_pages, MIGRATE_FREE); =20 /* * Freshly onlined pages aren't shuffled (e.g., all pages are placed to @@ -1951,7 +1951,7 @@ int __ref offline_pages(unsigned long start_pfn, unsi= gned long nr_pages, =20 failed_removal_isolated: /* pushback to free area */ - undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE); + undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_FREE); memory_notify(MEM_CANCEL_OFFLINE, &arg); failed_removal_pcplists_disabled: lru_cache_enable(); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 5ebfcf18537b..44da23625f51 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -380,6 +380,7 @@ const char * const migratetype_names[MIGRATE_TYPES] =3D= { "Movable", "Reclaimable", "HighAtomic", + "Free", #ifdef CONFIG_CMA "CMA", #endif @@ -1222,6 +1223,13 @@ static inline void __free_one_page(struct page *page, done_merging: set_buddy_order(page, order); =20 + /* If we freed one or normal page blocks, mark them free. */ + if (unlikely(order >=3D pageblock_order && + migratetype_is_mergeable(migratetype))) { + change_pageblock_range(page, order, MIGRATE_FREE); + migratetype =3D MIGRATE_FREE; + } + if (fpi_flags & FPI_TO_TAIL) to_tail =3D true; else if (is_shuffle_order(order)) @@ -1961,14 +1969,14 @@ static void __init deferred_free_range(unsigned lon= g pfn, =20 /* Free a large naturally-aligned chunk if possible */ if (nr_pages =3D=3D pageblock_nr_pages && pageblock_aligned(pfn)) { - set_pageblock_migratetype(page, MIGRATE_MOVABLE); + set_pageblock_migratetype(page, MIGRATE_FREE); __free_pages_core(page, pageblock_order); return; } =20 for (i =3D 0; i < nr_pages; i++, page++, pfn++) { if (pageblock_aligned(pfn)) - set_pageblock_migratetype(page, MIGRATE_MOVABLE); + set_pageblock_migratetype(page, MIGRATE_FREE); __free_pages_core(page, 0); } } @@ -2612,10 +2620,10 @@ struct page *__rmqueue_smallest(struct zone *zone, = unsigned int order, * * The other migratetypes do not have fallbacks. 
*/ -static int fallbacks[MIGRATE_TYPES][3] =3D { - [MIGRATE_UNMOVABLE] =3D { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRA= TE_TYPES }, - [MIGRATE_MOVABLE] =3D { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRA= TE_TYPES }, - [MIGRATE_RECLAIMABLE] =3D { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRA= TE_TYPES }, +static int fallbacks[MIGRATE_TYPES][4] =3D { + [MIGRATE_UNMOVABLE] =3D { MIGRATE_FREE, MIGRATE_RECLAIMABLE, MIGRATE_MO= VABLE, MIGRATE_TYPES }, + [MIGRATE_MOVABLE] =3D { MIGRATE_FREE, MIGRATE_RECLAIMABLE, MIGRATE_UN= MOVABLE, MIGRATE_TYPES }, + [MIGRATE_RECLAIMABLE] =3D { MIGRATE_FREE, MIGRATE_UNMOVABLE, MIGRATE_MO= VABLE, MIGRATE_TYPES }, }; =20 #ifdef CONFIG_CMA @@ -2705,8 +2713,13 @@ int move_freepages_block(struct zone *zone, struct p= age *page, * is worse than movable allocations stealing from unmovable and reclaimab= le * pageblocks. */ -static bool can_steal_fallback(unsigned int order, int start_mt) +static bool can_steal_fallback(unsigned int order, int start_mt, + int fallback_mt) { + /* The first allocation in a free block *must* claim it. */ + if (fallback_mt =3D=3D MIGRATE_FREE) + return true; + /* * Leaving this order check is intended, although there is * relaxed order check in next check. The reason is that @@ -2808,6 +2821,21 @@ static void steal_suitable_fallback(struct zone *zon= e, struct page *page, free_pages =3D move_freepages_block(zone, page, old_block_type, start_type, &movable_pages); =20 + /* + * If we fell back into a free block, claim the whole thing + */ + if (old_block_type =3D=3D MIGRATE_FREE) { + set_pageblock_migratetype(page, start_type); + if (!free_pages) { + /* + * This can leave some non-FREE pages on the + * FREE list. Future fallbacks will get them. + */ + goto single_page; + } + return; + } + /* * Determine how many pages are compatible with our allocation. * For movable allocation, it's the number of movable pages which @@ -2873,7 +2901,7 @@ int find_suitable_fallback(struct free_area *area, un= signed int order, if (free_area_empty(area, fallback_mt)) continue; =20 - if (can_steal_fallback(order, migratetype)) + if (can_steal_fallback(order, migratetype, fallback_mt)) *can_steal =3D true; =20 if (!only_stealable) @@ -3485,7 +3513,7 @@ void free_unref_page(struct page *page, unsigned int = order) */ migratetype =3D get_pcppage_migratetype(page); if (unlikely(migratetype >=3D MIGRATE_PCPTYPES)) { - if (unlikely(is_migrate_isolate(migratetype))) { + if (unlikely(is_migrate_isolate(migratetype) || migratetype =3D=3D MIGRA= TE_FREE)) { free_one_page(page_zone(page), page, pfn, order, migratetype, FPI_NONE); return; } @@ -3529,7 +3557,7 @@ void free_unref_page_list(struct list_head *list) * comment in free_unref_page. 
*/ migratetype =3D get_pcppage_migratetype(page); - if (unlikely(is_migrate_isolate(migratetype))) { + if (unlikely(is_migrate_isolate(migratetype) || migratetype =3D=3D MIGRA= TE_FREE)) { list_del(&page->lru); free_one_page(page_zone(page), page, pfn, 0, migratetype, FPI_NONE); continue; @@ -3632,10 +3660,10 @@ int __isolate_free_page(struct page *page, unsigned= int order) del_page_from_free_list(page, zone, order, mt); =20 /* - * Set the pageblock if the isolated page is at least half of a - * pageblock + * Set the pageblock if the isolated page is from a free block + * or at least half of a pageblock */ - if (order >=3D pageblock_order - 1) { + if (mt =3D=3D MIGRATE_FREE || order >=3D pageblock_order - 1) { struct page *endpage =3D page + (1 << order) - 1; for (; page < endpage; page +=3D pageblock_nr_pages) { int mt =3D get_pageblock_migratetype(page); @@ -4020,6 +4048,9 @@ bool __zone_watermark_ok(struct zone *z, unsigned int= order, unsigned long mark, if (!area->nr_free) continue; =20 + if (!free_area_empty(area, MIGRATE_FREE)) + return true; + for (mt =3D 0; mt < MIGRATE_PCPTYPES; mt++) { if (!free_area_empty(area, mt)) return true; @@ -6081,6 +6112,7 @@ static void show_migration_types(unsigned char type) [MIGRATE_MOVABLE] =3D 'M', [MIGRATE_RECLAIMABLE] =3D 'E', [MIGRATE_HIGHATOMIC] =3D 'H', + [MIGRATE_FREE] =3D 'F', #ifdef CONFIG_CMA [MIGRATE_CMA] =3D 'C', #endif @@ -7025,7 +7057,7 @@ static void __init memmap_init_zone_range(struct zone= *zone, return; =20 memmap_init_range(end_pfn - start_pfn, nid, zone_id, start_pfn, - zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE); + zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_FREE); =20 if (*hole_pfn < start_pfn) init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid); @@ -9422,8 +9454,7 @@ static int __alloc_contig_pages(unsigned long start_p= fn, { unsigned long end_pfn =3D start_pfn + nr_pages; =20 - return alloc_contig_range(start_pfn, end_pfn, MIGRATE_MOVABLE, - gfp_mask); + return alloc_contig_range(start_pfn, end_pfn, MIGRATE_FREE, gfp_mask); } =20 static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn, --=20 2.39.2 From nobody Wed Dec 17 15:36:54 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2BC9EC77B78 for ; Tue, 18 Apr 2023 19:15:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232832AbjDRTOi (ORCPT ); Tue, 18 Apr 2023 15:14:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57756 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232852AbjDRTOD (ORCPT ); Tue, 18 Apr 2023 15:14:03 -0400 Received: from mail-qt1-x831.google.com (mail-qt1-x831.google.com [IPv6:2607:f8b0:4864:20::831]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 04C291027A for ; Tue, 18 Apr 2023 12:13:33 -0700 (PDT) Received: by mail-qt1-x831.google.com with SMTP id a23so25544231qtj.8 for ; Tue, 18 Apr 2023 12:13:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20221208.gappssmtp.com; s=20221208; t=1681845213; x=1684437213; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=VKzxWkw9Wp7aX3AQonPLuV4v1F6kfO1MbtWYkZecR34=; b=fnKaIHJG7NChil0OqR30RHO2yD9ilNfbfzz+l5UrDoOhQMcIjn9Z7u0gwD85iPVAp0 
tffEBcyMY2FouSlHJNy6srGCvt5E9IDj4dJBVKRE6XA/ar/BC0EWMcwavJ9D8HNRuvlG Qdrqep2d7P+SpC95Rfj9aYd53ubAphhsP8E2gw2I9iDczVYbpjshQr8hBgBoVIb0a4N3 k6mQGeB1gv6RgDAMcI2tUgO6k3f+/MPwwLRAHp5lTK9eH/oa31Gg7tq6UC02MZCEJxJ0 CZVghku4DrrkzUssr6FthK28bwb1ynE8SrIKpU9sP12KWUg87rBmxhFx+GjHDJFIysNI WwlQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1681845213; x=1684437213; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=VKzxWkw9Wp7aX3AQonPLuV4v1F6kfO1MbtWYkZecR34=; b=VvieWOPiBSKHuSrqB0h+IoVE09JZVQ4+tEOhFQ4L4beIchTpQRb3eWzni78limmrvk Cn6lDbjzFFgTOoKupv22kXB9sOn7HOz6RkLduSs0oLF2T5bLT/d3EIxX0wP4f2qEs0JG 4fR+3wQKx8uvoZTu+hkXWsVfttjg2hBpBrZbMr+HpdTLFsLaY28koNAvxiJ8T8DQy0Zk 76OkTT6TWjORbTrFpuUpTN50wy1s8Wa2BpxpCb0BMsIQNxzJwwB5mxjjSfe9PqVoULHy do9xpKZ9I2uuK4ditaXOy6LNML3RvYfnWg9/88LXJhjqx9mysEjwJRDJEr/WrC7isBgt og8g== X-Gm-Message-State: AAQBX9cq8xtWc6+bohukx+Xwt2rLTYhmNIgJiIdakbwt7/aQGTF594yA D4xgWeVmf1nKwxc7J9bd4u59kA== X-Google-Smtp-Source: AKy350YRTIp6emLy7bR9ZUZa/RhO/Obyaw6AxSNUEm8mzOLj17NAjGHuS3AbcjQstxdD3Htg+F7UQg== X-Received: by 2002:a05:622a:4d1:b0:3ef:3d3f:17a2 with SMTP id q17-20020a05622a04d100b003ef3d3f17a2mr1327574qtx.68.1681845213072; Tue, 18 Apr 2023 12:13:33 -0700 (PDT) Received: from localhost ([2620:10d:c091:400::5:e646]) by smtp.gmail.com with ESMTPSA id y30-20020a05620a09de00b0074683c45f6csm4141283qky.1.2023.04.18.12.13.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 18 Apr 2023 12:13:32 -0700 (PDT) From: Johannes Weiner To: linux-mm@kvack.org Cc: Kaiyang Zhao , Mel Gorman , Vlastimil Babka , David Rientjes , linux-kernel@vger.kernel.org, kernel-team@fb.com Subject: [RFC PATCH 12/26] mm: page_alloc: per-migratetype free counts Date: Tue, 18 Apr 2023 15:12:59 -0400 Message-Id: <20230418191313.268131-13-hannes@cmpxchg.org> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230418191313.268131-1-hannes@cmpxchg.org> References: <20230418191313.268131-1-hannes@cmpxchg.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Increase visibility into the defragmentation behavior by tracking and reporting per-migratetype free counters. Subsequent patches will also use those counters to make more targeted reclaim/compaction decisions. 
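
For reference, here is a minimal userspace sketch (an illustration, not part of the patch) for watching the new counters; it only assumes that the zone_stat_item additions below are exported through /proc/vmstat under the names added to vmstat_text[]:

#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("/proc/vmstat", "r");
	char line[256];

	if (!f) {
		perror("/proc/vmstat");
		return 1;
	}
	/* Prints the existing nr_free_pages plus the new per-type counters:
	 * nr_free_unmovable, nr_free_movable, nr_free_reclaimable,
	 * nr_free_highatomic and nr_free_free. */
	while (fgets(line, sizeof(line), f))
		if (!strncmp(line, "nr_free_", 8))
			fputs(line, stdout);
	fclose(f);
	return 0;
}

Sampling these while running a fragmentation-heavy workload gives a quick sense of how much free memory sits on the neutral free list versus the typed lists, which is the visibility this patch is after.
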
Signed-off-by: Johannes Weiner --- include/linux/mmzone.h | 5 +++++ mm/page_alloc.c | 29 +++++++++++++++++++++++++---- mm/vmstat.c | 5 +++++ 3 files changed, 35 insertions(+), 4 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 20542e5a0a43..d1083ab81998 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -139,6 +139,11 @@ enum numa_stat_item { enum zone_stat_item { /* First 128 byte cacheline (assuming 64 bit words) */ NR_FREE_PAGES, + NR_FREE_UNMOVABLE, + NR_FREE_MOVABLE, + NR_FREE_RECLAIMABLE, + NR_FREE_HIGHATOMIC, + NR_FREE_FREE, NR_ZONE_LRU_BASE, /* Used only for compaction and reclaim retry */ NR_ZONE_INACTIVE_ANON =3D NR_ZONE_LRU_BASE, NR_ZONE_ACTIVE_ANON, diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 44da23625f51..5f2a0037bed1 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -959,8 +959,12 @@ static inline void account_freepages(struct page *page= , struct zone *zone, =20 __mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages); =20 - if (is_migrate_cma(migratetype)) + if (migratetype <=3D MIGRATE_FREE) + __mod_zone_page_state(zone, NR_FREE_UNMOVABLE + migratetype, nr_pages); + else if (is_migrate_cma(migratetype)) __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, nr_pages); + else + VM_WARN_ONCE(1, "unexpected migratetype %d\n", migratetype); } =20 /* Used for pages not on another list */ @@ -6175,7 +6179,9 @@ void __show_free_areas(unsigned int filter, nodemask_= t *nodemask, int max_zone_i " mapped:%lu shmem:%lu pagetables:%lu\n" " sec_pagetables:%lu bounce:%lu\n" " kernel_misc_reclaimable:%lu\n" - " free:%lu free_pcp:%lu free_cma:%lu\n", + " free:%lu free_unmovable:%lu free_movable:%lu\n" + " free_reclaimable:%lu free_highatomic:%lu free_free:%lu\n" + " free_cma:%lu free_pcp:%lu\n", global_node_page_state(NR_ACTIVE_ANON), global_node_page_state(NR_INACTIVE_ANON), global_node_page_state(NR_ISOLATED_ANON), @@ -6194,8 +6200,13 @@ void __show_free_areas(unsigned int filter, nodemask= _t *nodemask, int max_zone_i global_zone_page_state(NR_BOUNCE), global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE), global_zone_page_state(NR_FREE_PAGES), - free_pcp, - global_zone_page_state(NR_FREE_CMA_PAGES)); + global_zone_page_state(NR_FREE_UNMOVABLE), + global_zone_page_state(NR_FREE_MOVABLE), + global_zone_page_state(NR_FREE_RECLAIMABLE), + global_zone_page_state(NR_FREE_HIGHATOMIC), + global_zone_page_state(NR_FREE_FREE), + global_zone_page_state(NR_FREE_CMA_PAGES), + free_pcp); =20 for_each_online_pgdat(pgdat) { if (show_mem_node_skip(filter, pgdat->node_id, nodemask)) @@ -6273,6 +6284,11 @@ void __show_free_areas(unsigned int filter, nodemask= _t *nodemask, int max_zone_i printk(KERN_CONT "%s" " free:%lukB" + " free_unmovable:%lukB" + " free_movable:%lukB" + " free_reclaimable:%lukB" + " free_highatomic:%lukB" + " free_free:%lukB" " boost:%lukB" " min:%lukB" " low:%lukB" @@ -6294,6 +6310,11 @@ void __show_free_areas(unsigned int filter, nodemask= _t *nodemask, int max_zone_i "\n", zone->name, K(zone_page_state(zone, NR_FREE_PAGES)), + K(zone_page_state(zone, NR_FREE_UNMOVABLE)), + K(zone_page_state(zone, NR_FREE_MOVABLE)), + K(zone_page_state(zone, NR_FREE_RECLAIMABLE)), + K(zone_page_state(zone, NR_FREE_HIGHATOMIC)), + K(zone_page_state(zone, NR_FREE_FREE)), K(zone->watermark_boost), K(min_wmark_pages(zone)), K(low_wmark_pages(zone)), diff --git a/mm/vmstat.c b/mm/vmstat.c index 1ea6a5ce1c41..c8b8e6e259da 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1168,6 +1168,11 @@ int fragmentation_index(struct zone *zone, unsigned = int order) const 
char * const vmstat_text[] =3D { /* enum zone_stat_item counters */ "nr_free_pages", + "nr_free_unmovable", + "nr_free_movable", + "nr_free_reclaimable", + "nr_free_highatomic", + "nr_free_free", "nr_zone_inactive_anon", "nr_zone_active_anon", "nr_zone_inactive_file", --=20 2.39.2 From nobody Wed Dec 17 15:36:54 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 59193C77B7E for ; Tue, 18 Apr 2023 19:15:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232936AbjDRTOu (ORCPT ); Tue, 18 Apr 2023 15:14:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57786 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232861AbjDRTOD (ORCPT ); Tue, 18 Apr 2023 15:14:03 -0400 Received: from mail-qt1-x82a.google.com (mail-qt1-x82a.google.com [IPv6:2607:f8b0:4864:20::82a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3F8CFBB8A for ; Tue, 18 Apr 2023 12:13:35 -0700 (PDT) Received: by mail-qt1-x82a.google.com with SMTP id gb12so27467599qtb.6 for ; Tue, 18 Apr 2023 12:13:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20221208.gappssmtp.com; s=20221208; t=1681845214; x=1684437214; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=fvrO/1+4yglcd2Jvs04tq1hDUyEyh2m1Sh3LN7l5OG8=; b=YCxaymFjTOB/xEVxOKn/iRYET7rdfjWPu3AZDYJ5KtaGsEjeOO2z8QOjdpBzKiI+fh zq/KjeYnjCaxkKVGYNcD9mghxgZwPN5syoeQUj2HsktOmS7F69bhZq+UdMOlukwdy6mr zhoMS/Tx7lhyB4yQTexZo6hq18nMIH/j4LR8HxWWTAFstqWcBJC/+aaBnM/4R2GA77/e eaZxJoD9eYoQfHGUVsharCxWu8P9awFx8F/6/oFYKGnqiIiwRsnchL6XPeWp9ufx15vN it9IVwB25jyUUFrrcgBYhumIpZbTgM/FEDmcZyOcdKC+L06MuZYxfBEyLL+xGpVRPQMZ aekw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1681845214; x=1684437214; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=fvrO/1+4yglcd2Jvs04tq1hDUyEyh2m1Sh3LN7l5OG8=; b=UO3EGx4YHvhTZAmSwdCeeEiwWKGwh+jd2ZP6iuYxKnaJF7DZcCGIOfqWujqkZoifcf CW9bszHuGILq/d3kT0uPHM+HBZxKPT7yY8BGxa0JHlWMjIViERoTBgMUREJLbRxti7Jk PWZ7cf6ZAIl1gnFtvxsJLGRk0W2QuJ7vAL1+7ZnHiVAQQt9BBXOguVa8iT3pDSgqGpnY xjhDF8IPtq2U6JFP7Z5BeUFDAPbKfibo8he8jnBzGmJeG4OY4rwoc7fQ7D7rR0oJKJbL u9dAdDpwdcIJbfqwQQrC/dpgo9RKxkZZMLbIDJFZDJ6f9oJSY/tnV81oqFBm2BMPhZBM S3rQ== X-Gm-Message-State: AAQBX9cJUUe77mI2MvHurnZ2HMH6H1J3sAogsV4X+HSYoXX1tnwusgki A3c5iTvrtYx4B3bge8111POKe7GXt7zAbcBLF8U= X-Google-Smtp-Source: AKy350b8I9apZCXgd3Fc463SUtxO97BghAjCQl+N88Qns+QuGdCqUDlXOuuktIWM1UP9lMqTVC8ihw== X-Received: by 2002:ac8:4e93:0:b0:3ef:3880:9db6 with SMTP id 19-20020ac84e93000000b003ef38809db6mr1674175qtp.6.1681845214206; Tue, 18 Apr 2023 12:13:34 -0700 (PDT) Received: from localhost ([2620:10d:c091:400::5:e646]) by smtp.gmail.com with ESMTPSA id f17-20020a05622a1a1100b003ef415f0184sm39541qtb.69.2023.04.18.12.13.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 18 Apr 2023 12:13:33 -0700 (PDT) From: Johannes Weiner To: linux-mm@kvack.org Cc: Kaiyang Zhao , Mel Gorman , Vlastimil Babka , David Rientjes , linux-kernel@vger.kernel.org, kernel-team@fb.com Subject: [RFC PATCH 13/26] mm: compaction: remove compaction result helpers Date: Tue, 18 Apr 2023 15:13:00 -0400 
Message-Id: <20230418191313.268131-14-hannes@cmpxchg.org> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230418191313.268131-1-hannes@cmpxchg.org> References: <20230418191313.268131-1-hannes@cmpxchg.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" I found myself repeatedly looking up the implementation of these helpers while working on the code, which suggests they are not a helpful abstraction. Inline them. Signed-off-by: Johannes Weiner --- include/linux/compaction.h | 92 ---------------------------------- include/trace/events/mmflags.h | 4 +- mm/page_alloc.c | 30 ++++++----- 3 files changed, 19 insertions(+), 107 deletions(-) diff --git a/include/linux/compaction.h b/include/linux/compaction.h index 06eeb2e25833..7635e220215a 100644 --- a/include/linux/compaction.h +++ b/include/linux/compaction.h @@ -103,78 +103,6 @@ extern enum compact_result compaction_suitable(struct = zone *zone, int order, extern void compaction_defer_reset(struct zone *zone, int order, bool alloc_success); =20 -/* Compaction has made some progress and retrying makes sense */ -static inline bool compaction_made_progress(enum compact_result result) -{ - /* - * Even though this might sound confusing this in fact tells us - * that the compaction successfully isolated and migrated some - * pageblocks. - */ - if (result =3D=3D COMPACT_SUCCESS) - return true; - - return false; -} - -/* Compaction has failed and it doesn't make much sense to keep retrying. = */ -static inline bool compaction_failed(enum compact_result result) -{ - /* All zones were scanned completely and still not result. */ - if (result =3D=3D COMPACT_COMPLETE) - return true; - - return false; -} - -/* Compaction needs reclaim to be performed first, so it can continue. */ -static inline bool compaction_needs_reclaim(enum compact_result result) -{ - /* - * Compaction backed off due to watermark checks for order-0 - * so the regular reclaim has to try harder and reclaim something. - */ - if (result =3D=3D COMPACT_SKIPPED) - return true; - - return false; -} - -/* - * Compaction has backed off for some reason after doing some work or none - * at all. It might be throttling or lock contention. Retrying might be st= ill - * worthwhile, but with a higher priority if allowed. - */ -static inline bool compaction_withdrawn(enum compact_result result) -{ - /* - * If compaction is deferred for high-order allocations, it is - * because sync compaction recently failed. If this is the case - * and the caller requested a THP allocation, we do not want - * to heavily disrupt the system, so we fail the allocation - * instead of entering direct reclaim. - */ - if (result =3D=3D COMPACT_DEFERRED) - return true; - - /* - * If compaction in async mode encounters contention or blocks higher - * priority task we back off early rather than cause stalls. - */ - if (result =3D=3D COMPACT_CONTENDED) - return true; - - /* - * Page scanners have met but we haven't scanned full zones so this - * is a back off in fact. 
- */ - if (result =3D=3D COMPACT_PARTIAL_SKIPPED) - return true; - - return false; -} - - bool compaction_zonelist_suitable(struct alloc_context *ac, int order, int alloc_flags); =20 @@ -193,26 +121,6 @@ static inline enum compact_result compaction_suitable(= struct zone *zone, int ord return COMPACT_SKIPPED; } =20 -static inline bool compaction_made_progress(enum compact_result result) -{ - return false; -} - -static inline bool compaction_failed(enum compact_result result) -{ - return false; -} - -static inline bool compaction_needs_reclaim(enum compact_result result) -{ - return false; -} - -static inline bool compaction_withdrawn(enum compact_result result) -{ - return true; -} - static inline void kcompactd_run(int nid) { } diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h index 412b5a46374c..47bfeca4cf02 100644 --- a/include/trace/events/mmflags.h +++ b/include/trace/events/mmflags.h @@ -222,8 +222,8 @@ IF_HAVE_VM_SOFTDIRTY(VM_SOFTDIRTY, "softdirty" ) \ #define compact_result_to_feedback(result) \ ({ \ enum compact_result __result =3D result; \ - (compaction_failed(__result)) ? COMPACTION_FAILED : \ - (compaction_withdrawn(__result)) ? COMPACTION_WITHDRAWN : COMPACTION_PRO= GRESS; \ + (__result =3D=3D COMPACT_COMPLETE) ? COMPACTION_FAILED : \ + (__result =3D=3D COMPACT_SUCCESS) ? COMPACTION_PROGRESS : COMPACTION_WIT= HDRAWN; \ }) =20 #define COMPACTION_FEEDBACK \ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 5f2a0037bed1..c3b7dc479936 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -4620,35 +4620,39 @@ should_compact_retry(struct alloc_context *ac, int = order, int alloc_flags, if (fatal_signal_pending(current)) return false; =20 - if (compaction_made_progress(compact_result)) + /* + * Compaction managed to coalesce some page blocks, but the + * allocation failed presumably due to a race. Retry some. + */ + if (compact_result =3D=3D COMPACT_SUCCESS) (*compaction_retries)++; =20 /* - * compaction considers all the zone as desperately out of memory - * so it doesn't really make much sense to retry except when the + * All zones were scanned completely and still no result. It + * doesn't really make much sense to retry except when the * failure could be caused by insufficient priority */ - if (compaction_failed(compact_result)) + if (compact_result =3D=3D COMPACT_COMPLETE) goto check_priority; =20 /* - * compaction was skipped because there are not enough order-0 pages - * to work with, so we retry only if it looks like reclaim can help. + * Compaction was skipped due to a lack of free order-0 + * migration targets. Continue if reclaim can help. */ - if (compaction_needs_reclaim(compact_result)) { + if (compact_result =3D=3D COMPACT_SKIPPED) { ret =3D compaction_zonelist_suitable(ac, order, alloc_flags); goto out; } =20 /* - * make sure the compaction wasn't deferred or didn't bail out early - * due to locks contention before we declare that we should give up. - * But the next retry should use a higher priority if allowed, so - * we don't just keep bailing out endlessly. + * If compaction backed due to being deferred, due to + * contended locks in async mode, or due to scanners meeting + * after a partial scan, retry with increased priority. 
*/ - if (compaction_withdrawn(compact_result)) { + if (compact_result =3D=3D COMPACT_DEFERRED || + compact_result =3D=3D COMPACT_CONTENDED || + compact_result =3D=3D COMPACT_PARTIAL_SKIPPED) goto check_priority; - } =20 /* * !costly requests are much more important than __GFP_RETRY_MAYFAIL --=20 2.39.2 From nobody Wed Dec 17 15:36:54 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AA08EC7EE22 for ; Tue, 18 Apr 2023 19:15:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233081AbjDRTPC (ORCPT ); Tue, 18 Apr 2023 15:15:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57848 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232868AbjDRTOE (ORCPT ); Tue, 18 Apr 2023 15:14:04 -0400 Received: from mail-qv1-xf36.google.com (mail-qv1-xf36.google.com [IPv6:2607:f8b0:4864:20::f36]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 68CF68A60 for ; Tue, 18 Apr 2023 12:13:36 -0700 (PDT) Received: by mail-qv1-xf36.google.com with SMTP id op30so18259176qvb.3 for ; Tue, 18 Apr 2023 12:13:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20221208.gappssmtp.com; s=20221208; t=1681845215; x=1684437215; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=l59ZyyVgyCUBMvUoPArfOg/Tu2nHrtxfgxoKhkDEsRQ=; b=fHLEDZEsV9N0D0X0rv9luI4kf2Ny4hu81LBmLjiQQISEpJJ+RqngAVpAPnFoR0YJg5 +rnu1yHL/o1ivzsyaceccwRhzNhhqiyoYCKe93qsF3ayw2qwII6swNfekI6XGKLg0WKN ufiP8FtZHi0gw8HoJMh10Ahc4nfREGT88vUp2AeAYI8oivqhu8YfXmH0mXPBZwkUumtM cdyBUXt7wmT0HtYc+n8SW7cn8uD60GO8c+eVN2+jqkKgf8Z6KnRU8bZkIL2jEokwqAx5 sPCj0+lVYfaoZN3UpGcQ5ZNRelLySP7T0KS9IIKSrs2ehaDuMkRxPwMV16n9lIi+UQCl Tevw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1681845215; x=1684437215; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=l59ZyyVgyCUBMvUoPArfOg/Tu2nHrtxfgxoKhkDEsRQ=; b=beB6hRg24HWRMEAEKTQX7OIcUtV69Q05qNXP9B3mcC13LHk0eV/03q6bwYW7FXPmhz 4c2UL3ZNJpP3Anbx8YR8mXgizetgeSO+C52W77ouSb+hF7puug7CMrjVLGbfKussik1d XFFlVnj6sJ1PIn3H0pRaTKFenWpR6JQ+3iOIm2DAcBvVp9HphfEMH2vx/uDDhIwVjVAN J/mBr8Uuab8x+XRsFgmqVzKHVF4+QLRz7w6j0OnhclweSG49i7gpUJhRiORCLhwZAthB UVVP+NU7EkadOBPZaObOe4Dm7fUcWsytkXWDg+QfiRNqrTNymaI4v03EAAQAXTNLx+WE N/Mw== X-Gm-Message-State: AAQBX9dobuq8Y90mFHFi/M7Gyd/1AeXxSkegUql3/JCme/bWD6avVCd0 obWJgZi3Ihyyc9waigKxdPg8tw== X-Google-Smtp-Source: AKy350Y+h1X4MZEW+S29kRDU7ycQ7OqCtKgHBzDa7byusJ+iYdu4NQRhrwkjbrpSqc3/yAnVuA2tCQ== X-Received: by 2002:a05:6214:1942:b0:5f1:6904:a2d6 with SMTP id q2-20020a056214194200b005f16904a2d6mr1035437qvk.51.1681845215347; Tue, 18 Apr 2023 12:13:35 -0700 (PDT) Received: from localhost ([2620:10d:c091:400::5:e646]) by smtp.gmail.com with ESMTPSA id k15-20020a05620a414f00b007463509f94asm4089576qko.55.2023.04.18.12.13.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 18 Apr 2023 12:13:35 -0700 (PDT) From: Johannes Weiner To: linux-mm@kvack.org Cc: Kaiyang Zhao , Mel Gorman , Vlastimil Babka , David Rientjes , linux-kernel@vger.kernel.org, kernel-team@fb.com Subject: [RFC PATCH 14/26] mm: compaction: simplify should_compact_retry() Date: Tue, 18 Apr 2023 
15:13:01 -0400 Message-Id: <20230418191313.268131-15-hannes@cmpxchg.org> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230418191313.268131-1-hannes@cmpxchg.org> References: <20230418191313.268131-1-hannes@cmpxchg.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" The different branches for retry are unnecessarily complicated. There is really only three outcomes: progress, skipped, failed. Also, the retry counter only applies to loops that made progress, move it there. Signed-off-by: Johannes Weiner --- mm/page_alloc.c | 60 +++++++++++++++++-------------------------------- 1 file changed, 20 insertions(+), 40 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index c3b7dc479936..18fa2bbba44b 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -4608,7 +4608,6 @@ should_compact_retry(struct alloc_context *ac, int or= der, int alloc_flags, enum compact_priority *compact_priority, int *compaction_retries) { - int max_retries =3D MAX_COMPACT_RETRIES; int min_priority; bool ret =3D false; int retries =3D *compaction_retries; @@ -4621,19 +4620,27 @@ should_compact_retry(struct alloc_context *ac, int = order, int alloc_flags, return false; =20 /* - * Compaction managed to coalesce some page blocks, but the - * allocation failed presumably due to a race. Retry some. + * Compaction coalesced some page blocks, but the allocation + * failed, presumably due to a race. Retry a few times. */ - if (compact_result =3D=3D COMPACT_SUCCESS) - (*compaction_retries)++; + if (compact_result =3D=3D COMPACT_SUCCESS) { + int max_retries =3D MAX_COMPACT_RETRIES; =20 - /* - * All zones were scanned completely and still no result. It - * doesn't really make much sense to retry except when the - * failure could be caused by insufficient priority - */ - if (compact_result =3D=3D COMPACT_COMPLETE) - goto check_priority; + /* + * !costly requests are much more important than + * __GFP_RETRY_MAYFAIL costly ones because they are de + * facto nofail and invoke OOM killer to move on while + * costly can fail and users are ready to cope with + * that. 1/4 retries is rather arbitrary but we would + * need much more detailed feedback from compaction to + * make a better decision. + */ + if (order > PAGE_ALLOC_COSTLY_ORDER) + max_retries /=3D 4; + + ret =3D ++(*compaction_retries) <=3D MAX_COMPACT_RETRIES; + goto out; + } =20 /* * Compaction was skipped due to a lack of free order-0 @@ -4645,35 +4652,8 @@ should_compact_retry(struct alloc_context *ac, int o= rder, int alloc_flags, } =20 /* - * If compaction backed due to being deferred, due to - * contended locks in async mode, or due to scanners meeting - * after a partial scan, retry with increased priority. - */ - if (compact_result =3D=3D COMPACT_DEFERRED || - compact_result =3D=3D COMPACT_CONTENDED || - compact_result =3D=3D COMPACT_PARTIAL_SKIPPED) - goto check_priority; - - /* - * !costly requests are much more important than __GFP_RETRY_MAYFAIL - * costly ones because they are de facto nofail and invoke OOM - * killer to move on while costly can fail and users are ready - * to cope with that. 1/4 retries is rather arbitrary but we - * would need much more detailed feedback from compaction to - * make a better decision. 
- */ - if (order > PAGE_ALLOC_COSTLY_ORDER) - max_retries /=3D 4; - if (*compaction_retries <=3D max_retries) { - ret =3D true; - goto out; - } - - /* - * Make sure there are attempts at the highest priority if we exhausted - * all retries or failed at the lower priorities. + * Compaction failed. Retry with increasing priority. */ -check_priority: min_priority =3D (order > PAGE_ALLOC_COSTLY_ORDER) ? MIN_COMPACT_COSTLY_PRIORITY : MIN_COMPACT_PRIORITY; =20 --=20 2.39.2 From nobody Wed Dec 17 15:36:54 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 78B28C77B75 for ; Tue, 18 Apr 2023 19:20:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232858AbjDRTOq (ORCPT ); Tue, 18 Apr 2023 15:14:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57838 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232865AbjDRTOE (ORCPT ); Tue, 18 Apr 2023 15:14:04 -0400 Received: from mail-qv1-xf2e.google.com (mail-qv1-xf2e.google.com [IPv6:2607:f8b0:4864:20::f2e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 55E4EAF26 for ; Tue, 18 Apr 2023 12:13:37 -0700 (PDT) Received: by mail-qv1-xf2e.google.com with SMTP id js7so10390143qvb.5 for ; Tue, 18 Apr 2023 12:13:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20221208.gappssmtp.com; s=20221208; t=1681845216; x=1684437216; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=oyoKJQkIW9YTYt09kObRpzKT5MnW6BvwQNuIxs7T4Tc=; b=aeYONsgPSzQQgWyb5FHu2pLc4enHXJiXdaQ/URnHTI4N+Nq62uPdJYSr8YJxsZo7DW gB6lifiemln0yXBoUpetZdAa+m3U7Murk1q9y6eEqAxekRgWZCiGw2kWjYInRjtT3eFT DkRD7r1ECPLst/vkbKvyzSRNiHJuj9OxC47RNcVmUl1WWTCQ+sn8Jx5NKLfVkdObtBW0 S8SXnhzF5QsXyMf5O/UpkE9MPOzstc3gDLc3FhYxPAPxbgDfog2ZjfNGJa31r+oOo2GA REeMAyuiQFEJZ21mYaqXnYQUFwEa8R+Jwrb338SotrhcoLFx+tzMMjRnzYViLbYZJo1o t/mQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1681845216; x=1684437216; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=oyoKJQkIW9YTYt09kObRpzKT5MnW6BvwQNuIxs7T4Tc=; b=gSxzYP80a+LKVPfxNmf+/4LwXi4BvH2v4EaT0q+vNBWgK+jt7DuqXvxGTktq+rl78A 8mzf0E7VNMAzGZ4gYfctYHrxlU/kxy1lVaniRkSTC6vuDGXf9JYcYbKB/cRWKy/IEpHY 53WBPhAzR9QPU6NmZycmYot5hNTVAi9kbKfiUDb3RE5spWbkm/30v0D65f9SFa6pgEXD Pm28fraKzJoHJYsWB9aLNHEa0lPgU9jJceJQ/RjY2C24yIB01y+jQw8c1gB2PrCWQtVr JGsrarAaJf8Z6ObrPeqgE2O6eus6pBZmPy8xFN3aR/hIgHXSEjP7JQlMGQbq2smwiC8z O+9g== X-Gm-Message-State: AAQBX9ffRI7ST9aEfQD4Pb7qAdmlTKPmeNUtQ1tmKGQEQltc15NR3VZK WCsMujkOPdgUbKYXy/lIqS21LBegX80+pH/k2G8= X-Google-Smtp-Source: AKy350awQ/PRfC/BhFkVKh85YOny0ld5DKS6bQHcNwzqu22Ols+1ZFQkDT9bbDkAlKwae5T6XGpjKg== X-Received: by 2002:ad4:5762:0:b0:5ef:739a:1c46 with SMTP id r2-20020ad45762000000b005ef739a1c46mr16977002qvx.1.1681845216529; Tue, 18 Apr 2023 12:13:36 -0700 (PDT) Received: from localhost ([2620:10d:c091:400::5:e646]) by smtp.gmail.com with ESMTPSA id kr24-20020a0562142b9800b005eee320b5d7sm3844283qvb.63.2023.04.18.12.13.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 18 Apr 2023 12:13:36 -0700 (PDT) From: Johannes Weiner To: linux-mm@kvack.org Cc: Kaiyang Zhao , Mel Gorman , Vlastimil 
Babka , David Rientjes , linux-kernel@vger.kernel.org, kernel-team@fb.com Subject: [RFC PATCH 15/26] mm: compaction: simplify free block check in suitable_migration_target() Date: Tue, 18 Apr 2023 15:13:02 -0400 Message-Id: <20230418191313.268131-16-hannes@cmpxchg.org> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230418191313.268131-1-hannes@cmpxchg.org> References: <20230418191313.268131-1-hannes@cmpxchg.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Free page blocks are now marked MIGRATE_FREE. Consult that directly. Signed-off-by: Johannes Weiner --- mm/compaction.c | 15 +++++---------- 1 file changed, 5 insertions(+), 10 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index a2280001eea3..b9eed0d43403 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1281,22 +1281,17 @@ static bool suitable_migration_source(struct compac= t_control *cc, static bool suitable_migration_target(struct compact_control *cc, struct page *page) { + int mt =3D get_pageblock_migratetype(page); + /* If the page is a large free page, then disallow migration */ - if (PageBuddy(page)) { - /* - * We are checking page_order without zone->lock taken. But - * the only small danger is that we skip a potentially suitable - * pageblock, so it's not worth to check order for valid range. - */ - if (buddy_order_unsafe(page) >=3D pageblock_order) - return false; - } + if (mt =3D=3D MIGRATE_FREE) + return false; =20 if (cc->ignore_block_suitable) return true; =20 /* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */ - if (is_migrate_movable(get_pageblock_migratetype(page))) + if (is_migrate_movable(mt)) return true; =20 /* Otherwise skip the block */ --=20 2.39.2 From nobody Wed Dec 17 15:36:54 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 42F19C77B7F for ; Tue, 18 Apr 2023 19:15:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232839AbjDRTOl (ORCPT ); Tue, 18 Apr 2023 15:14:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57792 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232867AbjDRTOE (ORCPT ); Tue, 18 Apr 2023 15:14:04 -0400 Received: from mail-qt1-x836.google.com (mail-qt1-x836.google.com [IPv6:2607:f8b0:4864:20::836]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7CF177AAC for ; Tue, 18 Apr 2023 12:13:38 -0700 (PDT) Received: by mail-qt1-x836.google.com with SMTP id u37so5543976qtc.10 for ; Tue, 18 Apr 2023 12:13:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20221208.gappssmtp.com; s=20221208; t=1681845217; x=1684437217; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=G1QAyddYnWJUKuKAIak/itfaS7Ph3sBa4ofaAGmayWk=; b=ROm0ArlPuXBY1EzWc/8PRgqaTVBI+lb7icLJ0Q57BP9ugdMTZmrUmryBS96SSVH9Nh eeLrq3q4kINpOOeYwSiqOxAJrdmOgMg80zVTThvjx9VkzE5k0WCPil9tESj3Hku5cLuy SPJk1gyPDWl9cU4d7hzMlhCvpJn6xdGkNXplbkrtzq0OiV6LSdAYuNrP+KJa/f3MP5dx 4wKlDRUsNPqCre2MCxjIRsO9xKT8/coTjpyJXWIkKBFyHmv7zdnJQKHUNs+zIEe6gm+0 cTwYRc8KLsJOFaKN9aFOQI6B99BvcPA5pw+NCRvbGraHEF3iyw5uVssKuGhFRoPeJkSZ mtEA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=1e100.net; s=20221208; t=1681845217; x=1684437217; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=G1QAyddYnWJUKuKAIak/itfaS7Ph3sBa4ofaAGmayWk=; b=a/3MVAwYaenWGRkNW+3s+HwVmndtlicBXR8l69mZCam7rrAZ3xYX6Z5K+HJbdtCr+w qBBzimbzK3qAnHMll4HiGdW5X7xZ3CdnYm0EAkgnfVD11axqGBlonu+bhcQ2GFKpA2N8 WNOtygLpubWcc6b76v5Ox0I/2Kxkbp2j0WhSXXpK7EHQxnMutvXKS8qcbCuqltVdustu w4FhX05SymI7z/5JkLA+GLLHIoOJxnN4kqHV2bDNFM1YQdbutedlaNhHCzEQ1GuniIkc hn1TOOQT6nOPGKONDyDYNe+yvojh2aX4hdVQSJ3C4+J6mBsTjlTAKE44J8H2/cg+yJl5 PmPA== X-Gm-Message-State: AAQBX9dOkVdTwji/zz3D/cgesxFmiTPRrLy9hZyQDygRf2wevTnL62rW U7KnBJpiMmZuxtV+Bl9B8fH4eg== X-Google-Smtp-Source: AKy350ZUW/3fdavMXIT1x5ul2i8OAycLqVKqq8/q+G7rUbGTG7TjkJBhgSj/nJexX1DU7fRe/5Ryig== X-Received: by 2002:a05:622a:1703:b0:3e4:deff:4634 with SMTP id h3-20020a05622a170300b003e4deff4634mr1499645qtk.24.1681845217590; Tue, 18 Apr 2023 12:13:37 -0700 (PDT) Received: from localhost ([2620:10d:c091:400::5:e646]) by smtp.gmail.com with ESMTPSA id w4-20020ac86b04000000b003e64d076256sm4281417qts.51.2023.04.18.12.13.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 18 Apr 2023 12:13:37 -0700 (PDT) From: Johannes Weiner To: linux-mm@kvack.org Cc: Kaiyang Zhao , Mel Gorman , Vlastimil Babka , David Rientjes , linux-kernel@vger.kernel.org, kernel-team@fb.com Subject: [RFC PATCH 16/26] mm: compaction: improve compaction_suitable() accuracy Date: Tue, 18 Apr 2023 15:13:03 -0400 Message-Id: <20230418191313.268131-17-hannes@cmpxchg.org> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230418191313.268131-1-hannes@cmpxchg.org> References: <20230418191313.268131-1-hannes@cmpxchg.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" With the new per-mt free counts, compaction can check the watermarks specifically against suitable migration targets. This ensures reclaim keeps going when the free pages are in blocks that aren't actually suitable migration targets: MIGRATE_FREE, UNMOVABLE, RECLAIMABLE. 
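To illustrate, the counters that now feed the watermark check can be thought of as the small helper sketched below. This is only a sketch: suitable_free_pages() is a hypothetical name, and the real call sites simply open-code the two zone_page_state() reads as in the diff that follows. NR_FREE_MOVABLE is the per-migratetype counter introduced earlier in this series.

	/* Sketch only: count the free pages compaction can migrate into. */
	static unsigned long suitable_free_pages(struct zone *zone)
	{
		/* Movable pageblocks are valid migration targets... */
		unsigned long free = zone_page_state(zone, NR_FREE_MOVABLE);

		/* ...and so are CMA pageblocks. */
		free += zone_page_state(zone, NR_FREE_CMA_PAGES);

		/* MIGRATE_FREE, UNMOVABLE and RECLAIMABLE blocks are excluded. */
		return free;
	}
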
Signed-off-by: Johannes Weiner --- mm/compaction.c | 23 +++++++++++++++-------- mm/vmscan.c | 7 +++++-- 2 files changed, 20 insertions(+), 10 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index b9eed0d43403..f637b4ed7f3c 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -2226,11 +2226,17 @@ enum compact_result compaction_suitable(struct zone= *zone, int order, unsigned int alloc_flags, int highest_zoneidx) { + unsigned long free_pages; enum compact_result ret; int fragindex; =20 - ret =3D __compaction_suitable(zone, order, alloc_flags, highest_zoneidx, - zone_page_state(zone, NR_FREE_PAGES)); + /* Suitable migration targets */ + free_pages =3D zone_page_state(zone, NR_FREE_MOVABLE); + free_pages +=3D zone_page_state(zone, NR_FREE_CMA_PAGES); + + ret =3D __compaction_suitable(zone, order, alloc_flags, + highest_zoneidx, free_pages); + /* * fragmentation index determines if allocation failures are due to * low memory or external fragmentation @@ -2273,19 +2279,20 @@ bool compaction_zonelist_suitable(struct alloc_cont= ext *ac, int order, for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->highest_zoneidx, ac->nodemask) { unsigned long available; - enum compact_result compact_result; =20 + available =3D zone_page_state_snapshot(zone, NR_FREE_MOVABLE); + available +=3D zone_page_state_snapshot(zone, NR_FREE_CMA_PAGES); /* * Do not consider all the reclaimable memory because we do not * want to trash just for a single high order allocation which * is even not guaranteed to appear even if __compaction_suitable * is happy about the watermark check. */ - available =3D zone_reclaimable_pages(zone) / order; - available +=3D zone_page_state_snapshot(zone, NR_FREE_PAGES); - compact_result =3D __compaction_suitable(zone, order, alloc_flags, - ac->highest_zoneidx, available); - if (compact_result =3D=3D COMPACT_CONTINUE) + available +=3D zone_reclaimable_pages(zone) / order; + + if (__compaction_suitable(zone, order, alloc_flags, + ac->highest_zoneidx, + available) =3D=3D COMPACT_CONTINUE) return true; } =20 diff --git a/mm/vmscan.c b/mm/vmscan.c index 5b7b8d4f5297..9ecf29f4dab8 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -6270,6 +6270,7 @@ static void shrink_node(pg_data_t *pgdat, struct scan= _control *sc) static inline bool compaction_ready(struct zone *zone, struct scan_control= *sc) { unsigned long watermark; + unsigned long free_pages; enum compact_result suitable; =20 suitable =3D compaction_suitable(zone, sc->order, 0, sc->reclaim_idx); @@ -6290,8 +6291,10 @@ static inline bool compaction_ready(struct zone *zon= e, struct scan_control *sc) * we are already above the high+gap watermark, don't reclaim at all. 
*/ watermark =3D high_wmark_pages(zone) + compact_gap(sc->order); - - return zone_watermark_ok_safe(zone, 0, watermark, sc->reclaim_idx); + free_pages =3D zone_page_state_snapshot(zone, NR_FREE_MOVABLE); + free_pages +=3D zone_page_state_snapshot(zone, NR_FREE_CMA_PAGES); + return __zone_watermark_ok(zone, 0, watermark, sc->reclaim_idx, + ALLOC_CMA, free_pages); } =20 static void consider_reclaim_throttle(pg_data_t *pgdat, struct scan_contro= l *sc) --=20 2.39.2 From nobody Wed Dec 17 15:36:54 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 980D6C7EE23 for ; Tue, 18 Apr 2023 19:15:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233064AbjDRTPA (ORCPT ); Tue, 18 Apr 2023 15:15:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57736 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232873AbjDRTOE (ORCPT ); Tue, 18 Apr 2023 15:14:04 -0400 Received: from mail-qv1-xf2f.google.com (mail-qv1-xf2f.google.com [IPv6:2607:f8b0:4864:20::f2f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 57943901F for ; Tue, 18 Apr 2023 12:13:39 -0700 (PDT) Received: by mail-qv1-xf2f.google.com with SMTP id a15so7395704qvn.2 for ; Tue, 18 Apr 2023 12:13:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20221208.gappssmtp.com; s=20221208; t=1681845219; x=1684437219; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=EFgZ3D8VKBLVvaJCzeMlC8+6IsY+n+UHK2kcdHMAd6Y=; b=PrIghyUCPpn833RcxLlSVfU5Tk3SpnQwff6jbo/QPuj/XkyPt6wBzL/Dp9O4CsPEIx vlbtgOz19a8Bj5Q9Tvz3XxufWwoP1pyylG1i+Eubg/l0s5lNHABn8jx+RIo3qNp8yvJd xxmWan9BQx+LKGsE5QvB32mS3Fey4Ysn5JLrhFNxDsTGokonj/n7lqrpIjGCmJsUuM9+ Ab+1ebZi1/6VKaIDC4ffD88G61bRq54CAS9nhM+89W7ACAiv40xaJvzX03Xeni/fkhUD S0GccLFy6IRYr+8AC7mboIhQozDrEx6WmyjvaCp6CXY+ejn1Dw19uqM1MKn105sfCPkW reiw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1681845219; x=1684437219; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=EFgZ3D8VKBLVvaJCzeMlC8+6IsY+n+UHK2kcdHMAd6Y=; b=OuUavGefsaiFhLyWlIYNSLTAl9uFprt6G2RYY8RnW5eb7AhAV+++Fkm/25G0g/iQvt Jk6eCkJgZU5ZHxTNsXgeFrNwdsVtU/wwKiTIoPDSgcyJF9Dw2uJ+uSqVgs8ExOw+yen+ 0eHFcwmIR/8sucBfSEXrHjOM12MA6v/kCNYrxCZ4xMpFEukpYFS8KlaTPRIUPszA2dEh nyYLmfOvYjUHb1dd/qlYtYIjV6GIH6a9QITElkh+n6GGB+3kdmxn4T+HLD8POiM946Nr Zq0QSWOsimWV4IUc7fEWxSjLU79qVhBYi4I7bd99PW3+EzQMzIA0tJuPOUODlRuLoSTb KS4w== X-Gm-Message-State: AAQBX9f1HjDK7B0WJ4VoOBO/CSbzqC/XNjLyrvZMkTO14NpIjpKl0EEI MCxSHT2oscECldtSF3bBGexxTQ== X-Google-Smtp-Source: AKy350YOTexSjADk98sBY4JcGk1SQDwC9c/+fGeG6k+kXfYRIwBupTZdOSOM2/pP7fHju0r9Wjpp2g== X-Received: by 2002:a05:6214:20e2:b0:56a:b623:9b09 with SMTP id 2-20020a05621420e200b0056ab6239b09mr22229605qvk.14.1681845218901; Tue, 18 Apr 2023 12:13:38 -0700 (PDT) Received: from localhost ([2620:10d:c091:400::5:e646]) by smtp.gmail.com with ESMTPSA id z9-20020a0ce609000000b005dd8b9345c7sm3914040qvm.95.2023.04.18.12.13.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 18 Apr 2023 12:13:38 -0700 (PDT) From: Johannes Weiner To: linux-mm@kvack.org Cc: Kaiyang Zhao , Mel Gorman , Vlastimil Babka , David 
Rientjes , linux-kernel@vger.kernel.org, kernel-team@fb.com Subject: [RFC PATCH 17/26] mm: compaction: refactor __compaction_suitable() Date: Tue, 18 Apr 2023 15:13:04 -0400 Message-Id: <20230418191313.268131-18-hannes@cmpxchg.org> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230418191313.268131-1-hannes@cmpxchg.org> References: <20230418191313.268131-1-hannes@cmpxchg.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" __compaction_suitable() is supposed to check for available migration targets. However, it also checks whether the operation was requested via /proc/sys/vm/compact_memory, and whether the original allocation request can already succeed. These don't apply to all callsites. Move the checks out to the callers, so that later patches can deal with them one by one. No functional change intended. Signed-off-by: Johannes Weiner --- include/linux/compaction.h | 4 +- mm/compaction.c | 80 ++++++++++++++++++++++++-------------- mm/vmscan.c | 35 ++++++++++------- 3 files changed, 74 insertions(+), 45 deletions(-) diff --git a/include/linux/compaction.h b/include/linux/compaction.h index 7635e220215a..9e1b2c56df62 100644 --- a/include/linux/compaction.h +++ b/include/linux/compaction.h @@ -98,7 +98,7 @@ extern enum compact_result try_to_compact_pages(gfp_t gfp= _mask, struct capture_control *capc); extern void reset_isolation_suitable(pg_data_t *pgdat); extern enum compact_result compaction_suitable(struct zone *zone, int orde= r, - unsigned int alloc_flags, int highest_zoneidx); + int highest_zoneidx); =20 extern void compaction_defer_reset(struct zone *zone, int order, bool alloc_success); @@ -116,7 +116,7 @@ static inline void reset_isolation_suitable(pg_data_t *= pgdat) } =20 static inline enum compact_result compaction_suitable(struct zone *zone, i= nt order, - int alloc_flags, int highest_zoneidx) + int highest_zoneidx) { return COMPACT_SKIPPED; } diff --git a/mm/compaction.c b/mm/compaction.c index f637b4ed7f3c..d4b7d5b36600 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -2173,24 +2173,10 @@ static enum compact_result compact_finished(struct = compact_control *cc) } =20 static enum compact_result __compaction_suitable(struct zone *zone, int or= der, - unsigned int alloc_flags, int highest_zoneidx, unsigned long wmark_target) { unsigned long watermark; - - if (is_via_compact_memory(order)) - return COMPACT_CONTINUE; - - watermark =3D wmark_pages(zone, alloc_flags & ALLOC_WMARK_MASK); - /* - * If watermarks for high-order allocation are already met, there - * should be no need for compaction at all. - */ - if (zone_watermark_ok(zone, order, watermark, highest_zoneidx, - alloc_flags)) - return COMPACT_SUCCESS; - /* * Watermarks for order-0 must be met for compaction to be able to * isolate free pages for migration targets. 
This means that the @@ -2223,7 +2209,6 @@ static enum compact_result __compaction_suitable(stru= ct zone *zone, int order, * COMPACT_CONTINUE - If compaction should run now */ enum compact_result compaction_suitable(struct zone *zone, int order, - unsigned int alloc_flags, int highest_zoneidx) { unsigned long free_pages; @@ -2234,8 +2219,7 @@ enum compact_result compaction_suitable(struct zone *= zone, int order, free_pages =3D zone_page_state(zone, NR_FREE_MOVABLE); free_pages +=3D zone_page_state(zone, NR_FREE_CMA_PAGES); =20 - ret =3D __compaction_suitable(zone, order, alloc_flags, - highest_zoneidx, free_pages); + ret =3D __compaction_suitable(zone, order, highest_zoneidx, free_pages); =20 /* * fragmentation index determines if allocation failures are due to @@ -2279,6 +2263,16 @@ bool compaction_zonelist_suitable(struct alloc_conte= xt *ac, int order, for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->highest_zoneidx, ac->nodemask) { unsigned long available; + unsigned long watermark; + + if (is_via_compact_memory(order)) + return true; + + /* Allocation can already succeed, nothing to do */ + watermark =3D wmark_pages(zone, alloc_flags & ALLOC_WMARK_MASK); + if (zone_watermark_ok(zone, order, watermark, + ac->highest_zoneidx, alloc_flags)) + continue; =20 available =3D zone_page_state_snapshot(zone, NR_FREE_MOVABLE); available +=3D zone_page_state_snapshot(zone, NR_FREE_CMA_PAGES); @@ -2290,8 +2284,7 @@ bool compaction_zonelist_suitable(struct alloc_contex= t *ac, int order, */ available +=3D zone_reclaimable_pages(zone) / order; =20 - if (__compaction_suitable(zone, order, alloc_flags, - ac->highest_zoneidx, + if (__compaction_suitable(zone, order, ac->highest_zoneidx, available) =3D=3D COMPACT_CONTINUE) return true; } @@ -2322,14 +2315,26 @@ compact_zone(struct compact_control *cc, struct cap= ture_control *capc) INIT_LIST_HEAD(&cc->migratepages); =20 cc->migratetype =3D gfp_migratetype(cc->gfp_mask); - ret =3D compaction_suitable(cc->zone, cc->order, cc->alloc_flags, - cc->highest_zoneidx); - /* Compaction is likely to fail */ - if (ret =3D=3D COMPACT_SUCCESS || ret =3D=3D COMPACT_SKIPPED) - return ret; =20 - /* huh, compaction_suitable is returning something unexpected */ - VM_BUG_ON(ret !=3D COMPACT_CONTINUE); + if (!is_via_compact_memory(cc->order)) { + unsigned long watermark; + + /* Allocation can already succeed, nothing to do */ + watermark =3D wmark_pages(cc->zone, + cc->alloc_flags & ALLOC_WMARK_MASK); + if (zone_watermark_ok(cc->zone, cc->order, watermark, + cc->highest_zoneidx, cc->alloc_flags)) + return COMPACT_SUCCESS; + + ret =3D compaction_suitable(cc->zone, cc->order, + cc->highest_zoneidx); + /* Compaction is likely to fail */ + if (ret =3D=3D COMPACT_SKIPPED) + return ret; + + /* huh, compaction_suitable is returning something unexpected */ + VM_BUG_ON(ret !=3D COMPACT_CONTINUE); + } =20 /* * Clear pageblock skip if there were failures recently and compaction @@ -2803,7 +2808,16 @@ static bool kcompactd_node_suitable(pg_data_t *pgdat) if (!populated_zone(zone)) continue; =20 - if (compaction_suitable(zone, pgdat->kcompactd_max_order, 0, + if (is_via_compact_memory(pgdat->kcompactd_max_order)) + return true; + + /* Allocation can already succeed, check other zones */ + if (zone_watermark_ok(zone, pgdat->kcompactd_max_order, + min_wmark_pages(zone), + highest_zoneidx, 0)) + continue; + + if (compaction_suitable(zone, pgdat->kcompactd_max_order, highest_zoneidx) =3D=3D COMPACT_CONTINUE) return true; } @@ -2841,10 +2855,18 @@ static void 
kcompactd_do_work(pg_data_t *pgdat) if (compaction_deferred(zone, cc.order)) continue; =20 - if (compaction_suitable(zone, cc.order, 0, zoneid) !=3D - COMPACT_CONTINUE) + if (is_via_compact_memory(cc.order)) + goto compact; + + /* Allocation can already succeed, nothing to do */ + if (zone_watermark_ok(zone, cc.order, + min_wmark_pages(zone), zoneid, 0)) continue; =20 + if (compaction_suitable(zone, cc.order, + zoneid) !=3D COMPACT_CONTINUE) + continue; +compact: if (kthread_should_stop()) return; =20 diff --git a/mm/vmscan.c b/mm/vmscan.c index 9ecf29f4dab8..a0ebdbf3efcf 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -6076,14 +6076,17 @@ static inline bool should_continue_reclaim(struct p= glist_data *pgdat, if (!managed_zone(zone)) continue; =20 - switch (compaction_suitable(zone, sc->order, 0, sc->reclaim_idx)) { - case COMPACT_SUCCESS: - case COMPACT_CONTINUE: + if (sc->order =3D=3D -1) /* is_via_compact_memory() */ + return false; + + /* Allocation can already succeed, nothing to do */ + if (zone_watermark_ok(zone, sc->order, min_wmark_pages(zone), + sc->reclaim_idx, 0)) + return false; + + if (compaction_suitable(zone, sc->order, + sc->reclaim_idx) =3D=3D COMPACT_CONTINUE) return false; - default: - /* check next zone */ - ; - } } =20 /* @@ -6271,16 +6274,20 @@ static inline bool compaction_ready(struct zone *zo= ne, struct scan_control *sc) { unsigned long watermark; unsigned long free_pages; - enum compact_result suitable; =20 - suitable =3D compaction_suitable(zone, sc->order, 0, sc->reclaim_idx); - if (suitable =3D=3D COMPACT_SUCCESS) - /* Allocation should succeed already. Don't reclaim. */ + if (sc->order =3D=3D -1) /* is_via_compact_memory() */ + goto suitable; + + /* Allocation can already succeed, nothing to do */ + if (zone_watermark_ok(zone, sc->order, min_wmark_pages(zone), + sc->reclaim_idx, 0)) return true; - if (suitable =3D=3D COMPACT_SKIPPED) - /* Compaction cannot yet proceed. Do reclaim. */ - return false; =20 + /* Compaction cannot yet proceed. Do reclaim. */ + if (compaction_suitable(zone, sc->order, + sc->reclaim_idx) =3D=3D COMPACT_SKIPPED) + return false; +suitable: /* * Compaction is already possible, but it takes time to run and there * are potentially other callers using the pages just freed. 
So proceed --=20 2.39.2 From nobody Wed Dec 17 15:36:54 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6F932C7EE21 for ; Tue, 18 Apr 2023 19:15:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232995AbjDRTOx (ORCPT ); Tue, 18 Apr 2023 15:14:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57800 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232877AbjDRTOF (ORCPT ); Tue, 18 Apr 2023 15:14:05 -0400 Received: from mail-qt1-x82e.google.com (mail-qt1-x82e.google.com [IPv6:2607:f8b0:4864:20::82e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 843357EE5 for ; Tue, 18 Apr 2023 12:13:40 -0700 (PDT) Received: by mail-qt1-x82e.google.com with SMTP id l11so30662352qtj.4 for ; Tue, 18 Apr 2023 12:13:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20221208.gappssmtp.com; s=20221208; t=1681845220; x=1684437220; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=35b4U2+QOBa6ocJ8iDrcqYLn7rCBrGVWRfOUb6rO6cw=; b=NAETWyARHURXPOi9rz+E7BtKwnN/WCnP/+JO7LpCEDD8Yt6Nwt9wWNaKjVbRvN3xz2 CI34kcfshgq+HARendk3lJbfoBE5MkvdQWSrlrPBJRjmqsIYk3kDyWnUU7Frq+txzTlS H4HITpALdNoddu4n6BLDIclxULAYQC3G62W5TzlCuce+rhSeoOwKcEgfrWQxqneeYLbo 7wKg/+icg/zDzq1yq49DPPPiRs66pUC5NSRrssu+vBTZ4mZZ9Ce9QCpZuahqpc5imLz3 XtNWiOATBaotDlzMFP51F9IOzwQmK3bigZcA6HkmSjex3STf6qkddtjV7o1GG3sIRsXg ppGA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1681845220; x=1684437220; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=35b4U2+QOBa6ocJ8iDrcqYLn7rCBrGVWRfOUb6rO6cw=; b=CWPN/Py9jaL3C3BICvSHu68mK9Ews/X1mmtSyl9evjMJgLNEIgYR/oqdJ/QBMtYYFO +QaI2GcAm1l/TX76TINJLtZGuKblB+ZCtXzpqQ1ppqfjCwwB3E7ugzE4oM5Y6+8AXoX5 HvxKrK3OLixMwwe/LAYrqax7aUsGvHlO1yu/l8cqxUDBVVlRfQUmI29eyOYS83f8VSlF jD6j5ByLoTHGv8cBvLP9L9isLmXLybNetO6/GW6TEi4V8ytCOkvQm2FINm5VBRtR5okc nfNS62pNB8untXuRoCXcbz55Fw0cM5+kUvmQEwPjlcWlXfk1jOtK1X2kO/STSYvIWnXl ux9g== X-Gm-Message-State: AAQBX9e63tRKY/IA2cAVcd05joqVT+vqZ4IHwoQGrI3KoVmWl/Cs2mMx ydkIrxTU+xBX/12467RhTnW+Vg== X-Google-Smtp-Source: AKy350Y9D95GmbNdWa9qjbsQ3a5RYAukR0KDWI7MiGw1X4tAzUYurG+5JZjzgpO5fAjmLVD2oLXHwg== X-Received: by 2002:ac8:7fd6:0:b0:3bd:1081:b939 with SMTP id b22-20020ac87fd6000000b003bd1081b939mr2038244qtk.0.1681845220210; Tue, 18 Apr 2023 12:13:40 -0700 (PDT) Received: from localhost ([2620:10d:c091:400::5:e646]) by smtp.gmail.com with ESMTPSA id d14-20020a37680e000000b0074d1b6a8187sm2482147qkc.130.2023.04.18.12.13.39 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 18 Apr 2023 12:13:39 -0700 (PDT) From: Johannes Weiner To: linux-mm@kvack.org Cc: Kaiyang Zhao , Mel Gorman , Vlastimil Babka , David Rientjes , linux-kernel@vger.kernel.org, kernel-team@fb.com Subject: [RFC PATCH 18/26] mm: compaction: remove unnecessary is_via_compact_memory() checks Date: Tue, 18 Apr 2023 15:13:05 -0400 Message-Id: <20230418191313.268131-19-hannes@cmpxchg.org> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230418191313.268131-1-hannes@cmpxchg.org> References: <20230418191313.268131-1-hannes@cmpxchg.org> MIME-Version: 1.0 Content-Transfer-Encoding: 
quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Remove from all paths not reachable via /proc/sys/vm/compact_memory. Signed-off-by: Johannes Weiner --- mm/compaction.c | 11 +---------- mm/vmscan.c | 8 +------- 2 files changed, 2 insertions(+), 17 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index d4b7d5b36600..0aa2a0a192dc 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -2265,9 +2265,6 @@ bool compaction_zonelist_suitable(struct alloc_contex= t *ac, int order, unsigned long available; unsigned long watermark; =20 - if (is_via_compact_memory(order)) - return true; - /* Allocation can already succeed, nothing to do */ watermark =3D wmark_pages(zone, alloc_flags & ALLOC_WMARK_MASK); if (zone_watermark_ok(zone, order, watermark, @@ -2808,9 +2805,6 @@ static bool kcompactd_node_suitable(pg_data_t *pgdat) if (!populated_zone(zone)) continue; =20 - if (is_via_compact_memory(pgdat->kcompactd_max_order)) - return true; - /* Allocation can already succeed, check other zones */ if (zone_watermark_ok(zone, pgdat->kcompactd_max_order, min_wmark_pages(zone), @@ -2855,9 +2849,6 @@ static void kcompactd_do_work(pg_data_t *pgdat) if (compaction_deferred(zone, cc.order)) continue; =20 - if (is_via_compact_memory(cc.order)) - goto compact; - /* Allocation can already succeed, nothing to do */ if (zone_watermark_ok(zone, cc.order, min_wmark_pages(zone), zoneid, 0)) @@ -2866,7 +2857,7 @@ static void kcompactd_do_work(pg_data_t *pgdat) if (compaction_suitable(zone, cc.order, zoneid) !=3D COMPACT_CONTINUE) continue; -compact: + if (kthread_should_stop()) return; =20 diff --git a/mm/vmscan.c b/mm/vmscan.c index a0ebdbf3efcf..ee8c8ca2e7b5 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -6076,9 +6076,6 @@ static inline bool should_continue_reclaim(struct pgl= ist_data *pgdat, if (!managed_zone(zone)) continue; =20 - if (sc->order =3D=3D -1) /* is_via_compact_memory() */ - return false; - /* Allocation can already succeed, nothing to do */ if (zone_watermark_ok(zone, sc->order, min_wmark_pages(zone), sc->reclaim_idx, 0)) @@ -6275,9 +6272,6 @@ static inline bool compaction_ready(struct zone *zone= , struct scan_control *sc) unsigned long watermark; unsigned long free_pages; =20 - if (sc->order =3D=3D -1) /* is_via_compact_memory() */ - goto suitable; - /* Allocation can already succeed, nothing to do */ if (zone_watermark_ok(zone, sc->order, min_wmark_pages(zone), sc->reclaim_idx, 0)) @@ -6287,7 +6281,7 @@ static inline bool compaction_ready(struct zone *zone= , struct scan_control *sc) if (compaction_suitable(zone, sc->order, sc->reclaim_idx) =3D=3D COMPACT_SKIPPED) return false; -suitable: + /* * Compaction is already possible, but it takes time to run and there * are potentially other callers using the pages just freed. 
So proceed --=20 2.39.2 From nobody Wed Dec 17 15:36:54 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 84687C7EE20 for ; Tue, 18 Apr 2023 19:15:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233042AbjDRTO5 (ORCPT ); Tue, 18 Apr 2023 15:14:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57746 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232878AbjDRTOF (ORCPT ); Tue, 18 Apr 2023 15:14:05 -0400 Received: from mail-qt1-x82f.google.com (mail-qt1-x82f.google.com [IPv6:2607:f8b0:4864:20::82f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 242D1A26F for ; Tue, 18 Apr 2023 12:13:42 -0700 (PDT) Received: by mail-qt1-x82f.google.com with SMTP id l11so30662402qtj.4 for ; Tue, 18 Apr 2023 12:13:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20221208.gappssmtp.com; s=20221208; t=1681845221; x=1684437221; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=bns8+mp3X6nXdCFxV2tvIrIcPUTCwJdDXntfPdL7pp0=; b=mwxAjt5FHbXosh8eO+VsA0QaIzkPHSj509IMResMBLrFCDtFqQuAb39AWA+k+QpfGQ 2sTi7ak9g0/7anwaLuFZa8TnEH9/W0IsasBzTf35DDXZIjNZ/+iQTItjl6xvXY6x+qbV svWyu0WEhJTQsOP5r4KsHn0kvCVHMCqKPohNvbGE7v7GTuISWCmFDVvEjh+wYDMd6gPV QPpD51iUk1E7ChSCX2CJ4//Jg45qmYKpV2D2NrCvvOkGI9qVSedCxygcX/hpgxgpYiRw KQnkSn73bb0v0CpIU4XcXH/w9PR1l/pMtfWeS5WYklu48+aB4n8+2PHRP8wlzmglM4v3 6yMw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1681845221; x=1684437221; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=bns8+mp3X6nXdCFxV2tvIrIcPUTCwJdDXntfPdL7pp0=; b=O5Yzh06/KAHcbfpl5i+VkqQcw8YOe0pzlXC4+CO5gbzJy4X0fMHA8f4zwmAfCGjbfH +RQi4TkmuPyBI4A84GQGJcjxMm8ONryyZjC2T5VUud2M2bheeJw4UuyCjOVEwCRcWJEk Iz2mK0rUaQspW7Wffo7wt3uEbkm2MUa/GUjOdFVH5s2BYC2KzjB7WOCWkn6iNYQhI/SS Ip43J40jvm2AvkCLd46506dyegJuCIEKcYc44Ymq8YipmUp/BnPiP3BvLK/8s5gXsYu7 4pDCGANX0kg7vg1mBxz15PLtFJG1JF0JC6XNGgPPfKAUqwVnh6ufyiKD5Bs20VgHU7aZ liAg== X-Gm-Message-State: AAQBX9dH43aZJXTUY8oFNBZN9ury+87A2YMglcF4sTiS7poFWALXeTGz L8XPvt98J2oUi6zpDJ4A6thr0w== X-Google-Smtp-Source: AKy350YxL97fUsu7ft+nTwgxWroY8EhxqBqMYvhNTWaLrHbHqzHqIxvgWZLrQ3ewFfKTOzgKcfLxmQ== X-Received: by 2002:a05:622a:20a:b0:3e4:e94a:5082 with SMTP id b10-20020a05622a020a00b003e4e94a5082mr1611125qtx.5.1681845221305; Tue, 18 Apr 2023 12:13:41 -0700 (PDT) Received: from localhost ([2620:10d:c091:400::5:e646]) by smtp.gmail.com with ESMTPSA id r15-20020a05622a034f00b003e4d43038e2sm1829712qtw.5.2023.04.18.12.13.40 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 18 Apr 2023 12:13:41 -0700 (PDT) From: Johannes Weiner To: linux-mm@kvack.org Cc: Kaiyang Zhao , Mel Gorman , Vlastimil Babka , David Rientjes , linux-kernel@vger.kernel.org, kernel-team@fb.com Subject: [RFC PATCH 19/26] mm: compaction: drop redundant watermark check in compaction_zonelist_suitable() Date: Tue, 18 Apr 2023 15:13:06 -0400 Message-Id: <20230418191313.268131-20-hannes@cmpxchg.org> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230418191313.268131-1-hannes@cmpxchg.org> References: <20230418191313.268131-1-hannes@cmpxchg.org> MIME-Version: 1.0 
Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" should_compact_retry() is called after direct reclaim and compaction have failed to produce a page, to check if there is still enough compactable memory to warrant continuing. If there aren't, one last attempt at the freelists happens in __alloc_pages_may_oom(). The watermark check in should_compact_retry() is not necessary. Kill it. Signed-off-by: Johannes Weiner --- mm/compaction.c | 7 ------- 1 file changed, 7 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 0aa2a0a192dc..52103545d58c 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -2263,13 +2263,6 @@ bool compaction_zonelist_suitable(struct alloc_conte= xt *ac, int order, for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->highest_zoneidx, ac->nodemask) { unsigned long available; - unsigned long watermark; - - /* Allocation can already succeed, nothing to do */ - watermark =3D wmark_pages(zone, alloc_flags & ALLOC_WMARK_MASK); - if (zone_watermark_ok(zone, order, watermark, - ac->highest_zoneidx, alloc_flags)) - continue; =20 available =3D zone_page_state_snapshot(zone, NR_FREE_MOVABLE); available +=3D zone_page_state_snapshot(zone, NR_FREE_CMA_PAGES); --=20 2.39.2 From nobody Wed Dec 17 15:36:54 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3C5D4C77B78 for ; Tue, 18 Apr 2023 19:15:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233115AbjDRTPH (ORCPT ); Tue, 18 Apr 2023 15:15:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57744 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232884AbjDRTOF (ORCPT ); Tue, 18 Apr 2023 15:14:05 -0400 Received: from mail-qv1-xf2a.google.com (mail-qv1-xf2a.google.com [IPv6:2607:f8b0:4864:20::f2a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 67DDBAF0D for ; Tue, 18 Apr 2023 12:13:43 -0700 (PDT) Received: by mail-qv1-xf2a.google.com with SMTP id dd8so17376824qvb.13 for ; Tue, 18 Apr 2023 12:13:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20221208.gappssmtp.com; s=20221208; t=1681845222; x=1684437222; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=bD8cI1T5Y1yip08nzZ5ycgmWClJik+UbLC7ebo3KxqY=; b=ABr/1kuC79+cxAX8L6WJR2/bWTqWb8HcQEjFbpPZF0HuB9YW1YJMuRW7gyOVzYVPRw 03H6aLq7CX52ef2Om4hh96VGqT7TjGkiKZRVbtTbvnP86DodSH+fBT3Bg8K3RBiskmEn PBPAPvjvrcmBXvcXT5D1ENEFAgwZsOibeMRYfxOUHZ/XIjDWRlAOEno0v6h4ZjF413dj hOYeaOUBMh9OFP6N6OvsQe4bSQ4Xtq8Jibe8ALXvvyUnSx8zCdaZb/tqXiSYgET+QTx0 uVr7LhBCB60jsNBnVVB4qFMsgKQHi2Oe9RlYbzhzQb172EdH3Z+Txzl+m/HRkKeizyY4 nKRA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1681845222; x=1684437222; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=bD8cI1T5Y1yip08nzZ5ycgmWClJik+UbLC7ebo3KxqY=; b=VL6O/yF0K8RslDGzEcJ15tyXobMVjcc+H9LDn69CGMQS0+KCYKTjsYuuu/hjcb3tn+ YfHjmSLba8JuqTrhFKkKGdc2fxsX57ycbjAou0RqhNvhh9FjWKUr3vgsAlyvgD2hhXe6 OqhZ5YvqP5V+V2AaE5VcI4SDoN4g6SFPdlRrKpHQoywaFZWFAREXSNVd7sXUHklbYRkU 
UgkupnPVJ5nukjceX/hKYwy4EmBQeyOpGA6O7KwdBfv2829q76Yj9NnIWs1TKCkQ6IhC H+/SwJ3pkzjXmoNGNMr8esJ7JmH1TSgR+m6ETQldyIFUUJGZCjzpnhNhPTxHmkoS5noe 88uA== X-Gm-Message-State: AAQBX9fRqAIZbBkQAfhmTSVJtWsPuSZ78qk55Q+nchpaoic6H8LMKg5v qIDnAbDYcOoxDkCTg95Bp2wnAQ== X-Google-Smtp-Source: AKy350ZedblEphVsszu1HNXxFw88IPawTMC/tacVCADIVXo/ITvdIoDkqo0MjIIYPO3ZAuvFaGP6+g== X-Received: by 2002:a05:6214:e49:b0:5ef:564a:3296 with SMTP id o9-20020a0562140e4900b005ef564a3296mr17532356qvc.44.1681845222547; Tue, 18 Apr 2023 12:13:42 -0700 (PDT) Received: from localhost ([2620:10d:c091:400::5:e646]) by smtp.gmail.com with ESMTPSA id ep19-20020a05621418f300b005dd8b934594sm3931380qvb.44.2023.04.18.12.13.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 18 Apr 2023 12:13:42 -0700 (PDT) From: Johannes Weiner To: linux-mm@kvack.org Cc: Kaiyang Zhao , Mel Gorman , Vlastimil Babka , David Rientjes , linux-kernel@vger.kernel.org, kernel-team@fb.com Subject: [RFC PATCH 20/26] mm: vmscan: use compaction_suitable() check in kswapd Date: Tue, 18 Apr 2023 15:13:07 -0400 Message-Id: <20230418191313.268131-21-hannes@cmpxchg.org> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230418191313.268131-1-hannes@cmpxchg.org> References: <20230418191313.268131-1-hannes@cmpxchg.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Kswapd currently bails on higher-order allocations with an open-coded check for whether it's reclaimed the compaction gap. compaction_suitable() is the customary interface to coordinate reclaim with compaction. Signed-off-by: Johannes Weiner --- mm/vmscan.c | 67 ++++++++++++++++++----------------------------------- 1 file changed, 23 insertions(+), 44 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index ee8c8ca2e7b5..723705b9e4d9 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -6872,12 +6872,18 @@ static bool pgdat_balanced(pg_data_t *pgdat, int or= der, int highest_zoneidx) if (!managed_zone(zone)) continue; =20 + /* Allocation can succeed in any zone, done */ if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) mark =3D wmark_pages(zone, WMARK_PROMO); else mark =3D high_wmark_pages(zone); if (zone_watermark_ok_safe(zone, order, mark, highest_zoneidx)) return true; + + /* Allocation can't succeed, but enough order-0 to compact */ + if (compaction_suitable(zone, order, + highest_zoneidx) =3D=3D COMPACT_CONTINUE) + return true; } =20 /* @@ -6968,16 +6974,6 @@ static bool kswapd_shrink_node(pg_data_t *pgdat, */ shrink_node(pgdat, sc); =20 - /* - * Fragmentation may mean that the system cannot be rebalanced for - * high-order allocations. If twice the allocation size has been - * reclaimed then recheck watermarks only at order-0 to prevent - * excessive reclaim. Assume that a process requested a high-order - * can direct reclaim/compact. - */ - if (sc->order && sc->nr_reclaimed >=3D compact_gap(sc->order)) - sc->order =3D 0; - return sc->nr_scanned >=3D sc->nr_to_reclaim; } =20 @@ -7018,15 +7014,13 @@ clear_reclaim_active(pg_data_t *pgdat, int highest_= zoneidx) * that are eligible for use by the caller until at least one zone is * balanced. * - * Returns the order kswapd finished reclaiming at. - * * kswapd scans the zones in the highmem->normal->dma direction. 
It skips * zones which have free_pages > high_wmark_pages(zone), but once a zone is * found to have free_pages <=3D high_wmark_pages(zone), any page in that = zone * or lower is eligible for reclaim until at least one usable zone is * balanced. */ -static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx) +static void balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx) { int i; unsigned long nr_soft_reclaimed; @@ -7226,14 +7220,6 @@ static int balance_pgdat(pg_data_t *pgdat, int order= , int highest_zoneidx) __fs_reclaim_release(_THIS_IP_); psi_memstall_leave(&pflags); set_task_reclaim_state(current, NULL); - - /* - * Return the order kswapd stopped reclaiming at as - * prepare_kswapd_sleep() takes it into account. If another caller - * entered the allocator slow path while kswapd was awake, order will - * remain at the higher level. - */ - return sc.order; } =20 /* @@ -7251,7 +7237,7 @@ static enum zone_type kswapd_highest_zoneidx(pg_data_= t *pgdat, return curr_idx =3D=3D MAX_NR_ZONES ? prev_highest_zoneidx : curr_idx; } =20 -static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int rec= laim_order, +static void kswapd_try_to_sleep(pg_data_t *pgdat, int order, unsigned int highest_zoneidx) { long remaining =3D 0; @@ -7269,7 +7255,7 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int= alloc_order, int reclaim_o * eligible zone balanced that it's also unlikely that compaction will * succeed. */ - if (prepare_kswapd_sleep(pgdat, reclaim_order, highest_zoneidx)) { + if (prepare_kswapd_sleep(pgdat, order, highest_zoneidx)) { /* * Compaction records what page blocks it recently failed to * isolate pages from and skips them in the future scanning. @@ -7282,7 +7268,7 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int= alloc_order, int reclaim_o * We have freed the memory, now we should compact it to make * allocation of the requested order possible. */ - wakeup_kcompactd(pgdat, alloc_order, highest_zoneidx); + wakeup_kcompactd(pgdat, order, highest_zoneidx); =20 remaining =3D schedule_timeout(HZ/10); =20 @@ -7296,8 +7282,8 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int= alloc_order, int reclaim_o kswapd_highest_zoneidx(pgdat, highest_zoneidx)); =20 - if (READ_ONCE(pgdat->kswapd_order) < reclaim_order) - WRITE_ONCE(pgdat->kswapd_order, reclaim_order); + if (READ_ONCE(pgdat->kswapd_order) < order) + WRITE_ONCE(pgdat->kswapd_order, order); } =20 finish_wait(&pgdat->kswapd_wait, &wait); @@ -7308,8 +7294,7 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int= alloc_order, int reclaim_o * After a short sleep, check if it was a premature sleep. If not, then * go fully to sleep until explicitly woken up. 
*/ - if (!remaining && - prepare_kswapd_sleep(pgdat, reclaim_order, highest_zoneidx)) { + if (!remaining && prepare_kswapd_sleep(pgdat, order, highest_zoneidx)) { trace_mm_vmscan_kswapd_sleep(pgdat->node_id); =20 /* @@ -7350,8 +7335,7 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int= alloc_order, int reclaim_o */ static int kswapd(void *p) { - unsigned int alloc_order, reclaim_order; - unsigned int highest_zoneidx =3D MAX_NR_ZONES - 1; + unsigned int order, highest_zoneidx; pg_data_t *pgdat =3D (pg_data_t *)p; struct task_struct *tsk =3D current; const struct cpumask *cpumask =3D cpumask_of_node(pgdat->node_id); @@ -7374,22 +7358,20 @@ static int kswapd(void *p) tsk->flags |=3D PF_MEMALLOC | PF_KSWAPD; set_freezable(); =20 - WRITE_ONCE(pgdat->kswapd_order, 0); + order =3D 0; + highest_zoneidx =3D MAX_NR_ZONES - 1; + WRITE_ONCE(pgdat->kswapd_order, order); WRITE_ONCE(pgdat->kswapd_highest_zoneidx, MAX_NR_ZONES); + atomic_set(&pgdat->nr_writeback_throttled, 0); + for ( ; ; ) { bool ret; =20 - alloc_order =3D reclaim_order =3D READ_ONCE(pgdat->kswapd_order); - highest_zoneidx =3D kswapd_highest_zoneidx(pgdat, - highest_zoneidx); - -kswapd_try_sleep: - kswapd_try_to_sleep(pgdat, alloc_order, reclaim_order, - highest_zoneidx); + kswapd_try_to_sleep(pgdat, order, highest_zoneidx); =20 /* Read the new order and highest_zoneidx */ - alloc_order =3D READ_ONCE(pgdat->kswapd_order); + order =3D READ_ONCE(pgdat->kswapd_order); highest_zoneidx =3D kswapd_highest_zoneidx(pgdat, highest_zoneidx); WRITE_ONCE(pgdat->kswapd_order, 0); @@ -7415,11 +7397,8 @@ static int kswapd(void *p) * request (alloc_order). */ trace_mm_vmscan_kswapd_wake(pgdat->node_id, highest_zoneidx, - alloc_order); - reclaim_order =3D balance_pgdat(pgdat, alloc_order, - highest_zoneidx); - if (reclaim_order < alloc_order) - goto kswapd_try_sleep; + order); + balance_pgdat(pgdat, order, highest_zoneidx); } =20 tsk->flags &=3D ~(PF_MEMALLOC | PF_KSWAPD); --=20 2.39.2 From nobody Wed Dec 17 15:36:54 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 230F3C77B75 for ; Tue, 18 Apr 2023 19:15:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233102AbjDRTPF (ORCPT ); Tue, 18 Apr 2023 15:15:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57896 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232885AbjDRTOF (ORCPT ); Tue, 18 Apr 2023 15:14:05 -0400 Received: from mail-qv1-xf2c.google.com (mail-qv1-xf2c.google.com [IPv6:2607:f8b0:4864:20::f2c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8D5C3B450 for ; Tue, 18 Apr 2023 12:13:44 -0700 (PDT) Received: by mail-qv1-xf2c.google.com with SMTP id l17so9291333qvq.10 for ; Tue, 18 Apr 2023 12:13:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20221208.gappssmtp.com; s=20221208; t=1681845223; x=1684437223; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=hsTBnxogc0dXbXldiQnTcfaSMnaw37pvCiHuIsUwhKs=; b=SD0CfPNo/FtA52mn8PTuMlWwsWPdBrYNWFQm6HroBkjTNWL2qXQf7wrFmhK/WWm7IP xcIXlFfLFGEf/BrGyHz+UE3kHnP2QzzQB7WXPEZFLWPxISNTxWhAaa0h1tT0olaHgEUs 0hpjB9LzjiA5OHWB18U6c4fyqiiCG4R128OFbaQXw6cA+VNmAHuxaR45uultoYesSJx0 CVUbWebDBfJA3QQt/c1BCijGivD+dCjM9agqh7+5bqifw5Letf8PMz5QSYbJeZ5UTCUP 
tSHckFJNQ1uiBdu7JKbqXKX13Y5T5FnoLNhX7VZobyOD8tLkxHSMFOnr57kBpsZRcWht QozQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1681845223; x=1684437223; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=hsTBnxogc0dXbXldiQnTcfaSMnaw37pvCiHuIsUwhKs=; b=JecI5p0UiMmOZEUqCo/7yw4MR+hxEQtZn+xSnGAjLpCYTfrA+Pv7KiI+WCfkTR1rGS pX8U9zo1dXhqFfauJ39UvqcKioefZ3RPtcnHHsATvk1AcSTyNrNNojs41UUzqmIswx9w brhgX7VR27ZSb0CYoFrQhyPwmYtVg53HyyW7lMsFSBsoQTGvbldpPbotE2cWdr43pWAg 8fKadlPpRYwNS20lVIYbcxK6iRrbH62ycw9bXPGFjA8JFroaDjMDZAd+gveoYBjXAD+O kLgf8wJ5KvdKDNPlMTzDSpUi/p91DYgoTbqKe6hJR6J8lHZkBzaNfrgYhem3lGJ1QjeF fDPw== X-Gm-Message-State: AAQBX9fcTFwk4CwWNCe1MSDHSHkuOHGUo/D0Gs9gcIc+Ng8YhZ8T2uo+ aEJnRpD2L81PJNC8pPkKAZwREA== X-Google-Smtp-Source: AKy350afvFhbEDO4u0w1iKrPXbytvWWwl+mYLWsq6qZEuaKTZTEbfiazBxZv7n8V0c8QP05wvdn3Kw== X-Received: by 2002:a05:6214:2607:b0:5c2:a8b0:d71a with SMTP id gu7-20020a056214260700b005c2a8b0d71amr27349805qvb.43.1681845223606; Tue, 18 Apr 2023 12:13:43 -0700 (PDT) Received: from localhost ([2620:10d:c091:400::5:e646]) by smtp.gmail.com with ESMTPSA id x18-20020a0ce252000000b005dd8b934598sm3884188qvl.48.2023.04.18.12.13.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 18 Apr 2023 12:13:43 -0700 (PDT) From: Johannes Weiner To: linux-mm@kvack.org Cc: Kaiyang Zhao , Mel Gorman , Vlastimil Babka , David Rientjes , linux-kernel@vger.kernel.org, kernel-team@fb.com Subject: [RFC PATCH 21/26] mm: compaction: align compaction goals with reclaim goals Date: Tue, 18 Apr 2023 15:13:08 -0400 Message-Id: <20230418191313.268131-22-hannes@cmpxchg.org> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230418191313.268131-1-hannes@cmpxchg.org> References: <20230418191313.268131-1-hannes@cmpxchg.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Kswapd's goal is to balance at least one zone in the node for the requested zoneidx, but no more than that. Kcompactd on the other hand compacts all the zones in the node, even if one of them is already compacted for the given request. This can hog kcompactd unnecessarily long on a requested zoneidx. It also has kcompactd working on zones without the cooperation of kswapd. There is a compaction_suitable() check of course, but whether that is true or not depends on luck, risking erratic behavior. Make kcompactd follow the same criteria as kswapd when deciding to work on a node, to keep them working in unison as much as possible. Likewise, direct reclaim can bail as soon as one zone in the zonelist is compaction_ready(), so check up front before hammering lower zones while higher zones might already be suitable. This matches compaction_zonelist_suitable() on the compaction side. 
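For illustration, the per-zone criterion that kswapd's pgdat_balanced() and kcompactd_node_suitable() now share boils down to the sketch below. The helper name is hypothetical; 'mark' stands for whichever watermark the caller already applies (the high watermark for kswapd, the min watermark for kcompactd at this point in the series).

	/* Sketch only: a zone needs no further work when either check passes. */
	static bool zone_balanced_or_compactable(struct zone *zone, int order,
						 unsigned long mark,
						 int highest_zoneidx)
	{
		/* Allocation can already succeed in this zone, done. */
		if (zone_watermark_ok(zone, order, mark, highest_zoneidx, 0))
			return true;

		/* Not enough free pages yet, but enough order-0 to compact. */
		return compaction_suitable(zone, order, highest_zoneidx) ==
		       COMPACT_CONTINUE;
	}
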
Signed-off-by: Johannes Weiner --- mm/compaction.c | 5 +++-- mm/vmscan.c | 35 +++++++++++++++++------------------ 2 files changed, 20 insertions(+), 20 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 52103545d58c..8080c04e644a 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -2798,12 +2798,13 @@ static bool kcompactd_node_suitable(pg_data_t *pgda= t) if (!populated_zone(zone)) continue; =20 - /* Allocation can already succeed, check other zones */ + /* Allocation can succeed in any zone, done */ if (zone_watermark_ok(zone, pgdat->kcompactd_max_order, min_wmark_pages(zone), highest_zoneidx, 0)) - continue; + return true; =20 + /* Allocation can't succed, but enough order-0 to compact */ if (compaction_suitable(zone, pgdat->kcompactd_max_order, highest_zoneidx) =3D=3D COMPACT_CONTINUE) return true; diff --git a/mm/vmscan.c b/mm/vmscan.c index 723705b9e4d9..14d6116384cc 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -6277,7 +6277,7 @@ static inline bool compaction_ready(struct zone *zone= , struct scan_control *sc) sc->reclaim_idx, 0)) return true; =20 - /* Compaction cannot yet proceed. Do reclaim. */ + /* Compaction cannot yet proceed, might need reclaim */ if (compaction_suitable(zone, sc->order, sc->reclaim_idx) =3D=3D COMPACT_SKIPPED) return false; @@ -6357,6 +6357,21 @@ static void shrink_zones(struct zonelist *zonelist, = struct scan_control *sc) sc->reclaim_idx =3D gfp_zone(sc->gfp_mask); } =20 + /* Bail if any of the zones are already compactable */ + if (IS_ENABLED(CONFIG_COMPACTION) && + sc->order > PAGE_ALLOC_COSTLY_ORDER) { + for_each_zone_zonelist_nodemask(zone, z, zonelist, + sc->reclaim_idx, sc->nodemask) { + if (!cpuset_zone_allowed(zone, + GFP_KERNEL | __GFP_HARDWALL)) + continue; + if (compaction_ready(zone, sc)) { + sc->compaction_ready =3D true; + goto out; + } + } + } + for_each_zone_zonelist_nodemask(zone, z, zonelist, sc->reclaim_idx, sc->nodemask) { /* @@ -6368,22 +6383,6 @@ static void shrink_zones(struct zonelist *zonelist, = struct scan_control *sc) GFP_KERNEL | __GFP_HARDWALL)) continue; =20 - /* - * If we already have plenty of memory free for - * compaction in this zone, don't free any more. - * Even though compaction is invoked for any - * non-zero order, only frequent costly order - * reclamation is disruptive enough to become a - * noticeable problem, like transparent huge - * page allocations. - */ - if (IS_ENABLED(CONFIG_COMPACTION) && - sc->order > PAGE_ALLOC_COSTLY_ORDER && - compaction_ready(zone, sc)) { - sc->compaction_ready =3D true; - continue; - } - /* * Shrink each node in the zonelist once. If the * zonelist is ordered by zone (not the default) then a @@ -6420,7 +6419,7 @@ static void shrink_zones(struct zonelist *zonelist, s= truct scan_control *sc) =20 if (first_pgdat) consider_reclaim_throttle(first_pgdat, sc); - +out: /* * Restore to original mask to avoid the impact on the caller if we * promoted it to __GFP_HIGHMEM. 
--=20 2.39.2 From nobody Wed Dec 17 15:36:54 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9B441C7EE21 for ; Tue, 18 Apr 2023 19:15:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232942AbjDRTP3 (ORCPT ); Tue, 18 Apr 2023 15:15:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57816 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232894AbjDRTOH (ORCPT ); Tue, 18 Apr 2023 15:14:07 -0400 Received: from mail-qv1-xf2d.google.com (mail-qv1-xf2d.google.com [IPv6:2607:f8b0:4864:20::f2d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 57E76B74E for ; Tue, 18 Apr 2023 12:13:45 -0700 (PDT) Received: by mail-qv1-xf2d.google.com with SMTP id e13so12950925qvd.8 for ; Tue, 18 Apr 2023 12:13:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20221208.gappssmtp.com; s=20221208; t=1681845225; x=1684437225; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=xFydGPrrlcR5ue8H0k4/5BgZKiD2zceF6yJIdiDg/e0=; b=tVrrto59kTHt+5P8OO0NTl96kO6fYFXftlF9A3CAuw7mkP6vDPrG6Y7lZfcyBzps7q /gMDUjmhruC6JAgv2lNt70nZGvgEKZPuKRjT85XlnbmpS0gHScOiwvV0CeKGtkebfUO5 GuRu2B/dbtgriKFhkU+bqrvVfqiYW8ac4FZqca9W6ySikMkgizqrdHAjbR5vdbcQrVed qXeZWHFuDKk0K/VtTx0r2tcS0PQ0RyfZX+iVchXZvObxU+M15TeBNYCSjPyOC7INUTz4 z7Q+PUSdqHdSQo5iQxaxwIudluosvB0gAAvUePxiBSpUWodkFxlz3QjRjOpnT/xEQ3i/ lkaw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1681845225; x=1684437225; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=xFydGPrrlcR5ue8H0k4/5BgZKiD2zceF6yJIdiDg/e0=; b=YvWK3xjf4VoWOC9RWW7es9YiZHuRb3uq88I6vug7YwHKkLt7hzfN76GEh/5dyZ6TuU ZJrEI84kXqxHnJiQQXzKrSg1V0sg/rIj/+Ux5dk1n1+a/XpG/Geh9J3y585oUE2OkjZm e9lNZuncfYzLco2vOr1+qFgbetXWePgW02AAQdm92Wp6eMNWeWgPK1zfySiOVOWVGY9+ aFet7bgdfN4thBXzGapZIqZn03NuqFUnGW9O2R8fwfnD5GE8bH9JTGhSN68W2W/LvL9Z Nk1e+ebk5hZQV1ssr2w6YrnZwMa0Y/BnXnsS/Jrd0lnFcrcx9RdO7BrF3Z9P2UcpGhgW 0gFQ== X-Gm-Message-State: AAQBX9fdEIfiZkNNC9gi4vPE6oprXg6ALaT3FpGKN2BK4gQU1lucbXDc HvE0Jhhs5ghjIfXr4DOfMNai+3Oy7GUlIh/vuzk= X-Google-Smtp-Source: AKy350aY/eDWP5VAtiKzy0kcSE6ktwznpTC01RhU2t2EYa7ev5AWfpSsbruwvnwNAwe0Wpm5J/BisA== X-Received: by 2002:ad4:5b8b:0:b0:570:ccb9:a4d0 with SMTP id 11-20020ad45b8b000000b00570ccb9a4d0mr20299103qvp.16.1681845224818; Tue, 18 Apr 2023 12:13:44 -0700 (PDT) Received: from localhost ([2620:10d:c091:400::5:e646]) by smtp.gmail.com with ESMTPSA id pp26-20020a056214139a00b005dd8b934571sm3935790qvb.9.2023.04.18.12.13.44 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 18 Apr 2023 12:13:44 -0700 (PDT) From: Johannes Weiner To: linux-mm@kvack.org Cc: Kaiyang Zhao , Mel Gorman , Vlastimil Babka , David Rientjes , linux-kernel@vger.kernel.org, kernel-team@fb.com Subject: [RFC PATCH 22/26] mm: page_alloc: manage free memory in whole pageblocks Date: Tue, 18 Apr 2023 15:13:09 -0400 Message-Id: <20230418191313.268131-23-hannes@cmpxchg.org> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230418191313.268131-1-hannes@cmpxchg.org> References: <20230418191313.268131-1-hannes@cmpxchg.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable 
Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Right now, allocation requests only reclaim (and compact) for their exact order. Since the majority of allocation requests are smaller than a pageblock, this is likely to result in partial blocks being freed, which are subsequently fragmented by fallbacks. This defeats the allocator's efforts to group pageblocks by mobility. Fix this mismatch between the allocator and reclaim/compaction: make the pageblock the default unit for free memory by enforcing watermarks against MIGRATE_FREE blocks, and have reclaim/compaction produce them. Signed-off-by: Johannes Weiner --- include/linux/compaction.h | 1 - mm/compaction.c | 65 ++++--------- mm/internal.h | 1 + mm/page_alloc.c | 183 ++++++++++++++++++++++--------------- mm/vmscan.c | 6 +- 5 files changed, 131 insertions(+), 125 deletions(-) diff --git a/include/linux/compaction.h b/include/linux/compaction.h index 9e1b2c56df62..52b2487ef901 100644 --- a/include/linux/compaction.h +++ b/include/linux/compaction.h @@ -10,7 +10,6 @@ enum compact_priority { COMPACT_PRIO_SYNC_FULL, MIN_COMPACT_PRIORITY =3D COMPACT_PRIO_SYNC_FULL, COMPACT_PRIO_SYNC_LIGHT, - MIN_COMPACT_COSTLY_PRIORITY =3D COMPACT_PRIO_SYNC_LIGHT, DEF_COMPACT_PRIORITY =3D COMPACT_PRIO_SYNC_LIGHT, COMPACT_PRIO_ASYNC, INIT_COMPACT_PRIORITY =3D COMPACT_PRIO_ASYNC diff --git a/mm/compaction.c b/mm/compaction.c index 8080c04e644a..e33c99eb34a8 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1784,15 +1784,6 @@ static unsigned long fast_find_migrateblock(struct c= ompact_control *cc) if (cc->order <=3D PAGE_ALLOC_COSTLY_ORDER) return pfn; =20 - /* - * Only allow kcompactd and direct requests for movable pages to - * quickly clear out a MOVABLE pageblock for allocation. This - * reduces the risk that a large movable pageblock is freed for - * an unmovable/reclaimable small allocation. - */ - if (cc->direct_compaction && cc->migratetype !=3D MIGRATE_MOVABLE) - return pfn; - /* * When starting the migration scanner, pick any pageblock within the * first half of the search space. Otherwise try and pick a pageblock @@ -2065,8 +2056,7 @@ static bool should_proactive_compact_node(pg_data_t *= pgdat) =20 static enum compact_result __compact_finished(struct compact_control *cc) { - unsigned int order; - const int migratetype =3D cc->migratetype; + unsigned long mark; int ret; =20 /* Compaction run completes if the migrate and free scanner meet */ @@ -2120,39 +2110,16 @@ static enum compact_result __compact_finished(struc= t compact_control *cc) if (!pageblock_aligned(cc->migrate_pfn)) return COMPACT_CONTINUE; =20 - /* Direct compactor: Is a suitable page free? */ + /* Done when watermarks are restored */ ret =3D COMPACT_NO_SUITABLE_PAGE; - for (order =3D cc->order; order < MAX_ORDER; order++) { - struct free_area *area =3D &cc->zone->free_area[order]; - bool can_steal; - - /* Job done if page is free of the right migratetype */ - if (!free_area_empty(area, migratetype)) - return COMPACT_SUCCESS; - -#ifdef CONFIG_CMA - /* MIGRATE_MOVABLE can fallback on MIGRATE_CMA */ - if (migratetype =3D=3D MIGRATE_MOVABLE && - !free_area_empty(area, MIGRATE_CMA)) - return COMPACT_SUCCESS; -#endif - /* - * Job done if allocation would steal freepages from - * other migratetype buddy lists. - */ - if (find_suitable_fallback(area, order, migratetype, - true, &can_steal) !=3D -1) - /* - * Movable pages are OK in any pageblock. 
If we are - * stealing for a non-movable allocation, make sure - * we finish compacting the current pageblock first - * (which is assured by the above migrate_pfn align - * check) so it is as free as possible and we won't - * have to steal another one soon. - */ - return COMPACT_SUCCESS; - } - + if (cc->direct_compaction) + mark =3D wmark_pages(cc->zone, + cc->alloc_flags & ALLOC_WMARK_MASK); + else + mark =3D high_wmark_pages(cc->zone); + if (zone_watermark_ok(cc->zone, cc->order, mark, + cc->highest_zoneidx, cc->alloc_flags)) + return COMPACT_SUCCESS; out: if (cc->contended || fatal_signal_pending(current)) ret =3D COMPACT_CONTENDED; @@ -2310,8 +2277,12 @@ compact_zone(struct compact_control *cc, struct capt= ure_control *capc) unsigned long watermark; =20 /* Allocation can already succeed, nothing to do */ - watermark =3D wmark_pages(cc->zone, - cc->alloc_flags & ALLOC_WMARK_MASK); + if (cc->direct_compaction) + watermark =3D wmark_pages(cc->zone, + cc->alloc_flags & + ALLOC_WMARK_MASK); + else + watermark =3D high_wmark_pages(cc->zone); if (zone_watermark_ok(cc->zone, cc->order, watermark, cc->highest_zoneidx, cc->alloc_flags)) return COMPACT_SUCCESS; @@ -2800,7 +2771,7 @@ static bool kcompactd_node_suitable(pg_data_t *pgdat) =20 /* Allocation can succeed in any zone, done */ if (zone_watermark_ok(zone, pgdat->kcompactd_max_order, - min_wmark_pages(zone), + high_wmark_pages(zone), highest_zoneidx, 0)) return true; =20 @@ -2845,7 +2816,7 @@ static void kcompactd_do_work(pg_data_t *pgdat) =20 /* Allocation can already succeed, nothing to do */ if (zone_watermark_ok(zone, cc.order, - min_wmark_pages(zone), zoneid, 0)) + high_wmark_pages(zone), zoneid, 0)) continue; =20 if (compaction_suitable(zone, cc.order, diff --git a/mm/internal.h b/mm/internal.h index 39f65a463631..5c76455f8042 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -432,6 +432,7 @@ struct compact_control { */ struct capture_control { struct compact_control *cc; + int order; int migratetype; struct page *page; }; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 18fa2bbba44b..6f0bfc226c36 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1075,7 +1075,7 @@ static inline bool compaction_capture(struct zone *zone, struct page *page, int order, int migratetype, struct capture_control *capc) { - if (!capc || order < capc->cc->order) + if (!capc || order < capc->order) return false; =20 /* Do not accidentally pollute CMA or isolated regions*/ @@ -1097,8 +1097,8 @@ compaction_capture(struct zone *zone, struct page *pa= ge, int order, return false; } =20 - if (order > capc->cc->order) - expand(zone, page, capc->cc->order, order, migratetype); + if (order > capc->order) + expand(zone, page, capc->order, order, migratetype); =20 capc->page =3D page; return true; @@ -3649,15 +3649,15 @@ int __isolate_free_page(struct page *page, unsigned= int order) int mt =3D get_pageblock_migratetype(page); =20 if (!is_migrate_isolate(mt)) { + long free_pages =3D zone_page_state(zone, NR_FREE_PAGES); unsigned long watermark; /* - * Obey watermarks as if the page was being allocated. We can - * emulate a high-order watermark check with a raised order-0 - * watermark, because we already know our high-order page - * exists. + * Keep a lid on concurrent compaction. MIGRATE_FREE + * watermarks alone cannot be checked here, because + * that's what the caller is trying to produce. 
*/ watermark =3D zone->_watermark[WMARK_MIN] + (1UL << order); - if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA)) + if (!__zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA, free_pages)) return 0; } =20 @@ -3976,27 +3976,59 @@ noinline bool should_fail_alloc_page(gfp_t gfp_mask= , unsigned int order) } ALLOW_ERROR_INJECTION(should_fail_alloc_page, TRUE); =20 -static inline long __zone_watermark_unusable_free(struct zone *z, - unsigned int order, unsigned int alloc_flags) +static long page_state(struct zone *zone, enum zone_stat_item item, bool s= afe) { - const bool alloc_harder =3D (alloc_flags & (ALLOC_HARDER|ALLOC_OOM)); - long unusable_free =3D (1 << order) - 1; + if (safe) + return zone_page_state_snapshot(zone, item); + else + return zone_page_state(zone, item); +} + +static long __zone_free_pages(struct zone *zone, int alloc_flags, bool saf= e) +{ + long free_pages; =20 /* - * If the caller does not have rights to ALLOC_HARDER then subtract - * the high-atomic reserves. This will over-estimate the size of the - * atomic reserve but it avoids a search. + * Enforce watermarks against MIGRATE_FREE pages. This ensures + * that there is always a reserve of higher-order pages + * maintained for all migratetypes and allocation contexts. + * + * Allocations will still use up any compatible free pages + * that may exist inside claimed blocks first. But the reserve + * prevents smaller allocations from starving out higher-order + * requests (which may not be able to sleep, e.g. highatomic). + * + * The additional memory requirements of this are mininmal. If + * internal free pages already exceed the compact_gap(), only + * compaction is necessary to restore the watermarks. */ - if (likely(!alloc_harder)) - unusable_free +=3D z->nr_reserved_highatomic; + free_pages =3D page_state(zone, NR_FREE_FREE, safe); + if (alloc_flags & (ALLOC_HARDER | ALLOC_OOM)) + free_pages +=3D page_state(zone, NR_FREE_HIGHATOMIC, safe); + if (IS_ENABLED(CONFIG_CMA) && (alloc_flags & ALLOC_CMA)) + free_pages +=3D page_state(zone, NR_FREE_CMA_PAGES, safe); =20 -#ifdef CONFIG_CMA - /* If allocation can't use CMA areas don't use free CMA pages */ - if (!(alloc_flags & ALLOC_CMA)) - unusable_free +=3D zone_page_state(z, NR_FREE_CMA_PAGES); -#endif + if (!IS_ENABLED(CONFIG_COMPACTION)) { + /* + * We can't reasonably defragment without compaction. + * Consider everything and do best-effort grouping. 
+ */ + free_pages +=3D page_state(zone, NR_FREE_UNMOVABLE, safe); + free_pages +=3D page_state(zone, NR_FREE_MOVABLE, safe); + free_pages +=3D page_state(zone, NR_FREE_RECLAIMABLE, safe); + } =20 - return unusable_free; + return free_pages; +} + +static long zone_free_pages(struct zone *zone, int alloc_flags) +{ + return __zone_free_pages(zone, alloc_flags, false); +} + +static long zone_free_pages_safe(struct zone *zone, int alloc_flags) +{ + return __zone_free_pages(zone, alloc_flags, true); } =20 /* @@ -4014,7 +4046,7 @@ bool __zone_watermark_ok(struct zone *z, unsigned int= order, unsigned long mark, const bool alloc_harder =3D (alloc_flags & (ALLOC_HARDER|ALLOC_OOM)); =20 /* free_pages may go negative - that's OK */ - free_pages -=3D __zone_watermark_unusable_free(z, order, alloc_flags); + free_pages -=3D (1 << order) - 1; =20 if (alloc_flags & ALLOC_HIGH) min -=3D min / 2; @@ -4076,33 +4108,22 @@ bool zone_watermark_ok(struct zone *z, unsigned int= order, unsigned long mark, int highest_zoneidx, unsigned int alloc_flags) { return __zone_watermark_ok(z, order, mark, highest_zoneidx, alloc_flags, - zone_page_state(z, NR_FREE_PAGES)); + zone_free_pages(z, alloc_flags)); } =20 static inline bool zone_watermark_fast(struct zone *z, unsigned int order, unsigned long mark, int highest_zoneidx, unsigned int alloc_flags, gfp_t gfp_mask) { - long free_pages; - - free_pages =3D zone_page_state(z, NR_FREE_PAGES); + long free_pages =3D zone_free_pages(z, alloc_flags); =20 /* * Fast check for order-0 only. If this fails then the reserves * need to be calculated. */ - if (!order) { - long usable_free; - long reserved; - - usable_free =3D free_pages; - reserved =3D __zone_watermark_unusable_free(z, 0, alloc_flags); - - /* reserved may over estimate high-atomic reserves. 
*/ - usable_free -=3D min(usable_free, reserved); - if (usable_free > mark + z->lowmem_reserve[highest_zoneidx]) - return true; - } + if (!order && (free_pages - ((1 << order) - 1) > + mark + z->lowmem_reserve[highest_zoneidx])) + return true; =20 if (__zone_watermark_ok(z, order, mark, highest_zoneidx, alloc_flags, free_pages)) @@ -4126,13 +4147,8 @@ static inline bool zone_watermark_fast(struct zone *= z, unsigned int order, bool zone_watermark_ok_safe(struct zone *z, unsigned int order, unsigned long mark, int highest_zoneidx) { - long free_pages =3D zone_page_state(z, NR_FREE_PAGES); - - if (z->percpu_drift_mark && free_pages < z->percpu_drift_mark) - free_pages =3D zone_page_state_snapshot(z, NR_FREE_PAGES); - return __zone_watermark_ok(z, order, mark, highest_zoneidx, 0, - free_pages); + zone_free_pages_safe(z, 0)); } =20 #ifdef CONFIG_NUMA @@ -4524,12 +4540,14 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsign= ed int order, unsigned long pflags; unsigned int noreclaim_flag; struct capture_control capc =3D { + .order =3D order, .migratetype =3D ac->migratetype, .page =3D NULL, }; + int compact_order; =20 - if (!order) - return NULL; + /* Use reclaim/compaction to produce neutral blocks */ + compact_order =3D max_t(int, order, pageblock_order); =20 /* * Make sure the structs are really initialized before we expose the @@ -4543,8 +4561,8 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned= int order, delayacct_compact_start(); noreclaim_flag =3D memalloc_noreclaim_save(); =20 - *compact_result =3D try_to_compact_pages(gfp_mask, order, alloc_flags, ac, - prio, &capc); + *compact_result =3D try_to_compact_pages(gfp_mask, compact_order, + alloc_flags, ac, prio, &capc); =20 memalloc_noreclaim_restore(noreclaim_flag); psi_memstall_leave(&pflags); @@ -4608,13 +4626,12 @@ should_compact_retry(struct alloc_context *ac, int = order, int alloc_flags, enum compact_priority *compact_priority, int *compaction_retries) { - int min_priority; bool ret =3D false; int retries =3D *compaction_retries; enum compact_priority priority =3D *compact_priority; =20 - if (!order) - return false; + /* Use reclaim/compaction to produce neutral blocks */ + order =3D max_t(int, order, pageblock_order); =20 if (fatal_signal_pending(current)) return false; @@ -4624,20 +4641,6 @@ should_compact_retry(struct alloc_context *ac, int o= rder, int alloc_flags, * failed, presumably due to a race. Retry a few times. */ if (compact_result =3D=3D COMPACT_SUCCESS) { - int max_retries =3D MAX_COMPACT_RETRIES; - - /* - * !costly requests are much more important than - * __GFP_RETRY_MAYFAIL costly ones because they are de - * facto nofail and invoke OOM killer to move on while - * costly can fail and users are ready to cope with - * that. 1/4 retries is rather arbitrary but we would - * need much more detailed feedback from compaction to - * make a better decision. - */ - if (order > PAGE_ALLOC_COSTLY_ORDER) - max_retries /=3D 4; - ret =3D ++(*compaction_retries) <=3D MAX_COMPACT_RETRIES; goto out; } @@ -4654,16 +4657,13 @@ should_compact_retry(struct alloc_context *ac, int = order, int alloc_flags, /* * Compaction failed. Retry with increasing priority. */ - min_priority =3D (order > PAGE_ALLOC_COSTLY_ORDER) ? 
- MIN_COMPACT_COSTLY_PRIORITY : MIN_COMPACT_PRIORITY; - - if (*compact_priority > min_priority) { + if (*compact_priority > MIN_COMPACT_PRIORITY) { (*compact_priority)--; *compaction_retries =3D 0; ret =3D true; } out: - trace_compact_retry(order, priority, compact_result, retries, max_retries= , ret); + trace_compact_retry(order, priority, compact_result, retries, MAX_COMPACT= _RETRIES, ret); return ret; } #else @@ -4822,9 +4822,16 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigne= d int order, struct page *page =3D NULL; unsigned long pflags; bool drained =3D false; + int reclaim_order; + + /* Use reclaim/compaction to produce neutral blocks */ + if (IS_ENABLED(CONFIG_COMPACTION)) + reclaim_order =3D max_t(int, order, pageblock_order); + else + reclaim_order =3D order; =20 psi_memstall_enter(&pflags); - *did_some_progress =3D __perform_reclaim(gfp_mask, order, ac); + *did_some_progress =3D __perform_reclaim(gfp_mask, reclaim_order, ac); if (unlikely(!(*did_some_progress))) goto out; =20 @@ -4856,6 +4863,10 @@ static void wake_all_kswapds(unsigned int order, gfp= _t gfp_mask, pg_data_t *last_pgdat =3D NULL; enum zone_type highest_zoneidx =3D ac->highest_zoneidx; =20 + /* Use reclaim/compaction to produce neutral blocks */ + if (IS_ENABLED(CONFIG_COMPACTION)) + order =3D max_t(unsigned int, order, pageblock_order); + for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, highest_zoneidx, ac->nodemask) { if (!managed_zone(zone)) @@ -4970,6 +4981,24 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order, struct zoneref *z; bool ret =3D false; =20 + /* + * In the old world, order-0 pages only need reclaim, and + * higher orders might be present but the order-0 watermarks + * aren't met yet. These things can be fixed by reclaim alone. + * + * In the new world, though, watermark checks are against + * MIGRATE_FREE blocks. That means if the watermarks aren't + * met, reclaim isn't going to be the solution. Neither for + * order-0 nor for anything else. Whether it makes sense to + * retry depends fully on whether compaction should retry. + * + * should_compact_retry() already checks for COMPACT_SKIPPED + * and compaction_zonelist_suitable() to test whether reclaim + * is needed. + */ + if (IS_ENABLED(CONFIG_COMPACTION)) + goto schedule; + /* * Costly allocations might have made a progress but this doesn't mean * their order will become available due to high fragmentation so @@ -5019,6 +5048,7 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order, } } =20 +schedule: /* * Memory allocation/reclaim might be called from a WQ context and the * current implementation of the WQ concurrency control doesn't @@ -8833,6 +8863,13 @@ static void __setup_per_zone_wmarks(void) mult_frac(zone_managed_pages(zone), watermark_scale_factor, 10000)); =20 + /* + * Ensure the watermark delta is a multiple of the + * neutral block that reclaim/compaction produces. 
+ */ + if (IS_ENABLED(CONFIG_COMPACTION)) + tmp =3D ALIGN(tmp, 1 << pageblock_order); + zone->watermark_boost =3D 0; zone->_watermark[WMARK_LOW] =3D min_wmark_pages(zone) + tmp; zone->_watermark[WMARK_HIGH] =3D low_wmark_pages(zone) + tmp; diff --git a/mm/vmscan.c b/mm/vmscan.c index 14d6116384cc..a7374cd6fe91 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -7438,8 +7438,7 @@ void wakeup_kswapd(struct zone *zone, gfp_t gfp_flags= , int order, =20 /* Hopeless node, leave it to direct reclaim if possible */ if (pgdat->kswapd_failures >=3D MAX_RECLAIM_RETRIES || - (pgdat_balanced(pgdat, order, highest_zoneidx) && - !pgdat_watermark_boosted(pgdat, highest_zoneidx))) { + pgdat_balanced(pgdat, order, highest_zoneidx)) { /* * There may be plenty of free memory available, but it's too * fragmented for high-order allocations. Wake up kcompactd @@ -7447,8 +7446,7 @@ void wakeup_kswapd(struct zone *zone, gfp_t gfp_flags= , int order, * needed. If it fails, it will defer subsequent attempts to * ratelimit its work. */ - if (!(gfp_flags & __GFP_DIRECT_RECLAIM)) - wakeup_kcompactd(pgdat, order, highest_zoneidx); + wakeup_kcompactd(pgdat, order, highest_zoneidx); return; } =20 --=20 2.39.2 From nobody Wed Dec 17 15:36:54 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 534FBC77B7D for ; Tue, 18 Apr 2023 19:15:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232444AbjDRTPK (ORCPT ); Tue, 18 Apr 2023 15:15:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57832 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232902AbjDRTOI (ORCPT ); Tue, 18 Apr 2023 15:14:08 -0400 Received: from mail-qv1-xf30.google.com (mail-qv1-xf30.google.com [IPv6:2607:f8b0:4864:20::f30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F0951BB85 for ; Tue, 18 Apr 2023 12:13:46 -0700 (PDT) Received: by mail-qv1-xf30.google.com with SMTP id op30so18259762qvb.3 for ; Tue, 18 Apr 2023 12:13:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20221208.gappssmtp.com; s=20221208; t=1681845226; x=1684437226; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=+cRr9qK9Ai5PrhDW9GBX61DMQfs8tenll5tn8q6xhSs=; b=wmmelD/et9ZFdJKWhHdBu3DZw0uP+uxtuqLIKxl7QWQ1gLIf0x4Dl4qBY5A5PexoJr OhMqGiuUdTE3V0nKLvSyHZAoIjUGav09Njn4JLlk8e/qdWSaEO8AbhxXtKfhN9YvxCXg l0z11YLxPSDIF42CiCn8RarJvNYgRQLIQ2zqvd+pz+X9iu9cWQQc/1+jMyxJFIG9sCWT OLKvnFIUidWTWQ+1Jb0/Pu2fD7pUcAOpyiK/lAooYzOXHFLzWNIhXh6zDBh53/UdDun2 oq1Tk9nVl3BXreuO2iAOhtOPWohJqF8SUU3eIRYB0cpXOnSCLJoCqIqVgvyWWmUEwzmF gGiA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1681845226; x=1684437226; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=+cRr9qK9Ai5PrhDW9GBX61DMQfs8tenll5tn8q6xhSs=; b=VcLZkeRPj+pv/9o69wqLZvj8pL76AykOTG0AFwCNyA6NlSR51cl+Ya2J7A2A3ogV6M k5mSftB28xNd93Rxgs/hPoCDi4MCDmlUiZtTsiDa2JY7Q73XOjfA3ZpgNiGOCO21glXg RA9WnQqe8E2k/QG5qfm5WddIkUcY9yXJE6hqu9wRHin5N4SWhypU7VDc5VmMlvS4NPiD 9a3w2WWHwkwxJmZdxCYTHwuJxFqAlQQxs/qUDCQ8JLjgLpe4d4ZZqxZEPiF5GpcsW8Hu bSY4B/s4aXjn1XNKFwi6pxYH9WYOBf1xcIFMP8uqKxm6Qz+d2vce8pmLWfaUl+9AZy2r 9WyA== 
X-Gm-Message-State: AAQBX9eTxcTm6jWYxsDCZxHUKx3xeDLvuMlyipatTAiUBiTVOD8s+s2j +l7KwbqUAi+bmmaEnw8Ns0o29A== X-Google-Smtp-Source: AKy350bbq+tUiFNnGALyl/LWu3vvTMgXSZpDDXIn0FD09vHJTVesFePd2LgKVoVwrtg0UDf72P1fJA== X-Received: by 2002:a05:6214:21c8:b0:5f1:5f73:aed4 with SMTP id d8-20020a05621421c800b005f15f73aed4mr4057561qvh.15.1681845225932; Tue, 18 Apr 2023 12:13:45 -0700 (PDT) Received: from localhost ([2620:10d:c091:400::5:e646]) by smtp.gmail.com with ESMTPSA id lu8-20020a0562145a0800b005eab96abc9esm3844744qvb.140.2023.04.18.12.13.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 18 Apr 2023 12:13:45 -0700 (PDT) From: Johannes Weiner To: linux-mm@kvack.org Cc: Kaiyang Zhao , Mel Gorman , Vlastimil Babka , David Rientjes , linux-kernel@vger.kernel.org, kernel-team@fb.com Subject: [RFC PATCH 23/26] mm: page_alloc: kill highatomic Date: Tue, 18 Apr 2023 15:13:10 -0400 Message-Id: <20230418191313.268131-24-hannes@cmpxchg.org> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230418191313.268131-1-hannes@cmpxchg.org> References: <20230418191313.268131-1-hannes@cmpxchg.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" The highatomic reserves are blocks set aside specifically for higher-order atomic allocations. Since watermarks are now required to be met in pageblocks, this is no longer necessary. Signed-off-by: Johannes Weiner --- include/linux/gfp.h | 2 - include/linux/mmzone.h | 6 +- mm/internal.h | 5 -- mm/page_alloc.c | 187 ++--------------------------------------- mm/vmstat.c | 1 - 5 files changed, 10 insertions(+), 191 deletions(-) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 65a78773dcca..78b5176d354e 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -19,8 +19,6 @@ static inline int gfp_migratetype(const gfp_t gfp_flags) BUILD_BUG_ON((1UL << GFP_MOVABLE_SHIFT) !=3D ___GFP_MOVABLE); BUILD_BUG_ON((___GFP_MOVABLE >> GFP_MOVABLE_SHIFT) !=3D MIGRATE_MOVABLE); BUILD_BUG_ON((___GFP_RECLAIMABLE >> GFP_MOVABLE_SHIFT) !=3D MIGRATE_RECLA= IMABLE); - BUILD_BUG_ON(((___GFP_MOVABLE | ___GFP_RECLAIMABLE) >> - GFP_MOVABLE_SHIFT) !=3D MIGRATE_HIGHATOMIC); =20 if (unlikely(page_group_by_mobility_disabled)) return MIGRATE_UNMOVABLE; diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index d1083ab81998..c705f2f7c829 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -44,8 +44,7 @@ enum migratetype { MIGRATE_MOVABLE, MIGRATE_RECLAIMABLE, MIGRATE_PCPTYPES, /* the number of types on the pcp lists */ - MIGRATE_HIGHATOMIC =3D MIGRATE_PCPTYPES, - MIGRATE_FREE, + MIGRATE_FREE =3D MIGRATE_PCPTYPES, #ifdef CONFIG_CMA /* * MIGRATE_CMA migration type is designed to mimic the way @@ -142,7 +141,6 @@ enum zone_stat_item { NR_FREE_UNMOVABLE, NR_FREE_MOVABLE, NR_FREE_RECLAIMABLE, - NR_FREE_HIGHATOMIC, NR_FREE_FREE, NR_ZONE_LRU_BASE, /* Used only for compaction and reclaim retry */ NR_ZONE_INACTIVE_ANON =3D NR_ZONE_LRU_BASE, @@ -713,8 +711,6 @@ struct zone { unsigned long _watermark[NR_WMARK]; unsigned long watermark_boost; =20 - unsigned long nr_reserved_highatomic; - /* * We don't know if the memory that we're going to allocate will be * freeable or/and it will be released eventually, so to avoid totally diff --git a/mm/internal.h b/mm/internal.h index 5c76455f8042..24f43f5db88b 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -778,11 +778,6 @@ extern const struct trace_print_flags pageflag_names[]; extern const struct 
trace_print_flags vmaflag_names[]; extern const struct trace_print_flags gfpflag_names[]; =20 -static inline bool is_migrate_highatomic(enum migratetype migratetype) -{ - return migratetype =3D=3D MIGRATE_HIGHATOMIC; -} - void setup_zone_pageset(struct zone *zone); =20 struct migration_target_control { diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 6f0bfc226c36..e8ae04feb1bd 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -379,7 +379,6 @@ const char * const migratetype_names[MIGRATE_TYPES] =3D= { "Unmovable", "Movable", "Reclaimable", - "HighAtomic", "Free", #ifdef CONFIG_CMA "CMA", @@ -1202,7 +1201,7 @@ static inline void __free_one_page(struct page *page, * We want to prevent merge between freepages on pageblock * without fallbacks and normal pageblock. Without this, * pageblock isolation could cause incorrect freepage or CMA - * accounting or HIGHATOMIC accounting. + * accounting. */ if (migratetype !=3D buddy_mt && (!migratetype_is_mergeable(migratetype) || @@ -2797,13 +2796,6 @@ static void steal_suitable_fallback(struct zone *zon= e, struct page *page, =20 old_block_type =3D get_pageblock_migratetype(page); =20 - /* - * This can happen due to races and we want to prevent broken - * highatomic accounting. - */ - if (is_migrate_highatomic(old_block_type)) - goto single_page; - /* Take ownership for orders >=3D pageblock_order */ if (current_order >=3D pageblock_order) { change_pageblock_range(page, current_order, start_type); @@ -2918,126 +2910,6 @@ int find_suitable_fallback(struct free_area *area, = unsigned int order, return -1; } =20 -/* - * Reserve a pageblock for exclusive use of high-order atomic allocations = if - * there are no empty page blocks that contain a page with a suitable order - */ -static void reserve_highatomic_pageblock(struct page *page, struct zone *z= one, - unsigned int alloc_order) -{ - int mt; - unsigned long max_managed, flags; - - /* - * Limit the number reserved to 1 pageblock or roughly 1% of a zone. - * Check is race-prone but harmless. - */ - max_managed =3D (zone_managed_pages(zone) / 100) + pageblock_nr_pages; - if (zone->nr_reserved_highatomic >=3D max_managed) - return; - - spin_lock_irqsave(&zone->lock, flags); - - /* Recheck the nr_reserved_highatomic limit under the lock */ - if (zone->nr_reserved_highatomic >=3D max_managed) - goto out_unlock; - - /* Yoink! */ - mt =3D get_pageblock_migratetype(page); - /* Only reserve normal pageblocks (i.e., they can merge with others) */ - if (migratetype_is_mergeable(mt)) { - zone->nr_reserved_highatomic +=3D pageblock_nr_pages; - set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC); - move_freepages_block(zone, page, mt, MIGRATE_HIGHATOMIC, NULL); - } - -out_unlock: - spin_unlock_irqrestore(&zone->lock, flags); -} - -/* - * Used when an allocation is about to fail under memory pressure. This - * potentially hurts the reliability of high-order allocations when under - * intense memory pressure but failed atomic allocations should be easier - * to recover from than an OOM. - * - * If @force is true, try to unreserve a pageblock even though highatomic - * pageblock is exhausted. 
- */ -static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, - bool force) -{ - struct zonelist *zonelist =3D ac->zonelist; - unsigned long flags; - struct zoneref *z; - struct zone *zone; - struct page *page; - int order; - bool ret; - - for_each_zone_zonelist_nodemask(zone, z, zonelist, ac->highest_zoneidx, - ac->nodemask) { - /* - * Preserve at least one pageblock unless memory pressure - * is really high. - */ - if (!force && zone->nr_reserved_highatomic <=3D - pageblock_nr_pages) - continue; - - spin_lock_irqsave(&zone->lock, flags); - for (order =3D 0; order < MAX_ORDER; order++) { - struct free_area *area =3D &(zone->free_area[order]); - int mt; - - page =3D get_page_from_free_area(area, MIGRATE_HIGHATOMIC); - if (!page) - continue; - - mt =3D get_pageblock_migratetype(page); - /* - * In page freeing path, migratetype change is racy so - * we can counter several free pages in a pageblock - * in this loop although we changed the pageblock type - * from highatomic to ac->migratetype. So we should - * adjust the count once. - */ - if (is_migrate_highatomic(mt)) { - /* - * It should never happen but changes to - * locking could inadvertently allow a per-cpu - * drain to add pages to MIGRATE_HIGHATOMIC - * while unreserving so be safe and watch for - * underflows. - */ - zone->nr_reserved_highatomic -=3D min( - pageblock_nr_pages, - zone->nr_reserved_highatomic); - } - - /* - * Convert to ac->migratetype and avoid the normal - * pageblock stealing heuristics. Minimally, the caller - * is doing the work and needs the pages. More - * importantly, if the block was always converted to - * MIGRATE_UNMOVABLE or another type then the number - * of pageblocks that cannot be completely freed - * may increase. - */ - set_pageblock_migratetype(page, ac->migratetype); - ret =3D move_freepages_block(zone, page, mt, - ac->migratetype, NULL); - if (ret) { - spin_unlock_irqrestore(&zone->lock, flags); - return ret; - } - } - spin_unlock_irqrestore(&zone->lock, flags); - } - - return false; -} - /* * Try finding a free buddy page on the fallback list and put it on the fr= ee * list of requested migratetype, possibly along with other pages from the= same @@ -3510,18 +3382,11 @@ void free_unref_page(struct page *page, unsigned in= t order) =20 /* * We only track unmovable, reclaimable and movable on pcp lists. - * Place ISOLATE pages on the isolated list because they are being - * offlined but treat HIGHATOMIC as movable pages so we can get those - * areas back if necessary. Otherwise, we may have to free - * excessively into the page allocator */ migratetype =3D get_pcppage_migratetype(page); if (unlikely(migratetype >=3D MIGRATE_PCPTYPES)) { - if (unlikely(is_migrate_isolate(migratetype) || migratetype =3D=3D MIGRA= TE_FREE)) { - free_one_page(page_zone(page), page, pfn, order, migratetype, FPI_NONE); - return; - } - migratetype =3D MIGRATE_MOVABLE; + free_one_page(page_zone(page), page, pfn, order, migratetype, FPI_NONE); + return; } =20 zone =3D page_zone(page); @@ -3740,24 +3605,11 @@ struct page *rmqueue_buddy(struct zone *preferred_z= one, struct zone *zone, unsigned long flags; =20 do { - page =3D NULL; spin_lock_irqsave(&zone->lock, flags); - /* - * order-0 request can reach here when the pcplist is skipped - * due to non-CMA allocation context. HIGHATOMIC area is - * reserved for high-order atomic allocation, so order-0 - * request should skip it. 
- */ - if (order > 0 && alloc_flags & ALLOC_HARDER) - page =3D __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC); - if (!page) { - page =3D __rmqueue(zone, order, migratetype, alloc_flags); - if (!page) { - spin_unlock_irqrestore(&zone->lock, flags); - return NULL; - } - } + page =3D __rmqueue(zone, order, migratetype, alloc_flags); spin_unlock_irqrestore(&zone->lock, flags); + if (!page) + return NULL; } while (check_new_pages(page, order)); =20 __count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order); @@ -4003,8 +3855,6 @@ static long __zone_free_pages(struct zone *zone, int = alloc_flags, bool safe) * compaction is necessary to restore the watermarks. */ free_pages =3D page_state(zone, NR_FREE_FREE, safe); - if (alloc_flags & (ALLOC_HARDER | ALLOC_OOM)) - free_pages +=3D page_state(zone, NR_FREE_HIGHATOMIC, safe); if (IS_ENABLED(CONFIG_CMA) && (alloc_flags & ALLOC_CMA)) free_pages +=3D page_state(zone, NR_FREE_CMA_PAGES, safe); =20 @@ -4098,8 +3948,6 @@ bool __zone_watermark_ok(struct zone *z, unsigned int= order, unsigned long mark, return true; } #endif - if (alloc_harder && !free_area_empty(area, MIGRATE_HIGHATOMIC)) - return true; } return false; } @@ -4340,14 +4188,6 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int = order, int alloc_flags, gfp_mask, alloc_flags, ac->migratetype); if (page) { prep_new_page(page, order, gfp_mask, alloc_flags); - - /* - * If this is a high-order atomic allocation then check - * if the pageblock should be reserved for the future - */ - if (unlikely(order && (alloc_flags & ALLOC_HARDER))) - reserve_highatomic_pageblock(page, zone, order); - return page; } else { #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT @@ -4844,7 +4684,6 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned= int order, * Shrink them and try again */ if (!page && !drained) { - unreserve_highatomic_pageblock(ac, false); drain_all_pages(NULL); drained =3D true; goto retry; @@ -5013,10 +4852,8 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order, * Make sure we converge to OOM if we cannot make any progress * several times in the row. 
*/ - if (*no_progress_loops > MAX_RECLAIM_RETRIES) { - /* Before OOM, exhaust highatomic_reserve */ - return unreserve_highatomic_pageblock(ac, true); - } + if (*no_progress_loops > MAX_RECLAIM_RETRIES) + return false; =20 /* * Keep reclaiming pages while there is a chance this will lead @@ -6129,7 +5966,6 @@ static void show_migration_types(unsigned char type) [MIGRATE_UNMOVABLE] =3D 'U', [MIGRATE_MOVABLE] =3D 'M', [MIGRATE_RECLAIMABLE] =3D 'E', - [MIGRATE_HIGHATOMIC] =3D 'H', [MIGRATE_FREE] =3D 'F', #ifdef CONFIG_CMA [MIGRATE_CMA] =3D 'C', @@ -6194,7 +6030,7 @@ void __show_free_areas(unsigned int filter, nodemask_= t *nodemask, int max_zone_i " sec_pagetables:%lu bounce:%lu\n" " kernel_misc_reclaimable:%lu\n" " free:%lu free_unmovable:%lu free_movable:%lu\n" - " free_reclaimable:%lu free_highatomic:%lu free_free:%lu\n" + " free_reclaimable:%lu free_free:%lu\n" " free_cma:%lu free_pcp:%lu\n", global_node_page_state(NR_ACTIVE_ANON), global_node_page_state(NR_INACTIVE_ANON), @@ -6217,7 +6053,6 @@ void __show_free_areas(unsigned int filter, nodemask_= t *nodemask, int max_zone_i global_zone_page_state(NR_FREE_UNMOVABLE), global_zone_page_state(NR_FREE_MOVABLE), global_zone_page_state(NR_FREE_RECLAIMABLE), - global_zone_page_state(NR_FREE_HIGHATOMIC), global_zone_page_state(NR_FREE_FREE), global_zone_page_state(NR_FREE_CMA_PAGES), free_pcp); @@ -6301,13 +6136,11 @@ void __show_free_areas(unsigned int filter, nodemas= k_t *nodemask, int max_zone_i " free_unmovable:%lukB" " free_movable:%lukB" " free_reclaimable:%lukB" - " free_highatomic:%lukB" " free_free:%lukB" " boost:%lukB" " min:%lukB" " low:%lukB" " high:%lukB" - " reserved_highatomic:%luKB" " active_anon:%lukB" " inactive_anon:%lukB" " active_file:%lukB" @@ -6327,13 +6160,11 @@ void __show_free_areas(unsigned int filter, nodemas= k_t *nodemask, int max_zone_i K(zone_page_state(zone, NR_FREE_UNMOVABLE)), K(zone_page_state(zone, NR_FREE_MOVABLE)), K(zone_page_state(zone, NR_FREE_RECLAIMABLE)), - K(zone_page_state(zone, NR_FREE_HIGHATOMIC)), K(zone_page_state(zone, NR_FREE_FREE)), K(zone->watermark_boost), K(min_wmark_pages(zone)), K(low_wmark_pages(zone)), K(high_wmark_pages(zone)), - K(zone->nr_reserved_highatomic), K(zone_page_state(zone, NR_ZONE_ACTIVE_ANON)), K(zone_page_state(zone, NR_ZONE_INACTIVE_ANON)), K(zone_page_state(zone, NR_ZONE_ACTIVE_FILE)), diff --git a/mm/vmstat.c b/mm/vmstat.c index c8b8e6e259da..a2f7b41564df 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1171,7 +1171,6 @@ const char * const vmstat_text[] =3D { "nr_free_unmovable", "nr_free_movable", "nr_free_reclaimable", - "nr_free_highatomic", "nr_free_free", "nr_zone_inactive_anon", "nr_zone_active_anon", --=20 2.39.2 From nobody Wed Dec 17 15:36:54 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 62984C6FD18 for ; Tue, 18 Apr 2023 19:15:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232828AbjDRTPP (ORCPT ); Tue, 18 Apr 2023 15:15:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57788 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232901AbjDRTOI (ORCPT ); Tue, 18 Apr 2023 15:14:08 -0400 Received: from mail-qt1-x82c.google.com (mail-qt1-x82c.google.com [IPv6:2607:f8b0:4864:20::82c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 17036C176 for ; Tue, 18 Apr 2023 12:13:48 
-0700 (PDT) Received: by mail-qt1-x82c.google.com with SMTP id fy11so5948237qtb.12 for ; Tue, 18 Apr 2023 12:13:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20221208.gappssmtp.com; s=20221208; t=1681845227; x=1684437227; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=RcOWVPTnlhnbqqS83iZRHWvw9Z/GKVYeqba03KhzlhU=; b=a/1OWmR/tUcWE2Qqj1o11ZW4UdowGBE3dveFmCmMXs2fzxld+GikzPbYsZnkzQUhO1 rszkmym7seH29zcw8ulr+qDK26L0PsIcj2vrordWVhv4mNV7cpWqWuY5SQHsZw8z7mao 4Z/ONsQuuhKfFYEQ6RM3cdDgMfZH06DbmKmGbAFpl8E2Vd0qVeQ9k1m0bs22PIP//Pjj y/MIXkHHsDsENio2rDHGldHoYX51Hv15DAGeCwg3O+QTHInV8nsPqRyC21NUdBPxr8kG g6o0GlmUWbBX6RAAFG5tpS7kEnZq+h52JP5HlzaOA46WT6u1sWU3NzgkrM0ulNQCm7Q9 Iynw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1681845227; x=1684437227; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=RcOWVPTnlhnbqqS83iZRHWvw9Z/GKVYeqba03KhzlhU=; b=UjoE2wWrnWZq4a7jkz8ULEPiX0vR3qNP/3ky9Qiq7vh1dR5K/MCKJyCbTs96B0QBm8 2TnDyxHcTRfjxRCxslSkBxJiLwlDtQJLhtyEzlU1QSO+aKtIY/E7dROEPiUi2XZmdaEj bie5aLyPcz0Q2927nF5Ihb+rpEI2W/pi56cDYZEcWW+1Vg0/5UVGir4IYPmtv66brP5z aYBw/EetyVKrsN5PGWjfVQyn67eC5EwfdlXCHAuV6wPCZ4npszmde4LUnv6PtQtgEHLD g88vrDZ4s3lfjT8JjYvQrcnAW7WUsRts8ugP0DBAKg/jM59DjEGBpzyk6YCnmPz3ITke uiAg== X-Gm-Message-State: AAQBX9flRljcvfxVZeBBj4YHCysIddOBLj5nY0PjrfCBm/a+ttTlY1fi cqFpCFzPd293GfIcCwNCmny5Cg== X-Google-Smtp-Source: AKy350aQbCigm5jEeHDFUL4OtIuD38+nxMH8T2pdMpNgffiiQpVWw2yOWVdgL1WnHxicR6IJsCf4vw== X-Received: by 2002:a05:622a:1996:b0:3e6:71d6:5d5d with SMTP id u22-20020a05622a199600b003e671d65d5dmr1822298qtc.1.1681845227071; Tue, 18 Apr 2023 12:13:47 -0700 (PDT) Received: from localhost ([2620:10d:c091:400::5:e646]) by smtp.gmail.com with ESMTPSA id g18-20020ac85812000000b003ef189ffa82sm2022876qtg.90.2023.04.18.12.13.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 18 Apr 2023 12:13:46 -0700 (PDT) From: Johannes Weiner To: linux-mm@kvack.org Cc: Kaiyang Zhao , Mel Gorman , Vlastimil Babka , David Rientjes , linux-kernel@vger.kernel.org, kernel-team@fb.com Subject: [RFC PATCH 24/26] mm: page_alloc: kill watermark boosting Date: Tue, 18 Apr 2023 15:13:11 -0400 Message-Id: <20230418191313.268131-25-hannes@cmpxchg.org> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230418191313.268131-1-hannes@cmpxchg.org> References: <20230418191313.268131-1-hannes@cmpxchg.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Watermark boosting is meant to increase the chances of pageblock production when fallbacks are observed. Since reclaim/compaction now produce neutral pageblocks per default, this is no longer needed. 
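For illustration only (this sketch is not part of the patch): with the boost
term gone, the watermark accessors reduce to plain reads of the per-zone
watermark array, so zone_watermark_ok() and the other callers that go through
the wmark_pages() macros need no changes of their own.

    /*
     * Illustrative sketch, assuming the existing struct zone and
     * enum zone_watermarks definitions from include/linux/mmzone.h.
     * "sketch_wmark_pages" is a made-up name; the real accessors are
     * the wmark_pages()/min_wmark_pages() macros updated below.
     */
    static inline unsigned long sketch_wmark_pages(struct zone *z,
                                                   enum zone_watermarks w)
    {
            return z->_watermark[w];    /* previously: + z->watermark_boost */
    }

Since reclaim and compaction are now driven toward whole MIGRATE_FREE blocks
by the regular watermarks, the temporary boost that used to trigger extra
reclaim after a fallback has no remaining role.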
Signed-off-by: Johannes Weiner --- Documentation/admin-guide/sysctl/vm.rst | 21 ----- include/linux/mm.h | 1 - include/linux/mmzone.h | 12 +-- kernel/sysctl.c | 8 -- mm/page_alloc.c | 67 -------------- mm/vmscan.c | 111 +----------------------- mm/vmstat.c | 2 - 7 files changed, 7 insertions(+), 215 deletions(-) diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-= guide/sysctl/vm.rst index 988f6a4c8084..498655c322bc 100644 --- a/Documentation/admin-guide/sysctl/vm.rst +++ b/Documentation/admin-guide/sysctl/vm.rst @@ -72,7 +72,6 @@ files can be found in mm/swap.c. - unprivileged_userfaultfd - user_reserve_kbytes - vfs_cache_pressure -- watermark_boost_factor - watermark_scale_factor - zone_reclaim_mode =20 @@ -968,26 +967,6 @@ directory and inode objects. With vfs_cache_pressure= =3D1000, it will look for ten times more freeable objects than there are. =20 =20 -watermark_boost_factor -=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D - -This factor controls the level of reclaim when memory is being fragmented. -It defines the percentage of the high watermark of a zone that will be -reclaimed if pages of different mobility are being mixed within pageblocks. -The intent is that compaction has less work to do in the future and to -increase the success rate of future high-order allocations such as SLUB -allocations, THP and hugetlbfs pages. - -To make it sensible with respect to the watermark_scale_factor -parameter, the unit is in fractions of 10,000. The default value of -15,000 means that up to 150% of the high watermark will be reclaimed in the -event of a pageblock being mixed due to fragmentation. The level of reclaim -is determined by the number of fragmentation events that occurred in the -recent past. If this value is smaller than a pageblock then a pageblocks -worth of pages will be reclaimed (e.g. 2MB on 64-bit x86). A boost factor -of 0 will disable the feature. 
- - watermark_scale_factor =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =20 diff --git a/include/linux/mm.h b/include/linux/mm.h index f13f20258ce9..e7c2631848ed 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2746,7 +2746,6 @@ extern void setup_per_cpu_pageset(void); =20 /* page_alloc.c */ extern int min_free_kbytes; -extern int watermark_boost_factor; extern int watermark_scale_factor; extern bool arch_has_descending_max_zone_pfns(void); =20 diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index c705f2f7c829..1363ff6caff3 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -567,10 +567,10 @@ enum zone_watermarks { #define NR_LOWORDER_PCP_LISTS (MIGRATE_PCPTYPES * (PAGE_ALLOC_COSTLY_ORDER= + 1)) #define NR_PCP_LISTS (NR_LOWORDER_PCP_LISTS + NR_PCP_THP) =20 -#define min_wmark_pages(z) (z->_watermark[WMARK_MIN] + z->watermark_boost) -#define low_wmark_pages(z) (z->_watermark[WMARK_LOW] + z->watermark_boost) -#define high_wmark_pages(z) (z->_watermark[WMARK_HIGH] + z->watermark_boos= t) -#define wmark_pages(z, i) (z->_watermark[i] + z->watermark_boost) +#define min_wmark_pages(z) (z->_watermark[WMARK_MIN]) +#define low_wmark_pages(z) (z->_watermark[WMARK_LOW]) +#define high_wmark_pages(z) (z->_watermark[WMARK_HIGH]) +#define wmark_pages(z, i) (z->_watermark[i]) =20 /* Fields and list protected by pagesets local_lock in page_alloc.c */ struct per_cpu_pages { @@ -709,7 +709,6 @@ struct zone { =20 /* zone watermarks, access with *_wmark_pages(zone) macros */ unsigned long _watermark[NR_WMARK]; - unsigned long watermark_boost; =20 /* * We don't know if the memory that we're going to allocate will be @@ -884,9 +883,6 @@ enum pgdat_flags { }; =20 enum zone_flags { - ZONE_BOOSTED_WATERMARK, /* zone recently boosted watermarks. - * Cleared when kswapd is woken. - */ ZONE_RECLAIM_ACTIVE, /* kswapd may be scanning the zone. */ }; =20 diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 137d4abe3eda..68bcd3a7c9c6 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -2229,14 +2229,6 @@ static struct ctl_table vm_table[] =3D { .proc_handler =3D min_free_kbytes_sysctl_handler, .extra1 =3D SYSCTL_ZERO, }, - { - .procname =3D "watermark_boost_factor", - .data =3D &watermark_boost_factor, - .maxlen =3D sizeof(watermark_boost_factor), - .mode =3D 0644, - .proc_handler =3D proc_dointvec_minmax, - .extra1 =3D SYSCTL_ZERO, - }, { .procname =3D "watermark_scale_factor", .data =3D &watermark_scale_factor, diff --git a/mm/page_alloc.c b/mm/page_alloc.c index e8ae04feb1bd..f835a5548164 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -401,7 +401,6 @@ compound_page_dtor * const compound_page_dtors[NR_COMPO= UND_DTORS] =3D { =20 int min_free_kbytes =3D 1024; int user_min_free_kbytes =3D -1; -int watermark_boost_factor __read_mostly =3D 15000; int watermark_scale_factor =3D 10; =20 static unsigned long nr_kernel_pages __initdata; @@ -2742,43 +2741,6 @@ static bool can_steal_fallback(unsigned int order, i= nt start_mt, return false; } =20 -static inline bool boost_watermark(struct zone *zone) -{ - unsigned long max_boost; - - if (!watermark_boost_factor) - return false; - /* - * Don't bother in zones that are unlikely to produce results. - * On small machines, including kdump capture kernels running - * in a small area, boosting the watermark can cause an out of - * memory situation immediately. 
- */ - if ((pageblock_nr_pages * 4) > zone_managed_pages(zone)) - return false; - - max_boost =3D mult_frac(zone->_watermark[WMARK_HIGH], - watermark_boost_factor, 10000); - - /* - * high watermark may be uninitialised if fragmentation occurs - * very early in boot so do not boost. We do not fall - * through and boost by pageblock_nr_pages as failing - * allocations that early means that reclaim is not going - * to help and it may even be impossible to reclaim the - * boosted watermark resulting in a hang. - */ - if (!max_boost) - return false; - - max_boost =3D max(pageblock_nr_pages, max_boost); - - zone->watermark_boost =3D min(zone->watermark_boost + pageblock_nr_pages, - max_boost); - - return true; -} - /* * This function implements actual steal behaviour. If order is large enou= gh, * we can steal whole pageblock. If not, we first move freepages in this @@ -2802,14 +2764,6 @@ static void steal_suitable_fallback(struct zone *zon= e, struct page *page, goto single_page; } =20 - /* - * Boost watermarks to increase reclaim pressure to reduce the - * likelihood of future fallbacks. Wake kswapd now as the node - * may be balanced overall and kswapd will not wake naturally. - */ - if (boost_watermark(zone) && (alloc_flags & ALLOC_KSWAPD)) - set_bit(ZONE_BOOSTED_WATERMARK, &zone->flags); - /* We are not allowed to try stealing from the whole block */ if (!whole_block) goto single_page; @@ -3738,12 +3692,6 @@ struct page *rmqueue(struct zone *preferred_zone, migratetype); =20 out: - /* Separate test+clear to avoid unnecessary atomics */ - if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) { - clear_bit(ZONE_BOOSTED_WATERMARK, &zone->flags); - wakeup_kswapd(zone, 0, 0, zone_idx(zone)); - } - VM_BUG_ON_PAGE(page && bad_range(zone, page), page); return page; } @@ -3976,18 +3924,6 @@ static inline bool zone_watermark_fast(struct zone *= z, unsigned int order, if (__zone_watermark_ok(z, order, mark, highest_zoneidx, alloc_flags, free_pages)) return true; - /* - * Ignore watermark boosting for GFP_ATOMIC order-0 allocations - * when checking the min watermark. The min watermark is the - * point where boosting is ignored so that kswapd is woken up - * when below the low watermark. 
- */ - if (unlikely(!order && (gfp_mask & __GFP_ATOMIC) && z->watermark_boost - && ((alloc_flags & ALLOC_WMARK_MASK) =3D=3D WMARK_MIN))) { - mark =3D z->_watermark[WMARK_MIN]; - return __zone_watermark_ok(z, order, mark, highest_zoneidx, - alloc_flags, free_pages); - } =20 return false; } @@ -6137,7 +6073,6 @@ void __show_free_areas(unsigned int filter, nodemask_= t *nodemask, int max_zone_i " free_movable:%lukB" " free_reclaimable:%lukB" " free_free:%lukB" - " boost:%lukB" " min:%lukB" " low:%lukB" " high:%lukB" @@ -6161,7 +6096,6 @@ void __show_free_areas(unsigned int filter, nodemask_= t *nodemask, int max_zone_i K(zone_page_state(zone, NR_FREE_MOVABLE)), K(zone_page_state(zone, NR_FREE_RECLAIMABLE)), K(zone_page_state(zone, NR_FREE_FREE)), - K(zone->watermark_boost), K(min_wmark_pages(zone)), K(low_wmark_pages(zone)), K(high_wmark_pages(zone)), @@ -8701,7 +8635,6 @@ static void __setup_per_zone_wmarks(void) if (IS_ENABLED(CONFIG_COMPACTION)) tmp =3D ALIGN(tmp, 1 << pageblock_order); =20 - zone->watermark_boost =3D 0; zone->_watermark[WMARK_LOW] =3D min_wmark_pages(zone) + tmp; zone->_watermark[WMARK_HIGH] =3D low_wmark_pages(zone) + tmp; zone->_watermark[WMARK_PROMO] =3D high_wmark_pages(zone) + tmp; diff --git a/mm/vmscan.c b/mm/vmscan.c index a7374cd6fe91..5586be6997cd 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -6827,30 +6827,6 @@ static void kswapd_age_node(struct pglist_data *pgda= t, struct scan_control *sc) } while (memcg); } =20 -static bool pgdat_watermark_boosted(pg_data_t *pgdat, int highest_zoneidx) -{ - int i; - struct zone *zone; - - /* - * Check for watermark boosts top-down as the higher zones - * are more likely to be boosted. Both watermarks and boosts - * should not be checked at the same time as reclaim would - * start prematurely when there is no boosting and a lower - * zone is balanced. - */ - for (i =3D highest_zoneidx; i >=3D 0; i--) { - zone =3D pgdat->node_zones + i; - if (!managed_zone(zone)) - continue; - - if (zone->watermark_boost) - return true; - } - - return false; -} - /* * Returns true if there is an eligible zone balanced for the request order * and highest_zoneidx @@ -7025,14 +7001,13 @@ static void balance_pgdat(pg_data_t *pgdat, int ord= er, int highest_zoneidx) unsigned long nr_soft_reclaimed; unsigned long nr_soft_scanned; unsigned long pflags; - unsigned long nr_boost_reclaim; - unsigned long zone_boosts[MAX_NR_ZONES] =3D { 0, }; - bool boosted; struct zone *zone; struct scan_control sc =3D { .gfp_mask =3D GFP_KERNEL, .order =3D order, .may_unmap =3D 1, + .may_swap =3D 1, + .may_writepage =3D !laptop_mode, }; =20 set_task_reclaim_state(current, &sc.reclaim_state); @@ -7041,29 +7016,11 @@ static void balance_pgdat(pg_data_t *pgdat, int ord= er, int highest_zoneidx) =20 count_vm_event(PAGEOUTRUN); =20 - /* - * Account for the reclaim boost. Note that the zone boost is left in - * place so that parallel allocations that are near the watermark will - * stall or direct reclaim until kswapd is finished. 
- */ - nr_boost_reclaim =3D 0; - for (i =3D 0; i <=3D highest_zoneidx; i++) { - zone =3D pgdat->node_zones + i; - if (!managed_zone(zone)) - continue; - - nr_boost_reclaim +=3D zone->watermark_boost; - zone_boosts[i] =3D zone->watermark_boost; - } - boosted =3D nr_boost_reclaim; - -restart: set_reclaim_active(pgdat, highest_zoneidx); sc.priority =3D DEF_PRIORITY; do { unsigned long nr_reclaimed =3D sc.nr_reclaimed; bool raise_priority =3D true; - bool balanced; bool ret; =20 sc.reclaim_idx =3D highest_zoneidx; @@ -7089,40 +7046,9 @@ static void balance_pgdat(pg_data_t *pgdat, int orde= r, int highest_zoneidx) } } =20 - /* - * If the pgdat is imbalanced then ignore boosting and preserve - * the watermarks for a later time and restart. Note that the - * zone watermarks will be still reset at the end of balancing - * on the grounds that the normal reclaim should be enough to - * re-evaluate if boosting is required when kswapd next wakes. - */ - balanced =3D pgdat_balanced(pgdat, sc.order, highest_zoneidx); - if (!balanced && nr_boost_reclaim) { - nr_boost_reclaim =3D 0; - goto restart; - } - - /* - * If boosting is not active then only reclaim if there are no - * eligible zones. Note that sc.reclaim_idx is not used as - * buffer_heads_over_limit may have adjusted it. - */ - if (!nr_boost_reclaim && balanced) + if (pgdat_balanced(pgdat, sc.order, highest_zoneidx)) goto out; =20 - /* Limit the priority of boosting to avoid reclaim writeback */ - if (nr_boost_reclaim && sc.priority =3D=3D DEF_PRIORITY - 2) - raise_priority =3D false; - - /* - * Do not writeback or swap pages for boosted reclaim. The - * intent is to relieve pressure not issue sub-optimal IO - * from reclaim context. If no pages are reclaimed, the - * reclaim will be aborted. - */ - sc.may_writepage =3D !laptop_mode && !nr_boost_reclaim; - sc.may_swap =3D !nr_boost_reclaim; - /* * Do some background aging, to give pages a chance to be * referenced before reclaiming. All pages are rotated @@ -7173,15 +7099,6 @@ static void balance_pgdat(pg_data_t *pgdat, int orde= r, int highest_zoneidx) * progress in reclaiming pages */ nr_reclaimed =3D sc.nr_reclaimed - nr_reclaimed; - nr_boost_reclaim -=3D min(nr_boost_reclaim, nr_reclaimed); - - /* - * If reclaim made no progress for a boost, stop reclaim as - * IO cannot be queued and it could be an infinite loop in - * extreme circumstances. - */ - if (nr_boost_reclaim && !nr_reclaimed) - break; =20 if (raise_priority || !nr_reclaimed) sc.priority--; @@ -7193,28 +7110,6 @@ static void balance_pgdat(pg_data_t *pgdat, int orde= r, int highest_zoneidx) out: clear_reclaim_active(pgdat, highest_zoneidx); =20 - /* If reclaim was boosted, account for the reclaim done in this pass */ - if (boosted) { - unsigned long flags; - - for (i =3D 0; i <=3D highest_zoneidx; i++) { - if (!zone_boosts[i]) - continue; - - /* Increments are under the zone lock */ - zone =3D pgdat->node_zones + i; - spin_lock_irqsave(&zone->lock, flags); - zone->watermark_boost -=3D min(zone->watermark_boost, zone_boosts[i]); - spin_unlock_irqrestore(&zone->lock, flags); - } - - /* - * As there is now likely space, wakeup kcompact to defragment - * pageblocks. 
- */ - wakeup_kcompactd(pgdat, pageblock_order, highest_zoneidx); - } - snapshot_refaults(NULL, pgdat); __fs_reclaim_release(_THIS_IP_); psi_memstall_leave(&pflags); diff --git a/mm/vmstat.c b/mm/vmstat.c index a2f7b41564df..80ee26588242 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1682,7 +1682,6 @@ static void zoneinfo_show_print(struct seq_file *m, p= g_data_t *pgdat, } seq_printf(m, "\n pages free %lu" - "\n boost %lu" "\n min %lu" "\n low %lu" "\n high %lu" @@ -1691,7 +1690,6 @@ static void zoneinfo_show_print(struct seq_file *m, p= g_data_t *pgdat, "\n managed %lu" "\n cma %lu", zone_page_state(zone, NR_FREE_PAGES), - zone->watermark_boost, min_wmark_pages(zone), low_wmark_pages(zone), high_wmark_pages(zone), --=20 2.39.2 From nobody Wed Dec 17 15:36:54 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8B74FC7EE20 for ; Tue, 18 Apr 2023 19:15:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232940AbjDRTP0 (ORCPT ); Tue, 18 Apr 2023 15:15:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57812 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232897AbjDRTOH (ORCPT ); Tue, 18 Apr 2023 15:14:07 -0400 Received: from mail-qt1-x829.google.com (mail-qt1-x829.google.com [IPv6:2607:f8b0:4864:20::829]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 57C3BAF03 for ; Tue, 18 Apr 2023 12:13:49 -0700 (PDT) Received: by mail-qt1-x829.google.com with SMTP id m21so19410750qtg.0 for ; Tue, 18 Apr 2023 12:13:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20221208.gappssmtp.com; s=20221208; t=1681845228; x=1684437228; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=3kWGHSuKTgkf/KcYFfj9Qt8QOcgzsovuy8ycEW8Qbl8=; b=KFvVKf3QkfWQMBDUHAZr4VeBeoisE8O1GGsOodjwgMNuADxrQdnvk+SbprdpW2IP5r ag93cZ6VTR9I76TSMGwf5woII5HGfN9l2WN88UYR1tzcnTBufAlpM3lp/xMmNVG97M43 eNnVONsHRxzcPVv/iZgrUj1Uz8RGQ4JjzKnpXZZNAJj/FsvWfEp9h8m3xbiQ4JGMIzXh Wd0SEOETtzw6tJD+77T1b7ruvzZOu+Nz8JErUfQPxnwB2Zm75Cbp78ju67EmWgaNaQRd j09JTsw/mPs41GthMRykMfarC7ahjfxAE2/z8L+r1gArqpDDy8cskY5OjwSOMNK5kLZu 3VpA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1681845228; x=1684437228; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=3kWGHSuKTgkf/KcYFfj9Qt8QOcgzsovuy8ycEW8Qbl8=; b=jSONX7j92+MSG+GmHkB8ljJ/uXXzpWlkGQ+w1PdtwANCmEUguApYPmYIvSNQLAZ1uw HEtayMbn6AoVryuZ/1kGSi24nhQe/4yIlhHfkLCFACOocVFTmyKgiQB7/yPisbMTt7ad kVRM0i88BwqJEiuju/86lazFT5nnk15+nllZvFPdiAu+38UuHGEFRLpEZ0IH2uVH5q49 EXV+G9QXcE8oB6GiGUCEk9SGvJSbRII+cIscRtiVlaPbuXWffN8X3O4GrWEiGQQ8kz0J CutmGuh0On1X0mPQLRBNbwia55AC2kDIuF086hVq+fEA1P7wvftIq6+n1lgdgtu6YL07 A+zA== X-Gm-Message-State: AAQBX9c9s25b2uK0AO8wxPxgZSBlc4xkxSyeoXcRcxuBdRA6o44h2gOK zG5oPz3E90H6TNTofZ/GA21vxA== X-Google-Smtp-Source: AKy350ZnZudpfigs0VHBHaysAID+48G8C5CQkmg6KTT32vxOaKM82OTa8s5aOXsXXAzcGRMb+jKq5A== X-Received: by 2002:ac8:5f0b:0:b0:3e6:4069:9136 with SMTP id x11-20020ac85f0b000000b003e640699136mr1124473qta.45.1681845228420; Tue, 18 Apr 2023 12:13:48 -0700 (PDT) Received: from localhost ([2620:10d:c091:400::5:e646]) by smtp.gmail.com with ESMTPSA id 
x6-20020ac84d46000000b003ef231cceeasm1122594qtv.23.2023.04.18.12.13.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 18 Apr 2023 12:13:48 -0700 (PDT) From: Johannes Weiner To: linux-mm@kvack.org Cc: Kaiyang Zhao , Mel Gorman , Vlastimil Babka , David Rientjes , linux-kernel@vger.kernel.org, kernel-team@fb.com Subject: [RFC PATCH 25/26] mm: page_alloc: disallow fallbacks when 2M defrag is enabled Date: Tue, 18 Apr 2023 15:13:12 -0400 Message-Id: <20230418191313.268131-26-hannes@cmpxchg.org> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230418191313.268131-1-hannes@cmpxchg.org> References: <20230418191313.268131-1-hannes@cmpxchg.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Fallbacks are already unlikely due to watermarks being enforced against MIGRATE_FREE blocks. Eliminate them altogether. This allows compaction to look exclusively at movable blocks, reducing the number of pageblocks it needs to scan on an ongoing basis. Signed-off-by: Johannes Weiner --- mm/compaction.c | 52 +++++-------------------------------------------- mm/internal.h | 2 +- mm/page_alloc.c | 8 ++++++++ 3 files changed, 14 insertions(+), 48 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index e33c99eb34a8..37dfd1878bef 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1258,46 +1258,6 @@ isolate_migratepages_range(struct compact_control *c= c, unsigned long start_pfn, #endif /* CONFIG_COMPACTION || CONFIG_CMA */ #ifdef CONFIG_COMPACTION =20 -static bool suitable_migration_source(struct compact_control *cc, - struct page *page) -{ - int block_mt; - - if (pageblock_skip_persistent(page)) - return false; - - if ((cc->mode !=3D MIGRATE_ASYNC) || !cc->direct_compaction) - return true; - - block_mt =3D get_pageblock_migratetype(page); - - if (cc->migratetype =3D=3D MIGRATE_MOVABLE) - return is_migrate_movable(block_mt); - else - return block_mt =3D=3D cc->migratetype; -} - -/* Returns true if the page is within a block suitable for migration to */ -static bool suitable_migration_target(struct compact_control *cc, - struct page *page) -{ - int mt =3D get_pageblock_migratetype(page); - - /* If the page is a large free page, then disallow migration */ - if (mt =3D=3D MIGRATE_FREE) - return false; - - if (cc->ignore_block_suitable) - return true; - - /* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */ - if (is_migrate_movable(mt)) - return true; - - /* Otherwise skip the block */ - return false; -} - static inline unsigned int freelist_scan_limit(struct compact_control *cc) { @@ -1620,7 +1580,7 @@ static void isolate_freepages(struct compact_control = *cc) continue; =20 /* Check the block is suitable for migration */ - if (!suitable_migration_target(cc, page)) + if (!is_migrate_movable(get_pageblock_migratetype(page))) continue; =20 /* If isolation recently failed, do not retry */ @@ -1927,14 +1887,12 @@ static isolate_migrate_t isolate_migratepages(struc= t compact_control *cc) continue; =20 /* - * For async direct compaction, only scan the pageblocks of the - * same migratetype without huge pages. Async direct compaction - * is optimistic to see if the minimum amount of work satisfies - * the allocation. 
-		 * that all remaining blocks between source and target are
+		 * The cached PFN is updated as it's possible that all
+		 * remaining blocks between source and target are
 		 * unsuitable and the compaction scanners fail to meet.
 		 */
-		if (!suitable_migration_source(cc, page)) {
+		if (pageblock_skip_persistent(page) ||
+		    !is_migrate_movable(get_pageblock_migratetype(page))) {
 			update_cached_migrate(cc, block_end_pfn);
 			continue;
 		}
diff --git a/mm/internal.h b/mm/internal.h
index 24f43f5db88b..1c0886c3ce0e 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -741,7 +741,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 #define ALLOC_HIGH		0x20 /* __GFP_HIGH set */
 #define ALLOC_CPUSET		0x40 /* check for correct cpuset */
 #define ALLOC_CMA		0x80 /* allow allocations from CMA areas */
-#ifdef CONFIG_ZONE_DMA32
+#if defined(CONFIG_ZONE_DMA32) && !defined(CONFIG_COMPACTION)
 #define ALLOC_NOFRAGMENT	0x100 /* avoid mixing pageblock types */
 #else
 #define ALLOC_NOFRAGMENT	0x0
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f835a5548164..9db588a1de3b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2622,11 +2622,19 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
  *
  * The other migratetypes do not have fallbacks.
  */
+#ifdef CONFIG_COMPACTION
+static int fallbacks[MIGRATE_TYPES][2] = {
+	[MIGRATE_UNMOVABLE]   = { MIGRATE_FREE, MIGRATE_TYPES },
+	[MIGRATE_MOVABLE]     = { MIGRATE_FREE, MIGRATE_TYPES },
+	[MIGRATE_RECLAIMABLE] = { MIGRATE_FREE, MIGRATE_TYPES },
+};
+#else
 static int fallbacks[MIGRATE_TYPES][4] = {
 	[MIGRATE_UNMOVABLE]   = { MIGRATE_FREE, MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_TYPES },
 	[MIGRATE_MOVABLE]     = { MIGRATE_FREE, MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_TYPES },
 	[MIGRATE_RECLAIMABLE] = { MIGRATE_FREE, MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_TYPES },
 };
+#endif
 
 #ifdef CONFIG_CMA
 static __always_inline struct page *__rmqueue_cma_fallback(struct zone *zone,
-- 
2.39.2
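With the patch above, an unmovable, movable, or reclaimable request that cannot
be served from its own blocks can only fall back to claiming a whole
MIGRATE_FREE block. The standalone C sketch below models that sentinel-terminated
table walk; it is illustrative only: the enum values and the pick_fallback()
helper are invented for the example, and this is not the kernel's actual
__rmqueue_fallback().

/*
 * Standalone model of the sentinel-terminated fallback table walk.
 * Illustrative only; names and values are not the kernel's.
 */
#include <stdio.h>

enum migratetype {
	MIGRATE_UNMOVABLE,
	MIGRATE_MOVABLE,
	MIGRATE_RECLAIMABLE,
	MIGRATE_FREE,
	MIGRATE_TYPES
};

/* The CONFIG_COMPACTION variant from the patch: one candidate, then the sentinel. */
static const int fallbacks[MIGRATE_TYPES][2] = {
	[MIGRATE_UNMOVABLE]   = { MIGRATE_FREE, MIGRATE_TYPES },
	[MIGRATE_MOVABLE]     = { MIGRATE_FREE, MIGRATE_TYPES },
	[MIGRATE_RECLAIMABLE] = { MIGRATE_FREE, MIGRATE_TYPES },
};

/*
 * Walk the fallback list for @mt until the MIGRATE_TYPES sentinel and
 * return the first candidate that has free blocks, or -1 if none does.
 */
static int pick_fallback(int mt, const int has_free[MIGRATE_TYPES])
{
	for (int i = 0; fallbacks[mt][i] != MIGRATE_TYPES; i++)
		if (has_free[fallbacks[mt][i]])
			return fallbacks[mt][i];
	return -1;
}

int main(void)
{
	/* Pretend only whole free blocks are available. */
	int has_free[MIGRATE_TYPES] = { [MIGRATE_FREE] = 1 };

	printf("UNMOVABLE request falls back to type %d (MIGRATE_FREE is %d)\n",
	       pick_fallback(MIGRATE_UNMOVABLE, has_free), MIGRATE_FREE);
	return 0;
}

The MIGRATE_TYPES sentinel keeps the walk uniform whether a configuration
provides one fallback candidate or three.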
From nobody Wed Dec 17 15:36:54 2025
From: Johannes Weiner 
To: linux-mm@kvack.org
Cc: Kaiyang Zhao, Mel Gorman, Vlastimil Babka, David Rientjes,
	linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [RFC PATCH 26/26] mm: page_alloc: add sanity checks for migratetypes
Date: Tue, 18 Apr 2023 15:13:13 -0400
Message-Id: <20230418191313.268131-27-hannes@cmpxchg.org>
In-Reply-To: <20230418191313.268131-1-hannes@cmpxchg.org>
References: <20230418191313.268131-1-hannes@cmpxchg.org>

Now that known block pollution from fallbacks, !movable compaction,
highatomic reserves and single THP pcplists is gone, add high-level
sanity checks that ensure that pages coming out of the allocator are
of the requested migratetype.

Signed-off-by: Johannes Weiner 
---
 mm/page_alloc.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9db588a1de3b..b8767a6075e8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3675,6 +3675,7 @@ struct page *rmqueue(struct zone *preferred_zone,
 			int migratetype)
 {
 	struct page *page;
+	int buddy = 0;
 
 	/*
 	 * We most definitely don't want callers attempting to
@@ -3698,9 +3699,14 @@ struct page *rmqueue(struct zone *preferred_zone,
 
 	page = rmqueue_buddy(preferred_zone, zone, order, alloc_flags,
 							migratetype);
+	buddy = 1;
 
 out:
 	VM_BUG_ON_PAGE(page && bad_range(zone, page), page);
+	VM_WARN_ONCE(page && get_pageblock_migratetype(page) != migratetype,
+		     "%d:%s order=%u gfp=%pGg mt=%s alloc_flags=%x buddy=%d\n",
+		     zone_to_nid(zone), zone->name, order, &gfp_flags,
+		     migratetype_names[migratetype], alloc_flags, buddy);
 	return page;
 }
 
-- 
2.39.2
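When the new check trips on a CONFIG_DEBUG_VM build, VM_WARN_ONCE() emits the
format string above with the allocation details filled in; a hypothetical
report (all values invented for illustration) would read roughly:

  0:Normal order=0 gfp=GFP_HIGHUSER_MOVABLE mt=Movable alloc_flags=0x41 buddy=1

i.e. node 0's Normal zone handed out a page from a non-Movable pageblock for a
Movable request that was served by the buddy path. Being a _ONCE warning, only
the first mismatch per boot is reported, and the check generates no code when
CONFIG_DEBUG_VM is disabled.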