From nobody Sat Feb 7 10:16:20 2026
AGHT+IGkx0nGnqUgrMKet1uE7S+7hqwGUl1ojaHvm5FZlh6Ngw6ccwePmfpItDUFlv/Deh8AXgWhV/Td5YI= X-Received: from yuzhao2.bld.corp.google.com ([2a00:79e0:2e28:6:c9c:12b4:a1e3:7f10]) (user=yuzhao job=sendgmr) by 2002:a25:dc8d:0:b0:e0e:4350:d7de with SMTP id 3f1490d57ef6-e0eb9a28207mr13988276.9.1723411295441; Sun, 11 Aug 2024 14:21:35 -0700 (PDT) Date: Sun, 11 Aug 2024 15:21:27 -0600 In-Reply-To: <20240811212129.3074314-1-yuzhao@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240811212129.3074314-1-yuzhao@google.com> X-Mailer: git-send-email 2.46.0.76.ge559c4bf1a-goog Message-ID: <20240811212129.3074314-2-yuzhao@google.com> Subject: [PATCH mm-unstable v1 1/3] mm/contig_alloc: support __GFP_COMP From: Yu Zhao To: Andrew Morton , Muchun Song Cc: "Matthew Wilcox (Oracle)" , Zi Yan , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Yu Zhao Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Support __GFP_COMP in alloc_contig_range(). When the flag is set, upon success the function returns a large folio prepared by prep_new_page(), rather than a range of order-0 pages prepared by split_free_pages() (which is renamed from split_map_pages()). alloc_contig_range() can return folios larger than MAX_PAGE_ORDER, e.g., gigantic hugeTLB folios. As a result, on the free path free_one_page() needs to handle this case by split_large_buddy(), in addition to free_contig_range() properly handling large folios by folio_put(). Signed-off-by: Yu Zhao --- mm/compaction.c | 48 +++------------------ mm/internal.h | 9 ++++ mm/page_alloc.c | 111 ++++++++++++++++++++++++++++++++++-------------- 3 files changed, 94 insertions(+), 74 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index eb95e9b435d0..1ebfef98e1d0 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -79,40 +79,6 @@ static inline bool is_via_compact_memory(int order) { re= turn false; } #define COMPACTION_HPAGE_ORDER (PMD_SHIFT - PAGE_SHIFT) #endif =20 -static struct page *mark_allocated_noprof(struct page *page, unsigned int = order, gfp_t gfp_flags) -{ - post_alloc_hook(page, order, __GFP_MOVABLE); - return page; -} -#define mark_allocated(...) alloc_hooks(mark_allocated_noprof(__VA_ARGS__)) - -static void split_map_pages(struct list_head *freepages) -{ - unsigned int i, order; - struct page *page, *next; - LIST_HEAD(tmp_list); - - for (order =3D 0; order < NR_PAGE_ORDERS; order++) { - list_for_each_entry_safe(page, next, &freepages[order], lru) { - unsigned int nr_pages; - - list_del(&page->lru); - - nr_pages =3D 1 << order; - - mark_allocated(page, order, __GFP_MOVABLE); - if (order) - split_page(page, order); - - for (i =3D 0; i < nr_pages; i++) { - list_add(&page->lru, &tmp_list); - page++; - } - } - list_splice_init(&tmp_list, &freepages[0]); - } -} - static unsigned long release_free_list(struct list_head *freepages) { int order; @@ -742,11 +708,11 @@ static unsigned long isolate_freepages_block(struct c= ompact_control *cc, * * Non-free pages, invalid PFNs, or zone boundaries within the * [start_pfn, end_pfn) range are considered errors, cause function to - * undo its actions and return zero. + * undo its actions and return zero. cc->freepages[] are empty. * * Otherwise, function returns one-past-the-last PFN of isolated page * (which may be greater then end_pfn if end fell in a middle of - * a free page). + * a free page). cc->freepages[] contain free pages isolated. 
*/ unsigned long isolate_freepages_range(struct compact_control *cc, @@ -754,10 +720,9 @@ isolate_freepages_range(struct compact_control *cc, { unsigned long isolated, pfn, block_start_pfn, block_end_pfn; int order; - struct list_head tmp_freepages[NR_PAGE_ORDERS]; =20 for (order =3D 0; order < NR_PAGE_ORDERS; order++) - INIT_LIST_HEAD(&tmp_freepages[order]); + INIT_LIST_HEAD(&cc->freepages[order]); =20 pfn =3D start_pfn; block_start_pfn =3D pageblock_start_pfn(pfn); @@ -788,7 +753,7 @@ isolate_freepages_range(struct compact_control *cc, break; =20 isolated =3D isolate_freepages_block(cc, &isolate_start_pfn, - block_end_pfn, tmp_freepages, 0, true); + block_end_pfn, cc->freepages, 0, true); =20 /* * In strict mode, isolate_freepages_block() returns 0 if @@ -807,13 +772,10 @@ isolate_freepages_range(struct compact_control *cc, =20 if (pfn < end_pfn) { /* Loop terminated early, cleanup. */ - release_free_list(tmp_freepages); + release_free_list(cc->freepages); return 0; } =20 - /* __isolate_free_page() does not map the pages */ - split_map_pages(tmp_freepages); - /* We don't use freelists for anything. */ return pfn; } diff --git a/mm/internal.h b/mm/internal.h index acda347620c6..03e795ce755f 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -679,6 +679,15 @@ extern void prep_compound_page(struct page *page, unsi= gned int order); =20 extern void post_alloc_hook(struct page *page, unsigned int order, gfp_t gfp_flags); + +static inline struct page *post_alloc_hook_noprof(struct page *page, unsig= ned int order, + gfp_t gfp_flags) +{ + post_alloc_hook(page, order, __GFP_MOVABLE); + return page; +} +#define mark_allocated(...) alloc_hooks(post_alloc_hook_noprof(__VA_ARGS__= )) + extern bool free_pages_prepare(struct page *page, unsigned int order); =20 extern int user_min_free_kbytes; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 84a7154fde93..6c801404a108 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1196,16 +1196,36 @@ static void free_pcppages_bulk(struct zone *zone, i= nt count, spin_unlock_irqrestore(&zone->lock, flags); } =20 +/* Split a multi-block free page into its individual pageblocks */ +static void split_large_buddy(struct zone *zone, struct page *page, + unsigned long pfn, int order, fpi_t fpi) +{ + unsigned long end =3D pfn + (1 << order); + + VM_WARN_ON_ONCE(!IS_ALIGNED(pfn, 1 << order)); + /* Caller removed page from freelist, buddy info cleared! 
*/ + VM_WARN_ON_ONCE(PageBuddy(page)); + + if (order > pageblock_order) + order =3D pageblock_order; + + while (pfn !=3D end) { + int mt =3D get_pfnblock_migratetype(page, pfn); + + __free_one_page(page, pfn, zone, order, mt, fpi); + pfn +=3D 1 << order; + page =3D pfn_to_page(pfn); + } +} + static void free_one_page(struct zone *zone, struct page *page, unsigned long pfn, unsigned int order, fpi_t fpi_flags) { unsigned long flags; - int migratetype; =20 spin_lock_irqsave(&zone->lock, flags); - migratetype =3D get_pfnblock_migratetype(page, pfn); - __free_one_page(page, pfn, zone, order, migratetype, fpi_flags); + split_large_buddy(zone, page, pfn, order, fpi_flags); spin_unlock_irqrestore(&zone->lock, flags); } =20 @@ -1697,27 +1717,6 @@ static unsigned long find_large_buddy(unsigned long = start_pfn) return start_pfn; } =20 -/* Split a multi-block free page into its individual pageblocks */ -static void split_large_buddy(struct zone *zone, struct page *page, - unsigned long pfn, int order) -{ - unsigned long end_pfn =3D pfn + (1 << order); - - VM_WARN_ON_ONCE(order <=3D pageblock_order); - VM_WARN_ON_ONCE(pfn & (pageblock_nr_pages - 1)); - - /* Caller removed page from freelist, buddy info cleared! */ - VM_WARN_ON_ONCE(PageBuddy(page)); - - while (pfn !=3D end_pfn) { - int mt =3D get_pfnblock_migratetype(page, pfn); - - __free_one_page(page, pfn, zone, pageblock_order, mt, FPI_NONE); - pfn +=3D pageblock_nr_pages; - page =3D pfn_to_page(pfn); - } -} - /** * move_freepages_block_isolate - move free pages in block for page isolat= ion * @zone: the zone @@ -1758,7 +1757,7 @@ bool move_freepages_block_isolate(struct zone *zone, = struct page *page, del_page_from_free_list(buddy, zone, order, get_pfnblock_migratetype(buddy, pfn)); set_pageblock_migratetype(page, migratetype); - split_large_buddy(zone, buddy, pfn, order); + split_large_buddy(zone, buddy, pfn, order, FPI_NONE); return true; } =20 @@ -1769,7 +1768,7 @@ bool move_freepages_block_isolate(struct zone *zone, = struct page *page, del_page_from_free_list(page, zone, order, get_pfnblock_migratetype(page, pfn)); set_pageblock_migratetype(page, migratetype); - split_large_buddy(zone, page, pfn, order); + split_large_buddy(zone, page, pfn, order, FPI_NONE); return true; } move: @@ -6482,6 +6481,31 @@ int __alloc_contig_migrate_range(struct compact_cont= rol *cc, return (ret < 0) ? 
ret : 0;
 }
 
+static void split_free_pages(struct list_head *list)
+{
+	int order;
+
+	for (order = 0; order < NR_PAGE_ORDERS; order++) {
+		struct page *page, *next;
+		int nr_pages = 1 << order;
+
+		list_for_each_entry_safe(page, next, &list[order], lru) {
+			int i;
+
+			mark_allocated(page, order, __GFP_MOVABLE);
+			if (!order)
+				continue;
+
+			split_page(page, order);
+
+			/* add all subpages to the order-0 head, in sequence */
+			list_del(&page->lru);
+			for (i = 0; i < nr_pages; i++)
+				list_add_tail(&page[i].lru, &list[0]);
+		}
+	}
+}
+
 /**
  * alloc_contig_range() -- tries to allocate given range of pages
  * @start:	start PFN to allocate
@@ -6594,12 +6618,25 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
 		goto done;
 	}
 
-	/* Free head and tail (if any) */
-	if (start != outer_start)
-		free_contig_range(outer_start, start - outer_start);
-	if (end != outer_end)
-		free_contig_range(end, outer_end - end);
+	if (!(gfp_mask & __GFP_COMP)) {
+		split_free_pages(cc.freepages);
 
+		/* Free head and tail (if any) */
+		if (start != outer_start)
+			free_contig_range(outer_start, start - outer_start);
+		if (end != outer_end)
+			free_contig_range(end, outer_end - end);
+	} else if (start == outer_start && end == outer_end && is_power_of_2(end - start)) {
+		struct page *head = pfn_to_page(start);
+		int order = ilog2(end - start);
+
+		check_new_pages(head, order);
+		prep_new_page(head, order, gfp_mask, 0);
+	} else {
+		ret = -EINVAL;
+		WARN(true, "PFN range: requested [%lu, %lu), allocated [%lu, %lu)\n",
+		     start, end, outer_start, outer_end);
+	}
 done:
 	undo_isolate_page_range(start, end, migratetype);
 	return ret;
@@ -6708,6 +6745,18 @@ struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_mask,
 void free_contig_range(unsigned long pfn, unsigned long nr_pages)
 {
 	unsigned long count = 0;
+	struct folio *folio = pfn_folio(pfn);
+
+	if (folio_test_large(folio)) {
+		int expected = folio_nr_pages(folio);
+
+		if (nr_pages == expected)
+			folio_put(folio);
+		else
+			WARN(true, "PFN %lu: nr_pages %lu != expected %d\n",
+			     pfn, nr_pages, expected);
+		return;
+	}
 
 	for (; nr_pages--; pfn++) {
 		struct page *page = pfn_to_page(pfn);
-- 
2.46.0.76.ge559c4bf1a-goog
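A minimal caller-side sketch of the interface change above -- not part of the posted patch. It assumes CONFIG_CONTIG_ALLOC, and the demo_* helper names are purely illustrative; with __GFP_COMP the contiguous range comes back as a single large folio, and free_contig_range() now accepts it:

	#include <linux/gfp.h>
	#include <linux/mm.h>

	/* Illustrative only: allocate 2^order contiguous pages as one folio. */
	static struct folio *demo_alloc_contig_folio(unsigned int order, int nid)
	{
		struct page *page;

		/* __GFP_COMP asks the contig allocator to return a prepared folio. */
		page = alloc_contig_pages(1 << order, GFP_KERNEL | __GFP_COMP, nid, NULL);

		return page ? page_folio(page) : NULL;
	}

	/* Illustrative only: free_contig_range() recognizes large folios and drops them via folio_put(). */
	static void demo_free_contig_folio(struct folio *folio)
	{
		free_contig_range(folio_pfn(folio), folio_nr_pages(folio));
	}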
From nobody Sat Feb 7 10:16:20 2026
Date: Sun, 11 Aug 2024 15:21:28 -0600
In-Reply-To: <20240811212129.3074314-1-yuzhao@google.com>
Mime-Version: 1.0
References: <20240811212129.3074314-1-yuzhao@google.com>
Message-ID: <20240811212129.3074314-3-yuzhao@google.com>
Subject: [PATCH mm-unstable v1 2/3] mm/cma: add cma_alloc_folio()
From: Yu Zhao <yuzhao@google.com>
To: Andrew Morton, Muchun Song
Cc: "Matthew Wilcox (Oracle)", Zi Yan, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Yu Zhao
Content-Type: text/plain; charset="utf-8"

With alloc_contig_range() and free_contig_range() now supporting large
folios, CMA can allocate and free large folios as well, via the new
cma_alloc_folio() and the existing cma_release().

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 include/linux/cma.h |  1 +
 mm/cma.c            | 47 ++++++++++++++++++++++++++++++---------------
 2 files changed, 33 insertions(+), 15 deletions(-)

diff --git a/include/linux/cma.h b/include/linux/cma.h
index 9db877506ea8..086553fbda73 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -46,6 +46,7 @@ extern int cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
 					struct cma **res_cma);
 extern struct page *cma_alloc(struct cma *cma, unsigned long count, unsigned int align, bool no_warn);
+extern struct folio *cma_alloc_folio(struct cma *cma, int order, gfp_t gfp);
 extern bool cma_pages_valid(struct cma *cma, const struct page *pages, unsigned long count);
 extern bool cma_release(struct cma *cma, const struct page *pages, unsigned long count);
 
diff --git a/mm/cma.c b/mm/cma.c
index 95d6950e177b..46feb06db8e7 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -403,18 +403,8 @@ static void cma_debug_show_areas(struct cma *cma)
 	spin_unlock_irq(&cma->lock);
 }
 
-/**
- * cma_alloc() - allocate pages from contiguous area
- * @cma:   Contiguous memory region for which the allocation is performed.
- * @count: Requested number of pages.
- * @align: Requested alignment of pages (in PAGE_SIZE order).
- * @no_warn: Avoid printing message about failed allocation
- *
- * This function allocates part of contiguous memory on specific
- * contiguous memory area.
- */
-struct page *cma_alloc(struct cma *cma, unsigned long count,
-		       unsigned int align, bool no_warn)
+static struct page *__cma_alloc(struct cma *cma, unsigned long count,
+				unsigned int align, gfp_t gfp)
 {
 	unsigned long mask, offset;
 	unsigned long pfn = -1;
@@ -463,8 +453,7 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
 
 		pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
 		mutex_lock(&cma_mutex);
-		ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA,
-				     GFP_KERNEL | (no_warn ? __GFP_NOWARN : 0));
+		ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA, gfp);
 		mutex_unlock(&cma_mutex);
 		if (ret == 0) {
 			page = pfn_to_page(pfn);
@@ -494,7 +483,7 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
 			page_kasan_tag_reset(nth_page(page, i));
 	}
 
-	if (ret && !no_warn) {
+	if (ret && !(gfp & __GFP_NOWARN)) {
 		pr_err_ratelimited("%s: %s: alloc failed, req-size: %lu pages, ret: %d\n",
 				   __func__, cma->name, count, ret);
 		cma_debug_show_areas(cma);
@@ -513,6 +502,34 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
 	return page;
 }
 
+/**
+ * cma_alloc() - allocate pages from contiguous area
+ * @cma:   Contiguous memory region for which the allocation is performed.
+ * @count: Requested number of pages.
+ * @align: Requested alignment of pages (in PAGE_SIZE order).
+ * @no_warn: Avoid printing message about failed allocation
+ *
+ * This function allocates part of contiguous memory on specific
+ * contiguous memory area.
+ */
+struct page *cma_alloc(struct cma *cma, unsigned long count,
+		       unsigned int align, bool no_warn)
+{
+	return __cma_alloc(cma, count, align, GFP_KERNEL | (no_warn ? __GFP_NOWARN : 0));
+}
+
+struct folio *cma_alloc_folio(struct cma *cma, int order, gfp_t gfp)
+{
+	struct page *page;
+
+	if (WARN_ON(order && !(gfp & __GFP_COMP)))
+		return NULL;
+
+	page = __cma_alloc(cma, 1 << order, order, gfp);
+
+	return page ? page_folio(page) : NULL;
+}
+
 bool cma_pages_valid(struct cma *cma, const struct page *pages,
 		     unsigned long count)
 {
-- 
2.46.0.76.ge559c4bf1a-goog
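A minimal usage sketch for the helper added above -- not part of the posted patch. The demo_* wrappers are illustrative; per the description, freeing still goes through the existing cma_release(), passing the folio's head page and page count:

	#include <linux/cma.h>
	#include <linux/gfp.h>

	/* Illustrative only: order > 0 requires __GFP_COMP, as cma_alloc_folio() checks. */
	static struct folio *demo_cma_alloc_folio(struct cma *cma, int order)
	{
		return cma_alloc_folio(cma, order, GFP_KERNEL | __GFP_COMP | __GFP_NOWARN);
	}

	/* Illustrative only: release the folio back to its CMA area. */
	static void demo_cma_free_folio(struct cma *cma, struct folio *folio)
	{
		cma_release(cma, &folio->page, folio_nr_pages(folio));
	}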
From nobody Sat Feb 7 10:16:20 2026
Date: Sun, 11 Aug 2024 15:21:29 -0600
In-Reply-To: <20240811212129.3074314-1-yuzhao@google.com>
Mime-Version: 1.0
References: <20240811212129.3074314-1-yuzhao@google.com>
Message-ID: <20240811212129.3074314-4-yuzhao@google.com>
Subject: [PATCH mm-unstable v1 3/3] mm/hugetlb: use __GFP_COMP for gigantic folios
From: Yu Zhao <yuzhao@google.com>
To: Andrew Morton, Muchun Song
Cc: "Matthew Wilcox (Oracle)", Zi Yan, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Yu Zhao
Content-Type: text/plain; charset="utf-8"

Use __GFP_COMP for gigantic folios to greatly reduce not only the code
but also the allocation and free time.

LOC (approximately): -200, +50

Allocate and free 500 1GB hugeTLB folios without HVO by:

  time echo 500 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
  time echo 0 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages

         Before  After
  Alloc  ~13s    ~10s
  Free   ~15s    <1s

The above magnitudes generally hold across multiple x86 and arm64 CPU
models.

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 include/linux/hugetlb.h |   9 +-
 mm/hugetlb.c            | 244 ++++++++--------------------------------
 2 files changed, 50 insertions(+), 203 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 3100a52ceb73..98c47c394b89 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -896,10 +896,11 @@ static inline bool hugepage_movable_supported(struct hstate *h)
 /* Movability of hugepages depends on migration support. */
 static inline gfp_t htlb_alloc_mask(struct hstate *h)
 {
-	if (hugepage_movable_supported(h))
-		return GFP_HIGHUSER_MOVABLE;
-	else
-		return GFP_HIGHUSER;
+	gfp_t gfp = __GFP_COMP | __GFP_NOWARN;
+
+	gfp |= hugepage_movable_supported(h) ?
GFP_HIGHUSER_MOVABLE : GFP_HIGHU= SER; + + return gfp; } =20 static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mas= k) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 1c13e65ab119..691f63408d50 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1512,43 +1512,7 @@ static int hstate_next_node_to_free(struct hstate *h= , nodemask_t *nodes_allowed) ((node =3D hstate_next_node_to_free(hs, mask)) || 1); \ nr_nodes--) =20 -/* used to demote non-gigantic_huge pages as well */ -static void __destroy_compound_gigantic_folio(struct folio *folio, - unsigned int order, bool demote) -{ - int i; - int nr_pages =3D 1 << order; - struct page *p; - - atomic_set(&folio->_entire_mapcount, 0); - atomic_set(&folio->_large_mapcount, 0); - atomic_set(&folio->_pincount, 0); - - for (i =3D 1; i < nr_pages; i++) { - p =3D folio_page(folio, i); - p->flags &=3D ~PAGE_FLAGS_CHECK_AT_FREE; - p->mapping =3D NULL; - clear_compound_head(p); - if (!demote) - set_page_refcounted(p); - } - - __folio_clear_head(folio); -} - -static void destroy_compound_hugetlb_folio_for_demote(struct folio *folio, - unsigned int order) -{ - __destroy_compound_gigantic_folio(folio, order, true); -} - #ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE -static void destroy_compound_gigantic_folio(struct folio *folio, - unsigned int order) -{ - __destroy_compound_gigantic_folio(folio, order, false); -} - static void free_gigantic_folio(struct folio *folio, unsigned int order) { /* @@ -1569,38 +1533,52 @@ static void free_gigantic_folio(struct folio *folio= , unsigned int order) static struct folio *alloc_gigantic_folio(struct hstate *h, gfp_t gfp_mask, int nid, nodemask_t *nodemask) { - struct page *page; - unsigned long nr_pages =3D pages_per_huge_page(h); + struct folio *folio; + int order =3D huge_page_order(h); + bool retry =3D false; + if (nid =3D=3D NUMA_NO_NODE) nid =3D numa_mem_id(); - +retry: + folio =3D NULL; #ifdef CONFIG_CMA { int node; =20 - if (hugetlb_cma[nid]) { - page =3D cma_alloc(hugetlb_cma[nid], nr_pages, - huge_page_order(h), true); - if (page) - return page_folio(page); - } + if (hugetlb_cma[nid]) + folio =3D cma_alloc_folio(hugetlb_cma[nid], order, gfp_mask); =20 - if (!(gfp_mask & __GFP_THISNODE)) { + if (!folio && !(gfp_mask & __GFP_THISNODE)) { for_each_node_mask(node, *nodemask) { if (node =3D=3D nid || !hugetlb_cma[node]) continue; =20 - page =3D cma_alloc(hugetlb_cma[node], nr_pages, - huge_page_order(h), true); - if (page) - return page_folio(page); + folio =3D cma_alloc_folio(hugetlb_cma[node], order, gfp_mask); + if (folio) + break; } } } #endif + if (!folio) { + struct page *page =3D alloc_contig_pages(1 << order, gfp_mask, nid, node= mask); =20 - page =3D alloc_contig_pages(nr_pages, gfp_mask, nid, nodemask); - return page ? 
page_folio(page) : NULL; + if (!page) + return NULL; + + folio =3D page_folio(page); + } + + if (folio_ref_freeze(folio, 1)) + return folio; + + pr_warn("HugeTLB: unexpected refcount on PFN %lu\n", folio_pfn(folio)); + free_gigantic_folio(folio, order); + if (!retry) { + retry =3D true; + goto retry; + } + return NULL; } =20 #else /* !CONFIG_CONTIG_ALLOC */ @@ -1619,8 +1597,6 @@ static struct folio *alloc_gigantic_folio(struct hsta= te *h, gfp_t gfp_mask, } static inline void free_gigantic_folio(struct folio *folio, unsigned int order) { } -static inline void destroy_compound_gigantic_folio(struct folio *folio, - unsigned int order) { } #endif =20 /* @@ -1747,19 +1723,17 @@ static void __update_and_free_hugetlb_folio(struct = hstate *h, folio_clear_hugetlb_hwpoison(folio); =20 folio_ref_unfreeze(folio, 1); + INIT_LIST_HEAD(&folio->_deferred_list); =20 /* * Non-gigantic pages demoted from CMA allocated gigantic pages * need to be given back to CMA in free_gigantic_folio. */ if (hstate_is_gigantic(h) || - hugetlb_cma_folio(folio, huge_page_order(h))) { - destroy_compound_gigantic_folio(folio, huge_page_order(h)); + hugetlb_cma_folio(folio, huge_page_order(h))) free_gigantic_folio(folio, huge_page_order(h)); - } else { - INIT_LIST_HEAD(&folio->_deferred_list); + else folio_put(folio); - } } =20 /* @@ -2032,95 +2006,6 @@ static void prep_new_hugetlb_folio(struct hstate *h,= struct folio *folio, int ni spin_unlock_irq(&hugetlb_lock); } =20 -static bool __prep_compound_gigantic_folio(struct folio *folio, - unsigned int order, bool demote) -{ - int i, j; - int nr_pages =3D 1 << order; - struct page *p; - - __folio_clear_reserved(folio); - for (i =3D 0; i < nr_pages; i++) { - p =3D folio_page(folio, i); - - /* - * For gigantic hugepages allocated through bootmem at - * boot, it's safer to be consistent with the not-gigantic - * hugepages and clear the PG_reserved bit from all tail pages - * too. Otherwise drivers using get_user_pages() to access tail - * pages may get the reference counting wrong if they see - * PG_reserved set on a tail page (despite the head page not - * having PG_reserved set). Enforcing this consistency between - * head and tail pages allows drivers to optimize away a check - * on the head page when they need know if put_page() is needed - * after get_user_pages(). - */ - if (i !=3D 0) /* head page cleared above */ - __ClearPageReserved(p); - /* - * Subtle and very unlikely - * - * Gigantic 'page allocators' such as memblock or cma will - * return a set of pages with each page ref counted. We need - * to turn this set of pages into a compound page with tail - * page ref counts set to zero. Code such as speculative page - * cache adding could take a ref on a 'to be' tail page. - * We need to respect any increased ref count, and only set - * the ref count to zero if count is currently 1. If count - * is not 1, we return an error. An error return indicates - * the set of pages can not be converted to a gigantic page. - * The caller who allocated the pages should then discard the - * pages using the appropriate free interface. - * - * In the case of demote, the ref count will be zero. 
- */ - if (!demote) { - if (!page_ref_freeze(p, 1)) { - pr_warn("HugeTLB page can not be used due to unexpected inflated ref c= ount\n"); - goto out_error; - } - } else { - VM_BUG_ON_PAGE(page_count(p), p); - } - if (i !=3D 0) - set_compound_head(p, &folio->page); - } - __folio_set_head(folio); - /* we rely on prep_new_hugetlb_folio to set the hugetlb flag */ - folio_set_order(folio, order); - atomic_set(&folio->_entire_mapcount, -1); - atomic_set(&folio->_large_mapcount, -1); - atomic_set(&folio->_pincount, 0); - return true; - -out_error: - /* undo page modifications made above */ - for (j =3D 0; j < i; j++) { - p =3D folio_page(folio, j); - if (j !=3D 0) - clear_compound_head(p); - set_page_refcounted(p); - } - /* need to clear PG_reserved on remaining tail pages */ - for (; j < nr_pages; j++) { - p =3D folio_page(folio, j); - __ClearPageReserved(p); - } - return false; -} - -static bool prep_compound_gigantic_folio(struct folio *folio, - unsigned int order) -{ - return __prep_compound_gigantic_folio(folio, order, false); -} - -static bool prep_compound_gigantic_folio_for_demote(struct folio *folio, - unsigned int order) -{ - return __prep_compound_gigantic_folio(folio, order, true); -} - /* * Find and lock address space (mapping) in write mode. * @@ -2159,7 +2044,6 @@ static struct folio *alloc_buddy_hugetlb_folio(struct= hstate *h, */ if (node_alloc_noretry && node_isset(nid, *node_alloc_noretry)) alloc_try_hard =3D false; - gfp_mask |=3D __GFP_COMP|__GFP_NOWARN; if (alloc_try_hard) gfp_mask |=3D __GFP_RETRY_MAYFAIL; if (nid =3D=3D NUMA_NO_NODE) @@ -2206,48 +2090,14 @@ static struct folio *alloc_buddy_hugetlb_folio(stru= ct hstate *h, return folio; } =20 -static struct folio *__alloc_fresh_hugetlb_folio(struct hstate *h, - gfp_t gfp_mask, int nid, nodemask_t *nmask, - nodemask_t *node_alloc_noretry) -{ - struct folio *folio; - bool retry =3D false; - -retry: - if (hstate_is_gigantic(h)) - folio =3D alloc_gigantic_folio(h, gfp_mask, nid, nmask); - else - folio =3D alloc_buddy_hugetlb_folio(h, gfp_mask, - nid, nmask, node_alloc_noretry); - if (!folio) - return NULL; - - if (hstate_is_gigantic(h)) { - if (!prep_compound_gigantic_folio(folio, huge_page_order(h))) { - /* - * Rare failure to convert pages to compound page. - * Free pages and try again - ONCE! - */ - free_gigantic_folio(folio, huge_page_order(h)); - if (!retry) { - retry =3D true; - goto retry; - } - return NULL; - } - } - - return folio; -} - static struct folio *only_alloc_fresh_hugetlb_folio(struct hstate *h, gfp_t gfp_mask, int nid, nodemask_t *nmask, nodemask_t *node_alloc_noretry) { struct folio *folio; =20 - folio =3D __alloc_fresh_hugetlb_folio(h, gfp_mask, nid, nmask, - node_alloc_noretry); + folio =3D hstate_is_gigantic(h) ? alloc_gigantic_folio(h, gfp_mask, nid, = nmask) : + alloc_buddy_hugetlb_folio(h, gfp_mask, nid, nmask, node_alloc_noretry); if (folio) init_new_hugetlb_folio(h, folio); return folio; @@ -2265,7 +2115,8 @@ static struct folio *alloc_fresh_hugetlb_folio(struct= hstate *h, { struct folio *folio; =20 - folio =3D __alloc_fresh_hugetlb_folio(h, gfp_mask, nid, nmask, NULL); + folio =3D hstate_is_gigantic(h) ? 
alloc_gigantic_folio(h, gfp_mask, nid, = nmask) : + alloc_buddy_hugetlb_folio(h, gfp_mask, nid, nmask, NULL); if (!folio) return NULL; =20 @@ -2549,9 +2400,8 @@ struct folio *alloc_buddy_hugetlb_folio_with_mpol(str= uct hstate *h, =20 nid =3D huge_node(vma, addr, gfp_mask, &mpol, &nodemask); if (mpol_is_preferred_many(mpol)) { - gfp_t gfp =3D gfp_mask | __GFP_NOWARN; + gfp_t gfp =3D gfp_mask & ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL); =20 - gfp &=3D ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL); folio =3D alloc_surplus_hugetlb_folio(h, gfp, nid, nodemask); =20 /* Fallback to all nodes if page=3D=3DNULL */ @@ -3333,6 +3183,7 @@ static void __init hugetlb_folio_init_tail_vmemmap(st= ruct folio *folio, for (pfn =3D head_pfn + start_page_number; pfn < end_pfn; pfn++) { struct page *page =3D pfn_to_page(pfn); =20 + __ClearPageReserved(folio_page(folio, pfn - head_pfn)); __init_single_page(page, pfn, zone, nid); prep_compound_tail((struct page *)folio, pfn - head_pfn); ret =3D page_ref_freeze(page, 1); @@ -3949,21 +3800,16 @@ static long demote_free_hugetlb_folios(struct hstat= e *src, struct hstate *dst, continue; =20 list_del(&folio->lru); - /* - * Use destroy_compound_hugetlb_folio_for_demote for all huge page - * sizes as it will not ref count folios. - */ - destroy_compound_hugetlb_folio_for_demote(folio, huge_page_order(src)); + + split_page_owner(&folio->page, huge_page_order(src), huge_page_order(dst= )); + pgalloc_tag_split(&folio->page, 1 << huge_page_order(src)); =20 for (i =3D 0; i < pages_per_huge_page(src); i +=3D pages_per_huge_page(d= st)) { struct page *page =3D folio_page(folio, i); =20 - if (hstate_is_gigantic(dst)) - prep_compound_gigantic_folio_for_demote(page_folio(page), - dst->order); - else - prep_compound_page(page, dst->order); - set_page_private(page, 0); + page->mapping =3D NULL; + clear_compound_head(page); + prep_compound_page(page, dst->order); =20 init_new_hugetlb_folio(dst, page_folio(page)); list_add(&page->lru, &dst_list); --=20 2.46.0.76.ge559c4bf1a-goog
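For readers skimming the series, a condensed, non-authoritative sketch of the gigantic-folio allocation flow this patch switches to. The CMA node-fallback loop and the retry-once path of the real alloc_gigantic_folio() are omitted, and gfp is assumed to carry __GFP_COMP (which htlb_alloc_mask() now sets); the demo_* name is illustrative:

	static struct folio *demo_alloc_gigantic(struct hstate *h, gfp_t gfp,
						 int nid, nodemask_t *nodemask)
	{
		int order = huge_page_order(h);
		struct folio *folio = NULL;

	#ifdef CONFIG_CMA
		/* Try the per-node CMA area first, as the patch does. */
		if (hugetlb_cma[nid])
			folio = cma_alloc_folio(hugetlb_cma[nid], order, gfp);
	#endif
		if (!folio) {
			/* Fall back to the generic contiguous allocator. */
			struct page *page = alloc_contig_pages(1 << order, gfp, nid, nodemask);

			if (!page)
				return NULL;
			folio = page_folio(page);
		}

		/* hugeTLB expects fresh folios frozen at refcount zero. */
		if (!folio_ref_freeze(folio, 1)) {
			free_gigantic_folio(folio, order);
			return NULL;
		}
		return folio;
	}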