From nobody Mon Jun  8 08:36:53 2026
Received: from outbound.mr.icloud.com (mr-2001f-snip4-5.eps.apple.com
 [57.103.68.58])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id B921522259F
	for <linux-kernel@vger.kernel.org>; Sun, 31 May 2026 04:27:49 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=57.103.68.58
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1780201671; cv=none;
 b=QNdFKsXGAtE2xXnaEPdPCPM71T1ogPceWC5YxotbdnYSy3FlGOPeub/YFUhgBOPBqEhbls0njysXEXNmUlr911y5H4VH20LmT3/AHoXf7yp8Dz5/R6L1JPEmyhI01/c0KYsJY1cVVJJeKZaWMitnS+CfGWrBevhjFuaAed5Jp40=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1780201671; c=relaxed/simple;
	bh=084sMr3iQzhpJqGMXyuB8m7l4qfHvL4h3EmXTtfZBiY=;
	h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References:
	 In-Reply-To:To:Cc;
 b=WHiHEIP4JzN9rzUHIV3AoJQ4JL/7hiQjJBpR8vhRba2Ltd3AM3IO/PJoQiTDTT+1B5T6xki1ftoQuxNBxZijvrdqj7enrGM8G4Us1iKhfL6dOjELvuRQgJ3hr1pMaaOJuAA6CY4RE3NWe/4F9bcB2/MHG7ZCDa+Dm35axjrD7ss=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=quarantine dis=none) header.from=icloud.com;
 spf=pass smtp.mailfrom=icloud.com;
 dkim=pass (2048-bit key) header.d=icloud.com header.i=@icloud.com
 header.b=mVBpWhAU; arc=none smtp.client-ip=57.103.68.58
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=quarantine dis=none) header.from=icloud.com
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=icloud.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=icloud.com header.i=@icloud.com
 header.b="mVBpWhAU"
Received: from outbound.mr.icloud.com (unknown [127.0.0.2])
	by p00-icloudmta-asmtp-us-west-2a-100-percent-4 (Postfix) with ESMTPS id
 4819C1800158;
	Sun, 31 May 2026 04:27:45 +0000 (UTC)
X-ICL-Out-Info: 
 HUtFAUMEWwJACUgBTUQeDx5WFlZNRAJCTQhJB0MFXwteDUAdVAVLVxQEFEYGVg1dE0wLcwRUB10FXVZQAlpLVBQEEVABWB5WXloXXk1FCA9CAVhbCFsEDx9MDFECQgVWXlQKHQRUB10FXVZQAlpLQgRLRWhcBVwcQBdIHV9qS1YUBBFQAVgeVl5aF15NWgJWTQVKA18BWwdDCFVHBUc0UR9VFFIdRA5tGFAWR0BBWh9BFEAFWwRYCxNdTFBfVitGFVcbVgNDRVEfVEYTGU4bV01QG18CQg8=
Dkim-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=icloud.com; s=1a1hai;
 t=1780201669; x=1782793669; bh=8oGvyeNEYGpSfA/4MLm942YXzmR0k9AWn0ouuVGHwI8=;
 h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:To:x-icloud-hme;
 b=mVBpWhAU74vlRlwTdWZFB5cADCcKL72fGpcQOLiXThFFL/fCQydsljYkaAcyoTz4ZOlEarh2qIu45z9UAQG8KBkhP9JHGyNkGMC7nDvci0NnO/j3xUJRzoDH7lc7jbo4uSUxSVbjl6n/6SYrkb0H/76K64MDr+kCxsB15E7hZpPhbGvI6Rxa7aeIh781NGr7SdfCRwOWEeHpMB9XeMG5XvCtNRMPjP+0/BCvcLLF26IKhdNK26VI7MeFKb4reymh6eoN42HCpqNQG7y58dFwxeZSyo8T0fKPVusxvJiHoNgVeBQWp/hds0f008wpxIBKZX6WwrQ0mWq+56VLxgYv4w==
Received: from [127.0.0.1] (unknown [17.57.152.38])
	by p00-icloudmta-asmtp-us-west-2a-100-percent-4 (Postfix) with ESMTPSA id
 9F6201800119;
	Sun, 31 May 2026 04:27:34 +0000 (UTC)
From: Luka Bai <lukafocus@icloud.com>
Date: Sun, 31 May 2026 12:27:17 +0800
Subject: [PATCH 1/5] mm/khugepaged: add framework for khugepaged collapse
 hint
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Message-Id: <20260531-thp_collapse_hint-v1-1-866339cd4c2a@tencent.com>
References: <20260531-thp_collapse_hint-v1-0-866339cd4c2a@tencent.com>
In-Reply-To: <20260531-thp_collapse_hint-v1-0-866339cd4c2a@tencent.com>
To: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
 David Hildenbrand <david@kernel.org>, Lorenzo Stoakes <ljs@kernel.org>,
 Zi Yan <ziy@nvidia.com>, Baolin Wang <baolin.wang@linux.alibaba.com>,
 "Liam R. Howlett" <liam@infradead.org>, Nico Pache <npache@redhat.com>,
 Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
 Barry Song <baohua@kernel.org>, Lance Yang <lance.yang@linux.dev>,
 Vlastimil Babka <vbabka@kernel.org>, Mike Rapoport <rppt@kernel.org>,
 Suren Baghdasaryan <surenb@google.com>, Michal Hocko <mhocko@suse.com>,
 Kairui Song <kasong@tencent.com>, Qi Zheng <qi.zheng@linux.dev>,
 Shakeel Butt <shakeel.butt@linux.dev>,
 Axel Rasmussen <axelrasmussen@google.com>, Yuanchu Xie <yuanchu@google.com>,
 Wei Xu <weixugc@google.com>, Rik van Riel <riel@surriel.com>,
 Harry Yoo <harry@kernel.org>, Jann Horn <jannh@google.com>,
 Johannes Weiner <hannes@cmpxchg.org>, linux-kernel@vger.kernel.org,
 Luka Bai <lukabai@tencent.com>
X-Mailer: b4 0.15.2
X-Developer-Signature: v=1; a=ed25519-sha256; t=1780201643; l=20295;
 i=lukabai@tencent.com; s=20260501; h=from:subject:message-id;
 bh=Qc6j7M54Jdgp5Rr/qpD9Ky5jlwdbb7ICSoG/i3XfXBk=;
 b=pNNF5Uh/IkP1VMu0QtLH/9u5na1YR9uuBGE0RujWThhBjAlfDjzNUGZg/xAiXGn+GHGREofIW
 nwZUwJtqxZ9DXQvZDRrhZSNJyjpy4+7ie7rI755fB/WY5PPqEqmwlfY
X-Developer-Key: i=lukabai@tencent.com; a=ed25519;
 pk=KeaVteSWd00GIAjFyWZnuFsKAKixjga1ZkLMcI66nPM=
X-Proofpoint-ORIG-GUID: CRy-x07Sgl5wvKKtz6DT6bjsmscF4q-U
X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwNTMxMDA0NiBTYWx0ZWRfXySqhpiHcFi9l
 +uAwyxXwYAf3ItjJYfX9hGaNV1+1Nn9bTpmFXK25Zg24OlSznqX3O9WPvVV3sC7gwQbe03MmZfQ
 FWqLTa7CFWooqPujABORjVZPlkwoaQH7QK9iGos1VxYJKgnzYQhjtnDm/bAH7jyeHKyHgU6Lh4r
 iTdZ0ALxs/HPULds4yveoj44CnzQHzAOQx+xwKYKGIKdpYTQAFk3W/C293EYrivf2VortYAzaY+
 SXYZvOO6zMcW4pm1LDmPKiRAoEXJ96kj/nvLCnfu49NW47KX/c6LqzBlsffPad32cNDZ7MEyYMu
 49H7yfzI1YetwJwn3+koVuXkQYl4T0rDl7AWx6G/Ua1uNIusvN9mso1mB+tak4=
X-Proofpoint-GUID: CRy-x07Sgl5wvKKtz6DT6bjsmscF4q-U
X-Authority-Info-Out: v=2.4 cv=ULfQ3Sfy c=1 sm=1 tr=0 ts=6a1bb8c4
 cx=c_apl:c_pps:t_out a=9OgfyREA4BUYbbCgc0Y0oA==:117
 a=9OgfyREA4BUYbbCgc0Y0oA==:17 a=IkcTkHD0fZMA:10 a=NGcC8JguVDcA:10
 a=x7bEGLp0ZPQA:10 a=UaoJkeuwEpQA:10 a=VkNPw1HP01LnGYTKEx00:22
 a=GvQkQWPkAAAA:8 a=ptP84xlJL3H7308lbwQA:9 a=QEXdDO2ut3YA:10

From: Luka Bai <lukabai@tencent.com>

Currently khugepaged simply scans all eligible mm_structs round-robin
when collapsing. This is inefficient when the address space is huge,
and it may waste precious large folios on cold memory areas that are
seldom accessed. At the same time, khugepaged remains a very useful
tool for asynchronous large folio collapse.

This patch therefore introduces a collapse hint framework that lets
khugepaged prioritize hot memory areas when collapsing. An indication
of a hot area is called a "collapse hint". Each collapse hint carries
an address and a vma that together identify a specific hot area that
should preferably be collapsed. Hints are aggregated by both priority
and owning mm_struct. When khugepaged tries to collapse, it first
scans the global priority queues that store these hints and finds the
first khugepaged_mm_slot that has hints (struct khugepaged_mm_slot is
new and wraps the old per-mm mm_slot), then tries to collapse at the
address given by each hint. An example is shown below (mm_slot stands
for the khugepaged_mm_slot mentioned above):

prio 0 ------()----------------------------------()---------------
            mm_slot0(process A)                mm_slot1(process B)
               |                                               |
           hint0---hint1---hint2---hint3       hint4---hint5---hint6

prio 1 ------()----------------------------------()---------------
            mm_slot0(process A)                mm_slot1(process B)
               |                                               |
             -------                                       hint7---hint8

khugepaged first scans the prio 0 queue (a lower prio number means a
higher priority). It walks the list and checks the first
khugepaged_mm_slot, which is mm_slot0, then goes through all the hints
in it (hint0 ~ hint3 in the diagram above). Once a hint has been
handled, it is deleted, no matter whether the collapse succeeded or
failed. If a khugepaged_mm_slot has no hints, khugepaged moves on to
the next mm_slot; if no hints remain in prio 0, khugepaged scans
prio 1; if no hints remain in any priority queue, it falls back to
round-robin scanning as before.
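
The drain order described above can be illustrated with a small
userspace model. This is only a sketch and not the kernel code: the
model_* names and MODEL_MAX_HINTS are hypothetical, plain arrays stand
in for the kernel lists, and the failure threshold and locking are
left out. The budget parameter plays the role of progress_max.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_PRIO   2  /* mirrors NR_KHUGEPAGED_PRIORITY_LEVEL */
#define MODEL_MAX_HINTS 8  /* hypothetical per-queue capacity */

/* FIFO of hint ids for one mm at one priority (hypothetical model). */
struct model_requests {
	int ids[MODEL_MAX_HINTS];
	size_t head, tail;
};

/* One enrolled mm: one request FIFO per priority level. */
struct model_mm_slot {
	struct model_requests request[MODEL_NR_PRIO];
};

/* Append a hint id; silently dropped when the FIFO is full. */
static void model_add_hint(struct model_mm_slot *slot, int prio, int id)
{
	struct model_requests *req = &slot->request[prio];

	if (req->tail < MODEL_MAX_HINTS)
		req->ids[req->tail++] = id;
}

/*
 * Drain hints in the order: priority first, then mm slot, then FIFO
 * order within the slot. A hint is consumed once it is handled.
 * Stops after @budget hints; the rest stay queued for the next pass.
 * Returns the number of hint ids written to @out.
 */
static size_t model_drain(struct model_mm_slot *slots, size_t nr_slots,
			  size_t budget, int *out)
{
	size_t n = 0;

	for (int prio = 0; prio < MODEL_NR_PRIO; prio++) {
		for (size_t s = 0; s < nr_slots; s++) {
			struct model_requests *req = &slots[s].request[prio];

			while (req->head < req->tail && n < budget)
				out[n++] = req->ids[req->head++];
		}
	}
	return n;
}
```

Filling the model with the hints from the diagram (hint0 ~ hint3 for
process A at prio 0, hint4 ~ hint6 for process B at prio 0, hint7 and
hint8 for process B at prio 1) and draining with a large enough budget
yields the hints in numerical order.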

Each struct khugepaged_mm_slot embeds NR_KHUGEPAGED_PRIORITY_LEVEL
(currently 2) struct khugepaged_collapse_requests, one per priority
level. Each struct khugepaged_collapse_requests is the node that links
this mm_struct into the global queue of that priority. Giving every
mm_struct its own node in every priority queue leaves room for hint
dispersion and balancing that may be introduced in the future, and
makes for a better locking pattern. Currently the
khugepaged_collapse_requests[] are linked into the global queues in
__khugepaged_enter() and stay there until the mm exits khugepaged.
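
Because the per-priority nodes are embedded in the slot, the scanner
can get from a queue node back to its owning slot with pointer
arithmetic alone. A minimal userspace sketch of that idea follows; the
model_* types are hypothetical stand-ins, and the offset is computed
by hand here rather than with the kernel's container_of().

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_PRIO 2  /* mirrors NR_KHUGEPAGED_PRIORITY_LEVEL */

/* Stand-in for struct khugepaged_collapse_requests (hypothetical). */
struct model_requests {
	int nr_hints;
};

/* Stand-in for struct khugepaged_mm_slot (hypothetical). */
struct model_mm_slot {
	int mm_id;
	struct model_requests request[MODEL_NR_PRIO];
};

/*
 * Recover the embedding slot from a pointer to request[priority].
 * The caller must pass the same priority the pointer was taken at.
 */
static struct model_mm_slot *model_slot_of(struct model_requests *req,
					   int priority)
{
	size_t off = offsetof(struct model_mm_slot, request) +
		     (size_t)priority * sizeof(struct model_requests);

	return (struct model_mm_slot *)((char *)req - off);
}
```

For a slot s, model_slot_of(&s.request[1], 1) returns &s, so no back
pointer from the request to the slot needs to be stored.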

Callers can use khugepaged_add_collapse_hint() to add a new hint for a
specific mm_struct. This patch does not introduce any callers yet;
they are added in the following patches.
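
As a rough illustration of what the function does with its arguments
before queueing, the sketch below models only the priority range check
and the rounding of the address down to a PMD boundary. It is a
userspace model with hypothetical MODEL_* names, and it assumes a
2 MiB PMD size, which is architecture and configuration dependent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_PRIO   2   /* mirrors NR_KHUGEPAGED_PRIORITY_LEVEL */
#define MODEL_PMD_SHIFT 21  /* assumption: 2 MiB PMD */
#define MODEL_PMD_SIZE  (UINT64_C(1) << MODEL_PMD_SHIFT)
#define MODEL_PMD_MASK  (~(MODEL_PMD_SIZE - 1))

/*
 * Returns false when the priority is out of range, in which case the
 * hint is dropped. Otherwise stores the PMD-aligned address that
 * would be recorded in the hint and returns true.
 */
static bool model_prepare_hint(uint64_t address, int priority,
			       uint64_t *aligned)
{
	if (priority < 0 || priority >= MODEL_NR_PRIO)
		return false;

	*aligned = address & MODEL_PMD_MASK;
	return true;
}
```

Under the 2 MiB assumption, an address of 0x201234 at priority 0 is
recorded as 0x200000, while any priority outside [0, 2) is rejected.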

Signed-off-by: Luka Bai <lukabai@tencent.com>
---
 include/linux/khugepaged.h |  13 ++
 mm/khugepaged.c            | 348 +++++++++++++++++++++++++++++++++++++++++=
+++-
 2 files changed, 355 insertions(+), 6 deletions(-)

diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index d7a9053ff4fe..815ae87f0f8e 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -17,6 +17,10 @@ extern void khugepaged_enter_vma(struct vm_area_struct *=
vma,
 				 vm_flags_t vm_flags);
 extern void khugepaged_min_free_kbytes_update(void);
 extern bool current_is_khugepaged(void);
+extern void khugepaged_add_collapse_hint(struct mm_struct *mm,
+					struct vm_area_struct *vma,
+					unsigned long address,
+					int priority, int max_order);
 void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 		bool install_pmd);
=20
@@ -31,6 +35,9 @@ static inline void khugepaged_exit(struct mm_struct *mm)
 	if (mm_flags_test(MMF_VM_HUGEPAGE, mm))
 		__khugepaged_exit(mm);
 }
+
+#define NR_KHUGEPAGED_PRIORITY_LEVEL 2
+
 #else /* CONFIG_TRANSPARENT_HUGEPAGE */
 static inline void khugepaged_fork(struct mm_struct *mm, struct mm_struct =
*oldmm)
 {
@@ -55,6 +62,12 @@ static inline bool current_is_khugepaged(void)
 {
 	return false;
 }
+static inline void khugepaged_add_collapse_hint(struct mm_struct *mm,
+					       struct vm_area_struct *vma,
+					       unsigned long address,
+					       int priority, int max_order)
+{
+}
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
=20
 #endif /* _LINUX_KHUGEPAGED_H */
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 35a5f8c44c18..5090ffae73f3 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -99,6 +99,8 @@ static DEFINE_READ_MOSTLY_HASHTABLE(mm_slots_hash, MM_SLO=
TS_HASH_BITS);
=20
 static struct kmem_cache *mm_slot_cache __ro_after_init;
=20
+#define KHUGEPAGED_PRIORITY_QUEUE_MAX_FAIL 10
+
 #define KHUGEPAGED_MIN_MTHP_ORDER	2
 /*
  * mthp_collapse() does an iterative DFS over a binary tree, from
@@ -160,6 +162,53 @@ static struct khugepaged_scan khugepaged_scan =3D {
 	.mm_head =3D LIST_HEAD_INIT(khugepaged_scan.mm_head),
 };
=20
+/**
+ * struct khugepaged_collapse_hint - one collapse hint for a specific addr=
ess
+ * @node:    list node on khugepaged_collapse_requests.hints
+ * @vma:     hint pointer to the target VMA
+ * @address: PMD-aligned virtual address inside @vma to attempt collapsing=
 on
+ */
+struct khugepaged_collapse_hint {
+	struct list_head node;
+	struct vm_area_struct *vma;
+	unsigned long address;
+};
+
+/**
+ * struct khugepaged_collapse_requests - per-mm, per-priority collapse hin=
ts list
+ * @node:    list node on the matching khugepaged_priority_queue[] list
+ * @hints:   list of pending struct khugepaged_collapse_hint for this mm at
+ *           this priority level
+ *
+ * Each khugepaged_mm_slot embeds one request struct per priority level. At
+ * __khugepaged_enter() time, every request is added to the corresponding
+ * khugepaged_priority_queue[] list and stays on that list until the mm
+ * exits khugepaged. While queued, hints for the mm at a given priority are
+ * appended to that priority's @hints.
+ */
+struct khugepaged_collapse_requests {
+	struct list_head node;
+	struct list_head hints;
+};
+
+/**
+ * struct khugepaged_mm_slot - khugepaged information per mm that is being=
 scanned
+ * @slot:    hash lookup from mm to mm_slot
+ * @request: per-mm collapse requests, one per priority level, each linked
+ *           into the corresponding khugepaged_priority_queue[] list
+ */
+struct khugepaged_mm_slot {
+	struct mm_slot slot;
+	struct khugepaged_collapse_requests request[NR_KHUGEPAGED_PRIORITY_LEVEL];
+};
+
+/*
+ * One queue per priority level. Lower index means higher priority. The
+ * scanner drains queues in ascending index order, so all hints at higher
+ * priority are processed before any hint at a lower priority.
+ */
+static struct list_head khugepaged_priority_queue[NR_KHUGEPAGED_PRIORITY_L=
EVEL];
+
 #ifdef CONFIG_SYSFS
 static ssize_t scan_sleep_millisecs_show(struct kobject *kobj,
 					 struct kobj_attribute *attr,
@@ -500,10 +549,15 @@ int hugepage_madvise(struct vm_area_struct *vma,
=20
 int __init khugepaged_init(void)
 {
-	mm_slot_cache =3D KMEM_CACHE(mm_slot, 0);
+	int i;
+
+	mm_slot_cache =3D KMEM_CACHE(khugepaged_mm_slot, 0);
 	if (!mm_slot_cache)
 		return -ENOMEM;
=20
+	for (i =3D 0; i < NR_KHUGEPAGED_PRIORITY_LEVEL; i++)
+		INIT_LIST_HEAD(&khugepaged_priority_queue[i]);
+
 	khugepaged_pages_to_scan =3D HPAGE_PMD_NR * 8;
 	khugepaged_max_ptes_none =3D KHUGEPAGED_MAX_PTES_LIMIT;
 	khugepaged_max_ptes_swap =3D HPAGE_PMD_NR / 8;
@@ -560,21 +614,27 @@ static bool hugepage_enabled(void)
=20
 void __khugepaged_enter(struct mm_struct *mm)
 {
+	struct khugepaged_mm_slot *khp_mm_slot;
 	struct mm_slot *slot;
 	int wakeup;
+	int i;
=20
 	/* __khugepaged_exit() must not run from under us */
 	VM_BUG_ON_MM(collapse_test_exit(mm), mm);
=20
-	slot =3D mm_slot_alloc(mm_slot_cache);
-	if (!slot)
+	khp_mm_slot =3D mm_slot_alloc(mm_slot_cache);
+	if (!khp_mm_slot)
 		return;
=20
 	if (unlikely(mm_flags_test_and_set(MMF_VM_HUGEPAGE, mm))) {
-		mm_slot_free(mm_slot_cache, slot);
+		mm_slot_free(mm_slot_cache, khp_mm_slot);
 		return;
 	}
=20
+	slot =3D &khp_mm_slot->slot;
+	for (i =3D 0; i < NR_KHUGEPAGED_PRIORITY_LEVEL; i++)
+		INIT_LIST_HEAD(&khp_mm_slot->request[i].hints);
+
 	spin_lock(&khugepaged_mm_lock);
 	mm_slot_insert(mm_slots_hash, mm, slot);
 	/*
@@ -583,6 +643,12 @@ void __khugepaged_enter(struct mm_struct *mm)
 	 */
 	wakeup =3D list_empty(&khugepaged_scan.mm_head);
 	list_add_tail(&slot->mm_node, &khugepaged_scan.mm_head);
+	/*
+	 * Link this mm into every priority queue.
+	 */
+	for (i =3D 0; i < NR_KHUGEPAGED_PRIORITY_LEVEL; i++)
+		list_add_tail(&khp_mm_slot->request[i].node,
+			      &khugepaged_priority_queue[i]);
 	spin_unlock(&khugepaged_mm_lock);
=20
 	mmgrab(mm);
@@ -613,23 +679,59 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
 		__khugepaged_enter(vma->vm_mm);
 }
=20
+static void khugepaged_release_collapse_hints(
+			  struct khugepaged_collapse_requests *req)
+{
+	struct khugepaged_collapse_hint *hint, *tmp;
+
+	list_for_each_entry_safe(hint, tmp, &req->hints, node) {
+		list_del(&hint->node);
+		kfree(hint);
+	}
+}
+
+/*
+ * Caller must hold khugepaged_mm_lock when removing the request nodes from
+ * the priority queues.
+ */
+static void khugepaged_remove_priority_requests(struct khugepaged_mm_slot =
*khp_mm_slot)
+{
+	int i;
+
+	lockdep_assert_held(&khugepaged_mm_lock);
+	for (i =3D 0; i < NR_KHUGEPAGED_PRIORITY_LEVEL; i++)
+		list_del(&khp_mm_slot->request[i].node);
+}
+
+static void khugepaged_release_all_hints(struct khugepaged_mm_slot *khp_mm=
_slot)
+{
+	int i;
+
+	for (i =3D 0; i < NR_KHUGEPAGED_PRIORITY_LEVEL; i++)
+		khugepaged_release_collapse_hints(&khp_mm_slot->request[i]);
+}
+
 void __khugepaged_exit(struct mm_struct *mm)
 {
+	struct khugepaged_mm_slot *khp_mm_slot =3D NULL;
 	struct mm_slot *slot;
 	int free =3D 0;
=20
 	spin_lock(&khugepaged_mm_lock);
 	slot =3D mm_slot_lookup(mm_slots_hash, mm);
 	if (slot && khugepaged_scan.mm_slot !=3D slot) {
+		khp_mm_slot =3D mm_slot_entry(slot, struct khugepaged_mm_slot, slot);
 		hash_del(&slot->hash);
 		list_del(&slot->mm_node);
+		khugepaged_remove_priority_requests(khp_mm_slot);
 		free =3D 1;
 	}
 	spin_unlock(&khugepaged_mm_lock);
=20
 	if (free) {
 		mm_flags_clear(MMF_VM_HUGEPAGE, mm);
-		mm_slot_free(mm_slot_cache, slot);
+		khugepaged_release_all_hints(khp_mm_slot);
+		mm_slot_free(mm_slot_cache, khp_mm_slot);
 		mmdrop(mm);
 	} else if (slot) {
 		/*
@@ -1804,6 +1906,8 @@ static enum scan_result collapse_scan_pmd(struct mm_s=
truct *mm,
=20
 static void collect_mm_slot(struct mm_slot *slot)
 {
+	struct khugepaged_mm_slot *khp_mm_slot =3D
+		mm_slot_entry(slot, struct khugepaged_mm_slot, slot);
 	struct mm_struct *mm =3D slot->mm;
=20
 	lockdep_assert_held(&khugepaged_mm_lock);
@@ -1812,6 +1916,7 @@ static void collect_mm_slot(struct mm_slot *slot)
 		/* free mm_slot */
 		hash_del(&slot->hash);
 		list_del(&slot->mm_node);
+		khugepaged_remove_priority_requests(khp_mm_slot);
=20
 		/*
 		 * Not strictly needed because the mm exited already.
@@ -1820,7 +1925,8 @@ static void collect_mm_slot(struct mm_slot *slot)
 		 */
=20
 		/* khugepaged_mm_lock actually not necessary for the below */
-		mm_slot_free(mm_slot_cache, slot);
+		khugepaged_release_all_hints(khp_mm_slot);
+		mm_slot_free(mm_slot_cache, khp_mm_slot);
 		mmdrop(mm);
 	}
 }
@@ -2848,6 +2954,211 @@ static enum scan_result collapse_single_pmd(unsigne=
d long addr,
 	return result;
 }
=20
+/*
+ * khugepaged_add_collapse_hint - enqueue a collapse hint
+ * @mm:          target mm
+ * @vma:         hint pointer to the VMA covering @address (treated as a h=
int)
+ * @address:     virtual address; rounded down to HPAGE_PMD_SIZE
+ * @priority:    priority bucket the hint should land in. Lower number =3D=
=3D higher
+ *               priority; must be in [0, NR_KHUGEPAGED_PRIORITY_LEVEL).
+ * @max_order:   max order of continuous pt entries inside this target pmd=
, used
+ *               to decide whether we need to collapse it.
+ *
+ * Tell khugepaged to prioritize collapsing the PMD covering @address in @=
mm.
+ * The next time collapse_scan_mm_slot() runs it will drain these entries
+ * before the regular round-robin scan, walking priority queues from
+ * highest priority (lowest index) to lowest.
+ *
+ * Hints are aggregated per-mm and per-priority: __khugepaged_enter()
+ * pre-installs one collapse_request per priority level on the matching
+ * khugepaged_priority_queue[] list, and this function appends a
+ * (vma, address) hint to the request that matches @priority.
+ *
+ * Caller must keep @vma alive across this call (mmap_lock, per-VMA lock,
+ * or a corresponding rmap-side lock such as anon_vma_lock_read /
+ * i_mmap_lock_read are all sufficient).
+ *
+ * @vma->vm_flags is read with collapse_allowable_orders().  When the
+ * caller does not hold mmap_lock or a per-VMA lock, the result is
+ * advisory; the real validation happens later in
+ * collapse_scan_one_priority_entry() under mmap_read_lock.
+ *
+ * Caller must also guarantee @mm is alive across this call so the underly=
ing
+ * mm_slot cannot be freed while we append.
+ */
+void khugepaged_add_collapse_hint(struct mm_struct *mm,
+				 struct vm_area_struct *vma,
+				 unsigned long address,
+				 int priority, int max_order)
+{
+	struct khugepaged_mm_slot *khp_mm_slot;
+	struct khugepaged_collapse_hint *hint;
+	struct mm_slot *slot;
+	int orders;
+
+	if (!mm || !vma)
+		return;
+	if (priority < 0 || priority >=3D NR_KHUGEPAGED_PRIORITY_LEVEL)
+		return;
+
+	orders =3D collapse_allowable_orders(vma, vma->vm_flags, TVA_KHUGEPAGED);
+	if (highest_order(orders) <=3D max_order)
+		return;
+
+	/*
+	 * Make sure the mm is enrolled in khugepaged so that its embedded
+	 * collapse_request[] entries are on khugepaged_priority_queue[].
+	 */
+	khugepaged_enter_vma(vma, vma->vm_flags);
+	if (!mm_flags_test(MMF_VM_HUGEPAGE, mm))
+		return;
+
+	hint =3D kmalloc_obj(struct khugepaged_collapse_hint);
+	if (!hint)
+		return;
+
+	hint->vma =3D vma;
+	hint->address =3D address & HPAGE_PMD_MASK;
+
+	/*
+	 * Just use try lock to avoid lock contention because collapse hints are
+	 * just "best-effort" optimization.
+	 */
+	if (!spin_trylock(&khugepaged_mm_lock)) {
+		kfree(hint);
+		return;
+	}
+
+	slot =3D mm_slot_lookup(mm_slots_hash, mm);
+	if (!slot) {
+		spin_unlock(&khugepaged_mm_lock);
+		kfree(hint);
+		return;
+	}
+	khp_mm_slot =3D mm_slot_entry(slot, struct khugepaged_mm_slot, slot);
+	list_add_tail(&hint->node, &khp_mm_slot->request[priority].hints);
+	spin_unlock(&khugepaged_mm_lock);
+
+	wake_up_interruptible(&khugepaged_wait);
+}
+
+/*
+ * Each enrolled mm owns one request struct per priority level, all of whi=
ch
+ * live on the matching khugepaged_priority_queue[] list for the lifetime =
of
+ * the mm_slot. The caller iterates priorities from highest to lowest, and
+ * call collapse_scan_one_priority_entry() to process all mms at this prio=
rity,
+ * and handle pending collapse hints for each mm. Repeat until either
+ * @progress_max is reached, the per-mm-slot failure exceeds certain thres=
hold,
+ * or no hints remain for this mm at this priority.
+ *
+ * Caller must hold khugepaged_mm_lock.
+ *
+ * Returns 1 if an mm was processed at this priority, 0 if no mm on
+ * khugepaged_priority_queue[@priority] had any pending hints.
+ */
+static int collapse_scan_one_priority_entry(unsigned int progress_max,
+					     enum scan_result *result,
+					     struct collapse_control *cc,
+					     int priority,
+					     int *fail_count)
+	__releases(&khugepaged_mm_lock)
+	__acquires(&khugepaged_mm_lock)
+{
+	struct khugepaged_collapse_requests *iter_req;
+	struct khugepaged_mm_slot *khp_mm_slot =3D NULL, *iter_slot;
+	struct mm_struct *mm =3D NULL;
+	bool lock_dropped =3D true;
+
+	/*
+	 * We have to call mmget_not_zero() under khugepaged_mm_lock so that
+	 * __khugepaged_exit() cannot free the embedding khugepaged_mm_slot from
+	 * under us once we drop the spinlock.
+	 */
+	list_for_each_entry(iter_req, &khugepaged_priority_queue[priority], node)=
 {
+		if (list_empty(&iter_req->hints))
+			continue;
+		iter_slot =3D container_of(iter_req, struct khugepaged_mm_slot,
+				    request[priority]);
+		if (mmget_not_zero(iter_slot->slot.mm)) {
+			khp_mm_slot =3D iter_slot;
+			mm =3D iter_slot->slot.mm;
+			break;
+		}
+	}
+	if (!khp_mm_slot)
+		return 0;
+
+	spin_unlock(&khugepaged_mm_lock);
+
+	/*
+	 * Drain hints for this mm while we hold mmap_read_lock.
+	 * collapse_single_pmd() may drop the mmap_lock; if so, try once to
+	 * retake it for the next hint.
+	 */
+	while (cc->progress < progress_max &&
+	       *fail_count < KHUGEPAGED_PRIORITY_QUEUE_MAX_FAIL) {
+		struct khugepaged_collapse_hint *hint =3D NULL;
+		struct vm_area_struct *vma;
+		unsigned long addr;
+
+		if (lock_dropped) {
+			if (!mmap_read_trylock(mm)) {
+				(*fail_count)++;
+				continue;
+			}
+			lock_dropped =3D false;
+		}
+
+		spin_lock(&khugepaged_mm_lock);
+		if (!list_empty(&khp_mm_slot->request[priority].hints)) {
+			hint =3D list_first_entry(&khp_mm_slot->request[priority].hints,
+						struct khugepaged_collapse_hint,
+						node);
+			list_del(&hint->node);
+		}
+		spin_unlock(&khugepaged_mm_lock);
+
+		if (!hint)
+			break;
+
+		cc->progress++;
+		addr =3D hint->address;
+
+		if (unlikely(collapse_test_exit_or_disable(mm))) {
+			kfree(hint);
+			break;
+		}
+
+		/*
+		 * Re-validate the cached VMA hint under mmap_read_lock. If the
+		 * address is now covered by a different VMA, or no VMA at all,
+		 * drop the entry. Note that the vma may be a different object
+		 * than the one passed in at enqueue time, but that's a false
+		 * positive that we can safely ignore.
+		 */
+		vma =3D vma_lookup(mm, addr);
+		if (!vma || vma !=3D hint->vma)
+			goto skip_hint;
+		if (!collapse_allowable_orders(vma, vma->vm_flags, TVA_KHUGEPAGED))
+			goto skip_hint;
+		if (addr < ALIGN(vma->vm_start, HPAGE_PMD_SIZE) ||
+		    addr + HPAGE_PMD_SIZE > ALIGN_DOWN(vma->vm_end, HPAGE_PMD_SIZE))
+			goto skip_hint;
+
+		*result =3D collapse_single_pmd(addr, vma, &lock_dropped, cc);
+		if (*result !=3D SCAN_SUCCEED)
+			(*fail_count)++;
+skip_hint:
+		kfree(hint);
+	}
+
+	if (!lock_dropped)
+		mmap_read_unlock(mm);
+	mmput(mm);
+	spin_lock(&khugepaged_mm_lock);
+	return 1;
+}
+
 static void collapse_scan_mm_slot(unsigned int progress_max,
 		enum scan_result *result, struct collapse_control *cc)
 	__releases(&khugepaged_mm_lock)
@@ -2858,10 +3169,35 @@ static void collapse_scan_mm_slot(unsigned int prog=
ress_max,
 	struct mm_struct *mm;
 	struct vm_area_struct *vma;
 	unsigned int progress_prev =3D cc->progress;
+	int priority_queue_fail_times =3D 0;
+	int prio;
=20
 	lockdep_assert_held(&khugepaged_mm_lock);
 	*result =3D SCAN_FAIL;
=20
+	/*
+	 * Drain explicit hints in priority order before the mm_slot scan.
+	 * Iterate priorities from highest (lowest index) to lowest. For each
+	 * priority, handle every mm with hints queued at that priority
+	 * before we move on to the next, lower priority.
+	 */
+	for (prio =3D 0; prio < NR_KHUGEPAGED_PRIORITY_LEVEL; prio++) {
+		while (priority_queue_fail_times < KHUGEPAGED_PRIORITY_QUEUE_MAX_FAIL &&
+			cc->progress < progress_max) {
+			if (collapse_scan_one_priority_entry(progress_max, result, cc,
+				prio, &priority_queue_fail_times) =3D=3D 0)
+				break;
+		}
+
+		if (cc->progress >=3D progress_max ||
+		    priority_queue_fail_times >=3D KHUGEPAGED_PRIORITY_QUEUE_MAX_FAIL)
+			break;
+	}
+
+	if (list_empty(&khugepaged_scan.mm_head) ||
+	    cc->progress >=3D progress_max)
+		return;
+
 	if (khugepaged_scan.mm_slot) {
 		slot =3D khugepaged_scan.mm_slot;
 	} else {

--=20
2.52.0
From nobody Mon Jun  8 08:36:53 2026
Received: from outbound.mr.icloud.com (mr-2001f-snip4-6.eps.apple.com
 [57.103.68.59])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 77EA02D3750
	for <linux-kernel@vger.kernel.org>; Sun, 31 May 2026 04:27:58 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=57.103.68.59
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1780201679; cv=none;
 b=A4YGElZuEGyP2uPwLh34kUHwnT+2sn/12y/lUJRIRBiHj+tN1BeS0abIpIKsuOgiN2rwfCc/Jt4zK4DDSYTm9zHZWH4jTzGqaTXRlmsdaQ2Ng9kRdOPlH+Q/Nq0ZAewruIrY9SF0Ja0bwS7QaDN/dBrobfIEGZmUA9S2fcpPFtc=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1780201679; c=relaxed/simple;
	bh=74V7JE4yvLIG0JN3uxxDarz9QqLQvqPMNULvc4WFvUI=;
	h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References:
	 In-Reply-To:To:Cc;
 b=d1p4NpHp0H/YyIyBxvUzjoj8b5PgEGPeB5aq4KVAWoIqwDwRwG/RSdDXshO4V+5nbeALkcl4fQImT1AcnWvGAufe+t9hOrcsKIymqb7Rjel+FoHLKjKg2N9/iADDUiPfNwlh4+FxAsy3+sNIJ9brWAzBsUi1X1fIAtfbYjEwLbs=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=quarantine dis=none) header.from=icloud.com;
 spf=pass smtp.mailfrom=icloud.com;
 dkim=pass (2048-bit key) header.d=icloud.com header.i=@icloud.com
 header.b=RIb6NQoC; arc=none smtp.client-ip=57.103.68.59
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=quarantine dis=none) header.from=icloud.com
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=icloud.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=icloud.com header.i=@icloud.com
 header.b="RIb6NQoC"
Received: from outbound.mr.icloud.com (unknown [127.0.0.2])
	by p00-icloudmta-asmtp-us-west-2a-100-percent-4 (Postfix) with ESMTPS id
 60997180013D;
	Sun, 31 May 2026 04:27:55 +0000 (UTC)
X-ICL-Out-Info: 
 HUtFAUMEWwJACUgBTUQeDx5WFlZNRAJCTQhJB0MFXwteDUAdVAVLVxQEFEYGVg1dE0wLcwRUB10FXVZQAlpLVBQEEVABWB5WXloXXk1FCA9CAVhbCFsEDx9MDFECQgVWXlQKHQRUB10FXVZQAlpLQgRLRWhcBVwcQBdIHV9qS1YUBBFQAVgeVl5aF15NWgJWTQVKA18BWwdDCFVHBUc0UR9VFFIdRA5tGFAWR0BBWh9CFEAFWwRYCxNdTFBfVitGFVcbVgNDRVEfVEYTGU4bV01QG18CQg8=
Dkim-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=icloud.com; s=1a1hai;
 t=1780201678; x=1782793678; bh=t9h/Ynv08eT/7kBNC5M5hKRxLaGxANkcJA7nimD4RAY=;
 h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:To:x-icloud-hme;
 b=RIb6NQoCWqkSi9d8/57Seu80aFsgI8o7zbmXDvewzfEnKpyd4abxPqmg7JUvcEdPeF767vTiPvjSCZeC0jLxbYlklqg1bxTVQbyqKYETw/2poKWKh2iygi9MUxcuDdGVxms9DL8EN1ugB8CFZOxRk72ep4WXzlvPmTCIP6taHUKVWgmXKwHbhLEfy7Fibr220vL8ZE7LGcwJrrc3yXqw+EgrnSloUvBKfdgXLQSXvMICfXS+cytRgClkr66gQtfqWnwgl91KHwUr8NB9alxHX/rZr7xC+DRvoZgHdccCionoq3Ohsx0BHMnmIGOitW3batTBjAKjd7gSxlQNaoS6+Q==
Received: from [127.0.0.1] (unknown [17.57.152.38])
	by p00-icloudmta-asmtp-us-west-2a-100-percent-4 (Postfix) with ESMTPSA id
 472341800114;
	Sun, 31 May 2026 04:27:44 +0000 (UTC)
From: Luka Bai <lukafocus@icloud.com>
Date: Sun, 31 May 2026 12:27:18 +0800
Subject: [PATCH 2/5] mm/khugepaged: use slab cache instead of normal
 kmalloc
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Message-Id: <20260531-thp_collapse_hint-v1-2-866339cd4c2a@tencent.com>
References: <20260531-thp_collapse_hint-v1-0-866339cd4c2a@tencent.com>
In-Reply-To: <20260531-thp_collapse_hint-v1-0-866339cd4c2a@tencent.com>
To: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
 David Hildenbrand <david@kernel.org>, Lorenzo Stoakes <ljs@kernel.org>,
 Zi Yan <ziy@nvidia.com>, Baolin Wang <baolin.wang@linux.alibaba.com>,
 "Liam R. Howlett" <liam@infradead.org>, Nico Pache <npache@redhat.com>,
 Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
 Barry Song <baohua@kernel.org>, Lance Yang <lance.yang@linux.dev>,
 Vlastimil Babka <vbabka@kernel.org>, Mike Rapoport <rppt@kernel.org>,
 Suren Baghdasaryan <surenb@google.com>, Michal Hocko <mhocko@suse.com>,
 Kairui Song <kasong@tencent.com>, Qi Zheng <qi.zheng@linux.dev>,
 Shakeel Butt <shakeel.butt@linux.dev>,
 Axel Rasmussen <axelrasmussen@google.com>, Yuanchu Xie <yuanchu@google.com>,
 Wei Xu <weixugc@google.com>, Rik van Riel <riel@surriel.com>,
 Harry Yoo <harry@kernel.org>, Jann Horn <jannh@google.com>,
 Johannes Weiner <hannes@cmpxchg.org>, linux-kernel@vger.kernel.org,
 Luka Bai <lukabai@tencent.com>
X-Mailer: b4 0.15.2
X-Developer-Signature: v=1; a=ed25519-sha256; t=1780201643; l=2998;
 i=lukabai@tencent.com; s=20260501; h=from:subject:message-id;
 bh=IK4miDrmdbabkQyFGW59A5b6GPx3fnGu+q5Q4xvrQwg=;
 b=25BP/QidmIIMdY6i6uWSXAANFJ6EHVM+SNF76MvPHXem3VllQIhb9Nvnq+lwJnpu0kI7vaE5S
 3jx0hHZdCprDk+Lb0G+7XRUQnZJiFQTPMntsOsR0LRastmpEfON1GBL
X-Developer-Key: i=lukabai@tencent.com; a=ed25519;
 pk=KeaVteSWd00GIAjFyWZnuFsKAKixjga1ZkLMcI66nPM=
X-Proofpoint-GUID: Lk8E2F88_Z5a2ahnBfM5agzJL3mWuPqD
X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwNTMxMDA0NiBTYWx0ZWRfX/80evBJfXAM+
 q1w3c3bdQEPP8SHCaqi28rjymeG/kLhpf+OaZxD6rrTOd2gtH49AwAaK9fXPb0/gR0Ua8lIzYOQ
 hLKm23YnwH4rawxd/vQfCxDyqd6AE/AeoV2PCvHo7fyiORbalzRRKsfXxzTXIzVRLGYk7pNCIDY
 emMAS+Cn8bqJn9ieuAvaZGftx84yGga7r22ii9weVkjhnFfPb0g/TYTi9ZVQm4fHtffI3Ghs3/1
 Zwlq5S2gjQ8w4C4XdRNQj49drENXSBeEJgCgOiyAOWuRgGHsU3KFJIayYKX4dBcLLIdCGuF4TQZ
 CwiWsEtOOCNcCLyh7DuUnR1xNiHJgqD/MXlZTPExUHcHtBJA1A+gN9PDm0WjYI=
X-Proofpoint-ORIG-GUID: Lk8E2F88_Z5a2ahnBfM5agzJL3mWuPqD
X-Authority-Info-Out: v=2.4 cv=H57WAuYi c=1 sm=1 tr=0 ts=6a1bb8cd
 cx=c_apl:c_pps:t_out a=9OgfyREA4BUYbbCgc0Y0oA==:117
 a=9OgfyREA4BUYbbCgc0Y0oA==:17 a=IkcTkHD0fZMA:10 a=NGcC8JguVDcA:10
 a=x7bEGLp0ZPQA:10 a=UaoJkeuwEpQA:10 a=VkNPw1HP01LnGYTKEx00:22
 a=GvQkQWPkAAAA:8 a=gYzHG_maUEIFqAho4fMA:9 a=QEXdDO2ut3YA:10

From: Luka Bai <lukabai@tencent.com>

Add a kmem slab cache called collapse_hint_cache for khugepaged
collapse hints, to improve the performance of allocating and freeing
the hint structs.

Signed-off-by: Luka Bai <lukabai@tencent.com>
---
 mm/khugepaged.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 5090ffae73f3..04cf85ea5557 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -98,6 +98,7 @@ static unsigned int khugepaged_max_ptes_shared __read_mos=
tly;
 static DEFINE_READ_MOSTLY_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS);
=20
 static struct kmem_cache *mm_slot_cache __ro_after_init;
+static struct kmem_cache *collapse_hint_cache __ro_after_init;
=20
 #define KHUGEPAGED_PRIORITY_QUEUE_MAX_FAIL 10
=20
@@ -555,6 +556,13 @@ int __init khugepaged_init(void)
 	if (!mm_slot_cache)
 		return -ENOMEM;
=20
+	collapse_hint_cache =3D KMEM_CACHE(khugepaged_collapse_hint, 0);
+	if (!collapse_hint_cache) {
+		kmem_cache_destroy(mm_slot_cache);
+		mm_slot_cache =3D NULL;
+		return -ENOMEM;
+	}
+
 	for (i =3D 0; i < NR_KHUGEPAGED_PRIORITY_LEVEL; i++)
 		INIT_LIST_HEAD(&khugepaged_priority_queue[i]);
=20
@@ -569,6 +577,7 @@ int __init khugepaged_init(void)
 void __init khugepaged_destroy(void)
 {
 	kmem_cache_destroy(mm_slot_cache);
+	kmem_cache_destroy(collapse_hint_cache);
 }
=20
 static inline int collapse_test_exit(struct mm_struct *mm)
@@ -686,7 +695,7 @@ static void khugepaged_release_collapse_hints(
=20
 	list_for_each_entry_safe(hint, tmp, &req->hints, node) {
 		list_del(&hint->node);
-		kfree(hint);
+		kmem_cache_free(collapse_hint_cache, hint);
 	}
 }
=20
@@ -3013,7 +3022,7 @@ void khugepaged_add_collapse_hint(struct mm_struct *m=
m,
 	if (!mm_flags_test(MMF_VM_HUGEPAGE, mm))
 		return;
=20
-	hint =3D kmalloc_obj(struct khugepaged_collapse_hint);
+	hint =3D kmem_cache_alloc(collapse_hint_cache, GFP_KERNEL);
 	if (!hint)
 		return;
=20
@@ -3025,14 +3034,14 @@ void khugepaged_add_collapse_hint(struct mm_struct =
*mm,
 	 * just "best-effort" optimization.
 	 */
 	if (!spin_trylock(&khugepaged_mm_lock)) {
-		kfree(hint);
+		kmem_cache_free(collapse_hint_cache, hint);
 		return;
 	}
=20
 	slot =3D mm_slot_lookup(mm_slots_hash, mm);
 	if (!slot) {
 		spin_unlock(&khugepaged_mm_lock);
-		kfree(hint);
+		kmem_cache_free(collapse_hint_cache, hint);
 		return;
 	}
 	khp_mm_slot =3D mm_slot_entry(slot, struct khugepaged_mm_slot, slot);
@@ -3125,7 +3134,7 @@ static int collapse_scan_one_priority_entry(unsigned =
int progress_max,
 		addr =3D hint->address;
=20
 		if (unlikely(collapse_test_exit_or_disable(mm))) {
-			kfree(hint);
+			kmem_cache_free(collapse_hint_cache, hint);
 			break;
 		}
=20
@@ -3149,7 +3158,7 @@ static int collapse_scan_one_priority_entry(unsigned =
int progress_max,
 		if (*result !=3D SCAN_SUCCEED)
 			(*fail_count)++;
 skip_hint:
-		kfree(hint);
+		kmem_cache_free(collapse_hint_cache, hint);
 	}
=20
 	if (!lock_dropped)

--=20
2.52.0
From nobody Mon Jun  8 08:36:53 2026
Received: from outbound.mr.icloud.com (mr-2001e-snip4-7.eps.apple.com
 [57.103.68.50])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2622E1A9F85
	for <linux-kernel@vger.kernel.org>; Sun, 31 May 2026 04:28:13 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=57.103.68.50
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1780201694; cv=none;
 b=mYpKPKlMRTKnb4o3LzKkezrvd9uYyBIO0HnmB84W117Pa9vQX2jy/J1MieKv4YiAy6R3ldNWW2LB5MDEwZvDDDHGb07djWSk2pxZIAdQGBvTRQSp/pvztPr5uXeuI8yHrIWoGoKKd5jtB+IAEzZV1y1oCXyIkMSMORkUqGyr/Uc=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1780201694; c=relaxed/simple;
	bh=x5fMpXle0vA6Me0ePh/LPjrQrdUhc2UQl/+13ZVwlrc=;
	h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References:
	 In-Reply-To:To:Cc;
 b=lomZ69/hxDXwaUoaDgd03Jd+0l+cgfzz9mm9RyNQhrrytkge/RRBgz2AH6W5AWBQDRsNKe3Og/+tRjDCnmnHyO5strGOto5kF/B+44i26Pu7o+LI8pAhjLZj8bwrfyMjiuBKeE/2lB2Y4Xwm/iuF6IQIQJx4DIJVSkiGieSZ18U=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=quarantine dis=none) header.from=icloud.com;
 spf=pass smtp.mailfrom=icloud.com;
 dkim=pass (2048-bit key) header.d=icloud.com header.i=@icloud.com
 header.b=y1Jjwsgf; arc=none smtp.client-ip=57.103.68.50
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=quarantine dis=none) header.from=icloud.com
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=icloud.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=icloud.com header.i=@icloud.com
 header.b="y1Jjwsgf"
Received: from outbound.mr.icloud.com (unknown [127.0.0.2])
	by p00-icloudmta-asmtp-us-west-2a-100-percent-4 (Postfix) with ESMTPS id
 158A11800119;
	Sun, 31 May 2026 04:28:06 +0000 (UTC)
X-ICL-Out-Info: 
 HUtFAUMEWwJACUgBTUQeDx5WFlZNRAJCTQhJB0MFXwteDUAdVAVLVxQEFEYGVg1dE0wLcwRUB10FXVZQAlpLVBQEEVABWB5WXloXXk1FCA9CAVhbCFsEDx9MDFECQgVWXlQKHQRUB10FXVZQAlpLQgRLRWhcBVwcQBdIHV9qS1YUBBFQAVgeVl5aF15NWgJWTQVKA18BWwdDCFVHBUc0UR9VFFIdRA5tGFAWR0BBWh9DFEAFWwRYCxNdTFBfVitGFVcbVgNDRVEfVEYTGU4bV01QG18CQg8=
Dkim-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=icloud.com; s=1a1hai;
 t=1780201692; x=1782793692; bh=dismuyxgoUu32oVKxariHZH1KvI6gzuA1yZho9b0qn4=;
 h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:To:x-icloud-hme;
 b=y1JjwsgfBvMs52AxyxBeGlVUqgjCWgV0TX+Nzge/UYjFeQWgOQY+NMXTkzUNM2qPb/L7wHfJ9KVyQa3HnewtBs2VrANFUg7rlHoW9T1eMCuSiZc3GFG0EGdI38rmOdhLGjI8WD4Lw38iwOx73fK26I4Hilh4//l/jICZ71sjtGvOiwlPwA1fvxve5FO+6PSWK3yecYS7JhAkkZffUMxHsRlMXfH0uqgMuHqM6d2VZpMPJhzx2W3Xfrm5yxNvuBgkXc3LtnY0rlgMU/K7ffgRsRm0R851O/YcIF9pwef/VTvdnODYaNCjcQ0iNb+BoJ/tTdssYGk6BSgiiulJs+hLuQ==
Received: from [127.0.0.1] (unknown [17.57.152.38])
	by p00-icloudmta-asmtp-us-west-2a-100-percent-4 (Postfix) with ESMTPSA id
 7EF4E18000A8;
	Sun, 31 May 2026 04:27:55 +0000 (UTC)
From: Luka Bai <lukafocus@icloud.com>
Date: Sun, 31 May 2026 12:27:19 +0800
Subject: [PATCH 3/5] mm/khugepaged: add deduplication when adding new
 collapse hint
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Message-Id: <20260531-thp_collapse_hint-v1-3-866339cd4c2a@tencent.com>
References: <20260531-thp_collapse_hint-v1-0-866339cd4c2a@tencent.com>
In-Reply-To: <20260531-thp_collapse_hint-v1-0-866339cd4c2a@tencent.com>
To: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
 David Hildenbrand <david@kernel.org>, Lorenzo Stoakes <ljs@kernel.org>,
 Zi Yan <ziy@nvidia.com>, Baolin Wang <baolin.wang@linux.alibaba.com>,
 "Liam R. Howlett" <liam@infradead.org>, Nico Pache <npache@redhat.com>,
 Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
 Barry Song <baohua@kernel.org>, Lance Yang <lance.yang@linux.dev>,
 Vlastimil Babka <vbabka@kernel.org>, Mike Rapoport <rppt@kernel.org>,
 Suren Baghdasaryan <surenb@google.com>, Michal Hocko <mhocko@suse.com>,
 Kairui Song <kasong@tencent.com>, Qi Zheng <qi.zheng@linux.dev>,
 Shakeel Butt <shakeel.butt@linux.dev>,
 Axel Rasmussen <axelrasmussen@google.com>, Yuanchu Xie <yuanchu@google.com>,
 Wei Xu <weixugc@google.com>, Rik van Riel <riel@surriel.com>,
 Harry Yoo <harry@kernel.org>, Jann Horn <jannh@google.com>,
 Johannes Weiner <hannes@cmpxchg.org>, linux-kernel@vger.kernel.org,
 Luka Bai <lukabai@tencent.com>
X-Mailer: b4 0.15.2
X-Developer-Signature: v=1; a=ed25519-sha256; t=1780201643; l=7996;
 i=lukabai@tencent.com; s=20260501; h=from:subject:message-id;
 bh=N3P/iJQ8BlWvrseSb9JBmO8QdT3darHooCxwnlA5qr0=;
 b=i5egUqtHfUbtbATcvTL8RDNSKrSYd3FUakErTbxmly9nv8AO94Z/hCdS8r0+l/XMoUa1O1SKH
 fgD9aPwDeTiBnAYX2j6+OH3DtOeUICslAoC2Ed3/S0y7SMtuuUCHH5O
X-Developer-Key: i=lukabai@tencent.com; a=ed25519;
 pk=KeaVteSWd00GIAjFyWZnuFsKAKixjga1ZkLMcI66nPM=
X-Authority-Info-Out: v=2.4 cv=F9xat6hN c=1 sm=1 tr=0 ts=6a1bb8da
 cx=c_apl:c_pps:t_out a=9OgfyREA4BUYbbCgc0Y0oA==:117
 a=9OgfyREA4BUYbbCgc0Y0oA==:17 a=IkcTkHD0fZMA:10 a=NGcC8JguVDcA:10
 a=x7bEGLp0ZPQA:10 a=UaoJkeuwEpQA:10 a=VkNPw1HP01LnGYTKEx00:22
 a=GvQkQWPkAAAA:8 a=nMWqTqnFzTP9CSPNFE4A:9 a=QEXdDO2ut3YA:10
X-Proofpoint-GUID: 3TSgtxcgck9PoY_i5WmbPItoV4rX2K--
X-Proofpoint-ORIG-GUID: 3TSgtxcgck9PoY_i5WmbPItoV4rX2K--
X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwNTMxMDA0NiBTYWx0ZWRfX8vlJYMDh8xS4
 Kkwu3bFmLwfyB5pwK7bGaUVIJ6ShuM6wgX0voWelHxDosLbaLHhTTgDH9R6CK21Sa13FH5cGhyI
 6jgDXmnqapoLJfQZ2/ZDbZr/eSsjmi1ELRPbH3K9g18mGKuWF05xCllDc/vFPu2UEGXEu4zdMdH
 0ipt2lQ2EU+50Cs0x7/iQ7js7ht04CHmVm6uVpAUHh3zXBGoAhZmrv6TwZRfQgSo4FS/mb3REqq
 eernGYxQBgoiuPGhW4JaXSNrX+qKcWUl727Ntq1xVsep6tME+ApQKCqdnH3AOWYNwp9v7Da1LoM
 gMxVQlDOQnGwgV0sgrfAdB6q2VzIf9N2J3faJnrxgJzCyWFft0a3fEW/Dv8Lzc=

From: Luka Bai <lukabai@tencent.com>

We need to check for duplicates before adding a new collapse hint,
and both the lookup and the insertion should be fast. There are
several options for doing that:

Option 1. Add a Bloom filter for the hint addresses, but that makes
it hard to delete a hint after it has been handled.

Option 2. Add a hashtable for each khugepaged_mm_slot. But for an
efficient setup, the hashtable should have maybe 16 ~ 32 slots,
which will cost 128 bytes to 256 bytes for each mm_struct. That
seems a little wasteful.

Option 3. Add an xarray for each khugepaged_mm_slot, which only
takes 16 bytes for each mm_struct. However, each time we try to add
a new entry into the xarray, it may cause a memory allocation.
The collapse hint is supposed to be a best-effort mechanism, so an
xarray seems a little too heavy for the calling function.

Option 4. Add a global hashtable for all the collapse hints, keyed
by their address and mm_struct pointer. The global hashtable mixes
the mm_struct pointer and the address into the key, but the
deduplication comparison only looks at the address, to save memory.
As a result, hints from different mms with the same address may
collide. But as noted above, the collapse hint is only a best-effort
mechanism, and such collisions are rare because the lower PMD_SHIFT
bits of the address are always 0, which normally leaves the
mm_struct pointer about 2M of room to scatter the key (the key is
calculated as (ptr of mm ^ PMD-aligned address)).

We choose option 4. Since the hashtable is global, we protect it with
a global lock (we directly reuse khugepaged_mm_lock here). To avoid
unnecessary lock spinning, we use trylock when adding a new hint, and
bail out when there is contention. This is still harmless for the
correctness of the mechanism.

Signed-off-by: Luka Bai <lukabai@tencent.com>
---
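The scheme described in option 4 can be illustrated with a minimal
userspace sketch. It is not the kernel code: the table is a plain array
of singly linked buckets instead of DEFINE_HASHTABLE(), the bucket index
comes from a simple multiplicative hash, PMD_SHIFT is assumed to be 21
(4K pages), and the names struct hint, hint_add() and hint_del() are
hypothetical. What it keeps from the patch is the key (mm ^ PMD-aligned
address) and the address-only comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PMD_SHIFT_ASSUMED 21	/* assumption: 4K pages, 2M PMD */
#define PMD_MASK_ASSUMED  (~((UINT64_C(1) << PMD_SHIFT_ASSUMED) - 1))
#define HINT_HASH_BITS    9	/* same table width as the patch */
#define HINT_BUCKETS      (1u << HINT_HASH_BITS)

struct hint {			/* hypothetical stand-in for the hint */
	struct hint *next;
	uint64_t address;	/* PMD-aligned address */
};

static struct hint *buckets[HINT_BUCKETS];

/* Key mixes the mm pointer value and the PMD-aligned address. */
static uint64_t hint_key(uint64_t mm, uint64_t aligned_addr)
{
	return mm ^ aligned_addr;
}

/* Multiplicative hash that keeps the top HINT_HASH_BITS bits. */
static unsigned int hint_bucket(uint64_t key)
{
	return (unsigned int)((key * UINT64_C(0x61C8864680B583EB)) >>
			      (64 - HINT_HASH_BITS));
}

/* Returns true if the hint was queued, false on duplicate or OOM. */
static bool hint_add(uint64_t mm, uint64_t address)
{
	uint64_t aligned = address & PMD_MASK_ASSUMED;
	unsigned int b = hint_bucket(hint_key(mm, aligned));
	struct hint *h;

	/* Dedup compares the address only, never the mm. */
	for (h = buckets[b]; h; h = h->next)
		if (h->address == aligned)
			return false;

	h = malloc(sizeof(*h));
	if (!h)
		return false;
	h->address = aligned;
	h->next = buckets[b];
	buckets[b] = h;
	return true;
}

/* Unhash and free the hint once it has been handled. */
static bool hint_del(uint64_t mm, uint64_t address)
{
	uint64_t aligned = address & PMD_MASK_ASSUMED;
	struct hint **pp = &buckets[hint_bucket(hint_key(mm, aligned))];

	for (; *pp; pp = &(*pp)->next) {
		if ((*pp)->address == aligned) {
			struct hint *victim = *pp;

			*pp = victim->next;
			free(victim);
			return true;
		}
	}
	return false;
}
```

In this sketch, a second hint_add() for any address inside the same PMD
of the same mm is rejected. A hint from a different mm with the same
address is only rejected when both keys happen to land in the same
bucket, which is the rare and harmless false positive described above.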
 mm/khugepaged.c | 83 +++++++++++++++++++++++++++++++++++++++++++++++++++++=
----
 1 file changed, 78 insertions(+), 5 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 04cf85ea5557..3f5eb8be06d1 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -100,6 +100,24 @@ static DEFINE_READ_MOSTLY_HASHTABLE(mm_slots_hash, MM_=
SLOTS_HASH_BITS);
 static struct kmem_cache *mm_slot_cache __ro_after_init;
 static struct kmem_cache *collapse_hint_cache __ro_after_init;
=20
+/*
+ * Global lookup table used by khugepaged_add_collapse_hint() to deduplica=
te
+ * pending hints against an existing address. The key mixes mm and address
+ * but the dedup comparison only looks at @address. As a result, two
+ * different mms hinting the same address may collide. This is rare
+ * since the aligned_addr is always 0 for the lower PMD_SHIFT bits, which
+ * normally gives mm struct about 2M size for scattering (for 4K paging).
+ * And it's also harmless if the collision happens.
+ */
+#define KHUGEPAGED_HINTS_HASH_BITS	9
+static DEFINE_HASHTABLE(khugepaged_hint_lookup, KHUGEPAGED_HINTS_HASH_BITS=
);
+
+static inline unsigned long khugepaged_hint_key(struct mm_struct *mm,
+						unsigned long aligned_addr)
+{
+	return (unsigned long)mm ^ aligned_addr;
+}
+
 #define KHUGEPAGED_PRIORITY_QUEUE_MAX_FAIL 10
=20
 #define KHUGEPAGED_MIN_MTHP_ORDER	2
@@ -165,12 +183,15 @@ static struct khugepaged_scan khugepaged_scan =3D {
=20
 /**
  * struct khugepaged_collapse_hint - one collapse hint for a specific addr=
ess
- * @node:    list node on khugepaged_collapse_requests.hints
- * @vma:     hint pointer to the target VMA
- * @address: PMD-aligned virtual address inside @vma to attempt collapsing=
 on
+ * @node:      list node on khugepaged_collapse_requests.hints
+ * @hash_node: hlist node on the global khugepaged_hint_lookup table, used
+ *             for deduplication.
+ * @vma:       hint pointer to the target VMA
+ * @address:   PMD-aligned virtual address inside @vma to attempt collapsi=
ng on
  */
 struct khugepaged_collapse_hint {
 	struct list_head node;
+	struct hlist_node hash_node;
 	struct vm_area_struct *vma;
 	unsigned long address;
 };
@@ -688,6 +709,29 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
 		__khugepaged_enter(vma->vm_mm);
 }
=20
+/*
+ * Unhash any hints still queued under @req. Caller must hold
+ * khugepaged_mm_lock so we can safely unhash each hint from the global
+ * khugepaged_hint_lookup table.
+ */
+static void khugepaged_unhash_collapse_hints(
+			  struct khugepaged_collapse_requests *req)
+{
+	struct khugepaged_collapse_hint *hint, *tmp;
+
+	lockdep_assert_held(&khugepaged_mm_lock);
+
+	list_for_each_entry_safe(hint, tmp, &req->hints, node) {
+		hash_del(&hint->hash_node);
+	}
+}
+
+/*
+ * Free any hints still queued under @req. No lock need to be held. Caller
+ * must make sure the hints are already unhashed from the global
+ * khugepaged_hint_lookup table and the mm_slot is removed from the
+ * khugepaged_priority_queue[].
+ */
 static void khugepaged_release_collapse_hints(
 			  struct khugepaged_collapse_requests *req)
 {
@@ -712,6 +756,14 @@ static void khugepaged_remove_priority_requests(struct=
 khugepaged_mm_slot *khp_m
 		list_del(&khp_mm_slot->request[i].node);
 }
=20
+static void khugepaged_unhash_all_hints(struct khugepaged_mm_slot *khp_mm_=
slot)
+{
+	int i;
+
+	for (i =3D 0; i < NR_KHUGEPAGED_PRIORITY_LEVEL; i++)
+		khugepaged_unhash_collapse_hints(&khp_mm_slot->request[i]);
+}
+
 static void khugepaged_release_all_hints(struct khugepaged_mm_slot *khp_mm=
_slot)
 {
 	int i;
@@ -733,6 +785,7 @@ void __khugepaged_exit(struct mm_struct *mm)
 		hash_del(&slot->hash);
 		list_del(&slot->mm_node);
 		khugepaged_remove_priority_requests(khp_mm_slot);
+		khugepaged_unhash_all_hints(khp_mm_slot);
 		free =3D 1;
 	}
 	spin_unlock(&khugepaged_mm_lock);
@@ -1933,6 +1986,7 @@ static void collect_mm_slot(struct mm_slot *slot)
 		 * mm_flags_clear(MMF_VM_HUGEPAGE, mm);
 		 */
=20
+		khugepaged_unhash_all_hints(khp_mm_slot);
 		/* khugepaged_mm_lock actually not necessary for the below */
 		khugepaged_release_all_hints(khp_mm_slot);
 		mm_slot_free(mm_slot_cache, khp_mm_slot);
@@ -3001,8 +3055,9 @@ void khugepaged_add_collapse_hint(struct mm_struct *m=
m,
 				 int priority, int max_order)
 {
 	struct khugepaged_mm_slot *khp_mm_slot;
-	struct khugepaged_collapse_hint *hint;
+	struct khugepaged_collapse_hint *hint, *existing;
 	struct mm_slot *slot;
+	unsigned long aligned_addr, key;
 	int orders;
=20
 	if (!mm || !vma)
@@ -3022,12 +3077,15 @@ void khugepaged_add_collapse_hint(struct mm_struct =
*mm,
 	if (!mm_flags_test(MMF_VM_HUGEPAGE, mm))
 		return;
=20
+	aligned_addr =3D address & HPAGE_PMD_MASK;
+	key =3D khugepaged_hint_key(mm, aligned_addr);
+
 	hint =3D kmem_cache_alloc(collapse_hint_cache, GFP_KERNEL);
 	if (!hint)
 		return;
=20
 	hint->vma =3D vma;
-	hint->address =3D address & HPAGE_PMD_MASK;
+	hint->address =3D aligned_addr;
=20
 	/*
 	 * Just use try lock to avoid lock contention because collapse hints are
@@ -3045,7 +3103,21 @@ void khugepaged_add_collapse_hint(struct mm_struct *=
mm,
 		return;
 	}
 	khp_mm_slot =3D mm_slot_entry(slot, struct khugepaged_mm_slot, slot);
+
+	/*
+	 * For deduplication. The comparison only checks @address here. See comme=
nts
+	 * above khugepaged_hint_lookup definition for details.
+	 */
+	hash_for_each_possible(khugepaged_hint_lookup, existing, hash_node, key) {
+		if (existing->address =3D=3D aligned_addr) {
+			spin_unlock(&khugepaged_mm_lock);
+			kmem_cache_free(collapse_hint_cache, hint);
+			return;
+		}
+	}
+
 	list_add_tail(&hint->node, &khp_mm_slot->request[priority].hints);
+	hash_add(khugepaged_hint_lookup, &hint->hash_node, key);
 	spin_unlock(&khugepaged_mm_lock);
=20
 	wake_up_interruptible(&khugepaged_wait);
@@ -3124,6 +3196,7 @@ static int collapse_scan_one_priority_entry(unsigned =
int progress_max,
 						struct khugepaged_collapse_hint,
 						node);
 			list_del(&hint->node);
+			hash_del(&hint->hash_node);
 		}
 		spin_unlock(&khugepaged_mm_lock);
=20

--=20
2.52.0
From nobody Mon Jun  8 08:36:53 2026
Received: from outbound.mr.icloud.com (mr-2001f-snip4-8.eps.apple.com
 [57.103.68.61])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 88DF22EC09B
	for <linux-kernel@vger.kernel.org>; Sun, 31 May 2026 04:28:23 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=57.103.68.61
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1780201704; cv=none;
 b=fx43kXr79tuRx9CSagHqqxkj6BioQ6p16RPFdRPaJBIUu7N6l2LCIBjXiqSLiIePzwoMc+szBz13vTT6aUdiB7ZaiUbUuIeh83eLqToPWHyuutHwZrKFLI2YN7sx1xAtvbxu4lZ0pnXd1VIziCRGR0891I+IoKRjjG7vZJeTeeE=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1780201704; c=relaxed/simple;
	bh=STH885JstUH7y5ZxJPHL4Ir8bFwC0BDenu+H0aByyxc=;
	h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References:
	 In-Reply-To:To:Cc;
 b=sWv9rf5yIiGZPzQMvslytxGx2OQ8mENgeEjZLwf7B/9zVC6XGnnLHRu8XapBAmw73sOeacHCKIrGSDpW8T3Hdy4V8jIbGZ2WN0oDi/Mwo2NyI1cCly2+o8nBNYFhEMhEudApPIwrj4iitairBgaaCDD7J4UtL4qgBsf+VMmT9rc=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=quarantine dis=none) header.from=icloud.com;
 spf=pass smtp.mailfrom=icloud.com;
 dkim=pass (2048-bit key) header.d=icloud.com header.i=@icloud.com
 header.b=VR3BYHPX; arc=none smtp.client-ip=57.103.68.61
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=quarantine dis=none) header.from=icloud.com
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=icloud.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=icloud.com header.i=@icloud.com
 header.b="VR3BYHPX"
Received: from outbound.mr.icloud.com (unknown [127.0.0.2])
	by p00-icloudmta-asmtp-us-west-2a-100-percent-4 (Postfix) with ESMTPS id
 11D151800171;
	Sun, 31 May 2026 04:28:16 +0000 (UTC)
X-ICL-Out-Info: 
 HUtFAUMEWwJACUgBTUQeDx5WFlZNRAJCTQhJB0MFXwteDUAdVAVLVxQEFEYGVg1dE0wLcwRUB10FXVZQAlpLVBQEEVABWB5WXloXXk1FCA9CAVhbCFsEDx9MDFECQgVWXlQKHQRUB10FXVZQAlpLQgRLRWhcBVwcQBdIHV9qS1YUBBFQAVgeVl5aF15NWgJWTQVKA18BWwdDCFVHBUc0UR9VFFIdRA5tGFAWR0BBWh9EFEAFWwRYCxNdTFBfVitGFVcbVgNDRVEfVEYTGU4bV01QG18CQg8=
Dkim-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=icloud.com; s=1a1hai;
 t=1780201703; x=1782793703; bh=bhbmzU8f+DtLeo7/JiKJOt5bFOzMaSbF+v57Q12naMo=;
 h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:To:x-icloud-hme;
 b=VR3BYHPXC/5ooK91OfM9H7foeW+CQSHUnC+y9ZCelBzSsdaOv6XMTtXaOQs+HSqO5UeF7fLQ2IJIZA3Z7uVVyeQI+vCvWKqu/wstdMaHb70qGkAbdM2PLjM8YFq0I8Zp1zbcur1RDH9YmLLBI4iHjB9bElUEKgepwYJoZVOy6V7HeOZNjl10X8Phe+QJNFCyMgbqgYPVuYpC0ciuBjk7Rd6fCWR9Z89PJEJ9+TThbcYh5FsDyQo+W6XrCifuaaXxNH8T2pteqrRCQ7kbD5FVBoBgvNf4GRzV72pa2RbKE5nwCcddgBpyEIClC2/j4giwkKgpOh6wS1wH5awxvFgOEQ==
Received: from [127.0.0.1] (unknown [17.57.152.38])
	by p00-icloudmta-asmtp-us-west-2a-100-percent-4 (Postfix) with ESMTPSA id
 C45731800114;
	Sun, 31 May 2026 04:28:06 +0000 (UTC)
From: Luka Bai <lukafocus@icloud.com>
Date: Sun, 31 May 2026 12:27:20 +0800
Subject: [PATCH 4/5] mm/khugepaged: add accounting for successful hint or
 non-hint collapse
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Message-Id: <20260531-thp_collapse_hint-v1-4-866339cd4c2a@tencent.com>
References: <20260531-thp_collapse_hint-v1-0-866339cd4c2a@tencent.com>
In-Reply-To: <20260531-thp_collapse_hint-v1-0-866339cd4c2a@tencent.com>
To: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
 David Hildenbrand <david@kernel.org>, Lorenzo Stoakes <ljs@kernel.org>,
 Zi Yan <ziy@nvidia.com>, Baolin Wang <baolin.wang@linux.alibaba.com>,
 "Liam R. Howlett" <liam@infradead.org>, Nico Pache <npache@redhat.com>,
 Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
 Barry Song <baohua@kernel.org>, Lance Yang <lance.yang@linux.dev>,
 Vlastimil Babka <vbabka@kernel.org>, Mike Rapoport <rppt@kernel.org>,
 Suren Baghdasaryan <surenb@google.com>, Michal Hocko <mhocko@suse.com>,
 Kairui Song <kasong@tencent.com>, Qi Zheng <qi.zheng@linux.dev>,
 Shakeel Butt <shakeel.butt@linux.dev>,
 Axel Rasmussen <axelrasmussen@google.com>, Yuanchu Xie <yuanchu@google.com>,
 Wei Xu <weixugc@google.com>, Rik van Riel <riel@surriel.com>,
 Harry Yoo <harry@kernel.org>, Jann Horn <jannh@google.com>,
 Johannes Weiner <hannes@cmpxchg.org>, linux-kernel@vger.kernel.org,
 Luka Bai <lukabai@tencent.com>
X-Mailer: b4 0.15.2
X-Developer-Signature: v=1; a=ed25519-sha256; t=1780201643; l=3543;
 i=lukabai@tencent.com; s=20260501; h=from:subject:message-id;
 bh=fMQSPKl1zRvOYMdFUia4IDMTy4aTjy06OEcQUlcbylM=;
 b=g6hskuajwDifuMlmQvwXr7m3gvsQsG3xboa6bdOJ6SgjH/Fyfhd7w2b5Q80kpTF2fGF4LNo8F
 0yMQCfVmh3mArC+6szHFufSG++WLY6mSpGGl0aVod/sTQ7MAhojQ/Pz
X-Developer-Key: i=lukabai@tencent.com; a=ed25519;
 pk=KeaVteSWd00GIAjFyWZnuFsKAKixjga1ZkLMcI66nPM=
X-Authority-Info-Out: v=2.4 cv=AvTjHe9P c=1 sm=1 tr=0 ts=6a1bb8e4
 cx=c_apl:c_pps:t_out a=9OgfyREA4BUYbbCgc0Y0oA==:117
 a=9OgfyREA4BUYbbCgc0Y0oA==:17 a=IkcTkHD0fZMA:10 a=NGcC8JguVDcA:10
 a=x7bEGLp0ZPQA:10 a=UaoJkeuwEpQA:10 a=VkNPw1HP01LnGYTKEx00:22
 a=GvQkQWPkAAAA:8 a=0Ztl7DgICfUfD0EaiaYA:9 a=QEXdDO2ut3YA:10
X-Proofpoint-ORIG-GUID: yz54fiqE8TjX4RhqkROvSzCb1_eTk_JV
X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwNTMxMDA0NiBTYWx0ZWRfX6jolbC6oTORJ
 jDnOhHXnV+NXz2xs37S9CoZAHvweRKv5SyS3VtzwTOkjyvw7C6aQkmgdFQ6Zim/eeFsF2jhmPod
 AoOFBOPvCz9EX/SSpHehGjp0fT6niK7Npb3Dzhac6aEIvkF5m09cSyMTw8CZbPRnww38EO4NYQp
 naXS1Wi4WIMDWuT6nN8LnShZL9p32T1od0eDceDXhBXp9Ob09iHPWS2okUGhXpCr/Rgq71n8Z5O
 Us38k33MJiQkrXn1TfNeAaQUGWiOz3yRh3JZiZEcqe1EH9oUY7uoib0htE5dLIX0fJpTFTcpAjU
 zIkJ/PsXG/MvhBUpS0PdxgFMxtZPOHe+yjr9erkbx0hLH+cdLl636DHL9DyjhU=
X-Proofpoint-GUID: yz54fiqE8TjX4RhqkROvSzCb1_eTk_JV

From: Luka Bai <lukabai@tencent.com>

Add two mTHP stat attributes that count successful khugepaged
collapses, split into those triggered by a hint and those that were
not, so that the numbers are easy to read from userspace. These two
statistics only cover collapses initiated by khugepaged; collapses
triggered by MADV_COLLAPSE are not counted.

Signed-off-by: Luka Bai <lukabai@tencent.com>
---
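The attribution rule can be summarized with a small userspace sketch.
The types below are simplified, hypothetical stand-ins (the real code
uses struct collapse_control and count_mthp_stat()); only the decision
logic mirrors the patch: count on success, only for khugepaged, and pick
the counter from the from_priority_hint flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum scan_result_sketch { SKETCH_SUCCEED, SKETCH_FAIL };

struct collapse_control_sketch {	/* hypothetical, reduced to two flags */
	bool is_khugepaged;
	bool from_priority_hint;
};

struct collapse_stats_sketch {		/* stands in for the two mTHP stats */
	unsigned long hint;
	unsigned long non_hint;
};

/* Attribute one collapse attempt to the hint or non-hint counter. */
static void account_collapse(const struct collapse_control_sketch *cc,
			     enum scan_result_sketch result,
			     struct collapse_stats_sketch *stats)
{
	assert(cc != NULL && stats != NULL);

	/* MADV_COLLAPSE (is_khugepaged == false) and failures are ignored. */
	if (!cc->is_khugepaged || result != SKETCH_SUCCEED)
		return;

	if (cc->from_priority_hint)
		stats->hint++;
	else
		stats->non_hint++;
}
```

Because the flag is set right before the hinted collapse and cleared
right after it, regular scan-driven collapses always end up in the
non-hint counter.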
 include/linux/huge_mm.h |  2 ++
 mm/huge_memory.c        |  4 ++++
 mm/khugepaged.c         | 18 +++++++++++++++++-
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index edece3e26985..9df0d7f71e95 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -147,6 +147,8 @@ enum mthp_stat_item {
 	MTHP_STAT_COLLAPSE_EXCEED_SWAP,
 	MTHP_STAT_COLLAPSE_EXCEED_NONE,
 	MTHP_STAT_COLLAPSE_EXCEED_SHARED,
+	MTHP_STAT_KHUGEPAGED_COLLAPSE_HINT,
+	MTHP_STAT_KHUGEPAGED_COLLAPSE_NON_HINT,
 	__MTHP_STAT_COUNT
 };
=20
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index bf9b480bb3b0..0031fb4b0b09 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -720,6 +720,8 @@ DEFINE_MTHP_STAT_ATTR(nr_anon_partially_mapped, MTHP_ST=
AT_NR_ANON_PARTIALLY_MAPP
 DEFINE_MTHP_STAT_ATTR(collapse_exceed_swap_pte, MTHP_STAT_COLLAPSE_EXCEED_=
SWAP);
 DEFINE_MTHP_STAT_ATTR(collapse_exceed_none_pte, MTHP_STAT_COLLAPSE_EXCEED_=
NONE);
 DEFINE_MTHP_STAT_ATTR(collapse_exceed_shared_pte, MTHP_STAT_COLLAPSE_EXCEE=
D_SHARED);
+DEFINE_MTHP_STAT_ATTR(khugepaged_collapse_hint, MTHP_STAT_KHUGEPAGED_COLLA=
PSE_HINT);
+DEFINE_MTHP_STAT_ATTR(khugepaged_collapse_non_hint, MTHP_STAT_KHUGEPAGED_C=
OLLAPSE_NON_HINT);
=20
=20
 static struct attribute *anon_stats_attrs[] =3D {
@@ -775,6 +777,8 @@ static struct attribute *any_stats_attrs[] =3D {
 	&split_failed_attr.attr,
 	&collapse_alloc_attr.attr,
 	&collapse_alloc_failed_attr.attr,
+	&khugepaged_collapse_hint_attr.attr,
+	&khugepaged_collapse_non_hint_attr.attr,
 	NULL,
 };
=20
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 3f5eb8be06d1..2f21c0b6ab46 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -147,6 +147,15 @@ struct mthp_range {
 struct collapse_control {
 	bool is_khugepaged;
=20
+	/*
+	 * True while khugepaged is draining a collapse hint queued via
+	 * khugepaged_add_collapse_hint(). Used by collapse_single_pmd() to
+	 * attribute a successful collapse to MTHP_STAT_KHUGEPAGED_COLLAPSE_HINT
+	 * or MTHP_STAT_KHUGEPAGED_COLLAPSE_NON_HINT. Only meaningful when the
+	 * collapse is initiated by khugepaged (is_khugepaged =3D=3D true).
+	 */
+	bool from_priority_hint;
+
 	/* Num pages scanned per node */
 	u32 node_load[MAX_NUMNODES];
=20
@@ -3012,8 +3021,13 @@ static enum scan_result collapse_single_pmd(unsigned=
 long addr,
 		mmap_read_unlock(mm);
 	}
 end:
-	if (cc->is_khugepaged && result =3D=3D SCAN_SUCCEED)
+	if (cc->is_khugepaged && result =3D=3D SCAN_SUCCEED) {
 		++khugepaged_pages_collapsed;
+		count_mthp_stat(HPAGE_PMD_ORDER,
+				cc->from_priority_hint ?
+					MTHP_STAT_KHUGEPAGED_COLLAPSE_HINT :
+					MTHP_STAT_KHUGEPAGED_COLLAPSE_NON_HINT);
+	}
 	return result;
 }
=20
@@ -3227,7 +3241,9 @@ static int collapse_scan_one_priority_entry(unsigned =
int progress_max,
 		    addr + HPAGE_PMD_SIZE > ALIGN_DOWN(vma->vm_end, HPAGE_PMD_SIZE))
 			goto skip_hint;
=20
+		cc->from_priority_hint =3D true;
 		*result =3D collapse_single_pmd(addr, vma, &lock_dropped, cc);
+		cc->from_priority_hint =3D false;
 		if (*result !=3D SCAN_SUCCEED)
 			(*fail_count)++;
 skip_hint:

--=20
2.52.0
From nobody Mon Jun  8 08:36:53 2026
Received: from outbound.mr.icloud.com (mr-2005a-snip4-7.eps.apple.com
 [57.103.71.110])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 47DFF2F3C07
	for <linux-kernel@vger.kernel.org>; Sun, 31 May 2026 04:28:31 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=57.103.71.110
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1780201712; cv=none;
 b=WVmd4SKGxAwnAmWcPv4vodS9cdPG/IlMebWniqs1fwGM9Sg/PbkSHWkG9b+6uzqukFAtdmQbzPcSjDV0sdULoEZk+M8Sw/UQM+Iizflx8eZzxuv46cJ6wxiW8sS6DkFBDg0YS0wqF89ydX0OYcqlMPHq4m9yYTvvh2myI7Z3D5o=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1780201712; c=relaxed/simple;
	bh=0IoEA3iaPVlGIi3GbIOyhrlOIYwXDN6lMrvSKjm2mi0=;
	h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References:
	 In-Reply-To:To:Cc;
 b=DVDNz3CozdsE1+J7e5rDWy0dGf2ctC4fB+v2c2vyu6U/+7rNHg6qcozgzR3N1VHYlk+Jm898h74f/cLvLbsvwG3TsWXh726P6tVN0NBqnZ+GMByes45Hn1CJk8kms/2P6h2WmHjlRnkPVJ1c081tiQYCt57E9PHJOhYUSsHPE9w=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=quarantine dis=none) header.from=icloud.com;
 spf=pass smtp.mailfrom=icloud.com;
 dkim=pass (2048-bit key) header.d=icloud.com header.i=@icloud.com
 header.b=bIgoOTdO; arc=none smtp.client-ip=57.103.71.110
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=quarantine dis=none) header.from=icloud.com
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=icloud.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=icloud.com header.i=@icloud.com
 header.b="bIgoOTdO"
Received: from outbound.mr.icloud.com (unknown [127.0.0.2])
	by p00-icloudmta-asmtp-us-west-2a-100-percent-4 (Postfix) with ESMTPS id
 58840180016D;
	Sun, 31 May 2026 04:28:28 +0000 (UTC)
X-ICL-Out-Info: 
 HUtFAUMEWwJACUgBTUQeDx5WFlZNRAJCTQhJB0MFXwteDUAdVAVLVxQEFEYGVg1dE0wLcwRUB10FXVZQAlpLVBQEEVABWB5WXloXXk1FCA9CAVhbCFsEDx9MDFECQgVWXlQKHQRUB10FXVZQAlpLQgRLRWhcBVwcQBdIHV9qS1YUBBFQAVgeVl5aF15NWgJWTQVKA18BWwdDCFVHBUc0UR9VFFIdRA5tGFAWR0BBWh9FFEAFWwRYCxNdTFBfVitGFVcbVgNDRVEfVEYTGU4bV01QG18CQg8=
Dkim-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=icloud.com; s=1a1hai;
 t=1780201710; x=1782793710; bh=aJuP8KM5OC30RVX6IAZKOJALOHyJ5CznciVNyRZ7oso=;
 h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:To:x-icloud-hme;
 b=bIgoOTdOpAwylAwKjgQxuhyRzeag/KbhuUHNBEXFCDFP6Pjql+r4QMHTAJSx6q1/PWq+2uxxEjn/PcymrNxBjaqAGkbIox3Kbt3h7mp3TKh5LCkeZ/Prc1lUgLvuNojcdf3AMbybbm/FVGtxkAH5LG0PzxnsaZcSkM09BBmcBoel4rRybCxZDhwODeg3BcNvwEAobjRRX79G6Y1ChmVxFQ3Y3e0WDfsgm1hn0ICKehdo6NMp0EJ0JkygolNN6fpWLCXHLiersfwRuBgGPYGj0F7SS5wCas0sEguS+sSKkzHXZJuQnrXjITly+Fs2p3qzlOpFfPYfJcax9AziER6r7A==
Received: from [127.0.0.1] (unknown [17.57.152.38])
	by p00-icloudmta-asmtp-us-west-2a-100-percent-4 (Postfix) with ESMTPSA id
 1AA7D1800172;
	Sun, 31 May 2026 04:28:16 +0000 (UTC)
From: Luka Bai <lukafocus@icloud.com>
Date: Sun, 31 May 2026 12:27:21 +0800
Subject: [PATCH 5/5] mm/khugepaged: add khugepaged collapse hint in mglru
 reference checking
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Message-Id: <20260531-thp_collapse_hint-v1-5-866339cd4c2a@tencent.com>
References: <20260531-thp_collapse_hint-v1-0-866339cd4c2a@tencent.com>
In-Reply-To: <20260531-thp_collapse_hint-v1-0-866339cd4c2a@tencent.com>
To: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
 David Hildenbrand <david@kernel.org>, Lorenzo Stoakes <ljs@kernel.org>,
 Zi Yan <ziy@nvidia.com>, Baolin Wang <baolin.wang@linux.alibaba.com>,
 "Liam R. Howlett" <liam@infradead.org>, Nico Pache <npache@redhat.com>,
 Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
 Barry Song <baohua@kernel.org>, Lance Yang <lance.yang@linux.dev>,
 Vlastimil Babka <vbabka@kernel.org>, Mike Rapoport <rppt@kernel.org>,
 Suren Baghdasaryan <surenb@google.com>, Michal Hocko <mhocko@suse.com>,
 Kairui Song <kasong@tencent.com>, Qi Zheng <qi.zheng@linux.dev>,
 Shakeel Butt <shakeel.butt@linux.dev>,
 Axel Rasmussen <axelrasmussen@google.com>, Yuanchu Xie <yuanchu@google.com>,
 Wei Xu <weixugc@google.com>, Rik van Riel <riel@surriel.com>,
 Harry Yoo <harry@kernel.org>, Jann Horn <jannh@google.com>,
 Johannes Weiner <hannes@cmpxchg.org>, linux-kernel@vger.kernel.org,
 Luka Bai <lukabai@tencent.com>
X-Mailer: b4 0.15.2
X-Developer-Signature: v=1; a=ed25519-sha256; t=1780201643; l=11044;
 i=lukabai@tencent.com; s=20260501; h=from:subject:message-id;
 bh=EoL6aiFTXZnLAEbw4iXTl+sM+J/nPblpmXjGgAwVipM=;
 b=oVy+ULdgJO194sz343uXkhFD88xvpUSgqR8u90G61/cMZuZX8QMYmgOb/gX884rYQQ7PEFWSr
 CuBjstk1ETLBtNUXvC5Gcc++AnNBpcJgHASwl7auzluTwUV12U+cNSW
X-Developer-Key: i=lukabai@tencent.com; a=ed25519;
 pk=KeaVteSWd00GIAjFyWZnuFsKAKixjga1ZkLMcI66nPM=
X-Proofpoint-ORIG-GUID: bAljHtwMxCkq2ssXyWcEEq_Fx4voKYmt
X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwNTMxMDA0NiBTYWx0ZWRfX7utu1cuWg6Mk
 YX90SsXpWD6LlExkptCSwgAph1bl1zbOVg6AAo/Ec4Xt8BbFT5/juy0QyoZVkN1wMqwrRu3o3hx
 TdOXUhwhOCcsJah6gJRi92YjUHb6pDX3KMlvR9UULjiBQUrBt7VRo11F1JGZYs2aEvcQV3OOyMK
 9OLvDvW8jrxwuxHkcpFHT1V2KNutBW+Gpk3km1i0evRw/IFBr844RAqaMAKnpnau8usRCDrxzRa
 X+R24heHLTKeBBHVWusgZgQTpcUAg2AZBNPlsqFeDL35OLqqTqtbBr6HSsTcGqeyNTnS5nukNzk
 TqjCcevBwPrTD+Di5BaoRj/sSdgb+fcmqWTxA6uQ3RWMdje71PNiL9tQOuRL7M=
X-Authority-Info-Out: v=2.4 cv=Jov8bc4C c=1 sm=1 tr=0 ts=6a1bb8ee
 cx=c_apl:c_pps:t_out a=9OgfyREA4BUYbbCgc0Y0oA==:117
 a=9OgfyREA4BUYbbCgc0Y0oA==:17 a=IkcTkHD0fZMA:10 a=NGcC8JguVDcA:10
 a=x7bEGLp0ZPQA:10 a=UaoJkeuwEpQA:10 a=VkNPw1HP01LnGYTKEx00:22
 a=GvQkQWPkAAAA:8 a=6HC9NpuDrtEngJUpRywA:9 a=QEXdDO2ut3YA:10
X-Proofpoint-GUID: bAljHtwMxCkq2ssXyWcEEq_Fx4voKYmt

From: Luka Bai <lukabai@tencent.com>

lru_gen_look_around() is an MGLRU helper that reduces the number of
rmap walks. It is called from folio_referenced_one() when reclaim
checks whether a cold page has been referenced. While it holds the
page table lock, it also scans the nearby PTEs and, because most
workloads exhibit spatial locality, updates the generation of those
that were accessed too. If the scanned range looks full of hot pages,
it puts the PMD into a Bloom filter so that the next aging pass walks
it.

walk_mm() is used by MGLRU during aging. It walks the PMDs of an
mm_struct and scans the PTEs under each PMD that is set in the Bloom
filter. The filter is populated by lru_gen_look_around() above, so a
set entry indicates that many pages under that PMD are frequently
accessed.

Since lru_gen_look_around() and walk_mm() already identify hot PMD
ranges, their findings are a good source of khugepaged collapse hints,
so generate collapse hints from both of them.

Note that lru_gen_look_around() is called with the ptl held, so it
must not call khugepaged_add_collapse_hint() directly, because that
function may try to allocate memory. Introduce a new struct
area_access_info to carry the access information out of
lru_gen_look_around(), and queue the collapse hints after the ptl has
been released.

Signed-off-by: Luka Bai <lukabai@tencent.com>
---
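For reviewers, the slot hand-off between folio_referenced_one() and
lru_gen_look_around() can be modelled in userspace as below. This is
only a sketch under stated assumptions, not the kernel code:
look_around_model(), record_area(), collapse_priority() and
NR_PRIORITY_LEVELS are hypothetical names, the value 2 is assumed, and
the "suitable" argument stands in for the suitable_to_scan() result.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the fields of struct area_access_info in this patch. */
struct area_access_info {
	unsigned long address;
	int total;
	int young;
	int max_order;
};

#define NR_ACC_INFO_EACH_ITER 3	/* same bound as in mm/rmap.c */
/* Assumed value; stands in for NR_KHUGEPAGED_PRIORITY_LEVEL. */
#define NR_PRIORITY_LEVELS 2

/* Same rule as get_khp_collapse_priority(): half or more young gives 0. */
static int collapse_priority(int total, int young)
{
	if (young * 2 >= total)
		return 0;
	return NR_PRIORITY_LEVELS - 1;
}

/*
 * Hypothetical stand-in for lru_gen_look_around(). It runs "under the
 * lock", so it only records data. When the range is suitable and a slot
 * was handed in, it fills the slot and advances the cursor.
 */
static void look_around_model(unsigned long start, int total, int young,
			      int max_order, int suitable,
			      struct area_access_info **cursor)
{
	assert(cursor != NULL);
	if (!suitable || !*cursor)
		return;
	(*cursor)->address = start;
	(*cursor)->total = total;
	(*cursor)->young = young;
	(*cursor)->max_order = max_order;
	(*cursor)++;
}

/*
 * Caller side: offer the next free slot, or NULL once the buffer is
 * full, and count the slot only if the callee advanced the cursor.
 */
static int record_area(struct area_access_info *buf, int count,
		       unsigned long start, int total, int young,
		       int max_order, int suitable)
{
	struct area_access_info *cursor = NULL;

	if (count < NR_ACC_INFO_EACH_ITER)
		cursor = &buf[count];
	look_around_model(start, total, young, max_order, suitable, &cursor);
	if (cursor && cursor != &buf[count])
		count++;
	return count;
}
```

Once the walk is finished and the lock is dropped, the caller iterates
over the recorded slots and queues one hint per slot, which is why a
full buffer only costs missed hints.
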
 include/linux/khugepaged.h |  7 +++++++
 include/linux/mmzone.h     | 17 +++++++++++++++--
 mm/khugepaged.c            | 12 ++++++++++++
 mm/rmap.c                  | 27 ++++++++++++++++++++++++++-
 mm/vmscan.c                | 33 +++++++++++++++++++++++++++++----
 5 files changed, 89 insertions(+), 7 deletions(-)

diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index 815ae87f0f8e..e0793569a9f0 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -17,6 +17,7 @@ extern void khugepaged_enter_vma(struct vm_area_struct *v=
ma,
 				 vm_flags_t vm_flags);
 extern void khugepaged_min_free_kbytes_update(void);
 extern bool current_is_khugepaged(void);
+extern int get_khp_collapse_priority(int total, int young);
 extern void khugepaged_add_collapse_hint(struct mm_struct *mm,
 					struct vm_area_struct *vma,
 					unsigned long address,
@@ -62,6 +63,12 @@ static inline bool current_is_khugepaged(void)
 {
 	return false;
 }
+
+static inline int get_khp_collapse_priority(int total, int young)
+{
+	return 0;
+}
+
 static inline void khugepaged_add_collapse_hint(struct mm_struct *mm,
 					       struct vm_area_struct *vma,
 					       unsigned long address,
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 1331a7b93f33..643dd500c121 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -441,6 +441,18 @@ enum lruvec_flags {
=20
 #endif /* !__GENERATING_BOUNDS_H */
=20
+/*
+ * Carries the total and young PTE counts for a scanned memory area,
+ * together with the largest folio order seen among its page table
+ * entries during the scan.
+ */
+struct area_access_info {
+	unsigned long address;
+	int total;
+	int young;
+	int max_order;
+};
+
 /*
  * Evictable folios are divided into multiple generations. The youngest an=
d the
  * oldest generation numbers, max_seq and min_seq, are monotonically incre=
asing.
@@ -689,7 +701,8 @@ struct lru_gen_memcg {
=20
 void lru_gen_init_pgdat(struct pglist_data *pgdat);
 void lru_gen_init_lruvec(struct lruvec *lruvec);
-bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw, unsigned int n=
r);
+bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw, unsigned int n=
r,
+			 struct area_access_info **acc_info_ptr);
=20
 void lru_gen_init_memcg(struct mem_cgroup *memcg);
 void lru_gen_exit_memcg(struct mem_cgroup *memcg);
@@ -712,7 +725,7 @@ static inline void lru_gen_init_lruvec(struct lruvec *l=
ruvec)
 }
=20
 static inline bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw,
-		unsigned int nr)
+		unsigned int nr, struct area_access_info **acc_info_ptr)
 {
 	return false;
 }
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 2f21c0b6ab46..50c363846720 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -3031,6 +3031,18 @@ static enum scan_result collapse_single_pmd(unsigned=
 long addr,
 	return result;
 }
=20
+/*
+ * The caller needs to make sure the pmd is at least qualified for the
+ * lowest priority of collapsing since this function will always return
+ * a legal priority value.
+ */
+int get_khp_collapse_priority(int total, int young)
+{
+	if (young * 2 >=3D total)
+		return 0;
+	return NR_KHUGEPAGED_PRIORITY_LEVEL - 1;
+}
+
 /*
  * khugepaged_add_collapse_hint - enqueue a collapse hint
  * @mm:          target mm
diff --git a/mm/rmap.c b/mm/rmap.c
index 1c77d5dc06e9..1cd111e7b299 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -75,6 +75,7 @@
 #include <linux/userfaultfd_k.h>
 #include <linux/mm_inline.h>
 #include <linux/oom.h>
+#include <linux/khugepaged.h>
=20
 #include <asm/tlb.h>
=20
@@ -911,6 +912,12 @@ struct folio_referenced_arg {
 	struct mem_cgroup *memcg;
 };
=20
+/*
+ * acc_info is currently only used to track access patterns for khugepaged
+ * collapse hints. 3 entries are enough for most cases, and missing some
+ * hints is harmless.
+ */
+#define NR_ACC_INFO_EACH_ITER 3
 /*
  * arg: folio_referenced_arg will be passed
  */
@@ -921,6 +928,8 @@ static bool folio_referenced_one(struct folio *folio,
 	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
 	int ptes =3D 0, referenced =3D 0;
 	unsigned int nr;
+	struct area_access_info acc_info[NR_ACC_INFO_EACH_ITER] =3D {0};
+	int acc_info_count =3D 0;
=20
 	while (page_vma_mapped_walk(&pvmw)) {
 		address =3D pvmw.address;
@@ -979,8 +988,16 @@ static bool folio_referenced_one(struct folio *folio,
 		 * simplest approach is to disable this look-around optimization.
 		 */
 		if (lru_gen_enabled() && !lru_gen_switching() && pvmw.pte) {
-			if (lru_gen_look_around(&pvmw, nr))
+			struct area_access_info *acc_info_ptr =3D NULL;
+
+			/* If the acc_info is full, skip the remaining ones */
+			if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
+				acc_info_count < NR_ACC_INFO_EACH_ITER)
+				acc_info_ptr =3D &acc_info[acc_info_count];
+			if (lru_gen_look_around(&pvmw, nr, &acc_info_ptr))
 				referenced++;
+			if (acc_info_ptr && acc_info_ptr !=3D &acc_info[acc_info_count])
+				acc_info_count++;
 		} else if (pvmw.pte) {
 			if (clear_flush_young_ptes_notify(vma, address, pvmw.pte, nr))
 				referenced++;
@@ -1019,6 +1036,14 @@ static bool folio_referenced_one(struct folio *folio,
 		pra->vm_flags |=3D vma->vm_flags & ~VM_LOCKED;
 	}
=20
+	for (--acc_info_count; acc_info_count >=3D 0; acc_info_count--) {
+		khugepaged_add_collapse_hint(vma->vm_mm, vma,
+			acc_info[acc_info_count].address,
+			get_khp_collapse_priority(acc_info[acc_info_count].total,
+				acc_info[acc_info_count].young),
+			acc_info[acc_info_count].max_order);
+	}
+
 	if (!pra->mapcount)
 		return false; /* To break the loop */
=20
diff --git a/mm/vmscan.c b/mm/vmscan.c
index e8a90911bf88..a0caf5cac951 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3463,7 +3463,7 @@ static void walk_update_folio(struct lru_gen_mm_walk =
*walk, struct folio *folio,
 }
=20
 static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long =
end,
-			   struct mm_walk *args)
+			   struct mm_walk *args, struct area_access_info *acc_info)
 {
 	int i;
 	bool dirty;
@@ -3472,6 +3472,7 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long =
start, unsigned long end,
 	unsigned long addr;
 	int total =3D 0;
 	int young =3D 0;
+	int max_order =3D 0;
 	struct folio *last =3D NULL;
 	struct lru_gen_mm_walk *walk =3D args->private;
 	struct mem_cgroup *memcg =3D lruvec_memcg(walk->lruvec);
@@ -3522,6 +3523,7 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long =
start, unsigned long end,
 						   max_nr, FPB_MERGE_YOUNG_DIRTY);
 			total +=3D nr - 1;
 			walk->mm_stats[MM_LEAF_TOTAL] +=3D nr - 1;
+			max_order =3D max(max_order, folio_order(folio));
 		}
=20
 		if (!test_and_clear_young_ptes_notify(args->vma, addr, cur_pte, nr))
@@ -3550,6 +3552,9 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long =
start, unsigned long end,
 	lazy_mmu_mode_disable();
 	pte_unmap_unlock(pte, ptl);
=20
+	acc_info->young =3D young;
+	acc_info->max_order =3D max_order;
+	acc_info->total =3D total;
 	return suitable_to_scan(total, young);
 }
=20
@@ -3667,6 +3672,7 @@ static void walk_pmd_range(pud_t *pud, unsigned long =
start, unsigned long end,
 	vma =3D args->vma;
 	for (i =3D pmd_index(start), addr =3D start; addr !=3D end; i++, addr =3D=
 next) {
 		pmd_t val =3D pmdp_get_lockless(pmd + i);
+		struct area_access_info acc_info =3D {0};
=20
 		next =3D pmd_addr_end(addr, end);
=20
@@ -3699,11 +3705,16 @@ static void walk_pmd_range(pud_t *pud, unsigned lon=
g start, unsigned long end,
=20
 		walk->mm_stats[MM_NONLEAF_FOUND]++;
=20
-		if (!walk_pte_range(&val, addr, next, args))
+		if (!walk_pte_range(&val, addr, next, args, &acc_info))
 			continue;
=20
 		walk->mm_stats[MM_NONLEAF_ADDED]++;
=20
+		/* Only queue a hint if the PTE scan filled in acc_info */
+		if (acc_info.total > 0)
+			khugepaged_add_collapse_hint(vma->vm_mm, vma, addr,
+				get_khp_collapse_priority(acc_info.total, acc_info.young),
+				acc_info.max_order);
 		/* carry over to the next generation */
 		update_bloom_filter(mm_state, walk->seq + 1, pmd + i);
 	}
@@ -4183,7 +4194,8 @@ static void lru_gen_age_node(struct pglist_data *pgda=
t, struct scan_control *sc)
  * the PTE table to the Bloom filter. This forms a feedback loop between t=
he
  * eviction and the aging.
  */
-bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw, unsigned int n=
r)
+bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw, unsigned int n=
r,
+			 struct area_access_info **acc_info_ptr)
 {
 	int i;
 	bool dirty;
@@ -4202,6 +4214,7 @@ bool lru_gen_look_around(struct page_vma_mapped_walk =
*pvmw, unsigned int nr)
 	struct lru_gen_mm_state *mm_state;
 	unsigned long max_seq;
 	int gen;
+	unsigned int max_order =3D 0;
=20
 	lockdep_assert_held(pvmw->ptl);
 	VM_WARN_ON_ONCE_FOLIO(folio_test_lru(folio), folio);
@@ -4265,6 +4278,7 @@ bool lru_gen_look_around(struct page_vma_mapped_walk =
*pvmw, unsigned int nr)
=20
 			nr =3D folio_pte_batch_flags(folio, NULL, pte, &ptent,
 						   max_nr, FPB_MERGE_YOUNG_DIRTY);
+			max_order =3D max(folio_order(folio), max_order);
 		}
=20
 		if (!test_and_clear_young_ptes_notify(vma, addr, pte, nr))
@@ -4288,8 +4302,19 @@ bool lru_gen_look_around(struct page_vma_mapped_walk=
 *pvmw, unsigned int nr)
 	lazy_mmu_mode_disable();
=20
 	/* feedback from rmap walkers to page table walkers */
-	if (mm_state && suitable_to_scan(i, young))
+	if (mm_state && suitable_to_scan(i, young)) {
+		if (*acc_info_ptr) {
+			struct area_access_info acc_info =3D {
+				.address =3D start,
+				.total =3D i,
+				.young =3D young,
+				.max_order =3D max_order
+			};
+			*(*acc_info_ptr) =3D acc_info;
+			(*acc_info_ptr)++;
+		}
 		update_bloom_filter(mm_state, max_seq, pvmw->pmd);
+	}
=20
 	mem_cgroup_put(memcg);
=20

--=20
2.52.0