Date: Thu, 11 Jun 2026 21:14:55 +0800
Subject: [PATCH v9 1/3] ksm: add linear_page_index into ksm_rmap_item
Message-ID: <20260611211455080AYUdDh-QB1227GDBgfxhv@zte.com.cn>
In-Reply-To: <20260611211311025VA1q1gTA-X5NMF0aZjnvh@zte.com.cn>
X-Mailing-List: linux-kernel@vger.kernel.org

From: xu xin

As preparation for KSM rmap optimizations, let's track the original linear_page_index() of a de-duplicated page in its ksm_rmap_item, so that we can efficiently search for the page in an address space instead of scanning the entire address space. This was previously discussed in [1, 2].

To avoid growing ksm_rmap_item, let's squeeze the new field into the existing structure by overlaying some members (oldchecksum, age, remaining_skips) that are only relevant while the item is in the unstable tree. The new field is only relevant for items in the stable tree.

However, with the smart-scanning approach, should_skip_rmap_item() reads the age information even while the item is still in the stable tree but its page has changed (it is no longer a KSM page, for example due to COW), so we have to change the handling there a bit.
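As a minimal userspace sketch of the overlay described above, the following uses hypothetical mirror structs, not the kernel definitions: rmap_age_t is assumed to be an 8-bit type, and the tree-linkage union is replaced by three pointers of padding. It shows why the union does not grow the structure: the six bytes of oldchecksum, age and remaining_skips are already padded out to the alignment of the next pointer-sized member, so an unsigned long fits in the same slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: rmap_age_t is an 8-bit counter. */
typedef uint8_t rmap_age_t;

/* Hypothetical mirror of the layout before this patch. */
struct rmap_item_before {
	void *rmap_list;
	void *anon_vma;			/* in a union with nid in the kernel */
	void *mm;
	unsigned long address;		/* + low bits used for flags */
	unsigned int oldchecksum;	/* when unstable */
	rmap_age_t age;
	rmap_age_t remaining_skips;
	void *tree_link[3];		/* stands in for rb_node or head + hlist */
};

/* Hypothetical mirror of the layout after this patch. */
struct rmap_item_after {
	void *rmap_list;
	void *anon_vma;
	void *mm;
	unsigned long address;
	union {
		struct {
			unsigned int oldchecksum;
			rmap_age_t age;
			rmap_age_t remaining_skips;
		};					/* when unstable */
		unsigned long linear_page_index;	/* when stable */
	};
	void *tree_link[3];
};

/* Compile-time check: the overlay must not grow the structure. */
static_assert(sizeof(struct rmap_item_before) == sizeof(struct rmap_item_after),
	      "overlaying linear_page_index must not grow the rmap item");
```

The static_assert turns any later field change that enlarges the union into a build failure rather than a silent size increase.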
We calculate the linear page index in try_to_merge_with_ksm_page(), when adding the item to the stable tree, and reset the index (to clear the overlaid data) when removing an item from the stable tree, in remove_rmap_item_from_tree() and remove_node_from_stable_tree(), and when undoing an aborted merge in break_cow().

The reason for also resetting the stored index in break_cow() is:

- When a page successfully becomes a KSM page (i.e., after stable_tree_append() sets STABLE_FLAG), both anon_vma and the index are stored and remain valid.

- However, the merging process has several failure paths in which we have already prepared an rmap_item to be added to the stable tree, but must revert that because some part of the merge failed. Examples include:

  1. The second call to try_to_merge_with_ksm_page() fails in try_to_merge_two_pages().
  2. stable_tree_insert() fails in cmp_and_merge_page().

In such cases, break_cow() is invoked to break the COW mapping and discard the KSM state. break_cow() already calls put_anon_vma(rmap_item->anon_vma) to release the reference taken during the aborted merge. Because the index is logically paired with anon_vma (both are only meaningful while the rmap_item is stable), it must also be reset in break_cow(); otherwise a stale linear_page_index would be left in the fields it overlays (oldchecksum, age, remaining_skips) and could confuse subsequent rmap walks or the scanning logic.
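The lifecycle above can be modelled with a small userspace sketch. All model_* names, the 4 KiB page size and the MODEL_STABLE_FLAG value are illustrative assumptions, not kernel definitions; the index formula matches linear_page_index() for a non-hugetlb VMA, and model_reads_age() mirrors the new STABLE_FLAG check in should_skip_rmap_item().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants; not taken from a kernel header. */
#define MODEL_PAGE_SHIFT	12
#define MODEL_PAGE_MASK		(~((1UL << MODEL_PAGE_SHIFT) - 1))
#define MODEL_STABLE_FLAG	0x200UL	/* assumed to mirror STABLE_FLAG */

typedef uint8_t rmap_age_t;

struct model_vma {
	unsigned long vm_start;
	unsigned long vm_pgoff;
};

struct model_rmap_item {
	unsigned long address;		/* page address + low flag bits */
	union {
		struct {
			unsigned int oldchecksum;
			rmap_age_t age;
			rmap_age_t remaining_skips;
		};					/* unstable tree only */
		unsigned long linear_page_index;	/* stable tree only */
	};
};

/* linear_page_index() for a non-hugetlb VMA. */
static unsigned long model_linear_page_index(const struct model_vma *vma,
					     unsigned long address)
{
	return vma->vm_pgoff + ((address - vma->vm_start) >> MODEL_PAGE_SHIFT);
}

/* try_to_merge_with_ksm_page(): record the index while the VMA is known. */
void model_prepare_stable(struct model_rmap_item *item,
			  const struct model_vma *vma)
{
	item->linear_page_index = model_linear_page_index(vma, item->address);
}

/* stable_tree_append(): the item is now in the stable tree. */
void model_commit_stable(struct model_rmap_item *item)
{
	item->address |= MODEL_STABLE_FLAG;
}

/* break_cow() or removal from the stable tree: clear the overlay and flags. */
void model_reset(struct model_rmap_item *item)
{
	item->linear_page_index = 0;
	item->address &= MODEL_PAGE_MASK;
}

/* should_skip_rmap_item(): only non-stable items carry age information. */
bool model_reads_age(const struct model_rmap_item *item)
{
	return !(item->address & MODEL_STABLE_FLAG);
}
```

In this model, as in the kernel structure, clearing linear_page_index also clears age and remaining_skips only where unsigned long is 64 bits wide; on 32-bit targets the store overlays oldchecksum alone.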
[1] https://lore.kernel.org/all/adTPQSb-qSSHviJN@lucifer/
[2] https://lore.kernel.org/all/202604091806051535BJWZ_FTtdIm3Snk24ei_@zte.com.cn/

Acked-by: David Hildenbrand (Arm)
Signed-off-by: xu xin
---
 mm/ksm.c | 48 +++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 41 insertions(+), 7 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 7d5b76478f0b..60c6f959d81a 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -195,22 +195,28 @@ struct ksm_stable_node {
  * @node: rb node of this rmap_item in the unstable tree
  * @head: pointer to stable_node heading this list in the stable tree
  * @hlist: link into hlist of rmap_items hanging off that stable_node
- * @age: number of scan iterations since creation
- * @remaining_skips: how many scans to skip
+ * @age: number of scan iterations since creation (unstable node)
+ * @remaining_skips: how many scans to skip (unstable node)
+ * @linear_page_index: the original page's index before being merged by KSM (stable node)
  */
 struct ksm_rmap_item {
 	struct ksm_rmap_item *rmap_list;
 	union {
-		struct anon_vma *anon_vma;	/* when stable */
+		struct anon_vma *anon_vma;	/* for reverse mapping, when stable */
 #ifdef CONFIG_NUMA
 		int nid;		/* when node of unstable tree */
 #endif
 	};
 	struct mm_struct *mm;
 	unsigned long address;		/* + low bits used for flags below */
-	unsigned int oldchecksum;	/* when unstable */
-	rmap_age_t age;
-	rmap_age_t remaining_skips;
+	union {
+		struct {
+			unsigned int oldchecksum;
+			rmap_age_t age;
+			rmap_age_t remaining_skips;
+		}; /* when unstable */
+		unsigned long linear_page_index;	/* for reverse mapping, when stable */
+	};
 	union {
 		struct rb_node node;	/* when node of unstable tree */
 		struct {		/* when listed from stable tree */
@@ -776,6 +782,11 @@ static struct vm_area_struct *find_mergeable_vma(struct mm_struct *mm,
 	return vma;
 }
 
+/*
+ * break_cow: actively break COW, replacing the KSM page by a fresh anonymous
+ * page. This is called when rmap_item has not yet become stable, but page
+ * has been merged.
+ */
 static void break_cow(struct ksm_rmap_item *rmap_item)
 {
 	struct mm_struct *mm = rmap_item->mm;
@@ -787,6 +798,11 @@ static void break_cow(struct ksm_rmap_item *rmap_item)
 	 * to undo, we also need to drop a reference to the anon_vma.
 	 */
 	put_anon_vma(rmap_item->anon_vma);
+	/*
+	 * Reset linear_page_index that might overlay age-related
+	 * information (it is still an unstable node).
+	 */
+	rmap_item->linear_page_index = 0;
 
 	mmap_read_lock(mm);
 	vma = find_mergeable_vma(mm, addr);
@@ -899,6 +915,8 @@ static void remove_node_from_stable_tree(struct ksm_stable_node *stable_node)
 		VM_BUG_ON(stable_node->rmap_hlist_len <= 0);
 		stable_node->rmap_hlist_len--;
 		put_anon_vma(rmap_item->anon_vma);
+		/* Reset linear_page_index that might overlay age-related information. */
+		rmap_item->linear_page_index = 0;
 		rmap_item->address &= PAGE_MASK;
 		cond_resched();
 	}
@@ -1052,6 +1070,8 @@ static void remove_rmap_item_from_tree(struct ksm_rmap_item *rmap_item)
 		stable_node->rmap_hlist_len--;
 
 		put_anon_vma(rmap_item->anon_vma);
+		/* Reset linear_page_index that might overlay age-related information. */
+		rmap_item->linear_page_index = 0;
 		rmap_item->head = NULL;
 		rmap_item->address &= PAGE_MASK;
 
@@ -1598,8 +1618,15 @@ static int try_to_merge_with_ksm_page(struct ksm_rmap_item *rmap_item,
 	/* Unstable nid is in union with stable anon_vma: remove first */
 	remove_rmap_item_from_tree(rmap_item);
 
-	/* Must get reference to anon_vma while still holding mmap_lock */
+	/*
+	 * We can consider the VMA only while still holding the mmap lock,
+	 * so reference the anon_vma and calculate the linear
+	 * page index early, before stable_tree_append(). If anything goes
+	 * wrong that prevents the rmap_item from being added to the
+	 * stable_tree, break_cow() will clean it up.
+	 */
 	rmap_item->anon_vma = vma->anon_vma;
+	rmap_item->linear_page_index = linear_page_index(vma, rmap_item->address);
 	get_anon_vma(vma->anon_vma);
 out:
 	mmap_read_unlock(mm);
@@ -2458,6 +2485,13 @@ static bool should_skip_rmap_item(struct folio *folio,
 	if (folio_test_ksm(folio))
 		return false;
 
+	/*
+	 * There is no age information in stable-tree nodes. We might end up
+	 * here without a KSM page, for example after COW.
+	 */
+	if (rmap_item->address & STABLE_FLAG)
+		return false;
+
 	age = rmap_item->age;
 	if (age != U8_MAX)
 		rmap_item->age++;
-- 
2.25.1

Date: Thu, 11 Jun 2026 21:21:38 +0800
Subject: [PATCH v9 2/3] ksm: Optimize rmap_walk_ksm by passing a suitable page index
Message-ID: <20260611212138619BRrXoKru-JeX9_5F1yX4l@zte.com.cn>
In-Reply-To: <20260611211311025VA1q1gTA-X5NMF0aZjnvh@zte.com.cn>
X-Mailing-List: linux-kernel@vger.kernel.org

From: xu xin

User impact / Why this matters to Linux users
=============================================

When a system runs with KSM enabled and memory becomes tight, KSM pages may be swapped out or migrated. The kernel then performs a reverse map walk with rmap_walk_ksm() to locate all page table entries that reference these pages.

If a large number of unrelated VMAs are attached to the single anon_vma associated with a KSM page, rmap_walk_ksm() can become a severe performance bottleneck. In our embedded test environment, we observed ~20,000 VMAs sharing one anon_vma without any fork, purely as a result of VMA splits, which made rmap_walk_ksm() take 200-700 ms. When one of those VMAs maps a KSM page, the rmap walk for that page holds its anon_vma lock for that whole time.

The anon_vma lock is not only used by KSM; it is a core lock protecting the VMA interval tree and is acquired by many critical memory operations:

- Page faults: do_anonymous_page(), do_wp_page() (during COW)
- Memory reclaim: try_to_unmap()
- Page migration & compaction: migrate_pages(), compact_zone()
- mlock / munlock: mlock_fixup()
- Process exit: exit_mmap() (tearing down VMAs)
- Cgroup memory accounting: mem_cgroup_move_charge()

If one thread holds the anon_vma lock for hundreds of milliseconds because of an inefficient KSM rmap walk, other threads that need the same lock (e.g., an application taking a page fault, kswapd reclaiming pages, or a migration thread) can block for that long. rmap_walk_ksm() takes the lock for reading, so the threads that block are those acquiring it for writing, plus readers that end up queued behind such a writer. This leads to stalled application threads, latency spikes and, in extreme cases, container timeouts or watchdog triggers.

This patch reduces the worst-case anon_vma lock hold time during rmap_walk_ksm() from >500 ms to under 2 ms in our benchmark (see the test results below), thereby almost eliminating this source of lock contention and improving system responsiveness under memory pressure.

Real-world examples:
====================

- JVM / Go runtime: These use mmap() for heap regions and later call mprotect(PROT_NONE) for garbage collection barriers or guard pages, splitting the original VMA into thousands of small pieces over time.
- Database engines (MySQL, PostgreSQL): Large buffers or anonymous mappings are managed with madvise() on sub-ranges. Advice that changes the VMA flags of part of a mapping splits the VMA (MADV_DONTNEED, which only releases pages, does not).

Root Cause
==========

Through local trace-based debugging, we found that most of the latency of rmap_walk_ksm() is spent inside anon_vma_interval_tree_foreach(), which leads to an excessively long anon_vma lock hold time (500 ms or more) and, in turn, blocks upper-layer applications waiting for the anon_vma lock for extended periods.

Further investigation revealed that 99.9% of the iterations of the anon_vma_interval_tree_foreach() loop are skipped by its first check, "if (addr < vma->vm_start || addr >= vma->vm_end)", so almost all loop iterations do no useful work. This is because the start and end page index arguments passed to anon_vma_interval_tree_foreach() cover the entire range from 0 to ULONG_MAX, so every VMA attached to the anon_vma is visited.

Solution
========

The anon_vma alone is not enough to locate the PTEs mapping this page efficiently; we also need the original page's linear_page_index. anon_vma_interval_tree_foreach() essentially iterates over the VMAs whose page-offset range contains the provided page index, i.e. the candidate VMAs v satisfying:

  v->vm_pgoff <= original linear page index <= v->vm_pgoff + vma_pages(v) - 1

The previous patch in this series added linear_page_index to struct ksm_rmap_item, which allows us to optimize the rmap walk accordingly.

Test results
============

An rmap test bench can be obtained with the two out-of-tree patches at [1][2]. After applying them and building rmap_benchmark from tools/testing/rmap/rmap_benchmark.c, the performance test can be run.
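Before the measured results, the effect of narrowing the query range described under Root Cause and Solution can be illustrated with a small userspace model. It is a sketch under stated assumptions: the model_* names are hypothetical, pages are 4 KiB, and a linear scan with an explicit overlap test stands in for the kernel's augmented interval tree, so only the number of VMAs that reach the loop body is meaningful, not the cost of the tree itself.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12	/* assumed 4 KiB pages */

/* Hypothetical stand-in for the VMA fields the interval tree indexes. */
struct model_vma {
	unsigned long vm_start;	/* virtual address of the first page */
	unsigned long vm_pgoff;	/* page offset of the first page */
	unsigned long nr_pages;	/* vma_pages() */
};

/* True if [vm_pgoff, vm_pgoff + nr_pages - 1] overlaps [first, last]. */
static int pgoff_overlaps(const struct model_vma *v, unsigned long first,
			  unsigned long last)
{
	unsigned long v_last = v->vm_pgoff + v->nr_pages - 1;

	return v->vm_pgoff <= last && first <= v_last;
}

/*
 * Linear stand-in for anon_vma_interval_tree_foreach(): returns how many
 * VMAs reach the loop body for the query [first, last], and stores in
 * *hits how many of them pass the "addr inside the VMA" check.
 */
size_t model_walk(const struct model_vma *vmas, size_t nr_vmas,
		  unsigned long first, unsigned long last,
		  unsigned long addr, size_t *hits)
{
	size_t visited = 0;

	*hits = 0;
	for (size_t i = 0; i < nr_vmas; i++) {
		const struct model_vma *v = &vmas[i];
		unsigned long vm_end = v->vm_start +
				       (v->nr_pages << MODEL_PAGE_SHIFT);

		if (!pgoff_overlaps(v, first, last))
			continue;	/* the real tree never visits these */
		visited++;
		if (addr < v->vm_start || addr >= vm_end)
			continue;	/* rejects most VMAs for [0, ULONG_MAX] */
		(*hits)++;
	}
	return visited;
}

/* Model one anonymous mapping at base split into nr_vmas one-page VMAs. */
void model_split_mapping(struct model_vma *vmas, size_t nr_vmas,
			 unsigned long base)
{
	for (size_t i = 0; i < nr_vmas; i++) {
		vmas[i].vm_start = base + ((unsigned long)i << MODEL_PAGE_SHIFT);
		vmas[i].vm_pgoff = (base >> MODEL_PAGE_SHIFT) + i;
		vmas[i].nr_pages = 1;
	}
}
```

With one mapping split into 20,000 one-page VMAs, querying [0, ULONG_MAX] sends every VMA into the loop body only for the address check to reject all but one, whereas querying [index, index] yields just the VMA that maps the page.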
The test results in QEMU are as follows:

  KSM rmap walk   Maximum duration            Average duration
  Before          705.12 ms (705119858 ns)    532.04 ms (532041586 ns)
  After           1.67 ms (1665917 ns)        1.44 ms (1443784 ns)

These benchmark numbers are representative: we observed ~20,000 VMAs sharing one anon_vma on a production system running a Java application with KSM enabled. The lock hold time before the patch was measured at 228 ms (max) during rmap walks triggered by memory compaction and page migration. The benchmark reproduces that VMA count and lock-hold behavior in a controlled environment.

[1] https://lore.kernel.org/all/202605301703094695zmVgcSC27BNR0rH0N8_x@zte.com.cn
[2] https://lore.kernel.org/all/20260530170404509QpJmBtpSjn3uQHeVKA2iA@zte.com.cn/

Co-developed-by: Wang Yaxin
Signed-off-by: Wang Yaxin
Signed-off-by: xu xin
Acked-by: David Hildenbrand (Arm)
---
 mm/ksm.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 60c6f959d81a..454ba2eb46e9 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -3207,6 +3207,7 @@ void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
 	hlist_for_each_entry(rmap_item, &stable_node->hlist, hlist) {
 		/* Ignore the stable/unstable/sqnr flags */
 		const unsigned long addr = rmap_item->address & PAGE_MASK;
+		const unsigned long index = rmap_item->linear_page_index;
 		struct anon_vma *anon_vma = rmap_item->anon_vma;
 		struct anon_vma_chain *vmac;
 		struct vm_area_struct *vma;
@@ -3220,8 +3221,18 @@ void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
 			anon_vma_lock_read(anon_vma);
 		}
 
+		/*
+		 * Currently, KSM folios are always small folios, so it's
+		 * sufficient to search for a single page. We can simply use
+		 * the linear_page_index of the original de-duplicated
+		 * anonymous page that we remembered in the rmap_item while
+		 * de-duplicating.
+		 * Note that mremap() always breaks (unmerges) KSM folios: so if
+		 * there was mremap() in our parent or our child, we wouldn't
+		 * have the KSM folio mapped in these processes anymore.
+		 */
 		anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
-					       0, ULONG_MAX) {
+					       index, index) {
 
 			cond_resched();
 			vma = vmac->vma;
-- 
2.25.1

Date: Thu, 11 Jun 2026 21:22:35 +0800
Subject: [PATCH v9 3/3] ksm: add mremap selftests for ksm_rmap_walk
Message-ID: <202606112122356956c-u4WhFlRWs4-OJ4TAsu@zte.com.cn>
In-Reply-To: <20260611211311025VA1q1gTA-X5NMF0aZjnvh@zte.com.cn>
X-Mailing-List: linux-kernel@vger.kernel.org

From: xu xin

The existing tools/testing/selftests/mm/rmap.c already has one test case for rmap_walk_ksm(), TEST_F(migrate, ksm), which migrates a page from one NUMA node to another. However, it does not cover mremap()ed VMAs. Add a test that calls mremap() and then triggers KSM to merge the pages before migrating them, specifically to exercise the optimization introduced by the previous patch in this series ("ksm: Optimize rmap_walk_ksm by passing a suitable page index").
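To show why the index has to come from the VMA rather than from a page's current address, here is a userspace model of the test's layout. It is a minimal sketch under stated assumptions: 4 KiB pages, a MAP_PRIVATE anonymous mapping whose vm_pgoff starts as addr >> PAGE_SHIFT, and a moved VMA that keeps the page offsets it had before mremap(). The model_* helpers are hypothetical and track only vm_start and vm_pgoff.

```c
#include <assert.h>

#define MODEL_PAGE_SHIFT	12
#define MODEL_PAGE_SIZE		(1UL << MODEL_PAGE_SHIFT)

struct model_vma {
	unsigned long vm_start;
	unsigned long vm_pgoff;
};

/* linear_page_index() for a non-hugetlb VMA. */
unsigned long model_linear_page_index(const struct model_vma *vma,
				      unsigned long addr)
{
	return vma->vm_pgoff + ((addr - vma->vm_start) >> MODEL_PAGE_SHIFT);
}

/* Private anonymous mmap() at base: pgoff is derived from the address. */
struct model_vma model_mmap_anon(unsigned long base)
{
	struct model_vma vma = {
		.vm_start = base,
		.vm_pgoff = base >> MODEL_PAGE_SHIFT,
	};

	return vma;
}

/*
 * mremap(old_addr, ..., MREMAP_FIXED, new_addr) of the tail of vma: the
 * moved VMA starts at new_addr but keeps the page offset of old_addr.
 */
struct model_vma model_mremap_fixed(const struct model_vma *vma,
				    unsigned long old_addr,
				    unsigned long new_addr)
{
	struct model_vma moved = {
		.vm_start = new_addr,
		.vm_pgoff = model_linear_page_index(vma, old_addr),
	};

	return moved;
}
```

After moving the second half of a 32-page region onto its first half, the page at the start of the region has page offset base_pgoff + 16, not base >> PAGE_SHIFT, so only an index taken from the VMA at merge time (as try_to_merge_with_ksm_page() now does) falls inside the moved VMA's pgoff range.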
This test can reproduce the issue that Hugh pointed out at https://lore.kernel.org/all/02e1b8df-d568-8cbb-b8f6-46d5476d9d75@google.com/

Signed-off-by: xu xin
---
 tools/testing/selftests/mm/rmap.c | 90 +++++++++++++++++++++++++++++++
 1 file changed, 90 insertions(+)

diff --git a/tools/testing/selftests/mm/rmap.c b/tools/testing/selftests/mm/rmap.c
index 53f2058b0ef2..375bcb1fe1ab 100644
--- a/tools/testing/selftests/mm/rmap.c
+++ b/tools/testing/selftests/mm/rmap.c
@@ -430,4 +430,94 @@ TEST_F(migrate, ksm)
 	propagate_children(_metadata, data);
 }
 
+static int mremap_merge_and_migrate(struct global_data *data)
+{
+	int ret, i, pagemap_fd;
+	void *old_region;
+	void *new_region;
+	int nr_pages = 32;
+	int page_size = getpagesize();
+	unsigned long old_pfn;
+
+	/* Allocate exactly nr_pages pages for the test */
+	data->mapsize = nr_pages * page_size;
+	data->region = mmap(NULL, data->mapsize, PROT_READ | PROT_WRITE,
+			    MAP_PRIVATE | MAP_ANON, -1, 0);
+	if (data->region == MAP_FAILED)
+		ksft_exit_fail_perror("mmap failed");
+	memset(data->region, 0x77, data->mapsize);
+
+	old_region = data->region;
+	/*
+	 * Mremap the second half region to the first half location (FIXED).
+	 */
+	new_region = mremap(old_region + data->mapsize / 2, data->mapsize / 2,
+			    data->mapsize / 2, MREMAP_MAYMOVE | MREMAP_FIXED,
+			    old_region);
+	if (new_region == MAP_FAILED) {
+		ksft_print_msg("mremap failed: %s\n", strerror(errno));
+		return FAIL_ON_CHECK;
+	}
+	data->region = new_region;
+	data->mapsize /= 2;
+
+	/* Trigger KSM to merge these pages */
+	if (ksm_start() < 0)
+		return FAIL_ON_CHECK;
+
+	pagemap_fd = open("/proc/self/pagemap", O_RDONLY);
+	if (pagemap_fd == -1)
+		return FAIL_ON_WORK;
+
+	/* Before migrating, check that all pages share the same PFN */
+	*data->expected_pfn = pagemap_get_pfn(pagemap_fd, data->region);
+	for (i = 1; i < nr_pages / 2; i++) {
+		if (pagemap_get_pfn(pagemap_fd, data->region + i * page_size)
+		    != *data->expected_pfn) {
+			ksft_print_msg("PFN is not expected\n");
+			return FAIL_ON_CHECK;
+		}
+	}
+	old_pfn = *data->expected_pfn;
+
+	/* Attempt to migrate the merged KSM page */
+	ret = try_to_move_page(data->region);
+	if (ret != 0) {
+		ksft_print_msg("migration of KSM page after mremap failed\n");
+		return FAIL_ON_CHECK;
+	}
+
+	/* After migrating, check that the other pages no longer map the old PFN */
+	for (i = 1; i < nr_pages / 2; i++) {
+		if (pagemap_get_pfn(pagemap_fd, data->region + i * page_size)
+		    == old_pfn) {
+			ksft_print_msg("Bug migration: still old PFN\n");
+			return FAIL_ON_CHECK;
+		}
+	}
+
+	return 0;
+}
+
+
+TEST_F(migrate, ksm_and_mremap)
+{
+	struct global_data *data = &self->data;
+	int ret;
+
+	/* Skip if KSM is not available */
+	if (ksm_stop() < 0)
+		SKIP(return, "accessing \"/sys/kernel/mm/ksm/run\" failed");
+	if (ksm_get_full_scans() < 0)
+		SKIP(return, "accessing \"/sys/kernel/mm/ksm/full_scans\" failed");
+
+	ret = prctl(PR_SET_MEMORY_MERGE, 1, 0, 0, 0);
+	if (ret < 0 && errno == EINVAL)
+		SKIP(return, "PR_SET_MEMORY_MERGE not supported");
+	else if (ret)
+		ksft_exit_fail_perror("PR_SET_MEMORY_MERGE=1 failed");
+
+	ASSERT_EQ(mremap_merge_and_migrate(data), 0);
+}
+
 TEST_HARNESS_MAIN
-- 
2.25.1