[RFC PATCH] mm: only set fault address' access bit in do_anonymous_page

Wenchao Hao posted 1 patch 16 hours ago
When do_anonymous_page() creates mappings for huge pages, it currently sets
the access bit for all mapped PTEs (Page Table Entries) by default.

This causes an issue where the Referenced field in /proc/pid/smaps cannot
reflect whether each individual page was actually accessed.

This patch therefore introduces a new interface, set_anon_ptes(), which sets
the access bit only for the PTE corresponding to the faulting address. This
allows accurate tracking of page access status in /proc/pid/smaps before
memory reclaim scans the folios.

During memory reclaim, folio_referenced() checks and clears the access bits
of PTEs: rmap walks every PTE that maps the folio, and if any PTE mapping a
subpage of the folio has its access bit set, the folio is retained. Setting
the access bit only for the faulting PTE in do_anonymous_page() is therefore
safe, as it does not interfere with reclaim decisions.

The patch only supports architectures without custom set_ptes()
implementations (e.g., x86). ARM64 and other architectures are not yet
supported.

Additionally, I have some questions regarding the contiguous page tables
for 64K huge pages on the ARM64 architecture.

Commit 4602e5757bcc ("arm64/mm: wire up PTE_CONT for user mappings")
describes the behavior as follows:

> Since a contpte block only has a single access and dirty bit, the semantic
> here changes slightly; when getting a pte (e.g.  ptep_get()) that is part
> of a contpte mapping, the access and dirty information are pulled from the
> block (so all ptes in the block return the same access/dirty info).

While the ARM64 manual states:

> If hardware updates a translation table entry, and if the Contiguous bit in
> that entry is 1, then the members in a group of contiguous translation table
> entries can have different AF, AP[2], and S2AP[1] values.

Does this mean that the 16 PTEs of a contiguous block are not required to share the same AF on ARM64?

Currently, on ARM64 with contiguous page tables enabled, the access and dirty
bits of the PTEs backing a 64K huge page are folded together in software.

However, I have not been able to determine whether these access and dirty
bits affect the TLB coalescing of contiguous page tables. If they do not,
I think ARM64 could also set the access bit only for the PTE corresponding
to the actual fault address in do_anonymous_page().

Signed-off-by: Wenchao Hao <haowenchao22@gmail.com>
---
 include/linux/pgtable.h | 28 ++++++++++++++++++++++++++++
 mm/memory.c             |  2 +-
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 652f287c1ef6..e2f3c932d672 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -302,6 +302,34 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 #endif
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
+#ifndef set_ptes
+static inline void set_anon_ptes(struct mm_struct *mm, unsigned long addr,
+		unsigned long fault_addr, pte_t *ptep, pte_t pte, unsigned int nr)
+{
+	bool young = pte_young(pte);
+
+	page_table_check_ptes_set(mm, ptep, pte, nr);
+
+	for (;;) {
+		if (young && addr == fault_addr)
+			pte = pte_mkyoung(pte);
+		else
+			pte = pte_mkold(pte);
+
+		set_pte(ptep, pte);
+		if (--nr == 0)
+			break;
+
+		addr += PAGE_SIZE;
+		ptep++;
+		pte = pte_next_pfn(pte);
+	}
+}
+#else
+#define set_anon_ptes(mm, addr, fault_addr, ptep, pte, nr) \
+		set_ptes(mm, addr, ptep, pte, nr)
+#endif
+
 #ifndef __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
 extern int ptep_set_access_flags(struct vm_area_struct *vma,
 				 unsigned long address, pte_t *ptep,
diff --git a/mm/memory.c b/mm/memory.c
index da360a6eb8a4..65c69c7116a7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5273,7 +5273,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 setpte:
 	if (vmf_orig_pte_uffd_wp(vmf))
 		entry = pte_mkuffd_wp(entry);
-	set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr_pages);
+	set_anon_ptes(vma->vm_mm, addr, vmf->address, vmf->pte, entry, nr_pages);
 
 	/* No need to invalidate - it was non-present before */
 	update_mmu_cache_range(vmf, vma, addr, vmf->pte, nr_pages);
-- 
2.45.0
Re: [RFC PATCH] mm: only set fault address' access bit in do_anonymous_page
Posted by Kiryl Shutsemau 8 hours ago
On Tue, Feb 10, 2026 at 12:34:56PM +0800, Wenchao Hao wrote:
> When do_anonymous_page() creates mappings for huge pages, it currently sets
> the access bit for all mapped PTEs (Page Table Entries) by default.
> 
> This causes an issue where the Referenced field in /proc/pid/smaps cannot
> distinguish whether a page was actually accessed.
> 
> So here introduces a new interface, set_anon_ptes(), which only sets the
> access bit for the PTE corresponding to the faulting address. This allows
> accurate tracking of page access status in /proc/pid/smaps before memory
> reclaim scan the folios.
> 
> During memory reclaim: folio_referenced() checks and clears the access bits
> of PTEs, rmap verifies all PTEs under a folio. If any PTE mapped subpage of
> folio has access bit set, the folio is retained during reclaim. So only
> set the access bit for the faulting PTE in do_anonymous_page() is safe, as
> it does not interfere with reclaim decisions.

We had a similar discussion about faultaround and briefly made it produce
old ptes, but that caused a performance regression, as old ptes require an
additional page walk to set the accessed bit on first touch. It got
reverted, but an arch can opt in to setting up old ptes for non-fault
addresses.

See commits:

5c0a85fad949 ("mm: make faultaround produce old ptes")
315d09bf30c2 ("Revert "mm: make faultaround produce old ptes"")
46bdb4277f98 ("mm: Allow architectures to request 'old' entries when prefaulting")

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
Re: [RFC PATCH] mm: only set fault address' access bit in do_anonymous_page
Posted by David Hildenbrand (Arm) 11 hours ago
On 2/10/26 05:34, Wenchao Hao wrote:
> When do_anonymous_page() creates mappings for huge pages, it currently sets
> the access bit for all mapped PTEs (Page Table Entries) by default.
> 
> This causes an issue where the Referenced field in /proc/pid/smaps cannot
> distinguish whether a page was actually accessed.

What is the use case that cares about that?

What we have right now is exactly the same behavior as if you got a 
PMD-mapped THP, which has a single access+dirty bit at fault time.

Also, architectures that support transparent PTE coalescing will not be 
able to coalesce unless all PTE bits are equal.

This level of imprecision is to be expected with large folios that only 
have a single access+dirty bit.

-- 
Cheers,

David