From: David Hildenbrand <david@redhat.com>
Let's add variants to be used where "full" does not apply -- which will
be the majority of cases in the future. "full" really only applies if
we are about to tear down a full MM.
Use get_and_clear_ptes() in existing code, clear_ptes() users will
be added next.
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Dev Jain <dev.jain@arm.com>
---
arch/arm64/mm/mmu.c | 2 +-
include/linux/pgtable.h | 45 +++++++++++++++++++++++++++++++++++++++++
mm/mremap.c | 2 +-
mm/rmap.c | 2 +-
4 files changed, 48 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index abd9725796e9..20a89ab97dc5 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1528,7 +1528,7 @@ early_initcall(prevent_bootmem_remove_init);
pte_t modify_prot_start_ptes(struct vm_area_struct *vma, unsigned long addr,
pte_t *ptep, unsigned int nr)
{
- pte_t pte = get_and_clear_full_ptes(vma->vm_mm, addr, ptep, nr, /* full = */ 0);
+ pte_t pte = get_and_clear_ptes(vma->vm_mm, addr, ptep, nr);
if (alternative_has_cap_unlikely(ARM64_WORKAROUND_2645198)) {
/*
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index e3b99920be05..4c035637eeb7 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -736,6 +736,29 @@ static inline pte_t get_and_clear_full_ptes(struct mm_struct *mm,
}
#endif
+/**
+ * get_and_clear_ptes - Clear present PTEs that map consecutive pages of
+ * the same folio, collecting dirty/accessed bits.
+ * @mm: Address space the pages are mapped into.
+ * @addr: Address the first page is mapped at.
+ * @ptep: Page table pointer for the first entry.
+ * @nr: Number of entries to clear.
+ *
+ * Use this instead of get_and_clear_full_ptes() if it is known that we don't
+ * need to clear the full mm, which is mostly the case.
+ *
+ * Note that PTE bits in the PTE range besides the PFN can differ. For example,
+ * some PTEs might be write-protected.
+ *
+ * Context: The caller holds the page table lock. The PTEs map consecutive
+ * pages that belong to the same folio. The PTEs are all in the same PMD.
+ */
+static inline pte_t get_and_clear_ptes(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, unsigned int nr)
+{
+ return get_and_clear_full_ptes(mm, addr, ptep, nr, 0);
+}
+
#ifndef clear_full_ptes
/**
* clear_full_ptes - Clear present PTEs that map consecutive pages of the same
@@ -768,6 +791,28 @@ static inline void clear_full_ptes(struct mm_struct *mm, unsigned long addr,
}
#endif
+/**
+ * clear_ptes - Clear present PTEs that map consecutive pages of the same folio.
+ * @mm: Address space the pages are mapped into.
+ * @addr: Address the first page is mapped at.
+ * @ptep: Page table pointer for the first entry.
+ * @nr: Number of entries to clear.
+ *
+ * Use this instead of clear_full_ptes() if it is known that we don't need to
+ * clear the full mm, which is mostly the case.
+ *
+ * Note that PTE bits in the PTE range besides the PFN can differ. For example,
+ * some PTEs might be write-protected.
+ *
+ * Context: The caller holds the page table lock. The PTEs map consecutive
+ * pages that belong to the same folio. The PTEs are all in the same PMD.
+ */
+static inline void clear_ptes(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, unsigned int nr)
+{
+ clear_full_ptes(mm, addr, ptep, nr, 0);
+}
+
/*
* If two threads concurrently fault at the same page, the thread that
* won the race updates the PTE and its local TLB/Cache. The other thread
diff --git a/mm/mremap.c b/mm/mremap.c
index ac39845e9718..677a4d744df9 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -280,7 +280,7 @@ static int move_ptes(struct pagetable_move_control *pmc,
old_pte, max_nr_ptes);
force_flush = true;
}
- pte = get_and_clear_full_ptes(mm, old_addr, old_ptep, nr_ptes, 0);
+ pte = get_and_clear_ptes(mm, old_addr, old_ptep, nr_ptes);
pte = move_pte(pte, old_addr, new_addr);
pte = move_soft_dirty_pte(pte);
diff --git a/mm/rmap.c b/mm/rmap.c
index f93ce27132ab..568198e9efc2 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2036,7 +2036,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
flush_cache_range(vma, address, end_addr);
/* Nuke the page table entry. */
- pteval = get_and_clear_full_ptes(mm, address, pvmw.pte, nr_pages, 0);
+ pteval = get_and_clear_ptes(mm, address, pvmw.pte, nr_pages);
/*
* We clear the PTE but do not flush so potentially
* a remote CPU could still be writing to the folio.
--
2.30.2
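
For readers without a kernel tree at hand, the wrapper pattern this patch introduces can be sketched in userspace. This is only an illustration under stated assumptions: every `mock_`-prefixed function, the `MOCK_PTE_*` bits, and the simplified `pte_t` and `struct mm_struct` are hypothetical stand-ins, not the kernel definitions. The sketch shows only that the new helpers forward to the existing "full" variants with full = 0, and that the get-and-clear variant returns the first entry with the dirty/accessed bits of the whole range folded in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel definitions. */
typedef struct { unsigned long val; } pte_t;
struct mm_struct { int unused; };

#define MOCK_PTE_PRESENT  0x1UL
#define MOCK_PTE_DIRTY    0x2UL
#define MOCK_PTE_ACCESSED 0x4UL

/* Records the "full" argument seen by the last mock *_full_ptes() call. */
static int mock_last_full = -1;

/* Clear nr entries (nr >= 1); return the first one with dirty/accessed merged. */
static pte_t mock_get_and_clear_full_ptes(struct mm_struct *mm, unsigned long addr,
					  pte_t *ptep, unsigned int nr, int full)
{
	pte_t pte = ptep[0];
	unsigned int i;

	(void)mm;
	(void)addr;
	mock_last_full = full;

	for (i = 0; i < nr; i++) {
		pte.val |= ptep[i].val & (MOCK_PTE_DIRTY | MOCK_PTE_ACCESSED);
		ptep[i].val = 0;
	}
	return pte;
}

/* Clear nr entries without returning anything. */
static void mock_clear_full_ptes(struct mm_struct *mm, unsigned long addr,
				 pte_t *ptep, unsigned int nr, int full)
{
	unsigned int i;

	(void)mm;
	(void)addr;
	mock_last_full = full;

	for (i = 0; i < nr; i++)
		ptep[i].val = 0;
}

/* The wrappers mirror the patch: same arguments, minus "full", which is 0. */
static inline pte_t mock_get_and_clear_ptes(struct mm_struct *mm, unsigned long addr,
					    pte_t *ptep, unsigned int nr)
{
	return mock_get_and_clear_full_ptes(mm, addr, ptep, nr, 0);
}

static inline void mock_clear_ptes(struct mm_struct *mm, unsigned long addr,
				   pte_t *ptep, unsigned int nr)
{
	mock_clear_full_ptes(mm, addr, ptep, nr, 0);
}

/* Small self-check exercising both wrappers; returns 0 when all checks pass. */
static int mock_selfcheck(void)
{
	struct mm_struct mm = { 0 };
	pte_t ptes[2] = { { MOCK_PTE_PRESENT }, { MOCK_PTE_PRESENT | MOCK_PTE_DIRTY } };
	pte_t pte = mock_get_and_clear_ptes(&mm, 0x1000UL, ptes, 2);

	assert(pte.val == (MOCK_PTE_PRESENT | MOCK_PTE_DIRTY));
	assert(ptes[0].val == 0 && ptes[1].val == 0);
	assert(mock_last_full == 0);

	ptes[0].val = MOCK_PTE_PRESENT;
	mock_clear_ptes(&mm, 0x1000UL, ptes, 1);
	assert(ptes[0].val == 0);
	return 0;
}
```

Calling `mock_selfcheck()` from a `main()` exercises both wrappers. In the real kernel the locking and same-folio, same-PMD requirements from the kerneldoc Context section still apply; the mock does not model them.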
On 24 Jul 2025, at 1:22, Dev Jain wrote:

> From: David Hildenbrand <david@redhat.com>
>
> Let's add variants to be used where "full" does not apply -- which will
> be the majority of cases in the future. "full" really only applies if
> we are about to tear down a full MM.
>
> Use get_and_clear_ptes() in existing code, clear_ptes() users will
> be added next.
>
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Dev Jain <dev.jain@arm.com>
> ---
> arch/arm64/mm/mmu.c | 2 +-
> include/linux/pgtable.h | 45 +++++++++++++++++++++++++++++++++++++++++
> mm/mremap.c | 2 +-
> mm/rmap.c | 2 +-
> 4 files changed, 48 insertions(+), 3 deletions(-)
>

Reviewed-by: Zi Yan <ziy@nvidia.com>

Best Regards,
Yan, Zi

On Thu, Jul 24, 2025 at 10:52:59AM +0530, Dev Jain wrote:
> From: David Hildenbrand <david@redhat.com>
>
> Let's add variants to be used where "full" does not apply -- which will
> be the majority of cases in the future. "full" really only applies if
> we are about to tear down a full MM.
>
> Use get_and_clear_ptes() in existing code, clear_ptes() users will
> be added next.
>
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Dev Jain <dev.jain@arm.com>

LGTM, so:

Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

> ---
> arch/arm64/mm/mmu.c | 2 +-
> include/linux/pgtable.h | 45 +++++++++++++++++++++++++++++++++++++++++
> mm/mremap.c | 2 +-
> mm/rmap.c | 2 +-
> 4 files changed, 48 insertions(+), 3 deletions(-)

On Thu, Jul 24, 2025 at 1:23 PM Dev Jain <dev.jain@arm.com> wrote:
>
> From: David Hildenbrand <david@redhat.com>
>
> Let's add variants to be used where "full" does not apply -- which will
> be the majority of cases in the future. "full" really only applies if
> we are about to tear down a full MM.
>
> Use get_and_clear_ptes() in existing code, clear_ptes() users will
> be added next.
>
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Dev Jain <dev.jain@arm.com>

Reviewed-by: Barry Song <baohua@kernel.org>

Thanks
Barry