To use PTE batching, we want to determine whether the folio mapped by
the PTE is large, thus requiring the use of vm_normal_folio(). We want
to avoid the cost of vm_normal_folio() if the code path doesn't already
require the folio. For arm64, pte_batch_hint() does the job. To generalize
this hint, add a helper which will determine whether two consecutive PTEs
point to consecutive PFNs, in which case there is a high probability that
the underlying folio is large.
Signed-off-by: Dev Jain <dev.jain@arm.com>
---
include/linux/pgtable.h | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index b50447ef1c92..28e21fcc7837 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -369,6 +369,22 @@ static inline pgd_t pgdp_get(pgd_t *pgdp)
}
#endif
+/* Caller must ensure that ptep + 1 exists */
+static inline bool maybe_contiguous_pte_pfns(pte_t *ptep, pte_t pte)
+{
+ pte_t *next_ptep, next_pte;
+
+ if (pte_batch_hint(ptep, pte) != 1)
+ return true;
+
+ next_ptep = ptep + 1;
+ next_pte = ptep_get(next_ptep);
+ if (!pte_present(next_pte))
+ return false;
+
+ return unlikely(pte_pfn(next_pte) - pte_pfn(pte) == PAGE_SIZE);
+}
+
#ifndef __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
unsigned long address,
--
2.30.2
On 06.05.25 07:00, Dev Jain wrote:
> To use PTE batching, we want to determine whether the folio mapped by
> the PTE is large, thus requiring the use of vm_normal_folio(). We want
> to avoid the cost of vm_normal_folio() if the code path doesn't already
> require the folio. For arm64, pte_batch_hint() does the job. To generalize
> this hint, add a helper which will determine whether two consecutive PTEs
> point to consecutive PFNs, in which case there is a high probability that
> the underlying folio is large.
>
> Signed-off-by: Dev Jain <dev.jain@arm.com>
> ---
> include/linux/pgtable.h | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index b50447ef1c92..28e21fcc7837 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -369,6 +369,22 @@ static inline pgd_t pgdp_get(pgd_t *pgdp)
> }
> #endif
>
> +/* Caller must ensure that ptep + 1 exists */
> +static inline bool maybe_contiguous_pte_pfns(pte_t *ptep, pte_t pte)
> +{
> + pte_t *next_ptep, next_pte;
> +
> + if (pte_batch_hint(ptep, pte) != 1)
> + return true;
> +
> + next_ptep = ptep + 1;
> + next_pte = ptep_get(next_ptep);
> + if (!pte_present(next_pte))
> + return false;
> +
> + return unlikely(pte_pfn(next_pte) - pte_pfn(pte) == PAGE_SIZE);
> +}
So, where we want to use that is:
if (pte_present(old_pte)) {
if ((max_nr != 1) && maybe_contiguous_pte_pfns(old_ptep, old_pte)) {
struct folio *folio = vm_normal_folio(vma, old_addr, old_pte);
if (folio && folio_test_large(folio))
nr = folio_pte_batch(folio, old_addr, old_ptep,
old_pte, max_nr, fpb_flags, NULL, NULL, NULL);
}
}
where we won't need the folio later. But want it all part of the same folio?
And the simpler version would be
if (pte_present(old_pte)) {
if (max_nr != 1) {
struct folio *folio = vm_normal_folio(vma, old_addr, old_pte);
if (folio && folio_test_large(folio))
nr = folio_pte_batch(folio, old_addr, old_ptep,
old_pte, max_nr, fpb_flags, NULL, NULL, NULL);
}
}
Two things come to mind:
(1) Do we *really* care about the vm_normal_folio() + folio_test_large() call that much, that you
have to add this optimization ahead of times ? :)
(2) Do we really need "must be part of the same folio", or could be just batch over present
ptes that map consecutive PFNs? In that case, a helper that avoids folio_pte_batch() completely
might be better.
--
Cheers,
David / dhildenb
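[To make point (2) above concrete: a hypothetical helper along the following lines could batch present PTEs whose PFNs increase by one, without ever resolving the folio. This is an editorial sketch only — the name pfn_contig_pte_batch and its exact shape are assumptions, not part of any posted patch — and it deliberately ignores the writable/young/dirty bookkeeping that folio_pte_batch() performs.]

/*
 * Illustrative sketch only: count consecutive present PTEs that map
 * consecutive PFNs, starting at start_ptep. The caller must cap max_nr
 * so the scan cannot leave the current page table.
 */
static inline int pfn_contig_pte_batch(pte_t *start_ptep, pte_t pte, int max_nr)
{
	int nr = 1;

	while (nr < max_nr) {
		pte_t next = ptep_get(start_ptep + nr);

		/* Stop at the first hole or non-adjacent PFN. */
		if (!pte_present(next) || pte_pfn(next) != pte_pfn(pte) + nr)
			break;
		nr++;
	}
	return nr;
}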
On 07/05/25 3:33 pm, David Hildenbrand wrote:
> On 06.05.25 07:00, Dev Jain wrote:
>> To use PTE batching, we want to determine whether the folio mapped by
>> the PTE is large, thus requiring the use of vm_normal_folio(). We want
>> to avoid the cost of vm_normal_folio() if the code path doesn't already
>> require the folio. For arm64, pte_batch_hint() does the job. To
>> generalize
>> this hint, add a helper which will determine whether two consecutive PTEs
>> point to consecutive PFNs, in which case there is a high probability that
>> the underlying folio is large.
>>
>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>> ---
>> include/linux/pgtable.h | 16 ++++++++++++++++
>> 1 file changed, 16 insertions(+)
>>
>> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
>> index b50447ef1c92..28e21fcc7837 100644
>> --- a/include/linux/pgtable.h
>> +++ b/include/linux/pgtable.h
>> @@ -369,6 +369,22 @@ static inline pgd_t pgdp_get(pgd_t *pgdp)
>> }
>> #endif
>> +/* Caller must ensure that ptep + 1 exists */
>> +static inline bool maybe_contiguous_pte_pfns(pte_t *ptep, pte_t pte)
>> +{
>> + pte_t *next_ptep, next_pte;
>> +
>> + if (pte_batch_hint(ptep, pte) != 1)
>> + return true;
>> +
>> + next_ptep = ptep + 1;
>> + next_pte = ptep_get(next_ptep);
>> + if (!pte_present(next_pte))
>> + return false;
>> +
>> + return unlikely(pte_pfn(next_pte) - pte_pfn(pte) == PAGE_SIZE);
>> +}
>
> So, where we want to use that is:
>
> if (pte_present(old_pte)) {
> if ((max_nr != 1) && maybe_contiguous_pte_pfns(old_ptep, old_pte)) {
> struct folio *folio = vm_normal_folio(vma, old_addr, old_pte);
>
> if (folio && folio_test_large(folio))
> nr = folio_pte_batch(folio, old_addr, old_ptep,
> old_pte, max_nr, fpb_flags, NULL, NULL, NULL);
> }
> }
>
> where we won't need the folio later. But want it all part of the same
> folio?
>
>
> And the simpler version would be
>
>
> if (pte_present(old_pte)) {
> if (max_nr != 1) {
> struct folio *folio = vm_normal_folio(vma, old_addr, old_pte);
>
> if (folio && folio_test_large(folio))
> nr = folio_pte_batch(folio, old_addr, old_ptep,
> old_pte, max_nr, fpb_flags, NULL, NULL, NULL);
> }
> }
>
>
> Two things come to mind:
>
> (1) Do we *really* care about the vm_normal_folio() + folio_test_large()
> call that much, that you
> have to add this optimization ahead of times ? :)
For my mprotect series, I see a regression of almost (7.7 - 7.65)/7.7 =
0.65% for the small folio case. I am happy to remove this
micro-optimization if that is the preference.
>
> (2) Do we really need "must be part of the same folio", or could be just
> batch over present
> ptes that map consecutive PFNs? In that case, a helper that avoids
> folio_pte_batch() completely
> might be better.
>
I am not sure I get you here. folio_pte_batch() seems to be the simplest
thing we can do as being done around in the code elsewhere, I am not
aware of any alternate.
>> (2) Do we really need "must be part of the same folio", or could be just
>> batch over present
>> ptes that map consecutive PFNs? In that case, a helper that avoids
>> folio_pte_batch() completely
>> might be better.
>>
> I am not sure I get you here. folio_pte_batch() seems to be the simplest
> thing we can do as being done around in the code elsewhere, I am not
> aware of any alternate.
If we don't need the folio, then we can have a batching function that
doesn't require the folio.
Likely, we could even factor that (non-folio batching) out from folio_pte_batch().
The recent fix [1] might make that easier. See below.
So my question is: is something relying on all of these PTEs to point at the same folio?
[1] https://lkml.kernel.org/r/20250502215019.822-2-arkamar@atlas.cz
Something like this: (would need kerneldoc, probably remove "addr" parameter from folio_pte_batch(),
and look into other related cleanups as discussed with Andrew)
From f56f67ee5ae9879adb99a8da37fa7ec848c4d256 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@redhat.com>
Date: Thu, 8 May 2025 12:53:52 +0200
Subject: [PATCH] tmp
Signed-off-by: David Hildenbrand <david@redhat.com>
---
mm/internal.h | 84 ++++++++++++++++++++++++++++-----------------------
1 file changed, 46 insertions(+), 38 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index 25a29872c634b..53ff8f8a7c8f9 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -217,36 +217,8 @@ static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags)
return pte_wrprotect(pte_mkold(pte));
}
-/**
- * folio_pte_batch - detect a PTE batch for a large folio
- * @folio: The large folio to detect a PTE batch for.
- * @addr: The user virtual address the first page is mapped at.
- * @start_ptep: Page table pointer for the first entry.
- * @pte: Page table entry for the first page.
- * @max_nr: The maximum number of table entries to consider.
- * @flags: Flags to modify the PTE batch semantics.
- * @any_writable: Optional pointer to indicate whether any entry except the
- * first one is writable.
- * @any_young: Optional pointer to indicate whether any entry except the
- * first one is young.
- * @any_dirty: Optional pointer to indicate whether any entry except the
- * first one is dirty.
- *
- * Detect a PTE batch: consecutive (present) PTEs that map consecutive
- * pages of the same large folio.
- *
- * All PTEs inside a PTE batch have the same PTE bits set, excluding the PFN,
- * the accessed bit, writable bit, dirty bit (with FPB_IGNORE_DIRTY) and
- * soft-dirty bit (with FPB_IGNORE_SOFT_DIRTY).
- *
- * start_ptep must map any page of the folio. max_nr must be at least one and
- * must be limited by the caller so scanning cannot exceed a single page table.
- *
- * Return: the number of table entries in the batch.
- */
-static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
- pte_t *start_ptep, pte_t pte, int max_nr, fpb_t flags,
- bool *any_writable, bool *any_young, bool *any_dirty)
+static inline int pte_batch(pte_t *start_ptep, pte_t pte, int max_nr,
+ fpb_t flags, bool *any_writable, bool *any_young, bool *any_dirty)
{
pte_t expected_pte, *ptep;
bool writable, young, dirty;
@@ -259,14 +231,6 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
if (any_dirty)
*any_dirty = false;
- VM_WARN_ON_FOLIO(!pte_present(pte), folio);
- VM_WARN_ON_FOLIO(!folio_test_large(folio) || max_nr < 1, folio);
- VM_WARN_ON_FOLIO(page_folio(pfn_to_page(pte_pfn(pte))) != folio, folio);
-
- /* Limit max_nr to the actual remaining PFNs in the folio we could batch. */
- max_nr = min_t(unsigned long, max_nr,
- folio_pfn(folio) + folio_nr_pages(folio) - pte_pfn(pte));
-
nr = pte_batch_hint(start_ptep, pte);
expected_pte = __pte_batch_clear_ignored(pte_advance_pfn(pte, nr), flags);
ptep = start_ptep + nr;
@@ -300,6 +264,50 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
return min(nr, max_nr);
}
+/**
+ * folio_pte_batch - detect a PTE batch for a large folio
+ * @folio: The large folio to detect a PTE batch for.
+ * @addr: The user virtual address the first page is mapped at.
+ * @start_ptep: Page table pointer for the first entry.
+ * @pte: Page table entry for the first page.
+ * @max_nr: The maximum number of table entries to consider.
+ * @flags: Flags to modify the PTE batch semantics.
+ * @any_writable: Optional pointer to indicate whether any entry except the
+ * first one is writable.
+ * @any_young: Optional pointer to indicate whether any entry except the
+ * first one is young.
+ * @any_dirty: Optional pointer to indicate whether any entry except the
+ * first one is dirty.
+ *
+ * Detect a PTE batch: consecutive (present) PTEs that map consecutive
+ * pages of the same large folio.
+ *
+ * All PTEs inside a PTE batch have the same PTE bits set, excluding the PFN,
+ * the accessed bit, writable bit, dirty bit (with FPB_IGNORE_DIRTY) and
+ * soft-dirty bit (with FPB_IGNORE_SOFT_DIRTY).
+ *
+ * start_ptep must map any page of the folio. max_nr must be at least one and
+ * must be limited by the caller so scanning cannot exceed a single page table.
+ *
+ * Return: the number of table entries in the batch.
+ */
+static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
+ pte_t *start_ptep, pte_t pte, int max_nr, fpb_t flags,
+ bool *any_writable, bool *any_young, bool *any_dirty)
+{
+
+ VM_WARN_ON_FOLIO(!pte_present(pte), folio);
+ VM_WARN_ON_FOLIO(!folio_test_large(folio) || max_nr < 1, folio);
+ VM_WARN_ON_FOLIO(page_folio(pfn_to_page(pte_pfn(pte))) != folio, folio);
+
+ /* Limit max_nr to the actual remaining PFNs in the folio we could batch. */
+ max_nr = min_t(unsigned long, max_nr,
+ folio_pfn(folio) + folio_nr_pages(folio) - pte_pfn(pte));
+
+ return pte_batch(start_ptep, pte, max_nr, flags, any_writable, any_young,
+ any_dirty);
+}
+
/**
* pte_move_swp_offset - Move the swap entry offset field of a swap pte
* forward or backward by delta
--
2.49.0
--
Cheers,
David / dhildenb
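[Tying the refactoring back to the mremap use case: a caller that does not need the folio could, in principle, call the factored-out pte_batch() directly and skip vm_normal_folio() entirely. A hypothetical sketch, not from any posted patch; fpb_flags and the surrounding move_ptes() variables are assumed from the earlier snippet:]

	if (pte_present(old_pte) && max_nr != 1 &&
	    maybe_contiguous_pte_pfns(old_ptep, old_pte))
		nr = pte_batch(old_ptep, old_pte, max_nr, fpb_flags,
			       NULL, NULL, NULL);

[Whether that is actually safe hinges on David's question above: does anything rely on the whole batch belonging to one folio?]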
On 08/05/25 4:25 pm, David Hildenbrand wrote:
>
>>> (2) Do we really need "must be part of the same folio", or could be just
>>> batch over present
>>> ptes that map consecutive PFNs? In that case, a helper that avoids
>>> folio_pte_batch() completely
>>> might be better.
>>>
>> I am not sure I get you here. folio_pte_batch() seems to be the simplest
>> thing we can do as being done around in the code elsewhere, I am not
>> aware of any alternate.
>
> If we don't need the folio, then we can have a batching function that
> doesn't require the folio.
>
> Likely, we could even factor that (non-folio batching) out from
> folio_pte_batch().
> The recent fix [1] might make that easier. See below.
>
>
> So my question is: is something relying on all of these PTEs to point at
> the same folio?
Hmm...get_and_clear_full_ptes, as you say in another mail, will require
that...
>
> [1] https://lkml.kernel.org/r/20250502215019.822-2-arkamar@atlas.cz
>
>
> Something like this: (would need kerneldoc, probably remove "addr"
> parameter from folio_pte_batch(),
> and look into other related cleanups as discussed with Andrew)
I like this refactoring! Can you tell the commit hash on which you make
the patch, I cannot apply it.
So we need to collect/not collect a/d bits according to whether the pte
batch belongs to a large folio/small folios. Seems complicated :)
>
>
> From f56f67ee5ae9879adb99a8da37fa7ec848c4d256 Mon Sep 17 00:00:00 2001
> From: David Hildenbrand <david@redhat.com>
> Date: Thu, 8 May 2025 12:53:52 +0200
> Subject: [PATCH] tmp
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> mm/internal.h | 84 ++++++++++++++++++++++++++++-----------------------
> 1 file changed, 46 insertions(+), 38 deletions(-)
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 25a29872c634b..53ff8f8a7c8f9 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -217,36 +217,8 @@ static inline pte_t __pte_batch_clear_ignored(pte_t
> pte, fpb_t flags)
> return pte_wrprotect(pte_mkold(pte));
> }
>
> -/**
> - * folio_pte_batch - detect a PTE batch for a large folio
> - * @folio: The large folio to detect a PTE batch for.
> - * @addr: The user virtual address the first page is mapped at.
> - * @start_ptep: Page table pointer for the first entry.
> - * @pte: Page table entry for the first page.
> - * @max_nr: The maximum number of table entries to consider.
> - * @flags: Flags to modify the PTE batch semantics.
> - * @any_writable: Optional pointer to indicate whether any entry except
> the
> - * first one is writable.
> - * @any_young: Optional pointer to indicate whether any entry except the
> - * first one is young.
> - * @any_dirty: Optional pointer to indicate whether any entry except the
> - * first one is dirty.
> - *
> - * Detect a PTE batch: consecutive (present) PTEs that map consecutive
> - * pages of the same large folio.
> - *
> - * All PTEs inside a PTE batch have the same PTE bits set, excluding
> the PFN,
> - * the accessed bit, writable bit, dirty bit (with FPB_IGNORE_DIRTY) and
> - * soft-dirty bit (with FPB_IGNORE_SOFT_DIRTY).
> - *
> - * start_ptep must map any page of the folio. max_nr must be at least
> one and
> - * must be limited by the caller so scanning cannot exceed a single
> page table.
> - *
> - * Return: the number of table entries in the batch.
> - */
> -static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
> - pte_t *start_ptep, pte_t pte, int max_nr, fpb_t flags,
> - bool *any_writable, bool *any_young, bool *any_dirty)
> +static inline int pte_batch(pte_t *start_ptep, pte_t pte, int max_nr,
> + fpb_t flags, bool *any_writable, bool *any_young, bool *any_dirty)
> {
> pte_t expected_pte, *ptep;
> bool writable, young, dirty;
> @@ -259,14 +231,6 @@ static inline int folio_pte_batch(struct folio
> *folio, unsigned long addr,
> if (any_dirty)
> *any_dirty = false;
>
> - VM_WARN_ON_FOLIO(!pte_present(pte), folio);
> - VM_WARN_ON_FOLIO(!folio_test_large(folio) || max_nr < 1, folio);
> - VM_WARN_ON_FOLIO(page_folio(pfn_to_page(pte_pfn(pte))) != folio,
> folio);
> -
> - /* Limit max_nr to the actual remaining PFNs in the folio we could
> batch. */
> - max_nr = min_t(unsigned long, max_nr,
> - folio_pfn(folio) + folio_nr_pages(folio) - pte_pfn(pte));
> -
> nr = pte_batch_hint(start_ptep, pte);
> expected_pte = __pte_batch_clear_ignored(pte_advance_pfn(pte, nr),
> flags);
> ptep = start_ptep + nr;
> @@ -300,6 +264,50 @@ static inline int folio_pte_batch(struct folio
> *folio, unsigned long addr,
> return min(nr, max_nr);
> }
>
> +/**
> + * folio_pte_batch - detect a PTE batch for a large folio
> + * @folio: The large folio to detect a PTE batch for.
> + * @addr: The user virtual address the first page is mapped at.
> + * @start_ptep: Page table pointer for the first entry.
> + * @pte: Page table entry for the first page.
> + * @max_nr: The maximum number of table entries to consider.
> + * @flags: Flags to modify the PTE batch semantics.
> + * @any_writable: Optional pointer to indicate whether any entry except
> the
> + * first one is writable.
> + * @any_young: Optional pointer to indicate whether any entry except the
> + * first one is young.
> + * @any_dirty: Optional pointer to indicate whether any entry except the
> + * first one is dirty.
> + *
> + * Detect a PTE batch: consecutive (present) PTEs that map consecutive
> + * pages of the same large folio.
> + *
> + * All PTEs inside a PTE batch have the same PTE bits set, excluding
> the PFN,
> + * the accessed bit, writable bit, dirty bit (with FPB_IGNORE_DIRTY) and
> + * soft-dirty bit (with FPB_IGNORE_SOFT_DIRTY).
> + *
> + * start_ptep must map any page of the folio. max_nr must be at least
> one and
> + * must be limited by the caller so scanning cannot exceed a single
> page table.
> + *
> + * Return: the number of table entries in the batch.
> + */
> +static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
> + pte_t *start_ptep, pte_t pte, int max_nr, fpb_t flags,
> + bool *any_writable, bool *any_young, bool *any_dirty)
> +{
> +
> + VM_WARN_ON_FOLIO(!pte_present(pte), folio);
> + VM_WARN_ON_FOLIO(!folio_test_large(folio) || max_nr < 1, folio);
> + VM_WARN_ON_FOLIO(page_folio(pfn_to_page(pte_pfn(pte))) != folio,
> folio);
> +
> + /* Limit max_nr to the actual remaining PFNs in the folio we could
> batch. */
> + max_nr = min_t(unsigned long, max_nr,
> + folio_pfn(folio) + folio_nr_pages(folio) - pte_pfn(pte));
> +
> + return pte_batch(start_ptep, pte, max_nr, flags, any_writable,
> any_young,
> + any_dirty);
> +}
> +
> /**
> * pte_move_swp_offset - Move the swap entry offset field of a swap pte
> * forward or backward by delta
On 09.05.25 07:25, Dev Jain wrote:
> On 08/05/25 4:25 pm, David Hildenbrand wrote:
>>
>>>> (2) Do we really need "must be part of the same folio", or could be just
>>>> batch over present ptes that map consecutive PFNs? In that case, a helper
>>>> that avoids folio_pte_batch() completely might be better.
>>>>
>>> I am not sure I get you here. folio_pte_batch() seems to be the simplest
>>> thing we can do as being done around in the code elsewhere, I am not
>>> aware of any alternate.
>>
>> If we don't need the folio, then we can have a batching function that
>> doesn't require the folio.
>>
>> Likely, we could even factor that (non-folio batching) out from
>> folio_pte_batch(). The recent fix [1] might make that easier. See below.
>>
>> So my question is: is something relying on all of these PTEs to point at
>> the same folio?
>
> Hmm...get_and_clear_full_ptes, as you say in another mail, will require
> that...
>
>> [1] https://lkml.kernel.org/r/20250502215019.822-2-arkamar@atlas.cz
>>
>> Something like this: (would need kerneldoc, probably remove "addr"
>> parameter from folio_pte_batch(), and look into other related cleanups
>> as discussed with Andrew)
>
> I like this refactoring! Can you tell the commit hash on which you make
> the patch, I cannot apply it.

Oh, it was just on top of my private version of [1]. It should now be in
mm-new (or mm-unstable, did not check).

But as raised in my other mail, get_and_clear_full_ptes() might be
problematic across folios.

--
Cheers,
David / dhildenb
On Tue, May 06, 2025 at 10:30:55AM +0530, Dev Jain wrote:
> +	return unlikely(pte_pfn(next_pte) - pte_pfn(pte) == PAGE_SIZE);

umm ... PFN is in units of PAGE_SIZE.  Did you mean == 1?
On 06/05/25 9:16 pm, Matthew Wilcox wrote:
> On Tue, May 06, 2025 at 10:30:55AM +0530, Dev Jain wrote:
>> +	return unlikely(pte_pfn(next_pte) - pte_pfn(pte) == PAGE_SIZE);
>
> umm ... PFN is in units of PAGE_SIZE.  Did you mean == 1?

My bad, thanks for pointing out.
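[For reference, a sketch of the helper with that fix applied — PFNs of adjacent pages differ by 1, not by PAGE_SIZE. This corrected version is an editorial illustration, not a repost from the thread:]

/* Caller must ensure that ptep + 1 exists */
static inline bool maybe_contiguous_pte_pfns(pte_t *ptep, pte_t pte)
{
	pte_t *next_ptep = ptep + 1, next_pte;

	if (pte_batch_hint(ptep, pte) != 1)
		return true;

	next_pte = ptep_get(next_ptep);
	if (!pte_present(next_pte))
		return false;

	/* Consecutive PFNs: the next PTE maps the immediately following page. */
	return pte_pfn(next_pte) - pte_pfn(pte) == 1;
}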
On 5/6/25 10:30, Dev Jain wrote:
> To use PTE batching, we want to determine whether the folio mapped by
> the PTE is large, thus requiring the use of vm_normal_folio(). We want
> to avoid the cost of vm_normal_folio() if the code path doesn't already
> require the folio. For arm64, pte_batch_hint() does the job. To generalize
> this hint, add a helper which will determine whether two consecutive PTEs
> point to consecutive PFNs, in which case there is a high probability that
> the underlying folio is large.
>
> Signed-off-by: Dev Jain <dev.jain@arm.com>
> ---
> include/linux/pgtable.h | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index b50447ef1c92..28e21fcc7837 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -369,6 +369,22 @@ static inline pgd_t pgdp_get(pgd_t *pgdp)
> }
> #endif
>
> +/* Caller must ensure that ptep + 1 exists */
> +static inline bool maybe_contiguous_pte_pfns(pte_t *ptep, pte_t pte)
> +{
> + pte_t *next_ptep, next_pte;
> +
> + if (pte_batch_hint(ptep, pte) != 1)
> + return true;
> +
> + next_ptep = ptep + 1;
> + next_pte = ptep_get(next_ptep);
> + if (!pte_present(next_pte))
> + return false;
> +
> + return unlikely(pte_pfn(next_pte) - pte_pfn(pte) == PAGE_SIZE);
> +}
> +
> #ifndef __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
> static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
> unsigned long address,
As mentioned earlier, this new helper maybe_contiguous_pte_pfns() does not
have a proposed caller. Hence please do consider folding this forward with
the next patch where move_ptes() starts using the helper. Besides, it is
also difficult to review this patch without a proper caller context.
On Tue, May 06, 2025 at 02:40:44PM +0530, Anshuman Khandual wrote:
> On 5/6/25 10:30, Dev Jain wrote:
> > To use PTE batching, we want to determine whether the folio mapped by
> > the PTE is large, thus requiring the use of vm_normal_folio(). We want
> > to avoid the cost of vm_normal_folio() if the code path doesn't already
> > require the folio. For arm64, pte_batch_hint() does the job. To generalize
> > this hint, add a helper which will determine whether two consecutive PTEs
> > point to consecutive PFNs, in which case there is a high probability that
> > the underlying folio is large.
> >
> > Signed-off-by: Dev Jain <dev.jain@arm.com>
> > ---
> > include/linux/pgtable.h | 16 ++++++++++++++++
> > 1 file changed, 16 insertions(+)
> >
> > diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> > index b50447ef1c92..28e21fcc7837 100644
> > --- a/include/linux/pgtable.h
> > +++ b/include/linux/pgtable.h
> > @@ -369,6 +369,22 @@ static inline pgd_t pgdp_get(pgd_t *pgdp)
> > }
> > #endif
> >
> > +/* Caller must ensure that ptep + 1 exists */
> > +static inline bool maybe_contiguous_pte_pfns(pte_t *ptep, pte_t pte)
> > +{
> > + pte_t *next_ptep, next_pte;
> > +
> > + if (pte_batch_hint(ptep, pte) != 1)
> > + return true;
> > +
> > + next_ptep = ptep + 1;
> > + next_pte = ptep_get(next_ptep);
> > + if (!pte_present(next_pte))
> > + return false;
> > +
> > + return unlikely(pte_pfn(next_pte) - pte_pfn(pte) == PAGE_SIZE);
> > +}
> > +
> > #ifndef __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
> > static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
> > unsigned long address,
>
> As mentioned earlier, this new helper maybe_contiguous_pte_pfns() does not
> have a proposed caller. Hence please do consider folding this forward with
> the next patch where move_ptes() starts using the helper. Besides, it is
> also difficult to review this patch without a proper caller context.
Agreed, please combine the two.