Let's clean up a bit:
(1) No need for start_ptep vs. ptep anymore, we can simply use ptep
(2) Let's switch to "unsigned int" for everything
(3) We can simplify the code by leaving the pte unchanged after the
pte_same() check.
(4) Clarify that we should never exceed a single VMA; it indicates a
problem in the caller.
No functional change intended.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
mm/internal.h | 37 +++++++++++++++----------------------
1 file changed, 15 insertions(+), 22 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index 9690c75063881..ca6590c6d9eab 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -221,7 +221,7 @@ static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags)
* folio_pte_batch - detect a PTE batch for a large folio
* @folio: The large folio to detect a PTE batch for.
* @addr: The user virtual address the first page is mapped at.
- * @start_ptep: Page table pointer for the first entry.
+ * @ptep: Page table pointer for the first entry.
* @pte: Page table entry for the first page.
* @max_nr: The maximum number of table entries to consider.
* @flags: Flags to modify the PTE batch semantics.
@@ -233,24 +233,24 @@ static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags)
* first one is dirty.
*
* Detect a PTE batch: consecutive (present) PTEs that map consecutive
- * pages of the same large folio.
+ * pages of the same large folio in a single VMA and a single page table.
*
* All PTEs inside a PTE batch have the same PTE bits set, excluding the PFN,
* the accessed bit, writable bit, dirty bit (unless FPB_HONOR_DIRTY is set) and
* soft-dirty bit (unless FPB_HONOR_SOFT_DIRTY is set).
*
- * start_ptep must map any page of the folio. max_nr must be at least one and
- * must be limited by the caller so scanning cannot exceed a single page table.
+ * @ptep must map any page of the folio. max_nr must be at least one and
+ * must be limited by the caller so scanning cannot exceed a single VMA and
+ * a single page table.
*
* Return: the number of table entries in the batch.
*/
-static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
- pte_t *start_ptep, pte_t pte, int max_nr, fpb_t flags,
+static inline unsigned int folio_pte_batch(struct folio *folio, unsigned long addr,
+ pte_t *ptep, pte_t pte, unsigned int max_nr, fpb_t flags,
bool *any_writable, bool *any_young, bool *any_dirty)
{
- pte_t expected_pte, *ptep;
- bool writable, young, dirty;
- int nr, cur_nr;
+ unsigned int nr, cur_nr;
+ pte_t expected_pte;
if (any_writable)
*any_writable = false;
@@ -267,29 +267,22 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
max_nr = min_t(unsigned long, max_nr,
folio_pfn(folio) + folio_nr_pages(folio) - pte_pfn(pte));
- nr = pte_batch_hint(start_ptep, pte);
+ nr = pte_batch_hint(ptep, pte);
expected_pte = __pte_batch_clear_ignored(pte_advance_pfn(pte, nr), flags);
- ptep = start_ptep + nr;
+ ptep = ptep + nr;
while (nr < max_nr) {
pte = ptep_get(ptep);
- if (any_writable)
- writable = !!pte_write(pte);
- if (any_young)
- young = !!pte_young(pte);
- if (any_dirty)
- dirty = !!pte_dirty(pte);
- pte = __pte_batch_clear_ignored(pte, flags);
- if (!pte_same(pte, expected_pte))
+ if (!pte_same(__pte_batch_clear_ignored(pte, flags), expected_pte))
break;
if (any_writable)
- *any_writable |= writable;
+ *any_writable |= pte_write(pte);
if (any_young)
- *any_young |= young;
+ *any_young |= pte_young(pte);
if (any_dirty)
- *any_dirty |= dirty;
+ *any_dirty |= pte_dirty(pte);
cur_nr = pte_batch_hint(ptep, pte);
expected_pte = pte_advance_pfn(expected_pte, cur_nr);
--
2.49.0
On Fri, Jun 27, 2025 at 01:55:08PM +0200, David Hildenbrand wrote: > Let's clean up a bit: > > (1) No need for start_ptep vs. ptep anymore, we can simply use ptep > > (2) Let's switch to "unsigned int" for everything > > (3) We can simplify the code by leaving the pte unchanged after the > pte_same() check. > > (4) Clarify that we should never exceed a single VMA; it indicates a > problem in the caller. > > No functional change intended. > > Signed-off-by: David Hildenbrand <david@redhat.com> Hi David :-), I have to confess that I fell in the same trap as Lorenzo wrt. __pte_batch_clear_ignored changing the pte value. So I'm not sure if it would be nice to place a little comment in __pte_batch_clear_ignored claryfing that pte's value remains unchanged ? Reviewed-by: Oscar Salvador <osalvador@suse.de> -- Oscar Salvador SUSE Labs
On 02.07.25 10:42, Oscar Salvador wrote: > On Fri, Jun 27, 2025 at 01:55:08PM +0200, David Hildenbrand wrote: >> Let's clean up a bit: >> >> (1) No need for start_ptep vs. ptep anymore, we can simply use ptep >> >> (2) Let's switch to "unsigned int" for everything >> >> (3) We can simplify the code by leaving the pte unchanged after the >> pte_same() check. >> >> (4) Clarify that we should never exceed a single VMA; it indicates a >> problem in the caller. >> >> No functional change intended. >> >> Signed-off-by: David Hildenbrand <david@redhat.com> > > Hi David :-), > > I have to confess that I fell in the same trap as Lorenzo wrt. > __pte_batch_clear_ignored changing the pte value. > So I'm not sure if it would be nice to place a little comment in > __pte_batch_clear_ignored claryfing that pte's value remains unchanged ? I mean, that's how all our pte modification functions work, really? :) Thanks! -- Cheers, David / dhildenb
On Wed, Jul 02, 2025 at 10:48:20AM +0200, David Hildenbrand wrote: > On 02.07.25 10:42, Oscar Salvador wrote: > > On Fri, Jun 27, 2025 at 01:55:08PM +0200, David Hildenbrand wrote: > > > Let's clean up a bit: > > > > > > (1) No need for start_ptep vs. ptep anymore, we can simply use ptep > > > > > > (2) Let's switch to "unsigned int" for everything > > > > > > (3) We can simplify the code by leaving the pte unchanged after the > > > pte_same() check. > > > > > > (4) Clarify that we should never exceed a single VMA; it indicates a > > > problem in the caller. > > > > > > No functional change intended. > > > > > > Signed-off-by: David Hildenbrand <david@redhat.com> > > > > Hi David :-), > > > > I have to confess that I fell in the same trap as Lorenzo wrt. > > __pte_batch_clear_ignored changing the pte value. > > So I'm not sure if it would be nice to place a little comment in > > __pte_batch_clear_ignored claryfing that pte's value remains unchanged ? > > I mean, that's how all our pte modification functions work, really? :) > > Thanks! I mean, it might be that me and Oscar are similarly 'challenged' in this respect :P (high 5 Oscar!) but I think the issue here is that it's sort of a compounded use, and in fact some functions do modify stuff, which is why we end up with all the ptep ptent etc. fun. Up to you re: comment, but I think maybe in cases where it's a reallly compounded set of stuff it's potentially useful. But obviously we still do do this all over the place elsewhere with no comment... > > -- > Cheers, > > David / dhildenb > >
On 02.07.25 10:51, Lorenzo Stoakes wrote: > On Wed, Jul 02, 2025 at 10:48:20AM +0200, David Hildenbrand wrote: >> On 02.07.25 10:42, Oscar Salvador wrote: >>> On Fri, Jun 27, 2025 at 01:55:08PM +0200, David Hildenbrand wrote: >>>> Let's clean up a bit: >>>> >>>> (1) No need for start_ptep vs. ptep anymore, we can simply use ptep >>>> >>>> (2) Let's switch to "unsigned int" for everything >>>> >>>> (3) We can simplify the code by leaving the pte unchanged after the >>>> pte_same() check. >>>> >>>> (4) Clarify that we should never exceed a single VMA; it indicates a >>>> problem in the caller. >>>> >>>> No functional change intended. >>>> >>>> Signed-off-by: David Hildenbrand <david@redhat.com> >>> >>> Hi David :-), >>> >>> I have to confess that I fell in the same trap as Lorenzo wrt. >>> __pte_batch_clear_ignored changing the pte value. >>> So I'm not sure if it would be nice to place a little comment in >>> __pte_batch_clear_ignored claryfing that pte's value remains unchanged ? >> >> I mean, that's how all our pte modification functions work, really? :) >> >> Thanks! > > I mean, it might be that me and Oscar are similarly 'challenged' in this > respect :P (high 5 Oscar!) but I think the issue here is that it's sort of > a compounded use, and in fact some functions do modify stuff, which is why > we end up with all the ptep ptent etc. fun. > > Up to you re: comment, but I think maybe in cases where it's a reallly > compounded set of stuff it's potentially useful. > > But obviously we still do do this all over the place elsewhere with no > comment... Well, if you are not passing in a *value* and not a pointer to a function, you would not expect for that *value* to change? :) Yes, once we pass pointers it's different. Or when we're using weird macros. Adding a comment that a function will not modify a value that is ... passed-by-value? Maybe it's just me that doesn't get why that should be particularly helpful :) -- Cheers, David / dhildenb
On Wed, Jul 02, 2025 at 11:00:48AM +0200, David Hildenbrand wrote: > On 02.07.25 10:51, Lorenzo Stoakes wrote: > > On Wed, Jul 02, 2025 at 10:48:20AM +0200, David Hildenbrand wrote: > > > On 02.07.25 10:42, Oscar Salvador wrote: > > > > On Fri, Jun 27, 2025 at 01:55:08PM +0200, David Hildenbrand wrote: > > > > > Let's clean up a bit: > > > > > > > > > > (1) No need for start_ptep vs. ptep anymore, we can simply use ptep > > > > > > > > > > (2) Let's switch to "unsigned int" for everything > > > > > > > > > > (3) We can simplify the code by leaving the pte unchanged after the > > > > > pte_same() check. > > > > > > > > > > (4) Clarify that we should never exceed a single VMA; it indicates a > > > > > problem in the caller. > > > > > > > > > > No functional change intended. > > > > > > > > > > Signed-off-by: David Hildenbrand <david@redhat.com> > > > > > > > > Hi David :-), > > > > > > > > I have to confess that I fell in the same trap as Lorenzo wrt. > > > > __pte_batch_clear_ignored changing the pte value. > > > > So I'm not sure if it would be nice to place a little comment in > > > > __pte_batch_clear_ignored claryfing that pte's value remains unchanged ? > > > > > > I mean, that's how all our pte modification functions work, really? :) > > > > > > Thanks! > > > > I mean, it might be that me and Oscar are similarly 'challenged' in this > > respect :P (high 5 Oscar!) but I think the issue here is that it's sort of > > a compounded use, and in fact some functions do modify stuff, which is why > > we end up with all the ptep ptent etc. fun. > > > > Up to you re: comment, but I think maybe in cases where it's a reallly > > compounded set of stuff it's potentially useful. > > > > But obviously we still do do this all over the place elsewhere with no > > comment... > > Well, if you are not passing in a *value* and not a pointer to a function, > you would not expect for that *value* to change? :) > > Yes, once we pass pointers it's different. Or when we're using weird macros. > > Adding a comment that a function will not modify a value that is ... > passed-by-value? Maybe it's just me that doesn't get why that should be > particularly helpful :) I think the issue is that we've passed around 'pte' as value and pointer (and of course, via macros...) previously so that's the cause of the confusion, often. This is why I really am a fan of us consistently saying ptep when passing a pointer. Anyway, I think on balance a comment isn't useful here, agreed! ;) > > -- > Cheers, > > David / dhildenb >
On 02.07.25 11:08, Lorenzo Stoakes wrote: > On Wed, Jul 02, 2025 at 11:00:48AM +0200, David Hildenbrand wrote: >> On 02.07.25 10:51, Lorenzo Stoakes wrote: >>> On Wed, Jul 02, 2025 at 10:48:20AM +0200, David Hildenbrand wrote: >>>> On 02.07.25 10:42, Oscar Salvador wrote: >>>>> On Fri, Jun 27, 2025 at 01:55:08PM +0200, David Hildenbrand wrote: >>>>>> Let's clean up a bit: >>>>>> >>>>>> (1) No need for start_ptep vs. ptep anymore, we can simply use ptep >>>>>> >>>>>> (2) Let's switch to "unsigned int" for everything >>>>>> >>>>>> (3) We can simplify the code by leaving the pte unchanged after the >>>>>> pte_same() check. >>>>>> >>>>>> (4) Clarify that we should never exceed a single VMA; it indicates a >>>>>> problem in the caller. >>>>>> >>>>>> No functional change intended. >>>>>> >>>>>> Signed-off-by: David Hildenbrand <david@redhat.com> >>>>> >>>>> Hi David :-), >>>>> >>>>> I have to confess that I fell in the same trap as Lorenzo wrt. >>>>> __pte_batch_clear_ignored changing the pte value. >>>>> So I'm not sure if it would be nice to place a little comment in >>>>> __pte_batch_clear_ignored claryfing that pte's value remains unchanged ? >>>> >>>> I mean, that's how all our pte modification functions work, really? :) >>>> >>>> Thanks! >>> >>> I mean, it might be that me and Oscar are similarly 'challenged' in this >>> respect :P (high 5 Oscar!) but I think the issue here is that it's sort of >>> a compounded use, and in fact some functions do modify stuff, which is why >>> we end up with all the ptep ptent etc. fun. >>> >>> Up to you re: comment, but I think maybe in cases where it's a reallly >>> compounded set of stuff it's potentially useful. >>> >>> But obviously we still do do this all over the place elsewhere with no >>> comment... >> >> Well, if you are not passing in a *value* and not a pointer to a function, >> you would not expect for that *value* to change? :) >> >> Yes, once we pass pointers it's different. Or when we're using weird macros. >> >> Adding a comment that a function will not modify a value that is ... >> passed-by-value? Maybe it's just me that doesn't get why that should be >> particularly helpful :) > > I think the issue is that we've passed around 'pte' as value and pointer (and of > course, via macros...) previously so that's the cause of the confusion, often. > > This is why I really am a fan of us consistently saying ptep when passing a > pointer. 100%: pte for pointers is absolutely nasty. -- Cheers, David / dhildenb
On 27 Jun 2025, at 7:55, David Hildenbrand wrote: > Let's clean up a bit: > > (1) No need for start_ptep vs. ptep anymore, we can simply use ptep > > (2) Let's switch to "unsigned int" for everything > > (3) We can simplify the code by leaving the pte unchanged after the > pte_same() check. > > (4) Clarify that we should never exceed a single VMA; it indicates a > problem in the caller. > > No functional change intended. > > Signed-off-by: David Hildenbrand <david@redhat.com> > --- > mm/internal.h | 37 +++++++++++++++---------------------- > 1 file changed, 15 insertions(+), 22 deletions(-) > LGTM. Reviewed-by: Zi Yan <ziy@nvidia.com> Best Regards, Yan, Zi
On Fri, Jun 27, 2025 at 01:55:08PM +0200, David Hildenbrand wrote: > Let's clean up a bit: > > (1) No need for start_ptep vs. ptep anymore, we can simply use ptep > > (2) Let's switch to "unsigned int" for everything > > (3) We can simplify the code by leaving the pte unchanged after the > pte_same() check. > > (4) Clarify that we should never exceed a single VMA; it indicates a > problem in the caller. > > No functional change intended. > > Signed-off-by: David Hildenbrand <david@redhat.com> > --- > mm/internal.h | 37 +++++++++++++++---------------------- > 1 file changed, 15 insertions(+), 22 deletions(-) > > diff --git a/mm/internal.h b/mm/internal.h > index 9690c75063881..ca6590c6d9eab 100644 > --- a/mm/internal.h > +++ b/mm/internal.h > @@ -221,7 +221,7 @@ static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags) > * folio_pte_batch - detect a PTE batch for a large folio > * @folio: The large folio to detect a PTE batch for. > * @addr: The user virtual address the first page is mapped at. > - * @start_ptep: Page table pointer for the first entry. > + * @ptep: Page table pointer for the first entry. > * @pte: Page table entry for the first page. > * @max_nr: The maximum number of table entries to consider. > * @flags: Flags to modify the PTE batch semantics. > @@ -233,24 +233,24 @@ static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags) > * first one is dirty. > * > * Detect a PTE batch: consecutive (present) PTEs that map consecutive > - * pages of the same large folio. > + * pages of the same large folio in a single VMA and a single page table. > * > * All PTEs inside a PTE batch have the same PTE bits set, excluding the PFN, > * the accessed bit, writable bit, dirty bit (unless FPB_HONOR_DIRTY is set) and > * soft-dirty bit (unless FPB_HONOR_SOFT_DIRTY is set). > * > - * start_ptep must map any page of the folio. max_nr must be at least one and > - * must be limited by the caller so scanning cannot exceed a single page table. > + * @ptep must map any page of the folio. max_nr must be at least one and > + * must be limited by the caller so scanning cannot exceed a single VMA and > + * a single page table. > * > * Return: the number of table entries in the batch. > */ > -static inline int folio_pte_batch(struct folio *folio, unsigned long addr, > - pte_t *start_ptep, pte_t pte, int max_nr, fpb_t flags, > +static inline unsigned int folio_pte_batch(struct folio *folio, unsigned long addr, Do we need to worry about propagating this type change? mremap_folio_pte_batch() and zap_present_ptes() return the value as an int for example. I mean I doubt we're going to be seeing an overflow here :) but maybe worth propagating this everywhere. > + pte_t *ptep, pte_t pte, unsigned int max_nr, fpb_t flags, > bool *any_writable, bool *any_young, bool *any_dirty) > { > - pte_t expected_pte, *ptep; > - bool writable, young, dirty; > - int nr, cur_nr; > + unsigned int nr, cur_nr; > + pte_t expected_pte; > > if (any_writable) > *any_writable = false; > @@ -267,29 +267,22 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr, > max_nr = min_t(unsigned long, max_nr, > folio_pfn(folio) + folio_nr_pages(folio) - pte_pfn(pte)); > > - nr = pte_batch_hint(start_ptep, pte); > + nr = pte_batch_hint(ptep, pte); > expected_pte = __pte_batch_clear_ignored(pte_advance_pfn(pte, nr), flags); > - ptep = start_ptep + nr; > + ptep = ptep + nr; > > while (nr < max_nr) { > pte = ptep_get(ptep); > - if (any_writable) > - writable = !!pte_write(pte); > - if (any_young) > - young = !!pte_young(pte); > - if (any_dirty) > - dirty = !!pte_dirty(pte); > - pte = __pte_batch_clear_ignored(pte, flags); > > - if (!pte_same(pte, expected_pte)) > + if (!pte_same(__pte_batch_clear_ignored(pte, flags), expected_pte)) Doing this here will change the output of any_writable, any_young: static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags) { ... return pte_wrprotect(pte_mkold(pte)); } So we probably need to get these values earlier? > break; > > if (any_writable) > - *any_writable |= writable; > + *any_writable |= pte_write(pte); > if (any_young) > - *any_young |= young; > + *any_young |= pte_young(pte); > if (any_dirty) > - *any_dirty |= dirty; > + *any_dirty |= pte_dirty(pte); > > cur_nr = pte_batch_hint(ptep, pte); > expected_pte = pte_advance_pfn(expected_pte, cur_nr); > -- > 2.49.0 >
On 27.06.25 18:51, Lorenzo Stoakes wrote: > On Fri, Jun 27, 2025 at 01:55:08PM +0200, David Hildenbrand wrote: >> Let's clean up a bit: >> >> (1) No need for start_ptep vs. ptep anymore, we can simply use ptep >> >> (2) Let's switch to "unsigned int" for everything >> >> (3) We can simplify the code by leaving the pte unchanged after the >> pte_same() check. >> >> (4) Clarify that we should never exceed a single VMA; it indicates a >> problem in the caller. >> >> No functional change intended. >> >> Signed-off-by: David Hildenbrand <david@redhat.com> >> --- >> mm/internal.h | 37 +++++++++++++++---------------------- >> 1 file changed, 15 insertions(+), 22 deletions(-) >> >> diff --git a/mm/internal.h b/mm/internal.h >> index 9690c75063881..ca6590c6d9eab 100644 >> --- a/mm/internal.h >> +++ b/mm/internal.h >> @@ -221,7 +221,7 @@ static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags) >> * folio_pte_batch - detect a PTE batch for a large folio >> * @folio: The large folio to detect a PTE batch for. >> * @addr: The user virtual address the first page is mapped at. >> - * @start_ptep: Page table pointer for the first entry. >> + * @ptep: Page table pointer for the first entry. >> * @pte: Page table entry for the first page. >> * @max_nr: The maximum number of table entries to consider. >> * @flags: Flags to modify the PTE batch semantics. >> @@ -233,24 +233,24 @@ static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags) >> * first one is dirty. >> * >> * Detect a PTE batch: consecutive (present) PTEs that map consecutive >> - * pages of the same large folio. >> + * pages of the same large folio in a single VMA and a single page table. >> * >> * All PTEs inside a PTE batch have the same PTE bits set, excluding the PFN, >> * the accessed bit, writable bit, dirty bit (unless FPB_HONOR_DIRTY is set) and >> * soft-dirty bit (unless FPB_HONOR_SOFT_DIRTY is set). >> * >> - * start_ptep must map any page of the folio. max_nr must be at least one and >> - * must be limited by the caller so scanning cannot exceed a single page table. >> + * @ptep must map any page of the folio. max_nr must be at least one and >> + * must be limited by the caller so scanning cannot exceed a single VMA and >> + * a single page table. >> * >> * Return: the number of table entries in the batch. >> */ >> -static inline int folio_pte_batch(struct folio *folio, unsigned long addr, >> - pte_t *start_ptep, pte_t pte, int max_nr, fpb_t flags, >> +static inline unsigned int folio_pte_batch(struct folio *folio, unsigned long addr, > > Do we need to worry about propagating this type change? > > mremap_folio_pte_batch() and zap_present_ptes() return the value as an int for example. > > I mean I doubt we're going to be seeing an overflow here :) but maybe worth > propagating this everywhere. Yeah, I'm planning on cleaning some of that stuff separately. As you say, this shouldn't cause harm. Really, it could only cause harm if someone would be doing "- folio_pte_batch()" and then assigning it to a "long". > >> + pte_t *ptep, pte_t pte, unsigned int max_nr, fpb_t flags, >> bool *any_writable, bool *any_young, bool *any_dirty) >> { >> - pte_t expected_pte, *ptep; >> - bool writable, young, dirty; >> - int nr, cur_nr; >> + unsigned int nr, cur_nr; >> + pte_t expected_pte; >> >> if (any_writable) >> *any_writable = false; >> @@ -267,29 +267,22 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr, >> max_nr = min_t(unsigned long, max_nr, >> folio_pfn(folio) + folio_nr_pages(folio) - pte_pfn(pte)); >> >> - nr = pte_batch_hint(start_ptep, pte); >> + nr = pte_batch_hint(ptep, pte); >> expected_pte = __pte_batch_clear_ignored(pte_advance_pfn(pte, nr), flags); >> - ptep = start_ptep + nr; >> + ptep = ptep + nr; >> >> while (nr < max_nr) { >> pte = ptep_get(ptep); >> - if (any_writable) >> - writable = !!pte_write(pte); >> - if (any_young) >> - young = !!pte_young(pte); >> - if (any_dirty) >> - dirty = !!pte_dirty(pte); >> - pte = __pte_batch_clear_ignored(pte, flags); >> >> - if (!pte_same(pte, expected_pte)) >> + if (!pte_same(__pte_batch_clear_ignored(pte, flags), expected_pte)) > > Doing this here will change the output of any_writable, any_young: > > static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags) > { > ... > return pte_wrprotect(pte_mkold(pte)); > } > > So we probably need to get these values earlier? Note that __pte_batch_clear_ignored() leaves the pte unchanged, and only returns the modified pte. (like pte_mkold() and friends). So what we read through ptep_get() is still what we have after the pte_same() check. -- Cheers, David / dhildenb
On Fri, Jun 27, 2025 at 07:02:16PM +0200, David Hildenbrand wrote: > On 27.06.25 18:51, Lorenzo Stoakes wrote: > > On Fri, Jun 27, 2025 at 01:55:08PM +0200, David Hildenbrand wrote: > > > Let's clean up a bit: > > > > > > (1) No need for start_ptep vs. ptep anymore, we can simply use ptep > > > > > > (2) Let's switch to "unsigned int" for everything > > > > > > (3) We can simplify the code by leaving the pte unchanged after the > > > pte_same() check. > > > > > > (4) Clarify that we should never exceed a single VMA; it indicates a > > > problem in the caller. > > > > > > No functional change intended. > > > > > > Signed-off-by: David Hildenbrand <david@redhat.com> Given below, LGTM: Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > > > -static inline int folio_pte_batch(struct folio *folio, unsigned long addr, > > > - pte_t *start_ptep, pte_t pte, int max_nr, fpb_t flags, > > > +static inline unsigned int folio_pte_batch(struct folio *folio, unsigned long addr, > > > > Do we need to worry about propagating this type change? > > > > mremap_folio_pte_batch() and zap_present_ptes() return the value as an int for example. > > > > I mean I doubt we're going to be seeing an overflow here :) but maybe worth > > propagating this everywhere. > > Yeah, I'm planning on cleaning some of that stuff separately. As you say, > this shouldn't cause harm. > > Really, it could only cause harm if someone would be doing > > "- folio_pte_batch()" and then assigning it to a "long". Right yeah. > > > > > > + pte_t *ptep, pte_t pte, unsigned int max_nr, fpb_t flags, > > > bool *any_writable, bool *any_young, bool *any_dirty) > > > { > > > - pte_t expected_pte, *ptep; > > > - bool writable, young, dirty; > > > - int nr, cur_nr; > > > + unsigned int nr, cur_nr; > > > + pte_t expected_pte; > > > > > > if (any_writable) > > > *any_writable = false; > > > @@ -267,29 +267,22 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr, > > > max_nr = min_t(unsigned long, max_nr, > > > folio_pfn(folio) + folio_nr_pages(folio) - pte_pfn(pte)); > > > > > > - nr = pte_batch_hint(start_ptep, pte); > > > + nr = pte_batch_hint(ptep, pte); > > > expected_pte = __pte_batch_clear_ignored(pte_advance_pfn(pte, nr), flags); > > > - ptep = start_ptep + nr; > > > + ptep = ptep + nr; > > > > > > while (nr < max_nr) { > > > pte = ptep_get(ptep); > > > - if (any_writable) > > > - writable = !!pte_write(pte); > > > - if (any_young) > > > - young = !!pte_young(pte); > > > - if (any_dirty) > > > - dirty = !!pte_dirty(pte); > > > - pte = __pte_batch_clear_ignored(pte, flags); > > > > > > - if (!pte_same(pte, expected_pte)) > > > + if (!pte_same(__pte_batch_clear_ignored(pte, flags), expected_pte)) > > > > Doing this here will change the output of any_writable, any_young: > > > > static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags) > > { > > ... > > return pte_wrprotect(pte_mkold(pte)); > > } > > > > So we probably need to get these values earlier? > > > Note that __pte_batch_clear_ignored() leaves the pte unchanged, and only > returns the modified pte. (like pte_mkold() and friends). > > So what we read through ptep_get() is still what we have after the > pte_same() check. Yeah you're right, sorry, these are just chaging the value that is returned for comparison against expected_pte. Then LGTM!
On Fri, Jun 27, 2025 at 7:55 PM David Hildenbrand <david@redhat.com> wrote: > > Let's clean up a bit: > > (1) No need for start_ptep vs. ptep anymore, we can simply use ptep > > (2) Let's switch to "unsigned int" for everything > > (3) We can simplify the code by leaving the pte unchanged after the > pte_same() check. > > (4) Clarify that we should never exceed a single VMA; it indicates a > problem in the caller. > > No functional change intended. Nice cleanup! Simplifying the loop and removing the temporary variables makes the code much easier to follow. Also, clarifying the caller's responsibility to stay within a single VMA and page table is a nice change ;) Feel free to add: Reviewed-by: Lance Yang <lance.yang@linux.dev> Thanks, Lance > > Signed-off-by: David Hildenbrand <david@redhat.com> > --- > mm/internal.h | 37 +++++++++++++++---------------------- > 1 file changed, 15 insertions(+), 22 deletions(-) > > diff --git a/mm/internal.h b/mm/internal.h > index 9690c75063881..ca6590c6d9eab 100644 > --- a/mm/internal.h > +++ b/mm/internal.h > @@ -221,7 +221,7 @@ static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags) > * folio_pte_batch - detect a PTE batch for a large folio > * @folio: The large folio to detect a PTE batch for. > * @addr: The user virtual address the first page is mapped at. > - * @start_ptep: Page table pointer for the first entry. > + * @ptep: Page table pointer for the first entry. > * @pte: Page table entry for the first page. > * @max_nr: The maximum number of table entries to consider. > * @flags: Flags to modify the PTE batch semantics. > @@ -233,24 +233,24 @@ static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags) > * first one is dirty. > * > * Detect a PTE batch: consecutive (present) PTEs that map consecutive > - * pages of the same large folio. > + * pages of the same large folio in a single VMA and a single page table. > * > * All PTEs inside a PTE batch have the same PTE bits set, excluding the PFN, > * the accessed bit, writable bit, dirty bit (unless FPB_HONOR_DIRTY is set) and > * soft-dirty bit (unless FPB_HONOR_SOFT_DIRTY is set). > * > - * start_ptep must map any page of the folio. max_nr must be at least one and > - * must be limited by the caller so scanning cannot exceed a single page table. > + * @ptep must map any page of the folio. max_nr must be at least one and > + * must be limited by the caller so scanning cannot exceed a single VMA and > + * a single page table. > * > * Return: the number of table entries in the batch. > */ > -static inline int folio_pte_batch(struct folio *folio, unsigned long addr, > - pte_t *start_ptep, pte_t pte, int max_nr, fpb_t flags, > +static inline unsigned int folio_pte_batch(struct folio *folio, unsigned long addr, > + pte_t *ptep, pte_t pte, unsigned int max_nr, fpb_t flags, > bool *any_writable, bool *any_young, bool *any_dirty) > { > - pte_t expected_pte, *ptep; > - bool writable, young, dirty; > - int nr, cur_nr; > + unsigned int nr, cur_nr; > + pte_t expected_pte; > > if (any_writable) > *any_writable = false; > @@ -267,29 +267,22 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr, > max_nr = min_t(unsigned long, max_nr, > folio_pfn(folio) + folio_nr_pages(folio) - pte_pfn(pte)); > > - nr = pte_batch_hint(start_ptep, pte); > + nr = pte_batch_hint(ptep, pte); > expected_pte = __pte_batch_clear_ignored(pte_advance_pfn(pte, nr), flags); > - ptep = start_ptep + nr; > + ptep = ptep + nr; > > while (nr < max_nr) { > pte = ptep_get(ptep); > - if (any_writable) > - writable = !!pte_write(pte); > - if (any_young) > - young = !!pte_young(pte); > - if (any_dirty) > - dirty = !!pte_dirty(pte); > - pte = __pte_batch_clear_ignored(pte, flags); > > - if (!pte_same(pte, expected_pte)) > + if (!pte_same(__pte_batch_clear_ignored(pte, flags), expected_pte)) > break; > > if (any_writable) > - *any_writable |= writable; > + *any_writable |= pte_write(pte); > if (any_young) > - *any_young |= young; > + *any_young |= pte_young(pte); > if (any_dirty) > - *any_dirty |= dirty; > + *any_dirty |= pte_dirty(pte); > > cur_nr = pte_batch_hint(ptep, pte); > expected_pte = pte_advance_pfn(expected_pte, cur_nr); > -- > 2.49.0 > >
© 2016 - 2025 Red Hat, Inc.