[PATCH RFC 10/14] mm/memory: factor out common code from vm_normal_page_*()

Posted by David Hildenbrand 3 months, 3 weeks ago
Let's reduce the code duplication and factor out the non-pte/pmd related
magic into vm_normal_page_pfn().

To keep it simpler, check the pfn against both zero folios. We could
optimize this, but as it's only for the !CONFIG_ARCH_HAS_PTE_SPECIAL
case, it's not a compelling micro-optimization.

With CONFIG_ARCH_HAS_PTE_SPECIAL we don't have to check anything else,
really.

It's an open question whether we can even hit the !CONFIG_ARCH_HAS_PTE_SPECIAL
scenario in the PMD case in practice, but it doesn't really matter: the
handling is now all unified in vm_normal_page_pfn().

While at it, add a check that pmd_special() is really only set where we
would expect it.

No functional change intended.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory.c | 104 +++++++++++++++++++++++-----------------------------
 1 file changed, 46 insertions(+), 58 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index b6c069f4ad11f..3d3fa01cd217e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -536,6 +536,46 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
 	add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
 }
 
+/* Called only if the page table entry is not marked special. */
+static inline struct page *vm_normal_page_pfn(struct vm_area_struct *vma,
+		unsigned long addr, unsigned long pfn)
+{
+	/*
+	 * With CONFIG_ARCH_HAS_PTE_SPECIAL, any special page table mappings
+	 * (incl. shared zero folios) are marked accordingly.
+	 */
+	if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL))
+		goto normal_page;
+
+	if (unlikely(vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))) {
+		if (vma->vm_flags & VM_MIXEDMAP) {
+			/* If it has a "struct page", it's "normal". */
+			if (!pfn_valid(pfn))
+				return NULL;
+		} else {
+			unsigned long off = (addr - vma->vm_start) >> PAGE_SHIFT;
+
+			/* Only CoW'ed anon folios are "normal". */
+			if (pfn == vma->vm_pgoff + off)
+				return NULL;
+			if (!is_cow_mapping(vma->vm_flags))
+				return NULL;
+		}
+	}
+
+	if (is_zero_pfn(pfn) || is_huge_zero_pfn(pfn))
+		return NULL;
+
+normal_page:
+	/*
+	 * NOTE! We still have PageReserved() pages in the page tables.
+	 * For example, VDSO mappings can cause them to exist.
+	 */
+	VM_WARN_ON_ONCE(!pfn_valid(pfn));
+	VM_WARN_ON_ONCE(is_zero_pfn(pfn) || is_huge_zero_pfn(pfn));
+	return pfn_to_page(pfn);
+}
+
 /*
  * vm_normal_page -- This function gets the "struct page" associated with a pte.
  *
@@ -591,9 +631,7 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 {
 	unsigned long pfn = pte_pfn(pte);
 
-	if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) {
-		if (likely(!pte_special(pte)))
-			goto out;
+	if (unlikely(pte_special(pte))) {
 		if (vma->vm_ops && vma->vm_ops->find_special_page)
 			return vma->vm_ops->find_special_page(vma, addr);
 		if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
@@ -604,34 +642,7 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 		print_bad_pte(vma, addr, pte, NULL);
 		return NULL;
 	}
-
-	/* !CONFIG_ARCH_HAS_PTE_SPECIAL case follows: */
-
-	if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
-		if (vma->vm_flags & VM_MIXEDMAP) {
-			if (!pfn_valid(pfn))
-				return NULL;
-		} else {
-			unsigned long off;
-			off = (addr - vma->vm_start) >> PAGE_SHIFT;
-			if (pfn == vma->vm_pgoff + off)
-				return NULL;
-			if (!is_cow_mapping(vma->vm_flags))
-				return NULL;
-		}
-	}
-
-	if (is_zero_pfn(pfn))
-		return NULL;
-
-	/*
-	 * NOTE! We still have PageReserved() pages in the page tables.
-	 * eg. VDSO mappings can cause them to exist.
-	 */
-out:
-	VM_WARN_ON_ONCE(!pfn_valid(pfn));
-	VM_WARN_ON_ONCE(is_zero_pfn(pfn));
-	return pfn_to_page(pfn);
+	return vm_normal_page_pfn(vma, addr, pfn);
 }
 
 struct folio *vm_normal_folio(struct vm_area_struct *vma, unsigned long addr,
@@ -650,35 +661,12 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
 {
 	unsigned long pfn = pmd_pfn(pmd);
 
-	/* Currently it's only used for huge pfnmaps */
-	if (unlikely(pmd_special(pmd)))
+	if (unlikely(pmd_special(pmd))) {
+		VM_WARN_ON_ONCE(!(vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)) &&
+				!is_huge_zero_pfn(pfn));
 		return NULL;
-
-	if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
-		if (vma->vm_flags & VM_MIXEDMAP) {
-			if (!pfn_valid(pfn))
-				return NULL;
-			goto out;
-		} else {
-			unsigned long off;
-			off = (addr - vma->vm_start) >> PAGE_SHIFT;
-			if (pfn == vma->vm_pgoff + off)
-				return NULL;
-			if (!is_cow_mapping(vma->vm_flags))
-				return NULL;
-		}
 	}
-
-	if (is_huge_zero_pfn(pfn))
-		return NULL;
-
-	/*
-	 * NOTE! We still have PageReserved() pages in the page tables.
-	 * eg. VDSO mappings can cause them to exist.
-	 */
-out:
-	VM_WARN_ON_ONCE(!pfn_valid(pfn));
-	return pfn_to_page(pfn);
+	return vm_normal_page_pfn(vma, addr, pfn);
 }
 
 struct folio *vm_normal_folio_pmd(struct vm_area_struct *vma,
-- 
2.49.0
Re: [PATCH RFC 10/14] mm/memory: factor out common code from vm_normal_page_*()
Posted by Oscar Salvador 3 months, 2 weeks ago
On Tue, Jun 17, 2025 at 05:43:41PM +0200, David Hildenbrand wrote:
> Let's reduce the code duplication and factor out the non-pte/pmd related
> magic into vm_normal_page_pfn().
> 
> To keep it simpler, check the pfn against both zero folios. We could
> optimize this, but as it's only for the !CONFIG_ARCH_HAS_PTE_SPECIAL
> case, it's not a compelling micro-optimization.
> 
> With CONFIG_ARCH_HAS_PTE_SPECIAL we don't have to check anything else,
> really.
> 
> It's an open question whether we can even hit the !CONFIG_ARCH_HAS_PTE_SPECIAL
> scenario in the PMD case in practice, but it doesn't really matter: the
> handling is now all unified in vm_normal_page_pfn().
> 
> While at it, add a check that pmd_special() is really only set where we
> would expect it.
> 
> No functional change intended.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Oscar Salvador <osalvador@suse.de>

Comment below

>  struct folio *vm_normal_folio(struct vm_area_struct *vma, unsigned long addr,
> @@ -650,35 +661,12 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
>  {
>  	unsigned long pfn = pmd_pfn(pmd);
>  
> -	/* Currently it's only used for huge pfnmaps */

Although the check kind of spells it out, we could keep this comment and also
mention huge_zero_pfn, to make it more explicit.
 

-- 
Oscar Salvador
SUSE Labs
Re: [PATCH RFC 10/14] mm/memory: factor out common code from vm_normal_page_*()
Posted by David Hildenbrand 3 months, 2 weeks ago
On 25.06.25 10:53, Oscar Salvador wrote:
> On Tue, Jun 17, 2025 at 05:43:41PM +0200, David Hildenbrand wrote:
>> Let's reduce the code duplication and factor out the non-pte/pmd related
>> magic into vm_normal_page_pfn().
>>
>> To keep it simpler, check the pfn against both zero folios. We could
>> optimize this, but as it's only for the !CONFIG_ARCH_HAS_PTE_SPECIAL
>> case, it's not a compelling micro-optimization.
>>
>> With CONFIG_ARCH_HAS_PTE_SPECIAL we don't have to check anything else,
>> really.
>>
>> It's an open question whether we can even hit the !CONFIG_ARCH_HAS_PTE_SPECIAL
>> scenario in the PMD case in practice, but it doesn't really matter: the
>> handling is now all unified in vm_normal_page_pfn().
>>
>> While at it, add a check that pmd_special() is really only set where we
>> would expect it.
>>
>> No functional change intended.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
> 
> Reviewed-by: Oscar Salvador <osalvador@suse.de>
> 
> Comment below
> 
>>   struct folio *vm_normal_folio(struct vm_area_struct *vma, unsigned long addr,
>> @@ -650,35 +661,12 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
>>   {
>>   	unsigned long pfn = pmd_pfn(pmd);
>>   
>> -	/* Currently it's only used for huge pfnmaps */
> 
> Although the check kind of spells it out, we could keep this comment and also
> mention huge_zero_pfn, to make it more explicit.

I don't think that comment is required anymore -- we do exactly what 
vm_normal_page() does and documents.

What the current users are is not particularly important anymore.

Or why do you think it would still be important?

-- 
Cheers,

David / dhildenb
Re: [PATCH RFC 10/14] mm/memory: factor out common code from vm_normal_page_*()
Posted by Oscar Salvador 3 months, 2 weeks ago
On Wed, Jun 25, 2025 at 10:57:39AM +0200, David Hildenbrand wrote:
> I don't think that comment is required anymore -- we do exactly what
> vm_normal_page() does and documents.
> 
> What the current users are is not particularly important anymore.
> 
> Or why do you think it would still be important?

Maybe the current users are not important, but at least a comment directing 
readers to vm_normal_page(), like "See comment in vm_normal_page()", would help.
Here, and in vm_normal_page_pud().

Just so it's clear to someone why we're only checking for X and Y when we find a
special pte/pmd/pud.

Not a strong opinion here, though; I just think it might be helpful.


-- 
Oscar Salvador
SUSE Labs
Re: [PATCH RFC 10/14] mm/memory: factor out common code from vm_normal_page_*()
Posted by David Hildenbrand 3 months, 2 weeks ago
On 25.06.25 11:20, Oscar Salvador wrote:
> On Wed, Jun 25, 2025 at 10:57:39AM +0200, David Hildenbrand wrote:
>> I don't think that comment is required anymore -- we do exactly what
>> vm_normal_page() does + documents,
>>
>> What the current users are is not particularly important anymore.
>>
>> Or why do you think it would still be important?
> 
> Maybe the current users are not important, but at least a comment directing
> readers to vm_normal_page(), like "See comment in vm_normal_page()", would help.
> Here, and in vm_normal_page_pud().
> 
> Just so it's clear to someone why we're only checking for X and Y when we find a
> special pte/pmd/pud.
> 
> Not a strong opinion here, though; I just think it might be helpful.

I was already debating with myself whether to add full kerneldoc for 
these functions ... but yeah, to me the link to vm_normal_page() is 
obvious; we can just spell it out as "see vm_normal_page()".

-- 
Cheers,

David / dhildenb