mm: vm_normal_page*() + CoW PFNMAP improvements

[PATCH RFC 01/14] mm/memory: drop highest_memmap_pfn sanity check in vm_normal_page()

Posted by David Hildenbrand 7 months, 3 weeks ago

In 2009, we converted a VM_BUG_ON(!pfn_valid(pfn)) to the current
highest_memmap_pfn sanity check in commit 22b31eec63e5 ("badpage:
vm_normal_page use print_bad_pte"), because highest_memmap_pfn was
readily available.

Nowadays, this is the last remaining highest_memmap_pfn user, and this
sanity check is not really triggering ... frequently.

Let's convert it to VM_WARN_ON_ONCE(!pfn_valid(pfn)), so we can
simplify and get rid of highest_memmap_pfn. Checking for
pfn_to_online_page() might be even better, but it would not handle
ZONE_DEVICE properly.

Do the same in vm_normal_page_pmd(), where we don't even report a
problem at all ...

What might be better in the future is having a runtime option like
page-table-check to enable such checks dynamically on-demand. Something
for the future.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory.c | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 0163d127cece9..188b84ebf479a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -590,7 +590,7 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 
 	if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) {
 		if (likely(!pte_special(pte)))
-			goto check_pfn;
+			goto out;
 		if (vma->vm_ops && vma->vm_ops->find_special_page)
 			return vma->vm_ops->find_special_page(vma, addr);
 		if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
@@ -608,9 +608,6 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 		if (vma->vm_flags & VM_MIXEDMAP) {
 			if (!pfn_valid(pfn))
 				return NULL;
-			if (is_zero_pfn(pfn))
-				return NULL;
-			goto out;
 		} else {
 			unsigned long off;
 			off = (addr - vma->vm_start) >> PAGE_SHIFT;
@@ -624,17 +621,12 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 	if (is_zero_pfn(pfn))
 		return NULL;
 
-check_pfn:
-	if (unlikely(pfn > highest_memmap_pfn)) {
-		print_bad_pte(vma, addr, pte, NULL);
-		return NULL;
-	}
-
 	/*
 	 * NOTE! We still have PageReserved() pages in the page tables.
 	 * eg. VDSO mappings can cause them to exist.
 	 */
 out:
+	VM_WARN_ON_ONCE(!pfn_valid(pfn));
 	VM_WARN_ON_ONCE(is_zero_pfn(pfn));
 	return pfn_to_page(pfn);
 }
@@ -676,14 +668,13 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
 
 	if (is_huge_zero_pmd(pmd))
 		return NULL;
-	if (unlikely(pfn > highest_memmap_pfn))
-		return NULL;
 
 	/*
 	 * NOTE! We still have PageReserved() pages in the page tables.
 	 * eg. VDSO mappings can cause them to exist.
 	 */
 out:
+	VM_WARN_ON_ONCE(!pfn_valid(pfn));
 	return pfn_to_page(pfn);
 }
 
-- 
2.49.0

Re: [PATCH RFC 01/14] mm/memory: drop highest_memmap_pfn sanity check in vm_normal_page()

Posted by Lance Yang 7 months, 1 week ago

On Tue, Jun 17, 2025 at 11:44 PM David Hildenbrand <david@redhat.com> wrote:
>
> In 2009, we converted a VM_BUG_ON(!pfn_valid(pfn)) to the current
> highest_memmap_pfn sanity check in commit 22b31eec63e5 ("badpage:
> vm_normal_page use print_bad_pte"), because highest_memmap_pfn was
> readily available.
>
> Nowadays, this is the last remaining highest_memmap_pfn user, and this
> sanity check is not really triggering ... frequently.
>
> Let's convert it to VM_WARN_ON_ONCE(!pfn_valid(pfn)), so we can
> simplify and get rid of highest_memmap_pfn. Checking for
> pfn_to_online_page() might be even better, but it would not handle
> ZONE_DEVICE properly.
>
> Do the same in vm_normal_page_pmd(), where we don't even report a
> problem at all ...
>
> What might be better in the future is having a runtime option like
> page-table-check to enable such checks dynamically on-demand. Something
> for the future.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>

LGTM. Feel free to add:
Reviewed-by: Lance Yang <lance.yang@linux.dev>

Thanks,
Lance

Re: [PATCH RFC 01/14] mm/memory: drop highest_memmap_pfn sanity check in vm_normal_page()

Posted by Oscar Salvador 7 months, 2 weeks ago

On Tue, Jun 17, 2025 at 05:43:32PM +0200, David Hildenbrand wrote:
> In 2009, we converted a VM_BUG_ON(!pfn_valid(pfn)) to the current
> highest_memmap_pfn sanity check in commit 22b31eec63e5 ("badpage:
> vm_normal_page use print_bad_pte"), because highest_memmap_pfn was
> readily available.
> 
> Nowadays, this is the last remaining highest_memmap_pfn user, and this
> sanity check is not really triggering ... frequently.
> 
> Let's convert it to VM_WARN_ON_ONCE(!pfn_valid(pfn)), so we can
> simplify and get rid of highest_memmap_pfn. Checking for
> pfn_to_online_page() might be even better, but it would not handle
> ZONE_DEVICE properly.
> 
> Do the same in vm_normal_page_pmd(), where we don't even report a
> problem at all ...
> 
> What might be better in the future is having a runtime option like
> page-table-check to enable such checks dynamically on-demand. Something
> for the future.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Oscar Salvador <osalvador@suse.de>


-- 
Oscar Salvador
SUSE Labs

Re: [PATCH RFC 01/14] mm/memory: drop highest_memmap_pfn sanity check in vm_normal_page()

Posted by Oscar Salvador 7 months, 3 weeks ago

On Tue, Jun 17, 2025 at 05:43:32PM +0200, David Hildenbrand wrote:
> In 2009, we converted a VM_BUG_ON(!pfn_valid(pfn)) to the current
> highest_memmap_pfn sanity check in commit 22b31eec63e5 ("badpage:
> vm_normal_page use print_bad_pte"), because highest_memmap_pfn was
> readily available.
> 
> Nowadays, this is the last remaining highest_memmap_pfn user, and this
> sanity check is not really triggering ... frequently.
> 
> Let's convert it to VM_WARN_ON_ONCE(!pfn_valid(pfn)), so we can
> simplify and get rid of highest_memmap_pfn. Checking for
> pfn_to_online_page() might be even better, but it would not handle
> ZONE_DEVICE properly.
> 
> Do the same in vm_normal_page_pmd(), where we don't even report a
> problem at all ...
> 
> What might be better in the future is having a runtime option like
> page-table-check to enable such checks dynamically on-demand. Something
> for the future.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>

I'm confused, I'm missing something here.
Before this change we would return NULL if e.g: pfn > highest_memmap_pfn, but
now we just print the warning and call pfn_to_page() anyway.
AFAIK, pfn_to_page() doesn't return NULL?
 

-- 
Oscar Salvador
SUSE Labs

Re: [PATCH RFC 01/14] mm/memory: drop highest_memmap_pfn sanity check in vm_normal_page()

Posted by David Hildenbrand 7 months, 2 weeks ago

On 20.06.25 14:50, Oscar Salvador wrote:
> On Tue, Jun 17, 2025 at 05:43:32PM +0200, David Hildenbrand wrote:
>> In 2009, we converted a VM_BUG_ON(!pfn_valid(pfn)) to the current
>> highest_memmap_pfn sanity check in commit 22b31eec63e5 ("badpage:
>> vm_normal_page use print_bad_pte"), because highest_memmap_pfn was
>> readily available.
>>
>> Nowadays, this is the last remaining highest_memmap_pfn user, and this
>> sanity check is not really triggering ... frequently.
>>
>> Let's convert it to VM_WARN_ON_ONCE(!pfn_valid(pfn)), so we can
>> simplify and get rid of highest_memmap_pfn. Checking for
>> pfn_to_online_page() might be even better, but it would not handle
>> ZONE_DEVICE properly.
>>
>> Do the same in vm_normal_page_pmd(), where we don't even report a
>> problem at all ...
>>
>> What might be better in the future is having a runtime option like
>> page-table-check to enable such checks dynamically on-demand. Something
>> for the future.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
> 

Hi Oscar,

> I'm confused, I'm missing something here.
> Before this change we would return NULL if e.g: pfn > highest_memmap_pfn, but
> now we just print the warning and call pfn_to_page() anyway.
> AFAIK, pfn_to_page() doesn't return NULL?

You're missing that vm_normal_page_pmd() was created as a copy from 
vm_normal_page() [history of the sanity check above], but as we don't 
have (and shouldn't have ...) print_bad_pmd(), we made the code look 
like this would be something that can just happen.

"
Do the same in vm_normal_page_pmd(), where we don't even report a
problem at all ...
"

So we made something that should never happen a runtime sanity check 
without ever reporting a problem ...

-- 
Cheers,

David / dhildenb

Re: [PATCH RFC 01/14] mm/memory: drop highest_memmap_pfn sanity check in vm_normal_page()

Posted by Lance Yang 7 months, 1 week ago

On Mon, Jun 23, 2025 at 10:04 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 20.06.25 14:50, Oscar Salvador wrote:
> > On Tue, Jun 17, 2025 at 05:43:32PM +0200, David Hildenbrand wrote:
> >> In 2009, we converted a VM_BUG_ON(!pfn_valid(pfn)) to the current
> >> highest_memmap_pfn sanity check in commit 22b31eec63e5 ("badpage:
> >> vm_normal_page use print_bad_pte"), because highest_memmap_pfn was
> >> readily available.
> >>
> >> Nowadays, this is the last remaining highest_memmap_pfn user, and this
> >> sanity check is not really triggering ... frequently.
> >>
> >> Let's convert it to VM_WARN_ON_ONCE(!pfn_valid(pfn)), so we can
> >> simplify and get rid of highest_memmap_pfn. Checking for
> >> pfn_to_online_page() might be even better, but it would not handle
> >> ZONE_DEVICE properly.
> >>
> >> Do the same in vm_normal_page_pmd(), where we don't even report a
> >> problem at all ...
> >>
> >> What might be better in the future is having a runtime option like
> >> page-table-check to enable such checks dynamically on-demand. Something
> >> for the future.
> >>
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> >
>
> Hi Oscar,
>
> > I'm confused, I'm missing something here.
> > Before this change we would return NULL if e.g: pfn > highest_memmap_pfn, but
> > now we just print the warning and call pfn_to_page() anyway.
> > AFAIK, pfn_to_page() doesn't return NULL?
>
> You're missing that vm_normal_page_pmd() was created as a copy from
> vm_normal_page() [history of the sanity check above], but as we don't
> have (and shouldn't have ...) print_bad_pmd(), we made the code look
> like this would be something that can just happen.
>
> "
> Do the same in vm_normal_page_pmd(), where we don't even report a
> problem at all ...
> "
>
> So we made something that should never happen a runtime sanity check
> without ever reporting a problem ...

IIUC, the reasoning is that because this case should never happen, we can
change the behavior from returning NULL to a "warn and continue" model?

Thanks,
Lance

Re: [PATCH RFC 01/14] mm/memory: drop highest_memmap_pfn sanity check in vm_normal_page()

Posted by David Hildenbrand 7 months, 1 week ago

On 03.07.25 14:34, Lance Yang wrote:
> On Mon, Jun 23, 2025 at 10:04 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 20.06.25 14:50, Oscar Salvador wrote:
>>> On Tue, Jun 17, 2025 at 05:43:32PM +0200, David Hildenbrand wrote:
>>>> In 2009, we converted a VM_BUG_ON(!pfn_valid(pfn)) to the current
>>>> highest_memmap_pfn sanity check in commit 22b31eec63e5 ("badpage:
>>>> vm_normal_page use print_bad_pte"), because highest_memmap_pfn was
>>>> readily available.
>>>>
>>>> Nowadays, this is the last remaining highest_memmap_pfn user, and this
>>>> sanity check is not really triggering ... frequently.
>>>>
>>>> Let's convert it to VM_WARN_ON_ONCE(!pfn_valid(pfn)), so we can
>>>> simplify and get rid of highest_memmap_pfn. Checking for
>>>> pfn_to_online_page() might be even better, but it would not handle
>>>> ZONE_DEVICE properly.
>>>>
>>>> Do the same in vm_normal_page_pmd(), where we don't even report a
>>>> problem at all ...
>>>>
>>>> What might be better in the future is having a runtime option like
>>>> page-table-check to enable such checks dynamically on-demand. Something
>>>> for the future.
>>>>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>
>>
>> Hi Oscar,
>>
>>> I'm confused, I'm missing something here.
>>> Before this change we would return NULL if e.g: pfn > highest_memmap_pfn, but
>>> now we just print the warning and call pfn_to_page() anyway.
>>> AFAIK, pfn_to_page() doesn't return NULL?
>>
>> You're missing that vm_normal_page_pmd() was created as a copy from
>> vm_normal_page() [history of the sanity check above], but as we don't
>> have (and shouldn't have ...) print_bad_pmd(), we made the code look
>> like this would be something that can just happen.
>>
>> "
>> Do the same in vm_normal_page_pmd(), where we don't even report a
>> problem at all ...
>> "
>>
>> So we made something that should never happen a runtime sanity check
>> without ever reporting a problem ...
> 
> IIUC, the reasoning is that because this case should never happen, we can
> change the behavior from returning NULL to a "warn and continue" model?

Well, yes. Point is, that check should have never been copy-pasted that 
way, while dropping the actual warning :)

It's a sanity check for something that should never happen, turned into 
something that looks like it must be handled and would be valid to 
encounter.

-- 
Cheers,

David / dhildenb

Re: [PATCH RFC 01/14] mm/memory: drop highest_memmap_pfn sanity check in vm_normal_page()

Posted by Lance Yang 7 months, 1 week ago


On 2025/7/3 20:39, David Hildenbrand wrote:
> On 03.07.25 14:34, Lance Yang wrote:
>> On Mon, Jun 23, 2025 at 10:04 PM David Hildenbrand <david@redhat.com> 
>> wrote:
>>>
>>> On 20.06.25 14:50, Oscar Salvador wrote:
>>>> On Tue, Jun 17, 2025 at 05:43:32PM +0200, David Hildenbrand wrote:
>>>>> In 2009, we converted a VM_BUG_ON(!pfn_valid(pfn)) to the current
>>>>> highest_memmap_pfn sanity check in commit 22b31eec63e5 ("badpage:
>>>>> vm_normal_page use print_bad_pte"), because highest_memmap_pfn was
>>>>> readily available.
>>>>>
>>>>> Nowadays, this is the last remaining highest_memmap_pfn user, and this
>>>>> sanity check is not really triggering ... frequently.
>>>>>
>>>>> Let's convert it to VM_WARN_ON_ONCE(!pfn_valid(pfn)), so we can
>>>>> simplify and get rid of highest_memmap_pfn. Checking for
>>>>> pfn_to_online_page() might be even better, but it would not handle
>>>>> ZONE_DEVICE properly.
>>>>>
>>>>> Do the same in vm_normal_page_pmd(), where we don't even report a
>>>>> problem at all ...
>>>>>
>>>>> What might be better in the future is having a runtime option like
>>>>> page-table-check to enable such checks dynamically on-demand. 
>>>>> Something
>>>>> for the future.
>>>>>
>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>>
>>>
>>> Hi Oscar,
>>>
>>>> I'm confused, I'm missing something here.
>>>> Before this change we would return NULL if e.g: pfn > 
>>>> highest_memmap_pfn, but
>>>> now we just print the warning and call pfn_to_page() anyway.
>>>> AFAIK, pfn_to_page() doesn't return NULL?
>>>
>>> You're missing that vm_normal_page_pmd() was created as a copy from
>>> vm_normal_page() [history of the sanity check above], but as we don't
>>> have (and shouldn't have ...) print_bad_pmd(), we made the code look
>>> like this would be something that can just happen.
>>>
>>> "
>>> Do the same in vm_normal_page_pmd(), where we don't even report a
>>> problem at all ...
>>> "
>>>
>>> So we made something that should never happen a runtime sanity check
>>> without ever reporting a problem ...
>>
>> IIUC, the reasoning is that because this case should never happen, we can
>> change the behavior from returning NULL to a "warn and continue" model?
> 
> Well, yes. Point is, that check should have never been copy-pasted that 
> way, while dropping the actual warning :)

Ah, I see your point now. Thanks for clarifying!

> 
> It's a sanity check for something that should never happen, turned into 
> something that looks like it must be handled and would be valid to 
> encounter.

Yeah. Makes sense to me ;)

Re: [PATCH RFC 01/14] mm/memory: drop highest_memmap_pfn sanity check in vm_normal_page()

Posted by David Hildenbrand 7 months, 1 week ago

On 03.07.25 16:44, Lance Yang wrote:
> 
> 
> On 2025/7/3 20:39, David Hildenbrand wrote:
>> On 03.07.25 14:34, Lance Yang wrote:
>>> On Mon, Jun 23, 2025 at 10:04 PM David Hildenbrand <david@redhat.com>
>>> wrote:
>>>>
>>>> On 20.06.25 14:50, Oscar Salvador wrote:
>>>>> On Tue, Jun 17, 2025 at 05:43:32PM +0200, David Hildenbrand wrote:
>>>>>> In 2009, we converted a VM_BUG_ON(!pfn_valid(pfn)) to the current
>>>>>> highest_memmap_pfn sanity check in commit 22b31eec63e5 ("badpage:
>>>>>> vm_normal_page use print_bad_pte"), because highest_memmap_pfn was
>>>>>> readily available.
>>>>>>
>>>>>> Nowadays, this is the last remaining highest_memmap_pfn user, and this
>>>>>> sanity check is not really triggering ... frequently.
>>>>>>
>>>>>> Let's convert it to VM_WARN_ON_ONCE(!pfn_valid(pfn)), so we can
>>>>>> simplify and get rid of highest_memmap_pfn. Checking for
>>>>>> pfn_to_online_page() might be even better, but it would not handle
>>>>>> ZONE_DEVICE properly.
>>>>>>
>>>>>> Do the same in vm_normal_page_pmd(), where we don't even report a
>>>>>> problem at all ...
>>>>>>
>>>>>> What might be better in the future is having a runtime option like
>>>>>> page-table-check to enable such checks dynamically on-demand.
>>>>>> Something
>>>>>> for the future.
>>>>>>
>>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>>>
>>>>
>>>> Hi Oscar,
>>>>
>>>>> I'm confused, I'm missing something here.
>>>>> Before this change we would return NULL if e.g: pfn >
>>>>> highest_memmap_pfn, but
>>>>> now we just print the warning and call pfn_to_page() anyway.
>>>>> AFAIK, pfn_to_page() doesn't return NULL?
>>>>
>>>> You're missing that vm_normal_page_pmd() was created as a copy from
>>>> vm_normal_page() [history of the sanity check above], but as we don't
>>>> have (and shouldn't have ...) print_bad_pmd(), we made the code look
>>>> like this would be something that can just happen.
>>>>
>>>> "
>>>> Do the same in vm_normal_page_pmd(), where we don't even report a
>>>> problem at all ...
>>>> "
>>>>
>>>> So we made something that should never happen a runtime sanity check
>>>> without ever reporting a problem ...
>>>
>>> IIUC, the reasoning is that because this case should never happen, we can
>>> change the behavior from returning NULL to a "warn and continue" model?
>>
>> Well, yes. Point is, that check should have never been copy-pasted that
>> way, while dropping the actual warning :)
> 
> Ah, I see your point now. Thanks for clarifying!

I'll add some of that to the patch description. Thanks!

-- 
Cheers,

David / dhildenb

Re: [PATCH RFC 01/14] mm/memory: drop highest_memmap_pfn sanity check in vm_normal_page()

Posted by Hugh Dickins 7 months ago

On Fri, 4 Jul 2025, David Hildenbrand wrote:
> On 03.07.25 16:44, Lance Yang wrote:
> > On 2025/7/3 20:39, David Hildenbrand wrote:
> >> On 03.07.25 14:34, Lance Yang wrote:
> >>> On Mon, Jun 23, 2025 at 10:04 PM David Hildenbrand <david@redhat.com>
> >>> wrote:
> >>>>
> >>>> On 20.06.25 14:50, Oscar Salvador wrote:
> >>>>> On Tue, Jun 17, 2025 at 05:43:32PM +0200, David Hildenbrand wrote:
> >>>>>> In 2009, we converted a VM_BUG_ON(!pfn_valid(pfn)) to the current
> >>>>>> highest_memmap_pfn sanity check in commit 22b31eec63e5 ("badpage:
> >>>>>> vm_normal_page use print_bad_pte"), because highest_memmap_pfn was
> >>>>>> readily available.
> >>>>>>
> >>>>>> Nowadays, this is the last remaining highest_memmap_pfn user, and this
> >>>>>> sanity check is not really triggering ... frequently.
> >>>>>>
> >>>>>> Let's convert it to VM_WARN_ON_ONCE(!pfn_valid(pfn)), so we can
> >>>>>> simplify and get rid of highest_memmap_pfn. Checking for
> >>>>>> pfn_to_online_page() might be even better, but it would not handle
> >>>>>> ZONE_DEVICE properly.
> >>>>>>
> >>>>>> Do the same in vm_normal_page_pmd(), where we don't even report a
> >>>>>> problem at all ...
> >>>>>>
> >>>>>> What might be better in the future is having a runtime option like
> >>>>>> page-table-check to enable such checks dynamically on-demand.
> >>>>>> Something
> >>>>>> for the future.
> >>>>>>
> >>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>

The author of 22b31eec63e5 thinks this is not at all an improvement.
Of course the condition is not triggering frequently, of course it
should not happen: but it does happen, and it still seems worthwhile
to catch it in production with a "Bad page map" than to let it run on
to whatever kind of crash it hits instead.

Hugh

Re: [PATCH RFC 01/14] mm/memory: drop highest_memmap_pfn sanity check in vm_normal_page()

Posted by David Hildenbrand 7 months ago

On 07.07.25 08:31, Hugh Dickins wrote:
> On Fri, 4 Jul 2025, David Hildenbrand wrote:
>> On 03.07.25 16:44, Lance Yang wrote:
>>> On 2025/7/3 20:39, David Hildenbrand wrote:
>>>> On 03.07.25 14:34, Lance Yang wrote:
>>>>> On Mon, Jun 23, 2025 at 10:04 PM David Hildenbrand <david@redhat.com>
>>>>> wrote:
>>>>>>
>>>>>> On 20.06.25 14:50, Oscar Salvador wrote:
>>>>>>> On Tue, Jun 17, 2025 at 05:43:32PM +0200, David Hildenbrand wrote:
>>>>>>>> In 2009, we converted a VM_BUG_ON(!pfn_valid(pfn)) to the current
>>>>>>>> highest_memmap_pfn sanity check in commit 22b31eec63e5 ("badpage:
>>>>>>>> vm_normal_page use print_bad_pte"), because highest_memmap_pfn was
>>>>>>>> readily available.
>>>>>>>>
>>>>>>>> Nowadays, this is the last remaining highest_memmap_pfn user, and this
>>>>>>>> sanity check is not really triggering ... frequently.
>>>>>>>>
>>>>>>>> Let's convert it to VM_WARN_ON_ONCE(!pfn_valid(pfn)), so we can
>>>>>>>> simplify and get rid of highest_memmap_pfn. Checking for
>>>>>>>> pfn_to_online_page() might be even better, but it would not handle
>>>>>>>> ZONE_DEVICE properly.
>>>>>>>>
>>>>>>>> Do the same in vm_normal_page_pmd(), where we don't even report a
>>>>>>>> problem at all ...
>>>>>>>>
>>>>>>>> What might be better in the future is having a runtime option like
>>>>>>>> page-table-check to enable such checks dynamically on-demand.
>>>>>>>> Something
>>>>>>>> for the future.
>>>>>>>>
>>>>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
> 
> The author of 22b31eec63e5 thinks this is not at all an improvement.
> Of course the condition is not triggering frequently, of course it
> should not happen: but it does happen, and it still seems worthwhile
> to catch it in production with a "Bad page map" than to let it run on
> to whatever kind of crash it hits instead.

Well, obviously I don't agree and was waiting for having this discussion :)

We catch corruption in a handful of PTE bits, and that's about it. You 
neither detect corruption of flags nor of PFN bits that result in 
another valid PFN.

Corruption of the "special" bit might be fun.

When I was able to trigger this during development once, the whole 
machine went down shortly after -- mostly because of use-after-free of 
something that is now a page table, which is just bad for both users of 
such a page!

E.g., quit that process and we will happily clear the PTE, corrupting 
data of the other user. Fun.

I'm sure I could find a way to unify the code while printing some 
comparable message, but this check as it stands is just not worth it 
IMHO: trying to handle something gracefully that shouldn't happen, when 
really we cannot handle it gracefully.

-- 
Cheers,

David / dhildenb

Re: [PATCH RFC 01/14] mm/memory: drop highest_memmap_pfn sanity check in vm_normal_page()

Posted by Hugh Dickins 7 months ago

On Mon, 7 Jul 2025, David Hildenbrand wrote:
> On 07.07.25 08:31, Hugh Dickins wrote:
> > On Fri, 4 Jul 2025, David Hildenbrand wrote:
> >> On 03.07.25 16:44, Lance Yang wrote:
> >>> On 2025/7/3 20:39, David Hildenbrand wrote:
> >>>> On 03.07.25 14:34, Lance Yang wrote:
> >>>>> On Mon, Jun 23, 2025 at 10:04 PM David Hildenbrand <david@redhat.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> On 20.06.25 14:50, Oscar Salvador wrote:
> >>>>>>> On Tue, Jun 17, 2025 at 05:43:32PM +0200, David Hildenbrand wrote:
> >>>>>>>> In 2009, we converted a VM_BUG_ON(!pfn_valid(pfn)) to the current
> >>>>>>>> highest_memmap_pfn sanity check in commit 22b31eec63e5 ("badpage:
> >>>>>>>> vm_normal_page use print_bad_pte"), because highest_memmap_pfn was
> >>>>>>>> readily available.

highest_memmap_pfn was introduced by that commit for this purpose.

> >>>>>>>>
> >>>>>>>> Nowadays, this is the last remaining highest_memmap_pfn user, and
> >>>>>>>> this
> >>>>>>>> sanity check is not really triggering ... frequently.
> >>>>>>>>
> >>>>>>>> Let's convert it to VM_WARN_ON_ONCE(!pfn_valid(pfn)), so we can
> >>>>>>>> simplify and get rid of highest_memmap_pfn. Checking for
> >>>>>>>> pfn_to_online_page() might be even better, but it would not handle
> >>>>>>>> ZONE_DEVICE properly.
> >>>>>>>>
> >>>>>>>> Do the same in vm_normal_page_pmd(), where we don't even report a
> >>>>>>>> problem at all ...
> >>>>>>>>
> >>>>>>>> What might be better in the future is having a runtime option like
> >>>>>>>> page-table-check to enable such checks dynamically on-demand.
> >>>>>>>> Something
> >>>>>>>> for the future.
> >>>>>>>>
> >>>>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
> > 
> > The author of 22b31eec63e5 thinks this is not at all an improvement.
> > Of course the condition is not triggering frequently, of course it
> > should not happen: but it does happen, and it still seems worthwhile
> > to catch it in production with a "Bad page map" than to let it run on
> > to whatever kind of crash it hits instead.
> 
> Well, obviously I don't agree and was waiting for having this discussion :)
> 
> We catch corruption in a handful of PTE bits, and that's about it. You neither
> detect corruption of flags nor of PFN bits that result in another valid PFN.

Of course it's limited in what it can catch (and won't even get called
if the present bit was not set - a more complete patch might unify with
those various "Bad swap" messages). Of course. But it's still useful for
stopping pfn_to_page() veering off the end of the memmap[] (in some configs).
And it's still useful for printing out a series of "Bad page map" messages
when the page table is corrupted: from which a picture can sometimes be
built up (isolated instance may just be a bitflip; series of them can
sometimes show e.g. ascii text, occasionally helpful for debugging).

> 
> Corruption of the "special" bit might be fun.
> 
> When I was able to trigger this during development once, the whole machine
> went down shortly after -- mostly because of use-after-free of something that
> is now a page table, which is just bad for both users of such a page!
> 
> E.g., quit that process and we will happily clear the PTE, corrupting data of
> the other user. Fun.
> 
> I'm sure I could find a way to unify the code while printing some comparable
> message, but this check as it stands is just not worth it IMHO: trying to
> handle something gracefully that shouldn't happen, when really we cannot
> handle it gracefully.

So, you have experience of a time when it didn't help you. Okay. And we
have had experience of other times when it has helped, if only a little.
Like with other "Bad page"s: sometimes helpful, often not; but tending to
build up a big picture from repeated occurrences.

We continue to disagree. I can't argue more than append the 2.6.29
commit message, which seems to me as valid now as it was then.

From 22b31eec63e5f2e219a3ee15f456897272bc73e8 Mon Sep 17 00:00:00 2001
From: Hugh Dickins <hugh@veritas.com>
Date: Tue, 6 Jan 2009 14:40:09 -0800
Subject: [PATCH] badpage: vm_normal_page use print_bad_pte

print_bad_pte() is so far being called only when zap_pte_range() finds
negative page_mapcount, or there's a fault on a pte_file where it does not
belong.  That's weak coverage when we suspect pagetable corruption.

Originally, it was called when vm_normal_page() found an invalid pfn: but
pfn_valid is expensive on some architectures and configurations, so 2.6.24
put that under CONFIG_DEBUG_VM (which doesn't help in the field), then
2.6.26 replaced it by a VM_BUG_ON (likewise).

Reinstate the print_bad_pte() in vm_normal_page(), but use a cheaper test
than pfn_valid(): memmap_init_zone() (used in bootup and hotplug) keep a
__read_mostly note of the highest_memmap_pfn, vm_normal_page() then check
pfn against that.  We could call this pfn_plausible() or pfn_sane(), but I
doubt we'll need it elsewhere: of course it's not reliable, but gives much
stronger pagetable validation on many boxes.

Also use print_bad_pte() when the pte_special bit is found outside a
VM_PFNMAP or VM_MIXEDMAP area, instead of VM_BUG_ON.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Re: [PATCH RFC 01/14] mm/memory: drop highest_memmap_pfn sanity check in vm_normal_page()

Posted by David Hildenbrand 7 months ago

On 08.07.25 04:52, Hugh Dickins wrote:
> On Mon, 7 Jul 2025, David Hildenbrand wrote:
>> On 07.07.25 08:31, Hugh Dickins wrote:
>>> On Fri, 4 Jul 2025, David Hildenbrand wrote:
>>>> On 03.07.25 16:44, Lance Yang wrote:
>>>>> On 2025/7/3 20:39, David Hildenbrand wrote:
>>>>>> On 03.07.25 14:34, Lance Yang wrote:
>>>>>>> On Mon, Jun 23, 2025 at 10:04 PM David Hildenbrand <david@redhat.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On 20.06.25 14:50, Oscar Salvador wrote:
>>>>>>>>> On Tue, Jun 17, 2025 at 05:43:32PM +0200, David Hildenbrand wrote:
>>>>>>>>>> In 2009, we converted a VM_BUG_ON(!pfn_valid(pfn)) to the current
>>>>>>>>>> highest_memmap_pfn sanity check in commit 22b31eec63e5 ("badpage:
>>>>>>>>>> vm_normal_page use print_bad_pte"), because highest_memmap_pfn was
>>>>>>>>>> readily available.
> 
> highest_memmap_pfn was introduced by that commit for this purpose.
> 

Oh, somehow I thought it was around before that.

[...]

>> We catch corruption in a handful of PTE bits, and that's about it. You neither
>> detect corruption of flags nor of PFN bits that result in another valid PFN.
> 
> Of course it's limited in what it can catch (and won't even get called
> if the present bit was not set - a more complete patch might unify with
> those various "Bad swap" messages). Of course. But it's still useful for
> stopping pfn_to_page() veering off the end of the memmap[] (in some configs).

Right, probably in the configs we both don't care that much about 
nowadays :)

> And it's still useful for printing out a series of "Bad page map" messages
> when the page table is corrupted: from which a picture can sometimes be
> built up (isolated instance may just be a bitflip; series of them can
> sometimes show e.g. ascii text, occasionally helpful for debugging).

It's kind of a weird thing, because we do something very different 
opposed to other areas where we detect that something serious is going 
wrong (e.g., WARN).

But another thread just sparked whether we should WARN here, so I'll 
leave that discussion to the other thread.

> 
>>
>> Corruption of the "special" bit might be fun.
>>
>> When I was able to trigger this during development once, the whole machine
>> went down shortly after -- mostly because of use-after-free of something that
>> is now a page table, which is just bad for both users of such a page!
>>
>> E.g., quit that process and we will happily clear the PTE, corrupting data of
>> the other user. Fun.
>>
>> I'm sure I could find a way to unify the code while printing some comparable
>> message, but this check as it stands is just not worth it IMHO: trying to
>> handle something gracefully that shouldn't happen, when really we cannot
>> handle it gracefully.
> 
> So, you have experience of a time when it didn't help you. Okay. And we
> have had experience of other times when it has helped, if only a little.
> Like with other "Bad page"s: sometimes helpful, often not; but tending to
> build up a big picture from repeated occurrences.

Okay. I was rather curious how often we would actually hit this one 
here: from my recollection, the mapcount underflows are much more 
frequent than the ones from vm_normal_page().

> 
> We continue to disagree. I can't argue more than append the 2.6.29
> commit message, which seems to me as valid now as it was then.

Not that I agree that performing these sanity checks in each and every 
config is something reasonable, but apparently you think that they are 
still useful, 16 years after they were introduced.

So, let me try finding a way to unify the code while keeping that error 
handling for now in place.

-- 
Cheers,

David / dhildenb

Re: [PATCH RFC 01/14] mm/memory: drop highest_memmap_pfn sanity check in vm_normal_page()

Posted by Hugh Dickins 7 months ago

On Fri, 11 Jul 2025, David Hildenbrand wrote:
> On 08.07.25 04:52, Hugh Dickins wrote:
> > 
> > Of course it's limited in what it can catch (and won't even get called
> > if the present bit was not set - a more complete patch might unify with
> > those various "Bad swap" messages). Of course. But it's still useful for
> > stopping pfn_to_page() veering off the end of the memmap[] (in some
> > configs).
> 
> Right, probably in the configs we both don't care that much about nowadays :)

I thought it was the other way round: it's useful for stopping
pfn_to_page() veering off the end of the memmap[] if it's a memory model
where pfn_to_page() is a simple linear conversion.

As with SPARSEMEM_VMEMMAP, which I thought was the favourite nowadays.

If you don't care about that one much (does hotplug prevent it?), then
you do care about the complex pfn_to_page()s, and we should have worried
more when "page++"s got unnecessarily converted to folio_page(folio, i)
a year or two back (I'm thinking of in mm/rmap.c, maybe elsewhere).

Hugh

Re: [PATCH RFC 01/14] mm/memory: drop highest_memmap_pfn sanity check in vm_normal_page()

Posted by David Hildenbrand 7 months ago

On 11.07.25 20:49, Hugh Dickins wrote:
> On Fri, 11 Jul 2025, David Hildenbrand wrote:
>> On 08.07.25 04:52, Hugh Dickins wrote:
>>>
>>> Of course it's limited in what it can catch (and won't even get called
>>> if the present bit was not set - a more complete patch might unify with
>>> those various "Bad swap" messages). Of course. But it's still useful for
>>> stopping pfn_to_page() veering off the end of the memmap[] (in some
>>> configs).
>>
>> Right, probably in the configs we both don't care that much about nowadays :)
> 
> I thought it was the other way round: it's useful for stopping
> pfn_to_page() veering off the end of the memmap[] if it's a memory model
> where pfn_to_page() is a simple linear conversion.
> 
> As with SPARSEMEM_VMEMMAP, which I thought was the favourite nowadays.

Yes, you're right, I had the !SPARSEMEM model in mind, but obviously, 
the same applies for the SPARSEMEM_VMEMMAP case as well.

Only the SPARSEMEM && !SPARSEMEM_VMEMMAP model is weird. IIRC, it will 
dereference NULL when given a non-existant PFN. (__nr_to_section 
returning NULL and pfn_to_section_nr() not beeing happy about that)

-- 
Cheers,

David / dhildenb

Re: [PATCH RFC 01/14] mm/memory: drop highest_memmap_pfn sanity check in vm_normal_page()

Posted by Oscar Salvador 7 months, 2 weeks ago

On Mon, Jun 23, 2025 at 04:04:01PM +0200, David Hildenbrand wrote:
> Hi Oscar,

Hi David,

> 
> > I'm confused, I'm missing something here.
> > Before this change we would return NULL if e.g: pfn > highest_memmap_pfn, but
> > now we just print the warning and call pfn_to_page() anyway.
> > AFAIK, pfn_to_page() doesn't return NULL?
> 
> You're missing that vm_normal_page_pmd() was created as a copy from
> vm_normal_page() [history of the sanity check above], but as we don't have
> (and shouldn't have ...) print_bad_pmd(), we made the code look like this
> would be something that can just happen.

Ok I see.

> 
> "
> Do the same in vm_normal_page_pmd(), where we don't even report a
> problem at all ...
> "
> 
> So we made something that should never happen a runtime sanity check without
> ever reporting a problem ...

all clear now, thanks :-)



-- 
Oscar Salvador
SUSE Labs