In 2009, we converted a VM_BUG_ON(!pfn_valid(pfn)) to the current
highest_memmap_pfn sanity check in commit 22b31eec63e5 ("badpage:
vm_normal_page use print_bad_pte"), because highest_memmap_pfn was
readily available.
Nowadays, this is the last remaining highest_memmap_pfn user, and this
sanity check is not really triggering ... frequently.
Let's convert it to VM_WARN_ON_ONCE(!pfn_valid(pfn)), so we can
simplify and get rid of highest_memmap_pfn. Checking for
pfn_to_online_page() might be even better, but it would not handle
ZONE_DEVICE properly.
Do the same in vm_normal_page_pmd(), where we don't even report a
problem at all ...
What might be better in the future is having a runtime option like
page-table-check to enable such checks dynamically on-demand. Something
for the future.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
mm/memory.c | 15 +++------------
1 file changed, 3 insertions(+), 12 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 0163d127cece9..188b84ebf479a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -590,7 +590,7 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) {
if (likely(!pte_special(pte)))
- goto check_pfn;
+ goto out;
if (vma->vm_ops && vma->vm_ops->find_special_page)
return vma->vm_ops->find_special_page(vma, addr);
if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
@@ -608,9 +608,6 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
if (vma->vm_flags & VM_MIXEDMAP) {
if (!pfn_valid(pfn))
return NULL;
- if (is_zero_pfn(pfn))
- return NULL;
- goto out;
} else {
unsigned long off;
off = (addr - vma->vm_start) >> PAGE_SHIFT;
@@ -624,17 +621,12 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
if (is_zero_pfn(pfn))
return NULL;
-check_pfn:
- if (unlikely(pfn > highest_memmap_pfn)) {
- print_bad_pte(vma, addr, pte, NULL);
- return NULL;
- }
-
/*
* NOTE! We still have PageReserved() pages in the page tables.
* eg. VDSO mappings can cause them to exist.
*/
out:
+ VM_WARN_ON_ONCE(!pfn_valid(pfn));
VM_WARN_ON_ONCE(is_zero_pfn(pfn));
return pfn_to_page(pfn);
}
@@ -676,14 +668,13 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
if (is_huge_zero_pmd(pmd))
return NULL;
- if (unlikely(pfn > highest_memmap_pfn))
- return NULL;
/*
* NOTE! We still have PageReserved() pages in the page tables.
* eg. VDSO mappings can cause them to exist.
*/
out:
+ VM_WARN_ON_ONCE(!pfn_valid(pfn));
return pfn_to_page(pfn);
}
--
2.49.0
On Tue, Jun 17, 2025 at 11:44 PM David Hildenbrand <david@redhat.com> wrote: > > In 2009, we converted a VM_BUG_ON(!pfn_valid(pfn)) to the current > highest_memmap_pfn sanity check in commit 22b31eec63e5 ("badpage: > vm_normal_page use print_bad_pte"), because highest_memmap_pfn was > readily available. > > Nowadays, this is the last remaining highest_memmap_pfn user, and this > sanity check is not really triggering ... frequently. > > Let's convert it to VM_WARN_ON_ONCE(!pfn_valid(pfn)), so we can > simplify and get rid of highest_memmap_pfn. Checking for > pfn_to_online_page() might be even better, but it would not handle > ZONE_DEVICE properly. > > Do the same in vm_normal_page_pmd(), where we don't even report a > problem at all ... > > What might be better in the future is having a runtime option like > page-table-check to enable such checks dynamically on-demand. Something > for the future. > > Signed-off-by: David Hildenbrand <david@redhat.com> LGTM. Feel free to add: Reviewed-by: Lance Yang <lance.yang@linux.dev> Thanks, Lance
On Tue, Jun 17, 2025 at 05:43:32PM +0200, David Hildenbrand wrote: > In 2009, we converted a VM_BUG_ON(!pfn_valid(pfn)) to the current > highest_memmap_pfn sanity check in commit 22b31eec63e5 ("badpage: > vm_normal_page use print_bad_pte"), because highest_memmap_pfn was > readily available. > > Nowadays, this is the last remaining highest_memmap_pfn user, and this > sanity check is not really triggering ... frequently. > > Let's convert it to VM_WARN_ON_ONCE(!pfn_valid(pfn)), so we can > simplify and get rid of highest_memmap_pfn. Checking for > pfn_to_online_page() might be even better, but it would not handle > ZONE_DEVICE properly. > > Do the same in vm_normal_page_pmd(), where we don't even report a > problem at all ... > > What might be better in the future is having a runtime option like > page-table-check to enable such checks dynamically on-demand. Something > for the future. > > Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> -- Oscar Salvador SUSE Labs
On Tue, Jun 17, 2025 at 05:43:32PM +0200, David Hildenbrand wrote: > In 2009, we converted a VM_BUG_ON(!pfn_valid(pfn)) to the current > highest_memmap_pfn sanity check in commit 22b31eec63e5 ("badpage: > vm_normal_page use print_bad_pte"), because highest_memmap_pfn was > readily available. > > Nowadays, this is the last remaining highest_memmap_pfn user, and this > sanity check is not really triggering ... frequently. > > Let's convert it to VM_WARN_ON_ONCE(!pfn_valid(pfn)), so we can > simplify and get rid of highest_memmap_pfn. Checking for > pfn_to_online_page() might be even better, but it would not handle > ZONE_DEVICE properly. > > Do the same in vm_normal_page_pmd(), where we don't even report a > problem at all ... > > What might be better in the future is having a runtime option like > page-table-check to enable such checks dynamically on-demand. Something > for the future. > > Signed-off-by: David Hildenbrand <david@redhat.com> I'm confused, I'm missing something here. Before this change we would return NULL if e.g: pfn > highest_memmap_pfn, but now we just print the warning and call pfn_to_page() anyway. AFAIK, pfn_to_page() doesn't return NULL? -- Oscar Salvador SUSE Labs
On 20.06.25 14:50, Oscar Salvador wrote: > On Tue, Jun 17, 2025 at 05:43:32PM +0200, David Hildenbrand wrote: >> In 2009, we converted a VM_BUG_ON(!pfn_valid(pfn)) to the current >> highest_memmap_pfn sanity check in commit 22b31eec63e5 ("badpage: >> vm_normal_page use print_bad_pte"), because highest_memmap_pfn was >> readily available. >> >> Nowadays, this is the last remaining highest_memmap_pfn user, and this >> sanity check is not really triggering ... frequently. >> >> Let's convert it to VM_WARN_ON_ONCE(!pfn_valid(pfn)), so we can >> simplify and get rid of highest_memmap_pfn. Checking for >> pfn_to_online_page() might be even better, but it would not handle >> ZONE_DEVICE properly. >> >> Do the same in vm_normal_page_pmd(), where we don't even report a >> problem at all ... >> >> What might be better in the future is having a runtime option like >> page-table-check to enable such checks dynamically on-demand. Something >> for the future. >> >> Signed-off-by: David Hildenbrand <david@redhat.com> > Hi Oscar, > I'm confused, I'm missing something here. > Before this change we would return NULL if e.g: pfn > highest_memmap_pfn, but > now we just print the warning and call pfn_to_page() anyway. > AFAIK, pfn_to_page() doesn't return NULL? You're missing that vm_normal_page_pmd() was created as a copy from vm_normal_page() [history of the sanity check above], but as we don't have (and shouldn't have ...) print_bad_pmd(), we made the code look like this would be something that can just happen. " Do the same in vm_normal_page_pmd(), where we don't even report a problem at all ... " So we made something that should never happen a runtime sanity check without ever reporting a problem ... -- Cheers, David / dhildenb
On Mon, Jun 23, 2025 at 10:04 PM David Hildenbrand <david@redhat.com> wrote: > > On 20.06.25 14:50, Oscar Salvador wrote: > > On Tue, Jun 17, 2025 at 05:43:32PM +0200, David Hildenbrand wrote: > >> In 2009, we converted a VM_BUG_ON(!pfn_valid(pfn)) to the current > >> highest_memmap_pfn sanity check in commit 22b31eec63e5 ("badpage: > >> vm_normal_page use print_bad_pte"), because highest_memmap_pfn was > >> readily available. > >> > >> Nowadays, this is the last remaining highest_memmap_pfn user, and this > >> sanity check is not really triggering ... frequently. > >> > >> Let's convert it to VM_WARN_ON_ONCE(!pfn_valid(pfn)), so we can > >> simplify and get rid of highest_memmap_pfn. Checking for > >> pfn_to_online_page() might be even better, but it would not handle > >> ZONE_DEVICE properly. > >> > >> Do the same in vm_normal_page_pmd(), where we don't even report a > >> problem at all ... > >> > >> What might be better in the future is having a runtime option like > >> page-table-check to enable such checks dynamically on-demand. Something > >> for the future. > >> > >> Signed-off-by: David Hildenbrand <david@redhat.com> > > > > Hi Oscar, > > > I'm confused, I'm missing something here. > > Before this change we would return NULL if e.g: pfn > highest_memmap_pfn, but > > now we just print the warning and call pfn_to_page() anyway. > > AFAIK, pfn_to_page() doesn't return NULL? > > You're missing that vm_normal_page_pmd() was created as a copy from > vm_normal_page() [history of the sanity check above], but as we don't > have (and shouldn't have ...) print_bad_pmd(), we made the code look > like this would be something that can just happen. > > " > Do the same in vm_normal_page_pmd(), where we don't even report a > problem at all ... > " > > So we made something that should never happen a runtime sanity check > without ever reporting a problem ... IIUC, the reasoning is that because this case should never happen, we can change the behavior from returning NULL to a "warn and continue" model? Thanks, Lance
On 03.07.25 14:34, Lance Yang wrote: > On Mon, Jun 23, 2025 at 10:04 PM David Hildenbrand <david@redhat.com> wrote: >> >> On 20.06.25 14:50, Oscar Salvador wrote: >>> On Tue, Jun 17, 2025 at 05:43:32PM +0200, David Hildenbrand wrote: >>>> In 2009, we converted a VM_BUG_ON(!pfn_valid(pfn)) to the current >>>> highest_memmap_pfn sanity check in commit 22b31eec63e5 ("badpage: >>>> vm_normal_page use print_bad_pte"), because highest_memmap_pfn was >>>> readily available. >>>> >>>> Nowadays, this is the last remaining highest_memmap_pfn user, and this >>>> sanity check is not really triggering ... frequently. >>>> >>>> Let's convert it to VM_WARN_ON_ONCE(!pfn_valid(pfn)), so we can >>>> simplify and get rid of highest_memmap_pfn. Checking for >>>> pfn_to_online_page() might be even better, but it would not handle >>>> ZONE_DEVICE properly. >>>> >>>> Do the same in vm_normal_page_pmd(), where we don't even report a >>>> problem at all ... >>>> >>>> What might be better in the future is having a runtime option like >>>> page-table-check to enable such checks dynamically on-demand. Something >>>> for the future. >>>> >>>> Signed-off-by: David Hildenbrand <david@redhat.com> >>> >> >> Hi Oscar, >> >>> I'm confused, I'm missing something here. >>> Before this change we would return NULL if e.g: pfn > highest_memmap_pfn, but >>> now we just print the warning and call pfn_to_page() anyway. >>> AFAIK, pfn_to_page() doesn't return NULL? >> >> You're missing that vm_normal_page_pmd() was created as a copy from >> vm_normal_page() [history of the sanity check above], but as we don't >> have (and shouldn't have ...) print_bad_pmd(), we made the code look >> like this would be something that can just happen. >> >> " >> Do the same in vm_normal_page_pmd(), where we don't even report a >> problem at all ... >> " >> >> So we made something that should never happen a runtime sanity check >> without ever reporting a problem ... > > IIUC, the reasoning is that because this case should never happen, we can > change the behavior from returning NULL to a "warn and continue" model? Well, yes. Point is, that check should have never been copy-pasted that way, while dropping the actual warning :) It's a sanity check for something that should never happen, turned into something that looks like it must be handled and would be valid to encounter. -- Cheers, David / dhildenb
On 2025/7/3 20:39, David Hildenbrand wrote: > On 03.07.25 14:34, Lance Yang wrote: >> On Mon, Jun 23, 2025 at 10:04 PM David Hildenbrand <david@redhat.com> >> wrote: >>> >>> On 20.06.25 14:50, Oscar Salvador wrote: >>>> On Tue, Jun 17, 2025 at 05:43:32PM +0200, David Hildenbrand wrote: >>>>> In 2009, we converted a VM_BUG_ON(!pfn_valid(pfn)) to the current >>>>> highest_memmap_pfn sanity check in commit 22b31eec63e5 ("badpage: >>>>> vm_normal_page use print_bad_pte"), because highest_memmap_pfn was >>>>> readily available. >>>>> >>>>> Nowadays, this is the last remaining highest_memmap_pfn user, and this >>>>> sanity check is not really triggering ... frequently. >>>>> >>>>> Let's convert it to VM_WARN_ON_ONCE(!pfn_valid(pfn)), so we can >>>>> simplify and get rid of highest_memmap_pfn. Checking for >>>>> pfn_to_online_page() might be even better, but it would not handle >>>>> ZONE_DEVICE properly. >>>>> >>>>> Do the same in vm_normal_page_pmd(), where we don't even report a >>>>> problem at all ... >>>>> >>>>> What might be better in the future is having a runtime option like >>>>> page-table-check to enable such checks dynamically on-demand. >>>>> Something >>>>> for the future. >>>>> >>>>> Signed-off-by: David Hildenbrand <david@redhat.com> >>>> >>> >>> Hi Oscar, >>> >>>> I'm confused, I'm missing something here. >>>> Before this change we would return NULL if e.g: pfn > >>>> highest_memmap_pfn, but >>>> now we just print the warning and call pfn_to_page() anyway. >>>> AFAIK, pfn_to_page() doesn't return NULL? >>> >>> You're missing that vm_normal_page_pmd() was created as a copy from >>> vm_normal_page() [history of the sanity check above], but as we don't >>> have (and shouldn't have ...) print_bad_pmd(), we made the code look >>> like this would be something that can just happen. >>> >>> " >>> Do the same in vm_normal_page_pmd(), where we don't even report a >>> problem at all ... >>> " >>> >>> So we made something that should never happen a runtime sanity check >>> without ever reporting a problem ... >> >> IIUC, the reasoning is that because this case should never happen, we can >> change the behavior from returning NULL to a "warn and continue" model? > > Well, yes. Point is, that check should have never been copy-pasted that > way, while dropping the actual warning :) Ah, I see your point now. Thanks for clarifying! > > It's a sanity check for something that should never happen, turned into > something that looks like it must be handled and would be valid to > encounter. Yeah. Makes sense to me ;)
On 03.07.25 16:44, Lance Yang wrote: > > > On 2025/7/3 20:39, David Hildenbrand wrote: >> On 03.07.25 14:34, Lance Yang wrote: >>> On Mon, Jun 23, 2025 at 10:04 PM David Hildenbrand <david@redhat.com> >>> wrote: >>>> >>>> On 20.06.25 14:50, Oscar Salvador wrote: >>>>> On Tue, Jun 17, 2025 at 05:43:32PM +0200, David Hildenbrand wrote: >>>>>> In 2009, we converted a VM_BUG_ON(!pfn_valid(pfn)) to the current >>>>>> highest_memmap_pfn sanity check in commit 22b31eec63e5 ("badpage: >>>>>> vm_normal_page use print_bad_pte"), because highest_memmap_pfn was >>>>>> readily available. >>>>>> >>>>>> Nowadays, this is the last remaining highest_memmap_pfn user, and this >>>>>> sanity check is not really triggering ... frequently. >>>>>> >>>>>> Let's convert it to VM_WARN_ON_ONCE(!pfn_valid(pfn)), so we can >>>>>> simplify and get rid of highest_memmap_pfn. Checking for >>>>>> pfn_to_online_page() might be even better, but it would not handle >>>>>> ZONE_DEVICE properly. >>>>>> >>>>>> Do the same in vm_normal_page_pmd(), where we don't even report a >>>>>> problem at all ... >>>>>> >>>>>> What might be better in the future is having a runtime option like >>>>>> page-table-check to enable such checks dynamically on-demand. >>>>>> Something >>>>>> for the future. >>>>>> >>>>>> Signed-off-by: David Hildenbrand <david@redhat.com> >>>>> >>>> >>>> Hi Oscar, >>>> >>>>> I'm confused, I'm missing something here. >>>>> Before this change we would return NULL if e.g: pfn > >>>>> highest_memmap_pfn, but >>>>> now we just print the warning and call pfn_to_page() anyway. >>>>> AFAIK, pfn_to_page() doesn't return NULL? >>>> >>>> You're missing that vm_normal_page_pmd() was created as a copy from >>>> vm_normal_page() [history of the sanity check above], but as we don't >>>> have (and shouldn't have ...) print_bad_pmd(), we made the code look >>>> like this would be something that can just happen. >>>> >>>> " >>>> Do the same in vm_normal_page_pmd(), where we don't even report a >>>> problem at all ... >>>> " >>>> >>>> So we made something that should never happen a runtime sanity check >>>> without ever reporting a problem ... >>> >>> IIUC, the reasoning is that because this case should never happen, we can >>> change the behavior from returning NULL to a "warn and continue" model? >> >> Well, yes. Point is, that check should have never been copy-pasted that >> way, while dropping the actual warning :) > > Ah, I see your point now. Thanks for clarifying! I'll add some of that to the patch description. Thanks! -- Cheers, David / dhildenb
On Fri, 4 Jul 2025, David Hildenbrand wrote: > On 03.07.25 16:44, Lance Yang wrote: > > On 2025/7/3 20:39, David Hildenbrand wrote: > >> On 03.07.25 14:34, Lance Yang wrote: > >>> On Mon, Jun 23, 2025 at 10:04 PM David Hildenbrand <david@redhat.com> > >>> wrote: > >>>> > >>>> On 20.06.25 14:50, Oscar Salvador wrote: > >>>>> On Tue, Jun 17, 2025 at 05:43:32PM +0200, David Hildenbrand wrote: > >>>>>> In 2009, we converted a VM_BUG_ON(!pfn_valid(pfn)) to the current > >>>>>> highest_memmap_pfn sanity check in commit 22b31eec63e5 ("badpage: > >>>>>> vm_normal_page use print_bad_pte"), because highest_memmap_pfn was > >>>>>> readily available. > >>>>>> > >>>>>> Nowadays, this is the last remaining highest_memmap_pfn user, and this > >>>>>> sanity check is not really triggering ... frequently. > >>>>>> > >>>>>> Let's convert it to VM_WARN_ON_ONCE(!pfn_valid(pfn)), so we can > >>>>>> simplify and get rid of highest_memmap_pfn. Checking for > >>>>>> pfn_to_online_page() might be even better, but it would not handle > >>>>>> ZONE_DEVICE properly. > >>>>>> > >>>>>> Do the same in vm_normal_page_pmd(), where we don't even report a > >>>>>> problem at all ... > >>>>>> > >>>>>> What might be better in the future is having a runtime option like > >>>>>> page-table-check to enable such checks dynamically on-demand. > >>>>>> Something > >>>>>> for the future. > >>>>>> > >>>>>> Signed-off-by: David Hildenbrand <david@redhat.com> The author of 22b31eec63e5 thinks this is not at all an improvement. Of course the condition is not triggering frequently, of course it should not happen: but it does happen, and it still seems worthwhile to catch it in production with a "Bad page map" than to let it run on to whatever kind of crash it hits instead. Hugh
On 07.07.25 08:31, Hugh Dickins wrote: > On Fri, 4 Jul 2025, David Hildenbrand wrote: >> On 03.07.25 16:44, Lance Yang wrote: >>> On 2025/7/3 20:39, David Hildenbrand wrote: >>>> On 03.07.25 14:34, Lance Yang wrote: >>>>> On Mon, Jun 23, 2025 at 10:04 PM David Hildenbrand <david@redhat.com> >>>>> wrote: >>>>>> >>>>>> On 20.06.25 14:50, Oscar Salvador wrote: >>>>>>> On Tue, Jun 17, 2025 at 05:43:32PM +0200, David Hildenbrand wrote: >>>>>>>> In 2009, we converted a VM_BUG_ON(!pfn_valid(pfn)) to the current >>>>>>>> highest_memmap_pfn sanity check in commit 22b31eec63e5 ("badpage: >>>>>>>> vm_normal_page use print_bad_pte"), because highest_memmap_pfn was >>>>>>>> readily available. >>>>>>>> >>>>>>>> Nowadays, this is the last remaining highest_memmap_pfn user, and this >>>>>>>> sanity check is not really triggering ... frequently. >>>>>>>> >>>>>>>> Let's convert it to VM_WARN_ON_ONCE(!pfn_valid(pfn)), so we can >>>>>>>> simplify and get rid of highest_memmap_pfn. Checking for >>>>>>>> pfn_to_online_page() might be even better, but it would not handle >>>>>>>> ZONE_DEVICE properly. >>>>>>>> >>>>>>>> Do the same in vm_normal_page_pmd(), where we don't even report a >>>>>>>> problem at all ... >>>>>>>> >>>>>>>> What might be better in the future is having a runtime option like >>>>>>>> page-table-check to enable such checks dynamically on-demand. >>>>>>>> Something >>>>>>>> for the future. >>>>>>>> >>>>>>>> Signed-off-by: David Hildenbrand <david@redhat.com> > > The author of 22b31eec63e5 thinks this is not at all an improvement. > Of course the condition is not triggering frequently, of course it > should not happen: but it does happen, and it still seems worthwhile > to catch it in production with a "Bad page map" than to let it run on > to whatever kind of crash it hits instead. Well, obviously I don't agree and was waiting for having this discussion :) We catch corruption in a handful of PTE bits, and that's about it. You neither detect corruption of flags nor of PFN bits that result in another valid PFN. Corruption of the "special" bit might be fun. When I was able to trigger this during development once, the whole machine went down shortly after -- mostly because of use-after-free of something that is now a page table, which is just bad for both users of such a page! E.g., quit that process and we will happily clear the PTE, corrupting data of the other user. Fun. I'm sure I could find a way to unify the code while printing some comparable message, but this check as it stands is just not worth it IMHO: trying to handle something gracefully that shouldn't happen, when really we cannot handle it gracefully. -- Cheers, David / dhildenb
On Mon, 7 Jul 2025, David Hildenbrand wrote: > On 07.07.25 08:31, Hugh Dickins wrote: > > On Fri, 4 Jul 2025, David Hildenbrand wrote: > >> On 03.07.25 16:44, Lance Yang wrote: > >>> On 2025/7/3 20:39, David Hildenbrand wrote: > >>>> On 03.07.25 14:34, Lance Yang wrote: > >>>>> On Mon, Jun 23, 2025 at 10:04 PM David Hildenbrand <david@redhat.com> > >>>>> wrote: > >>>>>> > >>>>>> On 20.06.25 14:50, Oscar Salvador wrote: > >>>>>>> On Tue, Jun 17, 2025 at 05:43:32PM +0200, David Hildenbrand wrote: > >>>>>>>> In 2009, we converted a VM_BUG_ON(!pfn_valid(pfn)) to the current > >>>>>>>> highest_memmap_pfn sanity check in commit 22b31eec63e5 ("badpage: > >>>>>>>> vm_normal_page use print_bad_pte"), because highest_memmap_pfn was > >>>>>>>> readily available. highest_memmap_pfn was introduced by that commit for this purpose. > >>>>>>>> > >>>>>>>> Nowadays, this is the last remaining highest_memmap_pfn user, and > >>>>>>>> this > >>>>>>>> sanity check is not really triggering ... frequently. > >>>>>>>> > >>>>>>>> Let's convert it to VM_WARN_ON_ONCE(!pfn_valid(pfn)), so we can > >>>>>>>> simplify and get rid of highest_memmap_pfn. Checking for > >>>>>>>> pfn_to_online_page() might be even better, but it would not handle > >>>>>>>> ZONE_DEVICE properly. > >>>>>>>> > >>>>>>>> Do the same in vm_normal_page_pmd(), where we don't even report a > >>>>>>>> problem at all ... > >>>>>>>> > >>>>>>>> What might be better in the future is having a runtime option like > >>>>>>>> page-table-check to enable such checks dynamically on-demand. > >>>>>>>> Something > >>>>>>>> for the future. > >>>>>>>> > >>>>>>>> Signed-off-by: David Hildenbrand <david@redhat.com> > > > > The author of 22b31eec63e5 thinks this is not at all an improvement. > > Of course the condition is not triggering frequently, of course it > > should not happen: but it does happen, and it still seems worthwhile > > to catch it in production with a "Bad page map" than to let it run on > > to whatever kind of crash it hits instead. > > Well, obviously I don't agree and was waiting for having this discussion :) > > We catch corruption in a handful of PTE bits, and that's about it. You neither > detect corruption of flags nor of PFN bits that result in another valid PFN. Of course it's limited in what it can catch (and won't even get called if the present bit was not set - a more complete patch might unify with those various "Bad swap" messages). Of course. But it's still useful for stopping pfn_to_page() veering off the end of the memmap[] (in some configs). And it's still useful for printing out a series of "Bad page map" messages when the page table is corrupted: from which a picture can sometimes be built up (isolated instance may just be a bitflip; series of them can sometimes show e.g. ascii text, occasionally helpful for debugging). > > Corruption of the "special" bit might be fun. > > When I was able to trigger this during development once, the whole machine > went down shortly after -- mostly because of use-after-free of something that > is now a page table, which is just bad for both users of such a page! > > E.g., quit that process and we will happily clear the PTE, corrupting data of > the other user. Fun. > > I'm sure I could find a way to unify the code while printing some comparable > message, but this check as it stands is just not worth it IMHO: trying to > handle something gracefully that shouldn't happen, when really we cannot > handle it gracefully. So, you have experience of a time when it didn't help you. Okay. And we have had experience of other times when it has helped, if only a little. Like with other "Bad page"s: sometimes helpful, often not; but tending to build up a big picture from repeated occurrences. We continue to disagree. I can't argue more than append the 2.6.29 commit message, which seems to me as valid now as it was then. From 22b31eec63e5f2e219a3ee15f456897272bc73e8 Mon Sep 17 00:00:00 2001 From: Hugh Dickins <hugh@veritas.com> Date: Tue, 6 Jan 2009 14:40:09 -0800 Subject: [PATCH] badpage: vm_normal_page use print_bad_pte print_bad_pte() is so far being called only when zap_pte_range() finds negative page_mapcount, or there's a fault on a pte_file where it does not belong. That's weak coverage when we suspect pagetable corruption. Originally, it was called when vm_normal_page() found an invalid pfn: but pfn_valid is expensive on some architectures and configurations, so 2.6.24 put that under CONFIG_DEBUG_VM (which doesn't help in the field), then 2.6.26 replaced it by a VM_BUG_ON (likewise). Reinstate the print_bad_pte() in vm_normal_page(), but use a cheaper test than pfn_valid(): memmap_init_zone() (used in bootup and hotplug) keep a __read_mostly note of the highest_memmap_pfn, vm_normal_page() then check pfn against that. We could call this pfn_plausible() or pfn_sane(), but I doubt we'll need it elsewhere: of course it's not reliable, but gives much stronger pagetable validation on many boxes. Also use print_bad_pte() when the pte_special bit is found outside a VM_PFNMAP or VM_MIXEDMAP area, instead of VM_BUG_ON. Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On 08.07.25 04:52, Hugh Dickins wrote: > On Mon, 7 Jul 2025, David Hildenbrand wrote: >> On 07.07.25 08:31, Hugh Dickins wrote: >>> On Fri, 4 Jul 2025, David Hildenbrand wrote: >>>> On 03.07.25 16:44, Lance Yang wrote: >>>>> On 2025/7/3 20:39, David Hildenbrand wrote: >>>>>> On 03.07.25 14:34, Lance Yang wrote: >>>>>>> On Mon, Jun 23, 2025 at 10:04 PM David Hildenbrand <david@redhat.com> >>>>>>> wrote: >>>>>>>> >>>>>>>> On 20.06.25 14:50, Oscar Salvador wrote: >>>>>>>>> On Tue, Jun 17, 2025 at 05:43:32PM +0200, David Hildenbrand wrote: >>>>>>>>>> In 2009, we converted a VM_BUG_ON(!pfn_valid(pfn)) to the current >>>>>>>>>> highest_memmap_pfn sanity check in commit 22b31eec63e5 ("badpage: >>>>>>>>>> vm_normal_page use print_bad_pte"), because highest_memmap_pfn was >>>>>>>>>> readily available. > > highest_memmap_pfn was introduced by that commit for this purpose. > Oh, somehow I thought it was around before that. [...] >> We catch corruption in a handful of PTE bits, and that's about it. You neither >> detect corruption of flags nor of PFN bits that result in another valid PFN. > > Of course it's limited in what it can catch (and won't even get called > if the present bit was not set - a more complete patch might unify with > those various "Bad swap" messages). Of course. But it's still useful for > stopping pfn_to_page() veering off the end of the memmap[] (in some configs). Right, probably in the configs we both don't care that much about nowadays :) > And it's still useful for printing out a series of "Bad page map" messages > when the page table is corrupted: from which a picture can sometimes be > built up (isolated instance may just be a bitflip; series of them can > sometimes show e.g. ascii text, occasionally helpful for debugging). It's kind of a weird thing, because we do something very different opposed to other areas where we detect that something serious is going wrong (e.g., WARN). But another thread just sparked whether we should WARN here, so I'll leave that discussion to the other thread. > >> >> Corruption of the "special" bit might be fun. >> >> When I was able to trigger this during development once, the whole machine >> went down shortly after -- mostly because of use-after-free of something that >> is now a page table, which is just bad for both users of such a page! >> >> E.g., quit that process and we will happily clear the PTE, corrupting data of >> the other user. Fun. >> >> I'm sure I could find a way to unify the code while printing some comparable >> message, but this check as it stands is just not worth it IMHO: trying to >> handle something gracefully that shouldn't happen, when really we cannot >> handle it gracefully. > > So, you have experience of a time when it didn't help you. Okay. And we > have had experience of other times when it has helped, if only a little. > Like with other "Bad page"s: sometimes helpful, often not; but tending to > build up a big picture from repeated occurrences. Okay. I was rather curious how often we would actually hit this one here: from my recollection, the mapcount underflows are much more frequent than the ones from vm_normal_page(). > > We continue to disagree. I can't argue more than append the 2.6.29 > commit message, which seems to me as valid now as it was then. Not that I agree that performing these sanity checks in each and every config is something reasonable, but apparently you think that they are still useful, 16 years after they were introduced. So, let me try finding a way to unify the code while keeping that error handling for now in place. -- Cheers, David / dhildenb
On Fri, 11 Jul 2025, David Hildenbrand wrote: > On 08.07.25 04:52, Hugh Dickins wrote: > > > > Of course it's limited in what it can catch (and won't even get called > > if the present bit was not set - a more complete patch might unify with > > those various "Bad swap" messages). Of course. But it's still useful for > > stopping pfn_to_page() veering off the end of the memmap[] (in some > > configs). > > Right, probably in the configs we both don't care that much about nowadays :) I thought it was the other way round: it's useful for stopping pfn_to_page() veering off the end of the memmap[] if it's a memory model where pfn_to_page() is a simple linear conversion. As with SPARSEMEM_VMEMMAP, which I thought was the favourite nowadays. If you don't care about that one much (does hotplug prevent it?), then you do care about the complex pfn_to_page()s, and we should have worried more when "page++"s got unnecessarily converted to folio_page(folio, i) a year or two back (I'm thinking of in mm/rmap.c, maybe elsewhere). Hugh
On 11.07.25 20:49, Hugh Dickins wrote: > On Fri, 11 Jul 2025, David Hildenbrand wrote: >> On 08.07.25 04:52, Hugh Dickins wrote: >>> >>> Of course it's limited in what it can catch (and won't even get called >>> if the present bit was not set - a more complete patch might unify with >>> those various "Bad swap" messages). Of course. But it's still useful for >>> stopping pfn_to_page() veering off the end of the memmap[] (in some >>> configs). >> >> Right, probably in the configs we both don't care that much about nowadays :) > > I thought it was the other way round: it's useful for stopping > pfn_to_page() veering off the end of the memmap[] if it's a memory model > where pfn_to_page() is a simple linear conversion. > > As with SPARSEMEM_VMEMMAP, which I thought was the favourite nowadays. Yes, you're right, I had the !SPARSEMEM model in mind, but obviously, the same applies for the SPARSEMEM_VMEMMAP case as well. Only the SPARSEMEM && !SPARSEMEM_VMEMMAP model is weird. IIRC, it will dereference NULL when given a non-existant PFN. (__nr_to_section returning NULL and pfn_to_section_nr() not beeing happy about that) -- Cheers, David / dhildenb
On Mon, Jun 23, 2025 at 04:04:01PM +0200, David Hildenbrand wrote: > Hi Oscar, Hi David, > > > I'm confused, I'm missing something here. > > Before this change we would return NULL if e.g: pfn > highest_memmap_pfn, but > > now we just print the warning and call pfn_to_page() anyway. > > AFAIK, pfn_to_page() doesn't return NULL? > > You're missing that vm_normal_page_pmd() was created as a copy from > vm_normal_page() [history of the sanity check above], but as we don't have > (and shouldn't have ...) print_bad_pmd(), we made the code look like this > would be something that can just happen. Ok I see. > > " > Do the same in vm_normal_page_pmd(), where we don't even report a > problem at all ... > " > > So we made something that should never happen a runtime sanity check without > ever reporting a problem ... all clear now, thanks :-) -- Oscar Salvador SUSE Labs
© 2016 - 2025 Red Hat, Inc.