include/linux/huge_mm.h | 3 +++ 1 file changed, 3 insertions(+)
Calling is_huge_zero_folio(NULL) should not be legal - it makes no
sense, and a different (theoretical) implementation may dereference
the pointer. But currently, lacking any explicit documentation, this
call is possible.
But if somebody really passes NULL, the function should not return
true - this isn't the huge zero folio after all! However, if the
`huge_zero_folio` hasn't been allocated yet, it's NULL, and
is_huge_zero_folio(NULL) just happens to return true, which is a lie.
This weird side effect prevented me from reproducing a kernel crash
that occurred when the elements of a folio_batch were NULL - since
folios_put_refs() skips huge zero folios, this sometimes causes a
crash, but sometimes does not. For debugging, it is better to reveal
such bugs reliably and not hide them behind random preconditions like
"has the huge zero folio already been created?"
To improve detection of such bugs, David Hildenbrand suggested adding
a VM_WARN_ON_ONCE().
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
---
include/linux/huge_mm.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 7748489fde1b..bc9ca7798f2e 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -2,6 +2,7 @@
#ifndef _LINUX_HUGE_MM_H
#define _LINUX_HUGE_MM_H
+#include <linux/mmdebug.h> // for VM_WARN_ON_ONCE()
#include <linux/mm_types.h>
#include <linux/fs.h> /* only for vma_is_dax() */
@@ -479,6 +480,8 @@ extern unsigned long huge_zero_pfn;
static inline bool is_huge_zero_folio(const struct folio *folio)
{
+ VM_WARN_ON_ONCE(folio == NULL);
+
return READ_ONCE(huge_zero_folio) == folio;
}
--
2.47.2
On Wed, 27 Aug 2025 17:03:30 +0200 Max Kellermann <max.kellermann@ionos.com> wrote: > Calling is_huge_zero_folio(NULL) should not be legal - it makes no > sense, and a different (theoretical) implementation may dereference > the pointer. But currently, lacking any explicit documentation, this > call is possible. > > But if somebody really passes NULL, the function should not return > true - this isn't the huge zero folio after all! However, if the > `huge_zero_folio` hasn't been allocated yet, it's NULL, and > is_huge_zero_folio(NULL) just happens to return true, which is a lie. > > This weird side effect prevented me from reproducing a kernel crash > that occurred when the elements of a folio_batch were NULL - since > folios_put_refs() skips huge zero folios, this sometimes causes a > crash, but sometimes does not. For debugging, it is better to reveal > such bugs reliably and not hide them behind random preconditions like > "has the huge zero folio already been created?" > > To improve detection of such bugs, David Hildenbrand suggested adding > a VM_WARN_ON_ONCE(). > > ... > > --- a/include/linux/huge_mm.h > +++ b/include/linux/huge_mm.h > @@ -2,6 +2,7 @@ > #ifndef _LINUX_HUGE_MM_H > #define _LINUX_HUGE_MM_H > > +#include <linux/mmdebug.h> // for VM_WARN_ON_ONCE() > #include <linux/mm_types.h> > > #include <linux/fs.h> /* only for vma_is_dax() */ > @@ -479,6 +480,8 @@ extern unsigned long huge_zero_pfn; > > static inline bool is_huge_zero_folio(const struct folio *folio) > { > + VM_WARN_ON_ONCE(folio == NULL); > + > return READ_ONCE(huge_zero_folio) == folio; > } OK, but it remains the case that we have seen code which calls is_huge_zero_folio() prior to the initialization of huge_zero_folio. Is this a bug? I think so. Should we be checking for recurrences of this bug? Also, sigh. I do dislike seeing VM_WARN_ON_ONCE() in an inline function - heaven knows how much bloat that adds. Defconfig mm/huge_memory.o (which has three calls) grows by 80 bytes so I guess that's livable with.
On 28.08.25 01:53, Andrew Morton wrote: > On Wed, 27 Aug 2025 17:03:30 +0200 Max Kellermann <max.kellermann@ionos.com> wrote: > >> Calling is_huge_zero_folio(NULL) should not be legal - it makes no >> sense, and a different (theoretical) implementation may dereference >> the pointer. But currently, lacking any explicit documentation, this >> call is possible. >> >> But if somebody really passes NULL, the function should not return >> true - this isn't the huge zero folio after all! However, if the >> `huge_zero_folio` hasn't been allocated yet, it's NULL, and >> is_huge_zero_folio(NULL) just happens to return true, which is a lie. >> >> This weird side effect prevented me from reproducing a kernel crash >> that occurred when the elements of a folio_batch were NULL - since >> folios_put_refs() skips huge zero folios, this sometimes causes a >> crash, but sometimes does not. For debugging, it is better to reveal >> such bugs reliably and not hide them behind random preconditions like >> "has the huge zero folio already been created?" >> >> To improve detection of such bugs, David Hildenbrand suggested adding >> a VM_WARN_ON_ONCE(). >> >> ... >> >> --- a/include/linux/huge_mm.h >> +++ b/include/linux/huge_mm.h >> @@ -2,6 +2,7 @@ >> #ifndef _LINUX_HUGE_MM_H >> #define _LINUX_HUGE_MM_H >> >> +#include <linux/mmdebug.h> // for VM_WARN_ON_ONCE() >> #include <linux/mm_types.h> >> >> #include <linux/fs.h> /* only for vma_is_dax() */ >> @@ -479,6 +480,8 @@ extern unsigned long huge_zero_pfn; >> >> static inline bool is_huge_zero_folio(const struct folio *folio) >> { >> + VM_WARN_ON_ONCE(folio == NULL); >> + >> return READ_ONCE(huge_zero_folio) == folio; >> } > > OK, but it remains the case that we have seen code which calls > is_huge_zero_folio() prior to the initialization of huge_zero_folio. > > Is this a bug? I think so. Should we be checking for recurrences of > this bug? As answered elsewhere, this is perfectly fine as the huge zero folio is allocated on demand only (and only once enabled). > > > Also, sigh. I do dislike seeing VM_WARN_ON_ONCE() in an inline > function - heaven knows how much bloat that adds. Defconfig > mm/huge_memory.o (which has three calls) grows by 80 bytes so I guess > that's livable with. Common practice to use VM_WARN_ON_ONCE() and friend in inline functions. Just look at page-flags.h. If someone complains about kernel image size with CONFIG_DEBUG_VM, they shopuld reevaluate life choices. :) -- Cheers David / dhildenb
On Thu, Aug 28, 2025 at 1:53 AM Andrew Morton <akpm@linux-foundation.org> wrote: > OK, but it remains the case that we have seen code which calls > is_huge_zero_folio() prior to the initialization of huge_zero_folio. > > Is this a bug? I think so. Should we be checking for recurrences of > this bug? Why do you believe it's a bug? To me, as a newbie here, such a call seems perfectly fine; please explain. > Also, sigh. I do dislike seeing VM_WARN_ON_ONCE() in an inline > function - heaven knows how much bloat that adds. Defconfig > mm/huge_memory.o (which has three calls) grows by 80 bytes so I guess > that's livable with. Oh, how is this possible? defconfig has CONFIG_DEBUG_VM=n and that means VM_WARN_ON_ONCE() is just BUILD_BUG_ON_INVALID() which should generate no code at all. Here on my computer (aarch64 with GCC 14), the size (and the disassembly) is exactly the same. This confirms what David Hildenbrand predicted. Are you seeing some compiler weirdness, maybe?
On Wed, Aug 27, 2025 at 05:03:30PM +0200, Max Kellermann wrote: > Calling is_huge_zero_folio(NULL) should not be legal - it makes no > sense, and a different (theoretical) implementation may dereference > the pointer. But currently, lacking any explicit documentation, this > call is possible. This is true of a huge amount of kernel logic though. We can't always cover off every single possibility when documenting these things, and there's often an implicit assumption of this not being null. > > But if somebody really passes NULL, the function should not return > true - this isn't the huge zero folio after all! However, if the > `huge_zero_folio` hasn't been allocated yet, it's NULL, and > is_huge_zero_folio(NULL) just happens to return true, which is a lie. Hmm seems like this is a bug under a bug. folio_put_refs() shouldn't be passed a folio batch of NULL's. Shouldn't we just put the VM_WARN_ON_ONCE() there? > > This weird side effect prevented me from reproducing a kernel crash > that occurred when the elements of a folio_batch were NULL - since > folios_put_refs() skips huge zero folios, this sometimes causes a > crash, but sometimes does not. For debugging, it is better to reveal > such bugs reliably and not hide them behind random preconditions like > "has the huge zero folio already been created?" > > To improve detection of such bugs, David Hildenbrand suggested adding > a VM_WARN_ON_ONCE(). But I really don't think passing NULL to is_huge_zero_folio() is a valid enough situation to justify this? You've encountered a case where a bug caused folio_put_refs() to be called with an invalid parameter, then you're arbitrarily changing is_huge_zero_folio() so it would deref the folio and splat. This seems not so great. I really think the VM_WARN_ON_ONCE() should be in folios_put_refs() based on what you've said. > > Signed-off-by: Max Kellermann <max.kellermann@ionos.com> > --- > include/linux/huge_mm.h | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > index 7748489fde1b..bc9ca7798f2e 100644 > --- a/include/linux/huge_mm.h > +++ b/include/linux/huge_mm.h > @@ -2,6 +2,7 @@ > #ifndef _LINUX_HUGE_MM_H > #define _LINUX_HUGE_MM_H > > +#include <linux/mmdebug.h> // for VM_WARN_ON_ONCE() Please don't do //. This include is suspect though, huge_mm.h is included from mm.h and thus this very easily might break some arch that is weird about this stuff, because a ton of stuff includes mm.h including things that might absolutely baulk at mmdebug. I've had this kind of thing happen several times before. > #include <linux/mm_types.h> > > #include <linux/fs.h> /* only for vma_is_dax() */ > @@ -479,6 +480,8 @@ extern unsigned long huge_zero_pfn; > > static inline bool is_huge_zero_folio(const struct folio *folio) > { > + VM_WARN_ON_ONCE(folio == NULL); > + Convention is VM_WARN_ON_ONCE(!folio); > return READ_ONCE(huge_zero_folio) == folio; > } > > -- > 2.47.2 >
> This seems not so great. > > I really think the VM_WARN_ON_ONCE() should be in folios_put_refs() based > on what you've said. No. Everything in folio_put_refs() will dereference the folio and properly crash the machine when doing something stupid. This function, however, will silently swallow a "NULL" pointer. > >> >> Signed-off-by: Max Kellermann <max.kellermann@ionos.com> >> --- >> include/linux/huge_mm.h | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h >> index 7748489fde1b..bc9ca7798f2e 100644 >> --- a/include/linux/huge_mm.h >> +++ b/include/linux/huge_mm.h >> @@ -2,6 +2,7 @@ >> #ifndef _LINUX_HUGE_MM_H >> #define _LINUX_HUGE_MM_H >> >> +#include <linux/mmdebug.h> // for VM_WARN_ON_ONCE() > > Please don't do //. > > This include is suspect though, huge_mm.h is included from mm.h and thus > this very easily might break some arch that is weird about this stuff, > because a ton of stuff includes mm.h including things that might absolutely > baulk at mmdebug. Jup. Very likely this is not required. > > I've had this kind of thing happen several times before. > >> #include <linux/mm_types.h> >> >> #include <linux/fs.h> /* only for vma_is_dax() */ >> @@ -479,6 +480,8 @@ extern unsigned long huge_zero_pfn; >> >> static inline bool is_huge_zero_folio(const struct folio *folio) >> { >> + VM_WARN_ON_ONCE(folio == NULL); >> + > > Convention is VM_WARN_ON_ONCE(!folio); Wanted to comment the same thing. -- Cheers David / dhildenb
On 27.08.25 22:01, David Hildenbrand wrote: > >> This seems not so great. >> >> I really think the VM_WARN_ON_ONCE() should be in folios_put_refs() based >> on what you've said. > > No. > > Everything in folio_put_refs() will dereference the folio and properly > crash the machine when doing something stupid. > > This function, however, will silently swallow a "NULL" pointer. > >> >>> >>> Signed-off-by: Max Kellermann <max.kellermann@ionos.com> >>> --- >>> include/linux/huge_mm.h | 3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h >>> index 7748489fde1b..bc9ca7798f2e 100644 >>> --- a/include/linux/huge_mm.h >>> +++ b/include/linux/huge_mm.h >>> @@ -2,6 +2,7 @@ >>> #ifndef _LINUX_HUGE_MM_H >>> #define _LINUX_HUGE_MM_H >>> >>> +#include <linux/mmdebug.h> // for VM_WARN_ON_ONCE() >> >> Please don't do //. >> >> This include is suspect though, huge_mm.h is included from mm.h and thus >> this very easily might break some arch that is weird about this stuff, >> because a ton of stuff includes mm.h including things that might absolutely >> baulk at mmdebug. > > Jup. Very likely this is not required. Took another look and including mmdebug.h should likely be fine, because all it includes is really (1) include/linux/bug.h (2) include/linux/stringify.h But originally, I wonder if we even need mmdebug.h or if it is already implicitly included. huge_mm.h includes mm_types.h (where we already do use VM_WARN...). I assume there we get it implicitly through percpu.h, mmap_lock.h or percpu.h So the include should just get dropped. -- Cheers David / dhildenb
On Wed, Aug 27, 2025 at 10:26:09PM +0200, David Hildenbrand wrote: > On 27.08.25 22:01, David Hildenbrand wrote: > > > > > This seems not so great. > > > > > > I really think the VM_WARN_ON_ONCE() should be in folios_put_refs() based > > > on what you've said. > > > > No. > > > > Everything in folio_put_refs() will dereference the folio and properly > > crash the machine when doing something stupid. > > > > This function, however, will silently swallow a "NULL" pointer. > > > > > > > > > > > > > Signed-off-by: Max Kellermann <max.kellermann@ionos.com> > > > > --- > > > > include/linux/huge_mm.h | 3 +++ > > > > 1 file changed, 3 insertions(+) > > > > > > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > > > > index 7748489fde1b..bc9ca7798f2e 100644 > > > > --- a/include/linux/huge_mm.h > > > > +++ b/include/linux/huge_mm.h > > > > @@ -2,6 +2,7 @@ > > > > #ifndef _LINUX_HUGE_MM_H > > > > #define _LINUX_HUGE_MM_H > > > > > > > > +#include <linux/mmdebug.h> // for VM_WARN_ON_ONCE() > > > > > > Please don't do //. > > > > > > This include is suspect though, huge_mm.h is included from mm.h and thus > > > this very easily might break some arch that is weird about this stuff, > > > because a ton of stuff includes mm.h including things that might absolutely > > > baulk at mmdebug. > > > > Jup. Very likely this is not required. > > Took another look and including mmdebug.h should likely be fine, because all > it includes is really > > (1) include/linux/bug.h > (2) include/linux/stringify.h > > But originally, I wonder if we even need mmdebug.h or if it is already > implicitly included. > > huge_mm.h includes mm_types.h (where we already do use VM_WARN...). > > I assume there we get it implicitly through percpu.h, mmap_lock.h or > percpu.h > > So the include should just get dropped. Ah nice, let's do this then please! :) Thanks for checking! Cheers, Lorenzo
Calling is_huge_zero_folio(NULL) should not be legal - it makes no
sense, and a different (theoretical) implementation may dereference
the pointer. But currently, lacking any explicit documentation, this
call is possible.
But if somebody really passes NULL, the function should not return
true - this isn't the huge zero folio after all! However, if the
`huge_zero_folio` hasn't been allocated yet, it's NULL, and
is_huge_zero_folio(NULL) just happens to return true, which is a lie.
This weird side effect prevented me from reproducing a kernel crash
that occurred when the elements of a folio_batch were NULL - since
folios_put_refs() skips huge zero folios, this sometimes causes a
crash, but sometimes does not. For debugging, it is better to reveal
such bugs reliably and not hide them behind random preconditions like
"has the huge zero folio already been created?"
To improve detection of such bugs, David Hildenbrand suggested adding
a VM_WARN_ON_ONCE().
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
---
v1->v2: using VM_WARN_ON_ONCE() instead of checking huge_zero_folio
v2->v3: use "!" to check NULL; removed the #include
---
include/linux/huge_mm.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 7748489fde1b..96ac47603d97 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -479,6 +479,8 @@ extern unsigned long huge_zero_pfn;
static inline bool is_huge_zero_folio(const struct folio *folio)
{
+ VM_WARN_ON_ONCE(!folio);
+
return READ_ONCE(huge_zero_folio) == folio;
}
--
2.47.2
On 28 Aug 2025, at 4:48, Max Kellermann wrote: > Calling is_huge_zero_folio(NULL) should not be legal - it makes no > sense, and a different (theoretical) implementation may dereference > the pointer. But currently, lacking any explicit documentation, this > call is possible. > > But if somebody really passes NULL, the function should not return > true - this isn't the huge zero folio after all! However, if the > `huge_zero_folio` hasn't been allocated yet, it's NULL, and > is_huge_zero_folio(NULL) just happens to return true, which is a lie. > > This weird side effect prevented me from reproducing a kernel crash > that occurred when the elements of a folio_batch were NULL - since > folios_put_refs() skips huge zero folios, this sometimes causes a > crash, but sometimes does not. For debugging, it is better to reveal > such bugs reliably and not hide them behind random preconditions like > "has the huge zero folio already been created?" > > To improve detection of such bugs, David Hildenbrand suggested adding > a VM_WARN_ON_ONCE(). > > Signed-off-by: Max Kellermann <max.kellermann@ionos.com> > --- > v1->v2: using VM_WARN_ON_ONCE() instead of checking huge_zero_folio > v2->v3: use "!" to check NULL; removed the #include > --- > include/linux/huge_mm.h | 2 ++ > 1 file changed, 2 insertions(+) > LGTM. Reviewed-by: Zi Yan <ziy@nvidia.com> -- Best Regards, Yan, Zi
Please don't send this in reply to previous versions, just send as a separate mail :) (yes email for dev sucks and there's a bunch of weird nuances so totally understandable!) Hopefully Andrew will pick this up regardless. On Thu, Aug 28, 2025 at 10:48:20AM +0200, Max Kellermann wrote: > Calling is_huge_zero_folio(NULL) should not be legal - it makes no > sense, and a different (theoretical) implementation may dereference > the pointer. But currently, lacking any explicit documentation, this > call is possible. > > But if somebody really passes NULL, the function should not return > true - this isn't the huge zero folio after all! However, if the > `huge_zero_folio` hasn't been allocated yet, it's NULL, and > is_huge_zero_folio(NULL) just happens to return true, which is a lie. > > This weird side effect prevented me from reproducing a kernel crash > that occurred when the elements of a folio_batch were NULL - since > folios_put_refs() skips huge zero folios, this sometimes causes a > crash, but sometimes does not. For debugging, it is better to reveal > such bugs reliably and not hide them behind random preconditions like > "has the huge zero folio already been created?" > > To improve detection of such bugs, David Hildenbrand suggested adding > a VM_WARN_ON_ONCE(). > > Signed-off-by: Max Kellermann <max.kellermann@ionos.com> LGTM, so: Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > --- > v1->v2: using VM_WARN_ON_ONCE() instead of checking huge_zero_folio > v2->v3: use "!" to check NULL; removed the #include Putting the history in is great though thanks! :) > --- > include/linux/huge_mm.h | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > index 7748489fde1b..96ac47603d97 100644 > --- a/include/linux/huge_mm.h > +++ b/include/linux/huge_mm.h > @@ -479,6 +479,8 @@ extern unsigned long huge_zero_pfn; > > static inline bool is_huge_zero_folio(const struct folio *folio) > { > + VM_WARN_ON_ONCE(!folio); > + > return READ_ONCE(huge_zero_folio) == folio; > } > > -- > 2.47.2 >
On Wed, Aug 27, 2025 at 5:21 PM Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote: > > But if somebody really passes NULL, the function should not return > > true - this isn't the huge zero folio after all! However, if the > > `huge_zero_folio` hasn't been allocated yet, it's NULL, and > > is_huge_zero_folio(NULL) just happens to return true, which is a lie. > > Hmm seems like this is a bug under a bug. folio_put_refs() shouldn't be > passed a folio batch of NULL's. Agree! That was exactly my point - I was hunting down a bug that sometimes caused folio_put_refs() to crash, but most of the time not (when no zero huge page was allocated yet). And this randomness is what I'd like to get rid of. > Shouldn't we just put the VM_WARN_ON_ONCE() there? Agree, but that was the 2/2 patch I dropped after David's objection. > But I really don't think passing NULL to is_huge_zero_folio() is a valid > enough situation to justify this? > > You've encountered a case where a bug caused folio_put_refs() to be called > with an invalid parameter, then you're arbitrarily changing > is_huge_zero_folio() so it would deref the folio and splat. Actually, my v1 patch did not do that. Instead, it checked whether the huge zero page was already allocated, in order to make is_huge_zero_folio(NULL) to reliably return false, because NULL is not the huge zero page. Then David disagreed and asked me to add VM_WARN_ON_ONCE() instead. > I really think the VM_WARN_ON_ONCE() should be in folios_put_refs() based > on what you've said. You only disagree with David, but not with me. I'm happy with either way of dealing with this kind of bug/abuse. > > +#include <linux/mmdebug.h> // for VM_WARN_ON_ONCE() > > Please don't do //. In Linux-main, there are currently 432 comments documenting #include lines. This is a pretty common coding style. > This include is suspect though, huge_mm.h is included from mm.h and thus > this very easily might break some arch that is weird about this stuff, > because a ton of stuff includes mm.h including things that might absolutely > baulk at mmdebug. What would you suggest doing instead, to make the VM_WARN_ON_ONCE() macro available? > I've had this kind of thing happen several times before. I know, #includes in Linux are a big mess. A while ago, I tried to help clean it up, but my effort was rejected by the kernel maintainers. Which is a pity.
© 2016 - 2025 Red Hat, Inc.