[PATCH v2] huge_mm.h: disallow is_huge_zero_folio(NULL)

Max Kellermann posted 1 patch 1 month, 1 week ago
There is a newer version of this series
include/linux/huge_mm.h | 3 +++
1 file changed, 3 insertions(+)
[PATCH v2] huge_mm.h: disallow is_huge_zero_folio(NULL)
Posted by Max Kellermann 1 month, 1 week ago
Calling is_huge_zero_folio(NULL) should not be legal - it makes no
sense, and a different (theoretical) implementation may dereference
the pointer.  But currently, lacking any explicit documentation, this
call is possible.

But if somebody really passes NULL, the function should not return
true - this isn't the huge zero folio after all!  However, if the
`huge_zero_folio` hasn't been allocated yet, it's NULL, and
is_huge_zero_folio(NULL) just happens to return true, which is a lie.

This weird side effect prevented me from reproducing a kernel crash
that occurred when the elements of a folio_batch were NULL - since
folios_put_refs() skips huge zero folios, this sometimes causes a
crash, but sometimes does not.  For debugging, it is better to reveal
such bugs reliably and not hide them behind random preconditions like
"has the huge zero folio already been created?"

To improve detection of such bugs, David Hildenbrand suggested adding
a VM_WARN_ON_ONCE().

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
---
 include/linux/huge_mm.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 7748489fde1b..bc9ca7798f2e 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -2,6 +2,7 @@
 #ifndef _LINUX_HUGE_MM_H
 #define _LINUX_HUGE_MM_H
 
+#include <linux/mmdebug.h> // for VM_WARN_ON_ONCE()
 #include <linux/mm_types.h>
 
 #include <linux/fs.h> /* only for vma_is_dax() */
@@ -479,6 +480,8 @@ extern unsigned long huge_zero_pfn;
 
 static inline bool is_huge_zero_folio(const struct folio *folio)
 {
+	VM_WARN_ON_ONCE(folio == NULL);
+
 	return READ_ONCE(huge_zero_folio) == folio;
 }
 
-- 
2.47.2
Re: [PATCH v2] huge_mm.h: disallow is_huge_zero_folio(NULL)
Posted by Andrew Morton 1 month ago
On Wed, 27 Aug 2025 17:03:30 +0200 Max Kellermann <max.kellermann@ionos.com> wrote:

> Calling is_huge_zero_folio(NULL) should not be legal - it makes no
> sense, and a different (theoretical) implementation may dereference
> the pointer.  But currently, lacking any explicit documentation, this
> call is possible.
> 
> But if somebody really passes NULL, the function should not return
> true - this isn't the huge zero folio after all!  However, if the
> `huge_zero_folio` hasn't been allocated yet, it's NULL, and
> is_huge_zero_folio(NULL) just happens to return true, which is a lie.
> 
> This weird side effect prevented me from reproducing a kernel crash
> that occurred when the elements of a folio_batch were NULL - since
> folios_put_refs() skips huge zero folios, this sometimes causes a
> crash, but sometimes does not.  For debugging, it is better to reveal
> such bugs reliably and not hide them behind random preconditions like
> "has the huge zero folio already been created?"
> 
> To improve detection of such bugs, David Hildenbrand suggested adding
> a VM_WARN_ON_ONCE().
>
> ...
>
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -2,6 +2,7 @@
>  #ifndef _LINUX_HUGE_MM_H
>  #define _LINUX_HUGE_MM_H
>  
> +#include <linux/mmdebug.h> // for VM_WARN_ON_ONCE()
>  #include <linux/mm_types.h>
>  
>  #include <linux/fs.h> /* only for vma_is_dax() */
> @@ -479,6 +480,8 @@ extern unsigned long huge_zero_pfn;
>  
>  static inline bool is_huge_zero_folio(const struct folio *folio)
>  {
> +	VM_WARN_ON_ONCE(folio == NULL);
> +
>  	return READ_ONCE(huge_zero_folio) == folio;
>  }

OK, but it remains the case that we have seen code which calls
is_huge_zero_folio() prior to the initialization of huge_zero_folio.

Is this a bug?  I think so.  Should we be checking for recurrences of
this bug?


Also, sigh.  I do dislike seeing VM_WARN_ON_ONCE() in an inline
function - heaven knows how much bloat that adds.  Defconfig
mm/huge_memory.o (which has three calls) grows by 80 bytes so I guess
that's livable with.
Re: [PATCH v2] huge_mm.h: disallow is_huge_zero_folio(NULL)
Posted by David Hildenbrand 1 month ago
On 28.08.25 01:53, Andrew Morton wrote:
> On Wed, 27 Aug 2025 17:03:30 +0200 Max Kellermann <max.kellermann@ionos.com> wrote:
> 
>> Calling is_huge_zero_folio(NULL) should not be legal - it makes no
>> sense, and a different (theoretical) implementation may dereference
>> the pointer.  But currently, lacking any explicit documentation, this
>> call is possible.
>>
>> But if somebody really passes NULL, the function should not return
>> true - this isn't the huge zero folio after all!  However, if the
>> `huge_zero_folio` hasn't been allocated yet, it's NULL, and
>> is_huge_zero_folio(NULL) just happens to return true, which is a lie.
>>
>> This weird side effect prevented me from reproducing a kernel crash
>> that occurred when the elements of a folio_batch were NULL - since
>> folios_put_refs() skips huge zero folios, this sometimes causes a
>> crash, but sometimes does not.  For debugging, it is better to reveal
>> such bugs reliably and not hide them behind random preconditions like
>> "has the huge zero folio already been created?"
>>
>> To improve detection of such bugs, David Hildenbrand suggested adding
>> a VM_WARN_ON_ONCE().
>>
>> ...
>>
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -2,6 +2,7 @@
>>   #ifndef _LINUX_HUGE_MM_H
>>   #define _LINUX_HUGE_MM_H
>>   
>> +#include <linux/mmdebug.h> // for VM_WARN_ON_ONCE()
>>   #include <linux/mm_types.h>
>>   
>>   #include <linux/fs.h> /* only for vma_is_dax() */
>> @@ -479,6 +480,8 @@ extern unsigned long huge_zero_pfn;
>>   
>>   static inline bool is_huge_zero_folio(const struct folio *folio)
>>   {
>> +	VM_WARN_ON_ONCE(folio == NULL);
>> +
>>   	return READ_ONCE(huge_zero_folio) == folio;
>>   }
> 
> OK, but it remains the case that we have seen code which calls
> is_huge_zero_folio() prior to the initialization of huge_zero_folio.
> 
> Is this a bug?  I think so.  Should we be checking for recurrences of
> this bug?

As answered elsewhere, this is perfectly fine as the huge zero folio is 
allocated on demand only (and only once enabled).

> 
> 
> Also, sigh.  I do dislike seeing VM_WARN_ON_ONCE() in an inline
> function - heaven knows how much bloat that adds.  Defconfig
> mm/huge_memory.o (which has three calls) grows by 80 bytes so I guess
> that's livable with.

Common practice to use VM_WARN_ON_ONCE() and friend in inline functions. 
Just look at page-flags.h.

If someone complains about kernel image size with CONFIG_DEBUG_VM, they 
shopuld reevaluate life choices. :)

-- 
Cheers

David / dhildenb
Re: [PATCH v2] huge_mm.h: disallow is_huge_zero_folio(NULL)
Posted by Max Kellermann 1 month ago
On Thu, Aug 28, 2025 at 1:53 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> OK, but it remains the case that we have seen code which calls
> is_huge_zero_folio() prior to the initialization of huge_zero_folio.
>
> Is this a bug?  I think so.  Should we be checking for recurrences of
> this bug?

Why do you believe it's a bug? To me, as a newbie here, such a call
seems perfectly fine; please explain.

> Also, sigh.  I do dislike seeing VM_WARN_ON_ONCE() in an inline
> function - heaven knows how much bloat that adds.  Defconfig
> mm/huge_memory.o (which has three calls) grows by 80 bytes so I guess
> that's livable with.

Oh, how is this possible? defconfig has CONFIG_DEBUG_VM=n and that
means VM_WARN_ON_ONCE() is just BUILD_BUG_ON_INVALID() which should
generate no code at all.
Here on my computer (aarch64 with GCC 14), the size (and the
disassembly) is exactly the same. This confirms what David Hildenbrand
predicted.
Are you seeing some compiler weirdness, maybe?
Re: [PATCH v2] huge_mm.h: disallow is_huge_zero_folio(NULL)
Posted by Lorenzo Stoakes 1 month, 1 week ago
On Wed, Aug 27, 2025 at 05:03:30PM +0200, Max Kellermann wrote:
> Calling is_huge_zero_folio(NULL) should not be legal - it makes no
> sense, and a different (theoretical) implementation may dereference
> the pointer.  But currently, lacking any explicit documentation, this
> call is possible.

This is true of a huge amount of kernel logic though. We can't always cover
off every single possibility when documenting these things, and there's
often an implicit assumption of this not being null.

>
> But if somebody really passes NULL, the function should not return
> true - this isn't the huge zero folio after all!  However, if the
> `huge_zero_folio` hasn't been allocated yet, it's NULL, and
> is_huge_zero_folio(NULL) just happens to return true, which is a lie.

Hmm seems like this is a bug under a bug. folio_put_refs() shouldn't be
passed a folio batch of NULL's.

Shouldn't we just put the VM_WARN_ON_ONCE() there?

>
> This weird side effect prevented me from reproducing a kernel crash
> that occurred when the elements of a folio_batch were NULL - since
> folios_put_refs() skips huge zero folios, this sometimes causes a
> crash, but sometimes does not.  For debugging, it is better to reveal
> such bugs reliably and not hide them behind random preconditions like
> "has the huge zero folio already been created?"
>
> To improve detection of such bugs, David Hildenbrand suggested adding
> a VM_WARN_ON_ONCE().

But I really don't think passing NULL to is_huge_zero_folio() is a valid
enough situation to justify this?

You've encountered a case where a bug caused folio_put_refs() to be called
with an invalid parameter, then you're arbitrarily changing
is_huge_zero_folio() so it would deref the folio and splat.

This seems not so great.

I really think the VM_WARN_ON_ONCE() should be in folios_put_refs() based
on what you've said.

>
> Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
> ---
>  include/linux/huge_mm.h | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 7748489fde1b..bc9ca7798f2e 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -2,6 +2,7 @@
>  #ifndef _LINUX_HUGE_MM_H
>  #define _LINUX_HUGE_MM_H
>
> +#include <linux/mmdebug.h> // for VM_WARN_ON_ONCE()

Please don't do //.

This include is suspect though, huge_mm.h is included from mm.h and thus
this very easily might break some arch that is weird about this stuff,
because a ton of stuff includes mm.h including things that might absolutely
baulk at mmdebug.

I've had this kind of thing happen several times before.

>  #include <linux/mm_types.h>
>
>  #include <linux/fs.h> /* only for vma_is_dax() */
> @@ -479,6 +480,8 @@ extern unsigned long huge_zero_pfn;
>
>  static inline bool is_huge_zero_folio(const struct folio *folio)
>  {
> +	VM_WARN_ON_ONCE(folio == NULL);
> +

Convention is VM_WARN_ON_ONCE(!folio);

>  	return READ_ONCE(huge_zero_folio) == folio;
>  }
>
> --
> 2.47.2
>
Re: [PATCH v2] huge_mm.h: disallow is_huge_zero_folio(NULL)
Posted by David Hildenbrand 1 month ago
> This seems not so great.
> 
> I really think the VM_WARN_ON_ONCE() should be in folios_put_refs() based
> on what you've said.

No.

Everything in folio_put_refs() will dereference the folio and properly 
crash the machine when doing something stupid.

This function, however, will silently swallow a "NULL" pointer.

> 
>>
>> Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
>> ---
>>   include/linux/huge_mm.h | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 7748489fde1b..bc9ca7798f2e 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -2,6 +2,7 @@
>>   #ifndef _LINUX_HUGE_MM_H
>>   #define _LINUX_HUGE_MM_H
>>
>> +#include <linux/mmdebug.h> // for VM_WARN_ON_ONCE()
> 
> Please don't do //.
> 
> This include is suspect though, huge_mm.h is included from mm.h and thus
> this very easily might break some arch that is weird about this stuff,
> because a ton of stuff includes mm.h including things that might absolutely
> baulk at mmdebug.

Jup. Very likely this is not required.

> 
> I've had this kind of thing happen several times before.
> 
>>   #include <linux/mm_types.h>
>>
>>   #include <linux/fs.h> /* only for vma_is_dax() */
>> @@ -479,6 +480,8 @@ extern unsigned long huge_zero_pfn;
>>
>>   static inline bool is_huge_zero_folio(const struct folio *folio)
>>   {
>> +	VM_WARN_ON_ONCE(folio == NULL);
>> +
> 
> Convention is VM_WARN_ON_ONCE(!folio);

Wanted to comment the same thing.

-- 
Cheers

David / dhildenb
Re: [PATCH v2] huge_mm.h: disallow is_huge_zero_folio(NULL)
Posted by David Hildenbrand 1 month ago
On 27.08.25 22:01, David Hildenbrand wrote:
> 
>> This seems not so great.
>>
>> I really think the VM_WARN_ON_ONCE() should be in folios_put_refs() based
>> on what you've said.
> 
> No.
> 
> Everything in folio_put_refs() will dereference the folio and properly
> crash the machine when doing something stupid.
> 
> This function, however, will silently swallow a "NULL" pointer.
> 
>>
>>>
>>> Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
>>> ---
>>>    include/linux/huge_mm.h | 3 +++
>>>    1 file changed, 3 insertions(+)
>>>
>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>> index 7748489fde1b..bc9ca7798f2e 100644
>>> --- a/include/linux/huge_mm.h
>>> +++ b/include/linux/huge_mm.h
>>> @@ -2,6 +2,7 @@
>>>    #ifndef _LINUX_HUGE_MM_H
>>>    #define _LINUX_HUGE_MM_H
>>>
>>> +#include <linux/mmdebug.h> // for VM_WARN_ON_ONCE()
>>
>> Please don't do //.
>>
>> This include is suspect though, huge_mm.h is included from mm.h and thus
>> this very easily might break some arch that is weird about this stuff,
>> because a ton of stuff includes mm.h including things that might absolutely
>> baulk at mmdebug.
> 
> Jup. Very likely this is not required.

Took another look and including mmdebug.h should likely be fine, because 
all it includes is really

(1) include/linux/bug.h
(2) include/linux/stringify.h

But originally, I wonder if we even need mmdebug.h or if it is already 
implicitly included.

huge_mm.h includes mm_types.h (where we already do use VM_WARN...).

I assume there we get it implicitly through percpu.h, mmap_lock.h or 
percpu.h

So the include should just get dropped.

-- 
Cheers

David / dhildenb
Re: [PATCH v2] huge_mm.h: disallow is_huge_zero_folio(NULL)
Posted by Lorenzo Stoakes 1 month ago
On Wed, Aug 27, 2025 at 10:26:09PM +0200, David Hildenbrand wrote:
> On 27.08.25 22:01, David Hildenbrand wrote:
> >
> > > This seems not so great.
> > >
> > > I really think the VM_WARN_ON_ONCE() should be in folios_put_refs() based
> > > on what you've said.
> >
> > No.
> >
> > Everything in folio_put_refs() will dereference the folio and properly
> > crash the machine when doing something stupid.
> >
> > This function, however, will silently swallow a "NULL" pointer.
> >
> > >
> > > >
> > > > Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
> > > > ---
> > > >    include/linux/huge_mm.h | 3 +++
> > > >    1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > > index 7748489fde1b..bc9ca7798f2e 100644
> > > > --- a/include/linux/huge_mm.h
> > > > +++ b/include/linux/huge_mm.h
> > > > @@ -2,6 +2,7 @@
> > > >    #ifndef _LINUX_HUGE_MM_H
> > > >    #define _LINUX_HUGE_MM_H
> > > >
> > > > +#include <linux/mmdebug.h> // for VM_WARN_ON_ONCE()
> > >
> > > Please don't do //.
> > >
> > > This include is suspect though, huge_mm.h is included from mm.h and thus
> > > this very easily might break some arch that is weird about this stuff,
> > > because a ton of stuff includes mm.h including things that might absolutely
> > > baulk at mmdebug.
> >
> > Jup. Very likely this is not required.
>
> Took another look and including mmdebug.h should likely be fine, because all
> it includes is really
>
> (1) include/linux/bug.h
> (2) include/linux/stringify.h
>
> But originally, I wonder if we even need mmdebug.h or if it is already
> implicitly included.
>
> huge_mm.h includes mm_types.h (where we already do use VM_WARN...).
>
> I assume there we get it implicitly through percpu.h, mmap_lock.h or
> percpu.h
>
> So the include should just get dropped.

Ah nice, let's do this then please! :)

Thanks for checking!

Cheers, Lorenzo
[PATCH v3] huge_mm.h: disallow is_huge_zero_folio(NULL)
Posted by Max Kellermann 1 month ago
Calling is_huge_zero_folio(NULL) should not be legal - it makes no
sense, and a different (theoretical) implementation may dereference
the pointer.  But currently, lacking any explicit documentation, this
call is possible.

But if somebody really passes NULL, the function should not return
true - this isn't the huge zero folio after all!  However, if the
`huge_zero_folio` hasn't been allocated yet, it's NULL, and
is_huge_zero_folio(NULL) just happens to return true, which is a lie.

This weird side effect prevented me from reproducing a kernel crash
that occurred when the elements of a folio_batch were NULL - since
folios_put_refs() skips huge zero folios, this sometimes causes a
crash, but sometimes does not.  For debugging, it is better to reveal
such bugs reliably and not hide them behind random preconditions like
"has the huge zero folio already been created?"

To improve detection of such bugs, David Hildenbrand suggested adding
a VM_WARN_ON_ONCE().

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
---
v1->v2: using VM_WARN_ON_ONCE() instead of checking huge_zero_folio
v2->v3: use "!" to check NULL; removed the #include
---
 include/linux/huge_mm.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 7748489fde1b..96ac47603d97 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -479,6 +479,8 @@ extern unsigned long huge_zero_pfn;
 
 static inline bool is_huge_zero_folio(const struct folio *folio)
 {
+	VM_WARN_ON_ONCE(!folio);
+
 	return READ_ONCE(huge_zero_folio) == folio;
 }
 
-- 
2.47.2
Re: [PATCH v3] huge_mm.h: disallow is_huge_zero_folio(NULL)
Posted by Zi Yan 1 month ago
On 28 Aug 2025, at 4:48, Max Kellermann wrote:

> Calling is_huge_zero_folio(NULL) should not be legal - it makes no
> sense, and a different (theoretical) implementation may dereference
> the pointer.  But currently, lacking any explicit documentation, this
> call is possible.
>
> But if somebody really passes NULL, the function should not return
> true - this isn't the huge zero folio after all!  However, if the
> `huge_zero_folio` hasn't been allocated yet, it's NULL, and
> is_huge_zero_folio(NULL) just happens to return true, which is a lie.
>
> This weird side effect prevented me from reproducing a kernel crash
> that occurred when the elements of a folio_batch were NULL - since
> folios_put_refs() skips huge zero folios, this sometimes causes a
> crash, but sometimes does not.  For debugging, it is better to reveal
> such bugs reliably and not hide them behind random preconditions like
> "has the huge zero folio already been created?"
>
> To improve detection of such bugs, David Hildenbrand suggested adding
> a VM_WARN_ON_ONCE().
>
> Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
> ---
> v1->v2: using VM_WARN_ON_ONCE() instead of checking huge_zero_folio
> v2->v3: use "!" to check NULL; removed the #include
> ---
>  include/linux/huge_mm.h | 2 ++
>  1 file changed, 2 insertions(+)
>

LGTM. Reviewed-by: Zi Yan <ziy@nvidia.com>

--
Best Regards,
Yan, Zi
Re: [PATCH v3] huge_mm.h: disallow is_huge_zero_folio(NULL)
Posted by Lorenzo Stoakes 1 month ago
Please don't send this in reply to previous versions, just send as a
separate mail :) (yes email for dev sucks and there's a bunch of weird
nuances so totally understandable!)

Hopefully Andrew will pick this up regardless.

On Thu, Aug 28, 2025 at 10:48:20AM +0200, Max Kellermann wrote:
> Calling is_huge_zero_folio(NULL) should not be legal - it makes no
> sense, and a different (theoretical) implementation may dereference
> the pointer.  But currently, lacking any explicit documentation, this
> call is possible.
>
> But if somebody really passes NULL, the function should not return
> true - this isn't the huge zero folio after all!  However, if the
> `huge_zero_folio` hasn't been allocated yet, it's NULL, and
> is_huge_zero_folio(NULL) just happens to return true, which is a lie.
>
> This weird side effect prevented me from reproducing a kernel crash
> that occurred when the elements of a folio_batch were NULL - since
> folios_put_refs() skips huge zero folios, this sometimes causes a
> crash, but sometimes does not.  For debugging, it is better to reveal
> such bugs reliably and not hide them behind random preconditions like
> "has the huge zero folio already been created?"
>
> To improve detection of such bugs, David Hildenbrand suggested adding
> a VM_WARN_ON_ONCE().
>
> Signed-off-by: Max Kellermann <max.kellermann@ionos.com>

LGTM, so:

Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

> ---
> v1->v2: using VM_WARN_ON_ONCE() instead of checking huge_zero_folio
> v2->v3: use "!" to check NULL; removed the #include

Putting the history in is great though thanks! :)

> ---
>  include/linux/huge_mm.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 7748489fde1b..96ac47603d97 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -479,6 +479,8 @@ extern unsigned long huge_zero_pfn;
>
>  static inline bool is_huge_zero_folio(const struct folio *folio)
>  {
> +	VM_WARN_ON_ONCE(!folio);
> +
>  	return READ_ONCE(huge_zero_folio) == folio;
>  }
>
> --
> 2.47.2
>
Re: [PATCH v2] huge_mm.h: disallow is_huge_zero_folio(NULL)
Posted by Max Kellermann 1 month, 1 week ago
On Wed, Aug 27, 2025 at 5:21 PM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
> > But if somebody really passes NULL, the function should not return
> > true - this isn't the huge zero folio after all!  However, if the
> > `huge_zero_folio` hasn't been allocated yet, it's NULL, and
> > is_huge_zero_folio(NULL) just happens to return true, which is a lie.
>
> Hmm seems like this is a bug under a bug. folio_put_refs() shouldn't be
> passed a folio batch of NULL's.

Agree! That was exactly my point - I was hunting down a bug that
sometimes caused folio_put_refs() to crash, but most of the time not
(when no zero huge page was allocated yet). And this randomness is
what I'd like to get rid of.

> Shouldn't we just put the VM_WARN_ON_ONCE() there?

Agree, but that was the 2/2 patch I dropped after David's objection.

> But I really don't think passing NULL to is_huge_zero_folio() is a valid
> enough situation to justify this?
>
> You've encountered a case where a bug caused folio_put_refs() to be called
> with an invalid parameter, then you're arbitrarily changing
> is_huge_zero_folio() so it would deref the folio and splat.

Actually, my v1 patch did not do that. Instead, it checked whether the
huge zero page was already allocated, in order to make
is_huge_zero_folio(NULL) to reliably return false, because NULL is not
the huge zero page. Then David disagreed and asked me to add
VM_WARN_ON_ONCE() instead.

> I really think the VM_WARN_ON_ONCE() should be in folios_put_refs() based
> on what you've said.

You only disagree with David, but not with me. I'm happy with either
way of dealing with this kind of bug/abuse.

> > +#include <linux/mmdebug.h> // for VM_WARN_ON_ONCE()
>
> Please don't do //.

In Linux-main, there are currently 432 comments documenting #include
lines. This is a pretty common coding style.

> This include is suspect though, huge_mm.h is included from mm.h and thus
> this very easily might break some arch that is weird about this stuff,
> because a ton of stuff includes mm.h including things that might absolutely
> baulk at mmdebug.

What would you suggest doing instead, to make the VM_WARN_ON_ONCE()
macro available?

> I've had this kind of thing happen several times before.

I know, #includes in Linux are a big mess. A while ago, I tried to
help clean it up, but my effort was rejected by the kernel
maintainers. Which is a pity.