folio->mapping == NULL check issue

[RFC PATCH 1/3] mm/huge_memory: prevent NULL pointer dereference in try_folio_split_to_order()

Posted by Zi Yan 2 months, 2 weeks ago

folio_split_supported() used in try_folio_split_to_order() requires
folio->mapping to be non NULL, but current try_folio_split_to_order() does
not check it. Add the check to prevent NULL pointer dereference.

There is no issue in the current code, since try_folio_split_to_order() is
only used in truncate_inode_partial_folio(), where folio->mapping is not
NULL.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 include/linux/huge_mm.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 1d439de1ca2c..0d55354e3a34 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -407,6 +407,13 @@ static inline int split_huge_page_to_order(struct page *page, unsigned int new_o
 static inline int try_folio_split_to_order(struct folio *folio,
 		struct page *page, unsigned int new_order)
 {
+	/*
+	 * Folios that just got truncated cannot get split. Signal to the
+	 * caller that there was a race.
+	 */
+	if (!folio_test_anon(folio) && !folio->mapping)
+		return -EBUSY;
+
 	if (!folio_split_supported(folio, new_order, SPLIT_TYPE_NON_UNIFORM, /* warns= */ false))
 		return split_huge_page_to_order(&folio->page, new_order);
 	return folio_split(folio, new_order, page, NULL);
-- 
2.51.0

Re: [RFC PATCH 1/3] mm/huge_memory: prevent NULL pointer dereference in try_folio_split_to_order()

Posted by David Hildenbrand (Red Hat) 2 months, 2 weeks ago

On 11/20/25 04:59, Zi Yan wrote:
> folio_split_supported() used in try_folio_split_to_order() requires
> folio->mapping to be non NULL, but current try_folio_split_to_order() does
> not check it. Add the check to prevent NULL pointer dereference.
> 
> There is no issue in the current code, since try_folio_split_to_order() is
> only used in truncate_inode_partial_folio(), where folio->mapping is not
> NULL.
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>   include/linux/huge_mm.h | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 1d439de1ca2c..0d55354e3a34 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -407,6 +407,13 @@ static inline int split_huge_page_to_order(struct page *page, unsigned int new_o
>   static inline int try_folio_split_to_order(struct folio *folio,
>   		struct page *page, unsigned int new_order)
>   {
> +	/*
> +	 * Folios that just got truncated cannot get split. Signal to the
> +	 * caller that there was a race.
> +	 */
> +	if (!folio_test_anon(folio) && !folio->mapping)
> +		return -EBUSY;
> +
>   	if (!folio_split_supported(folio, new_order, SPLIT_TYPE_NON_UNIFORM, /* warns= */ false))
>   		return split_huge_page_to_order(&folio->page, new_order);
>   	return folio_split(folio, new_order, page, NULL);

I guess we'll take the one from Wei

https://lkml.kernel.org/r/20251119235302.24773-1-richard.weiyang@gmail.com

right?

-- 
Cheers

David

Re: [RFC PATCH 1/3] mm/huge_memory: prevent NULL pointer dereference in try_folio_split_to_order()

Posted by Zi Yan 2 months, 2 weeks ago

On 20 Nov 2025, at 4:25, David Hildenbrand (Red Hat) wrote:

> On 11/20/25 04:59, Zi Yan wrote:
>> folio_split_supported() used in try_folio_split_to_order() requires
>> folio->mapping to be non NULL, but current try_folio_split_to_order() does
>> not check it. Add the check to prevent NULL pointer dereference.
>>
>> There is no issue in the current code, since try_folio_split_to_order() is
>> only used in truncate_inode_partial_folio(), where folio->mapping is not
>> NULL.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>>   include/linux/huge_mm.h | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 1d439de1ca2c..0d55354e3a34 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -407,6 +407,13 @@ static inline int split_huge_page_to_order(struct page *page, unsigned int new_o
>>   static inline int try_folio_split_to_order(struct folio *folio,
>>   		struct page *page, unsigned int new_order)
>>   {
>> +	/*
>> +	 * Folios that just got truncated cannot get split. Signal to the
>> +	 * caller that there was a race.
>> +	 */
>> +	if (!folio_test_anon(folio) && !folio->mapping)
>> +		return -EBUSY;
>> +
>>   	if (!folio_split_supported(folio, new_order, SPLIT_TYPE_NON_UNIFORM, /* warns= */ false))
>>   		return split_huge_page_to_order(&folio->page, new_order);
>>   	return folio_split(folio, new_order, page, NULL);
>
> I guess we'll take the one from Wei
>
> https://lkml.kernel.org/r/20251119235302.24773-1-richard.weiyang@gmail.com
>
> right?

This is different. Wei’s fix is to __folio_split(), but mine is to
try_folio_split_to_order(). Both call folio_split_supported(), thus
both need the folio->mapping check.

That is also my question in the cover letter on whether we should
move folio->mapping check to folio_split_supported() and return
error code instead of bool. Otherwise, any folio_split_supported()
caller needs to check folio->mapping.

Best Regards,
Yan, Zi

Re: [RFC PATCH 1/3] mm/huge_memory: prevent NULL pointer dereference in try_folio_split_to_order()

Posted by David Hildenbrand (Red Hat) 2 months, 2 weeks ago

On 11/20/25 15:41, Zi Yan wrote:
> On 20 Nov 2025, at 4:25, David Hildenbrand (Red Hat) wrote:
> 
>> On 11/20/25 04:59, Zi Yan wrote:
>>> folio_split_supported() used in try_folio_split_to_order() requires
>>> folio->mapping to be non NULL, but current try_folio_split_to_order() does
>>> not check it. Add the check to prevent NULL pointer dereference.
>>>
>>> There is no issue in the current code, since try_folio_split_to_order() is
>>> only used in truncate_inode_partial_folio(), where folio->mapping is not
>>> NULL.
>>>
>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>> ---
>>>    include/linux/huge_mm.h | 7 +++++++
>>>    1 file changed, 7 insertions(+)
>>>
>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>> index 1d439de1ca2c..0d55354e3a34 100644
>>> --- a/include/linux/huge_mm.h
>>> +++ b/include/linux/huge_mm.h
>>> @@ -407,6 +407,13 @@ static inline int split_huge_page_to_order(struct page *page, unsigned int new_o
>>>    static inline int try_folio_split_to_order(struct folio *folio,
>>>    		struct page *page, unsigned int new_order)
>>>    {
>>> +	/*
>>> +	 * Folios that just got truncated cannot get split. Signal to the
>>> +	 * caller that there was a race.
>>> +	 */
>>> +	if (!folio_test_anon(folio) && !folio->mapping)
>>> +		return -EBUSY;
>>> +
>>>    	if (!folio_split_supported(folio, new_order, SPLIT_TYPE_NON_UNIFORM, /* warns= */ false))
>>>    		return split_huge_page_to_order(&folio->page, new_order);
>>>    	return folio_split(folio, new_order, page, NULL);
>>
>> I guess we'll take the one from Wei
>>
>> https://lkml.kernel.org/r/20251119235302.24773-1-richard.weiyang@gmail.com
>>
>> right?
> 
> This is different. Wei’s fix is to __folio_split(), but mine is to
> try_folio_split_to_order(). Both call folio_split_supported(), thus
> both need the folio->mapping check.

Ah, good that I double-checked :)

> 
> That is also my question in the cover letter on whether we should
> move folio->mapping check to folio_split_supported() and return
> error code instead of bool. Otherwise, any folio_split_supported()
> caller needs to check folio->mapping.

I think the situation with truncation (-that shmem swapcache thing, 
let's ignore that for now) is that the folio cannot be split until fully 
freed. But we don't want to return -EINVAL to the caller, the assumption 
is that the folio will soon get resolved -- folio freed -- and the 
caller will be able to make progress. So it's not really expected to be 
persistent.

-EINVAL rather signals "this cannot possibly work, so fail whatever you 
are trying".

We rather want to indicate "there was some race situation, if you try 
again later it might work or might have resolved itself".

Not sure I like returning an error from folio_split_supported(), as it's 
rather a boolean check (supported vs. not supported).

Likely we could just return "false" for truncated folios in 
folio_split_supported(), but then state that that case must be handled 
upfront.

We could provide another helper to wrap the truncation check, hmmm


BTW, I wonder if the is_huge_zero_folio() check should go into 
folio_split_supported() and just return in -EINVAL. (we shouldn't really 
trigger that). Similarly we could add a hugetlb sanity check.

-- 
Cheers

David

Re: [RFC PATCH 1/3] mm/huge_memory: prevent NULL pointer dereference in try_folio_split_to_order()

Posted by Zi Yan 2 months, 2 weeks ago

On 20 Nov 2025, at 14:56, David Hildenbrand (Red Hat) wrote:

> On 11/20/25 15:41, Zi Yan wrote:
>> On 20 Nov 2025, at 4:25, David Hildenbrand (Red Hat) wrote:
>>
>>> On 11/20/25 04:59, Zi Yan wrote:
>>>> folio_split_supported() used in try_folio_split_to_order() requires
>>>> folio->mapping to be non NULL, but current try_folio_split_to_order() does
>>>> not check it. Add the check to prevent NULL pointer dereference.
>>>>
>>>> There is no issue in the current code, since try_folio_split_to_order() is
>>>> only used in truncate_inode_partial_folio(), where folio->mapping is not
>>>> NULL.
>>>>
>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>> ---
>>>>    include/linux/huge_mm.h | 7 +++++++
>>>>    1 file changed, 7 insertions(+)
>>>>
>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>>> index 1d439de1ca2c..0d55354e3a34 100644
>>>> --- a/include/linux/huge_mm.h
>>>> +++ b/include/linux/huge_mm.h
>>>> @@ -407,6 +407,13 @@ static inline int split_huge_page_to_order(struct page *page, unsigned int new_o
>>>>    static inline int try_folio_split_to_order(struct folio *folio,
>>>>    		struct page *page, unsigned int new_order)
>>>>    {
>>>> +	/*
>>>> +	 * Folios that just got truncated cannot get split. Signal to the
>>>> +	 * caller that there was a race.
>>>> +	 */
>>>> +	if (!folio_test_anon(folio) && !folio->mapping)
>>>> +		return -EBUSY;
>>>> +
>>>>    	if (!folio_split_supported(folio, new_order, SPLIT_TYPE_NON_UNIFORM, /* warns= */ false))
>>>>    		return split_huge_page_to_order(&folio->page, new_order);
>>>>    	return folio_split(folio, new_order, page, NULL);
>>>
>>> I guess we'll take the one from Wei
>>>
>>> https://lkml.kernel.org/r/20251119235302.24773-1-richard.weiyang@gmail.com
>>>
>>> right?
>>
>> This is different. Wei’s fix is to __folio_split(), but mine is to
>> try_folio_split_to_order(). Both call folio_split_supported(), thus
>> both need the folio->mapping check.
>
> Ah, good that I double-checked :)
>
>>
>> That is also my question in the cover letter on whether we should
>> move folio->mapping check to folio_split_supported() and return
>> error code instead of bool. Otherwise, any folio_split_supported()
>> caller needs to check folio->mapping.
>
> I think the situation with truncation (-that shmem swapcache thing, let's ignore that for now) is that the folio cannot be split until fully freed. But we don't want to return -EINVAL to the caller, the assumption is that the folio will soon get resolved -- folio freed -- and the caller will be able to make progress. So it's not really expected to be persistent.
>
> -EINVAL rather signals "this cannot possibly work, so fail whatever you are trying".
>
> We rather want to indicate "there was some race situation, if you try again later it might work or might have resolved itself".
>
> Not sure I like returning an error from folio_split_supported(), as it's rather a boolean check (supported vs. not supported).

Right. My current idea (from the cover letter) is to rename it to
folio_split_can_split (or folio_split_check, so that it does not sound like
a bool return is needed).

>
> Likely we could just return "false" for truncated folios in folio_split_supported(), but then state that that case must be handled upfront.
>
> We could provide another helper to wrap the truncation check, hmmm

Yeah, that sounds complicated too. Putting this truncated folio check outside
of folio_split_supported() looks really error prone and anonying.

>
>
> BTW, I wonder if the is_huge_zero_folio() check should go into folio_split_supported() and just return in -EINVAL. (we shouldn't really trigger that). Similarly we could add a hugetlb sanity check.

Yeah, is_huge_zero_folio() should return -EINVAL not -EBUSY, except
the case the split happens before a process writes 0 to a zero large folio
and gets a new writable large folio, in which we can kinda say it looks like
-EBUSY. But it is still a stretch.

Ack on adding hugetlb sanity check.

OK, just to reiterate my above idea on renaming folio_split_supported().
Are you OK with renaming it to folio_split_check(), so that returning -EBUSY
and -EINVAL looks more reasonable? The benefit is that we no longer need
to worry about we need to always do folio->mapping check before
folio_split_supported(). (In addition, I would rename can_split_folio()
to folio_split_refcount_check() for clarification)


Best Regards,
Yan, Zi

Re: [RFC PATCH 1/3] mm/huge_memory: prevent NULL pointer dereference in try_folio_split_to_order()

Posted by David Hildenbrand (Red Hat) 2 months, 2 weeks ago

>>
>> BTW, I wonder if the is_huge_zero_folio() check should go into folio_split_supported() and just return in -EINVAL. (we shouldn't really trigger that). Similarly we could add a hugetlb sanity check.
> 
> Yeah, is_huge_zero_folio() should return -EINVAL not -EBUSY, except
> the case the split happens before a process writes 0 to a zero large folio
> and gets a new writable large folio, in which we can kinda say it looks like
> -EBUSY. But it is still a stretch.

I see what you mean, but I think this has less to do with actual races. 
SO yeah, -EINVAL is likely the tight thing.

> 
> Ack on adding hugetlb sanity check.
> 
> OK, just to reiterate my above idea on renaming folio_split_supported().
> Are you OK with renaming it to folio_split_check(), so that returning -EBUSY
> and -EINVAL looks more reasonable? The benefit is that we no longer need
> to worry about we need to always do folio->mapping check before
> folio_split_supported(). (In addition, I would rename can_split_folio()
> to folio_split_refcount_check() for clarification)
I guess having some function that tells you "I performed all checks I 
could without taking locks/references (like anon_vma) and starting with 
the real magic" is what you have in mind.

For these we don't have to prefix with "folio_split" if it sounds weird.

folio_check_splittable() ?

Regarding can_split_folio(), I was wondering whether we can just get rid 
of it and use folio_expect_ref_count() instead?

For the two callers that need extra_pins, we could just have something 
simple helper in huge_memory.c like

/* Number of folio references from the pagecache or the swapcache. */
unsigned int folio_cache_references(const struct folio *folio)
{
	if (folio_test_anon(folio) && !folio_test_swapcache(folio))
		return 0;
	return folio_nr_pages(folio);
}


-- 
Cheers

David

Re: [RFC PATCH 1/3] mm/huge_memory: prevent NULL pointer dereference in try_folio_split_to_order()

Posted by Zi Yan 2 months, 2 weeks ago

On 21 Nov 2025, at 12:09, David Hildenbrand (Red Hat) wrote:

>>>
>>> BTW, I wonder if the is_huge_zero_folio() check should go into folio_split_supported() and just return in -EINVAL. (we shouldn't really trigger that). Similarly we could add a hugetlb sanity check.
>>
>> Yeah, is_huge_zero_folio() should return -EINVAL not -EBUSY, except
>> the case the split happens before a process writes 0 to a zero large folio
>> and gets a new writable large folio, in which we can kinda say it looks like
>> -EBUSY. But it is still a stretch.
>
> I see what you mean, but I think this has less to do with actual races. SO yeah, -EINVAL is likely the tight thing.
>
Sure. Will move it and use -EINVAL.

>>
>> Ack on adding hugetlb sanity check.
>>
>> OK, just to reiterate my above idea on renaming folio_split_supported().
>> Are you OK with renaming it to folio_split_check(), so that returning -EBUSY
>> and -EINVAL looks more reasonable? The benefit is that we no longer need
>> to worry about we need to always do folio->mapping check before
>> folio_split_supported(). (In addition, I would rename can_split_folio()
>> to folio_split_refcount_check() for clarification)
> I guess having some function that tells you "I performed all checks I could without taking locks/references (like anon_vma) and starting with the real magic" is what you have in mind.

Yes.

>
> For these we don't have to prefix with "folio_split" if it sounds weird.
>
> folio_check_splittable() ?

Sounds good to me.

>
> Regarding can_split_folio(), I was wondering whether we can just get rid of it and use folio_expect_ref_count() instead?
>
> For the two callers that need extra_pins, we could just have something simple helper in huge_memory.c like
>
> /* Number of folio references from the pagecache or the swapcache. */
> unsigned int folio_cache_references(const struct folio *folio)
> {
> 	if (folio_test_anon(folio) && !folio_test_swapcache(folio))
> 		return 0;
> 	return folio_nr_pages(folio);
> }

OK, I will give this a try in a separate patch in an updated version of this
series.

Thank you for the feedback.

Best Regards,
Yan, Zi

Re: [RFC PATCH 1/3] mm/huge_memory: prevent NULL pointer dereference in try_folio_split_to_order()

Posted by Balbir Singh 2 months, 2 weeks ago

On 11/20/25 14:59, Zi Yan wrote:
> folio_split_supported() used in try_folio_split_to_order() requires
> folio->mapping to be non NULL, but current try_folio_split_to_order() does
> not check it. Add the check to prevent NULL pointer dereference.
> 
> There is no issue in the current code, since try_folio_split_to_order() is
> only used in truncate_inode_partial_folio(), where folio->mapping is not
> NULL.
> 

Just reading through the description does not clarify one thing

What is the race between just truncated and trying to split them - is there a common lock 
that needs to be held? Is it the subsequent call in truncate_inode_partial_folio()
that causes the race?

IOW, if a folio is not anonymous and does not have a mapping, how is it
being passed to this function?

> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>  include/linux/huge_mm.h | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 1d439de1ca2c..0d55354e3a34 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -407,6 +407,13 @@ static inline int split_huge_page_to_order(struct page *page, unsigned int new_o
>  static inline int try_folio_split_to_order(struct folio *folio,
>  		struct page *page, unsigned int new_order)
>  {
> +	/*
> +	 * Folios that just got truncated cannot get split. Signal to the
> +	 * caller that there was a race.
> +	 */
> +	if (!folio_test_anon(folio) && !folio->mapping)
> +		return -EBUSY;
> +
>  	if (!folio_split_supported(folio, new_order, SPLIT_TYPE_NON_UNIFORM, /* warns= */ false))
>  		return split_huge_page_to_order(&folio->page, new_order);
>  	return folio_split(folio, new_order, page, NULL);

Re: [RFC PATCH 1/3] mm/huge_memory: prevent NULL pointer dereference in try_folio_split_to_order()

Posted by Zi Yan 2 months, 2 weeks ago

On 19 Nov 2025, at 23:28, Balbir Singh wrote:

> On 11/20/25 14:59, Zi Yan wrote:
>> folio_split_supported() used in try_folio_split_to_order() requires
>> folio->mapping to be non NULL, but current try_folio_split_to_order() does
>> not check it. Add the check to prevent NULL pointer dereference.
>>
>> There is no issue in the current code, since try_folio_split_to_order() is
>> only used in truncate_inode_partial_folio(), where folio->mapping is not
>> NULL.
>>
>
> Just reading through the description does not clarify one thing
>
> What is the race between just truncated and trying to split them - is there a common lock
> that needs to be held? Is it the subsequent call in truncate_inode_partial_folio()
> that causes the race?
>
> IOW, if a folio is not anonymous and does not have a mapping, how is it
> being passed to this function?

Quote David’s explanation[1] (note: shmem is not anonymous):

vmscan triggers shmem_writeout() after unmapping the folio and after making
sure that there are no unexpected folio references.

shmem_writeout() will do the shmem_delete_from_page_cache() where we set
folio->mapping = NULL.

So anything walking the page tables (like s390x) could never find it.


Such shmem folios really cannot get split right now until we either
reclaimed them (-> freed) or until shmem_swapin_folio() re-obtained them
from the swapcache to re-add them to the swapcache through
shmem_add_to_page_cache().

[1] https://lore.kernel.org/all/14253d62-0a85-4f61-aed6-72da17bcef77@kernel.org/
>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>>  include/linux/huge_mm.h | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 1d439de1ca2c..0d55354e3a34 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -407,6 +407,13 @@ static inline int split_huge_page_to_order(struct page *page, unsigned int new_o
>>  static inline int try_folio_split_to_order(struct folio *folio,
>>  		struct page *page, unsigned int new_order)
>>  {
>> +	/*
>> +	 * Folios that just got truncated cannot get split. Signal to the
>> +	 * caller that there was a race.
>> +	 */
>> +	if (!folio_test_anon(folio) && !folio->mapping)
>> +		return -EBUSY;
>> +
>>  	if (!folio_split_supported(folio, new_order, SPLIT_TYPE_NON_UNIFORM, /* warns= */ false))
>>  		return split_huge_page_to_order(&folio->page, new_order);
>>  	return folio_split(folio, new_order, page, NULL);


Best Regards,
Yan, Zi