collapse_file() requires FSes supporting large folio with at least
PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem with
huge option turned on also sets large folio order on mapping, so the check
also applies to shmem.
While at it, replace VM_BUG_ON with returning failure values.
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
mm/khugepaged.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index d06d84219e1b..45b12ffb1550 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1899,8 +1899,11 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
int nr_none = 0;
bool is_shmem = shmem_file(file);
- VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
- VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
+ /* "huge" shmem sets mapping folio order and passes the check below */
+ if (mapping_max_folio_order(mapping) < PMD_ORDER)
+ return SCAN_FAIL;
+ if (start & (HPAGE_PMD_NR - 1))
+ return SCAN_ADDRESS_RANGE;
result = alloc_charge_folio(&new_folio, mm, cc);
if (result != SCAN_SUCCEED)
--
2.43.0
On 3/27/26 02:42, Zi Yan wrote:
> collapse_file() requires FSes supporting large folio with at least
> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem with
> huge option turned on also sets large folio order on mapping, so the check
> also applies to shmem.
>
> While at it, replace VM_BUG_ON with returning failure values.

Why not VM_WARN_ON_ONCE() ?

These are conditions that must be checked earlier, no?

--
Cheers,

David
On 27 Mar 2026, at 9:37, David Hildenbrand (Arm) wrote:

> On 3/27/26 02:42, Zi Yan wrote:
>> collapse_file() requires FSes supporting large folio with at least
>> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem with
>> huge option turned on also sets large folio order on mapping, so the check
>> also applies to shmem.
>>
>> While at it, replace VM_BUG_ON with returning failure values.
>
> Why not VM_WARN_ON_ONCE() ?
>
> These are conditions that must be checked earlier, no?

start & (HPAGE_PMD_NR - 1) yes. I can convert it to VM_WARN_ON_ONCE().

For mapping_max_folio_order(mapping) < PMD_ORDER, I probably should move it
to collapse_scan_file() to prevent wasting scanning time if the file does
not support large folio. Then, I can turn it into a VM_WARN_ON_ONCE().

Best Regards,
Yan, Zi
On Thu, Mar 26, 2026 at 09:42:47PM -0400, Zi Yan wrote:
> collapse_file() requires FSes supporting large folio with at least
> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem with
> huge option turned on also sets large folio order on mapping, so the check
> also applies to shmem.
>
> While at it, replace VM_BUG_ON with returning failure values.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
> mm/khugepaged.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index d06d84219e1b..45b12ffb1550 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1899,8 +1899,11 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
> int nr_none = 0;
> bool is_shmem = shmem_file(file);
>
> - VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
> - VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
> + /* "huge" shmem sets mapping folio order and passes the check below */
I think this isn't quite clear and could be improved to e.g.:
/*
* Either anon shmem supports huge pages as set by shmem_enabled sysfs,
* or a shmem file system mounted with the "huge" option.
*/
> + if (mapping_max_folio_order(mapping) < PMD_ORDER)
> + return SCAN_FAIL;
As per rest of thread, this looks correct.
> + if (start & (HPAGE_PMD_NR - 1))
> + return SCAN_ADDRESS_RANGE;
Hmm, we're kinda making this - presumably buggy situation - into a valid input
that just fails the scan.
Maybe just make it a VM_WARN_ON_ONCE()? Or if we want to avoid propagating the
bug that'd cause it any further:
if (start & (HPAGE_PMD_NR - 1)) {
VM_WARN_ON_ONCE(true);
return SCAN_ADDRESS_RANGE;
}
Or similar.
>
> result = alloc_charge_folio(&new_folio, mm, cc);
> if (result != SCAN_SUCCEED)
> --
> 2.43.0
>
Cheers, Lorenzo
On 27 Mar 2026, at 8:07, Lorenzo Stoakes (Oracle) wrote:
> On Thu, Mar 26, 2026 at 09:42:47PM -0400, Zi Yan wrote:
>> collapse_file() requires FSes supporting large folio with at least
>> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem with
>> huge option turned on also sets large folio order on mapping, so the check
>> also applies to shmem.
>>
>> While at it, replace VM_BUG_ON with returning failure values.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>
>
>
>> ---
>> mm/khugepaged.c | 7 +++++--
>> 1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index d06d84219e1b..45b12ffb1550 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -1899,8 +1899,11 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>> int nr_none = 0;
>> bool is_shmem = shmem_file(file);
>>
>> - VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
>> - VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
>> + /* "huge" shmem sets mapping folio order and passes the check below */
>
> I think this isn't quite clear and could be improved to e.g.:
>
> /*
> * Either anon shmem supports huge pages as set by shmem_enabled sysfs,
> * or a shmem file system mounted with the "huge" option.
> */
>
>> + if (mapping_max_folio_order(mapping) < PMD_ORDER)
>> + return SCAN_FAIL;
>
> As per rest of thread, this looks correct.
Will respond to that thread.
>
>> + if (start & (HPAGE_PMD_NR - 1))
>> + return SCAN_ADDRESS_RANGE;
>
> Hmm, we're kinda making this - presumably buggy situation - into a valid input
> that just fails the scan.
>
> Maybe just make it a VM_WARN_ON_ONCE()? Or if we want to avoid propagating the
> bug that'd cause it any further:
>
> if (start & (HPAGE_PMD_NR - 1)) {
> VM_WARN_ON_ONCE(true);
> return SCAN_ADDRESS_RANGE;
> }
>
> Or similar.
As I responded to David, will change it to VM_WARN_ON_ONCE().
>
>>
>> result = alloc_charge_folio(&new_folio, mm, cc);
>> if (result != SCAN_SUCCEED)
>> --
>> 2.43.0
>>
>
> Cheers, Lorenzo
Best Regards,
Yan, Zi
On Fri, Mar 27, 2026 at 12:07:22PM +0000, Lorenzo Stoakes (Oracle) wrote:
> > + if (mapping_max_folio_order(mapping) < PMD_ORDER)
> > + return SCAN_FAIL;
>
> As per rest of thread, this looks correct.

Actually, no :)
On 3/27/26 9:42 AM, Zi Yan wrote:
> collapse_file() requires FSes supporting large folio with at least
> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem with
> huge option turned on also sets large folio order on mapping, so the check
> also applies to shmem.
>
> While at it, replace VM_BUG_ON with returning failure values.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
> mm/khugepaged.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index d06d84219e1b..45b12ffb1550 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1899,8 +1899,11 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
> int nr_none = 0;
> bool is_shmem = shmem_file(file);
>
> - VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
> - VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
> + /* "huge" shmem sets mapping folio order and passes the check below */
> + if (mapping_max_folio_order(mapping) < PMD_ORDER)
> + return SCAN_FAIL;
This is not true for anonymous shmem, since its large order allocation
logic is similar to anonymous memory. That means it will not call
mapping_set_large_folios() for anonymous shmem.
So I think the check should be:
if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER)
return SCAN_FAIL;
On Fri, Mar 27, 2026 at 05:44:49PM +0800, Baolin Wang wrote:
>
>
> On 3/27/26 9:42 AM, Zi Yan wrote:
> > collapse_file() requires FSes supporting large folio with at least
> > PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem with
> > huge option turned on also sets large folio order on mapping, so the check
> > also applies to shmem.
> >
> > While at it, replace VM_BUG_ON with returning failure values.
> >
> > Signed-off-by: Zi Yan <ziy@nvidia.com>
> > ---
> > mm/khugepaged.c | 7 +++++--
> > 1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index d06d84219e1b..45b12ffb1550 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -1899,8 +1899,11 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
> > int nr_none = 0;
> > bool is_shmem = shmem_file(file);
> > - VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
> > - VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
> > + /* "huge" shmem sets mapping folio order and passes the check below */
> > + if (mapping_max_folio_order(mapping) < PMD_ORDER)
> > + return SCAN_FAIL;
>
> This is not true for anonymous shmem, since its large order allocation logic
> is similar to anonymous memory. That means it will not call
> mapping_set_large_folios() for anonymous shmem.
>
> So I think the check should be:
>
> if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER)
> return SCAN_FAIL;

Hmm but in shmem_init() we have:

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
	if (has_transparent_hugepage() && shmem_huge > SHMEM_HUGE_DENY)
		SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge;
	else
		shmem_huge = SHMEM_HUGE_NEVER; /* just in case it was patched */

	/*
	 * Default to setting PMD-sized THP to inherit the global setting and
	 * disable all other multi-size THPs.
	 */
	if (!shmem_orders_configured)
		huge_shmem_orders_inherit = BIT(HPAGE_PMD_ORDER);
#endif

And shm_mnt->mnt_sb is the superblock used for anon shmem. Also
shmem_enabled_store() updates that if necessary.

So we're still fine right?

__shmem_file_setup() (used for anon shmem) calls shmem_get_inode() ->
__shmem_get_inode() which has:

	if (sbinfo->huge)
		mapping_set_large_folios(inode->i_mapping);

Shared for both anon shmem and tmpfs-style shmem.

So I think it's fine as-is.

Cheers, Lorenzo
On 3/27/26 8:02 PM, Lorenzo Stoakes (Oracle) wrote:
> On Fri, Mar 27, 2026 at 05:44:49PM +0800, Baolin Wang wrote:
>>
>>
>> On 3/27/26 9:42 AM, Zi Yan wrote:
>>> collapse_file() requires FSes supporting large folio with at least
>>> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem with
>>> huge option turned on also sets large folio order on mapping, so the check
>>> also applies to shmem.
>>>
>>> While at it, replace VM_BUG_ON with returning failure values.
>>>
>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>> ---
>>> mm/khugepaged.c | 7 +++++--
>>> 1 file changed, 5 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>> index d06d84219e1b..45b12ffb1550 100644
>>> --- a/mm/khugepaged.c
>>> +++ b/mm/khugepaged.c
>>> @@ -1899,8 +1899,11 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>>> int nr_none = 0;
>>> bool is_shmem = shmem_file(file);
>>> - VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
>>> - VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
>>> + /* "huge" shmem sets mapping folio order and passes the check below */
>>> + if (mapping_max_folio_order(mapping) < PMD_ORDER)
>>> + return SCAN_FAIL;
>>
>> This is not true for anonymous shmem, since its large order allocation logic
>> is similar to anonymous memory. That means it will not call
>> mapping_set_large_folios() for anonymous shmem.
>>
>> So I think the check should be:
>>
>> if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER)
>> return SCAN_FAIL;
>
> Hmm but in shmem_init() we have:
>
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> if (has_transparent_hugepage() && shmem_huge > SHMEM_HUGE_DENY)
> SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge;
> else
> shmem_huge = SHMEM_HUGE_NEVER; /* just in case it was patched */
>
> /*
> * Default to setting PMD-sized THP to inherit the global setting and
> * disable all other multi-size THPs.
> */
> if (!shmem_orders_configured)
> huge_shmem_orders_inherit = BIT(HPAGE_PMD_ORDER);
> #endif
>
> And shm_mnt->mnt_sb is the superblock used for anon shmem. Also
> shmem_enabled_store() updates that if necessary.
>
> So we're still fine right?
>
> __shmem_file_setup() (used for anon shmem) calls shmem_get_inode() ->
> __shmem_get_inode() which has:
>
> if (sbinfo->huge)
> mapping_set_large_folios(inode->i_mapping);
>
> Shared for both anon shmem and tmpfs-style shmem.
>
> So I think it's fine as-is.
I'm afraid not. Sorry, I should have been clearer.
First, anonymous shmem large order allocation is dynamically controlled
via the global interface
(/sys/kernel/mm/transparent_hugepage/shmem_enabled) and the mTHP
interfaces
(/sys/kernel/mm/transparent_hugepage/hugepages-*kB/shmem_enabled).
This means that during anonymous shmem initialization, these interfaces
might be set to 'never'. so it will not call mapping_set_large_folios()
because sbinfo->huge is 'SHMEM_HUGE_NEVER'.
Even if shmem large order allocation is subsequently enabled via the
interfaces, __shmem_file_setup -> mapping_set_large_folios() is not
called again.
Anonymous shmem behaves similarly to anonymous pages: it is controlled
by the 'shmem_enabled' interfaces and uses shmem_allowable_huge_orders()
to check for allowed large orders, rather than relying on
mapping_max_folio_order().
The mapping_max_folio_order() is intended to control large page
allocation only for tmpfs mounts. Therefore, I find the current code
confusing and think it needs to be fixed:
/* Don't consider 'deny' for emergencies and 'force' for testing */
if (sb != shm_mnt->mnt_sb && sbinfo->huge)
mapping_set_large_folios(inode->i_mapping);
On Fri, Mar 27, 2026 at 09:45:03PM +0800, Baolin Wang wrote:
>
>
> On 3/27/26 8:02 PM, Lorenzo Stoakes (Oracle) wrote:
> > On Fri, Mar 27, 2026 at 05:44:49PM +0800, Baolin Wang wrote:
> > >
> > >
> > > On 3/27/26 9:42 AM, Zi Yan wrote:
> > > > collapse_file() requires FSes supporting large folio with at least
> > > > PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem with
> > > > huge option turned on also sets large folio order on mapping, so the check
> > > > also applies to shmem.
> > > >
> > > > While at it, replace VM_BUG_ON with returning failure values.
> > > >
> > > > Signed-off-by: Zi Yan <ziy@nvidia.com>
> > > > ---
> > > > mm/khugepaged.c | 7 +++++--
> > > > 1 file changed, 5 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > > index d06d84219e1b..45b12ffb1550 100644
> > > > --- a/mm/khugepaged.c
> > > > +++ b/mm/khugepaged.c
> > > > @@ -1899,8 +1899,11 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
> > > > int nr_none = 0;
> > > > bool is_shmem = shmem_file(file);
> > > > - VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
> > > > - VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
> > > > + /* "huge" shmem sets mapping folio order and passes the check below */
> > > > + if (mapping_max_folio_order(mapping) < PMD_ORDER)
> > > > + return SCAN_FAIL;
> > >
> > > This is not true for anonymous shmem, since its large order allocation logic
> > > is similar to anonymous memory. That means it will not call
> > > mapping_set_large_folios() for anonymous shmem.
> > >
> > > So I think the check should be:
> > >
> > > if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER)
> > > return SCAN_FAIL;
> >
> > Hmm but in shmem_init() we have:
> >
> > #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > if (has_transparent_hugepage() && shmem_huge > SHMEM_HUGE_DENY)
> > SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge;
> > else
> > shmem_huge = SHMEM_HUGE_NEVER; /* just in case it was patched */
> >
> > /*
> > * Default to setting PMD-sized THP to inherit the global setting and
> > * disable all other multi-size THPs.
> > */
> > if (!shmem_orders_configured)
> > huge_shmem_orders_inherit = BIT(HPAGE_PMD_ORDER);
> > #endif
> >
> > And shm_mnt->mnt_sb is the superblock used for anon shmem. Also
> > shmem_enabled_store() updates that if necessary.
> >
> > So we're still fine right?
> >
> > __shmem_file_setup() (used for anon shmem) calls shmem_get_inode() ->
> > __shmem_get_inode() which has:
> >
> > if (sbinfo->huge)
> > mapping_set_large_folios(inode->i_mapping);
> >
> > Shared for both anon shmem and tmpfs-style shmem.
> >
> > So I think it's fine as-is.
>
> I'm afraid not. Sorry, I should have been clearer.
>
> First, anonymous shmem large order allocation is dynamically controlled via
> the global interface (/sys/kernel/mm/transparent_hugepage/shmem_enabled) and
> the mTHP interfaces
> (/sys/kernel/mm/transparent_hugepage/hugepages-*kB/shmem_enabled).
>
> This means that during anonymous shmem initialization, these interfaces
> might be set to 'never'. so it will not call mapping_set_large_folios()
> because sbinfo->huge is 'SHMEM_HUGE_NEVER'.
>
> Even if shmem large order allocation is subsequently enabled via the
> interfaces, __shmem_file_setup -> mapping_set_large_folios() is not called
> again.

I see your point, oh this is all a bit of a mess...

It feels like entirely the wrong abstraction anyway, since at best you're
getting a global 'is enabled'.

I guess what happened before was we'd never call into this with ! r/o thp for fs
&& ! is_shmem.

But now we are allowing it, but should STILL be gating on !is_shmem so yeah your
suggestion is correct I think actually.

I do hate:

	if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER)

As a bit of code though. It's horrible.

Let's abstract that...

It'd be nice if we could find a way to clean things up in the lead up to changes
in series like this instead of sticking with the mess, but I guess since it
mostly removes stuff that's ok for now.

>
> Anonymous shmem behaves similarly to anonymous pages: it is controlled by
> the 'shmem_enabled' interfaces and uses shmem_allowable_huge_orders() to
> check for allowed large orders, rather than relying on
> mapping_max_folio_order().
>
> The mapping_max_folio_order() is intended to control large page allocation
> only for tmpfs mounts. Therefore, I find the current code confusing and
> think it needs to be fixed:
>
> /* Don't consider 'deny' for emergencies and 'force' for testing */
> if (sb != shm_mnt->mnt_sb && sbinfo->huge)
> mapping_set_large_folios(inode->i_mapping);

Cheers, Lorenzo
On 3/27/26 10:12 PM, Lorenzo Stoakes (Oracle) wrote:
> On Fri, Mar 27, 2026 at 09:45:03PM +0800, Baolin Wang wrote:
>>
>>
>> On 3/27/26 8:02 PM, Lorenzo Stoakes (Oracle) wrote:
>>> On Fri, Mar 27, 2026 at 05:44:49PM +0800, Baolin Wang wrote:
>>>>
>>>>
>>>> On 3/27/26 9:42 AM, Zi Yan wrote:
>>>>> collapse_file() requires FSes supporting large folio with at least
>>>>> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem with
>>>>> huge option turned on also sets large folio order on mapping, so the check
>>>>> also applies to shmem.
>>>>>
>>>>> While at it, replace VM_BUG_ON with returning failure values.
>>>>>
>>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>>> ---
>>>>> mm/khugepaged.c | 7 +++++--
>>>>> 1 file changed, 5 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>>>> index d06d84219e1b..45b12ffb1550 100644
>>>>> --- a/mm/khugepaged.c
>>>>> +++ b/mm/khugepaged.c
>>>>> @@ -1899,8 +1899,11 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>>>>> int nr_none = 0;
>>>>> bool is_shmem = shmem_file(file);
>>>>> - VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
>>>>> - VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
>>>>> + /* "huge" shmem sets mapping folio order and passes the check below */
>>>>> + if (mapping_max_folio_order(mapping) < PMD_ORDER)
>>>>> + return SCAN_FAIL;
>>>>
>>>> This is not true for anonymous shmem, since its large order allocation logic
>>>> is similar to anonymous memory. That means it will not call
>>>> mapping_set_large_folios() for anonymous shmem.
>>>>
>>>> So I think the check should be:
>>>>
>>>> if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER)
>>>> return SCAN_FAIL;
>>>
>>> Hmm but in shmem_init() we have:
>>>
>>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>> if (has_transparent_hugepage() && shmem_huge > SHMEM_HUGE_DENY)
>>> SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge;
>>> else
>>> shmem_huge = SHMEM_HUGE_NEVER; /* just in case it was patched */
>>>
>>> /*
>>> * Default to setting PMD-sized THP to inherit the global setting and
>>> * disable all other multi-size THPs.
>>> */
>>> if (!shmem_orders_configured)
>>> huge_shmem_orders_inherit = BIT(HPAGE_PMD_ORDER);
>>> #endif
>>>
>>> And shm_mnt->mnt_sb is the superblock used for anon shmem. Also
>>> shmem_enabled_store() updates that if necessary.
>>>
>>> So we're still fine right?
>>>
>>> __shmem_file_setup() (used for anon shmem) calls shmem_get_inode() ->
>>> __shmem_get_inode() which has:
>>>
>>> if (sbinfo->huge)
>>> mapping_set_large_folios(inode->i_mapping);
>>>
>>> Shared for both anon shmem and tmpfs-style shmem.
>>>
>>> So I think it's fine as-is.
>>
>> I'm afraid not. Sorry, I should have been clearer.
>>
>> First, anonymous shmem large order allocation is dynamically controlled via
>> the global interface (/sys/kernel/mm/transparent_hugepage/shmem_enabled) and
>> the mTHP interfaces
>> (/sys/kernel/mm/transparent_hugepage/hugepages-*kB/shmem_enabled).
>>
>> This means that during anonymous shmem initialization, these interfaces
>> might be set to 'never'. so it will not call mapping_set_large_folios()
>> because sbinfo->huge is 'SHMEM_HUGE_NEVER'.
>>
>> Even if shmem large order allocation is subsequently enabled via the
>> interfaces, __shmem_file_setup -> mapping_set_large_folios() is not called
>> again.
>
> I see your point, oh this is all a bit of a mess...
>
> It feels like entirely the wrong abstraction anyway, since at best you're
> getting a global 'is enabled'.
>
> I guess what happened before was we'd never call into this with ! r/o thp for fs
> && ! is_shmem.

Right.

> But now we are allowing it, but should STILL be gating on !is_shmem so yeah your
> suggestion is correct I think actually.
>
> I do hate:
>
> if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER)
>
> As a bit of code though. It's horrible.

Indeed.

> Let's abstract that...
>
> It'd be nice if we could find a way to clean things up in the lead up to changes
> in series like this instead of sticking with the mess, but I guess since it
> mostly removes stuff that's ok for now.

I think this check can be removed from this patch.

During the khugepaged's scan, it will call thp_vma_allowable_order() to
check if the VMA is allowed to collapse into a PMD.

Specifically, within the call chain thp_vma_allowable_order() ->
__thp_vma_allowable_orders(), shmem is checked via
shmem_allowable_huge_orders(), while other FSes are checked via
file_thp_enabled().

For those other filesystems, Patch 5 has already added the following check,
which I think is sufficient to filter out those FSes that do not support
large folios:

	if (mapping_max_folio_order(inode->i_mapping) < PMD_ORDER)
		return false;

>> Anonymous shmem behaves similarly to anonymous pages: it is controlled by
>> the 'shmem_enabled' interfaces and uses shmem_allowable_huge_orders() to
>> check for allowed large orders, rather than relying on
>> mapping_max_folio_order().
>>
>> The mapping_max_folio_order() is intended to control large page allocation
>> only for tmpfs mounts. Therefore, I find the current code confusing and
>> think it needs to be fixed:
>>
>> /* Don't consider 'deny' for emergencies and 'force' for testing */
>> if (sb != shm_mnt->mnt_sb && sbinfo->huge)
>> mapping_set_large_folios(inode->i_mapping);
>
> Cheers, Lorenzo
On Fri, Mar 27, 2026 at 10:26:53PM +0800, Baolin Wang wrote:
>
>
> On 3/27/26 10:12 PM, Lorenzo Stoakes (Oracle) wrote:
> > On Fri, Mar 27, 2026 at 09:45:03PM +0800, Baolin Wang wrote:
> > >
> > >
> > > On 3/27/26 8:02 PM, Lorenzo Stoakes (Oracle) wrote:
> > > > On Fri, Mar 27, 2026 at 05:44:49PM +0800, Baolin Wang wrote:
> > > > >
> > > > >
> > > > > On 3/27/26 9:42 AM, Zi Yan wrote:
> > > > > > collapse_file() requires FSes supporting large folio with at least
> > > > > > PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem with
> > > > > > huge option turned on also sets large folio order on mapping, so the check
> > > > > > also applies to shmem.
> > > > > >
> > > > > > While at it, replace VM_BUG_ON with returning failure values.
> > > > > >
> > > > > > Signed-off-by: Zi Yan <ziy@nvidia.com>
> > > > > > ---
> > > > > > mm/khugepaged.c | 7 +++++--
> > > > > > 1 file changed, 5 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > > > > index d06d84219e1b..45b12ffb1550 100644
> > > > > > --- a/mm/khugepaged.c
> > > > > > +++ b/mm/khugepaged.c
> > > > > > @@ -1899,8 +1899,11 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
> > > > > > int nr_none = 0;
> > > > > > bool is_shmem = shmem_file(file);
> > > > > > - VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
> > > > > > - VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
> > > > > > + /* "huge" shmem sets mapping folio order and passes the check below */
> > > > > > + if (mapping_max_folio_order(mapping) < PMD_ORDER)
> > > > > > + return SCAN_FAIL;
> > > > >
> > > > > This is not true for anonymous shmem, since its large order allocation logic
> > > > > is similar to anonymous memory. That means it will not call
> > > > > mapping_set_large_folios() for anonymous shmem.
> > > > >
> > > > > So I think the check should be:
> > > > >
> > > > > if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER)
> > > > > return SCAN_FAIL;
> > > >
> > > > Hmm but in shmem_init() we have:
> > > >
> > > > #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > > > if (has_transparent_hugepage() && shmem_huge > SHMEM_HUGE_DENY)
> > > > SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge;
> > > > else
> > > > shmem_huge = SHMEM_HUGE_NEVER; /* just in case it was patched */
> > > >
> > > > /*
> > > > * Default to setting PMD-sized THP to inherit the global setting and
> > > > * disable all other multi-size THPs.
> > > > */
> > > > if (!shmem_orders_configured)
> > > > huge_shmem_orders_inherit = BIT(HPAGE_PMD_ORDER);
> > > > #endif
> > > >
> > > > And shm_mnt->mnt_sb is the superblock used for anon shmem. Also
> > > > shmem_enabled_store() updates that if necessary.
> > > >
> > > > So we're still fine right?
> > > >
> > > > __shmem_file_setup() (used for anon shmem) calls shmem_get_inode() ->
> > > > __shmem_get_inode() which has:
> > > >
> > > > if (sbinfo->huge)
> > > > mapping_set_large_folios(inode->i_mapping);
> > > >
> > > > Shared for both anon shmem and tmpfs-style shmem.
> > > >
> > > > So I think it's fine as-is.
> > >
> > > I'm afraid not. Sorry, I should have been clearer.
> > >
> > > First, anonymous shmem large order allocation is dynamically controlled via
> > > the global interface (/sys/kernel/mm/transparent_hugepage/shmem_enabled) and
> > > the mTHP interfaces
> > > (/sys/kernel/mm/transparent_hugepage/hugepages-*kB/shmem_enabled).
> > >
> > > This means that during anonymous shmem initialization, these interfaces
> > > might be set to 'never'. so it will not call mapping_set_large_folios()
> > > because sbinfo->huge is 'SHMEM_HUGE_NEVER'.
> > >
> > > Even if shmem large order allocation is subsequently enabled via the
> > > interfaces, __shmem_file_setup -> mapping_set_large_folios() is not called
> > > again.
> >
> > I see your point, oh this is all a bit of a mess...
> >
> > It feels like entirely the wrong abstraction anyway, since at best you're
> > getting a global 'is enabled'.
> >
> > I guess what happened before was we'd never call into this with ! r/o thp for fs
> > && ! is_shmem.
>
> Right.
>
> > But now we are allowing it, but should STILL be gating on !is_shmem so yeah your
> > suggestion is correct I think actually.
> >
> > I do hate:
> >
> > if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER)
> >
> > As a bit of code though. It's horrible.
>
> Indeed.
>
> > Let's abstract that...
> >
> > It'd be nice if we could find a way to clean things up in the lead up to changes
> > in series like this instead of sticking with the mess, but I guess since it
> > mostly removes stuff that's ok for now.
>
> I think this check can be removed from this patch.
>
> During the khugepaged's scan, it will call thp_vma_allowable_order() to
> check if the VMA is allowed to collapse into a PMD.
>
> Specifically, within the call chain thp_vma_allowable_order() ->
> __thp_vma_allowable_orders(), shmem is checked via
> shmem_allowable_huge_orders(), while other FSes are checked via
> file_thp_enabled().

It sucks not to have an assert. Maybe in that case make it a
VM_WARN_ON_ONCE().

I hate that you're left tracing things back like that...

>
> For those other filesystems, Patch 5 has already added the following check,
> which I think is sufficient to filter out those FSes that do not support
> large folios:
>
> if (mapping_max_folio_order(inode->i_mapping) < PMD_ORDER)
> return false;

2 < 5, we won't tolerate bisection hazards.

>
> > > Anonymous shmem behaves similarly to anonymous pages: it is controlled by
> > > the 'shmem_enabled' interfaces and uses shmem_allowable_huge_orders() to
> > > check for allowed large orders, rather than relying on
> > > mapping_max_folio_order().
> > >
> > > The mapping_max_folio_order() is intended to control large page allocation
> > > only for tmpfs mounts. Therefore, I find the current code confusing and
> > > think it needs to be fixed:
> > >
> > > /* Don't consider 'deny' for emergencies and 'force' for testing */
> > > if (sb != shm_mnt->mnt_sb && sbinfo->huge)
> > > mapping_set_large_folios(inode->i_mapping);
> >
> > Cheers, Lorenzo

Cheers, Lorenzo
On 27 Mar 2026, at 10:31, Lorenzo Stoakes (Oracle) wrote: > On Fri, Mar 27, 2026 at 10:26:53PM +0800, Baolin Wang wrote: >> >> >> On 3/27/26 10:12 PM, Lorenzo Stoakes (Oracle) wrote: >>> On Fri, Mar 27, 2026 at 09:45:03PM +0800, Baolin Wang wrote: >>>> >>>> >>>> On 3/27/26 8:02 PM, Lorenzo Stoakes (Oracle) wrote: >>>>> On Fri, Mar 27, 2026 at 05:44:49PM +0800, Baolin Wang wrote: >>>>>> >>>>>> >>>>>> On 3/27/26 9:42 AM, Zi Yan wrote: >>>>>>> collapse_file() requires FSes supporting large folio with at least >>>>>>> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem with >>>>>>> huge option turned on also sets large folio order on mapping, so the check >>>>>>> also applies to shmem. >>>>>>> >>>>>>> While at it, replace VM_BUG_ON with returning failure values. >>>>>>> >>>>>>> Signed-off-by: Zi Yan <ziy@nvidia.com> >>>>>>> --- >>>>>>> mm/khugepaged.c | 7 +++++-- >>>>>>> 1 file changed, 5 insertions(+), 2 deletions(-) >>>>>>> >>>>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c >>>>>>> index d06d84219e1b..45b12ffb1550 100644 >>>>>>> --- a/mm/khugepaged.c >>>>>>> +++ b/mm/khugepaged.c >>>>>>> @@ -1899,8 +1899,11 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr, >>>>>>> int nr_none = 0; >>>>>>> bool is_shmem = shmem_file(file); >>>>>>> - VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem); >>>>>>> - VM_BUG_ON(start & (HPAGE_PMD_NR - 1)); >>>>>>> + /* "huge" shmem sets mapping folio order and passes the check below */ >>>>>>> + if (mapping_max_folio_order(mapping) < PMD_ORDER) >>>>>>> + return SCAN_FAIL; >>>>>> >>>>>> This is not true for anonymous shmem, since its large order allocation logic >>>>>> is similar to anonymous memory. That means it will not call >>>>>> mapping_set_large_folios() for anonymous shmem. 
>>>>>> >>>>>> So I think the check should be: >>>>>> >>>>>> if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER) >>>>>> return SCAN_FAIL; >>>>> >>>>> Hmm but in shmem_init() we have: >>>>> >>>>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE >>>>> if (has_transparent_hugepage() && shmem_huge > SHMEM_HUGE_DENY) >>>>> SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge; >>>>> else >>>>> shmem_huge = SHMEM_HUGE_NEVER; /* just in case it was patched */ >>>>> >>>>> /* >>>>> * Default to setting PMD-sized THP to inherit the global setting and >>>>> * disable all other multi-size THPs. >>>>> */ >>>>> if (!shmem_orders_configured) >>>>> huge_shmem_orders_inherit = BIT(HPAGE_PMD_ORDER); >>>>> #endif >>>>> >>>>> And shm_mnt->mnt_sb is the superblock used for anon shmem. Also >>>>> shmem_enabled_store() updates that if necessary. >>>>> >>>>> So we're still fine right? >>>>> >>>>> __shmem_file_setup() (used for anon shmem) calls shmem_get_inode() -> >>>>> __shmem_get_inode() which has: >>>>> >>>>> if (sbinfo->huge) >>>>> mapping_set_large_folios(inode->i_mapping); >>>>> >>>>> Shared for both anon shmem and tmpfs-style shmem. >>>>> >>>>> So I think it's fine as-is. >>>> >>>> I'm afraid not. Sorry, I should have been clearer. >>>> >>>> First, anonymous shmem large order allocation is dynamically controlled via >>>> the global interface (/sys/kernel/mm/transparent_hugepage/shmem_enabled) and >>>> the mTHP interfaces >>>> (/sys/kernel/mm/transparent_hugepage/hugepages-*kB/shmem_enabled). >>>> >>>> This means that during anonymous shmem initialization, these interfaces >>>> might be set to 'never'. so it will not call mapping_set_large_folios() >>>> because sbinfo->huge is 'SHMEM_HUGE_NEVER'. >>>> >>>> Even if shmem large order allocation is subsequently enabled via the >>>> interfaces, __shmem_file_setup -> mapping_set_large_folios() is not called >>>> again. >>> >>> I see your point, oh this is all a bit of a mess... 
>>> >>> It feels like entirely the wrong abstraction anyway, since at best you're >>> getting a global 'is enabled'. >>> >>> I guess what happened before was we'd never call into this with ! r/o thp for fs >>> && ! is_shmem. >> >> Right. >> >>> But now we are allowing it, but should STILL be gating on !is_shmem so yeah your >>> suggestion is correct I think actualyl. >>> >>> I do hate: >>> >>> if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER) >>> >>> As a bit of code though. It's horrible. >> >> Indeed. >> >>> Let's abstract that... >>> >>> It'd be nice if we could find a way to clean things up in the lead up to changes >>> in series like this instead of sticking with the mess, but I guess since it >>> mostly removes stuff that's ok for now. >> >> I think this check can be removed from this patch. >> >> During the khugepaged's scan, it will call thp_vma_allowable_order() to >> check if the VMA is allowed to collapse into a PMD. >> >> Specifically, within the call chain thp_vma_allowable_order() -> >> __thp_vma_allowable_orders(), shmem is checked via >> shmem_allowable_huge_orders(), while other FSes are checked via >> file_thp_enabled(). But for the madvise(MADV_COLLAPSE) case, IIRC, it ignores shmem huge config and can perform collapse anyway. This means without !is_shmem the check will break madvise(MADV_COLLAPSE). Let me know if I get it wrong, since I was in that TVA_FORCED_COLLAPSE email thread but do not remember everything there. > > It sucks not to have an assert. Maybe in that case make it a > VM_WARN_ON_ONCE(). Will do that as I replied to David already. > > I hate that you're left tracing things back like that... > >> >> For those other filesystems, Patch 5 has already added the following check, >> which I think is sufficient to filter out those FSes that do not support >> large folios: >> >> if (mapping_max_folio_order(inode->i_mapping) < PMD_ORDER) >> return false; > > 2 < 5, we won't tolerate bisection hazards. 
> >> >> >>>> Anonymous shmem behaves similarly to anonymous pages: it is controlled by >>>> the 'shmem_enabled' interfaces and uses shmem_allowable_huge_orders() to >>>> check for allowed large orders, rather than relying on >>>> mapping_max_folio_order(). >>>> >>>> The mapping_max_folio_order() is intended to control large page allocation >>>> only for tmpfs mounts. Therefore, I find the current code confusing and >>>> think it needs to be fixed: >>>> >>>> /* Don't consider 'deny' for emergencies and 'force' for testing */ >>>> if (sb != shm_mnt->mnt_sb && sbinfo->huge) >>>> mapping_set_large_folios(inode->i_mapping); >>> Hi Baolin, Do you want to send a fix for this? Also I wonder how I can distinguish between anonymous shmem code and tmpfs code. I thought they are the same thing except that they have different user interface, but it seems that I was wrong. Best Regards, Yan, Zi
On Fri, Mar 27, 2026 at 11:00:26AM -0400, Zi Yan wrote: >On 27 Mar 2026, at 10:31, Lorenzo Stoakes (Oracle) wrote: > >> On Fri, Mar 27, 2026 at 10:26:53PM +0800, Baolin Wang wrote: >>> >>> >>> On 3/27/26 10:12 PM, Lorenzo Stoakes (Oracle) wrote: >>>> On Fri, Mar 27, 2026 at 09:45:03PM +0800, Baolin Wang wrote: >>>>> >>>>> >>>>> On 3/27/26 8:02 PM, Lorenzo Stoakes (Oracle) wrote: >>>>>> On Fri, Mar 27, 2026 at 05:44:49PM +0800, Baolin Wang wrote: >>>>>>> >>>>>>> >>>>>>> On 3/27/26 9:42 AM, Zi Yan wrote: >>>>>>>> collapse_file() requires FSes supporting large folio with at least >>>>>>>> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem with >>>>>>>> huge option turned on also sets large folio order on mapping, so the check >>>>>>>> also applies to shmem. >>>>>>>> >>>>>>>> While at it, replace VM_BUG_ON with returning failure values. >>>>>>>> >>>>>>>> Signed-off-by: Zi Yan <ziy@nvidia.com> >>>>>>>> --- >>>>>>>> mm/khugepaged.c | 7 +++++-- >>>>>>>> 1 file changed, 5 insertions(+), 2 deletions(-) >>>>>>>> >>>>>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c >>>>>>>> index d06d84219e1b..45b12ffb1550 100644 >>>>>>>> --- a/mm/khugepaged.c >>>>>>>> +++ b/mm/khugepaged.c >>>>>>>> @@ -1899,8 +1899,11 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr, >>>>>>>> int nr_none = 0; >>>>>>>> bool is_shmem = shmem_file(file); >>>>>>>> - VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem); >>>>>>>> - VM_BUG_ON(start & (HPAGE_PMD_NR - 1)); >>>>>>>> + /* "huge" shmem sets mapping folio order and passes the check below */ >>>>>>>> + if (mapping_max_folio_order(mapping) < PMD_ORDER) >>>>>>>> + return SCAN_FAIL; >>>>>>> >>>>>>> This is not true for anonymous shmem, since its large order allocation logic >>>>>>> is similar to anonymous memory. That means it will not call >>>>>>> mapping_set_large_folios() for anonymous shmem. 
>>>>>>> >>>>>>> So I think the check should be: >>>>>>> >>>>>>> if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER) >>>>>>> return SCAN_FAIL; >>>>>> >>>>>> Hmm but in shmem_init() we have: >>>>>> >>>>>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE >>>>>> if (has_transparent_hugepage() && shmem_huge > SHMEM_HUGE_DENY) >>>>>> SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge; >>>>>> else >>>>>> shmem_huge = SHMEM_HUGE_NEVER; /* just in case it was patched */ >>>>>> >>>>>> /* >>>>>> * Default to setting PMD-sized THP to inherit the global setting and >>>>>> * disable all other multi-size THPs. >>>>>> */ >>>>>> if (!shmem_orders_configured) >>>>>> huge_shmem_orders_inherit = BIT(HPAGE_PMD_ORDER); >>>>>> #endif >>>>>> >>>>>> And shm_mnt->mnt_sb is the superblock used for anon shmem. Also >>>>>> shmem_enabled_store() updates that if necessary. >>>>>> >>>>>> So we're still fine right? >>>>>> >>>>>> __shmem_file_setup() (used for anon shmem) calls shmem_get_inode() -> >>>>>> __shmem_get_inode() which has: >>>>>> >>>>>> if (sbinfo->huge) >>>>>> mapping_set_large_folios(inode->i_mapping); >>>>>> >>>>>> Shared for both anon shmem and tmpfs-style shmem. >>>>>> >>>>>> So I think it's fine as-is. >>>>> >>>>> I'm afraid not. Sorry, I should have been clearer. >>>>> >>>>> First, anonymous shmem large order allocation is dynamically controlled via >>>>> the global interface (/sys/kernel/mm/transparent_hugepage/shmem_enabled) and >>>>> the mTHP interfaces >>>>> (/sys/kernel/mm/transparent_hugepage/hugepages-*kB/shmem_enabled). >>>>> >>>>> This means that during anonymous shmem initialization, these interfaces >>>>> might be set to 'never'. so it will not call mapping_set_large_folios() >>>>> because sbinfo->huge is 'SHMEM_HUGE_NEVER'. >>>>> >>>>> Even if shmem large order allocation is subsequently enabled via the >>>>> interfaces, __shmem_file_setup -> mapping_set_large_folios() is not called >>>>> again. >>>> >>>> I see your point, oh this is all a bit of a mess... 
>>>> >>>> It feels like entirely the wrong abstraction anyway, since at best you're >>>> getting a global 'is enabled'. >>>> >>>> I guess what happened before was we'd never call into this with ! r/o thp for fs >>>> && ! is_shmem. >>> >>> Right. >>> >>>> But now we are allowing it, but should STILL be gating on !is_shmem so yeah your >>>> suggestion is correct I think actualyl. >>>> >>>> I do hate: >>>> >>>> if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER) >>>> >>>> As a bit of code though. It's horrible. >>> >>> Indeed. >>> >>>> Let's abstract that... >>>> >>>> It'd be nice if we could find a way to clean things up in the lead up to changes >>>> in series like this instead of sticking with the mess, but I guess since it >>>> mostly removes stuff that's ok for now. >>> >>> I think this check can be removed from this patch. >>> >>> During the khugepaged's scan, it will call thp_vma_allowable_order() to >>> check if the VMA is allowed to collapse into a PMD. >>> >>> Specifically, within the call chain thp_vma_allowable_order() -> >>> __thp_vma_allowable_orders(), shmem is checked via >>> shmem_allowable_huge_orders(), while other FSes are checked via >>> file_thp_enabled(). > >But for madvise(MADV_COLLAPSE) case, IIRC, it ignores shmem huge config >and can perform collapse anyway. This means without !is_shmem the check >will break madvise(MADV_COLLAPSE). Let me know if I get it wrong, since Right. That will break MADV_COLLAPSE, IIUC. For MADV_COLLAPSE on anonymous shmem, eligibility is determined by the TVA_FORCED_COLLAPSE path via shmem_allowable_huge_orders(), not by whether the inode mapping got mapping_set_large_folios() at creation time. Using mmap(MAP_SHARED | MAP_ANONYMOUS): - create time: shmem_enabled=never, hugepages-2048kB/shmem_enabled=never - collapse time: shmem_enabled=never, hugepages-2048kB/shmem_enabled=always With the !is_shmem guard, collapse succeeds. Without it, the same setup fails with -EINVAL. 
Thanks, Lance >I was in that TVA_FORCED_COLLAPSE email thread but does not remember >everything there. > > >> >> It sucks not to have an assert. Maybe in that case make it a >> VM_WARN_ON_ONCE(). > >Will do that as I replied to David already. > >> >> I hate that you're left tracing things back like that... >> >>> >>> For those other filesystems, Patch 5 has already added the following check, >>> which I think is sufficient to filter out those FSes that do not support >>> large folios: >>> >>> if (mapping_max_folio_order(inode->i_mapping) < PMD_ORDER) >>> return false; >> >> 2 < 5, we won't tolerate bisection hazards. >> >>> >>> >>>>> Anonymous shmem behaves similarly to anonymous pages: it is controlled by >>>>> the 'shmem_enabled' interfaces and uses shmem_allowable_huge_orders() to >>>>> check for allowed large orders, rather than relying on >>>>> mapping_max_folio_order(). >>>>> >>>>> The mapping_max_folio_order() is intended to control large page allocation >>>>> only for tmpfs mounts. Therefore, I find the current code confusing and >>>>> think it needs to be fixed: >>>>> >>>>> /* Don't consider 'deny' for emergencies and 'force' for testing */ >>>>> if (sb != shm_mnt->mnt_sb && sbinfo->huge) >>>>> mapping_set_large_folios(inode->i_mapping); >>>> > >Hi Baolin, > >Do you want to send a fix for this? > >Also I wonder how I can distinguish between anonymous shmem code and tmpfs code. >I thought they are the same thing except that they have different user interface, >but it seems that I was wrong. > > >Best Regards, >Yan, Zi > >
On 27 Mar 2026, at 12:22, Lance Yang wrote: > On Fri, Mar 27, 2026 at 11:00:26AM -0400, Zi Yan wrote: >> On 27 Mar 2026, at 10:31, Lorenzo Stoakes (Oracle) wrote: >> >>> On Fri, Mar 27, 2026 at 10:26:53PM +0800, Baolin Wang wrote: >>>> >>>> >>>> On 3/27/26 10:12 PM, Lorenzo Stoakes (Oracle) wrote: >>>>> On Fri, Mar 27, 2026 at 09:45:03PM +0800, Baolin Wang wrote: >>>>>> >>>>>> >>>>>> On 3/27/26 8:02 PM, Lorenzo Stoakes (Oracle) wrote: >>>>>>> On Fri, Mar 27, 2026 at 05:44:49PM +0800, Baolin Wang wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 3/27/26 9:42 AM, Zi Yan wrote: >>>>>>>>> collapse_file() requires FSes supporting large folio with at least >>>>>>>>> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem with >>>>>>>>> huge option turned on also sets large folio order on mapping, so the check >>>>>>>>> also applies to shmem. >>>>>>>>> >>>>>>>>> While at it, replace VM_BUG_ON with returning failure values. >>>>>>>>> >>>>>>>>> Signed-off-by: Zi Yan <ziy@nvidia.com> >>>>>>>>> --- >>>>>>>>> mm/khugepaged.c | 7 +++++-- >>>>>>>>> 1 file changed, 5 insertions(+), 2 deletions(-) >>>>>>>>> >>>>>>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c >>>>>>>>> index d06d84219e1b..45b12ffb1550 100644 >>>>>>>>> --- a/mm/khugepaged.c >>>>>>>>> +++ b/mm/khugepaged.c >>>>>>>>> @@ -1899,8 +1899,11 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr, >>>>>>>>> int nr_none = 0; >>>>>>>>> bool is_shmem = shmem_file(file); >>>>>>>>> - VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem); >>>>>>>>> - VM_BUG_ON(start & (HPAGE_PMD_NR - 1)); >>>>>>>>> + /* "huge" shmem sets mapping folio order and passes the check below */ >>>>>>>>> + if (mapping_max_folio_order(mapping) < PMD_ORDER) >>>>>>>>> + return SCAN_FAIL; >>>>>>>> >>>>>>>> This is not true for anonymous shmem, since its large order allocation logic >>>>>>>> is similar to anonymous memory. That means it will not call >>>>>>>> mapping_set_large_folios() for anonymous shmem. 
>>>>>>>> >>>>>>>> So I think the check should be: >>>>>>>> >>>>>>>> if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER) >>>>>>>> return SCAN_FAIL; >>>>>>> >>>>>>> Hmm but in shmem_init() we have: >>>>>>> >>>>>>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE >>>>>>> if (has_transparent_hugepage() && shmem_huge > SHMEM_HUGE_DENY) >>>>>>> SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge; >>>>>>> else >>>>>>> shmem_huge = SHMEM_HUGE_NEVER; /* just in case it was patched */ >>>>>>> >>>>>>> /* >>>>>>> * Default to setting PMD-sized THP to inherit the global setting and >>>>>>> * disable all other multi-size THPs. >>>>>>> */ >>>>>>> if (!shmem_orders_configured) >>>>>>> huge_shmem_orders_inherit = BIT(HPAGE_PMD_ORDER); >>>>>>> #endif >>>>>>> >>>>>>> And shm_mnt->mnt_sb is the superblock used for anon shmem. Also >>>>>>> shmem_enabled_store() updates that if necessary. >>>>>>> >>>>>>> So we're still fine right? >>>>>>> >>>>>>> __shmem_file_setup() (used for anon shmem) calls shmem_get_inode() -> >>>>>>> __shmem_get_inode() which has: >>>>>>> >>>>>>> if (sbinfo->huge) >>>>>>> mapping_set_large_folios(inode->i_mapping); >>>>>>> >>>>>>> Shared for both anon shmem and tmpfs-style shmem. >>>>>>> >>>>>>> So I think it's fine as-is. >>>>>> >>>>>> I'm afraid not. Sorry, I should have been clearer. >>>>>> >>>>>> First, anonymous shmem large order allocation is dynamically controlled via >>>>>> the global interface (/sys/kernel/mm/transparent_hugepage/shmem_enabled) and >>>>>> the mTHP interfaces >>>>>> (/sys/kernel/mm/transparent_hugepage/hugepages-*kB/shmem_enabled). >>>>>> >>>>>> This means that during anonymous shmem initialization, these interfaces >>>>>> might be set to 'never'. so it will not call mapping_set_large_folios() >>>>>> because sbinfo->huge is 'SHMEM_HUGE_NEVER'. >>>>>> >>>>>> Even if shmem large order allocation is subsequently enabled via the >>>>>> interfaces, __shmem_file_setup -> mapping_set_large_folios() is not called >>>>>> again. 
>>>>> >>>>> I see your point, oh this is all a bit of a mess... >>>>> >>>>> It feels like entirely the wrong abstraction anyway, since at best you're >>>>> getting a global 'is enabled'. >>>>> >>>>> I guess what happened before was we'd never call into this with ! r/o thp for fs >>>>> && ! is_shmem. >>>> >>>> Right. >>>> >>>>> But now we are allowing it, but should STILL be gating on !is_shmem so yeah your >>>>> suggestion is correct I think actualyl. >>>>> >>>>> I do hate: >>>>> >>>>> if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER) >>>>> >>>>> As a bit of code though. It's horrible. >>>> >>>> Indeed. >>>> >>>>> Let's abstract that... >>>>> >>>>> It'd be nice if we could find a way to clean things up in the lead up to changes >>>>> in series like this instead of sticking with the mess, but I guess since it >>>>> mostly removes stuff that's ok for now. >>>> >>>> I think this check can be removed from this patch. >>>> >>>> During the khugepaged's scan, it will call thp_vma_allowable_order() to >>>> check if the VMA is allowed to collapse into a PMD. >>>> >>>> Specifically, within the call chain thp_vma_allowable_order() -> >>>> __thp_vma_allowable_orders(), shmem is checked via >>>> shmem_allowable_huge_orders(), while other FSes are checked via >>>> file_thp_enabled(). >> >> But for madvise(MADV_COLLAPSE) case, IIRC, it ignores shmem huge config >> and can perform collapse anyway. This means without !is_shmem the check >> will break madvise(MADV_COLLAPSE). Let me know if I get it wrong, since > > Right. That will break MADV_COLLAPSE, IIUC. > > For MADV_COLLAPSE on anonymous shmem, eligibility is determined by the > TVA_FORCED_COLLAPSE path via shmem_allowable_huge_orders(), not by > whether the inode mapping got mapping_set_large_folios() at creation > time. 
> > Using mmap(MAP_SHARED | MAP_ANONYMOUS): > - create time: shmem_enabled=never, hugepages-2048kB/shmem_enabled=never > - collapse time: shmem_enabled=never, hugepages-2048kB/shmem_enabled=always > > With the !is_shmem guard, collapse succeeds. Without it, the same setup > fails with -EINVAL. Thank you for the confirmation. I will fix it. > > Thanks, > Lance > >> I was in that TVA_FORCED_COLLAPSE email thread but does not remember >> everything there. >> >> >>> >>> It sucks not to have an assert. Maybe in that case make it a >>> VM_WARN_ON_ONCE(). >> >> Will do that as I replied to David already. >> >>> >>> I hate that you're left tracing things back like that... >>> >>>> >>>> For those other filesystems, Patch 5 has already added the following check, >>>> which I think is sufficient to filter out those FSes that do not support >>>> large folios: >>>> >>>> if (mapping_max_folio_order(inode->i_mapping) < PMD_ORDER) >>>> return false; >>> >>> 2 < 5, we won't tolerate bisection hazards. >>> >>>> >>>> >>>>>> Anonymous shmem behaves similarly to anonymous pages: it is controlled by >>>>>> the 'shmem_enabled' interfaces and uses shmem_allowable_huge_orders() to >>>>>> check for allowed large orders, rather than relying on >>>>>> mapping_max_folio_order(). >>>>>> >>>>>> The mapping_max_folio_order() is intended to control large page allocation >>>>>> only for tmpfs mounts. Therefore, I find the current code confusing and >>>>>> think it needs to be fixed: >>>>>> >>>>>> /* Don't consider 'deny' for emergencies and 'force' for testing */ >>>>>> if (sb != shm_mnt->mnt_sb && sbinfo->huge) >>>>>> mapping_set_large_folios(inode->i_mapping); >>>>> >> >> Hi Baolin, >> >> Do you want to send a fix for this? >> >> Also I wonder how I can distinguish between anonymous shmem code and tmpfs code. >> I thought they are the same thing except that they have different user interface, >> but it seems that I was wrong. >> >> >> Best Regards, >> Yan, Zi >> >> Best Regards, Yan, Zi
On 3/28/26 12:30 AM, Zi Yan wrote: > On 27 Mar 2026, at 12:22, Lance Yang wrote: > >> On Fri, Mar 27, 2026 at 11:00:26AM -0400, Zi Yan wrote: >>> On 27 Mar 2026, at 10:31, Lorenzo Stoakes (Oracle) wrote: >>> >>>> On Fri, Mar 27, 2026 at 10:26:53PM +0800, Baolin Wang wrote: >>>>> >>>>> >>>>> On 3/27/26 10:12 PM, Lorenzo Stoakes (Oracle) wrote: >>>>>> On Fri, Mar 27, 2026 at 09:45:03PM +0800, Baolin Wang wrote: >>>>>>> >>>>>>> >>>>>>> On 3/27/26 8:02 PM, Lorenzo Stoakes (Oracle) wrote: >>>>>>>> On Fri, Mar 27, 2026 at 05:44:49PM +0800, Baolin Wang wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> On 3/27/26 9:42 AM, Zi Yan wrote: >>>>>>>>>> collapse_file() requires FSes supporting large folio with at least >>>>>>>>>> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem with >>>>>>>>>> huge option turned on also sets large folio order on mapping, so the check >>>>>>>>>> also applies to shmem. >>>>>>>>>> >>>>>>>>>> While at it, replace VM_BUG_ON with returning failure values. >>>>>>>>>> >>>>>>>>>> Signed-off-by: Zi Yan <ziy@nvidia.com> >>>>>>>>>> --- >>>>>>>>>> mm/khugepaged.c | 7 +++++-- >>>>>>>>>> 1 file changed, 5 insertions(+), 2 deletions(-) >>>>>>>>>> >>>>>>>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c >>>>>>>>>> index d06d84219e1b..45b12ffb1550 100644 >>>>>>>>>> --- a/mm/khugepaged.c >>>>>>>>>> +++ b/mm/khugepaged.c >>>>>>>>>> @@ -1899,8 +1899,11 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr, >>>>>>>>>> int nr_none = 0; >>>>>>>>>> bool is_shmem = shmem_file(file); >>>>>>>>>> - VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem); >>>>>>>>>> - VM_BUG_ON(start & (HPAGE_PMD_NR - 1)); >>>>>>>>>> + /* "huge" shmem sets mapping folio order and passes the check below */ >>>>>>>>>> + if (mapping_max_folio_order(mapping) < PMD_ORDER) >>>>>>>>>> + return SCAN_FAIL; >>>>>>>>> >>>>>>>>> This is not true for anonymous shmem, since its large order allocation logic >>>>>>>>> is similar to anonymous memory. 
That means it will not call >>>>>>>>> mapping_set_large_folios() for anonymous shmem. >>>>>>>>> >>>>>>>>> So I think the check should be: >>>>>>>>> >>>>>>>>> if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER) >>>>>>>>> return SCAN_FAIL; >>>>>>>> >>>>>>>> Hmm but in shmem_init() we have: >>>>>>>> >>>>>>>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE >>>>>>>> if (has_transparent_hugepage() && shmem_huge > SHMEM_HUGE_DENY) >>>>>>>> SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge; >>>>>>>> else >>>>>>>> shmem_huge = SHMEM_HUGE_NEVER; /* just in case it was patched */ >>>>>>>> >>>>>>>> /* >>>>>>>> * Default to setting PMD-sized THP to inherit the global setting and >>>>>>>> * disable all other multi-size THPs. >>>>>>>> */ >>>>>>>> if (!shmem_orders_configured) >>>>>>>> huge_shmem_orders_inherit = BIT(HPAGE_PMD_ORDER); >>>>>>>> #endif >>>>>>>> >>>>>>>> And shm_mnt->mnt_sb is the superblock used for anon shmem. Also >>>>>>>> shmem_enabled_store() updates that if necessary. >>>>>>>> >>>>>>>> So we're still fine right? >>>>>>>> >>>>>>>> __shmem_file_setup() (used for anon shmem) calls shmem_get_inode() -> >>>>>>>> __shmem_get_inode() which has: >>>>>>>> >>>>>>>> if (sbinfo->huge) >>>>>>>> mapping_set_large_folios(inode->i_mapping); >>>>>>>> >>>>>>>> Shared for both anon shmem and tmpfs-style shmem. >>>>>>>> >>>>>>>> So I think it's fine as-is. >>>>>>> >>>>>>> I'm afraid not. Sorry, I should have been clearer. >>>>>>> >>>>>>> First, anonymous shmem large order allocation is dynamically controlled via >>>>>>> the global interface (/sys/kernel/mm/transparent_hugepage/shmem_enabled) and >>>>>>> the mTHP interfaces >>>>>>> (/sys/kernel/mm/transparent_hugepage/hugepages-*kB/shmem_enabled). >>>>>>> >>>>>>> This means that during anonymous shmem initialization, these interfaces >>>>>>> might be set to 'never'. so it will not call mapping_set_large_folios() >>>>>>> because sbinfo->huge is 'SHMEM_HUGE_NEVER'. 
>>>>>>> >>>>>>> Even if shmem large order allocation is subsequently enabled via the >>>>>>> interfaces, __shmem_file_setup -> mapping_set_large_folios() is not called >>>>>>> again. >>>>>> >>>>>> I see your point, oh this is all a bit of a mess... >>>>>> >>>>>> It feels like entirely the wrong abstraction anyway, since at best you're >>>>>> getting a global 'is enabled'. >>>>>> >>>>>> I guess what happened before was we'd never call into this with ! r/o thp for fs >>>>>> && ! is_shmem. >>>>> >>>>> Right. >>>>> >>>>>> But now we are allowing it, but should STILL be gating on !is_shmem so yeah your >>>>>> suggestion is correct I think actualyl. >>>>>> >>>>>> I do hate: >>>>>> >>>>>> if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER) >>>>>> >>>>>> As a bit of code though. It's horrible. >>>>> >>>>> Indeed. >>>>> >>>>>> Let's abstract that... >>>>>> >>>>>> It'd be nice if we could find a way to clean things up in the lead up to changes >>>>>> in series like this instead of sticking with the mess, but I guess since it >>>>>> mostly removes stuff that's ok for now. >>>>> >>>>> I think this check can be removed from this patch. >>>>> >>>>> During the khugepaged's scan, it will call thp_vma_allowable_order() to >>>>> check if the VMA is allowed to collapse into a PMD. >>>>> >>>>> Specifically, within the call chain thp_vma_allowable_order() -> >>>>> __thp_vma_allowable_orders(), shmem is checked via >>>>> shmem_allowable_huge_orders(), while other FSes are checked via >>>>> file_thp_enabled(). >>> >>> But for madvise(MADV_COLLAPSE) case, IIRC, it ignores shmem huge config >>> and can perform collapse anyway. This means without !is_shmem the check >>> will break madvise(MADV_COLLAPSE). Let me know if I get it wrong, since >> >> Right. That will break MADV_COLLAPSE, IIUC. 
>> >> For MADV_COLLAPSE on anonymous shmem, eligibility is determined by the >> TVA_FORCED_COLLAPSE path via shmem_allowable_huge_orders(), not by >> whether the inode mapping got mapping_set_large_folios() at creation >> time. >> >> Using mmap(MAP_SHARED | MAP_ANONYMOUS): >> - create time: shmem_enabled=never, hugepages-2048kB/shmem_enabled=never >> - collapse time: shmem_enabled=never, hugepages-2048kB/shmem_enabled=always >> >> With the !is_shmem guard, collapse succeeds. Without it, the same setup >> fails with -EINVAL. Right. So my suggestion is that the check should be: if (!is_shmem && mapping_max_folio_order(mapping) < PMD_ORDER) or just keep a single VM_WARN_ONCE() here, because I hope the thp_vma_allowable_order() will filter out those FSes that do not support large folios. >>> I was in that TVA_FORCED_COLLAPSE email thread but does not remember >>> everything there. >>> >>> >>>> >>>> It sucks not to have an assert. Maybe in that case make it a >>>> VM_WARN_ON_ONCE(). >>> >>> Will do that as I replied to David already. >>> >>>> >>>> I hate that you're left tracing things back like that... >>>> >>>>> >>>>> For those other filesystems, Patch 5 has already added the following check, >>>>> which I think is sufficient to filter out those FSes that do not support >>>>> large folios: >>>>> >>>>> if (mapping_max_folio_order(inode->i_mapping) < PMD_ORDER) >>>>> return false; >>>> >>>> 2 < 5, we won't tolerate bisection hazards. >>>> >>>>> >>>>> >>>>>>> Anonymous shmem behaves similarly to anonymous pages: it is controlled by >>>>>>> the 'shmem_enabled' interfaces and uses shmem_allowable_huge_orders() to >>>>>>> check for allowed large orders, rather than relying on >>>>>>> mapping_max_folio_order(). >>>>>>> >>>>>>> The mapping_max_folio_order() is intended to control large page allocation >>>>>>> only for tmpfs mounts. 
Therefore, I find the current code confusing and >>>>>>> think it needs to be fixed: >>>>>>> >>>>>>> /* Don't consider 'deny' for emergencies and 'force' for testing */ >>>>>>> if (sb != shm_mnt->mnt_sb && sbinfo->huge) >>>>>>> mapping_set_large_folios(inode->i_mapping); >>>>>> >>> >>> Hi Baolin, >>> >>> Do you want to send a fix for this? >>> >>> Also I wonder how I can distinguish between anonymous shmem code and tmpfs code. >>> I thought they are the same thing except that they have different user interface, >>> but it seems that I was wrong. Sure. I can send a patch to make the code clear.
On Thu, Mar 26, 2026 at 09:42:47PM -0400, Zi Yan wrote: >collapse_file() requires FSes supporting large folio with at least >PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem with >huge option turned on also sets large folio order on mapping, so the check >also applies to shmem. > >While at it, replace VM_BUG_ON with returning failure values. > >Signed-off-by: Zi Yan <ziy@nvidia.com> >--- > mm/khugepaged.c | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > >diff --git a/mm/khugepaged.c b/mm/khugepaged.c >index d06d84219e1b..45b12ffb1550 100644 >--- a/mm/khugepaged.c >+++ b/mm/khugepaged.c >@@ -1899,8 +1899,11 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr, > int nr_none = 0; > bool is_shmem = shmem_file(file); > >- VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem); >- VM_BUG_ON(start & (HPAGE_PMD_NR - 1)); >+ /* "huge" shmem sets mapping folio order and passes the check below */ >+ if (mapping_max_folio_order(mapping) < PMD_ORDER) >+ return SCAN_FAIL; Yep, for shmem inodes, if the mount has huge= enabled, inode creation marks the mapping are large-folio capable: /* Don't consider 'deny' for emergencies and 'force' for testing */ if (sbinfo->huge) mapping_set_large_folios(inode->i_mapping); LGTM! Reviewed-by: Lance Yang <lance.yang@linux.dev>
On 2026/3/27 15:29, Lance Yang wrote: > > On Thu, Mar 26, 2026 at 09:42:47PM -0400, Zi Yan wrote: >> collapse_file() requires FSes supporting large folio with at least >> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. shmem with >> huge option turned on also sets large folio order on mapping, so the check >> also applies to shmem. >> >> While at it, replace VM_BUG_ON with returning failure values. >> >> Signed-off-by: Zi Yan <ziy@nvidia.com> >> --- >> mm/khugepaged.c | 7 +++++-- >> 1 file changed, 5 insertions(+), 2 deletions(-) >> >> diff --git a/mm/khugepaged.c b/mm/khugepaged.c >> index d06d84219e1b..45b12ffb1550 100644 >> --- a/mm/khugepaged.c >> +++ b/mm/khugepaged.c >> @@ -1899,8 +1899,11 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr, >> int nr_none = 0; >> bool is_shmem = shmem_file(file); >> >> - VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem); >> - VM_BUG_ON(start & (HPAGE_PMD_NR - 1)); >> + /* "huge" shmem sets mapping folio order and passes the check below */ >> + if (mapping_max_folio_order(mapping) < PMD_ORDER) >> + return SCAN_FAIL; > > Yep, for shmem inodes, if the mount has huge= enabled, inode creation > marks the mapping are large-folio capable: Oops, s/are/as/ > > /* Don't consider 'deny' for emergencies and 'force' for testing */ > if (sbinfo->huge) > mapping_set_large_folios(inode->i_mapping); > > LGTM! > > Reviewed-by: Lance Yang <lance.yang@linux.dev>