[PATCH v1 05/10] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()

Zi Yan posted 10 patches 6 days, 17 hours ago
Posted by Zi Yan 6 days, 17 hours ago
Replace it with a check on the max folio order of the file's address space
mapping, making sure PMD_ORDER is supported.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/huge_memory.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c7873dbdc470..1da1467328a3 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -89,9 +89,6 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
 {
 	struct inode *inode;
 
-	if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
-		return false;
-
 	if (!vma->vm_file)
 		return false;
 
@@ -100,6 +97,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
 	if (IS_ANON_FILE(inode))
 		return false;
 
+	if (mapping_max_folio_order(inode->i_mapping) < PMD_ORDER)
+		return false;
+
 	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
 }
 
-- 
2.43.0
Re: [PATCH v1 05/10] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
Posted by Lorenzo Stoakes (Oracle) 6 days, 6 hours ago
On Thu, Mar 26, 2026 at 09:42:50PM -0400, Zi Yan wrote:
> Replace it with a check on the max folio order of the file's address space
> mapping, making sure PMD_ORDER is supported.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>  mm/huge_memory.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index c7873dbdc470..1da1467328a3 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -89,9 +89,6 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>  {
>  	struct inode *inode;
>
> -	if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
> -		return false;
> -
>  	if (!vma->vm_file)
>  		return false;
>
> @@ -100,6 +97,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>  	if (IS_ANON_FILE(inode))
>  		return false;
>
> +	if (mapping_max_folio_order(inode->i_mapping) < PMD_ORDER)
> +		return false;
> +

At this point I think this should be a separate function quite honestly and
share it with 2/10's use, and then you can put the comment in here re: anon
shmem etc.

Though that won't apply here of course as shmem_allowable_huge_orders() would
have been invoked :)

But no harm in refactoring it anyway, and the repetitive < PMD_ORDER stuff is
unfortunate.

Buuut having said that is this right actually?

Because we have:

		if (((in_pf || smaps)) && vma->vm_ops->huge_fault)
			return orders;

Above it, and now you're enabling huge folio file systems to do non-page fault
THP and that's err... isn't that quite a big change?

So yeah probably no to this patch as is :) we should just drop
file_thp_enabled()?

>  	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
>  }
>
> --
> 2.43.0
>
Re: [PATCH v1 05/10] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
Posted by Zi Yan 6 days, 3 hours ago
On 27 Mar 2026, at 8:42, Lorenzo Stoakes (Oracle) wrote:

> On Thu, Mar 26, 2026 at 09:42:50PM -0400, Zi Yan wrote:
>> Replace it with a check on the max folio order of the file's address space
>> mapping, making sure PMD_ORDER is supported.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>>  mm/huge_memory.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index c7873dbdc470..1da1467328a3 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -89,9 +89,6 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>>  {
>>  	struct inode *inode;
>>
>> -	if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
>> -		return false;
>> -
>>  	if (!vma->vm_file)
>>  		return false;
>>
>> @@ -100,6 +97,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>>  	if (IS_ANON_FILE(inode))
>>  		return false;
>>
>> +	if (mapping_max_folio_order(inode->i_mapping) < PMD_ORDER)
>> +		return false;
>> +
>
> At this point I think this should be a separate function quite honestly and
> share it with 2/10's use, and then you can put the comment in here re: anon
> shmem etc.
>
> Though that won't apply here of course as shmem_allowable_huge_orders() would
> have been invoked :)
>
> But no harm in refactoring it anyway, and the repetitive < PMD_ORDER stuff is
> unfortunate.
>
> Buuut having said that is this right actually?
>
> Because we have:
>
> 		if (((in_pf || smaps)) && vma->vm_ops->huge_fault)
> 			return orders;
>
> Above it, and now you're enabling huge folio file systems to do non-page fault
> THP and that's err... isn't that quite a big change?

That is what READ_ONLY_THP_FOR_FS does, creating THPs after page faults, right?
This patchset changes the condition from all FSes to FSes with large folio
support.

Will add a helper, mapping_support_pmd_folio(), for
mapping_max_folio_order(inode->i_mapping) < PMD_ORDER.

>
> So yeah probably no to this patch as is :) we should just drop
> file_thp_enabled()?



>
>>  	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
>>  }
>>
>> --
>> 2.43.0
>>


Best Regards,
Yan, Zi
Re: [PATCH v1 05/10] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
Posted by Lorenzo Stoakes (Oracle) 6 days, 3 hours ago
On Fri, Mar 27, 2026 at 11:12:46AM -0400, Zi Yan wrote:
> On 27 Mar 2026, at 8:42, Lorenzo Stoakes (Oracle) wrote:
>
> > On Thu, Mar 26, 2026 at 09:42:50PM -0400, Zi Yan wrote:
> >> Replace it with a check on the max folio order of the file's address space
> >> mapping, making sure PMD_ORDER is supported.
> >>
> >> Signed-off-by: Zi Yan <ziy@nvidia.com>
> >> ---
> >>  mm/huge_memory.c | 6 +++---
> >>  1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >> index c7873dbdc470..1da1467328a3 100644
> >> --- a/mm/huge_memory.c
> >> +++ b/mm/huge_memory.c
> >> @@ -89,9 +89,6 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
> >>  {
> >>  	struct inode *inode;
> >>
> >> -	if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
> >> -		return false;
> >> -
> >>  	if (!vma->vm_file)
> >>  		return false;
> >>
> >> @@ -100,6 +97,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
> >>  	if (IS_ANON_FILE(inode))
> >>  		return false;
> >>
> >> +	if (mapping_max_folio_order(inode->i_mapping) < PMD_ORDER)
> >> +		return false;
> >> +
> >
> > At this point I think this should be a separate function quite honestly and
> > share it with 2/10's use, and then you can put the comment in here re: anon
> > shmem etc.
> >
> > Though that won't apply here of course as shmem_allowable_huge_orders() would
> > have been invoked :)
> >
> > But no harm in refactoring it anyway, and the repetitive < PMD_ORDER stuff is
> > unfortunate.
> >
> > Buuut having said that is this right actually?
> >
> > Because we have:
> >
> > 		if (((in_pf || smaps)) && vma->vm_ops->huge_fault)
> > 			return orders;
> >
> > Above it, and now you're enabling huge folio file systems to do non-page fault
> > THP and that's err... isn't that quite a big change?
>
> That is what READ_ONLY_THP_FOR_FS does, creating THPs after page faults, right?
> This patchset changes the condition from all FSes to FSes with large folio
> support.

No, READ_ONLY_THP_FOR_FS operates differently.

It is explicitly _only_ allowed for MADV_COLLAPSE and only if the file is
mounted read-only.

So due to:

		if (((in_pf || smaps)) && vma->vm_ops->huge_fault)
			return orders;

		if (((!in_pf || smaps)) && file_thp_enabled(vma))
			return orders;

                      |    PF     | MADV_COLLAPSE | khugepaged |
		      |-----------|---------------|------------|
large folio fs        |     ✓     |       x       |      x     |
READ_ONLY_THP_FOR_FS  |     x     |       ✓       |      ✓     |

After this change:

                      |    PF     | MADV_COLLAPSE | khugepaged |
		      |-----------|---------------|------------|
large folio fs        |     ✓     |       ✓       |      ?     |

(I hope we're not enabling khugepaged for large folio fs - which shouldn't
be necessary anyway as we try to give them folios on page fault and they
use thp-friendly get_unmapped_area etc. :)

We shouldn't be doing this.

It should remain:

                      |    PF     | MADV_COLLAPSE | khugepaged |
		      |-----------|---------------|------------|
large folio fs        |     ✓     |       x       |      x     |

If we're going to remove it, we should first _just remove it_, not
simultaneously increase the scope of what all the MADV_COLLAPSE code is
doing without any confidence in any of it working properly.

And it makes the whole series misleading - you're actually _enabling_ a
feature not (only) _removing_ one.

So let's focus as David suggested on one thing at a time, incrementally.

And let's please try and sort some of this confusing mess out in the code
if at all possible...

>
> Will add a helper, mapping_support_pmd_folio(), for
> mapping_max_folio_order(inode->i_mapping) < PMD_ORDER.
>
> >
> > So yeah probably no to this patch as is :) we should just drop
> > file_thp_enabled()?
>
>
>
> >
> >>  	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
> >>  }
> >>
> >> --
> >> 2.43.0
> >>
>
>
> Best Regards,
> Yan, Zi

Cheers, Lorenzo
Re: [PATCH v1 05/10] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
Posted by Zi Yan 6 days, 3 hours ago
On 27 Mar 2026, at 11:29, Lorenzo Stoakes (Oracle) wrote:

> On Fri, Mar 27, 2026 at 11:12:46AM -0400, Zi Yan wrote:
>> On 27 Mar 2026, at 8:42, Lorenzo Stoakes (Oracle) wrote:
>>
>>> On Thu, Mar 26, 2026 at 09:42:50PM -0400, Zi Yan wrote:
>>>> Replace it with a check on the max folio order of the file's address space
>>>> mapping, making sure PMD_ORDER is supported.
>>>>
>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>> ---
>>>>  mm/huge_memory.c | 6 +++---
>>>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index c7873dbdc470..1da1467328a3 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -89,9 +89,6 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>>>>  {
>>>>  	struct inode *inode;
>>>>
>>>> -	if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
>>>> -		return false;
>>>> -
>>>>  	if (!vma->vm_file)
>>>>  		return false;
>>>>
>>>> @@ -100,6 +97,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>>>>  	if (IS_ANON_FILE(inode))
>>>>  		return false;
>>>>
>>>> +	if (mapping_max_folio_order(inode->i_mapping) < PMD_ORDER)
>>>> +		return false;
>>>> +
>>>
>>> At this point I think this should be a separate function quite honestly and
>>> share it with 2/10's use, and then you can put the comment in here re: anon
>>> shmem etc.
>>>
>>> Though that won't apply here of course as shmem_allowable_huge_orders() would
>>> have been invoked :)
>>>
>>> But no harm in refactoring it anyway, and the repetitive < PMD_ORDER stuff is
>>> unfortunate.
>>>
>>> Buuut having said that is this right actually?
>>>
>>> Because we have:
>>>
>>> 		if (((in_pf || smaps)) && vma->vm_ops->huge_fault)
>>> 			return orders;
>>>
>>> Above it, and now you're enabling huge folio file systems to do non-page fault
>>> THP and that's err... isn't that quite a big change?
>>
>> That is what READ_ONLY_THP_FOR_FS does, creating THPs after page faults, right?
>> This patchset changes the condition from all FSes to FSes with large folio
>> support.
>
> No, READ_ONLY_THP_FOR_FS operates differently.
>
> It is explicitly _only_ allowed for MADV_COLLAPSE and only if the file is
> mounted read-only.
>
> So due to:
>
> 		if (((in_pf || smaps)) && vma->vm_ops->huge_fault)
> 			return orders;
>
> 		if (((!in_pf || smaps)) && file_thp_enabled(vma))
> 			return orders;
>
>                       |    PF     | MADV_COLLAPSE | khugepaged |
> 		      |-----------|---------------|------------|
> large folio fs        |     ✓     |       x       |      x     |
> READ_ONLY_THP_FOR_FS  |     x     |       ✓       |      ✓     |
>
> After this change:
>
>                       |    PF     | MADV_COLLAPSE | khugepaged |
> 		      |-----------|---------------|------------|
> large folio fs        |     ✓     |       ✓       |      ?     |
>
> (I hope we're not enabling khugepaged for large folio fs - which shouldn't
> be necessary anyway as we try to give them folios on page fault and they
> use thp-friendly get_unmapped_area etc. :)
>
> We shouldn't be doing this.
>
> It should remain:
>
>                       |    PF     | MADV_COLLAPSE | khugepaged |
> 		      |-----------|---------------|------------|
> large folio fs        |     ✓     |       x       |      x     |
>
> If we're going to remove it, we should first _just remove it_, not
> simultaneously increase the scope of what all the MADV_COLLAPSE code is
> doing without any confidence in any of it working properly.
>
> And it makes the whole series misleading - you're actually _enabling_ a
> feature not (only) _removing_ one.

That is what my RFC patch does, but David and willy told me to do this. :)
IIUC, with READ_ONLY_THP_FOR_FS, FSes with large folio support will
get THP via MADV_COLLAPSE or khugepaged. So removing the code like I
did in RFC would cause regressions.

I guess I need to rename the series to avoid confusion. How about?

Remove read-only THP support for FSes without large folio support.

[1] https://lore.kernel.org/all/7382046f-7c58-4a3e-ab34-b2704355b7d5@kernel.org/

>
> So let's focus as David suggested on one thing at a time, incrementally.
>
> And let's please try and sort some of this confusing mess out in the code
> if at all possible...
>
>>
>> Will add a helper, mapping_support_pmd_folio(), for
>> mapping_max_folio_order(inode->i_mapping) < PMD_ORDER.
>>
>>>
>>> So yeah probably no to this patch as is :) we should just drop
>>> file_thp_enabled()?
>>
>>
>>
>>>
>>>>  	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
>>>>  }
>>>>
>>>> --
>>>> 2.43.0
>>>>
>>
>>
>> Best Regards,
>> Yan, Zi
>
> Cheers, Lorenzo


Best Regards,
Yan, Zi
Re: [PATCH v1 05/10] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
Posted by Lorenzo Stoakes (Oracle) 6 days, 2 hours ago
On Fri, Mar 27, 2026 at 11:43:57AM -0400, Zi Yan wrote:
> On 27 Mar 2026, at 11:29, Lorenzo Stoakes (Oracle) wrote:
>
> > On Fri, Mar 27, 2026 at 11:12:46AM -0400, Zi Yan wrote:
> >> On 27 Mar 2026, at 8:42, Lorenzo Stoakes (Oracle) wrote:
> >>
> >>> On Thu, Mar 26, 2026 at 09:42:50PM -0400, Zi Yan wrote:
> >>>> Replace it with a check on the max folio order of the file's address space
> >>>> mapping, making sure PMD_ORDER is supported.
> >>>>
> >>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
> >>>> ---
> >>>>  mm/huge_memory.c | 6 +++---
> >>>>  1 file changed, 3 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >>>> index c7873dbdc470..1da1467328a3 100644
> >>>> --- a/mm/huge_memory.c
> >>>> +++ b/mm/huge_memory.c
> >>>> @@ -89,9 +89,6 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
> >>>>  {
> >>>>  	struct inode *inode;
> >>>>
> >>>> -	if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
> >>>> -		return false;
> >>>> -
> >>>>  	if (!vma->vm_file)
> >>>>  		return false;
> >>>>
> >>>> @@ -100,6 +97,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
> >>>>  	if (IS_ANON_FILE(inode))
> >>>>  		return false;
> >>>>
> >>>> +	if (mapping_max_folio_order(inode->i_mapping) < PMD_ORDER)
> >>>> +		return false;
> >>>> +
> >>>
> >>> At this point I think this should be a separate function quite honestly and
> >>> share it with 2/10's use, and then you can put the comment in here re: anon
> >>> shmem etc.
> >>>
> >>> Though that won't apply here of course as shmem_allowable_huge_orders() would
> >>> have been invoked :)
> >>>
> >>> But no harm in refactoring it anyway, and the repetitive < PMD_ORDER stuff is
> >>> unfortunate.
> >>>
> >>> Buuut having said that is this right actually?
> >>>
> >>> Because we have:
> >>>
> >>> 		if (((in_pf || smaps)) && vma->vm_ops->huge_fault)
> >>> 			return orders;
> >>>
> >>> Above it, and now you're enabling huge folio file systems to do non-page fault
> >>> THP and that's err... isn't that quite a big change?
> >>
> >> That is what READ_ONLY_THP_FOR_FS does, creating THPs after page faults, right?
> >> This patchset changes the condition from all FSes to FSes with large folio
> >> support.
> >
> > No, READ_ONLY_THP_FOR_FS operates differently.
> >
> > It is explicitly _only_ allowed for MADV_COLLAPSE and only if the file is
> > mounted read-only.
> >
> > So due to:
> >
> > 		if (((in_pf || smaps)) && vma->vm_ops->huge_fault)
> > 			return orders;
> >
> > 		if (((!in_pf || smaps)) && file_thp_enabled(vma))
> > 			return orders;
> >
> >                       |    PF     | MADV_COLLAPSE | khugepaged |
> > 		      |-----------|---------------|------------|
> > large folio fs        |     ✓     |       x       |      x     |
> > READ_ONLY_THP_FOR_FS  |     x     |       ✓       |      ✓     |
> >
> > After this change:
> >
> >                       |    PF     | MADV_COLLAPSE | khugepaged |
> > 		      |-----------|---------------|------------|
> > large folio fs        |     ✓     |       ✓       |      ?     |
> >
> > (I hope we're not enabling khugepaged for large folio fs - which shouldn't
> > be necessary anyway as we try to give them folios on page fault and they
> > use thp-friendly get_unmapped_area etc. :)
> >
> > We shouldn't be doing this.
> >
> > It should remain:
> >
> >                       |    PF     | MADV_COLLAPSE | khugepaged |
> > 		      |-----------|---------------|------------|
> > large folio fs        |     ✓     |       x       |      x     |
> >
> > If we're going to remove it, we should first _just remove it_, not
> > simultaneously increase the scope of what all the MADV_COLLAPSE code is
> > doing without any confidence in any of it working properly.
> >
> > And it makes the whole series misleading - you're actually _enabling_ a
> > feature not (only) _removing_ one.
>
> That is what my RFC patch does, but David and willy told me to do this. :)
> IIUC, with READ_ONLY_THP_FOR_FS, FSes with large folio support will
> get THP via MADV_COLLAPSE or khugepaged. So removing the code like I
> did in RFC would cause regressions.

OK I think we're dealing with a union of the two states here.

READ_ONLY_THP_FOR_FS is separate from large folio support, as checked by
file_thp_enabled():

static inline bool file_thp_enabled(struct vm_area_struct *vma)
{
	struct inode *inode;

	if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
		return false;

	if (!vma->vm_file)
		return false;

	inode = file_inode(vma->vm_file);

	if (IS_ANON_FILE(inode))
		return false;

	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
}

So actually:

                       |    PF     | MADV_COLLAPSE | khugepaged |
		       |-----------|---------------|------------|
 large folio fs        |     ✓     |       x       |      x     |
 READ_ONLY_THP_FOR_FS  |     x     |       ✓       |      ✓     |
 both!                 |     ✓     |       ✓       |      ✓     |

(Where it's implied it's a read-only mapping obviously for the latter two
cases.)

Now without READ_ONLY_THP_FOR_FS you're going to:

                       |    PF     | MADV_COLLAPSE | khugepaged |
		       |-----------|---------------|------------|
 large folio fs        |     ✓     |       x       |      x     |
 large folio + r/o     |     ✓     |       ✓       |      ✓     |

And intentionally leaving behind the 'not large folio fs, r/o' case because
those file systems need to implement large folio support.

I guess we'll regress those users but we don't care?

I do think all this needs to be spelled out in the commit message though as it's
subtle.

Turns out this PitA config option is going to kick and scream a bit first before
it goes...

>
> I guess I need to rename the series to avoid confusion. How about?
>
> Remove read-only THP support for FSes without large folio support.

Yup that'd be better :)

Cheers, Lorenzo

>
> [1] https://lore.kernel.org/all/7382046f-7c58-4a3e-ab34-b2704355b7d5@kernel.org/
>
> >
> > So let's focus as David suggested on one thing at a time, incrementally.
> >
> > And let's please try and sort some of this confusing mess out in the code
> > if at all possible...
> >
> >>
> >> Will add a helper, mapping_support_pmd_folio(), for
> >> mapping_max_folio_order(inode->i_mapping) < PMD_ORDER.
> >>
> >>>
> >>> So yeah probably no to this patch as is :) we should just drop
> >>> file_thp_enabled()?
> >>
> >>
> >>
> >>>
> >>>>  	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
> >>>>  }
> >>>>
> >>>> --
> >>>> 2.43.0
> >>>>
> >>
> >>
> >> Best Regards,
> >> Yan, Zi
> >
> > Cheers, Lorenzo
>
>
> Best Regards,
> Yan, Zi
Re: [PATCH v1 05/10] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
Posted by Zi Yan 6 days, 2 hours ago
On 27 Mar 2026, at 12:08, Lorenzo Stoakes (Oracle) wrote:

> On Fri, Mar 27, 2026 at 11:43:57AM -0400, Zi Yan wrote:
>> On 27 Mar 2026, at 11:29, Lorenzo Stoakes (Oracle) wrote:
>>
>>> On Fri, Mar 27, 2026 at 11:12:46AM -0400, Zi Yan wrote:
>>>> On 27 Mar 2026, at 8:42, Lorenzo Stoakes (Oracle) wrote:
>>>>
>>>>> On Thu, Mar 26, 2026 at 09:42:50PM -0400, Zi Yan wrote:
>>>>>> Replace it with a check on the max folio order of the file's address space
>>>>>> mapping, making sure PMD_ORDER is supported.
>>>>>>
>>>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>>>> ---
>>>>>>  mm/huge_memory.c | 6 +++---
>>>>>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>>> index c7873dbdc470..1da1467328a3 100644
>>>>>> --- a/mm/huge_memory.c
>>>>>> +++ b/mm/huge_memory.c
>>>>>> @@ -89,9 +89,6 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>>>>>>  {
>>>>>>  	struct inode *inode;
>>>>>>
>>>>>> -	if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
>>>>>> -		return false;
>>>>>> -
>>>>>>  	if (!vma->vm_file)
>>>>>>  		return false;
>>>>>>
>>>>>> @@ -100,6 +97,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>>>>>>  	if (IS_ANON_FILE(inode))
>>>>>>  		return false;
>>>>>>
>>>>>> +	if (mapping_max_folio_order(inode->i_mapping) < PMD_ORDER)
>>>>>> +		return false;
>>>>>> +
>>>>>
>>>>> At this point I think this should be a separate function quite honestly and
>>>>> share it with 2/10's use, and then you can put the comment in here re: anon
>>>>> shmem etc.
>>>>>
>>>>> Though that won't apply here of course as shmem_allowable_huge_orders() would
>>>>> have been invoked :)
>>>>>
>>>>> But no harm in refactoring it anyway, and the repetitive < PMD_ORDER stuff is
>>>>> unfortunate.
>>>>>
>>>>> Buuut having said that is this right actually?
>>>>>
>>>>> Because we have:
>>>>>
>>>>> 		if (((in_pf || smaps)) && vma->vm_ops->huge_fault)
>>>>> 			return orders;
>>>>>
>>>>> Above it, and now you're enabling huge folio file systems to do non-page fault
>>>>> THP and that's err... isn't that quite a big change?
>>>>
>>>> That is what READ_ONLY_THP_FOR_FS does, creating THPs after page faults, right?
>>>> This patchset changes the condition from all FSes to FSes with large folio
>>>> support.
>>>
>>> No, READ_ONLY_THP_FOR_FS operates differently.
>>>
>>> It is explicitly _only_ allowed for MADV_COLLAPSE and only if the file is
>>> mounted read-only.
>>>
>>> So due to:
>>>
>>> 		if (((in_pf || smaps)) && vma->vm_ops->huge_fault)
>>> 			return orders;
>>>
>>> 		if (((!in_pf || smaps)) && file_thp_enabled(vma))
>>> 			return orders;
>>>
>>>                       |    PF     | MADV_COLLAPSE | khugepaged |
>>> 		      |-----------|---------------|------------|
>>> large folio fs        |     ✓     |       x       |      x     |
>>> READ_ONLY_THP_FOR_FS  |     x     |       ✓       |      ✓     |
>>>
>>> After this change:
>>>
>>>                       |    PF     | MADV_COLLAPSE | khugepaged |
>>> 		      |-----------|---------------|------------|
>>> large folio fs        |     ✓     |       ✓       |      ?     |
>>>
>>> (I hope we're not enabling khugepaged for large folio fs - which shouldn't
>>> be necessary anyway as we try to give them folios on page fault and they
>>> use thp-friendly get_unmapped_area etc. :)
>>>
>>> We shouldn't be doing this.
>>>
>>> It should remain:
>>>
>>>                       |    PF     | MADV_COLLAPSE | khugepaged |
>>> 		      |-----------|---------------|------------|
>>> large folio fs        |     ✓     |       x       |      x     |
>>>
>>> If we're going to remove it, we should first _just remove it_, not
>>> simultaneously increase the scope of what all the MADV_COLLAPSE code is
>>> doing without any confidence in any of it working properly.
>>>
>>> And it makes the whole series misleading - you're actually _enabling_ a
>>> feature not (only) _removing_ one.
>>
>> That is what my RFC patch does, but David and willy told me to do this. :)
>> IIUC, with READ_ONLY_THP_FOR_FS, FSes with large folio support will
>> get THP via MADV_COLLAPSE or khugepaged. So removing the code like I
>> did in RFC would cause regressions.
>
> OK I think we're dealing with a union of the two states here.
>
> READ_ONLY_THP_FOR_FS is separate from large folio support, as checked by
> file_thp_enabled():
>
> static inline bool file_thp_enabled(struct vm_area_struct *vma)
> {
> 	struct inode *inode;
>
> 	if (!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
> 		return false;
>
> 	if (!vma->vm_file)
> 		return false;
>
> 	inode = file_inode(vma->vm_file);
>
> 	if (IS_ANON_FILE(inode))
> 		return false;
>
> 	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
> }
>
> So actually:
>
>                        |    PF     | MADV_COLLAPSE | khugepaged |
> 		       |-----------|---------------|------------|
>  large folio fs        |     ✓     |       x       |      x     |
>  READ_ONLY_THP_FOR_FS  |     x     |       ✓       |      ✓     |
>  both!                 |     ✓     |       ✓       |      ✓     |
>
> (Where it's implied it's a read-only mapping obviously for the latter two
> cases.)
>
> Now without READ_ONLY_THP_FOR_FS you're going to:
>
>                        |    PF     | MADV_COLLAPSE | khugepaged |
> 		       |-----------|---------------|------------|
>  large folio fs        |     ✓     |       x       |      x     |
>  large folio + r/o     |     ✓     |       ✓       |      ✓     |
>
> And intentionally leaving behind the 'not large folio fs, r/o' case because
> those file systems need to implement large folio support.
>
> I guess we'll regress those users but we don't care?

Yes. This also motivates FSes without large folio support to add large folio
support instead of relying on READ_ONLY_THP_FOR_FS hack.

>
> I do think all this needs to be spelled out in the commit message though as it's
> subtle.
>
> Turns out this PitA config option is going to kick and scream a bit first before
> it goes...

Sure. I will shamelessly steal your tables. Thank you for the contribution. ;)

>
>>
>> I guess I need to rename the series to avoid confusion. How about?
>>
>> Remove read-only THP support for FSes without large folio support.
>
> Yup that'd be better :)
>
> Cheers, Lorenzo
>
>>
>> [1] https://lore.kernel.org/all/7382046f-7c58-4a3e-ab34-b2704355b7d5@kernel.org/
>>
>>>
>>> So let's focus as David suggested on one thing at a time, incrementally.
>>>
>>> And let's please try and sort some of this confusing mess out in the code
>>> if at all possible...
>>>
>>>>
>>>> Will add a helper, mapping_support_pmd_folio(), for
>>>> mapping_max_folio_order(inode->i_mapping) < PMD_ORDER.
>>>>
>>>>>
>>>>> So yeah probably no to this patch as is :) we should just drop
>>>>> file_thp_enabled()?
>>>>
>>>>
>>>>
>>>>>
>>>>>>  	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
>>>>>>  }
>>>>>>
>>>>>> --
>>>>>> 2.43.0
>>>>>>
>>>>
>>>>
>>>> Best Regards,
>>>> Yan, Zi
>>>
>>> Cheers, Lorenzo
>>
>>
>> Best Regards,
>> Yan, Zi


Best Regards,
Yan, Zi
Re: [PATCH v1 05/10] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
Posted by WANG Rui 4 days, 14 hours ago
Hi Zi,

>> Now without READ_ONLY_THP_FOR_FS you're going to:
>>
>>                        |    PF     | MADV_COLLAPSE | khugepaged |
>> 		       |-----------|---------------|------------|
>>  large folio fs        |     ✓     |       x       |      x     |
>>  large folio + r/o     |     ✓     |       ✓       |      ✓     |
>>
>> And intentionally leaving behind the 'not large folio fs, r/o' case because
>> those file systems need to implement large folio support.
>>
>> I guess we'll regress those users but we don't care?
>
> Yes. This also motivates FSes without large folio support to add large folio
> support instead of relying on READ_ONLY_THP_FOR_FS hack.

Interesting, thanks for making this feature unconditional.

From my experiments, this is going to be a performance regression.

Before this patch, even when the filesystem (e.g. btrfs without experimental)
didn't support large folios, READ_ONLY_THP_FOR_FS still allowed read-only
file-backed code segments to be collapsed into huge page mappings via khugepaged.

After this patch, FilePmdMapped will always be 0 unless the filesystem supports
large folios up to PMD order, and it doesn't look like that support will arrive
anytime soon [1].

Is there a reason we can't keep this hack while continuing to push filesystems
toward proper large folio support?

I'm currently working on making the ELF loader more THP-friendly by adjusting
the virtual address alignment of read-only code segments [2]. The data shows a
noticeable drop in iTLB misses, especially for programs whose text size is just
slightly larger than PMD_SIZE. That size profile is actually quite common for
real-world binaries when using 2M huge pages. This optimization relies on
READ_ONLY_THP_FOR_FS. If the availability of huge page mappings for code segments
ends up depending on filesystem support, it will be much harder to take advantage
of this in practice. [3]

[1] https://lore.kernel.org/linux-fsdevel/ab2IIwKzmK9qwIlZ@casper.infradead.org/
[2] https://lore.kernel.org/linux-fsdevel/20260313005211.882831-1-r@hev.cc/
[3] https://lore.kernel.org/linux-fsdevel/20260320160519.80962-1-r@hev.cc/

Thanks,
Rui
Re: [PATCH v1 05/10] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
Posted by Lorenzo Stoakes (Oracle) 3 days, 7 hours ago
On Sun, Mar 29, 2026 at 12:07:41PM +0800, WANG Rui wrote:
> Hi Zi,
>
> >> Now without READ_ONLY_THP_FOR_FS you're going to:
> >>
> >>                        |    PF     | MADV_COLLAPSE | khugepaged |
> >> 		       |-----------|---------------|------------|
> >>  large folio fs        |     ✓     |       x       |      x     |
> >>  large folio + r/o     |     ✓     |       ✓       |      ✓     |
> >>
> >> And intentionally leaving behind the 'not large folio fs, r/o' case because
> >> those file systems need to implement large folio support.
> >>
> >> I guess we'll regress those users but we don't care?
> >
> > Yes. This also motivates FSes without large folio support to add large folio
> > support instead of relying on READ_ONLY_THP_FOR_FS hack.
>
> Interesting, thanks for making this feature unconditional.
>
> From my experiments, this is going to be a performance regression.
>
> Before this patch, even when the filesystem (e.g. btrfs without experimental)
> didn't support large folios, READ_ONLY_THP_FOR_FS still allowed read-only
> file-backed code segments to be collapsed into huge page mappings via khugepaged.
>
> After this patch, FilePmdMapped will always be 0 unless the filesystem supports
> large folios up to PMD order, and it doesn't look like that support will arrive
> anytime soon [1].

I think Matthew was being a little sarcastic there ;) but I suppose it's
hinting at the fact they need to get a move on.

>
> Is there a reason we can't keep this hack while continuing to push filesystems
> toward proper large folio support?

IMO - It's time for us to stop allowing filesystems to fail to implement what
mm requires of them, while still providing a hack to improve performance.

Really this hack shouldn't have been there in the first place, but it was a
'putting on notice' that filesystems need to support large folios, which
has been made amply clear to them for some time.

So yes there will be regressions for filesystems which _still_ do not
implement this, I'd suggest you focus on trying to convince them to do so
(or send patches :)

>
> I'm currently working on making the ELF loader more THP-friendly by adjusting
> the virtual address alignment of read-only code segments [2]. The data shows a
> noticeable drop in iTLB misses, especially for programs whose text size is just
> slightly larger than PMD_SIZE. That size profile is actually quite common for
> real-world binaries when using 2M huge pages. This optimization relies on
> READ_ONLY_THP_FOR_FS. If the availability of huge page mappings for code segments
> ends up depending on filesystem support, it will be much harder to take advantage
> of this in practice. [3]

Yeah, again IMO - sorry, but tough.

This is something filesystems need to implement, if they fail to do so,
that's on them.

>
> [1] https://lore.kernel.org/linux-fsdevel/ab2IIwKzmK9qwIlZ@casper.infradead.org/
> [2] https://lore.kernel.org/linux-fsdevel/20260313005211.882831-1-r@hev.cc/
> [3] https://lore.kernel.org/linux-fsdevel/20260320160519.80962-1-r@hev.cc/
>
> Thanks,
> Rui

Cheers, Lorenzo
Re: [PATCH v1 05/10] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
Posted by Zi Yan 3 days, 4 hours ago
On 30 Mar 2026, at 7:17, Lorenzo Stoakes (Oracle) wrote:

> On Sun, Mar 29, 2026 at 12:07:41PM +0800, WANG Rui wrote:
>> Hi Zi,
>>
>>>> Now without READ_ONLY_THP_FOR_FS you're going to:
>>>>
>>>>                        |    PF     | MADV_COLLAPSE | khugepaged |
>>>> 		       |-----------|---------------|------------|
>>>>  large folio fs        |     ✓     |       x       |      x     |
>>>>  large folio + r/o     |     ✓     |       ✓       |      ✓     |
>>>>
>>>> And intentionally leaving behind the 'not large folio fs, r/o' case because
>>>> those file systems need to implement large folio support.
>>>>
>>>> I guess we'll regress those users but we don't care?
>>>
>>> Yes. This also motivates FSes without large folio support to add large folio
>>> support instead of relying on READ_ONLY_THP_FOR_FS hack.
>>
>> Interesting, thanks for making this feature unconditional.
>>
>> From my experiments, this is going to be a performance regression.
>>
>> Before this patch, even when the filesystem (e.g. btrfs without experimental)
>> didn't support large folios, READ_ONLY_THP_FOR_FS still allowed read-only
>> file-backed code segments to be collapsed into huge page mappings via khugepaged.
>>
>> After this patch, FilePmdMapped will always be 0 unless the filesystem supports
>> large folios up to PMD order, and it doesn't look like that support will arrive
>> anytime soon [1].
>
> I think Matthew was being a little sarcastic there ;) but I suppose it's
> hinting at the fact they need to get a move on.
>
>>
>> Is there a reason we can't keep this hack while continuing to push filesystems
>> toward proper large folio support?
>
> IMO - It's time for us to stop allowing filesystems to fail to implement what
> mm requires of them, while still providing a hack to improve performance.
>
> Really this hack shouldn't have been there in the first place, but it was a
> 'putting on notice' that filesystems need to support large folios, which
> has been made amply clear to them for some time.
>
> So yes there will be regressions for filesystems which _still_ do not
> implement this, I'd suggest you focus on trying to convince them to do so
> (or send patches :)
>

Thanks to Lorenzo for clarifying the intention of this patchset.


Hi Rui,

READ_ONLY_THP_FOR_FS has been an experimental feature since 2019, which means
it can go away at any time.

In addition, Matthew gave a heads-up about its removal [1] several months ago.
We have not heard any objection since.

It seems that you care about btrfs with large folio support. Have you
talked to the btrfs people about the timeline for moving large folio support
out of the experimental state?


[1] https://lore.kernel.org/all/aTJg9vOijOGVTnVt@casper.infradead.org/


>>
>> I'm currently working on making the ELF loader more THP-friendly by adjusting
>> the virtual address alignment of read-only code segments [2]. The data shows a
>> noticeable drop in iTLB misses, especially for programs whose text size is just
>> slightly larger than PMD_SIZE. That size profile is actually quite common for
>> real-world binaries when using 2M huge pages. This optimization relies on
>> READ_ONLY_THP_FOR_FS. If the availability of huge page mappings for code segments
>> ends up depending on filesystem support, it will be much harder to take advantage
>> of this in practice. [3]
>
> Yeah, again IMO - sorry, but tough.
>
> This is something filesystems need to implement, if they fail to do so,
> that's on them.
>
>>
>> [1] https://lore.kernel.org/linux-fsdevel/ab2IIwKzmK9qwIlZ@casper.infradead.org/
>> [2] https://lore.kernel.org/linux-fsdevel/20260313005211.882831-1-r@hev.cc/
>> [3] https://lore.kernel.org/linux-fsdevel/20260320160519.80962-1-r@hev.cc/
>>
>> Thanks,
>> Rui
>
> Cheers, Lorenzo


--
Best Regards,
Yan, Zi
Re: [PATCH v1 05/10] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
Posted by WANG Rui 3 days, 2 hours ago
Hi Lorenzo and Zi,

>>> Is there a reason we can't keep this hack while continuing to push filesystems
>>> toward proper large folio support?
>>
>> IMO - It's time for us to stop allowing filesystems to fail to implement what
>> mm requires of them, while still providing a hack to improve performance.
>>
>> Really this hack shouldn't have been there in the first place, but it was a
>> 'putting on notice' that filesystems need to support large folios, which
>> has been made amply clear to them for some time.
>>
>> So yes there will be regressions for filesystems which _still_ do not
>> implement this, I'd suggest you focus on trying to convince them to do so
>> (or send patches :)
>>
>
> Thank Lorenzo for clarifying the intention of this patchset.
>
> Hi Rui,
>
> READ_ONLY_THP_FOR_FS is an experimental feature since 2019 and that means the
> feature can go away at any time.
>
> In addition, Matthew has made a heads-up on its removal [1] several months ago.
> We have not heard any objection since.
>
> It seems that you care about btrfs with large folio support. Have you
> talked to btrfs people on the timeline of moving the large folio support out
> of the experimental state?
>
>
> [1] https://lore.kernel.org/all/aTJg9vOijOGVTnVt@casper.infradead.org/

Thanks for the clarification.

I fully agree with the long-term direction here. Ideally this should be
handled by filesystems, and mm has already done a lot of work to make
that possible.

However, in practice it does not look like simply enabling an
experimental feature is sufficient today. I did a quick check of
mapping_max_folio_size() across a few common filesystems, and only XFS
consistently reaches PMD order under both 4K and 16K base pages.
Even ext4 falls short under 16K.

PAGE_SIZE = 4K, PMD_SIZE = 2M

Filesystem                     mapping_max_folio_size   PMD order
------------------------------------------------------------------
ext4                           2M                       yes
btrfs (without experimental)   4K                       no
btrfs (with experimental)      256K                     no
xfs                            2M                       yes

PAGE_SIZE = 16K, PMD_SIZE = 32M

Filesystem                     mapping_max_folio_size   PMD order
------------------------------------------------------------------
ext4                           8M                       no
btrfs (without experimental)   16K                      no
btrfs (with experimental)      256K                     no
xfs                            32M                      yes
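For illustration, the order comparison behind the table above can be reproduced with some quick arithmetic. This is a rough sketch in Python (the folio sizes are the ones listed above; the helper names are made up, not kernel APIs):

```python
def folio_order(max_folio_size, page_size):
    """Order = log2(max_folio_size / page_size), assuming power-of-two sizes."""
    return (max_folio_size // page_size).bit_length() - 1

# 4K base pages, 2M PMD: PMD order is 9
assert folio_order(2 << 20, 4 << 10) == 9     # PMD order itself
print(folio_order(2 << 20, 4 << 10))          # ext4/xfs: 9 -> PMD eligible
print(folio_order(256 << 10, 4 << 10))        # btrfs experimental: 6 -> not eligible

# 16K base pages, 32M PMD: PMD order is 11
assert folio_order(32 << 20, 16 << 10) == 11  # PMD order itself
print(folio_order(8 << 20, 16 << 10))         # ext4: 9 -> not eligible
```

A filesystem is PMD-eligible only when its maximum folio order reaches the PMD order for that base page size, which is exactly the check the patch adds via mapping_max_folio_order().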

Given the diversity of filesystems in use, each one requires dedicated
engineering effort to implement and validate large folio support, and
that assumes both sufficient resources and prioritization on the
filesystem side. Even after support lands, coverage across different
base page sizes and configurations may take additional time to mature.

What I am really concerned about is the transition period: if filesystem
support is not yet broadly ready, while we have already removed the
fallback path, we may end up in a situation where PMD-sized mappings
become effectively unavailable on many systems for some time.

This is not about the long-term direction, but about the timing and
practical readiness.

Thanks,
Rui
Re: [PATCH v1 05/10] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
Posted by Matthew Wilcox 3 days, 2 hours ago
On Tue, Mar 31, 2026 at 12:09:42AM +0800, WANG Rui wrote:
> Given the diversity of filesystems in use, each one requires dedicated
> engineering effort to implement and validate large folio support, and
> that assumes both sufficient resources and prioritization on the
> filesystem side. Even after support lands, coverage across different
> base page sizes and configurations may take additional time to mature.
> 
> What I am really concerned about is the transition period: if filesystem
> support is not yet broadly ready, while we have already removed the
> fallback path, we may end up in a situation where PMD-sized mappings
> become effectively unavailable on many systems for some time.
> 
> This is not about the long-term direction, but about the timing and
> practical readiness.

If we leave this fallback in place, we'll never get filesystems to move
forward.  It's time to rip off this bandaid; they've got eight months
before the next stable kernel.  I've talked to them about it for years:

LSFMM 2022: https://lwn.net/Articles/893512/
LSFMM 2023: https://lwn.net/Articles/931794/
LSFMM 2024: https://lwn.net/Articles/973565/
LSFMM 2025: https://lwn.net/Articles/1015320/

(and earlier, but I think I've made my point)
Re: [PATCH v1 05/10] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
Posted by David Hildenbrand (Arm) 1 day, 4 hours ago
On 3/30/26 18:19, Matthew Wilcox wrote:
> On Tue, Mar 31, 2026 at 12:09:42AM +0800, WANG Rui wrote:
>> Given the diversity of filesystems in use, each one requires dedicated
>> engineering effort to implement and validate large folio support, and
>> that assumes both sufficient resources and prioritization on the
>> filesystem side. Even after support lands, coverage across different
>> base page sizes and configurations may take additional time to mature.
>>
>> What I am really concerned about is the transition period: if filesystem
>> support is not yet broadly ready, while we have already removed the
>> fallback path, we may end up in a situation where PMD-sized mappings
>> become effectively unavailable on many systems for some time.
>>
>> This is not about the long-term direction, but about the timing and
>> practical readiness.
> 
> If we leave this fallback in place, we'll never get filesystems to move
> forward.  It's time to rip off this bandaid; they've got eight months
> before the next stable kernel.

I guess if we don't force them to work on it, this will never happen. They
shouldn't be holding hostage the THP hacks we want to remove.

-- 
Cheers,

David
Re: [PATCH v1 05/10] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
Posted by Darrick J. Wong 1 day, 4 hours ago
On Wed, Apr 01, 2026 at 04:38:21PM +0200, David Hildenbrand (Arm) wrote:
> On 3/30/26 18:19, Matthew Wilcox wrote:
> > On Tue, Mar 31, 2026 at 12:09:42AM +0800, WANG Rui wrote:
> >> Given the diversity of filesystems in use, each one requires dedicated
> >> engineering effort to implement and validate large folio support, and
> >> that assumes both sufficient resources and prioritization on the
> >> filesystem side. Even after support lands, coverage across different
> >> base page sizes and configurations may take additional time to mature.
> >>
> >> What I am really concerned about is the transition period: if filesystem
> >> support is not yet broadly ready, while we have already removed the
> >> fallback path, we may end up in a situation where PMD-sized mappings
> >> become effectively unavailable on many systems for some time.
> >>
> >> This is not about the long-term direction, but about the timing and
> >> practical readiness.
> > 
> > If we leave this fallback in place, we'll never get filesystems to move
> > forward.  It's time to rip off this bandaid; they've got eight months
> > before the next stable kernel.
> 
> I guess if we don't force them to work on it I guess this will never
> happen. They shouldn't be holding our THP hacks we want to remove hostage.

+1.  There are too many filesystems for the ever-shrinking number of
filesystem maintainers, so the work won't get done without leverage.
Leverage, as in "hey, why did my fault counts go up?"

--D

> -- 
> Cheers,
> 
> David
>
Re: [PATCH v1 05/10] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled()
Posted by Lorenzo Stoakes (Oracle) 6 days, 2 hours ago
On Fri, Mar 27, 2026 at 12:12:04PM -0400, Zi Yan wrote:
> On 27 Mar 2026, at 12:08, Lorenzo Stoakes (Oracle) wrote:
> > So actually:
> >
> >                        |    PF     | MADV_COLLAPSE | khugepaged |
> > 		       |-----------|---------------|------------|
> >  large folio fs        |     ✓     |       x       |      x     |
> >  READ_ONLY_THP_FOR_FS  |     x     |       ✓       |      ✓     |
> >  both!                 |     ✓     |       ✓       |      ✓     |
> >
> > (Where it's implied it's a read-only mapping obviously for the latter two
> > cases.)
> >
> > Now without READ_ONLY_THP_FOR_FS you're going to:
> >
> >                        |    PF     | MADV_COLLAPSE | khugepaged |
> > 		       |-----------|---------------|------------|
> >  large folio fs        |     ✓     |       x       |      x     |
> >  large folio + r/o     |     ✓     |       ✓       |      ✓     |
> >
> > And intentionally leaving behind the 'not large folio fs, r/o' case because
> > those file systems need to implement large folio support.
> >
> > I guess we'll regress those users but we don't care?
>
> Yes. This also motivates FSes without large folio support to add large folio
> support instead of relying on READ_ONLY_THP_FOR_FS hack.

Ack that's something I can back :)

>
> >
> > I do think all this needs to be spelled out in the commit message though as it's
> > subtle.
> >
> > Turns out this PitA config option is going to kick and scream a bit first before
> > it goes...
>
> Sure. I will shamelessly steal your tables. Thank you for the contribution. ;)
>

Haha good I love to spread ASCII art :)

Cheers, Lorenzo