[PATCH v2 2/2] mm: shmem: disallow hugepages if the system-wide shmem THP sysfs settings are disabled

Baolin Wang posted 2 patches 6 months, 2 weeks ago
There is a newer version of this series
[PATCH v2 2/2] mm: shmem: disallow hugepages if the system-wide shmem THP sysfs settings are disabled
Posted by Baolin Wang 6 months, 2 weeks ago
The MADV_COLLAPSE will ignore the system-wide shmem THP sysfs settings, which
means that even though we have disabled the shmem THP configuration, MADV_COLLAPSE
will still attempt to collapse into a shmem THP. This violates the rule we have
agreed upon: never means never.

Another rule for madvise, referring to David's suggestion: “allowing for collapsing
in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".

Then the current strategy is:

For shmem, if none of always, madvise, within_size, and inherit have enabled
PMD-sized THP, then MADV_COLLAPSE will be prohibited from collapsing PMD-sized THP.

For tmpfs, if the mount option is set with the 'huge=never' parameter, then
MADV_COLLAPSE will be prohibited from collapsing PMD-sized THP.

Acked-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/huge_memory.c | 2 +-
 mm/shmem.c       | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d3e66136e41a..a8cfa37cae72 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -166,7 +166,7 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 	 * own flags.
 	 */
 	if (!in_pf && shmem_file(vma->vm_file))
-		return shmem_allowable_huge_orders(file_inode(vma->vm_file),
+		return orders & shmem_allowable_huge_orders(file_inode(vma->vm_file),
 						   vma, vma->vm_pgoff, 0,
 						   !enforce_sysfs);
 
diff --git a/mm/shmem.c b/mm/shmem.c
index 4b42419ce6b2..9af45d4e27e6 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -625,7 +625,7 @@ static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index
 		return 0;
 	if (shmem_huge == SHMEM_HUGE_DENY)
 		return 0;
-	if (shmem_huge_force || shmem_huge == SHMEM_HUGE_FORCE)
+	if (shmem_huge == SHMEM_HUGE_FORCE)
 		return maybe_pmd_order;
 
 	/*
@@ -660,7 +660,7 @@ static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index
 
 		fallthrough;
 	case SHMEM_HUGE_ADVISE:
-		if (vm_flags & VM_HUGEPAGE)
+		if (shmem_huge_force || (vm_flags & VM_HUGEPAGE))
 			return maybe_pmd_order;
 		fallthrough;
 	default:
@@ -1790,7 +1790,7 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
 	/* Allow mTHP that will be fully within i_size. */
 	mask |= shmem_get_orders_within_size(inode, within_size_orders, index, 0);
 
-	if (vm_flags & VM_HUGEPAGE)
+	if (shmem_huge_force || (vm_flags & VM_HUGEPAGE))
 		mask |= READ_ONCE(huge_shmem_orders_madvise);
 
 	if (global_orders > 0)
-- 
2.43.5

Re: [PATCH v2 2/2] mm: shmem: disallow hugepages if the system-wide shmem THP sysfs settings are disabled
Posted by Lorenzo Stoakes 6 months, 2 weeks ago
On Thu, Jun 05, 2025 at 04:00:59PM +0800, Baolin Wang wrote:
> The MADV_COLLAPSE will ignore the system-wide shmem THP sysfs settings, which
> means that even though we have disabled the shmem THP configuration, MADV_COLLAPSE
> will still attempt to collapse into a shmem THP. This violates the rule we have
> agreed upon: never means never.

Ugh that we have separate shmem logic. And split between huge_memory.c and
shmem.c too :)) Again, not your fault, just a general moan about existing
stuff :P

>
> Another rule for madvise, referring to David's suggestion: “allowing for collapsing
> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".

Hm I'm not sure if this is enforced is it? I may have missed something here
however.

>
> Then the current strategy is:
>
> For shmem, if none of always, madvise, within_size, and inherit have enabled
> PMD-sized THP, then MADV_COLLAPSE will be prohibited from collapsing PMD-sized THP.

Again, is this just MADV_COLLAPSE? Surely this is a general change?

We should be clear that we are not explicitly limiting ourselves to
MADV_COLLAPSE here.

You shoudl clearly indicate that the MADV_COLLAPSE case DOESN'T set
TVA_ENFORCE_SYSFS and that's the key difference here.

>
> For tmpfs, if the mount option is set with the 'huge=never' parameter, then
> MADV_COLLAPSE will be prohibited from collapsing PMD-sized THP.
>
> Acked-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
>  mm/huge_memory.c | 2 +-
>  mm/shmem.c       | 6 +++---
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index d3e66136e41a..a8cfa37cae72 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -166,7 +166,7 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>  	 * own flags.
>  	 */
>  	if (!in_pf && shmem_file(vma->vm_file))
> -		return shmem_allowable_huge_orders(file_inode(vma->vm_file),
> +		return orders & shmem_allowable_huge_orders(file_inode(vma->vm_file),
>  						   vma, vma->vm_pgoff, 0,
>  						   !enforce_sysfs);

Did you mean to do &&?

Also, is this achieving what you want to achieve? Is it necessary? The
changes in patch 1/2 enforce the global settings and before this code in
__thp_vma_allowable_orders():

unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
					 unsigned long vm_flags,
					 unsigned long tva_flags,
					 unsigned long orders)
{
	... (no early exits) ...

	orders &= supported_orders;
	if (!orders)
		return 0;

	...
}

So if orders == 0 due to the changes in thp_vma_allowable_orders(), which
is the only caller of __thp_vma_allowable_orders() then we _always_ exit
early here before we get to this shmem_allowable_huge_orders() code.

>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 4b42419ce6b2..9af45d4e27e6 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -625,7 +625,7 @@ static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index
>  		return 0;
>  	if (shmem_huge == SHMEM_HUGE_DENY)
>  		return 0;
> -	if (shmem_huge_force || shmem_huge == SHMEM_HUGE_FORCE)
> +	if (shmem_huge == SHMEM_HUGE_FORCE)
>  		return maybe_pmd_order;
>
>  	/*
> @@ -660,7 +660,7 @@ static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index
>
>  		fallthrough;
>  	case SHMEM_HUGE_ADVISE:
> -		if (vm_flags & VM_HUGEPAGE)
> +		if (shmem_huge_force || (vm_flags & VM_HUGEPAGE))
>  			return maybe_pmd_order;
>  		fallthrough;
>  	default:
> @@ -1790,7 +1790,7 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
>  	/* Allow mTHP that will be fully within i_size. */
>  	mask |= shmem_get_orders_within_size(inode, within_size_orders, index, 0);
>
> -	if (vm_flags & VM_HUGEPAGE)
> +	if (shmem_huge_force || (vm_flags & VM_HUGEPAGE))
>  		mask |= READ_ONCE(huge_shmem_orders_madvise);

I'm also not sure these are necessary:

The only path that can set shmem_huge_force is __thp_vma_allowable_orders()
-> shmem_allowable_huge_orders() -> shmem_huge_global_enabled() and then
only if !(tva_flags & TVA_ENFORCE_SYSFS) and as stated above we already
cover off this case by early exiting __thp_vma_allowable_orders() if orders
== 0 as established in patch 1/2.



>
>  	if (global_orders > 0)
> --
> 2.43.5
>
Re: [PATCH v2 2/2] mm: shmem: disallow hugepages if the system-wide shmem THP sysfs settings are disabled
Posted by Baolin Wang 6 months, 1 week ago

On 2025/6/7 20:14, Lorenzo Stoakes wrote:
> On Thu, Jun 05, 2025 at 04:00:59PM +0800, Baolin Wang wrote:
>> The MADV_COLLAPSE will ignore the system-wide shmem THP sysfs settings, which
>> means that even though we have disabled the shmem THP configuration, MADV_COLLAPSE
>> will still attempt to collapse into a shmem THP. This violates the rule we have
>> agreed upon: never means never.
> 
> Ugh that we have separate shmem logic. And split between huge_memory.c and
> shmem.c too :)) Again, not your fault, just a general moan about existing
> stuff :P
> 
>>
>> Another rule for madvise, referring to David's suggestion: “allowing for collapsing
>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
> 
> Hm I'm not sure if this is enforced is it? I may have missed something here
> however.
> 
>>
>> Then the current strategy is:
>>
>> For shmem, if none of always, madvise, within_size, and inherit have enabled
>> PMD-sized THP, then MADV_COLLAPSE will be prohibited from collapsing PMD-sized THP.
> 
> Again, is this just MADV_COLLAPSE? Surely this is a general change?
> 
> We should be clear that we are not explicitly limiting ourselves to
> MADV_COLLAPSE here.
> 
> You shoudl clearly indicate that the MADV_COLLAPSE case DOESN'T set
> TVA_ENFORCE_SYSFS and that's the key difference here.

Sure.

>> For tmpfs, if the mount option is set with the 'huge=never' parameter, then
>> MADV_COLLAPSE will be prohibited from collapsing PMD-sized THP.
>>
>> Acked-by: Zi Yan <ziy@nvidia.com>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>>   mm/huge_memory.c | 2 +-
>>   mm/shmem.c       | 6 +++---
>>   2 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index d3e66136e41a..a8cfa37cae72 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -166,7 +166,7 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>>   	 * own flags.
>>   	 */
>>   	if (!in_pf && shmem_file(vma->vm_file))
>> -		return shmem_allowable_huge_orders(file_inode(vma->vm_file),
>> +		return orders & shmem_allowable_huge_orders(file_inode(vma->vm_file),
>>   						   vma, vma->vm_pgoff, 0,
>>   						   !enforce_sysfs);
> 
> Did you mean to do &&?

No. It might be worth having a separate fix patch here, because the 
original logic is incorrect and needs to perform an '&' operation with 
’orders‘.

This change should be a general change.

> Also, is this achieving what you want to achieve? Is it necessary? The
> changes in patch 1/2 enforce the global settings and before this code in
> __thp_vma_allowable_orders():
> 
> unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
> 					 unsigned long vm_flags,
> 					 unsigned long tva_flags,
> 					 unsigned long orders)
> {
> 	... (no early exits) ...
> 
> 	orders &= supported_orders;
> 	if (!orders)
> 		return 0;
> 
> 	...
> }
> 
> So if orders == 0 due to the changes in thp_vma_allowable_orders(), which
> is the only caller of __thp_vma_allowable_orders() then we _always_ exit
> early here before we get to this shmem_allowable_huge_orders() code.

Not for this case. Since shmem already supports mTHP, this is to check 
whether the 'orders' are enabled in the shmem mTHP configuration. For 
example, shmem mTHP might only enable 64K mTHP, which obviously does not 
allow PMD-sized THP to collapse.

>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index 4b42419ce6b2..9af45d4e27e6 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -625,7 +625,7 @@ static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index
>>   		return 0;
>>   	if (shmem_huge == SHMEM_HUGE_DENY)
>>   		return 0;
>> -	if (shmem_huge_force || shmem_huge == SHMEM_HUGE_FORCE)
>> +	if (shmem_huge == SHMEM_HUGE_FORCE)
>>   		return maybe_pmd_order;
>>
>>   	/*
>> @@ -660,7 +660,7 @@ static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index
>>
>>   		fallthrough;
>>   	case SHMEM_HUGE_ADVISE:
>> -		if (vm_flags & VM_HUGEPAGE)
>> +		if (shmem_huge_force || (vm_flags & VM_HUGEPAGE))
>>   			return maybe_pmd_order;
>>   		fallthrough;
>>   	default:
>> @@ -1790,7 +1790,7 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
>>   	/* Allow mTHP that will be fully within i_size. */
>>   	mask |= shmem_get_orders_within_size(inode, within_size_orders, index, 0);
>>
>> -	if (vm_flags & VM_HUGEPAGE)
>> +	if (shmem_huge_force || (vm_flags & VM_HUGEPAGE))
>>   		mask |= READ_ONCE(huge_shmem_orders_madvise);
> 
> I'm also not sure these are necessary:
> 
> The only path that can set shmem_huge_force is __thp_vma_allowable_orders()
> -> shmem_allowable_huge_orders() -> shmem_huge_global_enabled() and then
> only if !(tva_flags & TVA_ENFORCE_SYSFS) and as stated above we already
> cover off this case by early exiting __thp_vma_allowable_orders() if orders
> == 0 as established in patch 1/2.

Not ture. Shmem has its own separate mTHP sysfs setting, which is 
different from the mTHP sysfs setting for anonymous pages mentioned 
earlier. These checks are in the shmem file. You can check more for 
shmem mTHP in Documentation/admin-guide/mm/transhuge.rst :)
Re: [PATCH v2 2/2] mm: shmem: disallow hugepages if the system-wide shmem THP sysfs settings are disabled
Posted by Lorenzo Stoakes 6 months, 1 week ago
OK overall the logic looks good now I realise the mistake I made is that
the thp_vma_allowable_orders() check is for the vma_is_anonymous() case
only.

On Mon, Jun 09, 2025 at 02:31:46PM +0800, Baolin Wang wrote:
>
>
> On 2025/6/7 20:14, Lorenzo Stoakes wrote:
> > On Thu, Jun 05, 2025 at 04:00:59PM +0800, Baolin Wang wrote:
> > > The MADV_COLLAPSE will ignore the system-wide shmem THP sysfs settings, which
> > > means that even though we have disabled the shmem THP configuration, MADV_COLLAPSE
> > > will still attempt to collapse into a shmem THP. This violates the rule we have
> > > agreed upon: never means never.
> >
> > Ugh that we have separate shmem logic. And split between huge_memory.c and
> > shmem.c too :)) Again, not your fault, just a general moan about existing
> > stuff :P
> >
> > >
> > > Another rule for madvise, referring to David's suggestion: “allowing for collapsing
> > > in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
> >
> > Hm I'm not sure if this is enforced is it? I may have missed something here
> > however.
> >
> > >
> > > Then the current strategy is:
> > >
> > > For shmem, if none of always, madvise, within_size, and inherit have enabled
> > > PMD-sized THP, then MADV_COLLAPSE will be prohibited from collapsing PMD-sized THP.
> >
> > Again, is this just MADV_COLLAPSE? Surely this is a general change?
> >
> > We should be clear that we are not explicitly limiting ourselves to
> > MADV_COLLAPSE here.
> >
> > You shoudl clearly indicate that the MADV_COLLAPSE case DOESN'T set
> > TVA_ENFORCE_SYSFS and that's the key difference here.
>
> Sure.
>

Thanks!

> > > For tmpfs, if the mount option is set with the 'huge=never' parameter, then
> > > MADV_COLLAPSE will be prohibited from collapsing PMD-sized THP.
> > >
> > > Acked-by: Zi Yan <ziy@nvidia.com>
> > > Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> > > ---
> > >   mm/huge_memory.c | 2 +-
> > >   mm/shmem.c       | 6 +++---
> > >   2 files changed, 4 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > index d3e66136e41a..a8cfa37cae72 100644
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -166,7 +166,7 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
> > >   	 * own flags.
> > >   	 */
> > >   	if (!in_pf && shmem_file(vma->vm_file))
> > > -		return shmem_allowable_huge_orders(file_inode(vma->vm_file),
> > > +		return orders & shmem_allowable_huge_orders(file_inode(vma->vm_file),
> > >   						   vma, vma->vm_pgoff, 0,
> > >   						   !enforce_sysfs);
> >
> > Did you mean to do &&?
>
> No. It might be worth having a separate fix patch here, because the original
> logic is incorrect and needs to perform an '&' operation with ’orders‘.
>
> This change should be a general change.

Ah yeah, I did think that _perhaps_ it could be. I think it would make
sense to separate out into another patch albeit very small, just so we can
separate your further changes from the fix for this.

>
> > Also, is this achieving what you want to achieve? Is it necessary? The
> > changes in patch 1/2 enforce the global settings and before this code in
> > __thp_vma_allowable_orders():
> >
> > unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
> > 					 unsigned long vm_flags,
> > 					 unsigned long tva_flags,
> > 					 unsigned long orders)
> > {
> > 	... (no early exits) ...
> >
> > 	orders &= supported_orders;
> > 	if (!orders)
> > 		return 0;
> >
> > 	...
> > }
> >
> > So if orders == 0 due to the changes in thp_vma_allowable_orders(), which
> > is the only caller of __thp_vma_allowable_orders() then we _always_ exit
> > early here before we get to this shmem_allowable_huge_orders() code.
>
> Not for this case. Since shmem already supports mTHP, this is to check
> whether the 'orders' are enabled in the shmem mTHP configuration. For
> example, shmem mTHP might only enable 64K mTHP, which obviously does not
> allow PMD-sized THP to collapse.

Yeah sorry I get it now thp_vma_allowable_orders() does a
vma_is_anonymous() predicate. Doh! :P

God what a mess this is (not your fault, pre-existing obviously :P)

Yeah le sigh.

>
> > > diff --git a/mm/shmem.c b/mm/shmem.c
> > > index 4b42419ce6b2..9af45d4e27e6 100644
> > > --- a/mm/shmem.c
> > > +++ b/mm/shmem.c
> > > @@ -625,7 +625,7 @@ static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index
> > >   		return 0;
> > >   	if (shmem_huge == SHMEM_HUGE_DENY)
> > >   		return 0;
> > > -	if (shmem_huge_force || shmem_huge == SHMEM_HUGE_FORCE)
> > > +	if (shmem_huge == SHMEM_HUGE_FORCE)
> > >   		return maybe_pmd_order;

OK I get it now, this means the !check sysfs doesn't just get
actioned... :)

> > >
> > >   	/*
> > > @@ -660,7 +660,7 @@ static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index
> > >
> > >   		fallthrough;
> > >   	case SHMEM_HUGE_ADVISE:
> > > -		if (vm_flags & VM_HUGEPAGE)
> > > +		if (shmem_huge_force || (vm_flags & VM_HUGEPAGE))
> > >   			return maybe_pmd_order;
> > >   		fallthrough;
> > >   	default:
> > > @@ -1790,7 +1790,7 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
> > >   	/* Allow mTHP that will be fully within i_size. */
> > >   	mask |= shmem_get_orders_within_size(inode, within_size_orders, index, 0);
> > >
> > > -	if (vm_flags & VM_HUGEPAGE)
> > > +	if (shmem_huge_force || (vm_flags & VM_HUGEPAGE))
> > >   		mask |= READ_ONCE(huge_shmem_orders_madvise);
> >
> > I'm also not sure these are necessary:
> >
> > The only path that can set shmem_huge_force is __thp_vma_allowable_orders()
> > -> shmem_allowable_huge_orders() -> shmem_huge_global_enabled() and then
> > only if !(tva_flags & TVA_ENFORCE_SYSFS) and as stated above we already
> > cover off this case by early exiting __thp_vma_allowable_orders() if orders
> > == 0 as established in patch 1/2.
>
> Not ture. Shmem has its own separate mTHP sysfs setting, which is different
> from the mTHP sysfs setting for anonymous pages mentioned earlier. These
> checks are in the shmem file. You can check more for shmem mTHP in
> Documentation/admin-guide/mm/transhuge.rst :)

Ah yeah the issue is if (vma_is_anonymous())... doh!

The stack trace is correct though, this is the only place we do it:

~~

static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
					      loff_t write_end, bool shmem_huge_force,
					      struct vm_area_struct *vma,
					      unsigned long vm_flags)

Is called from shmem_getattr():

	if (shmem_huge_global_enabled(inode, 0, 0, false, NULL, 0))
		stat->blksize = HPAGE_PMD_SIZE;

Note that smem_huge_force == false here

And shmem_allowable_huge_orders():

unsigned long shmem_allowable_huge_orders(struct inode *inode,
				struct vm_area_struct *vma, pgoff_t index,
				loff_t write_end, bool shmem_huge_force)

	global_orders = shmem_huge_global_enabled(inode, index, write_end,
						  shmem_huge_force, vma, vm_flags);

Which forwards 'shmem_huge_force'.

In shmem_get_folow_gfp():

	orders = shmem_allowable_huge_orders(inode, vma, index, write_end, false);

Note that shmem_huge_force == false.

In __thp_vma_allowable_orders();

unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
					 unsigned long vm_flags,
					 unsigned long tva_flags,
					 unsigned long orders)
{
	...
	bool enforce_sysfs = tva_flags & TVA_ENFORCE_SYSFS;

	...

	if (!in_pf && shmem_file(vma->vm_file))
		return shmem_allowable_huge_orders(file_inode(vma->vm_file),
						   vma, vma->vm_pgoff, 0,
						   !enforce_sysfs);

So we set shmem_huge_force only if TVA_ENFORCE_SYSFS is not set in tva_flags passed to __thp_vma_allowable_orders()

The only caller of __thp_vma_allowable_orders() is thp_vma_allowable_orders().

~~

But yeah we do need to do shmem things... sorry my mistake.
Re: [PATCH v2 2/2] mm: shmem: disallow hugepages if the system-wide shmem THP sysfs settings are disabled
Posted by Baolin Wang 6 months, 1 week ago

On 2025/6/9 23:33, Lorenzo Stoakes wrote:
> OK overall the logic looks good now I realise the mistake I made is that
> the thp_vma_allowable_orders() check is for the vma_is_anonymous() case
> only.
> 
> On Mon, Jun 09, 2025 at 02:31:46PM +0800, Baolin Wang wrote:
>>
>>
>> On 2025/6/7 20:14, Lorenzo Stoakes wrote:
>>> On Thu, Jun 05, 2025 at 04:00:59PM +0800, Baolin Wang wrote:
>>>> The MADV_COLLAPSE will ignore the system-wide shmem THP sysfs settings, which
>>>> means that even though we have disabled the shmem THP configuration, MADV_COLLAPSE
>>>> will still attempt to collapse into a shmem THP. This violates the rule we have
>>>> agreed upon: never means never.
>>>
>>> Ugh that we have separate shmem logic. And split between huge_memory.c and
>>> shmem.c too :)) Again, not your fault, just a general moan about existing
>>> stuff :P
>>>
>>>>
>>>> Another rule for madvise, referring to David's suggestion: “allowing for collapsing
>>>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>>
>>> Hm I'm not sure if this is enforced is it? I may have missed something here
>>> however.
>>>
>>>>
>>>> Then the current strategy is:
>>>>
>>>> For shmem, if none of always, madvise, within_size, and inherit have enabled
>>>> PMD-sized THP, then MADV_COLLAPSE will be prohibited from collapsing PMD-sized THP.
>>>
>>> Again, is this just MADV_COLLAPSE? Surely this is a general change?
>>>
>>> We should be clear that we are not explicitly limiting ourselves to
>>> MADV_COLLAPSE here.
>>>
>>> You shoudl clearly indicate that the MADV_COLLAPSE case DOESN'T set
>>> TVA_ENFORCE_SYSFS and that's the key difference here.
>>
>> Sure.
>>
> 
> Thanks!
> 
>>>> For tmpfs, if the mount option is set with the 'huge=never' parameter, then
>>>> MADV_COLLAPSE will be prohibited from collapsing PMD-sized THP.
>>>>
>>>> Acked-by: Zi Yan <ziy@nvidia.com>
>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>> ---
>>>>    mm/huge_memory.c | 2 +-
>>>>    mm/shmem.c       | 6 +++---
>>>>    2 files changed, 4 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index d3e66136e41a..a8cfa37cae72 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -166,7 +166,7 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>>>>    	 * own flags.
>>>>    	 */
>>>>    	if (!in_pf && shmem_file(vma->vm_file))
>>>> -		return shmem_allowable_huge_orders(file_inode(vma->vm_file),
>>>> +		return orders & shmem_allowable_huge_orders(file_inode(vma->vm_file),
>>>>    						   vma, vma->vm_pgoff, 0,
>>>>    						   !enforce_sysfs);
>>>
>>> Did you mean to do &&?
>>
>> No. It might be worth having a separate fix patch here, because the original
>> logic is incorrect and needs to perform an '&' operation with ’orders‘.
>>
>> This change should be a general change.
> 
> Ah yeah, I did think that _perhaps_ it could be. I think it would make
> sense to separate out into another patch albeit very small, just so we can
> separate your further changes from the fix for this.

OK.

>>> Also, is this achieving what you want to achieve? Is it necessary? The
>>> changes in patch 1/2 enforce the global settings and before this code in
>>> __thp_vma_allowable_orders():
>>>
>>> unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>>> 					 unsigned long vm_flags,
>>> 					 unsigned long tva_flags,
>>> 					 unsigned long orders)
>>> {
>>> 	... (no early exits) ...
>>>
>>> 	orders &= supported_orders;
>>> 	if (!orders)
>>> 		return 0;
>>>
>>> 	...
>>> }
>>>
>>> So if orders == 0 due to the changes in thp_vma_allowable_orders(), which
>>> is the only caller of __thp_vma_allowable_orders() then we _always_ exit
>>> early here before we get to this shmem_allowable_huge_orders() code.
>>
>> Not for this case. Since shmem already supports mTHP, this is to check
>> whether the 'orders' are enabled in the shmem mTHP configuration. For
>> example, shmem mTHP might only enable 64K mTHP, which obviously does not
>> allow PMD-sized THP to collapse.
> 
> Yeah sorry I get it now thp_vma_allowable_orders() does a
> vma_is_anonymous() predicate. Doh! :P
> 
> God what a mess this is (not your fault, pre-existing obviously :P)
> 
> Yeah le sigh.
> 
>>
>>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>>> index 4b42419ce6b2..9af45d4e27e6 100644
>>>> --- a/mm/shmem.c
>>>> +++ b/mm/shmem.c
>>>> @@ -625,7 +625,7 @@ static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index
>>>>    		return 0;
>>>>    	if (shmem_huge == SHMEM_HUGE_DENY)
>>>>    		return 0;
>>>> -	if (shmem_huge_force || shmem_huge == SHMEM_HUGE_FORCE)
>>>> +	if (shmem_huge == SHMEM_HUGE_FORCE)
>>>>    		return maybe_pmd_order;
> 
> OK I get it now, this means the !check sysfs doesn't just get
> actioned... :)
> 
>>>>
>>>>    	/*
>>>> @@ -660,7 +660,7 @@ static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index
>>>>
>>>>    		fallthrough;
>>>>    	case SHMEM_HUGE_ADVISE:
>>>> -		if (vm_flags & VM_HUGEPAGE)
>>>> +		if (shmem_huge_force || (vm_flags & VM_HUGEPAGE))
>>>>    			return maybe_pmd_order;
>>>>    		fallthrough;
>>>>    	default:
>>>> @@ -1790,7 +1790,7 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
>>>>    	/* Allow mTHP that will be fully within i_size. */
>>>>    	mask |= shmem_get_orders_within_size(inode, within_size_orders, index, 0);
>>>>
>>>> -	if (vm_flags & VM_HUGEPAGE)
>>>> +	if (shmem_huge_force || (vm_flags & VM_HUGEPAGE))
>>>>    		mask |= READ_ONCE(huge_shmem_orders_madvise);
>>>
>>> I'm also not sure these are necessary:
>>>
>>> The only path that can set shmem_huge_force is __thp_vma_allowable_orders()
>>> -> shmem_allowable_huge_orders() -> shmem_huge_global_enabled() and then
>>> only if !(tva_flags & TVA_ENFORCE_SYSFS) and as stated above we already
>>> cover off this case by early exiting __thp_vma_allowable_orders() if orders
>>> == 0 as established in patch 1/2.
>>
>> Not ture. Shmem has its own separate mTHP sysfs setting, which is different
>> from the mTHP sysfs setting for anonymous pages mentioned earlier. These
>> checks are in the shmem file. You can check more for shmem mTHP in
>> Documentation/admin-guide/mm/transhuge.rst :)
> 
> Ah yeah the issue is if (vma_is_anonymous())... doh!
> 
> The stack trace is correct though, this is the only place we do it:
> 
> ~~
> 
> static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
> 					      loff_t write_end, bool shmem_huge_force,
> 					      struct vm_area_struct *vma,
> 					      unsigned long vm_flags)
> 
> Is called from shmem_getattr():
> 
> 	if (shmem_huge_global_enabled(inode, 0, 0, false, NULL, 0))
> 		stat->blksize = HPAGE_PMD_SIZE;
> 
> Note that smem_huge_force == false here
> 
> And shmem_allowable_huge_orders():
> 
> unsigned long shmem_allowable_huge_orders(struct inode *inode,
> 				struct vm_area_struct *vma, pgoff_t index,
> 				loff_t write_end, bool shmem_huge_force)
> 
> 	global_orders = shmem_huge_global_enabled(inode, index, write_end,
> 						  shmem_huge_force, vma, vm_flags);
> 
> Which forwards 'shmem_huge_force'.
> 
> In shmem_get_folow_gfp():
> 
> 	orders = shmem_allowable_huge_orders(inode, vma, index, write_end, false);
> 
> Note that shmem_huge_force == false.
> 
> In __thp_vma_allowable_orders();
> 
> unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
> 					 unsigned long vm_flags,
> 					 unsigned long tva_flags,
> 					 unsigned long orders)
> {
> 	...
> 	bool enforce_sysfs = tva_flags & TVA_ENFORCE_SYSFS;
> 
> 	...
> 
> 	if (!in_pf && shmem_file(vma->vm_file))
> 		return shmem_allowable_huge_orders(file_inode(vma->vm_file),
> 						   vma, vma->vm_pgoff, 0,
> 						   !enforce_sysfs);
> 
> So we set shmem_huge_force only if TVA_ENFORCE_SYSFS is not set in tva_flags passed to __thp_vma_allowable_orders()
> 
> The only caller of __thp_vma_allowable_orders() is thp_vma_allowable_orders().

Right :)

> 
> But yeah we do need to do shmem things... sorry my mistake.

No worries. I appreciate your useful comments.
Re: [PATCH v2 2/2] mm: shmem: disallow hugepages if the system-wide shmem THP sysfs settings are disabled
Posted by Lorenzo Stoakes 6 months, 2 weeks ago
On Sat, Jun 07, 2025 at 01:14:41PM +0100, Lorenzo Stoakes wrote:
> On Thu, Jun 05, 2025 at 04:00:59PM +0800, Baolin Wang wrote:
[snip]
> >
> > Another rule for madvise, referring to David's suggestion: “allowing for collapsing
> > in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>
> Hm I'm not sure if this is enforced is it? I may have missed something here
> however.

Oh right actually I think it is implicitly - if TVA_ENFORCE_SYSFS is not
specified in tva_flags, then we don't bother applying an madvise filter at all
anyway, and we account for that in our 'enabled' check in
thp_vma_allowable_orders().

But I don't think this patch changes anything, I actually _think_ we can just
drop this one.

[snip]
Re: [PATCH v2 2/2] mm: shmem: disallow hugepages if the system-wide shmem THP sysfs settings are disabled
Posted by Baolin Wang 6 months, 1 week ago

On 2025/6/7 20:17, Lorenzo Stoakes wrote:
> On Sat, Jun 07, 2025 at 01:14:41PM +0100, Lorenzo Stoakes wrote:
>> On Thu, Jun 05, 2025 at 04:00:59PM +0800, Baolin Wang wrote:
> [snip]
>>>
>>> Another rule for madvise, referring to David's suggestion: “allowing for collapsing
>>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>
>> Hm I'm not sure if this is enforced is it? I may have missed something here
>> however.
> 
> Oh right actually I think it is implicitly - if TVA_ENFORCE_SYSFS is not
> specified in tva_flags, then we don't bother applying an madvise filter at all
> anyway, and we account for that in our 'enabled' check in
> thp_vma_allowable_orders().
> 
> But I don't think this patch changes anything, I actually _think_ we can just
> drop this one.

See my previous replies. Shmem mTHP sysfs settings are different from 
Anonymous pages.

Thanks for reviewing.
Re: [PATCH v2 2/2] mm: shmem: disallow hugepages if the system-wide shmem THP sysfs settings are disabled
Posted by Lorenzo Stoakes 6 months, 1 week ago
On Mon, Jun 09, 2025 at 02:34:22PM +0800, Baolin Wang wrote:
>
>
> On 2025/6/7 20:17, Lorenzo Stoakes wrote:
> > On Sat, Jun 07, 2025 at 01:14:41PM +0100, Lorenzo Stoakes wrote:
> > > On Thu, Jun 05, 2025 at 04:00:59PM +0800, Baolin Wang wrote:
> > [snip]
> > > >
> > > > Another rule for madvise, referring to David's suggestion: “allowing for collapsing
> > > > in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
> > >
> > > Hm I'm not sure if this is enforced is it? I may have missed something here
> > > however.
> >
> > Oh right actually I think it is implicitly - if TVA_ENFORCE_SYSFS is not
> > specified in tva_flags, then we don't bother applying an madvise filter at all
> > anyway, and we account for that in our 'enabled' check in
> > thp_vma_allowable_orders().
> >
> > But I don't think this patch changes anything, I actually _think_ we can just
> > drop this one.
>
> See my previous replies. Shmem mTHP sysfs settings are different from
> Anonymous pages.

Yeah sorry you're right, responded elsewhere in thread.

>
> Thanks for reviewing.

You're welcome!